Efficient Data Structures for the Factor Periodicity Problem



Similar documents
Data Warehousing und Data Mining

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Finding and counting given length cycles

Randomized algorithms

All trees contain a large induced subgraph having all degrees 1 (mod k)

Complexity of Union-Split-Find Problems. Katherine Jane Lai

How To Find An Optimal Search Protocol For An Oblivious Cell

SOLUTIONS TO ASSIGNMENT 1 MATH 576

Cycles and clique-minors in expanders

ON THE COMPLEXITY OF THE GAME OF SET.

TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL TIME FOR LARGE DATABASES

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Using Data-Oblivious Algorithms for Private Cloud Storage Access. Michael T. Goodrich Dept. of Computer Science

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

IB Math Research Problem

Generating models of a matched formula with a polynomial delay

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

Dynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

The Butterfly, Cube-Connected-Cycles and Benes Networks

Full and Complete Binary Trees

Stationary random graphs on Z with prescribed iid degrees and finite mean connections

DATA STRUCTURES USING C

Closest Pair Problem

CHAPTER 7 INTRODUCTION TO SAMPLING DISTRIBUTIONS

Internet Packet Filter Management and Rectangle Geometry

Appendix: Simple Methods for Shift Scheduling in Multi-Skill Call Centers

Arithmetic Coding: Introduction

The Ideal Class Group

Graphs without proper subgraphs of minimum degree 3 and short cycles

A New Approach to Dynamic All Pairs Shortest Paths

5.4 Closest Pair of Points

Classification - Examples

Position-Restricted Substring Searching

Physical Database Design and Tuning

Mathematics Course 111: Algebra I Part IV: Vector Spaces

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

IB Maths SL Sequence and Series Practice Problems Mr. W Name

1. Physical Database Design in Relational Databases (1)

Approximation of an Open Polygonal Curve with a Minimum Number of Circular Arcs and Biarcs

Multi-dimensional index structures Part I: motivation

Algorithms and Methods for Distributed Storage Networks 9 Analysis of DHT Christian Schindelhauer

arxiv: v5 [math.co] 26 Dec 2014

PYTHAGOREAN TRIPLES KEITH CONRAD

The Relation between Two Present Value Formulae

On line construction of suffix trees 1

Comparing Multiple Proportions, Test of Independence and Goodness of Fit

INTERSECTION OF LINE-SEGMENTS

1. Prove that the empty set is a subset of every set.

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Algorithmics on SLP-Compressed Strings: A Survey

The degree, size and chromatic index of a uniform hypergraph

Dynamic Programming. Lecture Overview Introduction

The Goldberg Rao Algorithm for the Maximum Flow Problem

A COOL AND PRACTICAL ALTERNATIVE TO TRADITIONAL HASH TABLES

School Timetabling in Theory and Practice

Group Testing a tool of protecting Network Security

On an anti-ramsey type result

Factoring & Primality

Lecture 2: Universality

How To Know If A Domain Is Unique In An Octempo (Euclidean) Or Not (Ecl)

root node level: internal node edge leaf node Data Structures & Algorithms McQuain

Removing Local Extrema from Imprecise Terrains

Longest Common Extensions via Fingerprinting

The program also provides supplemental modules on topics in geometry and probability and statistics.

CHAPTER 5. Number Theory. 1. Integers and Division. Discussion

The LCA Problem Revisited

9.2 Summation Notation

EXAM. Exam #3. Math 1430, Spring April 21, 2001 ANSWERS

Pigeonhole Principle Solutions

Offline sorting buffers on Line

Computing Primitively-Rooted Squares and Runs in Partial Words

High degree graphs contain large-star factors

Notes on Factoring. MA 206 Kurt Bryan

Diffuse Reflection Radius in a Simple Polygon

The Set Data Model CHAPTER What This Chapter Is About

Basics of Counting. The product rule. Product rule example. 22C:19, Chapter 6 Hantao Zhang. Sample question. Total is 18 * 325 = 5850

Continued Fractions and the Euclidean Algorithm

LZ77. Example 2.10: Let T = badadadabaab and assume d max and l max are large. phrase b a d adadab aa b

(67902) Topics in Theory and Complexity Nov 2, Lecture 7

How Efficient can Memory Checking be?

Labeling outerplanar graphs with maximum degree three

Factoring Algorithms

Efficient LDPC Code Based Secret Sharing Schemes and Private Data Storage in Cloud without Encryption

Ph.D. Thesis. Judit Nagy-György. Supervisor: Péter Hajnal Associate Professor

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

GREATEST COMMON DIVISOR

DEFINABLE TYPES IN PRESBURGER ARITHMETIC

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. A Parallel Tree Contraction Algorithm on Non-Binary Trees

Recall that two vectors in are perpendicular or orthogonal provided that their dot

Large induced subgraphs with all degrees odd

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

Clique coloring B 1 -EPG graphs

Improved Single and Multiple Approximate String Matching

A Weighted-Sum Mixed Integer Program for Bi-Objective Dynamic Portfolio Optimization

Oracle Database 10g: Introduction to SQL

Transcription:

Efficient Data Structures for the Factor Periodicity Problem Tomasz Kociumaka Wojciech Rytter Jakub Radoszewski Tomasz Waleń University of Warsaw, Poland SPIRE 2012 Cartagena, October 23, 2012 Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 1/19

Factor Periodicity Problem w: a b a a b a b a a b a a b a b a a b a b a a b a a b a b a a b a a b Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 2/19

Factor Periodicity Problem w: a b a a b a b a a b a a b a b a a b a b a a b a a b a b a a b a a b 11 22 Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 2/19

Factor Periodicity Problem w: a b a a b a b a a b a a b a b a a b a b a a b a a b a b a a b a a b 11 22 a a b a b a a b a b a a Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 2/19

Factor Periodicity Problem w: a b a a b a b a a b a a b a b a a b a b a a b a a b a b a a b a a b 11 22 a a b a b a a b a b a a Periods of w[11..22] are 5 Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 2/19

Factor Periodicity Problem w: a b a a b a b a a b a a b a b a a b a b a a b a a b a b a a b a a b 11 22 a a b a b a a b a b a a Periods of w[11..22] are 5, 10 Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 2/19

Factor Periodicity Problem w: a b a a b a b a a b a a b a b a a b a b a a b a a b a b a a b a a b 11 22 a a b a b a a b a b a a Periods of w[11..22] are 5, 10, 11 Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 2/19

Factor Periodicity Problem w: a b a a b a b a a b a a b a b a a b a b a a b a a b a b a a b a a b 11 22 a a b a b a a b a b a a Periods of w[11..22] are 5, 10, 11 and 12. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 2/19

Factor Periodicity Problem w: a b a a b a b a a b a a b a b a a b a b a a b a a b a b a a b a a b 11 22 a a b a b a a b a b a a Periods of w[11..22] are 5, 10, 11 and 12. Notation P er(w[11..22]) ={5, 10, 11, 12}, per(w[11..22]) = 5. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 2/19

Arithmetic sets A word of length m might have Θ(m) periods, e.g. a m. Definition A set A = {a, a + d, a + 2d,..., a + kd} Z is called arithmetic. An integer d is called the difference of A. Observe that an arithmetic set can be represented by three integers: a, d and k. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 3/19

Arithmetic sets A word of length m might have Θ(m) periods, e.g. a m. Definition A set A = {a, a + d, a + 2d,..., a + kd} Z is called arithmetic. An integer d is called the difference of A. Observe that an arithmetic set can be represented by three integers: a, d and k. Fact Let v be a word of length m. Then P er(v) is a union of at most log m disjoint arithmetic sets. For example P er(w[11..22]) = {5} {10, 11, 12} = {5, 10} {11, 12}. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 3/19

Formal problem statement Problem (Period Queries) Design a data structure that for a fixed word w of length n answers the following queries. Given integers i, j (1 i j n) compute P er(w[i..j]) respresented as a union of O(log n) arithmetic sets. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 4/19

Formal problem statement Problem (Period Queries) Design a data structure that for a fixed word w of length n answers the following queries. Given integers i, j (1 i j n) compute P er(w[i..j]) respresented as a union of O(log n) arithmetic sets. Definition We say that p is an (1 + δ)-period of v if v (1 + δ)p. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 4/19

Formal problem statement Problem (Period Queries) Design a data structure that for a fixed word w of length n answers the following queries. Given integers i, j (1 i j n) compute P er(w[i..j]) respresented as a union of O(log n) arithmetic sets. Definition We say that p is an (1 + δ)-period of v if v (1 + δ)p. Problem ((1 + δ)-period Queries) Let us fix a real number δ > 0. Design a data structure that for a fixed word w of length n answers the following queries. Given integers i, j (1 i j n) compute all (1 + δ)-periods of w[i..j] respresented as a union of O(1) arithmetic sets. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 4/19

Related work To the best of our knowledge no previous research on the general case of Period Queries. Even for computing the shortest period, only straightforward solutions: memorize all answers O(n 2 ) space, O(1) query time compute the answer from scratch for each query no extra space, O(n) query time Efficient data structures for primitivity testing (generalized by 2-Period Queries) Karhumäki, Lifshits & Rytter; CPM 2007 O(n log n) space, O(1) query time, Crochemore et. al; SPIRE 2010 O(n log ε n) space, O(log n) query time. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 5/19

Our results Several results based on the common idea but different tools. Space All periods (1 + δ)-periods O(n) O(log 1+ε n) O(log ε n) O(n log log n) O(log n(log log n) 2 ) O((log log n) 2 ) O(n log ε n) O(log n log log n) O(log log n) O(n log n) O(log n) O(1) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 6/19

Our results Several results based on the common idea but different tools. Space All periods (1 + δ)-periods O(n) O(log 1+ε n) O(log ε n) O(n log log n) O(log n(log log n) 2 ) O((log log n) 2 ) O(n log ε n) O(log n log log n) O(log log n) O(n log n) O(log n) O(1) Standard assumptions on the model of computation: word RAM model with w = Ω(log n), randomization. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 6/19

Our approach Let Borders(v) = { u : u is a border of v}. Fact P er(v) = v Borders(v) = { v b : b Borders(v)}. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 7/19

Our approach Let Borders(v) = { u : u is a border of v}. Fact P er(v) = v Borders(v) = { v b : b Borders(v)}. We compute Borders(v) {2 k,..., 2 k+1 } separately for each k {0,..., log v }. 2 k 2 k 2 k 2 k v Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 7/19

Our approach Let Borders(v) = { u : u is a border of v}. Fact P er(v) = v Borders(v) = { v b : b Borders(v)}. We compute Borders(v) {2 k,..., 2 k+1 } separately for each k {0,..., log v }. 2 k 2 k 2 k 2 k v p s Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 7/19

Our approach Let Borders(v) = { u : u is a border of v}. Fact P er(v) = v Borders(v) = { v b : b Borders(v)}. We compute Borders(v) {2 k,..., 2 k+1 } separately for each k {0,..., log v }. 2 k 2 k 2 k 2 k v p border s Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 7/19

Our approach Let Borders(v) = { u : u is a border of v}. Fact P er(v) = v Borders(v) = { v b : b Borders(v)}. We compute Borders(v) {2 k,..., 2 k+1 } separately for each k {0,..., log v }. 2 k 2 k 2 k 2 k v p border s Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 7/19

Close occurrences Let Occ(v, u) be the set of positions of v where an occurrence of u starts. Arithmetic sets naturally appear as the Occ sets. Fact Let v 2 u. Then Occ(v, u) is arithmetic. Moreover, if Occ(v, u) 3 then its difference is equal to per(u). v u period of u Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 8/19

A formula for border lengths 2 k 2 k 2 k 2 k v p p s s Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 9/19

A formula for border lengths 2 k 2 k 2 k 2 k v p p s s P = Occ(s s, p) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 9/19

A formula for border lengths 2 k 2 k 2 k 2 k v p p s s S = Occ(pp, s) P = Occ(s s, p) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 9/19

A formula for border lengths 2 k 2 k 2 k 2 k v p p s s l S = Occ(pp, s) P = Occ(s s, p) l Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 9/19

A formula for border lengths 2 k 2 k 2 k 2 k l l v p p s s l S = Occ(pp, s) P = Occ(s s, p) l Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 9/19

A formula for border lengths 2 k 2 k 2 k 2 k l l v p p s s l S = Occ(pp, s) P = Occ(s s, p) l Fact Let 0 l < 2 k. Then the word v has a border of length 2 k + l if and only if l + 1 S and 2 k l P. Consequently Borders(v) {2 k,..., 2 k+1 } is arithmetic. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 9/19

Intersecting arithmetic sets Lemma If P 3 and S 3, then per(p) = per(s). Consequently P and S are arithmetic of common difference. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 10/19

Intersecting arithmetic sets Lemma If P 3 and S 3, then per(p) = per(s). Consequently P and S are arithmetic of common difference. v p p s s S P Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 10/19

Intersecting arithmetic sets Lemma If P 3 and S 3, then per(p) = per(s). Consequently P and S are arithmetic of common difference. v p p s s S P 2per(s) 2per(p) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 10/19

Intersecting arithmetic sets Lemma If P 3 and S 3, then per(p) = per(s). Consequently P and S are arithmetic of common difference. v p p s s S P 2per(s) of period both per(p) and per(s) 2per(p) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 10/19

Intersecting arithmetic sets Lemma If P 3 and S 3, then per(p) = per(s). Consequently P and S are arithmetic of common difference. v p p s s S P 2per(s) of period both per(p) and per(s) 2per(p) Intersecting two arithmetic sets can be performed in O(1) time, when one of them is small or when they share a common difference. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 10/19

Summary of the combinatorial part Problem (Occurrence Queries) Design a data structure that for a word w can answer the following queries. Given a basic factor u and a factor v of w such that v 2 u (both represented by one of their occurrences) compute the arithmetic set Occ(v, u). Theorem Assume there is a data structure answering the Occurrence Queries in O(f(n)) time. Then this data structure can answer Period Queries in O(f(n) log n) time and (1 + δ)-period Queries in O(f(n)) time. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 11/19

Occurrence Queries in O(1) time Fix 2 k n, Split w into parts of length 2 k+1 with overlaps of size 2 k, 2 k+1 w 2 k+1 Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 12/19

Occurrence Queries in O(1) time Fix 2 k n, Split w into parts of length 2 k+1 with overlaps of size 2 k, Consider a basic factor u, u = 2 k. 2 k+1 w 2 k+1 Each occurrence of u occurs within a single part. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 12/19

Occurrence Queries in O(1) time Fix 2 k n, Split w into parts of length 2 k+1 with overlaps of size 2 k, Consider a basic factor u, u = 2 k. 2 k+1 w 2 k+1 arithmetic sets Each occurrence of u occurs within a single part. Occurrences in a single part form an arithmetic set. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 12/19

Occurrence Queries in O(1) time Imagine a (large) array with columns indexed by parts and rows by identifiers of all basic factors of length 2 k. The identifiers are obtained from the DBF (Dictionary of Basic Factors) id(u) id(u ) id(u ) [ 0, 2 k+1 ] [ 2 k, 3 2 k] [ 2 2 k, 4 2 k][ 3 2 k, 5 2 k] Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 13/19

Occurrence Queries in O(1) time Imagine a (large) array with columns indexed by parts and rows by identifiers of all basic factors of length 2 k. The identifiers are obtained from the DBF (Dictionary of Basic Factors) id(u) id(u ) id(u ) [ 0, 2 k+1 ] [ 2 k, 3 2 k] [ 2 2 k, 4 2 k][ 3 2 k, 5 2 k] This array has Θ ( n 2 2 k ) cells. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 13/19

Occurrence Queries in O(1) time Imagine a (large) array with columns indexed by parts and rows by identifiers of all basic factors of length 2 k. The identifiers are obtained from the DBF (Dictionary of Basic Factors) id(u) id(u ) id(u ) [ 0, 2 k+1 ] [ 2 k, 3 2 k] [ 2 2 k, 4 2 k][ 3 2 k, 5 2 k] This array has Θ ( n 2 2 k ) cells. All factors of length 2 k have n occurrences in total, so n non-empty fields perfect hashing can be used. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 13/19

Occurrence Queries in O(1) time Imagine a (large) array with columns indexed by parts and rows by identifiers of all basic factors of length 2 k. The identifiers are obtained from the DBF (Dictionary of Basic Factors) id(u) id(u ) id(u ) [ 0, 2 k+1 ] [ 2 k, 3 2 k] [ 2 2 k, 4 2 k][ 3 2 k, 5 2 k] This array has Θ ( n 2 2 k ) cells. All factors of length 2 k have n occurrences in total, so n non-empty fields perfect hashing can be used. This gives O(n log n) size in total for all values of k. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 13/19

Occurrence Queries in O(1) time Answering queries: v w 2 k+1 2 k+1 v lies within at most two consecutive parts, Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 14/19

Occurrence Queries in O(1) time Answering queries: v w 2 k+1 id(u) 2 k+1 v lies within at most two consecutive parts, get the occurrences of u from the hash table, Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 14/19

Occurrence Queries in O(1) time Answering queries: v w 2 k+1 id(u) 2 k+1 v lies within at most two consecutive parts, get the occurrences of u from the hash table, Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 14/19

Occurrence Queries in O(1) time Answering queries: v w 2 k+1 id(u) 2 k+1 v lies within at most two consecutive parts, get the occurrences of u from the hash table, crop and merge these arithmetic sets to obtain the result. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 14/19

Occurrence Queries in O(1) time Answering queries: v w 2 k+1 id(u) 2 k+1 v lies within at most two consecutive parts, get the occurrences of u from the hash table, crop and merge these arithmetic sets to obtain the result. Corollary There exists a data structure of O(n log n) size that answers the Occurrence Queries in O(1) time. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 14/19

Range Predecessor/Successor Queries Problem (Range Predecessor/Successor Queries) Design a data structure that for a word w can answer the following queries. Given a factor u of w (represented by an occurrence in w) and i {1... n} find P RED(u, i) the last occurrence of u ending at a position i, SUCC(u, i) the first occurrence of u starting at a position i. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 15/19

Range Predecessor/Successor Queries Problem (Range Predecessor/Successor Queries) Design a data structure that for a word w can answer the following queries. Given a factor u of w (represented by an occurrence in w) and i {1... n} find P RED(u, i) the last occurrence of u ending at a position i, SUCC(u, i) the first occurrence of u starting at a position i. The Occurrence Queries can be reduced to three Range Predecessor/Successor Queries, where u is a basic factor of w. w i v j Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 15/19

Range Predecessor/Successor Queries Problem (Range Predecessor/Successor Queries) Design a data structure that for a word w can answer the following queries. Given a factor u of w (represented by an occurrence in w) and i {1... n} find P RED(u, i) the last occurrence of u ending at a position i, SUCC(u, i) the first occurrence of u starting at a position i. The Occurrence Queries can be reduced to three Range Predecessor/Successor Queries, where u is a basic factor of w. w SUCC(u, i) i i v j Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 15/19

Range Predecessor/Successor Queries Problem (Range Predecessor/Successor Queries) Design a data structure that for a word w can answer the following queries. Given a factor u of w (represented by an occurrence in w) and i {1... n} find P RED(u, i) the last occurrence of u ending at a position i, SUCC(u, i) the first occurrence of u starting at a position i. The Occurrence Queries can be reduced to three Range Predecessor/Successor Queries, where u is a basic factor of w. w SUCC(u, i) i i SUCC(u, i + 1) v j Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 15/19

Range Predecessor/Successor Queries Problem (Range Predecessor/Successor Queries) Design a data structure that for a word w can answer the following queries. Given a factor u of w (represented by an occurrence in w) and i {1... n} find P RED(u, i) the last occurrence of u ending at a position i, SUCC(u, i) the first occurrence of u starting at a position i. The Occurrence Queries can be reduced to three Range Predecessor/Successor Queries, where u is a basic factor of w. w SUCC(u, i) i i SUCC(u, i + 1) v j P RED(u, j) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 15/19

Range Predecessor/Successor Queries Problem (Range Predecessor/Successor Queries) Design a data structure that for a word w can answer the following queries. Given a factor u of w (represented by an occurrence in w) and i {1... n} find P RED(u, i) the last occurrence of u ending at a position i, SUCC(u, i) the first occurrence of u starting at a position i. The Occurrence Queries can be reduced to three Range Predecessor/Successor Queries, where u is a basic factor of w. w SUCC(u, i) i i SUCC(u, i + 1) v j P RED(u, j) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 15/19

Range Predecessor/Successor Queries Theorem (Nekrich, Navarro; 2012) There exist data structures that given the locus of u in the suffix tree of w answer the Range Predecessor/Successor queries in and satisfy the following space and time bounds: Space Query time O(n) O(log ε n) O(n log log n) O((log log n) 2 ) O(n log ε n) O(log log n) Theorem (Weighted LA Kopelovitz, Lewenstein; 2007) There exists a data structure of size O(n), which given an interval [i..j] finds the locus of w[i..j] in the suffix tree of w in O(log log n) time. Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 16/19

Summary Theorem (this paper) There exist data structures that satisfy following time and space bounds for size, Period Queries query time and (1 + δ)-period Queries query time: Space Period Queries (1 + δ)-period Q. O(n) O(log 1+ε n) O(log ε n) O(n log log n) O(log n(log log n) 2 ) O((log log n) 2 ) O(n log ε n) O(log n log log n) O(log log n) O(n log n) O(log n) O(1) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 17/19

Further research Space Period Queries (1 + δ)-period Q. O(n) O(log 1+ε n) O(log ε n) O(n log log n) O(log n(log log n) 2 ) O((log log n) 2 ) O(n log ε n) O(log n log log n) O(log log n) O(n log n) O(log n) O(1) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 18/19

Further research Currently in progress: Space Period Queries 2-Period Queries O(n) O(log 1+ε n) O(log ε n) O(n log log n) O(log n(log log n) 2 ) O((log log n) 2 ) O(n log ε n) O(log n log log n) O(log log n) O(n log n) O(log n) O(1) O(n) O(1) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 18/19

Further research Currently in progress: Space Period Queries 2-Period Queries O(n) O(log 1+ε n) O(log ε n) O(n log log n) O(log n log log n) O(log log n) O(n log n) O(log n) O(1) O(n) O(1) Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 18/19

Further research Currently in progress: Space Period Queries 2-Period Queries O(n) O(log 1+ε n) O(log ε n) O(n log log n) O(log n log log n) O(log log n) O(n log n) O(log n) O(1) O(n) O(1) Open problems: Can the O(n log n) time preprocessing be improved with o(n) query time? Can the shortest period be found faster than O(log n) with o(n 2 ) space? Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 18/19

Thank you for your attention Thank you! Tomasz Kociumaka Efficient Data Structures for the Factor Periodicity Problem 19/19