Longest Common Extensions via Fingerprinting
|
|
- Grant Evans
- 7 years ago
- Views:
Transcription
1 Longest Common Extensions via Fingerprinting Philip Bille Inge Li Gørtz Jesper Kristensen Technical University of Denmark DTU Informatics LATA, March 9, / 17
2 Contents Introduction The LCE Problem Existing Results The DirectComp algorithm The SuffixNca and LcpRmq Algorithms Practical results The Fingerprint k Algorithm Data Structure Query Preprocessing Practical Results Cache Optimization Summary 2 / 17
3 The LCE Problem LCE value LCE s (i, j) is the length of the longest common prefix of the two suffixes of a string s starting at index i and j LCE problem Efficiently query multiple LCE values on a static string s and varying pairs (i, j) Example: input: s = abbababba, (i, j) = (4, 6) suffix i of s = ababba suffix j of s = abba longest common prefix = ab LCE s (i, j) = 2 3 / 17
4 Existing algorithm: DirectComp Input s = abbababba (i, j) = (4, 6) The DirectComp algorithm s = a b b a b a b b a / 17
5 Existing algorithm: DirectComp Input s = abbababba (i, j) = (4, 6) The DirectComp algorithm s = a b b a b a b b a / 17
6 Existing algorithm: DirectComp Input s = abbababba (i, j) = (4, 6) The DirectComp algorithm s = a b b a b a b b a 1 match / 17
7 Existing algorithm: DirectComp Input s = abbababba (i, j) = (4, 6) The DirectComp algorithm s = a b b a b a b b a 1 match / 17
8 Existing algorithm: DirectComp Input s = abbababba (i, j) = (4, 6) The DirectComp algorithm s = a b b a b a b b a 2 matches / 17
9 Existing algorithm: DirectComp Input s = abbababba (i, j) = (4, 6) The DirectComp algorithm s = a b b a b a b b a 2 matches / 17
10 Existing algorithm: DirectComp Input s = abbababba (i, j) = (4, 6) The DirectComp algorithm s = a b b a b a b b a 2 matches Result LCE s (4, 6) = / 17
11 Existing Algorithm: DirectComp Space Query Average query O(1) + s O(LCE(i, j)) = O(n) O(1) For a string length n and alphabet size σ, the average LCE value over all n σ strings and n 2 query pairs is O(1). References L. Ilie, G. Navarro, and L. Tinta. The longest common extension problem revisited and applications to approximate string searching. J. Disc. Alg., 8(4): , / 17
12 Existing Algorithms: SuffixNca and LcpRmq Two algorithms with best known bounds: SuffixNca Nearest common ancestor queries on a suffix tree LcpRmq Range minimum queries on a longest common prefix array Space Query Average query O(n) O(1) O(1) References J. Fischer, and V. Heun. Theoretical and Practical Improvements on the RMQ-Problem, with Applications to LCA and LCE. In Proc. 17th CPM, pages 36-48, D. Harel, R. E. Tarjan. Fast Algorithms for Finding Nearest Common Ancestors. SIAM J. Comput., 13(2): , / 17
13 Existing Algorithms: Practical Results Query times of DirectComp and LcpRmq by string length 1e-06 Average case 1e-05 Worst case 1e-06 1e-07 1e-07 1e-08 1e e+06 1e+07 s = random characters σ = e+06 1e+07 s = aaaaa... 7 / 17
14 The Fingerprint k Algorithm: Data Structure For a string s[1.. n], the t-length fingerprints F t [1.. n] are natural numbers, such that F t [i] = F t [j] if and only if s[i.. i + t 1] = s[j.. j + t 1]. k levels, 1 k log n For each level, l = 0.. k 1: t l = Θ(n l/k ), t 0 = 1 Hl = F tl H 2[i] H 1[i] s = H 0[i] i a b b a a b b a b a b b a a b b a b a b a a b a b a $ Space O(k n) 8 / 17
15 The Fingerprint k Algorithm: Query 1. As long as H l [i + v] = H l [j + v], increment v by t l, increment l by one, and repeat this step unless and l = k As long as H l [i + v] = H l [j + v], increment v by t l and repeat this step. 3. Stop and return v when l = 0, otherwise decrement l by one and go to step two. H 2[i + v] H 1[i + v] H 0[i + v] i + v a b b a a b b a b a b b a a b b a b a b a a b a b a $ H 2[j + v] H 1[j + v] H 0[j + v] j + v a b b a a b b a b a b b a a b b a b a b a a b a b a $ Query O(k n 1/k ) Average query O(1) LCE(3, 12) = 9 9 / 17
16 The Fingerprint k Algorithm: Preprocessing For each level l For each t l -length substring in lexicographically sorted order If the current substring s[sa[i].. SA[i] + t l 1] is equal to the previous substring, give it the same fingerprint as the previous substring, otherwise give it a new unused fingerprint. The two substrings are equal when LCP[i] t l. Subst. a aba abb abb ba bab bab bba bba H l[i] i s = abbababba For t l = 3: H l = [3, 8, 6, 2, 6, 3, 8, 5, 1] Preprocessing O(k n + sort(n, σ)) 10 / 17
17 The Fingerprint k Algorithm 1 k log n Space O(k n) Query O(k n 1/k ) Average query O(1) k = 1 k = 2 k = log n Space O(n) O(n) O(n log n) Query O(n) O( n) O(log n) Average query O(1) O(1) O(1) Space for Fingerprint k is the same as for LcpRmq when k = / 17
18 Practical Results Query times of DirectComp, Fingerprint 2, Fingerprint 3, Fingerprint log n and LcpRmq by string length 1e-06 Average case 1e-05 Worst case 1e-06 1e-07 1e-07 1e-08 1e e+06 1e e+06 1e / 17
19 Cache Optimization of Fingerprint k Original: Data structure: Hl [i] = F tl [i] Size: Hl ( = n ) I/O: O k n 1/k Cache optimized: Data structure: H l [((i 1) mod t l ) n/t l + (i 1)/t l + 1] = F tl [i] Size: Hl = ( n + t l )) I/O: O n (k 1/k B + 1 Best when k is small = n 1/k is large. H2[i + v] H1[i + v] H0[i + v] a b b a a b b a b a b b a a b b i + v a b a b a a b a b a $ H2[j + v] H1[j + v] H0[j + v] a b b a a b b a b a b b a a b b j + v a b a b a a b a b a $ / 17
20 Cache Optimization, Practical Results Is I/O optimization good in practice? Pro: better cache efficiency Best for small k, no change for k = log n Con: Calculating memory addresses is more complicated ((i 1) mod t l ) n/t l + (i 1)/t l + 1 vs. i 14 / 17
21 Cache Optimization, Practical Results Query times of Fingerprint 2 without cache optimization and with cache optimization using shift operations vs. multiplication, division and modulo 1e-06 Average case 1e-05 Worst case 1e-06 1e-07 1e-07 1e-08 1e e+06 1e e+06 1e / 17
22 Cache Optimization, Practical Results Query times of Fingerprint 3 without cache optimization and with cache optimization using shift operations vs. multiplication, division and modulo 1e-06 Average case 1e-05 Worst case 1e-06 1e-07 1e-07 1e-08 1e e+06 1e e+06 1e / 17
23 Summary Direct- LcpRmq / Comp SuffixNca Fingerprint k Space O(1) O(n) O(k n) Query O(n) O(1) O(k n 1/k ) Average query O(1) O(1) O(1) fast slow fast ) ( k Query I/O O ( n B O(1) O Code complexity very simple complex simple ( n 1/k B + 1 )) Cache optimization of Fingerprint k improves query times at k = 2 and worsens query times at k 3 Space for Fingerprint k is the same as for LcpRmq when k = / 17
Longest Common Extensions via Fingerprinting
Longest Common Extensions via Fingerprinting Philip Bille, Inge Li Gørtz, and Jesper Kristensen Technical University of Denmark, DTU Informatics, Copenhagen, Denmark Abstract. The longest common extension
More informationFast string matching
Fast string matching This exposition is based on earlier versions of this lecture and the following sources, which are all recommended reading: Shift-And/Shift-Or 1. Flexible Pattern Matching in Strings,
More informationLossless Data Compression Standard Applications and the MapReduce Web Computing Framework
Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Sergio De Agostino Computer Science Department Sapienza University of Rome Internet as a Distributed System Modern
More informationShortest Inspection-Path. Queries in Simple Polygons
Shortest Inspection-Path Queries in Simple Polygons Christian Knauer, Günter Rote B 05-05 April 2005 Shortest Inspection-Path Queries in Simple Polygons Christian Knauer, Günter Rote Institut für Informatik,
More informationThe LCA Problem Revisited
The LA Problem Revisited Michael A. Bender Martín Farach-olton SUNY Stony Brook Rutgers University May 16, 2000 Abstract We present a very simple algorithm for the Least ommon Ancestor problem. We thus
More informationPersistent Data Structures and Planar Point Location
Persistent Data Structures and Planar Point Location Inge Li Gørtz Persistent Data Structures Ephemeral Partial persistence Full persistence Confluent persistence V1 V1 V1 V1 V2 q ue V2 V2 V5 V2 V4 V4
More informationLempel-Ziv Coding Adaptive Dictionary Compression Algorithm
Lempel-Ziv Coding Adaptive Dictionary Compression Algorithm 1. LZ77:Sliding Window Lempel-Ziv Algorithm [gzip, pkzip] Encode a string by finding the longest match anywhere within a window of past symbols
More informationImproved Single and Multiple Approximate String Matching
Improved Single and Multiple Approximate String Matching Kimmo Fredriksson Department of Computer Science, University of Joensuu, Finland Gonzalo Navarro Department of Computer Science, University of Chile
More informationSorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)
Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse
More informationLempel-Ziv Factorization: LZ77 without Window
Lempel-Ziv Factorization: LZ77 without Window Enno Ohlebusch May 13, 2016 1 Sux arrays To construct the sux array of a string S boils down to sorting all suxes of S in lexicographic order (also known as
More informationFingerprints in Compressed Strings
Downloaded from orbit.dtu.dk on: Dec 20, 2015 Fingerprints in Compressed Strings Bille, Philip; Cording, Patrick Hagge; Gørtz, Inge Li; Sach, Benjamin; Vildhøj, Hjalte Wedel; Vind, Søren Juhl Published
More informationDynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time
Dynamic 3-sided Planar Range Queries with Expected Doubly Logarithmic Time Gerth Stølting Brodal 1, Alexis C. Kaporis 2, Spyros Sioutas 3, Konstantinos Tsakalidis 1, Kostas Tsichlas 4 1 MADALGO, Department
More informationData Structures. Jaehyun Park. CS 97SI Stanford University. June 29, 2015
Data Structures Jaehyun Park CS 97SI Stanford University June 29, 2015 Typical Quarter at Stanford void quarter() { while(true) { // no break :( task x = GetNextTask(tasks); process(x); // new tasks may
More informationKnuth-Morris-Pratt Algorithm
December 18, 2011 outline Definition History Components of KMP Example Run-Time Analysis Advantages and Disadvantages References Definition: Best known for linear time for exact matching. Compares from
More informationApproximate String Matching in DNA Sequences
Approximate String Matching in DNA Sequences Lok-Lam Cheng David W. Cheung Siu-Ming Yiu Department of Computer Science and Infomation Systems, The University of Hong Kong, Pokflum Road, Hong Kong {llcheng,dcheung,smyiu}@csis.hku.hk
More informationData Structures. Topic #12
Data Structures Topic #12 Today s Agenda Sorting Algorithms insertion sort selection sort exchange sort shell sort radix sort As we learn about each sorting algorithm, we will discuss its efficiency Sorting
More informationSymbol Tables. Introduction
Symbol Tables Introduction A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The
More informationCS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions
CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions Theory of Formal Languages In the English language, we distinguish between three different identities: letter, word, sentence.
More informationSemantic Analysis: Types and Type Checking
Semantic Analysis Semantic Analysis: Types and Type Checking CS 471 October 10, 2007 Source code Lexical Analysis tokens Syntactic Analysis AST Semantic Analysis AST Intermediate Code Gen lexical errors
More informationA Fast Pattern Matching Algorithm with Two Sliding Windows (TSW)
Journal of Computer Science 4 (5): 393-401, 2008 ISSN 1549-3636 2008 Science Publications A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW) Amjad Hudaib, Rola Al-Khalid, Dima Suleiman, Mariam
More informationCMPSCI 250: Introduction to Computation. Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013
CMPSCI 250: Introduction to Computation Lecture #19: Regular Expressions and Their Languages David Mix Barrington 11 April 2013 Regular Expressions and Their Languages Alphabets, Strings and Languages
More informationPosition-Restricted Substring Searching
Position-Restricted Substring Searching Veli Mäkinen 1 and Gonzalo Navarro 2 1 AG Genominformatik, Technische Fakultät Universität Bielefeld, Germany. veli@cebitec.uni-bielefeld.de 2 Center for Web Research
More informationGeneral Incremental Sliding-Window Aggregation
General Incremental Sliding-Window Aggregation Kanat Tangwongsan Mahidol University International College, Thailand (part of this work done at IBM Research, Watson) Joint work with Martin Hirzel, Scott
More informationVirtual Memory Paging
COS 318: Operating Systems Virtual Memory Paging Kai Li Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Today s Topics Paging mechanism Page replacement algorithms
More information6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, 2010. Class 4 Nancy Lynch
6.045: Automata, Computability, and Complexity Or, Great Ideas in Theoretical Computer Science Spring, 2010 Class 4 Nancy Lynch Today Two more models of computation: Nondeterministic Finite Automata (NFAs)
More informationA Comparison of Dictionary Implementations
A Comparison of Dictionary Implementations Mark P Neyer April 10, 2009 1 Introduction A common problem in computer science is the representation of a mapping between two sets. A mapping f : A B is a function
More informationLZ77. Example 2.10: Let T = badadadabaab and assume d max and l max are large. phrase b a d adadab aa b
LZ77 The original LZ77 algorithm works as follows: A phrase T j starting at a position i is encoded as a triple of the form distance, length, symbol. A triple d, l, s means that: T j = T [i...i + l] =
More informationStructure for String Keys
Burst Tries: A Fast, Efficient Data Structure for String Keys Steen Heinz Justin Zobel Hugh E. Williams School of Computer Science and Information Technology, RMIT University Presented by Margot Schips
More informationVGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams
VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams Chen Li University of California, Irvine CA 9697, USA chenli@ics.uci.edu Bin Wang Northeastern University
More informationLecture 1: Data Storage & Index
Lecture 1: Data Storage & Index R&G Chapter 8-11 Concurrency control Query Execution and Optimization Relational Operators File & Access Methods Buffer Management Disk Space Management Recovery Manager
More information4.2 Sorting and Searching
Sequential Search: Java Implementation 4.2 Sorting and Searching Scan through array, looking for key. search hit: return array index search miss: return -1 public static int search(string key, String[]
More informationOn line construction of suffix trees 1
(To appear in ALGORITHMICA) On line construction of suffix trees 1 Esko Ukkonen Department of Computer Science, University of Helsinki, P. O. Box 26 (Teollisuuskatu 23), FIN 00014 University of Helsinki,
More informationA Catalogue of the Steiner Triple Systems of Order 19
A Catalogue of the Steiner Triple Systems of Order 19 Petteri Kaski 1, Patric R. J. Östergård 2, Olli Pottonen 2, and Lasse Kiviluoto 3 1 Helsinki Institute for Information Technology HIIT University of
More informationA Partition-Based Efficient Algorithm for Large Scale. Multiple-Strings Matching
A Partition-Based Efficient Algorithm for Large Scale Multiple-Strings Matching Ping Liu Jianlong Tan, Yanbing Liu Software Division, Institute of Computing Technology, Chinese Academy of Sciences, Beijing,
More informationOmega Automata: Minimization and Learning 1
Omega Automata: Minimization and Learning 1 Oded Maler CNRS - VERIMAG Grenoble, France 2007 1 Joint work with A. Pnueli, late 80s Summary Machine learning in general and of formal languages in particular
More informationOutline. Introduction Linear Search. Transpose sequential search Interpolation search Binary search Fibonacci search Other search techniques
Searching (Unit 6) Outline Introduction Linear Search Ordered linear search Unordered linear search Transpose sequential search Interpolation search Binary search Fibonacci search Other search techniques
More informationZabin Visram Room CS115 CS126 Searching. Binary Search
Zabin Visram Room CS115 CS126 Searching Binary Search Binary Search Sequential search is not efficient for large lists as it searches half the list, on average Another search algorithm Binary search Very
More informationDesign of Practical Succinct Data Structures for Large Data Collections
Design of Practical Succinct Data Structures for Large Data Collections Roberto Grossi and Giuseppe Ottaviano Dipartimento di Informatica Università di Pisa {grossi,ottaviano}@di.unipi.it Abstract. We
More informationStorage Management for Files of Dynamic Records
Storage Management for Files of Dynamic Records Justin Zobel Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia. jz@cs.rmit.edu.au Alistair Moffat Department of Computer Science
More informationData Structures and Data Manipulation
Data Structures and Data Manipulation What the Specification Says: Explain how static data structures may be used to implement dynamic data structures; Describe algorithms for the insertion, retrieval
More informationNode-Based Structures Linked Lists: Implementation
Linked Lists: Implementation CS 311 Data Structures and Algorithms Lecture Slides Monday, March 30, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks CHAPPELLG@member.ams.org
More informationImproved Single and Multiple Approximate String Matching
Improved Single and Multiple Approximate String Matching Kimmo Fredrisson and Gonzalo Navarro 2 Department of Computer Science, University of Joensuu fredri@cs.joensuu.fi 2 Department of Computer Science,
More informationTheoretical Aspects of Storage Systems Autumn 2009
Theoretical Aspects of Storage Systems Autumn 2009 Chapter 3: Data Deduplication André Brinkmann News Outline Data Deduplication Compare-by-hash strategies Delta-encoding based strategies Measurements
More informationLecture 4: Exact string searching algorithms. Exact string search algorithms. Definitions. Exact string searching or matching
COSC 348: Computing for Bioinformatics Definitions A pattern (keyword) is an ordered sequence of symbols. Lecture 4: Exact string searching algorithms Lubica Benuskova http://www.cs.otago.ac.nz/cosc348/
More informationAlgebraic Recognizability of Languages
of Languages LaBRI, Université Bordeaux-1 and CNRS MFCS Conference, Prague, August 2004 The general problem Problem: to specify and analyse infinite sets by finite means The general problem Problem: to
More informationBig Data & Scripting Part II Streaming Algorithms
Big Data & Scripting Part II Streaming Algorithms 1, Counting Distinct Elements 2, 3, counting distinct elements problem formalization input: stream of elements o from some universe U e.g. ids from a set
More informationWell-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications
Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications Jie Gao Department of Computer Science Stanford University Joint work with Li Zhang Systems Research Center Hewlett-Packard
More informationSuccinct Data Structures for Data Mining
Succinct Data Structures for Data Mining Rajeev Raman University of Leicester ALSIP 2014, Tainan Overview Introduction Compressed Data Structuring Data Structures Applications Libraries End Big Data vs.
More information(appeared in GInfo 15/7, November 2005) Suffix arrays a programming contest approach. Adrian Vladu and Cosmin Negruşeri. Summary
(appeared in GInfo 15/7, November 2005) Suffix arrays a programming contest approach Adrian Vladu and Cosmin Negruşeri Summary An important algorithmic area often used in practical tasks is that of algorithms
More informationChapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationUsing Data-Oblivious Algorithms for Private Cloud Storage Access. Michael T. Goodrich Dept. of Computer Science
Using Data-Oblivious Algorithms for Private Cloud Storage Access Michael T. Goodrich Dept. of Computer Science Privacy in the Cloud Alice owns a large data set, which she outsources to an honest-but-curious
More informationI/O-Efficient Data Structures for Colored Range and Prefix Reporting
I/O-Efficient Data Structures for Colored Range and Prefix Reporting Kasper Green Larsen, MADALGO, Aarhus University Rasmus Pagh, IT University of Copenhagen Presenter: Yakov Nekrich 1 Motivating example
More informationDiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices
DiskTrie: An Efficient Data Structure Using Flash Memory for Mobile Devices (Extended Abstract) N.M. Mosharaf Kabir Chowdhury, Md. Mostofa Akbar, and M. Kaykobad Department of Computer Science & Engineering
More informationPhysical Data Organization
Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor
More information14.1 Rent-or-buy problem
CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms
More informationB-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees
B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:
More informationEfficient Multi-Feature Index Structures for Music Data Retrieval
header for SPIE use Efficient Multi-Feature Index Structures for Music Data Retrieval Wegin Lee and Arbee L.P. Chen 1 Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan 300,
More informationrecursion, O(n), linked lists 6/14
recursion, O(n), linked lists 6/14 recursion reducing the amount of data to process and processing a smaller amount of data example: process one item in a list, recursively process the rest of the list
More informationApproximate Search Engine Optimization for Directory Service
Approximate Search Engine Optimization for Directory Service Kai-Hsiang Yang and Chi-Chien Pan and Tzao-Lin Lee Department of Computer Science and Information Engineering, National Taiwan University, Taipei,
More informationDATABASE DESIGN - 1DL400
DATABASE DESIGN - 1DL400 Spring 2015 A course on modern database systems!! http://www.it.uu.se/research/group/udbl/kurser/dbii_vt15/ Kjell Orsborn! Uppsala Database Laboratory! Department of Information
More informationSMALL INDEX LARGE INDEX (SILT)
Wayne State University ECE 7650: Scalable and Secure Internet Services and Architecture SMALL INDEX LARGE INDEX (SILT) A Memory Efficient High Performance Key Value Store QA REPORT Instructor: Dr. Song
More informationPrevious Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles
B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:
More informationData storage Tree indexes
Data storage Tree indexes Rasmus Pagh February 7 lecture 1 Access paths For many database queries and updates, only a small fraction of the data needs to be accessed. Extreme examples are looking or updating
More informationCounting Problems in Flash Storage Design
Flash Talk Counting Problems in Flash Storage Design Bongki Moon Department of Computer Science University of Arizona Tucson, AZ 85721, U.S.A. bkmoon@cs.arizona.edu NVRAMOS 09, Jeju, Korea, October 2009-1-
More informationSearching BWT compressed text with the Boyer-Moore algorithm and binary search
Searching BWT compressed text with the Boyer-Moore algorithm and binary search Tim Bell 1 Matt Powell 1 Amar Mukherjee 2 Don Adjeroh 3 November 2001 Abstract: This paper explores two techniques for on-line
More informationComplexity of Union-Split-Find Problems. Katherine Jane Lai
Complexity of Union-Split-Find Problems by Katherine Jane Lai S.B., Electrical Engineering and Computer Science, MIT, 2007 S.B., Mathematics, MIT, 2007 Submitted to the Department of Electrical Engineering
More informationA Note on Maximum Independent Sets in Rectangle Intersection Graphs
A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada tmchan@uwaterloo.ca September 12,
More informationThe Pairwise Sorting Network
The Pairwise Sorting Network Ian Parberry Center for Research in Parallel and Distributed Computing Department of Computer Sciences University of North Texas Abstract A new sorting network with exactly
More informationCompressed Text Indexes with Fast Locate
Compressed Text Indexes with Fast Locate Rodrigo González and Gonzalo Navarro Dept. of Computer Science, University of Chile. {rgonzale,gnavarro}@dcc.uchile.cl Abstract. Compressed text (self-)indexes
More informationOrganization of Programming Languages CS320/520N. Lecture 05. Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.
Organization of Programming Languages CS320/520N Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Names, Bindings, and Scopes A name is a symbolic identifier used
More informationInternet Packet Filter Management and Rectangle Geometry
Internet Packet Filter Management and Rectangle Geometry David Eppstein S. Muthukrishnan Abstract We consider rule sets for internet packet routing and filtering, where each rule consists of a range of
More informationLecture 2 February 12, 2003
6.897: Advanced Data Structures Spring 003 Prof. Erik Demaine Lecture February, 003 Scribe: Jeff Lindy Overview In the last lecture we considered the successor problem for a bounded universe of size u.
More informationAnalysis of Compression Algorithms for Program Data
Analysis of Compression Algorithms for Program Data Matthew Simpson, Clemson University with Dr. Rajeev Barua and Surupa Biswas, University of Maryland 12 August 3 Abstract Insufficient available memory
More informationLecture 2: Universality
CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal
More informationAutomata and Formal Languages
Automata and Formal Languages Winter 2009-2010 Yacov Hel-Or 1 What this course is all about This course is about mathematical models of computation We ll study different machine models (finite automata,
More informationResilient Dynamic Programming
Resilient Dynamic Programming Irene Finocchi, Saverio Caminiti, and Emanuele Fusco Dipartimento di Informatica, Sapienza Università di Roma via Salaria, 113-00198 Rome, Italy. {finocchi, caminiti, fusco}@di.uniroma1.it
More informationData Structures and Algorithms!
Data Structures and Algorithms! The material for this lecture is drawn, in part, from! The Practice of Programming (Kernighan & Pike) Chapter 2! 1 Motivating Quotation!!! Every program depends on algorithms
More informationExtended Application of Suffix Trees to Data Compression
Extended Application of Suffix Trees to Data Compression N. Jesper Larsson A practical scheme for maintaining an index for a sliding window in optimal time and space, by use of a suffix tree, is presented.
More informationData Structures and Algorithms!
Data Structures and Algorithms! The material for this lecture is drawn, in part, from! The Practice of Programming (Kernighan & Pike) Chapter 2! 1 Motivating Quotation!!! Every program depends on algorithms
More informationCryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Karagpur
Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Karagpur Lecture No. #06 Cryptanalysis of Classical Ciphers (Refer
More informationBM307 File Organization
BM307 File Organization Gazi University Computer Engineering Department 9/24/2014 1 Index Sequential File Organization Binary Search Interpolation Search Self-Organizing Sequential Search Direct File Organization
More informationAlgorithmic Techniques for Big Data Analysis. Barna Saha AT&T Lab-Research
Algorithmic Techniques for Big Data Analysis Barna Saha AT&T Lab-Research Challenges of Big Data VOLUME Large amount of data VELOCITY Needs to be analyzed quickly VARIETY Different types of structured
More informationBinary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R
Binary Search Trees A Generic Tree Nodes in a binary search tree ( B-S-T) are of the form P parent Key A Satellite data L R B C D E F G H I J The B-S-T has a root node which is the only node whose parent
More informationCloud and Big Data Summer School, Stockholm, Aug. 2015 Jeffrey D. Ullman
Cloud and Big Data Summer School, Stockholm, Aug. 2015 Jeffrey D. Ullman 2 In a DBMS, input is under the control of the programming staff. SQL INSERT commands or bulk loaders. Stream management is important
More informationarxiv:0810.2390v2 [cs.ds] 15 Oct 2008
Efficient Pattern Matching on Binary Strings Simone Faro 1 and Thierry Lecroq 2 arxiv:0810.2390v2 [cs.ds] 15 Oct 2008 1 Dipartimento di Matematica e Informatica, Università di Catania, Italy 2 University
More informationLecture 18 Regular Expressions
Lecture 18 Regular Expressions Many of today s web applications require matching patterns in a text document to look for specific information. A good example is parsing a html file to extract tags
More informationBig Data Technology Map-Reduce Motivation: Indexing in Search Engines
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
More informationMATHEMATICAL ENGINEERING TECHNICAL REPORTS. The Best-fit Heuristic for the Rectangular Strip Packing Problem: An Efficient Implementation
MATHEMATICAL ENGINEERING TECHNICAL REPORTS The Best-fit Heuristic for the Rectangular Strip Packing Problem: An Efficient Implementation Shinji IMAHORI, Mutsunori YAGIURA METR 2007 53 September 2007 DEPARTMENT
More informationIntroduction to Finite Automata
Introduction to Finite Automata Our First Machine Model Captain Pedro Ortiz Department of Computer Science United States Naval Academy SI-340 Theory of Computing Fall 2012 Captain Pedro Ortiz (US Naval
More informationData Structures for Moving Objects
Data Structures for Moving Objects Pankaj K. Agarwal Department of Computer Science Duke University Geometric Data Structures S: Set of geometric objects Points, segments, polygons Ask several queries
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
More informationEfficient Data Structures for the Factor Periodicity Problem
Efficient Data Structures for the Factor Periodicity Problem Tomasz Kociumaka Wojciech Rytter Jakub Radoszewski Tomasz Waleń University of Warsaw, Poland SPIRE 2012 Cartagena, October 23, 2012 Tomasz Kociumaka
More informationzdelta: An Efficient Delta Compression Tool
zdelta: An Efficient Delta Compression Tool Dimitre Trendafilov Nasir Memon Torsten Suel Department of Computer and Information Science Technical Report TR-CIS-2002-02 6/26/2002 zdelta: An Efficient Delta
More informationIn-memory databases and innovations in Business Intelligence
Database Systems Journal vol. VI, no. 1/2015 59 In-memory databases and innovations in Business Intelligence Ruxandra BĂBEANU, Marian CIOBANU University of Economic Studies, Bucharest, Romania babeanu.ruxandra@gmail.com,
More informationHelvetic Coding Contest 2016
Helvetic Coding Contest 6 Solution Sketches July, 6 A Collective Mindsets Author: Christian Kauth A Strategy: In a bottom-up approach, we can determine how many brains a zombie of a given rank N needs
More informationThe Benefits of WordPress Specific Web Hosting. Jamii Corley, Southwest Cyberport
The Benefits of WordPress Specific Web Hosting Jamii Corley, Southwest Cyberport Jamii Corley Southwest Cyberport Web Hosting since 1994 Web Hosting Options Free - (Banners, type of URL you can use, owning
More informationTuring Machines, Part I
Turing Machines, Part I Languages The $64,000 Question What is a language? What is a class of languages? Computer Science Theory 2 1 Now our picture looks like Context Free Languages Deterministic Context
More informationAPP INVENTOR. Test Review
APP INVENTOR Test Review Main Concepts App Inventor Lists Creating Random Numbers Variables Searching and Sorting Data Linear Search Binary Search Selection Sort Quick Sort Abstraction Modulus Division
More informationNCPC 2013 Presentation of solutions
NCPC 2013 Presentation of solutions Head of Jury: Lukáš Poláček 2013-10-05 NCPC Jury authors Andreas Björklund (LTH) Jaap Eldering (Imperial) Daniel Espling (UMU) Christian Ledig (Imperial) Ulf Lundström
More informationFast Edge Splitting and Edmonds Arborescence Construction for Unweighted Graphs
Fast Edge Splitting and Edmonds Arborescence Construction for Unweighted Graphs Anand Bhalgat Ramesh Hariharan Telikepalli Kavitha Debmalya Panigrahi Abstract Given an unweighted undirected or directed
More information