Improved Single and Multiple Approximate String Matching


1 Improved Single and Multiple Approximate String Matching
Kimmo Fredriksson, Department of Computer Science, University of Joensuu, Finland
Gonzalo Navarro, Department of Computer Science, University of Chile
CPM 04 p.1/26

2 The Problem Setting & Complexity
Given a text $T_{1..n}$ and a pattern $P_{1..m}$ over some finite alphabet $\Sigma$ of size $\sigma$, find the approximate occurrences of $P$ in $T$, allowing at most $k$ differences (edit operations).
Exact matching (single pattern) lower bound: $\Omega(n \log_\sigma(m) / m)$ character comparisons (Yao, 79).
Approximate matching lower bound: $\Omega(n (k + \log_\sigma m) / m)$ (Chang & Marr, 94).
We will search simultaneously for a set of $r$ patterns.
$\Omega(n (k + \log_\sigma(rm)) / m)$ lower bound for $r$ patterns (Fredriksson & Navarro, 2003).

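To make "at most $k$ differences" concrete, here is a minimal Python sketch (not taken from the slides) of the edit distance between two strings, counting insertions, deletions and substitutions. The function name `edit_distance` is only illustrative. A text substring is an approximate occurrence of the pattern when this distance is at most $k$.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the prefix of a read so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("ACGT", "AGT")` is 1, so "AGT" would count as an occurrence of the pattern "ACGT" for any $k \ge 1$.
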
3 Previous work
Only a few algorithms exist for multipattern approximate searching under the $k$ differences model.
Naïve approach: search the $r$ patterns separately, using any of the single-pattern search algorithms.
(Muth & Manber, 1996): […] average time algorithm using […] space. The algorithm is based on hashing, and works only for $k = 1$.
(Baeza-Yates & Navarro, 1997): Partitioning into exact search: […] on average ([…] preprocessing), but can be improved to […]. Works for […].
Other less interesting ones.

4 Previous work
(Fredriksson & Navarro, 2003): The first average-optimal algorithm. It is average-optimal up to error level […], and linear on average up to error level […].
(Hyyrö, Fredriksson & Navarro, 2004): […] worst case for short patterns, where $w$ is the number of bits in a machine word.

5 This work
We have improved the (optimal) algorithm of (Fredriksson & Navarro, 2003): it is faster in practice, and it allows error levels up to […].
Our algorithm runs in $O(n (k + \log_\sigma(rm)) / m)$ average time, which is optimal.
Preprocessing time is […], and the algorithm needs […] space, where […].
The fastest algorithm in practice for intermediate […] and small […].

6 The method in brief
The algorithm is based on the preprocessing/filtering/verification paradigm.
The preprocessing phase generates all strings of length $\ell$ and computes their minimum distance over the set of patterns.
The filtering phase searches (approximately) for text $\ell$-grams in the patterns, using the precomputed distance table and accumulating the differences.
The verification phase uses a dynamic programming algorithm and is applied to each pattern separately.

7 Preprocessing
Build a table $D$ as follows:
1. Choose a number $\ell$ in the range […].
2. For every string $S$ of length $\ell$ ($\ell$-gram), search for $S$ in $P$.
3. Store in $D[S]$ the smallest number of differences needed to match $S$ inside $P$ (a number between 0 and $\ell$).
$D$ requires space for $\sigma^\ell$ entries and can be computed in […] time.

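The three preprocessing steps can be sketched in Python as follows. This is a straightforward illustration under stated assumptions, not the authors' implementation: the names `min_diff_inside` and `build_table` are hypothetical, the table is a plain dictionary keyed by the $\ell$-gram, and each $\ell$-gram is matched against each pattern by a simple dynamic programming pass in which the match may start and end anywhere inside the pattern.

```python
from itertools import product

def min_diff_inside(gram, pattern):
    """Fewest differences needed to match gram against any substring of pattern."""
    # col[i] = best cost of matching gram[:i] against a substring of the
    # pattern that ends at the current pattern position (free starting point).
    col = list(range(len(gram) + 1))
    best = col[-1]  # matching against the empty substring costs len(gram)
    for c in pattern:
        new = [0]   # the empty prefix of gram matches anywhere at cost 0
        for i, g in enumerate(gram, start=1):
            new.append(min(col[i - 1] + (g != c), col[i] + 1, new[i - 1] + 1))
        col = new
        best = min(best, col[-1])
    return best

def build_table(patterns, alphabet, ell):
    """Map every ell-gram over alphabet to its minimum distance over all patterns."""
    grams = ("".join(t) for t in product(alphabet, repeat=ell))
    return {s: min(min_diff_inside(s, p) for p in patterns) for s in grams}
```

With `build_table(["AAAA", "CCCC"], "ACGT", 2)`, the entry for "CC" is 0 (it occurs exactly in the second pattern), the entry for "AC" is 1, and the entry for "GG" is 2, since neither pattern contains a G.
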
8 Filtering
Any occurrence is at least $m - k$ characters long, so use a sliding window of $m - k$ characters over $T$.
Invariant: all occurrences starting before the window are already reported.
Read $\ell$-grams $S_1, S_2, \ldots$ from the text window, from right to left.
[Figure: text $T$ with the $\ell$-grams $S_3$, $S_2$, $S_1$ inside a text window of $m - k$ characters.]
Any occurrence starting at the beginning of the window must contain all the $\ell$-grams read.

9 Filtering
Accumulate a sum of necessary differences: $M_u = \sum_{t=1}^{u} D[S_t]$.
If $M_u$ becomes $> k$ for some (i.e. the smallest) $u$, then no occurrence can contain the $\ell$-grams $S_u \ldots S_1$, so slide the window past the first character of $S_u$.
[Figure: text $T$ with $S_3$, $S_2$, $S_1$ inside a text window of $m - k$ characters, and the new window position after the shift.]

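The filtering loop described on the two filtering slides can be sketched as below. This is a simplified Python illustration, not the authors' code: the function names are hypothetical, all patterns are assumed to have the same length $m$, the text is assumed to use only characters from `alphabet`, the $\ell$-grams are read as non-overlapping blocks from the right end of the window, and a window that cannot be discarded is simply recorded as a candidate and then advanced by one position.

```python
from itertools import product

def min_diff_inside(gram, pattern):
    """Fewest differences needed to match gram against any substring of pattern."""
    col = list(range(len(gram) + 1))
    best = col[-1]
    for c in pattern:
        new = [0]
        for i, g in enumerate(gram, start=1):
            new.append(min(col[i - 1] + (g != c), col[i] + 1, new[i - 1] + 1))
        col = new
        best = min(best, col[-1])
    return best

def build_table(patterns, alphabet, ell):
    grams = ("".join(t) for t in product(alphabet, repeat=ell))
    return {s: min(min_diff_inside(s, p) for p in patterns) for s in grams}

def filter_windows(text, patterns, alphabet, k, ell):
    """Return the start positions of the windows that must be verified."""
    m = len(patterns[0])
    assert all(len(p) == m for p in patterns), "sketch assumes equal-length patterns"
    table = build_table(patterns, alphabet, ell)
    width = m - k                      # window length
    candidates = []
    i = 0
    while i + width <= len(text):
        total = 0                      # accumulated necessary differences
        end = i + width                # right end (exclusive) of the unread part
        discarded = False
        while end - ell >= i:          # another whole ell-gram fits in the window
            start = end - ell
            total += table[text[start:end]]
            if total > k:              # no occurrence can contain the grams read
                i = start + 1          # slide past the first character of S_u
                discarded = True
                break
            end = start
        if not discarded:
            candidates.append(i)       # window might contain an occurrence
            i += 1
    return candidates
```

For the pattern "ACGTACGTAC" with $k = 1$ and $\ell = 3$, the gram "TTT" already needs 2 differences, so a text consisting only of T characters is skipped in jumps of 7 positions and yields no candidates at all.
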
10 Verification
If $M_u \le k$ after the whole window has been read, then the window might contain an occurrence.
The occurrence can be $m + k$ characters long, so verify the area $T_{i..i+m+k-1}$, where $i$ is the starting position of the window.
[Figure: text $T$ with $S_3$, $S_2$, $S_1$ inside a text window of $m - k$ characters, within a verification area of $m + k$ characters.]
The verification is done for each of the $r$ patterns, using the standard dynamic programming algorithm.

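A minimal Python sketch of the verification step follows. It is an assumption-laden illustration rather than the authors' routine: the names `occurs_at` and `verify` are hypothetical, and the check is anchored at the window start, i.e. it asks whether some prefix of the verification area of $m + k$ characters is within $k$ differences of the pattern. The slides only say that standard dynamic programming is used, once per pattern.

```python
def occurs_at(text, i, pattern, k):
    """True if some text substring starting at i matches pattern with <= k differences."""
    m = len(pattern)
    area = text[i:i + m + k]           # an occurrence is at most m + k characters long
    # row[j] = edit distance between pattern[:j] and the area prefix read so far
    row = list(range(m + 1))
    if row[m] <= k:                    # only possible when m <= k
        return True
    for c in area:
        new = [row[0] + 1]
        for j, p in enumerate(pattern, start=1):
            new.append(min(row[j - 1] + (p != c), row[j] + 1, new[j - 1] + 1))
        row = new
        if row[m] <= k:                # the whole pattern matches this prefix
            return True
    return False

def verify(text, i, patterns, k):
    """Indices of the patterns that have an approximate occurrence starting at i."""
    return [idx for idx, p in enumerate(patterns) if occurs_at(text, i, p, k)]
```

For example, with `text = "GGACGTTCGTACGG"`, the pattern "ACGTACGTAC" matches at position 2 with one substitution, so `occurs_at(text, 2, "ACGTACGTAC", 1)` holds while the same call with `k = 0` does not.
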
11 Stricter matching condition
Our basic algorithm: text $\ell$-grams can match anywhere inside the patterns.
If $M_u > k$, then we know that no occurrence can contain the $\ell$-grams $S_u \ldots S_1$ in any position.
The matching area can be made smaller without losing this property.

12 Stricter matching condition
Consider an approximate occurrence of $S_2 S_1$ inside the pattern. $S_2$ cannot be closer than $\ell$ positions from the end of the pattern.
For $S_2$, precompute a table $D_2$, which considers its best match in the area $P_{1..m-\ell}$ rather than $P_{1..m}$.
In general, for $S_t$, preprocess a table $D_t$, using the area $P_{1..m-(t-1)\ell}$.
Compute $M_u$ as $M_u = \sum_{t=1}^{u} D_t[S_t]$.

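The per-position tables of the stricter matching condition can be sketched in Python as below. This is an illustration under explicit assumptions: the area for the $t$-th $\ell$-gram is taken to be the pattern prefix of length $m - (t-1)\ell$, a single pattern is used (for a set of patterns one would take the minimum over the patterns), and the names `min_diff_inside` and `build_strict_tables` are hypothetical.

```python
from itertools import product

def min_diff_inside(gram, area):
    """Fewest differences needed to match gram against any substring of area."""
    col = list(range(len(gram) + 1))
    best = col[-1]
    for c in area:
        new = [0]
        for i, g in enumerate(gram, start=1):
            new.append(min(col[i - 1] + (g != c), col[i] + 1, new[i - 1] + 1))
        col = new
        best = min(best, col[-1])
    return best

def build_strict_tables(pattern, alphabet, k, ell):
    """Return [D_1, D_2, ...]; D_t only looks at a shrinking prefix of the pattern."""
    m = len(pattern)
    grams = ["".join(t) for t in product(alphabet, repeat=ell)]
    num_grams = (m - k) // ell         # ell-grams that fit in a window of m - k characters
    tables = []
    for t in range(1, num_grams + 1):
        area = pattern[: m - (t - 1) * ell]   # assumed area for the t-th gram
        tables.append({s: min_diff_inside(s, area) for s in grams})
    return tables
```

For the pattern "AAAACC" with $k = 1$ and $\ell = 2$, the first table gives "CC" a cost of 0, while the second table only sees the area "AAAA" and gives it a cost of 2. Because every area is a prefix of the previous one, the entries can only grow with $t$, which is what lets the accumulated sum exceed $k$ sooner.
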
13 Stricter matching condition
[Figure: pattern $P$ above text $T$. Tables $D[1]$, $D[2]$ and $D[3]$ are shown with their areas of $P$ (the areas for $S_1$, $S_2$ and $S_3$); the $\ell$-grams $S_3$, $S_2$, $S_1$ lie in the text window.]

14 Stricter matching condition
$D_t[S] \ge D[S]$ for any $t$ and $S$, so the smallest $u$ that permits shifting the window is never larger than for the basic method.
Hence this variant never examines more $\ell$-grams, verifies more windows, nor shifts less.
Drawback: it needs more space and preprocessing effort, so it can be slower in practice.
The matching condition can be made even stricter: work less per window... but the shift can be smaller.

15 Analysis
It can be shown that the basic algorithm has optimal average case complexity $O(n (k + \log_\sigma(rm)) / m)$. This holds for […].
The worst case complexity can be made […] (filtering + verification).
The preprocessing cost is […], and it requires […] space.
Since the algorithm with the stricter matching condition is never worse than the basic version, it is also optimal.

16 Analysis
For a single pattern our complexity is the same as that of the algorithm of Chang & Marr, i.e. $O(n (k + \log_\sigma m) / m)$...
...but our filter works up to […], whereas the filter of Chang & Marr works only up to […].

17 Experimental results
Implementation in C, compiled using icc 7.1 with full optimizations, run on a 2 GHz Pentium 4 with 512 MB RAM, running Linux.
Experiments for alphabet sizes $\sigma = 4$ (DNA) and $\sigma = 20$ (proteins), on both random and real texts.
Text lengths were 64 Mb, and patterns were 64 characters long.
In the implementation we used several practical improvements described in (Fredriksson & Navarro, 2003):
Bit-parallel counters
Hierarchical / bit-parallel verification

18 Experimental results
We used $\ell$ = […] for DNA, and $\ell$ = […] for proteins: the maximum values we can use in practice, since otherwise the preprocessing cost becomes too high.
Analytical results: […] for DNA, and […] for proteins (depending on […]).
Although our algorithms are fast, in practice they cannot cope with difference ratios as high as those predicted by the analysis.

19 Experimental results
Comparison against:
CM: Our previous optimal filtering algorithm
LT: Our previous linear time filter
EXP: Partitioning into exact search
MM: Muth & Manber algorithm, works only for $k = 1$
ABNDM: Approximate BNDM algorithm, a single-pattern approximate search algorithm extending classical BDM
BPM: Bit-parallel Myers, currently the best non-filtering algorithm for single patterns

20 Experimental results
Comparison against Muth and Manber ($k = 1$):
[Tables: MM versus Ours, one table for DNA and one for proteins; the values are not preserved.]

21 Experimental results, random DNA
[Plot: time (s) versus $k$ for Ours ($\ell = 6$), Ours ($\ell = 8$), Ours (strict), Ours (strictest), CM, LT, EXP, BPM and ABNDM.]

22 Experimental results, random DNA
[Plot: time (s) versus $k$ for Ours ($\ell = 6$), Ours ($\ell = 8$), Ours (strict), Ours (strictest), CM, LT, EXP, BPM and ABNDM.]

23 Experimental results, random proteins
[Plot: time (s) versus $k$ for Ours, Ours (stricter), Ours (strictest), CM, LT, EXP, BPM and ABNDM.]

24 Experimental results, random proteins
[Plot: time (s) versus $k$ for Ours, Ours (stricter), Ours (strictest), CM, LT, EXP, BPM and ABNDM.]

25 Experimental results
Areas where each algorithm performs best. From left to right, DNA ([…]), and proteins ([…]). Top row: random data. Bottom row: real data.
[Figure: four panels plotting $r$ (1 to 256) against $k$, with regions labelled Ours, EXP and BPM.]

26 Conclusions
Our new algorithm becomes the fastest for low […]. The larger $r$, the smaller the $k$ values that are tolerated.
When applied to just one pattern, our algorithm becomes the fastest for low difference ratios.
Our basic algorithm usually beats the extensions. This is true only if we use the same parameter $\ell$ for both algorithms.
For limited memory we can use the stricter matching condition with a smaller $\ell$, and beat the basic algorithm.
Our algorithm would be favored on even longer texts (the relative preprocessing cost decreases).
Addressing The problem Objectives: When & Where do we encounter Data? The concept of addressing data' in computations The implications for our machine design(s) Introducing the stackmachine concept Slide
More informationPublic Key Cryptography. Performance Comparison and Benchmarking
Public Key Cryptography Performance Comparison and Benchmarking Tanja Lange Department of Mathematics Technical University of Denmark tanja@hyperelliptic.org 28.08.2006 Tanja Lange Benchmarking p. 1 What
More informationIn mathematics, it is often important to get a handle on the error term of an approximation. For instance, people will write
Big O notation (with a capital letter O, not a zero), also called Landau's symbol, is a symbolism used in complexity theory, computer science, and mathematics to describe the asymptotic behavior of functions.
More informationRevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
More informationSorting algorithms. CS 127 Fall A brief diversion: What's on the midterm. Heaps: what to know. Binary trees: What to know
CS 127 Fall 2003 Sorting algorithms A brief diversion: What's on the midterm Same format as last time 5 questions, 20 points each, 100 points total 15% of your grade 2 topics: trees, trees, and more trees
More informationBreaking An IdentityBased Encryption Scheme based on DHIES
Breaking An IdentityBased Encryption Scheme based on DHIES Martin R. Albrecht 1 Kenneth G. Paterson 2 1 SALSA Project  INRIA, UPMC, Univ Paris 06 2 Information Security Group, Royal Holloway, University
More informationData Deduplication in Slovak Corpora
Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Bratislava, Slovakia Abstract. Our paper describes our experience in deduplication of a Slovak corpus. Two methods of deduplication a plain
More informationAn OnLine Algorithm for Checkpoint Placement
An OnLine Algorithm for Checkpoint Placement Avi Ziv IBM Israel, Science and Technology Center MATAM  Advanced Technology Center Haifa 3905, Israel avi@haifa.vnat.ibm.com Jehoshua Bruck California Institute
More informationAlgorithms. Introduction to C. Writing C Programs. Topics
Algorithms Problem: Write pseudocode for a program that keeps asking the user to input integers until the user enters zero, and then determines and outputs the smallest integer. (Hint: Think about keeping
More information