Improved Single and Multiple Approximate String Matching
1 Improved Single and Multiple Approximate String Matching. Kimmo Fredriksson, Department of Computer Science, University of Joensuu, Finland; Gonzalo Navarro, Department of Computer Science, University of Chile. CPM 04.
2 The Problem Setting & Complexity. Given a text $T[1..n]$ and a pattern $P[1..m]$ over some finite alphabet $\Sigma$ of size $\sigma$, find the approximate occurrences of $P$ in $T$, allowing at most $k$ differences (edit operations). Exact matching (single pattern) lower bound: $\Omega(n \log_\sigma(m) / m)$ character comparisons (Yao, 79). Approximate matching lower bound: $\Omega(n (k + \log_\sigma m) / m)$ (Chang & Marr, 94). We will search simultaneously for a set of $r$ patterns $P^1, \ldots, P^r$; the lower bound for $r$ patterns is $\Omega(n (k + \log_\sigma(rm)) / m)$ (Fredriksson & Navarro, 2003).
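To make the problem statement concrete, the following is a minimal sketch of the classical dynamic programming solution for a single pattern (the column-wise recurrence usually attributed to Sellers). It is not the algorithm of the talk, only a baseline that defines what "occurrence with at most k differences" means; the function name `approx_end_positions` is a hypothetical choice.

```python
def approx_end_positions(pattern, text, k):
    """Return the 0-based text positions where an approximate occurrence
    of `pattern` (at most k insertions, deletions or substitutions) ends."""
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and some text
    # substring ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]          # value of col[i-1] in the previous column
        # col[0] stays 0: an occurrence may start at any text position.
        for i in range(1, m + 1):
            cur = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         cur + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_end_positions("abc", "xxabcxx", 1))
```

This baseline inspects every text character and costs O(mn) time, which is what filtering algorithms try to avoid on average.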
3 Previous work. Only a few algorithms exist for multipattern approximate searching under the $k$ differences model. Naïve approach: search the $r$ patterns separately, using any of the single pattern search algorithms. (Muth & Manber, 1996): … average time algorithm using … space. The algorithm is based on hashing, and works only for $k = 1$. (Baeza-Yates & Navarro, 1997): partitioning into exact search: … on average (… preprocessing), but can be improved to …. Works for …. Other less interesting ones.
4 Previous work. (Fredriksson & Navarro, 2003): the first average-optimal algorithm; average-optimal up to error level …, linear on average up to error level …. (Hyyrö, Fredriksson & Navarro, 2004): … worst case for short patterns, where $w$ is the number of bits in the machine word.
5 This work. We have improved the (optimal) algorithm of (Fredriksson & Navarro, 2003): it is faster in practice, and allows error levels up to …. Our algorithm runs in $O(n (k + \log_\sigma(rm)) / m)$ average time, which is optimal. Preprocessing time is …, and the algorithm needs … space, where …. The fastest algorithm in practice for intermediate … and small ….
6 The method in brief. The algorithm is based on the preprocessing/filtering/verification paradigm. The preprocessing phase generates all strings of length $\ell$, and computes their minimum distance over the set of patterns. The filtering phase searches (approximately) text $\ell$-grams from the patterns, using the precomputed distance table, accumulating the differences. The verification phase uses a dynamic programming algorithm, and is applied to each pattern separately.
7 Preprocessing. Build a table $D$ as follows: 1. Choose a number $\ell$ in the range …. 2. For every string $S$ of length $\ell$ ($\ell$-gram), search for $S$ in the patterns. 3. Store in $D[S]$ the smallest number of differences needed to match $S$ inside a pattern (a number between 0 and $\ell$). $D$ requires space for $\sigma^\ell$ entries and can be computed in … time.
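The three preprocessing steps can be sketched as follows. This is a deliberately naive illustration, assuming a small alphabet and gram length: it enumerates every gram and runs a quadratic dynamic programming pass against every pattern, which is not claimed to be how the authors build their table. The names `min_diff_inside`, `build_table`, and `ell` are hypothetical.

```python
from itertools import product


def min_diff_inside(gram, pattern):
    """Fewest differences needed to match `gram` against any substring
    of `pattern` (the empty substring included)."""
    g = len(gram)
    col = list(range(g + 1))   # col[i]: gram[:i] vs. best substring ending here
    best = col[g]              # matching against the empty substring costs g
    for c in pattern:
        prev_diag = col[0]     # col[0] stays 0: a match may start anywhere
        for i in range(1, g + 1):
            cur = col[i]
            col[i] = min(prev_diag + (gram[i - 1] != c), cur + 1, col[i - 1] + 1)
            prev_diag = cur
        best = min(best, col[g])
    return best


def build_table(patterns, alphabet, ell):
    """Map every string of length `ell` over `alphabet` to its minimum
    number of differences over the whole pattern set."""
    table = {}
    for letters in product(alphabet, repeat=ell):
        gram = "".join(letters)
        table[gram] = min(min_diff_inside(gram, p) for p in patterns)
    return table


if __name__ == "__main__":
    table = build_table(["acgt"], "acgt", 2)
    print(table["ac"], table["ca"], table["tt"])
```

The dictionary has one entry per gram, so its size grows exponentially with the gram length; this is why the gram length cannot be chosen freely in practice.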
8 Filtering. Any occurrence is at least $m-k$ characters long, so use a sliding window of $m-k$ characters over $T$. Invariant: all occurrences starting before the window are already reported. Read $\ell$-grams $S^1, S^2, \ldots$ from the text window, from right to left. Any occurrence starting at the beginning of the window must contain all the $\ell$-grams read. [Figure: text $T$ with a text window of $m-k$ characters holding $S^3$, $S^2$, $S^1$ from left to right.]
9 Filtering. Accumulate a sum of necessary differences: $D[S^1] + D[S^2] + \cdots + D[S^u]$. If the sum becomes greater than $k$ for some (i.e. the smallest) $u$, then no occurrence can contain the $\ell$-grams $S^u \ldots S^1$, so slide the window past the first character of $S^u$. [Figure: the text window of $m-k$ characters holding $S^3$, $S^2$, $S^1$, and the new window position after the shift.]
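The filtering loop described on the two slides above might look roughly like this. It is a sketch under stated assumptions: the distance table is passed in as a dictionary that holds an entry for every gram occurring in the text, a window that survives the filter is recorded and then shifted by one position, and all names (`filter_candidates`, `table`, `ell`) are hypothetical.

```python
def filter_candidates(text, table, m, k, ell):
    """Return the start positions of text windows that the filter cannot
    discard and that therefore need verification."""
    w = m - k                        # window length: shortest possible occurrence
    grams_per_window = w // ell
    candidates = []
    i = 0
    while i + w <= len(text):
        total = 0
        shift_to = None
        # Read grams from the right end of the window towards the left.
        for t in range(1, grams_per_window + 1):
            start = i + w - t * ell
            total += table[text[start:start + ell]]
            if total > k:
                # No occurrence can contain the grams read so far, so
                # jump past the first character of the last gram read.
                shift_to = start + 1
                break
        if shift_to is None:
            candidates.append(i)     # the whole window was read: verify it
            shift_to = i + 1
        i = shift_to
    return candidates


if __name__ == "__main__":
    # Distances of all 2-grams over {a, b} inside the pattern "abab".
    table = {"ab": 0, "ba": 0, "aa": 1, "bb": 1}
    print(filter_candidates("aabbabab", table, m=4, k=0, ell=2))
```

In the example the window starting at position 0 is discarded after reading a single gram, and the search jumps directly to position 3. Position 3 survives the filter although the pattern does not occur there, which is why a verification step is still needed.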
10 Verification. If the sum never exceeds $k$, then the window might contain an occurrence. The occurrence can be at most $m+k$ characters long, so verify the area of $m+k$ characters beginning at $i$, where $i$ is the starting position of the window. The verification is done for each of the $r$ patterns, using a standard dynamic programming algorithm. [Figure: text window of $m-k$ characters holding $S^3$, $S^2$, $S^1$, inside a verification area of $m+k$ characters.]
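A minimal sketch of the verification step follows. It assumes one particular reading of the slide: each pattern is compared by plain edit-distance dynamic programming against the prefixes of the verification area, so only occurrences that start exactly at the window start are reported. The function name `verify_window` is hypothetical, and the practical improvements mentioned later in the talk (hierarchical and bit-parallel verification) are not modelled.

```python
def verify_window(text, patterns, i, k):
    """Return the indices of the patterns that occur in `text` starting
    exactly at position i with at most k differences."""
    hits = []
    for idx, p in enumerate(patterns):
        m = len(p)
        area = text[i:i + m + k]     # an occurrence is at most m + k long
        # col[r] = edit distance between p[:r] and the area prefix read so far
        col = list(range(m + 1))
        best = col[m]
        for j, c in enumerate(area, start=1):
            prev_diag = col[0]
            col[0] = j               # anchored: the match must start at i
            for r in range(1, m + 1):
                cur = col[r]
                col[r] = min(prev_diag + (p[r - 1] != c), cur + 1, col[r - 1] + 1)
                prev_diag = cur
            best = min(best, col[m])
        if best <= k:
            hits.append(idx)
    return hits


if __name__ == "__main__":
    print(verify_window("xxacgtxx", ["acgt", "acct", "tttt"], 2, 1))
```

Anchoring the match at the window start fits the invariant that every occurrence starting before the window has already been reported.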
11 Stricter matching condition. Our basic algorithm: text $\ell$-grams can match anywhere inside the patterns. If the accumulated sum exceeds $k$, then we know that no occurrence can contain the $\ell$-grams in any position. The matching area can be made smaller without losing this property.
12 Stricter matching condition. Consider an approximate occurrence of $S^2 S^1$ inside the pattern. $S^2$ cannot be closer than $\ell$ positions from the end of the pattern. For $S^2$ precompute a table $D[2]$, which considers its best match in the area $P[1..m-\ell]$ rather than $P[1..m]$. In general, for $S^t$ preprocess a table $D[t]$, using the area $P[1..m-(t-1)\ell]$. Compute the accumulated sum as $D[1][S^1] + D[2][S^2] + \cdots + D[u][S^u]$.
13 Stricter matching condition. [Figure: pattern $P$ above text $T$; tables $D[1]$, $D[2]$, $D[3]$ with the pattern areas used for $S^1$, $S^2$, $S^3$; text window holding $S^3$, $S^2$, $S^1$.]
14 Stricter matching condition. $D[t][S] \ge D[S]$ for any $t$ and $S$, so the smallest $u$ that permits shifting the window is never larger than for the basic method. Hence this variant never examines more $\ell$-grams, verifies more windows, nor shifts less. Drawback: it needs more space and preprocessing effort, so it can be slower in practice. The matching condition can be made even stricter: work less per window, but the shift can be smaller.
15 Analysis. It can be shown that the basic algorithm has optimal average case complexity. This holds for …. The worst case complexity can be made … (filtering + verification). The preprocessing cost is …, and it requires … space. Since the algorithm with the stricter matching condition is never worse than the basic version, it is also optimal.
16 Analysis. For a single pattern our complexity is the same as the algorithm of Chang & Marr, i.e. $O(n (k + \log_\sigma m) / m)$, but our filter works up to …, whereas the filter of Chang & Marr works only up to ….
17 Experimental results. Implementation in C, compiled using icc 7.1 with full optimizations, run on a 2 GHz Pentium 4 with 512 MB RAM, running Linux. Experiments for alphabet sizes $\sigma = 4$ (DNA) and $\sigma = 20$ (proteins), both random and real texts. Text lengths were 64Mb, and patterns 64 characters. In the implementation we used several practical improvements described in (Fredriksson & Navarro, 2003): bit-parallel counters; hierarchical / bit-parallel verification.
18 Experimental results. We used $\ell$ = … for DNA, and $\ell$ = … for proteins: the maximum values we can use in practice, otherwise the preprocessing cost becomes too high. Analytical results: … for DNA, and … for proteins (depending on …). Although our algorithms are fast, in practice they cannot cope with as high difference ratios as predicted by the analysis.
19 Experimental results. Comparison against: CM: our previous optimal filtering algorithm; LT: our previous linear time filter; EXP: partitioning into exact search; MM: Muth & Manber algorithm, works only for $k = 1$; ABNDM: approximate BNDM algorithm, a single pattern approximate search algorithm extending classical BDM; BPM: bit-parallel Myers, currently the best non-filtering algorithm for single patterns.
20 Experimental results. Comparison against Muth and Manber ($k = 1$). [Tables: MM versus Ours on DNA and on proteins; the numeric entries are not preserved in this transcript.]
21 Experimental results, random DNA. [Plot: time (s) versus $k$; curves: Ours, l=6; Ours, l=8; Ours, strict; Ours, strictest; CM; LT; EXP; BPM; ABNDM.]
22 Experimental results, random DNA. [Plot: time (s) versus $k$; curves: Ours, l=6; Ours, l=8; Ours, strict; Ours, strictest; CM; LT; EXP; BPM; ABNDM.]
23 Experimental results, random proteins. [Plot: time (s) versus $k$; curves: Ours; Ours, stricter; Ours, strictest; CM; LT; EXP; BPM; ABNDM.]
24 Experimental results, random proteins. [Plot: time (s) versus $k$; curves: Ours; Ours, stricter; Ours, strictest; CM; LT; EXP; BPM; ABNDM.]
25 Experimental results. Areas where each algorithm performs best. From left to right, DNA ($\sigma = 4$), and proteins ($\sigma = 20$). Top row: random data. Bottom row: real data. [Four plots of $r$ (1 to 256) versus $k$, with regions labelled Ours, EXP and BPM.]
26 Conclusions. Our new algorithm becomes the fastest for low …. The larger …, the smaller … values are tolerated. When applied to just one pattern, our algorithm becomes the fastest for low difference ratios. Our basic algorithm usually beats the extensions. This is true only if we use the same parameter $\ell$ for both algorithms. For limited memory we can use the stricter matching condition with smaller $\ell$, and beat the basic algorithm. Our algorithm would be favored on even longer texts (relative preprocessing cost decreases).
More informationDistributed storage for structured data
Distributed storage for structured data Dennis Kafura CS5204 Operating Systems 1 Overview Goals scalability petabytes of data thousands of machines applicability to Google applications Google Analytics
More informationA FAST STRING MATCHING ALGORITHM
Ravendra Singh et al, Int. J. Comp. Tech. Appl., Vol 2 (6),877-883 A FAST STRING MATCHING ALGORITHM H N Verma, 2 Ravendra Singh Department of CSE, Sachdeva Institute of Technology, Mathura, India, hnverma@rediffmail.com
More informationNAND Flash Memories. Understanding NAND Flash Factory Pre-Programming. Schemes
NAND Flash Memories Understanding NAND Flash Factory Pre-Programming Schemes Application Note February 2009 an_elnec_nand_schemes, version 1.00 Version 1.00/02.2009 Page 1 of 20 NAND flash technology enables
More informationMicrosoft Office Outlook 2013: Part 1
Microsoft Office Outlook 2013: Part 1 Course Specifications Course Length: 1 day Overview: Email has become one of the most widely used methods of communication, whether for personal or business communications.
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
More informationNAND Flash Memories. Using Linux MTD compatible mode. on ELNEC Universal Device Programmers. (Quick Guide)
NAND Flash Memories Using Linux MTD compatible mode on ELNEC Universal Device Programmers (Quick Guide) Application Note April 2012 an_elnec_linux_mtd, version 1.04 Version 1.04/04.2012 Page 1 of 16 As
More informationExtended Finite-State Machine Inference with Parallel Ant Colony Based Algorithms
Extended Finite-State Machine Inference with Parallel Ant Colony Based Algorithms Daniil Chivilikhin PhD student ITMO University Vladimir Ulyantsev PhD student ITMO University Anatoly Shalyto Dr.Sci.,
More information1 Introduction. Linear Programming. Questions. A general optimization problem is of the form: choose x to. max f(x) subject to x S. where.
Introduction Linear Programming Neil Laws TT 00 A general optimization problem is of the form: choose x to maximise f(x) subject to x S where x = (x,..., x n ) T, f : R n R is the objective function, S
More informationEfficiently Identifying Inclusion Dependencies in RDBMS
Efficiently Identifying Inclusion Dependencies in RDBMS Jana Bauckmann Department for Computer Science, Humboldt-Universität zu Berlin Rudower Chaussee 25, 12489 Berlin, Germany bauckmann@informatik.hu-berlin.de
More information