Practical Survey on Hash Tables. Aurelian Țuțuianu

Size: px
Start display at page:

Download "Practical Survey on Hash Tables. Aurelian Țuțuianu"

Transcription

1 Practical Survey on Hash Tables Aurelian Țuțuianu

2 In memoriam Mihai Pătraşcu (17 July June 2012) I have no intention to ever teach computer science. I want to teach the love for computer science, and let the learning happen. Teaching Statement (

3 Abstract Hash table definition Collision resolving schemas: Chained hashing Linear and quadratic probing Cuckoo hashing Some hash function theory Simple tabulation hashing

4 Omnipresence of hash tables Symbol tables in compilers Cache implementations Database storages Manage memory pages in Linux Route tables Large number of documents

5 Hash Tables Considering a set of elements S from a finite and much larger universe U. A hash table consists of: hash function h: U {0,.., m 1} vector v of size m

6 Collisions same hash for two different keys What to do? Ignore them Chain colliding values Skip and try again Hash and displace Find a perfect hash function

7 War Story: cache with hash tables application Problem: An application which gets some data from an expensive repository. hash table Data source Solution: Hash table with collision replacement. Key point: a big chunk of users watched a lot of common data.

8 Collision Resolution Schemas Chained hashing Open hash: linear and quadratic probing Cuckoo hashing And many many others: perfect hashing, coalesced hashing, Robin Hood hashing, hopscotch hashing, etc.

9 Chained Hashing 0 Each slot contains a linked list. 1 O( n m ) = O(1) for all operations. 2 y Load factor: n m < x z w easy to implement works with weak hash functions consumes significant memory default implementation

10 Linear and quadratic probing All records are stored in the bucket array itself. h(x,i) = 4 + i w y z x Probe a try to find an empty place. Linear probing h x, i = h 0 (x) + i Quadratic probing i + i i h x, i = h 0 (x) + 2

11 War Story: Linear probing trick Min. 1st Qu. Median Mean 3rd Qu. Max linear probing chained hashing

12 War Story: Let it be quadratic! Replace library implementation with a home-made hash table 4 hours of work

13 Cuckoo hashing T 1 T 2 Two hash tables, T 1, T 2, of size m, and two hash functions h 1, h 2 : U -> {0,..., m 1}. h 1 (x) x z y Value x stored in cell h 1 (x) of T1 or in cell h 2 (x) of T2. Hash and displace. Lookup is constant in worst case! w h 2 (x) Updates in constant amortized time.

14 What about hash functions? Any hash function is good? What does a good hash function mean? Can I have my own?

15 The beginning of time Introduced by Alfred Dumey in 1956 for the symbol table in a compiler. He used a crazy, chaotic, random function h:u->{0..m-1}. h(x)=(x mod p) mod m, with p a big prime number. Is seems to work, but why?

16 First station: rigorous analysis Consider that h really is a random function! Knuth established a way to make a complete analysis, but based on a false assumption. No matter how long you stare at h(x)=(x mod p) mod m, it will not morph into a random function!

17 Next station: universality and k-independence Wegman and Carter (1978) A family of hash functions No need of perfect random hash function, but universal : x 1,x 2 S x 1 x 2, Pr[h(x 1 )=h(x 2 )] 1 N In generalized form the k-independence model uses statistics to measure how much random can a family of hash functions produce!

18 How it works? Random data x formula h(x) Universal multiplicative shift: h a x = a x l l out 2-independent multiplicative shift: h a,b x = a x + b 2l l out k-independent polynomial hashing: k 1 h x = i=0 a i x i mod p mod 2 l out

19 Facts on k-independence Chained hashing Wegman, Carter: requires only universal hashing Linear probing 1990 Siegel, Schmidt: O(logn)-independece is enough 2007 Pagh 5-independence suffices 2010 Patrascu,Thorup 4-independence is not enough Cuckoo hashing 2001 Pagh: O(logn)-independence is enough 2005 Cohen, Kane: 5-independence is not enough 2006 Cohen, Kane: 6-independence is enough

20 Simple tabulation hashing Simple tabulation is the fastest 3-independent family of hash functions known. Key x of length len (required bit width to store values) is divided into c chars x 1, x 2,.., x c We create c tables R 1, R 2,.., R c, filled with independent random values Hash value is created with function h x = R 1 x 1 R 2 x 2 R c x c x R 1 x 1 R 2 x 2 R 3 x 3 R 4 x 4 4 lookup tables with random 8-bit values h(x)

21 The power of simple tabulation! The power of simple tabulation hashing Mihai Pătrașcu, Mikkel Thorup December 6, 2011 According to this paper, even if is only 3-independent, we have: Constant time for linear probing Constant time for static cuckoo hashing => There are also other probabilistic properties which can be exploited, other than ones captured in k-independence theory

22 Summary Easy ways to implement optimal hash tables Simple scheme to generate a hash function family Theory produces practical results and is still alive! There are a lot of occasions to apply these ideas, so: Work hard, have fun and make history!

23 Questions?

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m. Universal hashing No matter how we choose our hash function, it is always possible to devise a set of keys that will hash to the same slot, making the hash scheme perform poorly. To circumvent this, we

More information

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search Chapter Objectives Chapter 9 Search Algorithms Data Structures Using C++ 1 Learn the various search algorithms Explore how to implement the sequential and binary search algorithms Discover how the sequential

More information

CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions. Linda Shapiro Spring 2016

CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions. Linda Shapiro Spring 2016 CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions Linda Shapiro Spring 2016 Announcements Friday: Review List and go over answers to Practice Problems 2 Hash Tables: Review Aim for constant-time

More information

Review of Hashing: Integer Keys

Review of Hashing: Integer Keys CSE 326 Lecture 13: Much ado about Hashing Today s munchies to munch on: Review of Hashing Collision Resolution by: Separate Chaining Open Addressing $ Linear/Quadratic Probing $ Double Hashing Rehashing

More information

A Survey on Efficient Hashing Techniques in Software Configuration Management

A Survey on Efficient Hashing Techniques in Software Configuration Management A Survey on Efficient Hashing Techniques in Software Configuration Management Bernhard Grill Vienna University of Technology Vienna, Austria Email: e1028282@student.tuwien.ac.at Abstract This paper presents

More information

A COOL AND PRACTICAL ALTERNATIVE TO TRADITIONAL HASH TABLES

A COOL AND PRACTICAL ALTERNATIVE TO TRADITIONAL HASH TABLES A COOL AND PRACTICAL ALTERNATIVE TO TRADITIONAL HASH TABLES ULFAR ERLINGSSON, MARK MANASSE, FRANK MCSHERRY MICROSOFT RESEARCH SILICON VALLEY MOUNTAIN VIEW, CALIFORNIA, USA ABSTRACT Recent advances in the

More information

Fundamental Algorithms

Fundamental Algorithms Fundamental Algorithms Chapter 7: Hash Tables Michael Bader Winter 2014/15 Chapter 7: Hash Tables, Winter 2014/15 1 Generalised Search Problem Definition (Search Problem) Input: a sequence or set A of

More information

CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions

CS 2112 Spring 2014. 0 Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions CS 2112 Spring 2014 Assignment 3 Data Structures and Web Filtering Due: March 4, 2014 11:59 PM Implementing spam blacklists and web filters requires matching candidate domain names and URLs very rapidly

More information

Cuckoo Filter: Practically Better Than Bloom

Cuckoo Filter: Practically Better Than Bloom Cuckoo Filter: Practically Better Than Bloom Bin Fan, David G. Andersen, Michael Kaminsky, Michael D. Mitzenmacher Carnegie Mellon University, Intel Labs, Harvard University {binfan,dga}@cs.cmu.edu, michael.e.kaminsky@intel.com,

More information

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92. Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

More information

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n)

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n) Hash Tables Tables so far set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n) Worst O(lg n) O(lg n) O(lg n) Table naïve array implementation

More information

The Advantages and Disadvantages of Network Computing Nodes

The Advantages and Disadvantages of Network Computing Nodes Big Data & Scripting storage networks and distributed file systems 1, 2, in the remainder we use networks of computing nodes to enable computations on even larger datasets for a computation, each node

More information

Scalable Prefix Matching for Internet Packet Forwarding

Scalable Prefix Matching for Internet Packet Forwarding Scalable Prefix Matching for Internet Packet Forwarding Marcel Waldvogel Computer Engineering and Networks Laboratory Institut für Technische Informatik und Kommunikationsnetze Background Internet growth

More information

Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman

Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman To motivate the Bloom-filter idea, consider a web crawler. It keeps, centrally, a list of all the URL s it has found so far. It

More information

Lecture 2 February 12, 2003

Lecture 2 February 12, 2003 6.897: Advanced Data Structures Spring 003 Prof. Erik Demaine Lecture February, 003 Scribe: Jeff Lindy Overview In the last lecture we considered the successor problem for a bounded universe of size u.

More information

Anti-persistence: History Independent Data Structures

Anti-persistence: History Independent Data Structures Anti-persistence: History Independent Data Structures Moni Naor Vanessa Teague Dept. of Computer Science and Applied Math Dept. of Computer Science Weizmann Institute Stanford University naor@wisdom.weizmann.ac.il

More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

Vulnerability Analysis of Hash Tables to Sophisticated DDoS Attacks

Vulnerability Analysis of Hash Tables to Sophisticated DDoS Attacks International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 12 (2014), pp. 1167-1173 International Research Publications House http://www. irphouse.com Vulnerability

More information

SHARED HASH TABLES IN PARALLEL MODEL CHECKING

SHARED HASH TABLES IN PARALLEL MODEL CHECKING SHARED HASH TABLES IN PARALLEL MODEL CHECKING IPA LENTEDAGEN 2010 ALFONS LAARMAN JOINT WORK WITH MICHAEL WEBER AND JACO VAN DE POL 23/4/2010 AGENDA Introduction Goal and motivation What is model checking?

More information

Two Binary Algorithms for Calculating the Jacobi Symbol and a Fast Systolic Implementation in Hardware

Two Binary Algorithms for Calculating the Jacobi Symbol and a Fast Systolic Implementation in Hardware Two Binary Algorithms for Calculating the Jacobi Symbol and a Fast Systolic Implementation in Hardware George Purdy, Carla Purdy, and Kiran Vedantam ECECS Department, University of Cincinnati, Cincinnati,

More information

Lecture 10: Dynamic Memory Allocation 1: Into the jaws of malloc()

Lecture 10: Dynamic Memory Allocation 1: Into the jaws of malloc() CS61: Systems Programming and Machine Organization Harvard University, Fall 2009 Lecture 10: Dynamic Memory Allocation 1: Into the jaws of malloc() Prof. Matt Welsh October 6, 2009 Topics for today Dynamic

More information

History-Independent Cuckoo Hashing

History-Independent Cuckoo Hashing History-Independent Cuckoo Hashing Moni Naor Gil Segev Udi Wieder Abstract Cuckoo hashing is an efficient and practical dynamic dictionary. It provides expected amortized constant update time, worst case

More information

BM307 File Organization

BM307 File Organization BM307 File Organization Gazi University Computer Engineering Department 9/24/2014 1 Index Sequential File Organization Binary Search Interpolation Search Self-Organizing Sequential Search Direct File Organization

More information

Big Data & Scripting Part II Streaming Algorithms

Big Data & Scripting Part II Streaming Algorithms Big Data & Scripting Part II Streaming Algorithms 1, Counting Distinct Elements 2, 3, counting distinct elements problem formalization input: stream of elements o from some universe U e.g. ids from a set

More information

MODELING RANDOMNESS IN NETWORK TRAFFIC

MODELING RANDOMNESS IN NETWORK TRAFFIC MODELING RANDOMNESS IN NETWORK TRAFFIC - LAVANYA JOSE, INDEPENDENT WORK FALL 11 ADVISED BY PROF. MOSES CHARIKAR ABSTRACT. Sketches are randomized data structures that allow one to record properties of

More information

Factoring Algorithms

Factoring Algorithms Institutionen för Informationsteknologi Lunds Tekniska Högskola Department of Information Technology Lund University Cryptology - Project 1 Factoring Algorithms The purpose of this project is to understand

More information

A Comparison of Dictionary Implementations

A Comparison of Dictionary Implementations A Comparison of Dictionary Implementations Mark P Neyer April 10, 2009 1 Introduction A common problem in computer science is the representation of a mapping between two sets. A mapping f : A B is a function

More information

Chapter 13. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible Hashing

More information

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing

Chapter 13. Chapter Outline. Disk Storage, Basic File Structures, and Hashing Chapter 13 Disk Storage, Basic File Structures, and Hashing Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files

More information

H/wk 13, Solutions to selected problems

H/wk 13, Solutions to selected problems H/wk 13, Solutions to selected problems Ch. 4.1, Problem 5 (a) Find the number of roots of x x in Z 4, Z Z, any integral domain, Z 6. (b) Find a commutative ring in which x x has infinitely many roots.

More information

Data Structures in Java. Session 15 Instructor: Bert Huang http://www1.cs.columbia.edu/~bert/courses/3134

Data Structures in Java. Session 15 Instructor: Bert Huang http://www1.cs.columbia.edu/~bert/courses/3134 Data Structures in Java Session 15 Instructor: Bert Huang http://www1.cs.columbia.edu/~bert/courses/3134 Announcements Homework 4 on website No class on Tuesday Midterm grades almost done Review Indexing

More information

1 Formulating The Low Degree Testing Problem

1 Formulating The Low Degree Testing Problem 6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,

More information

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven)

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven) Algorithmic Aspects of Big Data Nikhil Bansal (TU Eindhoven) Algorithm design Algorithm: Set of steps to solve a problem (by a computer) Studied since 1950 s. Given a problem: Find (i) best solution (ii)

More information

DNS LOOKUP SYSTEM DATA STRUCTURES AND ALGORITHMS PROJECT REPORT

DNS LOOKUP SYSTEM DATA STRUCTURES AND ALGORITHMS PROJECT REPORT DNS LOOKUP SYSTEM DATA STRUCTURES AND ALGORITHMS PROJECT REPORT By GROUP Avadhut Gurjar Mohsin Patel Shraddha Pandhe Page 1 Contents 1. Introduction... 3 2. DNS Recursive Query Mechanism:...5 2.1. Client

More information

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS Volume 2, No. 3, March 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE

More information

Project Group High- performance Flexible File System 2010 / 2011

Project Group High- performance Flexible File System 2010 / 2011 Project Group High- performance Flexible File System 2010 / 2011 Lecture 1 File Systems André Brinkmann Task Use disk drives to store huge amounts of data Files as logical resources A file can contain

More information

Digital Signatures. (Note that authentication of sender is also achieved by MACs.) Scan your handwritten signature and append it to the document?

Digital Signatures. (Note that authentication of sender is also achieved by MACs.) Scan your handwritten signature and append it to the document? Cryptography Digital Signatures Professor: Marius Zimand Digital signatures are meant to realize authentication of the sender nonrepudiation (Note that authentication of sender is also achieved by MACs.)

More information

Project: Simulated Encrypted File System (SEFS)

Project: Simulated Encrypted File System (SEFS) Project: Simulated Encrypted File System (SEFS) Omar Chowdhury Fall 2015 CS526: Information Security 1 Motivation Traditionally files are stored in the disk in plaintext. If the disk gets stolen by a perpetrator,

More information

How Caching Affects Hashing

How Caching Affects Hashing How Caching Affects Hashing Gregory L. Heileman heileman@ece.unm.edu Department of Electrical and Computer Engineering University of New Mexico, Albuquerque, NM Wenbin Luo wluo@stmarytx.edu Engineering

More information

Chapter 11: File System Implementation. Chapter 11: File System Implementation. Objectives. File-System Structure

Chapter 11: File System Implementation. Chapter 11: File System Implementation. Objectives. File-System Structure Chapter 11: File System Implementation Chapter 11: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency

More information

Primality Testing and Factorization Methods

Primality Testing and Factorization Methods Primality Testing and Factorization Methods Eli Howey May 27, 2014 Abstract Since the days of Euclid and Eratosthenes, mathematicians have taken a keen interest in finding the nontrivial factors of integers,

More information

CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING

CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING Chapter 13: Disk Storage, Basic File Structures, and Hashing 1 CHAPTER 13: DISK STORAGE, BASIC FILE STRUCTURES, AND HASHING Answers to Selected Exercises 13.23 Consider a disk with the following characteristics

More information

Ex. 2.1 (Davide Basilio Bartolini)

Ex. 2.1 (Davide Basilio Bartolini) ECE 54: Elements of Information Theory, Fall 00 Homework Solutions Ex.. (Davide Basilio Bartolini) Text Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips

More information

Real-Time (Paradigms) (70)

Real-Time (Paradigms) (70) Real-Time (Paradigms) (70) Taxonomy of Medium Access Control - Protocols: MAC-protocols Collision avoidance Collision resolution Reservation-based Token-based Master- Time-based Priority-based Slave static

More information

Integer Factorization using the Quadratic Sieve

Integer Factorization using the Quadratic Sieve Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give

More information

Factoring - Solve by Factoring

Factoring - Solve by Factoring 6.7 Factoring - Solve by Factoring Objective: Solve quadratic equation by factoring and using the zero product rule. When solving linear equations such as 2x 5 = 21 we can solve for the variable directly

More information

ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory

ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory ChunkStash: Speeding up Inline Storage Deduplication using Flash Memory Biplob Debnath Sudipta Sengupta Jin Li Microsoft Research, Redmond, WA, USA University of Minnesota, Twin Cities, USA Abstract Storage

More information

New Hash Function Construction for Textual and Geometric Data Retrieval

New Hash Function Construction for Textual and Geometric Data Retrieval Latest Trends on Computers, Vol., pp.483-489, ISBN 978-96-474-3-4, ISSN 79-45, CSCC conference, Corfu, Greece, New Hash Function Construction for Textual and Geometric Data Retrieval Václav Skala, Jan

More information

Introduction. Appendix D Mathematical Induction D1

Introduction. Appendix D Mathematical Induction D1 Appendix D Mathematical Induction D D Mathematical Induction Use mathematical induction to prove a formula. Find a sum of powers of integers. Find a formula for a finite sum. Use finite differences to

More information

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 13-1 Slide 13-1 Chapter 13 Disk Storage, Basic File Structures, and Hashing Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and Extendible

More information

Bloom Filter based Inter-domain Name Resolution: A Feasibility Study

Bloom Filter based Inter-domain Name Resolution: A Feasibility Study Bloom Filter based Inter-domain Name Resolution: A Feasibility Study Konstantinos V. Katsaros, Wei Koong Chai and George Pavlou University College London, UK Outline Inter-domain name resolution in ICN

More information

Complexity Attack Resistant Flow Lookup Schemes for IPv6: A Measurement Based Comparison

Complexity Attack Resistant Flow Lookup Schemes for IPv6: A Measurement Based Comparison Complexity Attack Resistant Flow Lookup Schemes for IPv6: A Measurement Based Comparison David Malone and R. Joshua Tobin Abstract In this paper we look at the problem of choosing a good flow state lookup

More information

Privacy and Security in library RFID Issues, Practices and Architecture

Privacy and Security in library RFID Issues, Practices and Architecture Privacy and Security in library RFID Issues, Practices and Architecture David Molnar and David Wagner University of California, Berkeley CCS '04 October 2004 Overview Motivation RFID Background Library

More information

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium.

INTRODUCTION The collection of data that makes up a computerized database must be stored physically on some computer storage medium. Chapter 4: Record Storage and Primary File Organization 1 Record Storage and Primary File Organization INTRODUCTION The collection of data that makes up a computerized database must be stored physically

More information

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case. Algorithms Efficiency of algorithms Computational resources: time and space Best, worst and average case performance How to compare algorithms: machine-independent measure of efficiency Growth rate Complexity

More information

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging

Memory Management Outline. Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging Memory Management Outline Background Swapping Contiguous Memory Allocation Paging Segmentation Segmented Paging 1 Background Memory is a large array of bytes memory and registers are only storage CPU can

More information

Chapter 13 Disk Storage, Basic File Structures, and Hashing.

Chapter 13 Disk Storage, Basic File Structures, and Hashing. Chapter 13 Disk Storage, Basic File Structures, and Hashing. Copyright 2004 Pearson Education, Inc. Chapter Outline Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files

More information

Theoretical Aspects of Storage Systems Autumn 2009

Theoretical Aspects of Storage Systems Autumn 2009 Theoretical Aspects of Storage Systems Autumn 2009 Chapter 3: Data Deduplication André Brinkmann News Outline Data Deduplication Compare-by-hash strategies Delta-encoding based strategies Measurements

More information

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C Tutorial#1 Q 1:- Explain the terms data, elementary item, entity, primary key, domain, attribute and information? Also give examples in support of your answer? Q 2:- What is a Data Type? Differentiate

More information

18-548/15-548 Associativity 9/16/98. 7 Associativity. 18-548/15-548 Memory System Architecture Philip Koopman September 16, 1998

18-548/15-548 Associativity 9/16/98. 7 Associativity. 18-548/15-548 Memory System Architecture Philip Koopman September 16, 1998 7 Associativity 18-548/15-548 Memory System Architecture Philip Koopman September 16, 1998 Required Reading: Cragon pg. 166-174 Assignments By next class read about data management policies: Cragon 2.2.4-2.2.6,

More information

Multi-dimensional index structures Part I: motivation

Multi-dimensional index structures Part I: motivation Multi-dimensional index structures Part I: motivation 144 Motivation: Data Warehouse A definition A data warehouse is a repository of integrated enterprise data. A data warehouse is used specifically for

More information

Rethinking SIMD Vectorization for In-Memory Databases

Rethinking SIMD Vectorization for In-Memory Databases SIGMOD 215, Melbourne, Victoria, Australia Rethinking SIMD Vectorization for In-Memory Databases Orestis Polychroniou Columbia University Arun Raghavan Oracle Labs Kenneth A. Ross Columbia University Latest

More information

John S. Otto Fabián E. Bustamante

John S. Otto Fabián E. Bustamante John S. Otto Fabián E. Bustamante Northwestern, EECS AIMS-4 CAIDA, SDSC, San Diego, CA Feb 10, 2012 http://aqualab.cs.northwestern.edu ! CDNs direct web clients to nearby content replicas! Several motivations

More information

Factoring & Primality

Factoring & Primality Factoring & Primality Lecturer: Dimitris Papadopoulos In this lecture we will discuss the problem of integer factorization and primality testing, two problems that have been the focus of a great amount

More information

NETWORK SECURITY: How do servers store passwords?

NETWORK SECURITY: How do servers store passwords? NETWORK SECURITY: How do servers store passwords? Servers avoid storing the passwords in plaintext on their servers to avoid possible intruders to gain all their users passwords. A hash of each password

More information

Life Cycle of a Memory Request. Ring Example: 2 requests for lock 17

Life Cycle of a Memory Request. Ring Example: 2 requests for lock 17 Life Cycle of a Memory Request (1) Use AQR or AQW to place address in AQ (2) If A[31]==0, check for hit in DCache Ring (3) Read Hit: place cache word in RQ; Write Hit: replace cache word with WQ RDDest/RDreturn

More information

Trading regret rate for computational efficiency in online learning with limited feedback

Trading regret rate for computational efficiency in online learning with limited feedback Trading regret rate for computational efficiency in online learning with limited feedback Shai Shalev-Shwartz TTI-C Hebrew University On-line Learning with Limited Feedback Workshop, 2009 June 2009 Shai

More information

Study of algorithms for factoring integers and computing discrete logarithms

Study of algorithms for factoring integers and computing discrete logarithms Study of algorithms for factoring integers and computing discrete logarithms First Indo-French Workshop on Cryptography and Related Topics (IFW 2007) June 11 13, 2007 Paris, France Dr. Abhijit Das Department

More information

Topological Properties

Topological Properties Advanced Computer Architecture Topological Properties Routing Distance: Number of links on route Node degree: Number of channels per node Network diameter: Longest minimum routing distance between any

More information

Common Patterns and Pitfalls for Implementing Algorithms in Spark. Hossein Falaki @mhfalaki hossein@databricks.com

Common Patterns and Pitfalls for Implementing Algorithms in Spark. Hossein Falaki @mhfalaki hossein@databricks.com Common Patterns and Pitfalls for Implementing Algorithms in Spark Hossein Falaki @mhfalaki hossein@databricks.com Challenges of numerical computation over big data When applying any algorithm to big data

More information

Modélisation et résolutions numérique et symbolique

Modélisation et résolutions numérique et symbolique Modélisation et résolutions numérique et symbolique via les logiciels Maple et Matlab Jeremy Berthomieu Mohab Safey El Din Stef Graillat Mohab.Safey@lip6.fr Outline Previous course: partial review of what

More information

Analysing equity portfolios in R

Analysing equity portfolios in R Analysing equity portfolios in R Using the portfolio package by David Kane and Jeff Enos Introduction 1 R is used by major financial institutions around the world to manage billions of dollars in equity

More information

ZQL. a cryptographic compiler for processing private data. George Danezis. Joint work with Cédric Fournet, Markulf Kohlweiss, Zhengqin Luo

ZQL. a cryptographic compiler for processing private data. George Danezis. Joint work with Cédric Fournet, Markulf Kohlweiss, Zhengqin Luo ZQL Work in progress a cryptographic compiler for processing private data George Danezis Joint work with Cédric Fournet, Markulf Kohlweiss, Zhengqin Luo Microsoft Research and Joint INRIA-MSR Centre Data

More information

Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited

Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited Hash Tables Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited We ve considered several data structures that allow us to store and search for data

More information

LECTURE 4. Last time: Lecture outline

LECTURE 4. Last time: Lecture outline LECTURE 4 Last time: Types of convergence Weak Law of Large Numbers Strong Law of Large Numbers Asymptotic Equipartition Property Lecture outline Stochastic processes Markov chains Entropy rate Random

More information

Scalable Bloom Filters

Scalable Bloom Filters Scalable Bloom Filters Paulo Sérgio Almeida Carlos Baquero Nuno Preguiça CCTC/Departamento de Informática Universidade do Minho CITI/Departamento de Informática FCT, Universidade Nova de Lisboa David Hutchison

More information

Class Overview. CSE 326: Data Structures. Goals. Goals. Data Structures. Goals. Introduction

Class Overview. CSE 326: Data Structures. Goals. Goals. Data Structures. Goals. Introduction Class Overview CSE 326: Data Structures Introduction Introduction to many of the basic data structures used in computer software Understand the data structures Analyze the algorithms that use them Know

More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Exploratory Data Analysis

Exploratory Data Analysis Exploratory Data Analysis Paul Cohen ISTA 370 Spring, 2012 Paul Cohen ISTA 370 () Exploratory Data Analysis Spring, 2012 1 / 46 Outline Data, revisited The purpose of exploratory data analysis Learning

More information

Packet forwarding using improved Bloom filters

Packet forwarding using improved Bloom filters Packet forwarding using improved Bloom filters Thomas Zink thomas.zink@uni-konstanz.de A Master Thesis submitted to the Department of Computer and Information Science University of Konstanz in fulfillment

More information

Digital Signatures. Murat Kantarcioglu. Based on Prof. Li s Slides. Digital Signatures: The Problem

Digital Signatures. Murat Kantarcioglu. Based on Prof. Li s Slides. Digital Signatures: The Problem Digital Signatures Murat Kantarcioglu Based on Prof. Li s Slides Digital Signatures: The Problem Consider the real-life example where a person pays by credit card and signs a bill; the seller verifies

More information

Visual Basic Programming. An Introduction

Visual Basic Programming. An Introduction Visual Basic Programming An Introduction Why Visual Basic? Programming for the Windows User Interface is extremely complicated. Other Graphical User Interfaces (GUI) are no better. Visual Basic provides

More information

Faster deterministic integer factorisation

Faster deterministic integer factorisation David Harvey (joint work with Edgar Costa, NYU) University of New South Wales 25th October 2011 The obvious mathematical breakthrough would be the development of an easy way to factor large prime numbers

More information

Static analysis of parity games: alternating reachability under parity

Static analysis of parity games: alternating reachability under parity 8 January 2016, DTU Denmark Static analysis of parity games: alternating reachability under parity Michael Huth, Imperial College London Nir Piterman, University of Leicester Jim Huan-Pu Kuo, Imperial

More information

OPERATING SYSTEMS MEMORY MANAGEMENT

OPERATING SYSTEMS MEMORY MANAGEMENT OPERATING SYSTEMS MEMORY MANAGEMENT Jerry Breecher 8: Memory Management 1 OPERATING SYSTEM Memory Management What Is In This Chapter? Just as processes share the CPU, they also share physical memory. This

More information

Answer Key for California State Standards: Algebra I

Answer Key for California State Standards: Algebra I Algebra I: Symbolic reasoning and calculations with symbols are central in algebra. Through the study of algebra, a student develops an understanding of the symbolic language of mathematics and the sciences.

More information

Sudoku puzzles and how to solve them

Sudoku puzzles and how to solve them Sudoku puzzles and how to solve them Andries E. Brouwer 2006-05-31 1 Sudoku Figure 1: Two puzzles the second one is difficult A Sudoku puzzle (of classical type ) consists of a 9-by-9 matrix partitioned

More information

Accelerate Cloud Computing with the Xilinx Zynq SoC

Accelerate Cloud Computing with the Xilinx Zynq SoC X C E L L E N C E I N N E W A P P L I C AT I O N S Accelerate Cloud Computing with the Xilinx Zynq SoC A novel reconfigurable hardware accelerator speeds the processing of applications based on the MapReduce

More information

Habanero Extreme Scale Software Research Project

Habanero Extreme Scale Software Research Project Habanero Extreme Scale Software Research Project Comp215: Java Method Dispatch Zoran Budimlić (Rice University) Always remember that you are absolutely unique. Just like everyone else. - Margaret Mead

More information

Overview of Cryptographic Tools for Data Security. Murat Kantarcioglu

Overview of Cryptographic Tools for Data Security. Murat Kantarcioglu UT DALLAS Erik Jonsson School of Engineering & Computer Science Overview of Cryptographic Tools for Data Security Murat Kantarcioglu Pag. 1 Purdue University Cryptographic Primitives We will discuss the

More information

Less Hashing, Same Performance: Building a Better Bloom Filter

Less Hashing, Same Performance: Building a Better Bloom Filter Less Hashing, Same Performance: Building a Better Bloom Filter Adam Kirsch and Michael Mitzenmacher Division of Engineering and Applied Sciences Harvard University, Cambridge, MA 02138 {kirsch, michaelm}@eecs.harvard.edu

More information

Record Storage and Primary File Organization

Record Storage and Primary File Organization Record Storage and Primary File Organization 1 C H A P T E R 4 Contents Introduction Secondary Storage Devices Buffering of Blocks Placing File Records on Disk Operations on Files Files of Unordered Records

More information

Big data coming soon... to an NSI near you. John Dunne. Central Statistics Office (CSO), Ireland John.Dunne@cso.ie

Big data coming soon... to an NSI near you. John Dunne. Central Statistics Office (CSO), Ireland John.Dunne@cso.ie Big data coming soon... to an NSI near you John Dunne Central Statistics Office (CSO), Ireland John.Dunne@cso.ie Big data is beginning to be explored and exploited to inform policy making. However these

More information

Deterministic load balancing and dictionaries in the parallel disk model

Deterministic load balancing and dictionaries in the parallel disk model Deterministic load balancing and dictionaries in the parallel disk model Mette Berger Esben Rune Hansen Rasmus Pagh Mihai Pǎtraşcu Milan Ružić Peter Tiedemann ABSTRACT We consider deterministic dictionaries

More information

Partitioning under the hood in MySQL 5.5

Partitioning under the hood in MySQL 5.5 Partitioning under the hood in MySQL 5.5 Mattias Jonsson, Partitioning developer Mikael Ronström, Partitioning author Who are we? Mikael is a founder of the technology behind NDB

More information

Data Structures For IP Lookup With Bursty Access Patterns

Data Structures For IP Lookup With Bursty Access Patterns Data Structures For IP Lookup With Bursty Access Patterns Sartaj Sahni & Kun Suk Kim sahni, kskim @cise.ufl.edu Department of Computer and Information Science and Engineering University of Florida, Gainesville,

More information

An Overview of Integer Factoring Algorithms. The Problem

An Overview of Integer Factoring Algorithms. The Problem An Overview of Integer Factoring Algorithms Manindra Agrawal IITK / NUS The Problem Given an integer n, find all its prime divisors as efficiently as possible. 1 A Difficult Problem No efficient algorithm

More information

Models and Techniques for Proving Data Structure Lower Bounds

Models and Techniques for Proving Data Structure Lower Bounds Models and Techniques for Proving Data Structure Lower Bounds Kasper Green Larsen PhD Dissertation Department of Computer Science Aarhus University Denmark Models and Techniques for Proving Data Structure

More information

Outline. Computer Science 418. Digital Signatures: Observations. Digital Signatures: Definition. Definition 1 (Digital signature) Digital Signatures

Outline. Computer Science 418. Digital Signatures: Observations. Digital Signatures: Definition. Definition 1 (Digital signature) Digital Signatures Outline Computer Science 418 Digital Signatures Mike Jacobson Department of Computer Science University of Calgary Week 12 1 Digital Signatures 2 Signatures via Public Key Cryptosystems 3 Provable 4 Mike

More information