Practical Survey on Hash Tables. Aurelian Țuțuianu
1 Practical Survey on Hash Tables Aurelian Țuțuianu
2 In memoriam Mihai Pătrașcu (17 July 1982 – 5 June 2012): "I have no intention to ever teach computer science. I want to teach the love for computer science, and let the learning happen." (Teaching Statement)
3 Abstract: hash table definition; collision resolution schemes (chained hashing, linear and quadratic probing, cuckoo hashing); some hash function theory; simple tabulation hashing.
4 Omnipresence of hash tables: symbol tables in compilers; cache implementations; database storage; memory page management in Linux; routing tables; large collections of documents.
5 Hash Tables: Consider a set of elements S drawn from a finite and much larger universe U. A hash table consists of a hash function $h: U \to \{0, \dots, m-1\}$ and a vector v of size m.
6 Collisions: the same hash for two different keys. What to do? Ignore them; chain colliding values; skip and try again; hash and displace; find a perfect hash function.
7 War Story: a cache built on a hash table. Problem: an application fetches data from an expensive repository. Solution: a hash table with collision replacement. Key point: a large share of users viewed much of the same data. [Diagram: application, hash table, data source.]
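The "collision replacement" idea from this war story can be sketched as a fixed-size cache in which a colliding key simply overwrites whatever occupies its slot. This is a minimal illustration, not the original application's code: the class name `ReplacingCache`, the `loader` callback standing in for the expensive repository, and the `misses` counter are all assumptions made for the example.

```python
class ReplacingCache:
    """Fixed-size cache: on a collision, the new entry replaces the old one."""

    def __init__(self, size, loader):
        self.size = size
        self.loader = loader            # hypothetical expensive fetch function
        self.slots = [None] * size      # each slot holds (key, value) or None
        self.misses = 0                 # number of calls made to the loader

    def get(self, key):
        i = hash(key) % self.size
        entry = self.slots[i]
        if entry is not None and entry[0] == key:
            return entry[1]             # cache hit: no repository access
        self.misses += 1
        value = self.loader(key)        # cache miss: go to the repository
        self.slots[i] = (key, value)    # overwrite whatever was in the slot
        return value


cache = ReplacingCache(8, loader=lambda k: k * 2)
cache.get(3)    # miss, loads from the "repository"
cache.get(3)    # hit
cache.get(11)   # 11 % 8 == 3, so this evicts key 3
```

Memory use stays bounded and no collision chains need to be maintained; the price is an occasional reload, which is cheap when most requests target the same popular data.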
8 Collision Resolution Schemes: chained hashing; open addressing (linear and quadratic probing); cuckoo hashing; and many others: perfect hashing, coalesced hashing, Robin Hood hashing, hopscotch hashing, etc.
9 Chained Hashing: Each slot contains a linked list. Expected $O(n/m) = O(1)$ time for all operations, with the load factor $n/m$ kept below a constant. Easy to implement; works with weak hash functions; consumes significant memory; default implementation. [Figure: slots 0, 1, 2 with chains holding the keys x, y, z, w.]
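A minimal sketch of chained hashing follows. It uses Python lists as chains instead of linked lists, relies on the built-in `hash`, and doubles the table when the load factor n/m exceeds 1; the class name and that threshold are assumptions chosen for illustration, not values taken from the slide.

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.buckets = [[] for _ in range(m)]   # one chain per slot
        self.n = 0                              # number of stored keys

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                        # key already present: update
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.n += 1
        if self.n > len(self.buckets):          # assumed bound: n/m <= 1
            self._resize(2 * len(self.buckets))

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.n -= 1
                return True
        return False

    def _resize(self, m):
        items = [item for b in self.buckets for item in b]
        self.buckets = [[] for _ in range(m)]
        for k, v in items:                      # redistribute into new chains
            self._bucket(k).append((k, v))
```

Every operation scans a single chain, so its cost is proportional to the chain length, which averages n/m under a reasonable hash function.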
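10 Linear and quadratic probing: All records are stored in the bucket array itself. A probe is an attempt to find an empty slot. Linear probing: $h(x, i) = h_0(x) + i$. Quadratic probing: $h(x, i) = h_0(x) + \frac{i + i^2}{2}$. Indices are taken modulo the table size. [Figure: keys w, y, z, x in the array, with the example probe sequence $h(x, i) = 4 + i$.]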
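The two probe sequences can be sketched as interchangeable functions plugged into a small open-addressing table. The sketch assumes a table size m that is a power of two (in which case the triangular offsets (i + i²)/2 reach every slot), omits deletion and resizing, and uses hypothetical names throughout.

```python
def linear(h0, i, m):
    """Linear probing: h(x, i) = h0(x) + i (mod m)."""
    return (h0 + i) % m


def quadratic(h0, i, m):
    """Quadratic probing with triangular offsets: h0(x) + (i + i*i)/2 (mod m)."""
    return (h0 + (i + i * i) // 2) % m


class ProbingTable:
    def __init__(self, m=16, probe=linear):
        assert m > 0 and m & (m - 1) == 0, "m must be a power of two"
        self.m = m
        self.probe = probe
        self.slots = [None] * m     # each slot holds (key, value) or None
        self.n = 0

    def _find(self, key):
        """Index holding key, else the first empty slot probed, else None."""
        h0 = hash(key) % self.m
        for i in range(self.m):
            j = self.probe(h0, i, self.m)
            entry = self.slots[j]
            if entry is None or entry[0] == key:
                return j
        return None

    def put(self, key, value):
        j = self._find(key)
        if j is None:
            raise RuntimeError("table is full")
        if self.slots[j] is None:
            self.n += 1
        self.slots[j] = (key, value)

    def get(self, key, default=None):
        j = self._find(key)
        if j is None or self.slots[j] is None:
            return default
        return self.slots[j][1]
```

Switching schemes is a one-argument change, for example `ProbingTable(16, probe=quadratic)`, which makes it easy to compare both on the same workload.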
11 War Story: Linear probing trick. [Table: summary statistics (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) for linear probing versus chained hashing; the values are missing from the transcription.]
12 War Story: Let it be quadratic! Replace the library implementation with a home-made hash table: 4 hours of work.
13 Cuckoo hashing: Two hash tables, $T_1$ and $T_2$, each of size m, and two hash functions $h_1, h_2: U \to \{0, \dots, m-1\}$. A value x is stored either in cell $h_1(x)$ of $T_1$ or in cell $h_2(x)$ of $T_2$. Hash and displace. Lookup is constant time in the worst case! Updates take expected amortized constant time. [Figure: keys x, y, z, w placed in $T_1$ and $T_2$, with arrows to $h_1(x)$ and $h_2(x)$.]
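A compact sketch of the hash-and-displace loop is given below. Several details are assumptions made for illustration only: the two hash functions are derived from Python's built-in `hash` with random salts, a displacement chain is cut off after `MAX_KICKS` steps, and a failed insertion always doubles the tables and rehashes with fresh salts.

```python
import random


class CuckooHashTable:
    MAX_KICKS = 32                      # assumed limit on displacements

    def __init__(self, m=8, seed=None):
        self.m = m
        self.rng = random.Random(seed)
        self._new_tables()

    def _new_tables(self):
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))
        self.tables = ([None] * self.m, [None] * self.m)

    def _h(self, t, key):
        return hash((self.salts[t], key)) % self.m

    def get(self, key, default=None):
        for t in (0, 1):                # at most two cells are inspected
            entry = self.tables[t][self._h(t, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def put(self, key, value):
        for t in (0, 1):                # update in place if already stored
            i = self._h(t, key)
            entry = self.tables[t][i]
            if entry is not None and entry[0] == key:
                self.tables[t][i] = (key, value)
                return
        item, t = (key, value), 0
        for _ in range(self.MAX_KICKS):
            i = self._h(t, item[0])
            # Place the item and pick up whatever occupied the cell.
            item, self.tables[t][i] = self.tables[t][i], item
            if item is None:
                return
            t = 1 - t                   # the evicted item tries the other table
        self._rehash(item)

    def _rehash(self, pending):
        items = [e for table in self.tables for e in table if e is not None]
        items.append(pending)
        self.m *= 2
        self._new_tables()
        for k, v in items:
            self.put(k, v)
```

The worst-case constant lookup is visible in `get`: whatever happened during insertion, a key can only live in one of its two candidate cells.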
14 What about hash functions? Is any hash function good? What does a good hash function mean? Can I have my own?
15 The beginning of time: Hash tables were introduced by Arnold Dumey in 1956 for the symbol table in a compiler. He used a crazy, chaotic, random function h: U -> {0, ..., m-1}, namely h(x) = (x mod p) mod m, with p a big prime number. It seems to work, but why?
16 First station: rigorous analysis Consider that h really is a random function! Knuth established a way to make a complete analysis, but based on a false assumption. No matter how long you stare at h(x)=(x mod p) mod m, it will not morph into a random function!
17 Next station: universality and k-independence. Wegman and Carter (1978): use a family of hash functions. There is no need for a perfectly random hash function, only for a universal family: $\forall x_1, x_2 \in S,\ x_1 \neq x_2:\ \Pr[h(x_1) = h(x_2)] \le \frac{1}{N}$, where N is the number of possible hash values. In its generalized form, the k-independence model quantifies how much randomness a family of hash functions provides!
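18 How does it work? [Diagram: random data and a key x feed a formula that outputs h(x).] Universal multiplicative shift: $h_a(x) = (a x) \gg (l - l_{out})$. 2-independent multiplicative shift: $h_{a,b}(x) = (a x + b) \gg (2l - l_{out})$. In both, the arithmetic is truncated to l and 2l bits respectively. k-independent polynomial hashing: $h(x) = \left( \left( \sum_{i=0}^{k-1} a_i x^i \right) \bmod p \right) \bmod 2^{l_{out}}$.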
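The three families on this slide can be sketched as small factory functions that draw the random parameters once and return a hash function. The sketch makes some assumptions: keys are non-negative integers of at most l bits, the multiplier a of the universal scheme is forced to be odd, the prime defaults to the Mersenne prime 2^61 - 1, and all function names are hypothetical.

```python
import random


def make_multiply_shift(l, l_out, rng=random):
    """Universal: h_a(x) = (a*x mod 2^l) >> (l - l_out), with a odd."""
    a = rng.getrandbits(l) | 1
    mask = (1 << l) - 1                      # emulates l-bit overflow
    return lambda x: ((a * x) & mask) >> (l - l_out)


def make_multiply_add_shift(l, l_out, rng=random):
    """2-independent: h_{a,b}(x) = ((a*x + b) mod 2^(2l)) >> (2l - l_out)."""
    a = rng.getrandbits(2 * l)
    b = rng.getrandbits(2 * l)
    mask = (1 << (2 * l)) - 1                # emulates 2l-bit overflow
    return lambda x: ((a * x + b) & mask) >> (2 * l - l_out)


def make_polynomial(k, l_out, p=(1 << 61) - 1, rng=random):
    """k-independent: h(x) = ((sum_i a_i x^i) mod p) mod 2^l_out, keys x < p."""
    coeffs = [rng.randrange(p) for _ in range(k)]   # a_0 .. a_{k-1}

    def h(x):
        acc = 0
        for a in reversed(coeffs):           # Horner's rule, highest degree first
            acc = (acc * x + a) % p
        return acc & ((1 << l_out) - 1)      # same as mod 2^l_out

    return h


h = make_multiply_shift(l=32, l_out=10)      # maps 32-bit keys to 10-bit values
slot = h(123456789)
```

In a language with fixed-width integers the masks disappear, because unsigned multiplication already wraps around; they are needed here only because Python integers are unbounded.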
19 Facts on k-independence. Chained hashing: Wegman, Carter: requires only universal hashing. Linear probing: 1990 Siegel, Schmidt: O(log n)-independence is enough; 2007 Pagh: 5-independence suffices; 2010 Pătrașcu, Thorup: 4-independence is not enough. Cuckoo hashing: 2001 Pagh: O(log n)-independence is enough; 2005 Cohen, Kane: 5-independence is not enough; 2006 Cohen, Kane: 6-independence is enough.
20 Simple tabulation hashing: Simple tabulation is the fastest 3-independent family of hash functions known. A key x of length len (the bit width required to store values) is divided into c characters $x_1, x_2, \dots, x_c$. We create c tables $R_1, R_2, \dots, R_c$, filled with independent random values. The hash value is $h(x) = R_1[x_1] \oplus R_2[x_2] \oplus \cdots \oplus R_c[x_c]$. [Figure: x split into $x_1, \dots, x_4$, looked up as $R_1[x_1], \dots, R_4[x_4]$ in 4 lookup tables with random 8-bit values, and combined into h(x).]
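A short sketch of simple tabulation for integer keys follows. The defaults (c = 4 characters of 8 bits each, 32-bit hash values) and the function name are assumptions picked for illustration; the key is read least-significant character first.

```python
import random


def make_simple_tabulation(c=4, char_bits=8, out_bits=32, rng=random):
    """Return h(x) = R_1[x_1] xor R_2[x_2] xor ... xor R_c[x_c]."""
    # One table of random values per character position.
    tables = [[rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
              for _ in range(c)]
    mask = (1 << char_bits) - 1

    def h(x):
        result = 0
        for j in range(c):
            char = (x >> (j * char_bits)) & mask   # j-th character of the key
            result ^= tables[j][char]
        return result

    return h


h = make_simple_tabulation()
value = h(0xDEADBEEF)
```

The same structure shows why the family is not 4-independent: for the keys 0x0000, 0x0001, 0x0100 and 0x0101 every table entry is used an even number of times, so the four hash values always XOR to zero.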
21 The power of simple tabulation! Paper: "The Power of Simple Tabulation Hashing", Mihai Pătrașcu and Mikkel Thorup, December 6, 2011. According to this paper, even though simple tabulation is only 3-independent, it gives: constant expected time for linear probing; constant time for static cuckoo hashing. => There are other probabilistic properties that can be exploited, beyond those captured by k-independence theory.
22 Summary: easy ways to implement optimal hash tables; a simple scheme to generate a hash function family; theory produces practical results and is still alive! There are a lot of occasions to apply these ideas, so: work hard, have fun and make history!
23 Questions?