van Emde Boas Data Structure


Lecture 15

Supplemental reading in CLRS: Chapter 20

Given a fixed integer n, what is the best way to represent a subset S of $\{0, \dots, n-1\}$ in memory, assuming that space is not a concern? The simplest way is to use an n-bit array A, setting A[i] = 1 if and only if $i \in S$, as we saw in Lecture 10. This solution gives $O(1)$ running time for insertions, deletions, and lookups (i.e., testing whether a given number x is in S).[1]

[1] This performance really is unbeatable, even for small n. Each operation on the bit array requires only a single memory access.

What if we want our data structure to support more operations, though? Perhaps we want to be able not just to insert, delete and look up, but also to find the minimum and maximum elements of S. Or, given some element $x \in S$, we may want to find the successor and predecessor of x, which are the smallest element of S that is greater than x and the largest element of S that is less than x, respectively. On a bit array, these operations would all take time $\Theta(n)$ in the worst case, as we might need to scan the entire array. The van Emde Boas (vEB) data structure is a clever alternative solution which outperforms a bit array for this purpose:

Operation                 Bit array       van Emde Boas
INSERT                    $O(1)$          $\Theta(\lg\lg n)$
DELETE                    $O(1)$          $\Theta(\lg\lg n)$
LOOKUP                    $O(1)$          $\Theta(\lg\lg n)$
MAXIMUM, MINIMUM          $\Theta(n)$     $O(1)$
SUCCESSOR, PREDECESSOR    $\Theta(n)$     $\Theta(\lg\lg n)$

We will not discuss the implementation of SUCCESSOR and PREDECESSOR; those will be left to recitation.

15.1 Analogy: The Two-Coconut Problem

The following riddle nicely illustrates the idea of the van Emde Boas data structure. I am somewhat embarrassed to give the riddle because it shows a complete misunderstanding of materials science and botany, but it is a standard example and I can't think of a better one.

Problem 15.1. There exists some unknown integer k between 1 and 100 (or in general, between 1 and some large integer n) such that, whenever a coconut is dropped from a height of k inches or more, the coconut will crack, and whenever a coconut is dropped from a height of less than k inches, the coconut will be completely undamaged and it will be as if we had not dropped the coconut at all. (Thus, we could drop the coconut from a height of $k-1$ inches a million times and nothing would happen.) Our goal is to find k with as few drops as possible, given a certain number of test coconuts which cannot be reused after they crack. If we have one coconut, then clearly we must first try 1 inch, then 2 inches, and so on. The riddle asks, what is the best way to proceed if we have two coconuts?

An approximate answer to the riddle is given by the following strategy: Divide the n-inch range into b blocks of height h each (we will choose b and h later; see Figure 15.1). Drop the first coconut from height h, then from height 2h, and so on, until it cracks. Say the first coconut cracks at height $b_0 h$. Then drop the second coconut from heights $(b_0-1)h+1$, $(b_0-1)h+2$, and so on, until it cracks. This method requires at most $b + h - 1 = \frac{n}{h} + h - 1$ drops, which is minimized when $b = h = \sqrt{n}$.

Figure 15.1. One strategy for the two-coconut problem is to divide the integers $\{1, \dots, n\}$ into blocks of size h, and then use the first coconut to figure out which block k is in. Once the first coconut breaks, it takes at most $h-1$ more drops to find the value of k.

Notice that, once the first coconut cracks, our problem becomes identical to the one-coconut version (except that instead of looking for a number between 1 and n, we are looking for a number between $(b_0-1)h+1$ and $b_0 h$, of which only the heights up to $b_0 h - 1$ remain untested). Similarly, if we started with three coconuts instead of two, then it would be a good idea to divide the range $\{1, \dots, n\}$ into equally-sized blocks and execute the solution to the two-coconut problem once the first coconut cracked.

Exercise 15.1. If we use the above strategy to solve the three-coconut problem, what size should we choose for the blocks? (Hint: it is not $\sqrt{n}$.)
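The two-coconut strategy can be simulated directly. The following is a minimal Python sketch and not part of the original notes: the names `find_threshold` and `worst_case_drops` are hypothetical, and the sketch assumes that the last block is simply truncated at n when h does not divide n, a case the notes do not discuss.

```python
def find_threshold(n: int, k: int, h: int) -> tuple[int, int]:
    """Recover the unknown k in {1, ..., n} using two coconuts and block height h.

    Returns (recovered value of k, number of drops used).
    """
    drops = 0

    def cracks(height: int) -> bool:
        # Models the riddle: a drop cracks the coconut iff height >= k.
        nonlocal drops
        drops += 1
        return height >= k

    # First coconut: try the top of each block (h, 2h, ..., capped at n).
    low, top = 1, min(h, n)
    while not cracks(top):
        low = top + 1
        top = min(top + h, n)

    # Second coconut: scan the block {low, ..., top - 1} from the bottom up.
    for height in range(low, top):
        if cracks(height):
            return height, drops
    return top, drops  # nothing below `top` cracked, so k == top


def worst_case_drops(n: int, h: int) -> int:
    """Largest number of drops the strategy needs over all possible k."""
    return max(find_threshold(n, k, h)[1] for k in range(1, n + 1))
```

For n = 100 and h = 10, `worst_case_drops(100, 10)` evaluates to 19, which matches the bound $b + h - 1$ with $b = h = 10$.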
15.2 Implementation: A Recursive Data Structure

Just as in the two-coconut problem, the van Emde Boas data structure divides the range $\{0, \dots, n-1\}$ into blocks of size $\sqrt{n}$, which we call clusters. Each cluster is itself a vEB structure of size $\sqrt{n}$. In addition, there is a summary structure that keeps track of which clusters are nonempty (see Figure 15.2). The summary structure is analogous to the first coconut, which told us what block k had to lie in.

[Figure 15.2: diagram of a vEB structure with fields size = 16, max = 13, min = 2, a summary that is a vEB of size 4, and four cluster slots, each holding a vEB of size 4 or NIL.]

Figure 15.2. A vEB structure of size 16 representing the set $\{2, 3, 5, 7, 13\} \subseteq \{0, \dots, 15\}$.

15.2.1 Crucial implementation detail

The following implementation detail, which may seem unimportant at first, is actually crucial to the performance of the van Emde Boas data structure:

Do not store the minimum and maximum elements in clusters. Instead, store them as separate data fields.

Thus, the data fields of a vEB structure V of size n are as follows:

* V.size: the size of V, namely n
* V.max: the maximum element of V, or NIL if V is empty
* V.min: the minimum element of V, or NIL if V is empty
* V.clusters: an array of size $\sqrt{n}$ which stores the clusters. For performance reasons, the value stored in each entry of V.clusters will initially be NIL; we will wait to build each cluster until we have to insert something into it.
* V.summary: a van Emde Boas structure of size $\sqrt{n}$ that keeps track of which clusters are nonempty. As with the entries of V.clusters, we initially set V.summary ← NIL; we do not build the recursive van Emde Boas structure referenced by V.summary until we have to insert something into it (i.e., until the first time we create a cluster for V).

15.2.2 Insertions

To simplify the exposition, in this lecture we use a model of computation in which it takes constant time to initialize any array (setting all entries equal to NIL), no matter how big the array is.[2] Thus, it takes constant time to create an empty van Emde Boas structure, and it also takes constant time to insert the first element into a van Emde Boas structure:

[2] Of course, this is cheating; real computers need an initialization time that depends on the size of the array. Still, there are use cases for which this assumption is warranted. We can preload our vEB structure by creating all possible vEB substructures up front rather than using NIL for the empty ones. This way, after an initial $O(n)$ preloading time, array initialization never becomes an issue. Another possibility is to use a dynamically resized hash table for V.clusters rather than an array. Each of these possible improvements has its share of extra details that must be addressed in the running time analysis; we have chosen to ignore initialization time for the sake of brevity and readability.

Algorithm: VEB-FIRST-INSERTION(V, x)
    V.min ← x
    V.max ← x

Using this fact, we will show that the procedure V.INSERT(x) has only one non-constant-time step. Say V.clusters[i] is the cluster corresponding to x (for example, if n = 100 and x = 64, then i = 6). Then:

* If V.clusters[i] is NIL, then it takes constant time to create a vEB structure containing only the element corresponding to x. (For example, if n = 100 and x = 64, then it takes constant time to create a vEB structure of size 10 containing only 4.) We update V.clusters[i] to point to this new vEB structure. Thus, the only non-constant-time operation we have to perform is to update V.summary to reflect the fact that V.clusters[i] is now nonempty.
* If V.clusters[i] is empty[3] (we can check this by checking whether V.clusters[i].min = NIL), then it takes constant time to insert the appropriate entry into V.clusters[i]. So again, the only non-constant-time operation we have to perform is to update V.summary to reflect the fact that V.clusters[i] is now nonempty.
* If V.clusters[i] is nonempty, then we have to make the recursive call V.clusters[i].INSERT(x) (applied to the element of the cluster corresponding to x). However, we do not need to make any changes to V.summary.

[3] This would happen if V.clusters[i] was once nonempty, but became empty due to deletions.

In each case, we find that the running time T of INSERT satisfies the recurrence

$$T(n) = T(\sqrt{n}) + O(1). \qquad (15.1)$$

As we will see in §15.3 below, the solution to this recurrence is $T_{\text{INSERT}} = \Theta(\lg\lg n)$.

15.2.3 Deletions

Similarly, the procedure V.DELETE(x) requires only one recursive call. To see this, let i be as above. Then:

* First suppose V has no nonempty clusters (we can check this by checking whether V.summary is either NIL or empty; the latter happens when V.summary.min = NIL).
    * If x is the only element of V, then we simply set V.min ← V.max ← NIL. Thus, deleting the only element of a single-element vEB structure takes constant time; we will use this fact later.
    * Otherwise, V contains only two elements including x. Let y be the other element of V. If x = V.min, then y = V.max and we set V.min ← y. If x = V.max, then y = V.min and we set V.max ← y.

* Next suppose V has some nonempty clusters.
    * If x = V.min, then we will need to update V.min. The new minimum takes only constant time to calculate, though: it is y, where y ← V.clusters[V.summary.min].min; in other words, it's the smallest element in the lowest nonempty cluster of V. After making this update, we need to make a recursive call to V.clusters[V.summary.min].DELETE(y) to remove the new value of V.min from its cluster. At this point, there are two possibilities:
        * y is not the only element in its cluster. Then V.summary does not need to be updated.
        * y is the only element in its cluster. Then we must make a recursive call to V.summary.DELETE(V.summary.min) to reflect the fact that y's cluster is now empty. However, since y was the only element in its cluster, it took constant time to remove y from its cluster: as we said above, it takes only constant time to remove the last element from a one-element vEB structure.
      Either way, there is only one non-constant-time step in V.DELETE: a recursive call to DELETE on a vEB structure of size $\sqrt{n}$.
    * By entirely the same argument (perhaps in mirror image), we find that the case x = V.max has identical running time to the case x = V.min.
    * If x is neither V.min nor V.max, then we must delete x from its cluster and, if this causes x's cluster to become empty, make a recursive call to V.summary.DELETE to reflect this update. As above, the recursive call to V.summary.DELETE will only happen when the deletion of x from its cluster took constant time.

In each case, the only non-constant-time step in DELETE is a single recursive call to DELETE on a vEB structure of size $\sqrt{n}$. Thus, the running time T for DELETE satisfies (15.1), which we repeat here for convenience:

$$T(n) = T(\sqrt{n}) + O(1). \qquad \text{(copy of 15.1)}$$

Again, the solution is $T_{\text{DELETE}} = \Theta(\lg\lg n)$.

15.2.4 Lookups

Finally, we consider the operation V.LOOKUP(x), which returns TRUE or FALSE according as x is or is not in V. The implementation is easy: First we check whether x = V.min, then whether x = V.max. If neither of these is true, then we recursively call LOOKUP on the cluster corresponding to x. Thus, the running time T of LOOKUP satisfies (15.1), and we have $T_{\text{LOOKUP}} = \Theta(\lg\lg n)$.

Exercise 15.2. Go through this section again, circling each step or claim that relies on the decision not to store V.min and V.max in clusters. Be careful: there may be more things to circle than you think!
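The operations of Section 15.2 can be put together in a short Python sketch. This is an illustration under stated assumptions, not the lecture's own code: the class name `VEB` and its attribute `root` are hypothetical; the universe size is assumed to be 2, 4, 16, 256, ... so that every square root is an integer; `clusters` is a dict rather than an array (the hash-table alternative, so that creating an empty structure really is constant time); each cluster stores offsets x mod $\sqrt{n}$; and INSERT swaps a new extreme with the old V.min or V.max before descending, a step the text leaves implicit.

```python
import math


class VEB:
    """Sketch of a vEB structure over {0, ..., size - 1}; min and max live outside the clusters."""

    def __init__(self, size):
        self.size = size
        self.root = math.isqrt(size)  # cluster size, and number of clusters
        self.min = None
        self.max = None
        self.clusters = {}            # cluster index -> VEB(root), built lazily
        self.summary = None           # VEB(root) holding indices of nonempty clusters

    def __contains__(self, x):        # LOOKUP
        if self.min is None:
            return False
        if x == self.min or x == self.max:
            return True
        i, low = divmod(x, self.root)
        cluster = self.clusters.get(i)
        return cluster is not None and low in cluster

    def insert(self, x):
        if self.min is None:                    # first insertion: constant time
            self.min = self.max = x
            return
        if x == self.min or x == self.max:      # already present
            return
        if self.min == self.max:                # one element: x becomes min or max
            self.min, self.max = min(x, self.min), max(x, self.max)
            return
        if x < self.min:                        # new minimum: old one moves to a cluster
            x, self.min = self.min, x
        elif x > self.max:                      # new maximum: old one moves to a cluster
            x, self.max = self.max, x
        i, low = divmod(x, self.root)
        cluster = self.clusters.get(i)
        if cluster is None:
            cluster = self.clusters[i] = VEB(self.root)
        if cluster.min is None:                 # cluster is empty: update the summary
            if self.summary is None:
                self.summary = VEB(self.root)
            self.summary.insert(i)              # recursive; the insert below is O(1)
        cluster.insert(low)                     # recursive only if cluster was nonempty

    def delete(self, x):
        if self.min is None:
            return
        if self.summary is None or self.summary.min is None:
            # No nonempty clusters: one or two elements, constant time.
            if x == self.min == self.max:
                self.min = self.max = None
            elif x == self.min:
                self.min = self.max
            elif x == self.max:
                self.max = self.min
            return
        if x == self.min:                       # promote the smallest clustered element
            i = self.summary.min
            x = self.min = i * self.root + self.clusters[i].min
        elif x == self.max:                     # mirror image for the maximum
            i = self.summary.max
            x = self.max = i * self.root + self.clusters[i].max
        i, low = divmod(x, self.root)
        cluster = self.clusters.get(i)
        if cluster is None or cluster.min is None:
            return                              # x is not in the set
        cluster.delete(low)
        if cluster.min is None:                 # cluster just became empty
            self.summary.delete(i)
```

For example, after `v = VEB(16)` and inserting 2, 3, 5, 7 and 13, the sketch has `v.min == 2` and `v.max == 13`, and only 3, 5 and 7 are stored inside clusters.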

15.3 Solving the Recurrence

As promised, we now solve the recurrence (15.1), which we repeat here for convenience:

$$T(n) = T(\sqrt{n}) + O(1). \qquad \text{(copy of 15.1)}$$

15.3.1 Base case

Before we begin, it's important to note that we have to lay down a base case at which recursive structures stop occurring. Mathematically this is necessary because one often uses induction to prove solutions to recurrences. From an implementation standpoint, the need for a base case is obvious: how could a vEB structure of size 2 make good use of smaller vEB substructures? So we will lay down n = 2 as our base case, in which we simply take V to be an array of two bits.

15.3.2 Solving by descent

The equation (15.1) means that we start on a structure of size n, then pass to a structure of size $\sqrt{n} = n^{1/2}$, then to a structure of size $\sqrt{\sqrt{n}} = n^{1/4}$, and so on, spending a constant amount of time at each level of recursion. So the total running time should be proportional to the number of levels of recursion before arriving at our base case, which is the number $\ell$ such that $n^{1/2^{\ell}} = 2$. Solving for $\ell$, we find $\ell = \lg\lg n$. Thus $T(n) = \Theta(\lg\lg n)$.

15.3.3 Solving by substitution

Another way to solve the recurrence is to make a substitution which reduces it to a recurrence that we already know how to solve. Let $T'(m) = T(2^m)$. Taking $m = \lg n$, (15.1) can be rewritten as

$$T'(m) = T'(m/2) + O(1),$$

which we know to have solution $T'(m) = \Theta(\lg m)$. Substituting back $n = 2^m$, we get $T(n) = \Theta(\lg\lg n)$.
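The descent argument can be checked numerically. The sketch below is an illustration only; `recursion_depth` is a hypothetical helper, and it assumes n has the form $2^{2^k}$ so that each square root is an exact integer.

```python
import math


def recursion_depth(n: int) -> int:
    """Count how many times n is replaced by its square root before reaching the base case n = 2."""
    depth = 0
    while n > 2:
        n = math.isqrt(n)  # exact for n = 4, 16, 256, 65536, ...
        depth += 1
    return depth


# For n = 2^(2^k), the depth should equal k = lg lg n.
for k in range(1, 7):
    n = 2 ** (2 ** k)
    print(n, recursion_depth(n), math.log2(math.log2(n)))
```

Each printed row shows n, the number of recursion levels, and $\lg\lg n$; for these values of n the last two agree.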

MIT OpenCourseWare
http://ocw.mit.edu

6.046J / 18.410J Design and Analysis of Algorithms
Spring 2012

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.