Lecture Notes CMSC 251. Figure 14: Partitioning intermediate structure.


[Figure 14: Partitioning intermediate structure. The figure shows the initial, intermediate, and final configurations of A[p..r], with regions labeled < x, >= x, and ? (not yet processed).]

... all of the elements have been processed. To finish things off we swap A[p] (the pivot) with A[q], and return the value of q. Here is the complete code:

Partition

    Partition(int p, int r, array A) {      // 3-way partition of A[p..r]
        x = A[p]                            // pivot item in A[p]
        q = p
        for s = p+1 to r do {
            if (A[s] < x) {
                q = q+1
                swap A[q] with A[s]
            }
        }
        swap A[p] with A[q]                 // put the pivot into final position
        return q                            // return location of pivot
    }

An example is shown below.

Lecture 15: QuickSort
(Tuesday, Mar 17, 1998)

Revised: March 18. Fixed a bug in the analysis.

Read: Chapt 8 in CLR. My presentation and analysis are somewhat different than the text's.

QuickSort and Randomized Algorithms: Early in the semester we discussed the fact that we usually study the worst-case running times of algorithms, but sometimes average-case is a more meaningful measure. Today we will study QuickSort. It is a worst-case $\Theta(n^2)$ algorithm, whose expected-case running time is $\Theta(n \log n)$.

We will present QuickSort as a randomized algorithm, that is, an algorithm which makes random choices. There are two common types of randomized algorithms:

Monte Carlo algorithms: These algorithms may produce the wrong result, but the probability of this occurring can be made arbitrarily small by the user. Usually the lower you make this probability, the longer the algorithm takes to run.

[Figure 15: Partitioning example. Starting from 5 3 8 6 4 7 3 1 with pivot 5, the array passes through 5 3 4 6 8 7 3 1, then 5 3 4 3 8 7 6 1, then 5 3 4 3 1 7 6 8, and after the final swap becomes 1 3 4 3 5 7 6 8.]

Las Vegas algorithms: These algorithms always produce the correct result, but the running time is a random variable. In these cases the expected running time, averaged over all possible random choices, is the measure of the algorithm's running time.

The most well known Monte Carlo algorithm is one for determining whether a number is prime. This is an important problem in cryptography. The QuickSort algorithm that we will discuss today is an example of a Las Vegas algorithm. Note that QuickSort does not need to be implemented as a randomized algorithm, but as we shall see, this is generally considered the safest implementation.

QuickSort Overview: QuickSort is also based on the divide-and-conquer design paradigm. Unlike MergeSort, where most of the work is done after the recursive calls return, in QuickSort the work is done before the recursive calls are made. Here is an overview of QuickSort. Note the similarity with the selection algorithm, which we discussed earlier. Let A[p..r] be the (sub)array to be sorted. The initial call is to A[1..n].

    Basis: If the list contains 0 or 1 elements, then return.
    Select pivot: Select a random element x from the array, called the pivot.
    Partition: Partition the array into three subarrays: those elements A[1..q-1] $\le x$, A[q] $= x$, and A[q+1..n] $\ge x$.
    Recurse: Recursively sort A[1..q-1] and A[q+1..n].

The pseudocode for QuickSort is given below. The initial call is QuickSort(1, n, A). The Partition routine was discussed last time. Recall that Partition assumes that the pivot is stored in the first element of A. Since we want a random pivot, we pick a random index i from p to r, and then swap A[i] with A[p].

QuickSort

    QuickSort(int p, int r, array A) {      // Sort A[p..r]
        if (r <= p) return                  // 0 or 1 items, return
        i = a random index from [p..r]      // pick a random element
        swap A[i] with A[p]                 // swap pivot into A[p]
        q = Partition(p, r, A)              // partition A about pivot
        QuickSort(p, q-1, A)                // sort A[p..q-1]
        QuickSort(q+1, r, A)                // sort A[q+1..r]
    }

QuickSort Analysis: The correctness of QuickSort should be pretty obvious. However, its analysis is not so obvious. It turns out that the running time of QuickSort depends heavily on how good a job we do in selecting the pivot. In particular, if the rank of the pivot (recall that this means its position in the final sorted list) is very large or very small, then the partition will be unbalanced. We will see that unbalanced partitions (like unbalanced binary trees) are bad, and result in poor running times. However, if the rank of the pivot is anywhere near the middle portion of the array, then the split will be reasonably well balanced, and the overall running time will be good. Since the pivot is chosen at random by our algorithm, we may do well most of the time and poorly occasionally. We will see that the expected running time is $O(n \log n)$.

Worst-case Analysis: Let's begin by considering the worst-case performance, because it is easier than the average case. Since this is a recursive program, it is natural to use a recurrence to describe its running time. But unlike MergeSort, where we had control over the sizes of the recursive calls, here we do not. It depends on how the pivot is chosen. Suppose that we are sorting an array of size $n$, A[1..n], and further suppose that the pivot that we select is of rank $q$, for some $q$ in the range 1 to $n$. It takes $\Theta(n)$ time to do the partitioning and other overhead, and we make two recursive calls. The first is to the subarray A[1..q-1], which has $q - 1$ elements, and the other is to the subarray A[q+1..n], which has $n - (q+1) + 1 = n - q$ elements. So if we ignore the $\Theta$ (as usual) we get the recurrence:

$$T(n) = T(q-1) + T(n-q) + n.$$

This depends on the value of $q$. To get the worst case, we maximize over all possible values of $q$. As a basis we have that $T(0) = T(1) = \Theta(1)$. Putting this together we have

$$
T(n) = \begin{cases} 1 & \text{if } n \le 1, \\ \max_{1 \le q \le n} \bigl( T(q-1) + T(n-q) + n \bigr) & \text{otherwise.} \end{cases}
$$

Recurrences that have max's and min's embedded in them are very messy to solve. The key is determining which value of $q$ gives the maximum.
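One way to build intuition for which $q$ attains the maximum is to evaluate the worst-case recurrence directly for small $n$. The following is a minimal Python sketch, not part of the original notes; the function name `worst_case_table` is hypothetical, and the basis is taken as $T(0) = T(1) = 1$. It tabulates $T(n)$ bottom-up and records every $q$ that achieves the maximum.

```python
def worst_case_table(n_max):
    """Tabulate T(n) = max over q of (T(q-1) + T(n-q) + n), with T(0) = T(1) = 1.

    Returns (T, argmax), where argmax[n] lists every q in 1..n attaining the max.
    Assumes n_max >= 1.
    """
    T = [1, 1]                 # basis: T(0) = T(1) = 1
    argmax = [None, None]      # no choice of q is made for n <= 1
    for n in range(2, n_max + 1):
        # Cost of each possible pivot rank q for an array of size n.
        costs = {q: T[q - 1] + T[n - q] + n for q in range(1, n + 1)}
        worst = max(costs.values())
        T.append(worst)
        argmax.append([q for q, c in costs.items() if c == worst])
    return T, argmax


T, argmax = worst_case_table(10)
print(T[2:6])        # worst-case values for n = 2, 3, 4, 5
print(argmax[10])    # pivot ranks that realise the worst case for n = 10
```

For every $n$ in this small range the maximising ranks are the two extremes, $q = 1$ and $q = n$, and the tabulated values grow quadratically.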
(A rule of thumb of algorithm analysis is that the worst case tends to happen either at the extremes or in the middle. So I would plug in the values $q = 1$, $q = n$, and $q = n/2$ and work each out.) In this case, the worst case happens at either of the extremes (but see the book for a more careful analysis based on an analysis of the second derivative). If we expand the recurrence in the case $q = 1$ we get:

$$
\begin{aligned}
T(n) &\le T(0) + T(n-1) + n = 1 + T(n-1) + n \\
     &= T(n-1) + (n+1) \\
     &= T(n-2) + n + (n+1) \\
     &= T(n-3) + (n-1) + n + (n+1) \\
     &= T(n-4) + (n-2) + (n-1) + n + (n+1) \\
     &= \dots \\
     &= T(n-k) + \sum_{i=-1}^{k-2} (n - i).
\end{aligned}
$$

For the basis, $T(1) = 1$, we set $k = n - 1$ and get

$$
\begin{aligned}
T(n) &\le T(1) + \sum_{i=-1}^{n-3} (n - i) \\
     &= 1 + \bigl( 3 + 4 + 5 + \dots + (n-1) + n + (n+1) \bigr) \\
     &\le \sum_{i=1}^{n+1} i = \frac{(n+1)(n+2)}{2} \in O(n^2).
\end{aligned}
$$

In fact, a more careful analysis reveals that it is $\Theta(n^2)$ in this case.

Average-case Analysis: Next we show that in the average case QuickSort runs in $\Theta(n \log n)$ time. When we talked about average-case analysis at the beginning of the semester, we said that it depends on some assumption about the distribution of inputs. However, in this case, the analysis does not depend on the input distribution at all; it only depends on the random choices that the algorithm makes. This is good, because it means that the analysis of the algorithm's performance is the same for all inputs. In this case the average is computed over all possible random choices that the algorithm might make for the choice of the pivot index in the second step of the QuickSort procedure above.

To analyze the average running time, we let $T(n)$ denote the average running time of QuickSort on a list of size $n$. It will simplify the analysis to assume that all of the elements are distinct. The algorithm has $n$ random choices for the pivot element, and each choice has an equal probability of $1/n$ of occurring. So we can modify the above recurrence to compute an average rather than a max, giving:

$$
T(n) = \begin{cases} 1 & \text{if } n \le 1, \\ \dfrac{1}{n} \sum_{q=1}^{n} \bigl( T(q-1) + T(n-q) + n \bigr) & \text{otherwise.} \end{cases}
$$

This is not a standard recurrence, so we cannot apply the Master Theorem. Expansion is possible, but rather tricky. Instead, we will attempt a constructive induction to solve it. We know that we want a $\Theta(n \log n)$ running time, so let's try $T(n) \le a n \lg n + b$. (Properly we should write $\lceil \lg n \rceil$ because unlike MergeSort, we cannot assume that the recursive calls will be made on array sizes that are powers of 2, but we'll be sloppy because things will be messy enough anyway.)

Theorem: There exists a constant $c$ such that $T(n) \le c n \ln n$, for all $n \ge 2$. (Notice that we have replaced $\lg n$ with $\ln n$. This has been done to make the proof easier, as we shall see.)

Proof: The proof is by constructive induction on $n$. For the basis case $n = 2$ we have

$$
\begin{aligned}
T(2) &= \frac{1}{2} \sum_{q=1}^{2} \bigl( T(q-1) + T(2-q) + 2 \bigr) \\
     &= \frac{1}{2} \Bigl( \bigl( T(0) + T(1) + 2 \bigr) + \bigl( T(1) + T(0) + 2 \bigr) \Bigr) = \frac{8}{2} = 4.
\end{aligned}
$$

We want this to be at most $c \cdot 2 \ln 2$, implying that $c \ge 4/(2 \ln 2) \approx 2.885$.
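For the induction step, we assume that $n \ge 3$, and the induction hypothesis is that for any $n' < n$, we have $T(n') \le c n' \ln n'$. We want to prove it is true for $T(n)$. By expanding the definition of $T(n)$, and moving the term $n$ outside the sum, we have:

$$
\begin{aligned}
T(n) &= \frac{1}{n} \sum_{q=1}^{n} \bigl( T(q-1) + T(n-q) + n \bigr) \\
     &= \frac{1}{n} \sum_{q=1}^{n} \bigl( T(q-1) + T(n-q) \bigr) + n.
\end{aligned}
$$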

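Observe that if we split the sum into two sums, they both add the same values $T(0) + T(1) + \dots + T(n-1)$, just that one counts up and the other counts down. Thus we can replace this with $2 \sum_{q=0}^{n-1} T(q)$. Because they don't follow the formula, we'll extract $T(0)$ and $T(1)$ and treat them specially. If we make this substitution and apply the induction hypothesis to the remaining sum (which we can because $q < n$) we have

$$
\begin{aligned}
T(n) &= \frac{2}{n} \left( \sum_{q=0}^{n-1} T(q) \right) + n
      = \frac{2}{n} \left( T(0) + T(1) + \sum_{q=2}^{n-1} T(q) \right) + n \\
     &\le \frac{2}{n} \left( 1 + 1 + \sum_{q=2}^{n-1} c\, q \ln q \right) + n \\
     &= \frac{2c}{n} \left( \sum_{q=2}^{n-1} q \ln q \right) + n + \frac{4}{n}.
\end{aligned}
$$

We have never seen this sum before. Later we will show that

$$S(n) = \sum_{q=2}^{n-1} q \ln q \le \frac{n^2}{2} \ln n - \frac{n^2}{4}.$$

Assuming this for now, we have

$$
\begin{aligned}
T(n) &\le \frac{2c}{n} \left( \frac{n^2}{2} \ln n - \frac{n^2}{4} \right) + n + \frac{4}{n} \\
     &= c n \ln n - \frac{cn}{2} + n + \frac{4}{n} \\
     &= c n \ln n + n \left( 1 - \frac{c}{2} \right) + \frac{4}{n}.
\end{aligned}
$$

To finish the proof, we want all of this to be at most $c n \ln n$. If we cancel the common $c n \ln n$ we see that this will be true if we select $c$ such that

$$n \left( 1 - \frac{c}{2} \right) + \frac{4}{n} \le 0.$$

After some simple manipulations we see that this is equivalent to:

$$
\begin{aligned}
0 &\ge n - \frac{cn}{2} + \frac{4}{n} \\
\frac{cn}{2} &\ge n + \frac{4}{n} \\
c &\ge 2 + \frac{8}{n^2}.
\end{aligned}
$$

Since $n \ge 3$, we only need to select $c$ so that $c \ge 2 + \frac{8}{9}$, and so selecting $c = 3$ will work. From the basis case we have $c \ge 2.885$, so we may choose $c = 3$ to satisfy both constraints.

The Leftover Sum: The only missing element to the proof is dealing with the sum

$$S(n) = \sum_{q=2}^{n-1} q \ln q.$$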
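Both the bound claimed for the sum $S(n) = \sum_{q=2}^{n-1} q \ln q$, namely $S(n) \le \frac{n^2}{2} \ln n - \frac{n^2}{4}$, and the resulting claim $T(n) \le 3 n \ln n$ can be sanity-checked numerically before working through the analytic argument. The sketch below is an illustration only and proves nothing. The function names are hypothetical, the basis is taken as $T(0) = T(1) = 1$, and the average-case recurrence is used in the equivalent form $T(n) = n + \frac{2}{n} \sum_{q=0}^{n-1} T(q)$ obtained in the proof.

```python
import math


def leftover_sum(n):
    """S(n) = sum of q * ln(q) for q = 2 .. n-1 (empty sum when n <= 2)."""
    return sum(q * math.log(q) for q in range(2, n))


def leftover_bound(n):
    """Claimed upper bound on S(n): (n^2 / 2) ln n - n^2 / 4."""
    return (n * n / 2) * math.log(n) - n * n / 4


def average_case_table(n_max):
    """Tabulate the average-case recurrence with T(0) = T(1) = 1.

    Uses T(n) = n + (2/n) * (T(0) + ... + T(n-1)) for n >= 2.
    """
    T = [1.0, 1.0]
    prefix = 2.0                      # running value of T(0) + ... + T(n-1)
    for n in range(2, n_max + 1):
        T.append(n + 2.0 * prefix / n)
        prefix += T[-1]
    return T


T = average_case_table(500)
print(T[2])   # basis case of the proof: T(2) = 4
print(all(T[n] <= 3 * n * math.log(n) for n in range(2, 501)))
print(all(leftover_sum(n) <= leftover_bound(n) for n in range(2, 501)))
```

Both checks hold over this range; the margin is smallest at $n = 2$, where $T(2) = 4$ against $3 \cdot 2 \ln 2 \approx 4.16$, which is why the basis case forces $c \ge 2.885$.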

To bound this, recall the integration formula for bounding summations (which we paraphrase here). For any monotonically increasing function $f(x)$

$$\sum_{i=a}^{b-1} f(i) \le \int_a^b f(x)\,dx.$$

The function $f(x) = x \ln x$ is monotonically increasing, and so we have

$$S(n) \le \int_2^n x \ln x \,dx.$$

If you are a calculus macho man, then you can integrate this by parts, and if you are a calculus wimp (like me) then you can look it up in a book of integrals:

$$
\int_2^n x \ln x \,dx = \left[ \frac{x^2}{2} \ln x - \frac{x^2}{4} \right]_{x=2}^{n}
= \left( \frac{n^2}{2} \ln n - \frac{n^2}{4} \right) - (2 \ln 2 - 1)
\le \frac{n^2}{2} \ln n - \frac{n^2}{4}.
$$

This completes the summation bound, and hence the entire proof.

Summary: So even though the worst-case running time of QuickSort is $\Theta(n^2)$, the average-case running time is $\Theta(n \log n)$. Although we did not show it, it turns out that this doesn't just happen much of the time. For large values of $n$, the running time is $\Theta(n \log n)$ with high probability. In order to get $\Theta(n^2)$ time the algorithm must make poor choices for the pivot at virtually every step. Poor choices are rare, and so continuously making poor choices is very rare. You might ask, could we make QuickSort deterministic $\Theta(n \log n)$ by calling the selection algorithm to use the median as the pivot? The answer is that this would work, but the resulting algorithm would be so slow practically that no one would ever use it.

QuickSort (like MergeSort) is not formally an in-place sorting algorithm, because it does make use of a recursion stack. In MergeSort and in the expected case for QuickSort, the size of the stack is $O(\log n)$, so this is not really a problem.

QuickSort is the most popular algorithm for implementation because its actual performance (on typical modern architectures) is so good. The reason for this stems from the fact that, unlike Heapsort, which can make large jumps around in the array, the main work in QuickSort (in partitioning) spends most of its time accessing elements that are close to one another. The reason it tends to outperform MergeSort (which also has good locality of reference) is that most comparisons are made against the pivot element, which can be stored in a register. In MergeSort we are always comparing two array elements against each other.
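As a recap of the algorithm analysed in this lecture, here is a minimal Python sketch of Partition and randomized QuickSort. It follows the pseudocode closely but makes some assumptions that are not in the notes: indices are 0-based, the bounds `p` and `r` are inclusive, and the optional `rng` parameter (hypothetical, added so that runs can be made repeatable) supplies the random pivot index through `randint`.

```python
import random


def partition(A, p, r):
    """Partition A[p..r] (inclusive) about the pivot x = A[p].

    Afterwards A[p..q-1] < x, A[q] == x, and A[q+1..r] >= x; returns q.
    """
    x = A[p]
    q = p
    for s in range(p + 1, r + 1):
        if A[s] < x:
            q += 1
            A[q], A[s] = A[s], A[q]
    A[p], A[q] = A[q], A[p]      # put the pivot into its final position
    return q


def quicksort(A, p=0, r=None, rng=random):
    """Sort A[p..r] in place using a uniformly random pivot."""
    if r is None:
        r = len(A) - 1
    if r <= p:                   # 0 or 1 items
        return
    i = rng.randint(p, r)        # random index in [p, r], both ends included
    A[i], A[p] = A[p], A[i]      # move the chosen pivot into A[p]
    q = partition(A, p, r)
    quicksort(A, p, q - 1, rng)
    quicksort(A, q + 1, r, rng)


# The array from Figure 15, partitioned about its first element.
example = [5, 3, 8, 6, 4, 7, 3, 1]
print(partition(example, 0, 7), example)   # 4 [1, 3, 4, 3, 5, 7, 6, 8]

data = [9, 2, 7, 2, 5, 0, 8]
quicksort(data)
print(data)                                # [0, 2, 2, 5, 7, 8, 9]
```

Whatever pivots are drawn, the output is always the sorted list, which is the Las Vegas property: only the running time is random.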
The most efficient versions of QuickSort use recursion for large subarrays, but once the size of the subarray falls below some minimum size (e.g. 20) they switch to a simple iterative algorithm, such as selection sort.

Lecture 16: Lower Bounds for Sorting
(Thursday, Mar 19, 1998)

Read: Chapt. 9 of CLR.

Review of Sorting: So far we have seen a number of algorithms for sorting a list of numbers in ascending order. Recall that an in-place sorting algorithm is one that uses no additional array storage (however, we allow QuickSort to be called in-place even though it needs a stack of size $O(\log n)$ for keeping track of the recursion). A sorting algorithm is stable if duplicate elements remain in the same relative position after sorting.
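The definition of stability can be illustrated with a small Python sketch. The helper names `selection_sort` and `is_stable` are hypothetical and not from the notes. Records are (key, label) pairs sorted by key only; Python's built-in `sorted` is documented to be stable, while a swap-based selection sort is not.

```python
def selection_sort(records):
    """Swap-based selection sort on (key, label) pairs, comparing keys only."""
    a = list(records)
    for i in range(len(a)):
        m = i
        for j in range(i + 1, len(a)):
            if a[j][0] < a[m][0]:
                m = j
        a[i], a[m] = a[m], a[i]   # this long-distance swap can reorder equal keys
    return a


def is_stable(original, result):
    """True if records with equal keys keep their relative order."""
    keys = {key for key, _ in original}
    return all(
        [rec for rec in original if rec[0] == k] == [rec for rec in result if rec[0] == k]
        for k in keys
    )


records = [(3, "a"), (3, "c"), (1, "b")]
stable = sorted(records, key=lambda rec: rec[0])
unstable = selection_sort(records)

print(stable, is_stable(records, stable))      # [(1, 'b'), (3, 'a'), (3, 'c')] True
print(unstable, is_stable(records, unstable))  # [(1, 'b'), (3, 'c'), (3, 'a')] False
```

In the second result the two records with key 3 have traded places, so that sort is not stable on this input.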