Using Stein s Method to Show Poisson and Normal Limit Laws for Fringe Subtrees

Similar documents
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 14 10/27/2008 MOMENT GENERATING FUNCTIONS

A Generalization of Sauer s Lemma to Classes of Large-Margin Functions

MSc. Econ: MATHEMATICAL STATISTICS, 1995 MAXIMUM-LIKELIHOOD ESTIMATION

Firewall Design: Consistency, Completeness, and Compactness

Factoring Dickson polynomials over finite fields

Sensitivity Analysis of Non-linear Performance with Probability Distortion

Ch 10. Arithmetic Average Options and Asian Opitons

JON HOLTAN. if P&C Insurance Ltd., Oslo, Norway ABSTRACT

Math , Fall 2012: HW 1 Solutions

Sensor Network Localization from Local Connectivity : Performance Analysis for the MDS-MAP Algorithm

On Adaboost and Optimal Betting Strategies

A Comparison of Performance Measures for Online Algorithms

Inverse Trig Functions

10.2 Systems of Linear Equations: Matrices

The Quick Calculus Tutorial

2r 1. Definition (Degree Measure). Let G be a r-graph of order n and average degree d. Let S V (G). The degree measure µ(s) of S is defined by,

Which Networks Are Least Susceptible to Cascading Failures?

Lagrangian and Hamiltonian Mechanics

Measures of distance between samples: Euclidean

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

The one-year non-life insurance risk

Pythagorean Triples Over Gaussian Integers

Notes on tangents to parabolas

Modelling and Resolving Software Dependencies

Data Center Power System Reliability Beyond the 9 s: A Practical Approach

Continued Fractions and the Euclidean Algorithm

Hull, Chapter 11 + Sections 17.1 and 17.2 Additional reference: John Cox and Mark Rubinstein, Options Markets, Chapter 5

1 if 1 x 0 1 if 0 x 1

MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

Section 3.3. Differentiation of Polynomials and Rational Functions. Difference Equations to Differential Equations

Graphs without proper subgraphs of minimum degree 3 and short cycles

Full and Complete Binary Trees

SHARP BOUNDS FOR THE SUM OF THE SQUARES OF THE DEGREES OF A GRAPH

Coupled best proximity point theorems for proximally g-meir-keelertypemappings in partially ordered metric spaces

A New Evaluation Measure for Information Retrieval Systems

BV has the bounded approximation property

How To Find Out How To Calculate Volume Of A Sphere

Parameterized Algorithms for d-hitting Set: the Weighted Case Henning Fernau. Univ. Trier, FB 4 Abteilung Informatik Trier, Germany

On the k-path cover problem for cacti

HOMEWORK 5 SOLUTIONS. n!f n (1) lim. ln x n! + xn x. 1 = G n 1 (x). (2) k + 1 n. (n 1)!

UNIFIED BIJECTIONS FOR MAPS WITH PRESCRIBED DEGREES AND GIRTH

Digital barrier option contract with exponential random time

Lemma 5.2. Let S be a set. (1) Let f and g be two permutations of S. Then the composition of f and g is a permutation of S.

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

Detecting Possibly Fraudulent or Error-Prone Survey Data Using Benford s Law

Minimum-Energy Broadcast in All-Wireless Networks: NP-Completeness and Distribution Issues

Optimal Control Policy of a Production and Inventory System for multi-product in Segmented Market

Mathematics Review for Economists

15.2. First-Order Linear Differential Equations. First-Order Linear Differential Equations Bernoulli Equations Applications

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

Stationary random graphs on Z with prescribed iid degrees and finite mean connections

Mannheim curves in the three-dimensional sphere

Mathematical Models of Therapeutical Actions Related to Tumour and Immune System Competition

An intertemporal model of the real exchange rate, stock market, and international debt dynamics: policy simulations

Differentiability of Exponential Functions

An example of a computable

Calibration of the broad band UV Radiometer

Metric Spaces. Chapter Metrics

Department of Mathematical Sciences, University of Copenhagen. Kandidat projekt i matematik. Jens Jakob Kjær. Golod Complexes

Example Optimization Problems selected from Section 4.7

Mathematical Induction

Mathematics Course 111: Algebra I Part IV: Vector Spaces

Optimal Energy Commitments with Storage and Intermittent Supply

QUASIRANDOM LOAD BALANCING

Lecture L25-3D Rigid Body Kinematics

Option Pricing for Inventory Management and Control

RESULTANT AND DISCRIMINANT OF POLYNOMIALS

Optimizing Multiple Stock Trading Rules using Genetic Algorithms

CS 598CSC: Combinatorial Optimization Lecture date: 2/4/2010

Linear Programming. March 14, 2014

CONTINUED FRACTIONS AND PELL S EQUATION. Contents 1. Continued Fractions 1 2. Solution to Pell s Equation 9 References 12

NOTES ON LINEAR TRANSFORMATIONS

1 Short Introduction to Time Series

Systems of Linear Equations

An Introduction to Event-triggered and Self-triggered Control

MATH10040 Chapter 2: Prime and relatively prime numbers

Regular Languages and Finite Automata

Web Appendices to Selling to Overcon dent Consumers

MODELLING OF TWO STRATEGIES IN INVENTORY CONTROL SYSTEM WITH RANDOM LEAD TIME AND DEMAND

arxiv: v1 [math.co] 7 Mar 2012

Random variables P(X = 3) = P(X = 3) = 1 8, P(X = 1) = P(X = 1) = 3 8.

FACTORING IN THE HYPERELLIPTIC TORELLI GROUP

LEARNING OBJECTIVES FOR THIS CHAPTER

The wave equation is an important tool to study the relation between spectral theory and geometry on manifolds. Let U R n be an open set and let

Rules for Finding Derivatives

THE SIGN OF A PERMUTATION

Forecasting and Staffing Call Centers with Multiple Interdependent Uncertain Arrival Streams

Inner Product Spaces

A Data Placement Strategy in Scientific Cloud Workflows

Definition of the spin current: The angular spin current and its physical consequences

Stern Sequences for a Family of Multidimensional Continued Fractions: TRIP-Stern Sequences

GRAPH THEORY LECTURE 4: TREES

The mean-field computation in a supermarket model with server multiple vacations

Game Theoretic Modeling of Cooperation among Service Providers in Mobile Cloud Computing Environments

GPRS performance estimation in GSM circuit switched services and GPRS shared resource systems *

A Universal Sensor Control Architecture Considering Robot Dynamics

Calculating Viscous Flow: Velocity Profiles in Rivers and Pipes

Lecture 1: Course overview, circuits, and formulas

Optimal Control Of Production Inventory Systems With Deteriorating Items And Dynamic Costs

Cost Efficient Datacenter Selection for Cloud Services

Transcription:

AofA 2014, Paris, France DMTCS proc. (subm., by the authors, 1 12 Using Stein s Metho to Show Poisson an Normal Limit Laws for Fringe Subtrees Cecilia Holmgren 1 an Svante Janson 2 1 Department of Mathematics, Stockholm University, 114 18 Stockholm, Sween 2 Department of Mathematics, Uppsala University, SE-75310 Uppsala, Sween Abstract. We consier sums of functions of fringe subtrees of binary search trees an ranom recursive trees (of total size n. The use of Stein s metho an certain couplings allow provision of simple proofs showing that in both of these trees, the number of fringe subtrees of size k < n, where k, can be approximate by a Poisson istribution. Combining these results an another version of Stein s metho, we can also show that for k = o( n, the number of fringe subtrees in both types of ranom trees has asymptotically a normal istribution as n. Furthermore, using the Cramér Wol evice, we show that a ranom vector with components corresponing to the ranom number of copies of certain fixe fringe subtrees T i, has asymptotically a multivariate normal istribution. We can then use these general results on fringe subtrees to obtain simple solutions to a broa range of problems relating to ranom trees; as an example, we can prove that the number of protecte noes in the binary search tree has asymptotically a normal istribution. Keywors: Fringe subtrees. Stein s metho. Couplings. Limit laws. Binary search trees. Recursive trees. 1 Introuction In this paper we consier fringe subtrees of the ranom binary search tree, as well as of the ranom recursive tree; recall that a fringe subtree is a subtree consisting of some noe an all its escenants, see Alous [1] for a general theory, an note that fringe subtrees typically are small compare to the whole tree. This is an extene abstract of [9] where further etails are given. We will use a representation of Devroye [4, 5] for the binary search tree, an a well-known bijection between binary trees an recursive trees, together with ifferent applications of Stein s metho for both normal an Poisson approximation to give both new general results on the asymptotic istributions for ranom variables epening on fringe subtrees, an more irect proofs of several earlier results in the fiel. Supporte in part by the Sweish Research Council. Supporte in part by the Knut an Alice Wallenberg Founation. subm. to DMTCS c by the authors Discrete Mathematics an Theoretical Computer Science (DMTCS, Nancy, France

2 Cecilia Holmgren, an Svante Janson The binary search tree is the tree representation of the sorting algorithm Quicksort, see e.g. [11]. Starting with n istinct numbers calle keys, we raw one of the keys at ranom an associate it to the root. Then we raw one of the remaining keys. We compare it with the root, an associate it to the left chil if it is smaller than the key at the root, an to the right chil if it is larger. We continue recursively by rawing new keys until the set is exhauste. The comparison for each new key starts at the root, an at each noe the key visits, it procees to the left/right chil if it is smaller/larger than the key associate to that noe; eventually, the new key is associate to the first empty noe it visits. In the final tree, all the n orere numbers are sorte by size, so that smaller numbers are in left subtrees, an larger numbers are in right subtrees. We use the representation of the binary search tree by Devroye [4, 5]. We may clearly assume that the keys are 1,..., n. We assign, inepenently, each key k a uniform ranom variable U k in (0, 1 which we regar as a time stamp inicating the time when the key is rawn. (We may an will assume that the U k are istinct. The ranom binary search tree constructe by rawing the keys in this orer, i.e., in orer of increasing U k, then is the unique binary tree with noes labelle by (1, U 1,..., (n, U n with the property that it is a binary search tree with respect to the first coorinates in the pairs, an along every path own from the root the values U i are increasing. We will also use a cyclic version of this representation escribe in Section 2.3. Recall that the ranom recursive tree is constructe recursively, by starting with a root with label 1, an at stage i (i = 2,..., n a new noe with label i is attache uniformly at ranom to one of the previous noes 1,..., i 1. We let Λ n enote a ranom recursive tree with n noes. There is a well-known bijection between orere trees of size n an binary trees of size n 1, see e.g. Knuth [12, Section 2.3.2] who calls this the natural corresponence (the same bijection is also calle the rotation corresponence: Given an orere tree with n noes, eliminate first the root, an arrange all its chilren in a path from left to right, as right chilren of each other. Continue recursively, with the chilren of each noe arrange in a path from left to right, with the first chil attache to its parent as the left chil. This yiels a binary tree with n 1 noes, an the transformation is invertible. As note by Devroye [4], see also Fuchs, Hwang an Neininger [8], the natural corresponence extens to a coupling between the ranom recursive tree Λ n an the binary search tree T n 1 ; the probability istributions are equal by inuction because the n possible places to a a new noe to Λ n correspon to the n possible places (external leaves to a a new noe to T n 1, an these places have equal probabilities for both moels. Note that a left chil in the binary search tree correspons to an elest chil in the ranom recursive tree, while a right chil correspons to a sibling. We say that a proper subtree in a binary tree is left-roote [right-roote] if its root is a left [right] chil. Thus, for 1 < k < n, fringe subtrees of size k in the ranom recursive tree Λ n, correspon to left-roote fringe subtrees of size k 1 in the binary search tree T n 1, while fringe subtrees of size 1 (i.e., leaves correspon to noes without a left chil. We consier first only the sizes of the fringe subtrees. The results in the following two theorems, except the explicit rate in (3 (4, were shown by Feng, Mahmou an Panholzer [6] an Fuchs [7] by using variants of the metho of moments. Theorem 1.3 was earlier prove for fixe k by Devroye [4] (using the central limit theorem for m-epenent variables, see also Alous [1]. The part (5 of Theorem 1.3 was prove for a smaller range of k by Devroye [5] using Stein s metho. (The mean (1 is also foun in [5]. In the present paper we continue this approach, an use Stein s metho for both Poisson an normal approximations to provie simple proofs for the full range. We recall the efinition of the total variation istance between two probability measures.

Using Stein s metho to show Poisson an normal limit laws for fringe subtrees 3 Definition 1.1 Let (X, A be any measurable space. The total variation istance T V between two probability measures µ 1 an µ 2 on X is efine to be T V (µ 1, µ 2 := sup µ 1 (A µ 2 (A. A A Let L(X enote the istribution of a ranom variable X. Po(µ enotes the Poisson istribution with mean µ, an N (0, 1 the stanar normal istribution. Convergence in istribution is enote by. Theorem 1.2 Let X n,k be the number of fringe subtrees of size k in the ranom binary search tree T n an similarly let ˆXn,k be the number of fringe subtrees in the ranom recursive tree Λ n. Let k = k n where k < n. Furthermore, let Then, for the binary search tree, 2(n + 1 µ n,k := E(X n,k = (k + 1(k + 2, (1 ˆµ n,k := E( ˆX n n,k = k(k + 1. (2 T V (L(X n,k, Po(µ n,k = 1 P(X n,k = l e µ (µ n,k n,k l ( 1 = O, (3 2 l! k an for the ranom recursive tree, l 0 T V (L( ˆX n,k, Po(ˆµ n,k = 1 P( 2 ˆX n,k = l e ˆµ (ˆµ n,k n,k l ( 1 = O. (4 l! k Consequently, if n an k, then l 0 T V (L(X n,k, Po(µ n,k 0 an T V (L( ˆX n,k, Po(ˆµ n,k 0. Theorem 1.3 Let X n,k be the number of fringe subtrees of size k in the binary search tree T n an similarly let ˆXn,k be the number of fringe subtrees of size k in the ranom recursive tree Λ n. Let k = k n = o( n. Then, as n, for the binary search tree X n,k E(X n,k Var(Xn,k N (0, 1 (5 an, similarly, for the ranom recursive tree ˆX n,k E( ˆX n,k Var( ˆX n,k N (0, 1. (6

4 Cecilia Holmgren, an Svante Janson Remark 1.4 If k/ n, then µ n,k, ˆµ n,k 0, an the convergence result in Theorem 1.2 reuces to p the trivial X n,k 0 an ˆX p n,k 0; the rate of convergence in (3 (4 is still of interest. If k/ n c (0,, then µ n,k 2c 2 an ˆµ n,k c 2 ; an we obtain the Poisson istribution limits X n,k Po(2c 2 an ˆX n,k Po(c 2 [6, 7]. We also consier the number of fringe subtrees that are equal to a fixe tree T in the binary search tree T n, which we enote by X T n. Combining the Cramér Wol evice [3, Theorem 7.7] an Stein s metho we show that ranom vectors of fringe subtrees are multivariate normally istribute. These results are also useful for proving general theorems for sums of functions of fringe subtrees. Theorem 1.5 Let T be a binary tree of size k an let T be a binary tree of size m where m k. Let Xn T be the number of fringe subtrees T an let X T n be the number of fringe subtrees T in the binary search tree T n with n noes. Let p k,t := P(T k = T an p m,t := P(T m = T, an let qt T be the number of fringe subtrees of T that are copies of T. If n > k + m + 1, then the covariance between Xn T an X T n is equal to Cov(X T n, X T n = (n + 1σ T,T, (7 where, with σ T,T := 2 (k + 1(k + 2 qt T p k,t γ(k, mp k,t p m,t, (8 γ(k, m := 4(k + m + 3 (k + 1(k + 2(m + 1(m + 2 4(k 2 + 3km + m 2 + 4k + 4m + 3 (k + 1 (m + 1 (k + m + 1 (k + m + 2 (k + m + 3. (9 Theorem 1.6 Let Xn T be the number of fringe subtrees T in the ranom binary search tree T n. Let T 1,..., T be a fixe sequence of istinct binary trees an let X n = (X T 1 n, X T 2 n,..., X T n. Let ( µ n := E(X T 1 n, E(X T 2 n,..., E(X T n an let Γ = (γ ij i,j=1 enote the matrix with elements 1 γ ij = lim n n Cov(XT n with notation as in (7 (8. Then Γ is non-singular an n 1/2 ( X n µ n i, X T j n = σ T i,t j, (10 N (0, Γ. (11 In [9] there are corresponing multivariate-normal istribution results for the number of fringe subtrees Λ in the ranom recursive tree Λ n. In Section 5 we explain how Theorem 1.6 can be use to prove that the number of the so calle protecte noes in the binary search trees asymptotically has a normal istribution.

Using Stein s metho to show Poisson an normal limit laws for fringe subtrees 5 2 Representations using uniform ranom variables 2.1 Devroye s representation for the binary search tree We use the representation of the binary search tree T n by Devroye [4, 5] escribe in Section 1, using i.i.. ranom time stamps U i U(0, 1 assigne to the keys i = 1,..., n. Write, for 1 k n an 1 i n k + 1, σ(i, k = {(i, U i,..., (i + k 1, U i+k 1 }, (12 i.e., the sequence of k labels (j, U j starting with j = i. For every noe u T n, the fringe subtree T n (u roote at u consists of the noes with labels in a set σ(i, k for some such i an k, where k = T n (u, but note that not every set σ(i, k is the set of labels of the noes of a fringe subtree; if it is, we say simply that σ(i, k is a fringe subtree. We efine the inicator variable I i,k := 1{σ(i, k is a fringe subtree in T n }. It is easy to see that, for convenience efining U 0 = U n+1 = 0, I i,k = 1 { U i 1 an U i+k are the two smallest among U i 1,..., U i+k }. (13 Note that if i = 1 or i = n k + 1, this reuces to I 1,k = 1 { U k+1 is the smallest among U 1,..., U k+1 }, (14 I n k+1,k = 1 { U n k is the smallest among U n k,..., U n }. (15 For k = n, when we only consier i = 1, we have I 1,n = 1. Let f(t be a function from the set of (unlabelle binary trees to R. We are intereste in the functional X n := u T n f(t n (u, (16 summing over all fringe subtrees of T n. For example, one obtains the number of fringe subtrees that are equal (up to labelling to a given binary tree T by choosing f(t = 1{T T } (where enotes equality when we ignore labels, an one obtains the number of fringe subtrees with exactly k noes by letting f(t = 1{ T = k}. We refer to Devroye [5] for several other examples showing the generality of this representation. Since a permutation (σ 1,..., σ k efines a binary search tree (by rawing the keys in orer σ 1,..., σ k, we can also regar f as a function of permutations (of arbitrary length. Moreover, any set σ(i, k efines a permutation (σ 1, σ 2,..., σ k where the values j, 1 j k, are orere accoring to the orer of U i+j 1. We can thus also regar f as a mapping from the collection of all sets σ(i, k. Note that if σ(i, k correspons to a fringe subtree T n (u of T n, then, ignoring labels, T n (u is the binary search tree efine by the permutation efine by σ(i, k, an thus f ( T n (u = f ( σ(i, k. Consequently, see [5], X n := u T n f(t n (u = n k=1 n k+1 i=1 I i,k f(σ(i, k. (17

6 Cecilia Holmgren, an Svante Janson 2.2 The ranom recursive tree Consier now instea the ranom recursive tree Λ n. Let f(t be a function from the set of orere roote trees to R. In analogy with (16, we efine Y n := u Λ n f(λ n (u, (18 summing over all fringe subtrees of Λ n. As sai in the introuction, the natural corresponence yiels a coupling between the ranom recursive tree Λ n an the binary search tree T n 1. The subtrees in Λ n correspon to the left subtrees at the noes in T n 1 together with the whole tree, incluing an empty left subtree at every noe in T n 1 without a left chil, corresponing to a subtree of size 1 (a leaf in Λ n. Thus, as note by [4], the representation in Section 2.1 yiels a similar representation for the ranom recursive tree, which can be escribe as follows. Define f as the functional on binary trees corresponing to f by f(t := f(t, where T is the orere tree corresponing to the binary tree T by the natural corresponence. (Thus T = T + 1. We regar the empty binary tree as corresponing to the (unique orere tree with only one vertex, an thus we efine f( := f(. Assume first 1 < k < n an recall that subtrees of size k in the ranom recursive tree Λ n correspon to left-roote subtrees of size k 1 in the binary search tree T n 1. As sai in Section 2.1, a subtree of size k 1 in T n 1 correspons to a set σ(i, k 1 for some i {1,..., n k + 1}. The parent of the root of this subtree is either i 1 or i + k 1; it is i 1, an the subtree is right-roote, if U i 1 > U i+k 1, an it is i + k 1, an the subtree is left-roote, if U i 1 < U i+k 1. We efine Using (13 it follows that, in analogy with (17, I L i,k 1 := 1{σ(i, k 1 is a left-roote subtree in T n 1 }. (19 Y n := This is easily extene to k = 1 too. 2.3 Cyclic representations u Λ n f(λ n (u = n k=1 n k+1 i=1 I L i,k 1 f(σ(i, k 1. (20 The representation (17 of X n using a linear sequence U 1,..., U n of i.i.. ranom variables is natural an useful, but it has the (minor isavantage that terms with i = 1 or i = n k + 1 have to be treate specially because of bounary effects, as seen in (14 (15. It will be convenient to use a relate cyclic representation, where we take n + 1 i.i.. uniform variables U 0,..., U n U(0, 1 an exten them to an infinite perioic sequence of ranom variables by U i := U i mo (n+1, i Z, (21 where i mo (n+1 is the remainer when i is ivie by n+1, i.e., the integer l [0, n] such that i l (mo n + 1. (We may an will assume that U 0,..., U n are istinct. We efine further I i,k as in (13, but now for all i an k. Similarly, we efine σ(i, k by (12 for all i an k. We then have the following cyclic representation of X n. (We are inebte to Allan Gut for suggesting a cyclic representation.

Using Stein s metho to show Poisson an normal limit laws for fringe subtrees 7 Lemma 2.1 Let U 0,..., U n U(0, 1 be inepenent an exten this sequence perioically by (21. Then, with notations as above, X n := u T n f(t n (u = X n := n n+1 I i,k f(σ(i, k. (22 k=1 i=1 Proof: The ouble sum in (22 is invariant uner a cyclic shift of U 0,..., U n. If we shift these values so that U 0 becomes the smallest, we obtain the same istribution of (U 0,..., U n as if we instea conition on the event that U 0 is the smallest U i, i.e., on {U 0 = min i U i }. Hence, ( X n = Xn U 0 = min U i. (23 i Furthermore, the variables I i,k epen only on the orer relations among {U i }, so if U 0 is minimal, they remain the same if we put U 0 = 0. Moreover, in this case also U n+1 = U 0 = 0 an it follows from (13 that I i,k = 0 if i n + 1 i + k 1; hence the terms in (22 with n k + 1 < i n + 1 vanish. Note also that in the remaining terms, f(σ(i, k oes not epen on U 0. Consequently, X n = ( n k=1 n k+1 i=1 I i,k f(σ(i, k U 0 = 0 = X n, (24 by (17, showing that the cyclic an linear representations in (17 an (22 are equivalent. Since U i is efine for all i Z an has perio n + 1, it is natural to regar the inex i as an element of Z n+1 ; similarly, I i,k is efine for all i Z with perio n + 1 in i, so we can regar it as efine for i Z n+1. When iscussing these variables, we will use the natural metric on Z n+1 efine by i j n+1 := min i j l (n + 1. (25 l Z For the ranom recursive tree Λ n we argue in the same way, now using (20; for further etails see [9]. The cyclic representations lea to simple exact calculations of means an variances. In [9] we use the cyclic representation to show general results for sums of functions of fringe subtrees. 3 Poisson approximations by Stein s metho an couplings To prove Theorems 1.2 we use Stein s metho with couplings as escribe by Barbour et al. [2]. In general, let A be a finite inex set an let (I α, α A be inicator ranom variables. We write W := α A I α an λ := E(W. To approximate W with a Poisson istribution Po(λ, this metho uses a coupling for each α A between W an a ranom variable W α which is efine on the same probability space as W an has the property L(W α = L(W I α I α = 1. (26 A common way to construct such a coupling (W, W α is to fin ranom variables (J βα, β A efine on the same probability space as (I α, α A in such a way that for each α A, an jointly for all β A, L(J βα = L(I β I α = 1. (27

8 Cecilia Holmgren, an Svante Janson Then W α = β α J βα is efine on the same probability space as W an (26 hols. Suppose that J βα are such ranom variables, an that, for each α, the set A α := A\{α} is partitione into A α an A 0 α in such a way that J βα I β if β A α, (28 with no conition if β A 0 α. We will use the following result from [2] (with a slightly simplifie constant. ([2] also contain similar results using a thir part A + α of A α, where (28 hols in the opposite irection; we will not nee them an note that it is always possible to inclue A + α in A 0 α an then use the following result. Theorem 3.1 ([2, Corollary 2.C.1] Let W = α A I α an λ = E(W. Let A α = A\{α} an A α, A 0 α be efine as above. Then ( T V (L(W, Po(λ (1 λ 1 λ Var(W + 2 α A E(I α I β. β A 0 α Couplings for proving Theorem 1.2 Returning to the binary search tree, we use the cyclic representation in Section 2.3 to prove the following lemma where we give exact expressions for the expecte value an the variance of X n,k. Lemma 3.2 Let 1 k < n. For the ranom binary search tree T n, an E(X n,k = 2(n + 1 (k + 1(k + 2 E X n,k (n + 1 Var(X n,k = E X n,k + 2 n 64 (n+3, 2 E X n,k (E X n,k 2 = E X n,k 22k 2 +44k+12 (k+1(k+2 2 (2k+1(2k+3, 4(n+12 (k+1 2 (k+2, 2 k < n 1 2, k = n 1 2, (29 k > n 1 2. (30 Hence, ( n Var(X n,k = E(X n,k + O k 3, (31 except when k = (n 1/2; in this case Var(X n,k = E(X n,k + 2 ( n ( 1 n + O k 3 = E(X n,k + O. (32 n To prove this lemma, we use the cyclic representation (22, which in this case is n+1 X n,k = I i,k, (33 i=1

Using Stein s metho to show Poisson an normal limit laws for fringe subtrees 9 where now I i,k are efine by (13 with U i given by (21. By (13 an symmetry, for any i an 1 k < n, 2 E(I i,k = (k+2(k+1, an thus (29 follows irectly from (33. Using the cyclic representation we can also simply prove that (30 hols; however for Poisson approximation we only nee the weak asymptotics in (31 (32. Recall the construction of I i,k in (13 an the istance i j n+1 on Z n+1 given by (25. Lemma 3.3 Let k {1,..., n 1} an let I i,k be as in Section 2.3. Then for each i 1,..., n+1, there exists a coupling ((I j,k j, (Zji k j such that L(Zji k = L(I j,k I i,k = 1 jointly for all j 1,..., n + 1. Furthermore, Zji k = I j,k if j i n+1 > k + 1, Zji k I j,k if j i n+1 = k + 1, Zji k = 0 I j,k if 0 < j i n+1 k. Proof: We efine Zji k as follows. (Inices are taken moulo n + 1. Let m an m be the inices in i 1,..., i + k such that U m an U m are the two smallest of U i 1,..., U i+k ; if one of these is i 1 we choose m = i 1, an if one of them is i + k we choose m = i + k, otherwise, we ranomize the choice of m among these two inices so that P(m < m = 1 2, inepenently of everything else. Now exchange U i 1 U m an U i+k U m, i.e., let U i 1 := U m, U m := U i 1, U i+k := U m, U m := U i+k, an U l := U l for all other inices l. Finally, let, cf. (13, Z k ji = 1 { U j 1 an U j+k are the two smallest among U j 1,..., U j+k}. (34 Then, L ( ( U 1,..., U n = L (U1,..., U n I i,k = 1 an thus L(Zji k = L(I j,k I i,k = 1 jointly for all j. Note that U l = U l if l / {i 1,..., i + k} an thus Zji k = I j,k if j i n+1 > k + 1. On the other han, if 0 < j i < k + 1, then Zji k = 0 since i + k lies in {j,..., j + k 1} an U i+k is smaller than U j 1 by construction; the case k 1 < j i < 0 is similar. (This says simply that two ifferent fringe subtrees of the same size cannot overlap, which is obvious. Finally, if j = i + k + 1 with j + k + 1 < i + n + 1 (i.e., k + 1 < (n + 1/2, then j 1 = i + k an thus U j 1 U j 1 while U l = U l for l j,..., j + k; hence Zji k I j,k. The cases j = i + k + 1 with j + k + 1 = i + n + 1 an j = i k 1 with j k 1 > i n 1 are similar. See Figures 1 2 that illustrate an example for such a coupling in the case k = 3. Proof of Theorem 1.2: We prove the result for the binary search tree, using the representation X n,k = n+1 i=1 I i,k in (33. (For the ranom recursive tree the proof is similar an uses the representation ˆX n n,k = i=1 IL i,k 1 ; see [9]. Let A := {1,..., n + 1}. From Lemma 3.3 we see that for each i A we can apply Theorem 3.1 with A i := A \ {i, i ± (k + 1}, A 0 i := {i ± (k + 1}; this yiels, using Lemma 3.2 an the fact that E(I i,k I i+k+1,k = O ( 1 k (which is shown easily, 3 T V (L(X n,k, Po(µ n,k ( 1 µ n,k ( 1 µ n,k Var(X n,k + 4 E(I i,k I i+k+1,k ( 1 = O n ( 1 µ n,k k 3 = O, k 1 i n+1

10 Cecilia Holmgren, an Svante Janson 6,1 7,1 3,2 8,3 3,2 8,3 1,6 5.5 7,8 10,4 1,6 5.5 10,4 2,10 4,7 9,9 2,10 4,7 6,8 9,9 Fig. 1: A binary search tree with no fringe subtree of size three containing the keys {4, 5, 6}. Fig. 2: A coupling forcing a fringe subtree of size three containing the keys {4, 5, 6} in the tree in Fig. 1. which shows (3. 4 Normal approximations by Stein s metho In this section we will prove Theorem 1.6. As in [5, Theorem 5] we will apply the following result, see [10, Theorem 6.33] for a proof, an for the efinition of a epenency graph. Lemma 4.1 Suppose that (S n 1 is a sequence of ranom variables such that S n = α V n Z nα, where for each n, {Z nα } α is a family of ranom variables with epenency graph (V n, E n. Let N( enote the close neighborhoo of a noe or set of noes in this graph. Suppose further that there exist numbers M n an Q n such that α V n E( Z nα M n an for every α, α V n, E( Z nβ Z nα, Z nα Q n. β N(α,α Let σ 2 n = Var(S n. If lim n M nq 2 n σ 3 n = 0, then it hols that Sn E(Sn Var(Sn N (0, 1. Proof of Theorem 1.6: Recall that X n = (X T 1 n, X T 2 n,..., X T n an let Z = (Z 1,..., Z, where Z is multivariate normal with the istribution N (0, Γ, where Γ is the matrix with elements γ ij = 1 lim n n Cov(XT i n, X T j n, see (10 above. By the Cramér Wol evice [3, Theorem 7.7], to show that n 1 2 ( X n µ n converges in istribution to Z, it is enough to show that for every fixe vector (t 1,..., t R we have j=1 t jx T j n E ( j=1 t jx T j n t j Z j, (35 n j=1

Using Stein s metho to show Poisson an normal limit laws for fringe subtrees 11 where j=1 t jz j N ( 0, γ 2 with γ 2 := j,k=1 t jt k γ ij. Let S n := j=1 t jx T j n. Theorem 1.5 implies that, as n, Var(S n n j,k=1 t j t k σ T i,t j = n j,k=1 t j t k γ ij = nγ 2. (36 In particular, if γ 2 = 0, then (35 is trivial, with the limit 0. To show that (35 hols when γ 2 > 0, we will use the same metho as was use in [5, Theorem 5] for proving this theorem (in a more general form in the 1-imensional case = 1. Let Ii,k T = 1{σ(i, k T }. Let T j = k j, 1 j. We use the cyclic representation (22, which in this case can be written as X T j n = n+1 i=1 Ij i, for some inicator variable Ij i = I i,k j I T j i,k j epening only on U i 1,..., U i+kj. We efine V n := {(i, j : 1 i n + 1, 1 j } an let for each (i, j V n, A i,j be the set {i 1,..., i + k j }, regare as a subset of Z n+1. Thus I j i epens only on {U k : k A i,j }, an thus we can efine a epenency graph L n with vertex set V n by connecting (i, j an (i, j when A i,j A i,j. Let K := max{k 1, k 2,..., k } an M := max{t 1, t 2,..., t }. It is easy to see that for the sum S n := j=1 n+1 t j X T j n = i=1 j=1 t j I j i = (i,j V n t j I j i, we can choose the numbers M n an Q n in Lemma 4.1 as M n = (n + 1M an Q n = 2M Since σ n n 1/2 M by (36, it hols that lim nq 2 n n σn 3 It is shown in [9] that the matrix Γ is non-singular. sup N((i, j 2M(2K + 3. (i,j V n = 0, an Lemma 4.1 shows that (35 hols. The proof of Theorem 1.3 when k = o( n an k tens to infinity follows irectly from Theorem 1.2 an (31, since then E(X n,k an Var(X n,k ten to infinity as n tens to infinity. For the case k = O(1 we can apply Lemma 4.1 similarly to the proof of Theorem 1.6 an repeat the arguments use in [5, Theorem 5]; see [9] for etails. 5 Protecte noes We consier the number of protecte noes. A noe is protecte if the shortest istance to a leaf is at least two, i.e., it is neither a leaf or the parent of a leaf. The following theorem was shown by Mahmou an War [13, Theorem 3.1] using generating functions. Theorem 5.1 Let X n enote the number of protecte noes in a binary search tree T n. Then it hols that X n 11 30 n ( N 0, 29. n 225

12 Cecilia Holmgren, an Svante Janson We can show this result from a simple application of Theorem 1.6. Using the formulation of fringe subtrees, we note that the number of unprotecte noes in the binary search tree equals twice the number of leaves (counting all the leaves an all the parents of the leaves minus the number of cherry subtrees, i.e., subtrees consisting of a root with one left an one right chil that both are leaves (since these are the only cases when a parent is counte twice. Hence, since any linear combination of the components in a ranom vector with a multivariate normal istribution is normal, Theorem 5.1 follows from Theorem 1.6. Moreover, our approach using fringe subtrees also allows us to provie a simple proof of the following result which was conjecture in [13, Conjecture 2.1]. See [9] for etaile proofs of Theorems 5.1 5.2. Theorem 5.2 Let X n enote the number of protecte noes in a binary search tree T n. For each fixe integer k 1, there exists a polynomial p k (n of egree k, the leaing term of which is ( 11 30 k, such that E(Xn k = p k (n for all n 4k. References [1] Alous, D., Asymptotic fringe istributions for general families of ranom trees. Ann. Appl. Probab. 1 (1991, no. 2, 228 266. [2] Barbour, A. D., Holst L. an Janson S., Poisson Approximation. Oxfor University Press, New York, 1992. [3] Billingsley P., Convergence of Probability Measures. John Wiley an Sons, New York, 1968. [4] Devroye L., Limit laws for local counters in ranom binary search trees. Ranom Structures Algorithms 2 (1991, no. 3, 303 315. [5] Devroye L., Limit laws for sums of functions of subtrees of ranom binary search trees. SIAM J. Comput. 32 (2002/03, no. 1, 152 171. [6] Feng Q., Mahmou H. M. an Panholzer A., Phase changes in subtree varieties in ranom recursive an binary search trees. SIAM J. Discrete Math. 22 (2008, no. 1, 160 184. [7] Fuchs M., Subtree sizes in recursive trees an binary search trees: Berry Esseen bouns an Poisson approximations. Combin. Probab. Comput. 17 (2008, no. 5, 661 680. [8] Fuchs, M., Hwang, H.-K. an Neininger, R., Profiles of ranom trees: limit theorems for ranom recursive trees an binary search trees. Algorithmica 46 (2006, no. 3-4, 367 407. [9] Holmgren C. an Janson S., Limit Laws for Functions of Fringe subtrees for Binary Search Trees an Recursive Trees. In preparation. [10] Janson S., Łuczak T. an Ruciński A., Ranom Graphs. John Wiley, New York, 2000. [11] Knuth D.E., The Art of Computer Programming, Vol. 3 : Sorting an Searching Aison-Wesley Publishing Co., Reaing, Mass., 1973. [12] Knuth D.E., The Art of Computer Programming. Volume 1: Funamental Algorithms. Thir e eition. Aison-Wesley Publishing Co., Reaing, Mass., 1997. [13] Mahmou H.M. an War M.D., Asymptotic istribution of two-protecte noes in ranom binary search trees. Appl. Math. Lett. 25 (2012, no. 12, 2218 2222.