Analysis of Algorithms I: Optimal Binary Search Trees




Xi Chen, Columbia University

Given a set of $n$ keys $K = \{k_1, \ldots, k_n\}$ in sorted order, $k_1 < k_2 < \cdots < k_n$, we wish to build an optimal binary search tree with keys from $K$ that minimizes the expected number of comparisons needed for each search operation. We consider the following setting, which is slightly simpler than the one discussed in Section 15.5 of the textbook. Assume we know in advance that for each future search operation, the key $k$ to search for always comes from $K$ and satisfies $k = k_i$ with probability $p_i$, for each $i = 1, 2, \ldots, n$.

This implies that

$$\sum_{i=1}^{n} p_i = 1.$$

Let $T$ be a binary search tree with keys $k_1, \ldots, k_n$. So $T$ has $n$ nodes and each node is labelled with a key $k_i$. We use $\mathrm{depth}_T(k_i)$ to denote the depth of the node labelled with $k_i$ plus one (so if $k_r$ is the key at the root, then we set $\mathrm{depth}_T(k_r) = 1$ instead of $0$). It is clear that if the key we search for is $k = k_i$, then the number of comparisons needed is exactly $\mathrm{depth}_T(k_i)$.

Thus, the expected number of comparisons is

$$\sum_{i=1}^{n} p_i \cdot \mathrm{depth}_T(k_i),$$

and we will refer to it as $\mathrm{cost}(T)$, the cost of tree $T$. The goal is then to find an optimal binary search tree $T$ for $K = \{k_1, \ldots, k_n\}$ with the minimum cost. As usual, we start by describing a dynamic programming algorithm that computes the minimum cost. It can then be used to construct an optimal BST.
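To make the cost definition concrete, here is a minimal Python sketch that evaluates $\mathrm{cost}(T)$ for a given tree. The nested-tuple tree encoding, the function name `tree_cost`, and the example probabilities are illustrative assumptions, not part of the lecture.

```python
from fractions import Fraction

def tree_cost(tree, probs, depth=1):
    """Expected number of comparisons for a BST.

    Assumed encoding: tree is None (empty) or (left, i, right), where i is
    the 1-based index of key k_i and probs[i - 1] is p_i.
    The root has depth 1, matching the convention in the text.
    """
    if tree is None:
        return 0
    left, i, right = tree
    return (probs[i - 1] * depth
            + tree_cost(left, probs, depth + 1)
            + tree_cost(right, probs, depth + 1))

# Hypothetical example with three keys.
probs = [Fraction(3, 5), Fraction(3, 10), Fraction(1, 10)]
balanced = ((None, 1, None), 2, (None, 3, None))  # k_2 at the root
chain = (None, 1, (None, 2, (None, 3, None)))     # k_1 at the root

print(tree_cost(balanced, probs))  # 17/10
print(tree_cost(chain, probs))     # 3/2
```

For these probabilities the unbalanced chain is cheaper than the balanced tree, because the most likely key sits at the root.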

Note the difference between this problem and Huffman trees. In the latter we only need to build a tree (instead of a binary search tree) in which each leaf is labelled with a character. Also in Huffman trees, the cost of a tree is the expected depth of leaves. Here the cost is the expected depth over all nodes.

Again, we use dynamic programming. To this end, we first need to figure out the optimal substructure of the problem, which will then lead us to a recursive formula that reduces the problem to smaller subproblems. Here the first choice to make, when constructing a binary search tree for K, is which key to put at the root of the tree.

Assume the key at the root is $k_r$. Denote the tree by $T$, its left subtree by $T_L$, and its right subtree by $T_R$. We know $T_L$ has keys $k_1, \ldots, k_{r-1}$ and $T_R$ has keys $k_{r+1}, \ldots, k_n$. Then for any $k_i$ with $i < r$, we have

$$\mathrm{depth}_T(k_i) = \mathrm{depth}_{T_L}(k_i) + 1,$$

and for any $k_j$ with $j > r$, we have

$$\mathrm{depth}_T(k_j) = \mathrm{depth}_{T_R}(k_j) + 1.$$

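As a result, we have

$$\begin{aligned}
\mathrm{cost}(T) &= \sum_{i=1}^{n} p_i \cdot \mathrm{depth}_T(k_i) \\
&= 1 \cdot p_r + \sum_{i<r} p_i \bigl(\mathrm{depth}_{T_L}(k_i) + 1\bigr) + \sum_{j>r} p_j \bigl(\mathrm{depth}_{T_R}(k_j) + 1\bigr) \\
&= \mathrm{cost}(T_L) + \mathrm{cost}(T_R) + \sum_{i=1}^{n} p_i \\
&= 1 + \mathrm{cost}(T_L) + \mathrm{cost}(T_R).
\end{aligned}$$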

This equation implies the following. Denote by $c_{r-1}$ the cost of an optimal BST with keys $k_1, \ldots, k_{r-1}$, and by $c_{r+1}$ the cost of an optimal BST with keys $k_{r+1}, \ldots, k_n$. Then the minimum cost of a binary search tree for $K$ that has $k_r$ as its root is exactly

$$1 + c_{r-1} + c_{r+1}.$$

This gives us the following recursive algorithm.

Naive Optimal Binary Search Tree:

For $r = 1$ to $n$ do
    make a recursive call to compute the cost of an optimal BST for $\{k_1, \ldots, k_{r-1}\}$; store it in $C[r-1]$
    (when $r = 1$, the cost of an empty BST is 0)
    make a recursive call to compute the cost of an optimal BST for $\{k_{r+1}, \ldots, k_n\}$; store it in $C'[r+1]$
    (when $r = n$, the cost of an empty BST is 0)
Output $1 + \min_{r = 1, \ldots, n} \bigl[\, C[r-1] + C'[r+1] \,\bigr]$
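The naive recursion can be sketched in Python as follows. This is a hedged illustration rather than the lecture's own code: the name `naive_cost` and the index convention are assumptions. Inside a recursive call the probabilities of the current key range no longer sum to 1, so the sketch adds the total probability of the range in place of the constant 1. This matches the $\sum p_i$ term in the derivation above and equals 1 at the top level.

```python
from fractions import Fraction

def naive_cost(probs, i, j):
    """Cost of an optimal BST on keys k_i..k_j (1-based, inclusive).

    Exponential time: the same (i, j) ranges are recomputed many times.
    """
    if i > j:
        return 0  # an empty BST has cost 0
    # Every key in the range costs one comparison at the root of this subtree.
    weight = sum(probs[i - 1:j])
    best = min(naive_cost(probs, i, r - 1) + naive_cost(probs, r + 1, j)
               for r in range(i, j + 1))
    return weight + best

# Hypothetical example probabilities.
probs = [Fraction(3, 5), Fraction(3, 10), Fraction(1, 10)]
print(naive_cost(probs, 1, len(probs)))  # 3/2
```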

However, the worst-case running time of this naive recursive algorithm is exponential. Note that there are only $\Theta(n^2)$ distinct subproblems that the recursive calls need to solve. The recursion tree is huge because we solve the same subproblems over and over.

Again, we use dynamic programming to give a more efficient algorithm: maintain a table to store solutions to subproblems already solved; and solve all the subproblems one by one, using the recursive formula we found earlier, in an appropriate order.

For this purpose, we introduce the following notation. We let

$$p_{i,j} = \sum_{i \le k \le j} p_k, \quad \text{for any } 1 \le i \le j \le n.$$

Given $p_1, \ldots, p_n$, we can compute all $p_{i,j}$'s in $\Theta(n^2)$ time (how?).
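One possible way to meet the $\Theta(n^2)$ bound is to extend each range by one key at a time, using $p_{i,j} = p_{i,j-1} + p_j$. The sketch below assumes a 1-based table with one extra row and column of padding. The function name `range_sums` is hypothetical.

```python
def range_sums(probs):
    """Return a table p with p[i][j] = p_i + ... + p_j for 1 <= i <= j <= n."""
    n = len(probs)
    # Padded so that p[i][i - 1] exists and equals 0 (the empty range).
    p = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            # Constant work per entry, so Theta(n^2) overall.
            p[i][j] = p[i][j - 1] + probs[j - 1]
    return p
```

Each of the $\Theta(n^2)$ entries is filled with a single addition, instead of summing up to $n$ terms from scratch.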

Given $i, j$ with $1 \le i \le j \le n$, we use $c_{i,j}$ to denote the cost of an optimal binary search tree with keys $k_i, k_{i+1}, \ldots, k_j$. For any $i \in [n+1]$, we also set $c_{i,i-1} = 0$ for convenience (meaning that an empty binary search tree has cost 0). The entry $c_{n+1,n}$ is needed when $k_n$ is the root and its right subtree is empty. Then to obtain $c_{i,j}$, we have:

1. If the root is $k_i$, then the minimum cost is
   $$0 + p_i + (c_{i+1,j} + p_{i+1,j}) = c_{i,i-1} + c_{i+1,j} + p_{i,j},$$
   as $c_{i,i-1} = 0$.
2. If the root is $k_j$, then the minimum cost is
   $$(c_{i,j-1} + p_{i,j-1}) + p_j + 0 = c_{i,j-1} + c_{j+1,j} + p_{i,j},$$
   as $c_{j+1,j} = 0$.
3. If the root is $k_r$, where $i < r < j$, the minimum cost is
   $$(c_{i,r-1} + p_{i,r-1}) + p_r + (c_{r+1,j} + p_{r+1,j}) = c_{i,r-1} + c_{r+1,j} + p_{i,j}.$$

To summarize, we get the following recursive formula:

$$c_{i,j} = p_{i,j} + \min_{i \le r \le j} \bigl[\, c_{i,r-1} + c_{r+1,j} \,\bigr].$$

This gives us the following algorithm:

compute $p[i, j]$ for all $i, j$ with $1 \le i \le j \le n$
create a table $c[i, j]$, where $1 \le i \le j + 1 \le n + 1$
set $c[i, i-1] = 0$ for all $i \in [n+1]$ and $c[i, i] = p_i$ for all $i \in [n]$
for $k$ from $1$ to $n - 1$ do
    for $i$ from $1$ to $n - k$ do
        set $j = i + k$ and set $c[i, j] = p[i, j] + \min_{i \le r \le j} \bigl[\, c[i, r-1] + c[r+1, j] \,\bigr]$
output $c[1, n]$

Here we fill up the table in the following order. At the beginning, all entries with $j - i = -1$ (empty tree) and $j - i = 0$ (tree with a single node) are ready. Then we work on entries with $j - i = 1$, $j - i = 2$, ..., $j - i = n - 1$. Every time we work on an entry $c[i, j]$ with $j - i = k$, we know that all the entries $c[i', j']$ with $j' - i' < k$ have already been computed. Note that the recursive formula we use to compute $c[i, j]$ only involves entries $c[i', j']$ with $j' - i' < k$. So they are all ready, and we can compute $c[i, j]$ in time $O(j - i)$. It is easy to check that the total running time is $\Theta(n^3)$.
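The table-filling algorithm above can be sketched in Python as follows. The `root` table and the `build` helper are additions for illustration, not part of the lecture's pseudocode. They record which $r$ attains the minimum, which is one way to recover an optimal tree from the table. Trees are returned as nested tuples `(left, i, right)`, an assumed encoding.

```python
from fractions import Fraction

def optimal_bst(probs):
    """Fill c[i][j] in order of increasing j - i; Theta(n^3) time overall."""
    n = len(probs)
    size = n + 2  # padding so c[i][i - 1] and c[n + 1][n] exist and are 0
    p = [[0] * size for _ in range(size)]
    c = [[0] * size for _ in range(size)]
    root = [[0] * size for _ in range(size)]  # assumed helper table
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            p[i][j] = p[i][j - 1] + probs[j - 1]
        c[i][i] = probs[i - 1]  # single-node tree
        root[i][i] = i
    for k in range(1, n):              # k = j - i
        for i in range(1, n - k + 1):
            j = i + k
            best_r = min(range(i, j + 1),
                         key=lambda r: c[i][r - 1] + c[r + 1][j])
            c[i][j] = p[i][j] + c[i][best_r - 1] + c[best_r + 1][j]
            root[i][j] = best_r
    return c, root

def build(root, i, j):
    """Rebuild an optimal tree on k_i..k_j from the recorded roots."""
    if i > j:
        return None
    r = root[i][j]
    return (build(root, i, r - 1), r, build(root, r + 1, j))

# Hypothetical example probabilities.
probs = [Fraction(3, 5), Fraction(3, 10), Fraction(1, 10)]
c, root = optimal_bst(probs)
print(c[1][3])            # 3/2
print(build(root, 1, 3))  # (None, 1, (None, 2, (None, 3, None)))
```

When several roots tie for the minimum, this sketch keeps the smallest index. Any tied choice yields a tree of the same cost.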