Binary Search Trees
CMPSC 122

Note: This notes packet overlaps significantly with the first set of trees notes I do in CMPSC 360, but it goes into much greater depth on turning BSTs into pseudocode than we do in 360. Starting in Spring 2014, I've split the introduction to trees in 360 into two packets: one that encompasses all we do here and a second on the deeper mathematical analysis, namely a proof by strong induction of an important theorem relating the height and number of terminal vertices. If you are not concurrently taking both courses with me, but take 360 with me later, check in with me about potentially being excused from a lecture that will be review for you there.

I. Motivation

We've learned about various structures in which to store data (arrays, lists, stacks, queues), and each has something about it that makes it unique. What often motivates the choice of structure is what we want to do with it, or how we want to get information out of it.

All of those other structures were linear structures. We can use the idea of binary trees to store data in a way that allows branching.

Let's do an activity. You'll give me some numbers, and I'll put them into a binary tree in a particular way. As we go, write down the list of numbers in order and the tree. See if you can figure out what I'm doing.

List of numbers:

Resulting tree:

Page 1 of 7 Prepared by D. Hogan for PSU CMPSC 360 and CMPSC 122
II. Binary Search Trees, Defined

The kind of tree we're working with is called a binary search tree, sometimes abbreviated BST. For a binary tree to be a binary search tree, it must satisfy the binary search tree property. That is, for each node n,

- n's left child must be less than n. More formally:

- n's right child must be greater than n. More formally:

In this definition, we work under the assumption that all keys in a BST are unique. (This isn't a stretch, but if we wanted to allow non-unique keys, there are a few different strategies we could employ for "same" keys.)

Now then, it's worth noting how BSTs can be used. While we could certainly use a BST to store a list of numbers, it's really the meaning of those numbers that makes a BST useful. We really want to use a BST to store records. But, in practice, we don't store an entire record in a node of a BST; we instead store some key to the record (think primary keys in database tables, as we'll see in CMPSC 221). So, we store keys to records in a tree and use the structure of a binary tree to locate a record easily. That's why it's called a binary search tree.

III. Searching a BST

Question: In the tree we drew above, how would we go about searching for the key 50 systematically, given that the tree must follow the BST property?

Question: How would we determine that a key isn't found in a BST?

So, let's generalize and write down pseudocode for an algorithm to search for a node in a BST. It should take as input a pointer to the tree's root and a search key. It should return a pointer to a node containing the search key, or, in the case of failure, NIL.
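One way the search just described might be sketched in Python is shown here. This is only an illustration under stated assumptions, not the pseudocode developed in lecture: the `Node` class and the function name `bst_search` are hypothetical, Python's `None` stands in for NIL, and an iterative loop is used where a recursive version would work equally well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node type: a key plus references to two children."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_search(root: Optional[Node], key: int) -> Optional[Node]:
    """Return the node holding key, or None (standing in for NIL)."""
    current = root
    while current is not None:
        if key == current.key:
            return current            # found the search key
        if key < current.key:
            current = current.left    # key can only be in the left subtree
        else:
            current = current.right   # key can only be in the right subtree
    return None                       # ran off the tree: key is not present


# A small hand-built tree, used only for illustration:
#        6
#       / \
#      4   7
example = Node(6, Node(4), Node(7))
```

With this tree, `bst_search(example, 4)` returns the node whose key is 4, while `bst_search(example, 5)` returns `None` because the walk goes left from 6, right from 4, and finds no child there.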
Problem: What is the precondition for the above algorithm?

IV. An Algorithm for Insertion into a BST

To build a binary search tree from a set of input numbers:

1. Make the first input the root of the BST.
2. For each remaining input, recursively compare the input to the root of the tree.
   a. If the input is less than the root, it becomes the left child of the root (or, recursively, it goes into the left subtree).
   b. If the input is greater than the root, it becomes the right child of the root (or, recursively, it goes into the right subtree).

Example 1: Build a BST from the following lists:
a. 6, 4, 7
b. 6, 4, 7, 2, 5, 9

Problem:
a. Build a BST from these inputs: 10, 20, 30, 40, 5, 8, 50, 60, 70, 15, 80
b. Comment on the shape of the BST.
Problem: Write a recursive algorithm to insert a key into a BST, given that key and a pointer to the BST's root.
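The insertion steps from Section IV could be sketched recursively in Python along the following lines. Treat this as one possible approach, not the expected answer: `Node`, `bst_insert`, `build_bst`, and `height` are hypothetical names, `None` stands in for NIL, and the function returns the (possibly new) root of the subtree because Python has no pointer-to-pointer parameters. Duplicate keys are silently ignored, in line with the unique-keys assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    """Hypothetical node type: a key plus references to two children."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the subtree rooted at root; return that subtree's root."""
    if root is None:
        return Node(key)                          # empty spot: new leaf goes here
    if key < root.key:
        root.left = bst_insert(root.left, key)    # recurse into the left subtree
    elif key > root.key:
        root.right = bst_insert(root.right, key)  # recurse into the right subtree
    # key == root.key: keys are assumed unique, so nothing is inserted
    return root


def build_bst(keys: Iterable[int]) -> Optional[Node]:
    """Insert the keys one at a time, in the order given."""
    root = None
    for key in keys:
        root = bst_insert(root, key)
    return root


def height(root: Optional[Node]) -> int:
    """Height counted in edges; an empty tree has height -1 by convention."""
    if root is None:
        return -1
    return 1 + max(height(root.left), height(root.right))


tree = build_bst([6, 4, 7, 2, 5, 9])
```

Here `tree` has 6 at the root, 4 and 7 as its children, 2 and 5 under 4, and 9 as the right child of 7. Calling `height` on trees built from the other input lists in Section IV is one way to check a hand-drawn answer and to compare shapes.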
V. Tree Traversal

Once a tree is in place, we can traverse or walk the tree to list the elements of the tree. There are three kinds of traversals. The first is called an inorder traversal of the tree.

Algorithm: Inorder Traversal(Tree T)
1. Do an Inorder Traversal on the left subtree of T
2. Print the root of T
3. Do an Inorder Traversal on the right subtree of T

Notice the recursive nature of this procedure.

Example: Let's go back and do an inorder traversal on a BST from the first page.

The other two kinds of traversals are called preorder and postorder. In short, here's how all three go:

Inorder Traversal: left, root, right
Preorder Traversal: root, left, right
Postorder Traversal: left, right, root

Example: Let's do a preorder traversal on a BST from the first page.

Example: Let's do a postorder traversal on a BST from the first page.
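The three traversals above might look like this in Python. This is a minimal sketch under a few assumptions: `Node` and the three function names are hypothetical, `None` represents an empty subtree, and each function collects keys into a list where the algorithm above says "print", so the results are easier to inspect. The tree is the one from Example 1b in Section IV, built by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node type: a key plus references to two children."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder(t: Optional[Node]) -> List[int]:
    """Left subtree, then root, then right subtree."""
    if t is None:
        return []
    return inorder(t.left) + [t.key] + inorder(t.right)


def preorder(t: Optional[Node]) -> List[int]:
    """Root, then left subtree, then right subtree."""
    if t is None:
        return []
    return [t.key] + preorder(t.left) + preorder(t.right)


def postorder(t: Optional[Node]) -> List[int]:
    """Left subtree, then right subtree, then root."""
    if t is None:
        return []
    return postorder(t.left) + postorder(t.right) + [t.key]


# The BST that results from inserting 6, 4, 7, 2, 5, 9 in that order:
#          6
#        /   \
#       4     7
#      / \     \
#     2   5     9
tree = Node(6, Node(4, Node(2), Node(5)), Node(7, None, Node(9)))

print(inorder(tree))    # [2, 4, 5, 6, 7, 9]
print(preorder(tree))   # [6, 4, 2, 5, 7, 9]
print(postorder(tree))  # [2, 5, 4, 9, 7, 6]
```

Note that the inorder result lists the keys in ascending order, which holds for any tree that satisfies the BST property.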
VI. Tree Sort

Question: Suppose we had a list of numbers we wanted to sort. How could we use a BST to do this?

Question: What advantages does this method have?

VII. Performance of BST Algorithms

Problem: Build a BST from these values: 50, 30, 20, 40, 70, 80, 60.

Trace a search for 50. How many comparisons are necessary?

Trace a search for 20. How many comparisons are necessary?

Trace a search for 45. How many comparisons are necessary?

Can we call any of these best-case or worst-case scenarios?
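A trace like the ones in this problem can be checked with a short Python sketch. The names `Node` and `count_comparisons` are hypothetical, and the counting convention is an assumption: one comparison is charged for each node the search visits. Counting each individual `==` and `<` test separately would give larger numbers. The tree is hand-built to match what inserting 50, 30, 20, 40, 70, 80, 60 in that order produces.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical node type: a key plus references to two children."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def count_comparisons(root: Optional[Node], key: int) -> Tuple[bool, int]:
    """Search for key; return (found, number of nodes visited)."""
    visited = 0
    current = root
    while current is not None:
        visited += 1                  # one comparison charged per node visited
        if key == current.key:
            return True, visited
        current = current.left if key < current.key else current.right
    return False, visited             # search ended at an empty subtree


# BST from inserting 50, 30, 20, 40, 70, 80, 60 in that order:
#            50
#          /    \
#        30      70
#       /  \    /  \
#     20   40  60   80
tree = Node(50, Node(30, Node(20), Node(40)), Node(70, Node(60), Node(80)))

print(count_comparisons(tree, 50))  # (True, 1)
print(count_comparisons(tree, 20))  # (True, 3)
print(count_comparisons(tree, 45))  # (False, 3)
```

Under this convention, the search for 45 visits 50, 30, and 40 before reaching the empty right subtree of 40, so an unsuccessful search here costs as much as finding a leaf.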
Let's now consider a tree that's slightly larger, one where each of the leaves of the last tree has 2 children.

Let's again extend the last tree in the same way and get a maximum number of comparisons.

Let's generalize the worst-case number of comparisons for the special case of a binary search tree where every non-leaf node has exactly 2 children and all leaves are on the same level:

Number of nodes (n)     Worst-case number of comparisons
7
15
31
63

Question: Does this count as a worst-case running time for a search in a BST? Why? If not, what would an accurate worst case be?

Searching wasn't the only algorithm we looked at. Let's consider the performance of others:

Insertion:

Traversal:

Finally, it would seem, then, that having perfectly balanced binary trees yields optimal performance. So, it would behoove us to have a way of balancing BSTs. We'll leave that for the middle of CMPSC 465 (and, in the meantime, do some other things with trees in 360, as well as graphs, of which trees are just a special case).
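For checking the table above, the pattern can be sketched in Python. The function names are hypothetical, and the same counting assumption applies as in the traces: one comparison per node visited. A tree with every level full has n = 2^k - 1 nodes on k levels, so the longest search visits k nodes. For contrast, the second function covers a tree that has degenerated into a single chain, for instance when keys are inserted in sorted order.

```python
def full_tree_worst_case(n: int) -> int:
    """Worst-case nodes visited in a BST with every level full.

    Such a tree has n = 2**k - 1 nodes on k levels, and the longest
    search path visits one node per level.
    """
    if n < 1 or (n + 1) & n != 0:
        raise ValueError("n must have the form 2**k - 1")
    return (n + 1).bit_length() - 1   # this is k, i.e. log2(n + 1)


def chain_worst_case(n: int) -> int:
    """Worst-case nodes visited when the BST is a single chain of n nodes."""
    return n


for n in (7, 15, 31, 63):
    print(n, full_tree_worst_case(n), chain_worst_case(n))
# 7 3 7
# 15 4 15
# 31 5 31
# 63 6 63
```

The gap between the two columns is the motivation for balancing: the full tree grows like log2(n + 1), while the chain grows like n.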