CE204 Data Structures and Algorithms Part 7

CE204 Data Structures and Algorithms Part 7 06/03/2016 CE204 Part 7 1

Balancing Binary Search Trees 1

We saw in part 4 that the average times for binary search tree operations are O(log n), but in the worst case the times are O(n). To obtain optimum performance we would like to ensure that binary search trees are always balanced. We cannot insist that trees be perfectly balanced, since a perfectly balanced tree (a root with two children each containing the same number of nodes) must have an odd number of nodes. Hence we use the following definition: a binary tree is balanced if, for every non-leaf node, the number of nodes in its left child and the number of nodes in its right child differ by no more than one.
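As an illustration of this definition, here is a minimal sketch of a check for node-count balance. The `SizeBalance` class, its `Node` class and the method names are hypothetical, not taken from the course code; an empty child is treated as containing zero nodes. Checking every node rather than only non-leaf nodes is equivalent, since both children of a leaf are empty.

```java
// Hypothetical sketch: checks the node-count definition of "balanced".
class SizeBalance {
    // Minimal node class; a null reference denotes an empty tree.
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the number of nodes in t if every node in t has children whose
    // sizes differ by at most one; returns -1 as soon as a violation is found.
    static int balancedSize(Node t) {
        if (t == null) return 0;
        int l = balancedSize(t.left);
        if (l < 0) return -1;
        int r = balancedSize(t.right);
        if (r < 0) return -1;
        if (Math.abs(l - r) > 1) return -1;
        return l + r + 1;
    }

    static boolean isBalanced(Node t) {
        return balancedSize(t) >= 0;
    }
}
```

Returning the size (or -1) from a single recursive method means each node is visited once, instead of recounting sub-tree sizes at every node.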

Balancing Binary Search Trees 2

It can be shown that the depth of a balanced binary tree containing n nodes is exactly floor(log2 n) + 1, where floor(x) is the largest integer less than or equal to x. If a binary search tree will be searched frequently but changed only occasionally, then ensuring that the tree remains balanced after each change would be worthwhile. However, if insertions and deletions are performed frequently, rebalancing would be cost-effective only if a tree containing n items could be rebalanced in O(log n) time.
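The depth formula can be checked empirically for one particular way of building balanced trees. This sketch (hypothetical class and method names) builds a tree over n sorted keys by always taking the middle key as the root, so the two halves differ in size by at most one, then compares its depth with floor(log2 n) + 1. Depth counts the nodes on the longest root-to-leaf path, so a single node has depth 1, as in the formula.

```java
// Hypothetical sketch: compares the depth of a balanced tree with floor(log2 n) + 1.
class BalancedDepth {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Builds a balanced BST from the sorted keys[lo..hi] by taking the middle
    // key as the root; the two halves then differ in size by at most one.
    static Node build(int[] keys, int lo, int hi) {
        if (lo > hi) return null;
        int mid = (lo + hi) / 2;
        Node t = new Node(keys[mid]);
        t.left = build(keys, lo, mid - 1);
        t.right = build(keys, mid + 1, hi);
        return t;
    }

    // Number of nodes on the longest root-to-leaf path (empty tree: 0).
    static int depth(Node t) {
        return t == null ? 0 : 1 + Math.max(depth(t.left), depth(t.right));
    }

    // floor(log2 n) + 1 for n >= 1, which is the number of bits needed to write n.
    static int predictedDepth(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 1000; n++) {
            int[] keys = new int[n];
            for (int i = 0; i < n; i++) keys[i] = i + 1;
            if (depth(build(keys, 0, n - 1)) != predictedDepth(n))
                throw new AssertionError("mismatch at n = " + n);
        }
        System.out.println("depth = floor(log2 n) + 1 for n = 1..1000");
    }
}
```

A check like this only tests the formula on examples; it is not a proof.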

Balancing Binary Search Trees 3

Consider the following balanced binary search tree:

         3
       /   \
      1     5
       \   / \
        2 4   6

Balancing Binary Search Trees 4

Suppose we wish to insert the value 7 into the tree on the previous slide and maintain the balanced property. There is only one possible balanced binary search tree containing the values 1, 2, 3, 4, 5, 6 and 7 (and no other values): 4 at the root, 2 and 6 as its children, and 1, 3, 5 and 7 as leaves. None of its elements are in the same position as in the original tree. Furthermore, no value has the same parent in this tree as in the original. Hence, to rebalance the tree after the insertion, all of the elements have to be moved individually. Similar situations can arise with larger trees, and consequently rebalancing takes time at least proportional to n in the worst case. Hence it is not reasonable to maintain balanced trees if insertions and deletions are to be performed frequently.
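To make the counting argument concrete, this sketch compares the parent of every value in the six-node tree with its parent in the balanced tree on 1..7. The class and helper names are illustrative only, and the root's parent is recorded as 0, which is safe here only because all keys are positive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counts values whose parent is unchanged after rebalancing.
class RebalanceCost {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(int v) { return new Node(v, null, null); }

    // Records the parent of every value in t (0 stands for "no parent").
    static void parents(Node t, int parent, Map<Integer, Integer> out) {
        if (t == null) return;
        out.put(t.value, parent);
        parents(t.left, t.value, out);
        parents(t.right, t.value, out);
    }

    // Counts the values of 'before' that have the same parent in 'after'.
    static int unchangedParents(Node before, Node after) {
        Map<Integer, Integer> p = new HashMap<>();
        Map<Integer, Integer> q = new HashMap<>();
        parents(before, 0, p);
        parents(after, 0, q);
        int same = 0;
        for (Map.Entry<Integer, Integer> e : p.entrySet())
            if (e.getValue().equals(q.get(e.getKey()))) same++;
        return same;
    }

    public static void main(String[] args) {
        // The six-node tree: 3 at the root, 1 and 5 below it.
        Node before = new Node(3, new Node(1, null, leaf(2)),
                                  new Node(5, leaf(4), leaf(6)));
        // The only balanced BST holding 1..7: 4 at the root.
        Node after = new Node(4, new Node(2, leaf(1), leaf(3)),
                                 new Node(6, leaf(5), leaf(7)));
        System.out.println(unchangedParents(before, after)); // prints 0
    }
}
```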

AVL Trees 1

Although it is not cost-effective to ensure that binary search trees are always balanced, it is still desirable in many cases to ensure that they are reasonably well balanced. One way of doing this is to use the concept of being depth-balanced. A binary tree is depth-balanced (or AVL-balanced) if, for every non-leaf node, the depth of its left child and the depth of its right child differ by no more than one (an empty child counts as having depth 0). It is not difficult to see that any balanced tree is AVL-balanced (using the depth property stated in Balancing Binary Search Trees 2), but the converse is not true. It can be shown that the depth of any AVL-balanced tree containing n nodes is no more than 2 log2 n (as long as n > 1).
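A sketch of the corresponding check for AVL balance differs from a node-count check only in what is compared: depths instead of sizes. The names are hypothetical. The tree built in `main` has children of sizes 3 and 1 at the root, so it is not balanced by node count, yet the children's depths are 2 and 1, which illustrates why the converse fails.

```java
// Hypothetical sketch: checks the depth-based (AVL) definition of balance.
class AvlCheck {
    // Minimal node class; null denotes an empty tree.
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(int v) { return new Node(v, null, null); }

    // Returns the depth of t if every node in t is AVL-balanced, otherwise -1.
    // An empty tree has depth 0 and a single node has depth 1.
    static int balancedDepth(Node t) {
        if (t == null) return 0;
        int l = balancedDepth(t.left);
        if (l < 0) return -1;
        int r = balancedDepth(t.right);
        if (r < 0) return -1;
        if (Math.abs(l - r) > 1) return -1;
        return 1 + Math.max(l, r);
    }

    static boolean isAvlBalanced(Node t) {
        return balancedDepth(t) >= 0;
    }

    // Number of nodes in t.
    static int size(Node t) {
        return t == null ? 0 : 1 + size(t.left) + size(t.right);
    }

    public static void main(String[] args) {
        // Root 4: left child holds 1, 2, 3 (depth 2); right child holds 5 (depth 1).
        Node t = new Node(4, new Node(2, leaf(1), leaf(3)), leaf(5));
        System.out.println(isAvlBalanced(t));             // true
        System.out.println(size(t.left) - size(t.right)); // 2, so not balanced by node count
    }
}
```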

AVL Trees 2

An AVL tree is a depth-balanced binary search tree. The use of such trees was first suggested by Adelson-Velskii and Landis in 1962. They showed that the rebalancing of an AVL tree after an insertion can be performed in constant time (that is, the time taken does not depend on the size of the tree) and that the rebalancing after a deletion can be performed in time proportional to the depth. Furthermore, the time taken to determine whether rebalancing is needed is at most proportional to the depth, as long as a small amount of extra information is stored in each node. Hence searching, insertion and deletion for AVL trees can all be performed in O(log n) time.

AVL Trees 3

We will consider only the rebalancing algorithm for insertion (the algorithm for deletion is similar). Hence we assume that the insertion of a value into a tree that was AVL-balanced has resulted in a tree that is no longer AVL-balanced. The first step is to find the smallest sub-tree, A, whose children have depths differing by more than one. This sub-tree is unique: every sub-tree whose depths changed has its root on the path from the root to the new leaf, so the unbalanced sub-trees are nested inside one another. A must have children of depth n and n+2 for some n; before the insertion the tree was AVL-balanced, so the children must have had depths of n and n+1, and so A's original depth was n+2.
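Finding A can be sketched as follows. This version deliberately stores no extra information in the nodes: it records the insertion path and then, working upwards from the new leaf, recomputes child depths from scratch, so it takes time proportional to the size of the tree rather than its depth. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: locates the root of the smallest unbalanced sub-tree A.
class FindUnbalanced {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Number of nodes on the longest root-to-leaf path (empty tree: 0).
    static int depth(Node t) {
        return t == null ? 0 : 1 + Math.max(depth(t.left), depth(t.right));
    }

    // Ordinary BST insertion into a non-empty tree (equal values go right).
    // Returns the path of nodes from the root down to and including the new leaf.
    static List<Node> insert(Node root, int value) {
        List<Node> path = new ArrayList<>();
        Node t = root;
        while (true) {
            path.add(t);
            if (value < t.value) {
                if (t.left == null) { t.left = new Node(value); path.add(t.left); return path; }
                t = t.left;
            } else {
                if (t.right == null) { t.right = new Node(value); path.add(t.right); return path; }
                t = t.right;
            }
        }
    }

    // Walks the insertion path bottom-up and returns the first node whose
    // children's depths differ by more than one (the root of A), or null.
    static Node smallestUnbalanced(List<Node> path) {
        for (int i = path.size() - 1; i >= 0; i--) {
            Node t = path.get(i);
            if (Math.abs(depth(t.left) - depth(t.right)) > 1) return t;
        }
        return null;
    }
}
```

For example, inserting 1 into the AVL tree with root 5, children 3 and 8, and 2 below 3 makes both 5 and 3 unbalanced; the method returns the node holding 3, the smaller of the two.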

AVL Trees 4

We now let B be A's deeper child. We assume that B is A's left child; if instead it is A's right child, all references to left and right on subsequent slides need to be exchanged. B has depth n+2, so one of its children must have depth n+1. The other child cannot also have depth n+1 (if it did, B would already have had depth n+2 before the insertion, which is impossible since B's parent, A, had an original depth of n+2). Since A is the smallest unbalanced sub-tree, B must be AVL-balanced, so this other child cannot have depth less than n and hence must have depth n.

AVL Trees 5

There are two cases to consider, depending on which of B's children is the deeper:

Case 1: B's left child is deeper than B's right child.
Case 2: B's right child is deeper than B's left child.

The diagrams on the following slides show how the tree A is rebalanced in each of the two cases. (The nodes labelled a and b denote the roots of A and B respectively.) Remember that we assumed that B (A's deeper child) is A's left child; if B is A's right child, we need to reflect the trees in the diagrams to swap the roles of left and right.

AVL Trees 6

Case 1:

        before              after
           a                  b
          / \                / \
         b   T3            T1   a
        / \                    / \
      T1   T2                T2   T3

AVL Trees 7

The depths of T2 and T3 are both n, and T2 and T3 must be AVL-balanced (since they are smaller than the smallest unbalanced sub-tree, A). Hence the right child of the new sub-tree is AVL-balanced and has depth n+1. The depth of T1 is also n+1 and T1 is AVL-balanced, so the whole sub-tree is also AVL-balanced. Furthermore, it is easy to see that it is a binary search tree. The depth of the new sub-tree is n+2, which is the same as the original depth of A, so replacing A with the new sub-tree preserves the AVL-balanced status of the rest of the tree and no further rebalancing needs to be performed.
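The Case 1 restructuring is what is usually called a single (right) rotation. A minimal sketch, assuming a plain node class with no stored balance information (hypothetical names), needs to change only two child references; the caller must store the returned node where a used to be. The example in `main` uses n = 1: T1 holds 1 and 2, T2 holds 4 and T3 holds 6.

```java
// Hypothetical sketch of the Case 1 rearrangement (a single right rotation).
class RotateCase1 {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(int v) { return new Node(v, null, null); }

    // a's left child b is deeper, and b's left child T1 is deeper than its right
    // child T2. b becomes the root of the sub-tree and a becomes b's right child,
    // taking T2 as its new left child; T1 and T3 stay where they are.
    static Node rotateRight(Node a) {
        Node b = a.left;
        a.left = b.right; // T2 moves from b to a
        b.right = a;
        return b;
    }

    // In-order listing, used to confirm that the BST ordering is unchanged.
    static String inorder(Node t) {
        return t == null ? "" : inorder(t.left) + t.value + " " + inorder(t.right);
    }

    public static void main(String[] args) {
        Node a = new Node(5, new Node(3, new Node(2, leaf(1), null), leaf(4)), leaf(6));
        Node root = rotateRight(a);
        System.out.println(root.value);           // 3
        System.out.println(inorder(root).trim()); // 1 2 3 4 5 6
    }
}
```

For the mirror-image case (B is A's right child) the same code applies with every `left` and `right` exchanged.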

AVL Trees 8

Case 2:

          before                      after
             a                          c
            / \                       /   \
           b   T4                    b     a
          / \                       / \   / \
        T1   c                    T1  T2 T3  T4
            / \
          T2   T3

AVL Trees 9

Again it can be seen (noting that T1 and T4 have depth n, while T2 and T3 have depth at most n because c has depth n+1) that the new sub-tree is a binary search tree, is AVL-balanced and has depth n+2, equal to the original depth of A. The rebalancing in both cases can be performed by changing a small number of left and right child references, and hence the time taken is independent of the size of the tree. It is important to be able to detect the smallest unbalanced sub-tree without searching the whole tree. To enable this we need to store additional information about depths in each node and update this information after each insertion or deletion. The most efficient way of doing this is simply to store, in each node, which child (if either) is the deeper.
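The following sketch puts the two cases together in a recursive insertion. It makes one deliberate deviation from the slides: for simplicity each node stores the depth of its sub-tree rather than just which child is deeper, which costs a little more space but keeps the bookkeeping easy to follow. Case 2 is carried out as two single rotations (first at b, then at a), which yields the same rearrangement as the Case 2 diagram. Class and method names are illustrative.

```java
import java.util.List;

// Hypothetical sketch of AVL insertion, storing sub-tree depths in the nodes.
class AvlTree {
    static class Node {
        int value;
        int depth = 1; // depth of the sub-tree rooted here (a leaf has depth 1)
        Node left, right;
        Node(int value) { this.value = value; }
    }

    static int depth(Node t) { return t == null ? 0 : t.depth; }

    static void update(Node t) { t.depth = 1 + Math.max(depth(t.left), depth(t.right)); }

    // Case 1: b = a.left becomes the root of the sub-tree.
    static Node rotateRight(Node a) {
        Node b = a.left;
        a.left = b.right;
        b.right = a;
        update(a); // a is now below b, so update it first
        update(b);
        return b;
    }

    // Mirror image of rotateRight.
    static Node rotateLeft(Node a) {
        Node b = a.right;
        a.right = b.left;
        b.left = a;
        update(a);
        update(b);
        return b;
    }

    // Inserts value into the sub-tree t and returns the sub-tree's new root.
    // Duplicate values are ignored.
    static Node insert(Node t, int value) {
        if (t == null) return new Node(value);
        if (value < t.value) t.left = insert(t.left, value);
        else if (value > t.value) t.right = insert(t.right, value);
        else return t;
        update(t);
        // The recursion unwinds from the new leaf upwards, so the first node found
        // to be unbalanced is the root of the smallest unbalanced sub-tree A.
        int diff = depth(t.left) - depth(t.right);
        if (diff > 1) {                                   // B is A's left child
            if (depth(t.left.right) > depth(t.left.left)) // Case 2
                t.left = rotateLeft(t.left);              // c takes b's place ...
            return rotateRight(t);                        // ... then Case 1 at a
        }
        if (diff < -1) {                                  // mirror image
            if (depth(t.right.left) > depth(t.right.right))
                t.right = rotateRight(t.right);
            return rotateLeft(t);
        }
        return t;
    }

    static void inorder(Node t, List<Integer> out) {
        if (t == null) return;
        inorder(t.left, out);
        out.add(t.value);
        inorder(t.right, out);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int v = 1; v <= 1000; v++) root = insert(root, v); // worst case for a plain BST
        System.out.println(depth(root)); // at most 2 log2(1000), i.e. under 20
    }
}
```

Inserting keys in increasing order would give a plain binary search tree of depth 1000; the rotations keep the AVL tree's depth within the logarithmic bound quoted earlier.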

Computability 1

Theoretical computer scientists have for many years been interested in the subject of computability, addressing the question of exactly what can be done by a computer. Informally, a function can be said to be computable if we can write an algorithm or program to evaluate the function, but to give a precise definition we need to state exactly what is meant by an algorithm or program. The ambiguity in the use of the term program arises because there are many programming languages, and it is not immediately obvious whether the class of functions that can be implemented in Java is the same as the class that can be implemented in a significantly different language.

Computability 2

The issue of computability has been studied since at least 1900, well before the advent of modern digital computers or high-level programming languages. In 1936 two scientists, Turing and Church, came up with independent formal definitions of what is meant by the term computable, using totally different approaches. It was subsequently proved that the class of computable functions provided by these two definitions was the same (i.e. any function that is Turing-computable is also Church-computable and vice versa).

Computability 3

All functions that are computable using the Church-Turing definitions can be implemented in any programming language with a reasonable set of features. The Church and Turing definitions regarded a function as something which accepts input from a stream and produces output on a stream without accessing any external resources. With the meaning of a function limited in this way (so that GUIs and file access are not allowed), no-one has managed to implement any function that is not computable by the Church-Turing definition, so all available evidence suggests that it is a reasonable definition.

Computability 4

A question which arises is "are there any non-computable functions?". There are certainly well-defined problems for which no algorithmic solution is known, but this might be simply because the problems are very difficult and no-one has yet managed to develop an algorithm to solve them. If there are any non-computable functions, it follows that they can never be implemented (unless the definition of computable is wrong). It is in fact the case that some functions can be proven to be non-computable; the best-known example is the halting problem.

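The Halting Problem 1

Consider the following program.

    public static void main(String[] args) {
        int i = 0;
        try {
            i = Integer.parseInt(args[0]);
        } catch (Exception e) {
            // no argument, or not a valid integer: i stays 0
        }
        // odd values are decremented, even values incremented
        while (i != 0)
            if (i % 2 != 0)   // not i % 2 == 1, which is false for negative odd i in Java
                i--;
            else
                i++;
        System.out.println("Done");
    }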

The Halting Problem 2

The program on the previous slide will either output the message Done or continue looping forever. If its command-line argument is "0" or "1", or is missing, or is any string that does not represent a valid integer, it will output the message and terminate; but if the argument is a string holding any integer other than 0 or 1, it will not terminate. (For example, if the argument is "8", the value of the variable i will repeatedly alternate between 8 and 9 and never become zero.)
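The behaviour described above can be observed without running forever by bounding the number of loop iterations. In this sketch the `trace` method and its step limit are hypothetical additions; oddness is tested with `i % 2 != 0` because in Java the remainder of a negative odd number is -1, not 1. A bounded run can only show that the loop has not stopped yet, never that it will not stop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: runs the odd/even loop for a bounded number of steps.
class ParityTrace {
    // Runs at most maxSteps iterations of the loop starting from i and returns
    // the successive values of i, stopping early if i reaches 0.
    static List<Integer> trace(int i, int maxSteps) {
        List<Integer> values = new ArrayList<>();
        values.add(i);
        for (int step = 0; step < maxSteps && i != 0; step++) {
            if (i % 2 != 0) i--; else i++;
            values.add(i);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(trace(1, 10)); // [1, 0]
        System.out.println(trace(8, 6));  // [8, 9, 8, 9, 8, 9, 8]
    }
}
```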

The Halting Problem 3

When dealing with programs containing recursion it can be more difficult to determine whether they will terminate. We can be confident that the recursive methods we have written for trees will terminate, since the argument to every recursive call refers to a smaller tree than the argument of the calling method; we must therefore eventually reach a leaf or an empty tree, so we cannot continue recursing indefinitely.

The Halting Problem 4

The following method calculates Ackermann's function.

    int ack(int n, int y) {
        if (n == 0)
            return y + 1;
        else if (y == 0)
            return ack(n - 1, 1);
        else
            return ack(n - 1, ack(n, y - 1));
    }

The function is defined only for non-negative arguments; to ensure this, it should be called from a method that checks that the arguments are valid, since we do not want to put this check into every recursive call.
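A sketch of such a checking method follows, with a call counter added so that the behaviour discussed next can be observed. The names `ackermann` and `calls` are hypothetical, and `ack` is the method above made `static` so it can be called from `main`. Only very small arguments finish in reasonable time.

```java
// Hypothetical sketch: a validating entry point for ack, plus a call counter.
class Ackermann {
    static long calls = 0; // number of calls made to ack since the last reset

    static int ack(int n, int y) {
        calls++;
        if (n == 0)
            return y + 1;
        else if (y == 0)
            return ack(n - 1, 1);
        else
            return ack(n - 1, ack(n, y - 1));
    }

    // Validates the arguments once, so the recursive method does not have to
    // repeat the check on every call.
    static int ackermann(int n, int y) {
        if (n < 0 || y < 0)
            throw new IllegalArgumentException("arguments must be non-negative");
        calls = 0;
        return ack(n, y);
    }

    public static void main(String[] args) {
        int result = ackermann(2, 2);
        System.out.println(result + " after " + calls + " calls"); // 7 after 27 calls
    }
}
```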

The Halting Problem 5

We observe that the recursion is much more complex than in our tree traversals. Can we be sure that it will always terminate? We can see that in every recursive call at least one of the arguments will be smaller than the corresponding argument in the caller, but the other could be larger. Hence we cannot immediately observe that the recursive calls get simpler as the recursion gets deeper. For example, if we make a call to ack(2,2), one of the recursive calls will turn out to be to ack(1,5). It turns out that ack(2,2) does indeed terminate, after making 27 calls (including the initial one), and returns the value 7.

The Halting Problem 6

By observing that no recursive call is made with a larger first argument than that of the caller, and that when the first argument is the same the second is smaller, it is possible to prove that the recursion will terminate. We can define an ordering on pairs of numbers so that (a,b) < (c,d) if and only if a < c, or a is equal to c and b < d. Then, using this ordering, the pair of arguments to each recursive call is less than the pair of arguments of its caller, and since the arguments cannot become negative, the pairs cannot keep on getting smaller forever, so the recursion must terminate. However, it can take a very long time: the time complexity of ack is much greater than O(2^n), where n is the sum of its arguments.
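The ordering argument can be checked mechanically on particular inputs, which illustrates the claim but does not prove it. In this sketch (hypothetical helper names) every recursive call is routed through a check that the new argument pair is smaller than the caller's pair in the ordering just defined; for example, ack(2, 2) calls ack(1, 5), and (1,5) < (2,2) because 1 < 2.

```java
// Hypothetical sketch: ack with a run-time check of the lexicographic decrease.
class AckermannMeasure {
    // (a,b) < (c,d) if a < c, or a == c and b < d.
    static boolean lexLess(int a, int b, int c, int d) {
        return a < c || (a == c && b < d);
    }

    // Same recursion as ack, but every recursive call goes through 'checked'.
    static int ack(int n, int y) {
        if (n == 0) return y + 1;
        if (y == 0) return checked(n, y, n - 1, 1);
        int inner = checked(n, y, n, y - 1);
        return checked(n, y, n - 1, inner);
    }

    // Verifies that (n2, y2) < (n, y) before making the call ack(n2, y2).
    static int checked(int n, int y, int n2, int y2) {
        if (!lexLess(n2, y2, n, y))
            throw new IllegalStateException("argument pair did not decrease");
        return ack(n2, y2);
    }

    public static void main(String[] args) {
        System.out.println(ack(2, 2)); // 7, and no check failed along the way
    }
}
```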

The Halting Problem 7

We have seen that in many cases it is possible to prove that a program will or will not terminate, but this requires some intelligent thinking. Is it possible to write an algorithm to perform this task? Writing such an algorithm would inevitably be very difficult and would probably require the use of artificial intelligence techniques. The question of whether a program will terminate when run with specific data is known as the halting problem. Specifically, this asks whether we can write a program which, given two strings, one, P, containing a program source and the other, D, containing input data for that program, will determine whether the program P will terminate when run with data D.

The Halting Problem 8

The halting problem is semi-decidable: it is certainly possible to write a program which is capable of outputting yes if P would indeed terminate when run with D as data. To do this we could write a program containing a compiler and an interpreter and simulate the running of the program; if the interpretation terminates, then our program should output yes. However, if the program P does not terminate, our program would also fail to terminate, since the interpretation would run forever. The fact that no-one has produced a program to solve the halting problem does not automatically mean that it cannot be solved; it might simply be that writing such a solution would be very hard.
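The semi-decision procedure can be sketched under a strong simplifying assumption: instead of a Java source string and an interpreter, a "program" is modelled (hypothetically) as a step function on an `int` state that halts when the state reaches 0, with the odd/even loop from earlier as the example. The unbounded `halts` answers yes for halting programs and simply never returns otherwise, which is what semi-decidable means; the bounded `haltsWithin` always returns, but a false answer from it is inconclusive.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: semi-deciding halting by simulation, for a toy program model.
class SemiDecide {
    // Simulates the program until its state becomes 0. Returns true if it halts;
    // if the simulated program runs forever, so does this method.
    static boolean halts(IntUnaryOperator step, int state) {
        while (state != 0)
            state = step.applyAsInt(state);
        return true;
    }

    // Bounded variant: gives up after maxSteps steps. false means only
    // "has not halted yet", never "will not halt".
    static boolean haltsWithin(IntUnaryOperator step, int state, long maxSteps) {
        for (long s = 0; s < maxSteps; s++) {
            if (state == 0) return true;
            state = step.applyAsInt(state);
        }
        return state == 0;
    }

    public static void main(String[] args) {
        IntUnaryOperator parity = i -> (i % 2 != 0) ? i - 1 : i + 1;
        System.out.println(halts(parity, 1));                  // true
        System.out.println(haltsWithin(parity, 8, 1_000_000)); // false (not yet halted)
        // halts(parity, 8) would never return.
    }
}
```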

The Halting Problem 9

As stated earlier, it can in fact be proved that the halting problem is not computable: it is not possible to write a program to solve it. The proof uses a technique known as proof by contradiction. We show that if a program to solve the problem can be written, we can prove that something is true if and only if it is false, which is impossible, so we can conclude that no such program can return the correct results in all cases. (Our proof shows that the program cannot be written in Java, but it can be adapted to the general case.)

The Halting Problem 10

Suppose that a program to solve the halting problem can be written in Java. Then it would be possible to write a method

    boolean halts(String p, String d) {
        // body assumed, for the sake of argument, to exist
    }

that returns true if the Java program whose source code is the string p terminates when run with d as input data, and returns false if the program does not terminate (and throws an exception if p does not represent a valid Java program).

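The Halting Problem 11

Having written the halts method, we can include it in a program with the following main method.

    public static void main(String[] args) {
        try {
            // read the entire input data into s (InputStream.readAllBytes needs Java 9+)
            String s = new String(System.in.readAllBytes());
            // loop forever if the program in s halts when given itself as input
            if (halts(s, s))
                while (true) {}
        } catch (Exception e) {
            // s is not a valid Java program (or could not be read): just terminate
        }
    }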

The Halting Problem 12

When run with the contents of a valid Java source file as its input data, the program on the previous slide will enter an infinite loop if the program P in that source file terminates when supplied with a copy of its own source as input data, and will terminate if P does not terminate when supplied with that data. Now let X be a string containing the entire source code of the program on the previous slide, and consider what happens when we run that program with X as its input data.

The Halting Problem 13

The program will use the halts method to determine whether X terminates with X as input data, and will terminate if and only if the method returns false. However, the program which we are running is X with X as input data, so it terminates if the halts method says that it does not, and does not terminate if the halts method says that it does. If the halting problem were computable, it would be possible to write a halts method that always returns the correct result, so this program would terminate if and only if it failed to terminate. This is impossible, so the assumption that the halting problem is computable must be wrong.