Exam study sheet for CS2711. List of topics




Here is the list of topics you need to know for the final exam. For each data structure listed below, make sure you can do the following:

1. Give an example of this data structure (e.g., draw a binary search tree on 5 nodes).
2. Know the basic properties of data structures (e.g., that a heap has height log n) and be able to prove them (e.g., by induction).
3. Know which basic operations the data structure supports and be able to show them on an example (e.g., insertion into an AVL tree).
4. Know the time complexity of its basic operations and how they compare between different structures (e.g., searching in an AVL tree vs. searching in a linked list).
5. For each algorithm, make sure you know its time complexity, can write pseudocode for it, and can show its execution on an example.

For the analysis of algorithms (chapter 4), you need to know time complexity and induction, and be able to solve problems similar to the ones in labs and assignments. Also, have a basic understanding of amortized analysis (for splay trees and doubling the array size) and probabilistic analysis (hash tables vs. skip lists). Know the formula for the Master Method and examples of its use.

List of topics

1. Basic data structures (chapters 3 and 6): Used to implement the array list, node list, sequence, and iterator ADTs: can insert/remove/update an element accessed either by its rank or after/before a given location, check the size, and scan through all elements.
Linked list: insert takes constant time at the front and back; remove is constant time from the front, O(n) from the back; search/i-th element is O(n). Implementation: a node contains an element and a pointer to the next node; the list is specified by its first node (which can be a sentinel).
Doubly linked list: insert and remove in constant time at both the front and back; search/i-th element is O(n). Implementation: like a singly linked list, except every node also points to the previous one.
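As an illustration, here is a minimal singly linked list sketch in Python (class and method names are my own, not from the course); it shows the O(1) front operations and the O(n) search described above:

```python
class Node:
    def __init__(self, element, nxt=None):
        self.element = element   # the stored value
        self.next = nxt          # pointer to the next node (None at the end)

class SinglyLinkedList:
    def __init__(self):
        self.head = None         # the list is identified by its first node
        self.size = 0

    def insert_front(self, element):      # O(1)
        self.head = Node(element, self.head)
        self.size += 1

    def remove_front(self):               # O(1)
        node = self.head
        self.head = node.next
        self.size -= 1
        return node.element

    def search(self, element):            # O(n): must scan from the head
        cur = self.head
        while cur is not None:
            if cur.element == element:
                return True
            cur = cur.next
        return False
```

A doubly linked list would add a `prev` pointer to each node, making removal from the back O(1) as well.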
Array/extendable array: get the i-th element in constant time; insert (in the middle)/remove the i-th element in O(n). If ordered, search is O(log n). Note: doubling the size is a better way to grow an extendable array (amortized analysis).

2. Stacks and queues (chapter 5):
Stack: Last-In-First-Out (LIFO). ADT: push, pop, top, size, isempty. The main ones are push (to insert an element) and pop (to remove an element). Implementation: linked list or array. Applications: histories, the stack of recursive calls, spans, parentheses/tag matching, arithmetic expression evaluation.
Queue: First-In-First-Out (FIFO). ADT: enqueue, dequeue, front, size, isempty. Implementation: linked list or array. Applications: waiting lists, resource access, round-robin schedulers (e.g., process switching).

3. Trees (chapter 7): In a tree, every node can have several successors (called its children) and a single predecessor (called its parent).
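As a quick illustration of topic 2, here is a Python sketch of stack and queue usage, plus the parentheses-matching application mentioned there (variable and function names are illustrative):

```python
from collections import deque

# Stack (LIFO): a Python list, with push/pop at the end, both amortized O(1).
stack = []
stack.append('a'); stack.append('b')   # push
top = stack[-1]                        # top: look at 'b' without removing it
popped = stack.pop()                   # pop: removes and returns 'b'

# Queue (FIFO): a deque, enqueue at the back, dequeue at the front, both O(1).
queue = deque()
queue.append('a'); queue.append('b')   # enqueue
first = queue.popleft()                # dequeue: returns 'a'

def balanced(s):
    """Stack application: parentheses/tag matching."""
    pairs = {')': '(', ']': '[', '}': '{'}
    st = []
    for ch in s:
        if ch in '([{':
            st.append(ch)              # push an opener
        elif ch in pairs and (not st or st.pop() != pairs[ch]):
            return False               # mismatched or extra closer
    return not st                      # no unmatched openers may remain
```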

The root is a node with no parent; a leaf/external node has no children; a node with at least one child is called an internal node. The depth of a node is the length of the path from it to the root; the depth of the root is 0. The height of a tree is the maximum over the depths of all nodes. The size of a tree is the total number of nodes. Nodes on the path to the root from a given node are its ancestors; nodes in the subtree rooted at this node (its children, their children, etc.) are its descendants.
In a binary tree, every node has at most 2 children. In a proper binary tree, every node either has 2 children or is a leaf. In a complete binary tree, all levels have all possible nodes (except possibly the last level, where the nodes are filled in from the left). Note: sometimes "complete binary tree" is also used to mean a tree which has all leaves on the last level (so that the last level has all the possible nodes); sometimes this is called a perfect binary tree. Properties: h ≥ log(n + 1) − 1 (equality for a perfect tree); i ≥ e − 1, where i is the number of internal and e the number of external nodes (equality for a proper binary tree); etc.
A traversal of a tree lists all its nodes in order, with children usually explored left-to-right. In a preorder traversal, the parent node is explored before its children (for example, printing a table of contents); in a postorder traversal, the parent is explored after its children (for example, computing the value of an arithmetic expression or the size of a directory). In a binary tree, there is also an in-order traversal, where the parent is explored after the left child but before the right child; this can be used to print arithmetic expressions, etc. The Euler tour generalizes the traversals, allowing all three as special cases.
Implementation: the tree is identified by its root. A node consists of the element, a pointer to the parent, and either a pointer to the sequence containing its children (general tree) or two distinct left-child and right-child pointers (binary tree). Sometimes (for complete binary trees) an array representation is used (see under heaps).
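The three traversals can be sketched in Python as follows (the node class and function names are illustrative); the example tree in the test encodes the arithmetic expression (1 + 2) * 3:

```python
class BTNode:
    """A binary tree node: an element and left/right child pointers."""
    def __init__(self, element, left=None, right=None):
        self.element, self.left, self.right = element, left, right

def preorder(node, out):
    if node:
        out.append(node.element)                    # parent before children
        preorder(node.left, out)
        preorder(node.right, out)

def inorder(node, out):
    if node:
        inorder(node.left, out)
        out.append(node.element)                    # parent between children
        inorder(node.right, out)

def postorder(node, out):
    if node:
        postorder(node.left, out)
        postorder(node.right, out)
        out.append(node.element)                    # parent after children
```

On an expression tree, in-order prints the expression and postorder gives the order in which subexpressions would be evaluated.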
Many operations run in time O(height) or O(depth) of a given node; traversals are O(n). Applications: arithmetic expressions, any hierarchical structure/diagram, decision trees, file systems, etc. Extended to build heaps, binary search trees, etc.

4. Heaps (chapter 8):
Priority queue: elements can be removed in order of priority. ADT: removemin and min; also size, insert, isempty.
Heap: implements the priority queue ADT. A complete binary tree (last row filled from the left, possibly unfinished), with the heap property: every node's element is no greater than those of its children. Thus, the root always contains the smallest element. A heap is implemented using an array: the root is at cell 1, and for every node at cell i, its children are at cells 2i and 2i + 1. To insert an element into the heap, place it in the last cell of the array and keep swapping it up the chain of its ancestors (upheap) until the heap property is restored. To remove the minimum element, remove the root, place the element from the last cell into the root, and keep swapping it with the smaller of its two children (downheap) until the heap property is restored. Both are O(log n) time. To build a heap in linear time, combine nodes in the second half of the array first into size-1 heaps, then into size-3 heaps with the layer above, etc., every time restoring the heap property. Applications: priority queues, heapsort (build a heap, then do n removemin operations: O(n log n) running time).

5. Maps, dictionaries, hash tables, skip lists (chapter 9):
Map: a collection of elements, where elements can be looked up by a unique key. ADT: get(k), put(k, v), as well as size, isempty, remove(k), keyset(), values(), entryset(). Maps can be ordered (by the keys). A dictionary is similar to a map, except that the key is not required to be unique, so put(k, v) does not need to return the entry with a previous occurrence of k, and there is a getall(k).
A hash table consists of an array, a hash function, and a collision resolution strategy (either separate chaining, or one of the open addressing schemes: linear probing, quadratic probing, double hashing). A typical hash function on integers is h(k) = (ak + b) mod N, where N is the size of the array. The output of a good hash function looks "random"; collisions are then less likely. May need an additional function converting keys into values appropriate for the hash function, and sometimes a compression function fitting the hash function's values into the array. Does not usually store keys in order. Performance of a hash table depends on the input values! Worst case: O(n) to insert or find an element. However, the expected performance is constant for insert/find, provided the table is not close to full (load factor n/N ≤ 1/2, for example, where n is the number of elements and N the size of the table). Applications: implementing the map or dictionary ADT (common for associative arrays); word count and similar problems.
Skip list: a number of linked lists, layered and connected. The bottom list contains all elements (in order); every list above stores a subset of the elements from the list below, with nodes pointing to their duplicates in the lists above/below (if they exist). A probabilistic data structure: the insertion method uses randomness to determine the number of lists (starting from the bottom) into which to insert a given element. The expected number of lists an element appears in is 2; the expected total number of layers is O(log n).
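The separate-chaining hash table described above can be sketched as follows (a minimal Python sketch for integer keys; the class name and the constants a = 3, b = 5, N = 11 are illustrative choices, not from the course):

```python
class ChainedHashTable:
    """Separate-chaining hash table using h(k) = (a*k + b) mod N."""
    def __init__(self, N=11, a=3, b=5):
        self.N, self.a, self.b = N, a, b
        self.buckets = [[] for _ in range(N)]   # one chain per array cell

    def _h(self, k):
        return (self.a * k + self.b) % self.N

    def put(self, k, v):        # expected O(1) while the load factor is small
        bucket = self.buckets[self._h(k)]
        for i, (key, _) in enumerate(bucket):
            if key == k:
                bucket[i] = (k, v)   # keys are unique: replace the old entry
                return
        bucket.append((k, v))

    def get(self, k):           # worst case O(n): scan one chain
        for key, v in self.buckets[self._h(k)]:
            if key == k:
                return v
        return None
```

Keys 10 and 21 both hash to cell 2 here, so they end up in the same chain: exactly the collision case that chaining resolves.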
Search for k proceeds from the top list: traverse the list to the right; when the next element is larger than k and the previous one was smaller, drop down to the list below following the link from the smaller element, and continue moving right. If k is not in the bottom list, it is not in the structure. The expected time for a search is O(log n); usually fast in practice. Application: implements an ordered map or dictionary.

6. Search trees (chapter 10):
Binary search tree (BST): preserves the order of the keys. For every node, the left subtree contains only keys that are at most the key at the root of the subtree, and the right subtree only keys that are at least it. To find a node with a given key, follow a path down from the root, going into the left/right subtree depending on whether the key is less/greater than the current node's. ADT: get, put, remove, size, isempty; performance depends on the depth of the node/height of the tree. Insertion: find the place where the element would have been, and insert it there. Deletion: if a leaf, just delete it; if there is one child, move the child into its place. Otherwise, find the next node in the in-order traversal (it has no left child), swap its value into the original node, and delete that second node. Application: implements an ordered map or dictionary (an in-order traversal outputs the keys in increasing order).
AVL tree: a fairly balanced version of a binary search tree, so the performance of get/put/remove is O(log n). AVL property: the balance factor (the difference between the heights of the two subtrees) of every node is at most 1. Store the height of the subtree in each node; when inserting/deleting, it may be necessary to do rotations to rebalance the tree (check the book for the types of rotations). Application: a more efficient map/dictionary than a plain BST.
Splay tree: a BST where after each get/put/remove operation the key in question is moved to the root using rotations (for a deletion/unsuccessful search, move the parent of the element deleted/the parent of the node where the key would have been).
Not necessarily balanced, but the amortized performance of m operations is O(m log n). Application: favourites lists, etc., where recent searches have a high chance of repeating.
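The basic BST operations from topic 6, including the two-children deletion case via the in-order successor, can be sketched in Python (function names are illustrative):

```python
class BSTNode:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def bst_get(node, key):                  # O(height): follow one path down
    while node:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None

def bst_put(node, key, value):           # insert where the key would have been
    if node is None:
        return BSTNode(key, value)
    if key < node.key:
        node.left = bst_put(node.left, key, value)
    elif key > node.key:
        node.right = bst_put(node.right, key, value)
    else:
        node.value = value               # existing key: update the value
    return node

def bst_remove(node, key):
    if node is None:
        return None
    if key < node.key:
        node.left = bst_remove(node.left, key)
    elif key > node.key:
        node.right = bst_remove(node.right, key)
    elif node.left is None:              # zero or one child: splice it out
        return node.right
    elif node.right is None:
        return node.left
    else:                                # two children: swap in the in-order
        succ = node.right                # successor (leftmost in right subtree)
        while succ.left:
            succ = succ.left
        node.key, node.value = succ.key, succ.value
        node.right = bst_remove(node.right, succ.key)
    return node
```

All three run in O(height), which is O(log n) only if the tree stays balanced: the motivation for AVL and splay trees above.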

7. Sorting/divide-and-conquer (chapter 11): Divide the input into portions and solve each recursively. Examples: mergesort, quicksort. Recurrences are solved by the Master Method.
Mergesort: split the input in half, sort each half recursively, then merge. O(n log n).
Quicksort: select a (random) pivot; partition into greater-than, equal-to, and less-than-pivot portions; solve each recursively. O(n^2) worst-case time, O(n log n) average (randomized). Can be done in place.
If the items to sort are arbitrary, O(n log n) is the lower bound: no better algorithm is possible. If the items are restricted (i.e., come from a small set), sorting can be done in linear time: bucket sort, with one bucket per type of item. More generally, radix sort (where elements are tuples of items coming from a small set) uses bucket sort as a subroutine and requires it to be a stable sort (i.e., the order of items of the same type does not change when sorting on a different radix). Radix sort can be used to sort short integers, playlists, (x, y, z) coordinates, etc.
Other sorts: selection sort and insertion sort, time O(n^2). Both keep elements in an array (sorted in the insertion sort case, unsorted in selection sort); selection sort does most of its work when removing elements, insertion sort when inserting. Another O(n log n) sort is heapsort: put the elements on a heap, then remove them one by one.
Selection problem: pick the k-th smallest element. The algorithm is similar to quicksort, but recurses only on the partition which contains the element. Randomized running time O(n); can be made deterministic O(n).
Sets/union-find: stores a collection of elements partitioned into sets. Union-find: can check whether two elements are in the same set, and merge two sets. A simple implementation gives O(log n) per operation; with path compression the amortized cost is even lower.

8. Graphs (chapter 13):
Graphs: nodes (vertices) and links (edges) connecting them. Can be directed (digraph) or undirected; there can be weights/costs on edges or vertices.
Representation: adjacency list (for each vertex, a list of adjacent/outgoing edges), or adjacency matrix (C(i, j) is either 1 for an edge from i to j, or the weight of the edge, and 0 or infinity otherwise). Adjacency lists are better if there are few edges; matrices if there are lots of edges or matrix operations are needed (e.g., transitive closure).
Path: a sequence of vertices with an edge between each two subsequent ones. Cycle: a path where the first and last vertices are the same. Simple path/cycle: no vertex repeats. Connected: there is a path between any two vertices. Strongly connected (digraph): a path between any two vertices; weakly connected: a path between any two vertices in the underlying undirected graph.
Graph traversals: Depth First Search (DFS) goes to a child first; Breadth First Search (BFS) goes to a neighbour first and uses a queue. Running time O(n + m). BFS solves single-source shortest paths on unweighted graphs; DFS finds strongly connected components; both can detect cycles. Faster with an adjacency-list implementation. Both give spanning trees and label edges.
Topological sort: applies only to directed acyclic graphs (DAGs). Lists the vertices in such an order that if there is a path from u to v, then u is before v. Uses DFS, O(n + m). On DAGs, problems like longest path are solvable in polynomial time.
Single-source shortest paths: BFS (on unweighted graphs), Dijkstra's algorithm (on digraphs with positive weights, O((n + m) log n)), Bellman-Ford (on any graphs/digraphs, O(nm)). Dijkstra's algorithm is greedy; it requires positive weights on the edges and uses a priority queue as an auxiliary data structure. Start with only the start vertex; on each step, take off the queue the vertex closest to the cloud and relax its edges. Also uses the adjacency-list implementation.
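As a sketch of BFS solving single-source shortest paths on an unweighted graph (adjacency-list representation; the function name is illustrative):

```python
from collections import deque

def bfs_distances(adj, s):
    """BFS from s: adj maps each vertex to a list of its neighbours.
    Returns shortest path lengths in edges; runs in O(n + m)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit = shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```

Replacing the queue with a stack (or recursion) turns this into DFS, which visits the same vertices but in child-first order.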

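Dijkstra's algorithm from the single-source discussion above can be sketched with Python's heapq as the priority queue (a minimal version; lazy skipping of stale queue entries stands in for a decrease-key operation):

```python
import heapq

def dijkstra(adj, s):
    """adj maps each vertex to a list of (neighbour, weight) pairs with
    positive weights; returns shortest distances from s. O((n + m) log n)."""
    dist = {s: 0}
    pq = [(0, s)]                        # (distance estimate, vertex)
    while pq:
        d, u = heapq.heappop(pq)         # closest vertex not yet finalized
        if d > dist[u]:
            continue                     # stale entry: a shorter path is known
        for v, w in adj[u]:              # relax the edges out of u
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```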
All-pairs shortest paths: on digraphs; if unweighted, the transitive closure. A transitive closure graph has an edge (u, v) if the original graph has a path from u to v. The transitive closure can be computed using matrix multiplication. The Floyd-Warshall algorithm for all-pairs shortest paths (weighted) is dynamic programming, O(n^3): on iteration k, it looks at paths between any u, v using only the first k vertices of the graph.
Spanning tree: a tree on all the vertices using the original graph's edges; if the graph is not connected, a spanning forest. Minimum spanning trees: Kruskal's algorithm and the Prim-Jarnik algorithm. Both are greedy. Kruskal's uses union-find and goes through the edges in order of increasing weight; Prim-Jarnik uses a priority queue similarly to Dijkstra's, but adds vertices by the smallest outgoing edge from the cloud, not by the shortest path. Both run in O((m + n) log n).

9. Greedy algorithms (chapters 12-13): Sort the items, then go through them, either picking or ignoring each one; never reverse a decision. Running time usually O(n log n), where n is the number of elements (it depends on the data structures used, too). Often does not work or only gives an approximation; when it does work, the correctness proof is by induction on the number of steps (i.e., S_i is the solution set after considering the i-th element in order).
Base case: show that there is an optimal solution S_opt such that S_0 ⊆ S_opt ⊆ S_0 ∪ {1, ..., n}. Induction hypothesis: assume there is an S_opt such that S_i ⊆ S_opt ⊆ S_i ∪ {i + 1, ..., n}. Induction step: show that there is an S'_opt such that S_{i+1} ⊆ S'_opt ⊆ S_{i+1} ∪ {i + 2, ..., n}. (a) Element i + 1 is not in S_{i+1}: argue that S_opt does not have it either; then S'_opt = S_opt. (b) Element i + 1 is in S_{i+1}: either S_opt has it (possibly in a different place, in which case switch things around to get S'_opt), or S_opt does not have it, in which case throw some element j out of S_opt and put i + 1 in instead to get S'_opt; argue that the new solution is at least as good.
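As a sketch of one of the greedy algorithms covered, here is fractional knapsack in Python: take items in decreasing value-per-weight order, splitting the last item taken (the function name is illustrative; weights are assumed positive):

```python
def fractional_knapsack(items, capacity):
    """Greedy fractional knapsack: items are (value, weight) pairs.
    O(n log n) for the sort; the greedy choice is provably optimal here."""
    best = 0.0
    for value, weight in sorted(items, key=lambda vw: vw[0] / vw[1],
                                reverse=True):
        if capacity <= 0:
            break
        take = min(weight, capacity)     # all of the item, or whatever fits
        best += value * (take / weight)  # fractional part of the last item
        capacity -= take
    return best
```

Note that the same greedy strategy fails for 0/1 knapsack, which is why that variant is solved by dynamic programming instead (topic 10).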
Examples of greedy algorithms: fractional knapsack, Kruskal's algorithm for minimum spanning tree, Dijkstra's algorithm, scheduling with deadlines and profits.

10. Dynamic programming (chapters 12-13): Precompute partial solutions starting from the base cases, keep them in a table, and compute the table from already-computed cells (e.g., row by row, but the order can be different). The arrays can be 1-, 2-, or 3-dimensional (possibly more), depending on the problem. The running time is a function of the size of the array, so it might not be polynomial (e.g., scheduling with very large deadlines)! Examples: scheduling, knapsack, longest common subsequence, longest increasing subsequence, all-pairs shortest paths (Floyd-Warshall).
Steps of design:
(a) Define an array; that is, state what the values being put in the cells are, then what the dimensions are and where the value of the best solution is stored. E.g.: A(i, t) stores the profit of the best schedule for jobs from 1 to i finishing by time t, where 1 ≤ i ≤ n and 0 ≤ t ≤ max d_i. The value of the final answer is A(n, max d_i).
(b) Give a recurrence to compute A from the previous cells in the array, including initialization. E.g. (longest common subsequence): A(0, j) = A(i, 0) = 0; A(i, j) = A(i - 1, j - 1) + 1 if x_i = y_j, and A(i, j) = max{A(i - 1, j), A(i, j - 1)} otherwise.
(c) Give pseudocode to compute the array (usually we omitted it in class).
(d) Explain how to recover the actual solution from the array (usually using a recursive PrintOpt() procedure to retrace the decisions).

11. Backtracking: Used when the other techniques don't work; usually exponential time, but faster than testing all possibilities. Make a decision tree of possibilities and go through the tree recursively; if some possibilities fail, backtrack.
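The longest-common-subsequence recurrence from the dynamic programming design steps in topic 10 can be implemented directly as a table computation (a Python sketch; names are illustrative):

```python
def lcs_length(x, y):
    """Longest common subsequence length via the standard recurrence:
    A(i, j) = A(i-1, j-1) + 1           if x[i] == y[j],
    A(i, j) = max(A(i-1, j), A(i, j-1)) otherwise.
    Runs in O(len(x) * len(y)) time and space."""
    n, m = len(x), len(y)
    A = [[0] * (m + 1) for _ in range(n + 1)]   # row/column 0: empty prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:            # 0-based strings, 1-based table
                A[i][j] = A[i - 1][j - 1] + 1
            else:
                A[i][j] = max(A[i - 1][j], A[i][j - 1])
    return A[n][m]
```

Recovering the subsequence itself (step (d) above) would retrace the table from A(n, m), following whichever neighbouring cell produced each value.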