Concept: Types of algorithms

Discrete Math for Bioinformatics WS 10/11, by A. Bockmayr/K. Reinert, 18 October 2010, 21:22

The exposition is based on the following sources, which are all required reading:
1. Cormen, Leiserson, Rivest: Chapters 1 and 2.
2. Motwani, Raghavan: Chapter 1.

In this lecture we will discuss different ways to categorize classes of algorithms. There is no single correct classification; one should rather regard the task of categorizing algorithms as giving them certain attributes. After discussing how a given algorithm can be labeled (e.g. as a randomized, divide-and-conquer algorithm), we will discuss different techniques to analyze algorithms. Usually the labels with which we categorize an algorithm are quite helpful in choosing the appropriate type of analysis.

Deterministic vs. Randomized

One important (and exclusive) distinction is whether an algorithm is deterministic or randomized. On a given input, a deterministic algorithm always produces the same result by following the same computation steps. Randomized algorithms throw coins during execution. Hence either the order of execution or the result of the algorithm might be different for each run on the same input.

There are two subclasses of randomized algorithms: Monte Carlo type algorithms and Las Vegas type algorithms. A Las Vegas algorithm always produces the same (correct) result on a given input; randomization only affects the order of the internal execution steps, and hence the run time. In the case of Monte Carlo algorithms, the result may change and may even be wrong. However, a Monte Carlo algorithm produces the correct result with a certain probability.

So of course the question arises: What are randomized algorithms good for? The computation might change depending on coin throws, and Monte Carlo algorithms do not even have to produce the correct result. Why would that be desirable? The answer is twofold. First, randomized algorithms usually have the effect of perturbing the input. Put differently, the input looks random, which makes bad cases very rare.
Second, randomized algorithms are often conceptually very easy to implement, and at the same time their run time is often superior to that of their deterministic counterparts. Can you think of an obvious example? We will come back to this example later in more detail.

Offline vs. Online

Another important (and exclusive) distinction is whether an algorithm is offline or online. Online algorithms do not know their whole input at the beginning; it is given to them piece by piece (online), whereas normally algorithms know their input beforehand. What seems like a minor detail has profound effects on the design of algorithms and on their analysis. Online algorithms are usually analyzed using the concept of competitiveness, that is, the worst-case factor by which they take longer than the best algorithm with complete information. One example of an online problem is the ski problem.

A skier must decide every day she goes skiing whether to rent or to buy skis, unless or until she decides to buy them. The skier does not know how many days she can ski, because the weather is unpredictable. Call the number of days she will ski T. The cost to rent skis is 1 unit per day, while the cost of buying skis is B. What is the optimal offline algorithm minimizing the worst-case cost? And what would be the optimal strategy in the online case?

Exact vs. approximate vs. heuristic vs. operational

Usually algorithms have an optimization goal in mind, e.g. computing the shortest path, the optimal alignment, or the minimal edit distance. Exact algorithms aim at computing the optimal solution with respect to such a goal. Often this is quite expensive in terms of run time or memory and hence not possible for large inputs. In such cases one tries other strategies. Approximation algorithms aim at computing a solution that is, for example, at most a certain guaranteed factor worse than the optimal solution: an algorithm yields a c-approximation if it can guarantee that its solution is never worse than a factor c compared to the optimal solution. Alternatively, heuristic algorithms try to reach the optimal solution without guaranteeing that they always do; often it is easy to construct a counterexample. A good heuristic is almost always near or at the optimal value. Finally, there are algorithms which do not aim at optimizing an objective function. I call them operational, since they chain a series of computational operations guided by expert knowledge but not tied to a specific objective function (e.g. ClustalW).

Example: Approximation algorithm

As an example, think of the Traveling Salesman Problem (TSP) with triangle inequality for n cities. This is an NP-hard problem (no polynomial-time algorithm is known). The following greedy, deterministic algorithm yields a 2-approximation for the TSP with triangle inequality in time $O(n^2)$:
1. Compute a minimum spanning tree T for the complete graph implied by the n cities.
2. Duplicate all edges of T, yielding an Eulerian graph T', and then find an Eulerian cycle in T'.
3. Convert the Eulerian cycle into a Hamiltonian cycle by taking shortcuts, i.e. by skipping cities that have already been visited.

Can you now argue why this is a 2-approximation?

Categorization according to main concept

Another way, which you have often heard of until now, is to use the main algorithmic paradigm to categorize an algorithm, such as:
- Simple recursive algorithms
- Backtracking algorithms
- Divide-and-conquer algorithms
- Dynamic programming algorithms
- Greedy algorithms
- Branch-and-bound algorithms

- Brute force algorithms
- and others

Simple recursive algorithms

A simple recursive algorithm
- solves the base cases directly,
- recurs with a simpler subproblem,
- does some extra work to convert the solution to the simpler subproblem into a solution to the given problem.

Examples are:

To count the number of elements in a list:
  If the list is empty, return zero; otherwise,
    step past the first element, and count the remaining elements in the list;
    add one to the result.

To test if a value occurs in a list:
  If the list is empty, return false; otherwise,
    if the first thing in the list is the given value, return true; otherwise,
    step past the first element, and test whether the value occurs in the remainder of the list.

Backtracking algorithms

A backtracking algorithm is based on a depth-first recursive search. It
  tests to see if a solution has been found, and if so, returns it; otherwise,
  for each choice that can be made at this point,
    makes that choice,
    recurs,
    if the recursion returns a solution, returns it;
  if no choices remain, returns failure.

For example, to color a map with no more than four colors:

  color(country n):
    if all countries have been colored (n > number of countries), return success; otherwise,
    for each color c of the four colors:
      if country n is not adjacent to a country that has been colored c:
        color country n with color c
        recursively color country n + 1
        if successful, return success
        otherwise, remove the color from country n
    return failure (if the loop exits)
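The two simple recursive list examples and the four-color backtracking procedure above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function names `count`, `contains` and `color_map`, the map representation as a dictionary of neighbor lists (assumed symmetric), and the color names are hypothetical choices. Countries are indexed from 0 here, so the success test reads `n == len(countries)` instead of `n > number of countries`.

```python
from typing import Dict, List, Optional, Sequence


def count(xs: Sequence) -> int:
    """Simple recursion: number of elements in a list."""
    if not xs:                        # base case: the empty list
        return 0
    return 1 + count(xs[1:])          # step past the first element, add one


def contains(xs: Sequence, value) -> bool:
    """Simple recursion: does `value` occur in the list?"""
    if not xs:                        # base case: the empty list
        return False
    if xs[0] == value:                # the first element is the value
        return True
    return contains(xs[1:], value)    # test the remainder of the list


def color_map(countries: List[str], neighbors: Dict[str, List[str]],
              colors: Sequence[str] = ("red", "green", "blue", "yellow"),
              ) -> Optional[Dict[str, str]]:
    """Backtracking map coloring (hypothetical helper); returns country -> color, or None."""
    assignment: Dict[str, str] = {}

    def color(n: int) -> bool:
        if n == len(countries):       # all countries colored: success
            return True
        country = countries[n]
        for c in colors:
            # only already colored neighbors can conflict with color c
            if all(assignment.get(nb) != c for nb in neighbors.get(country, [])):
                assignment[country] = c       # make the choice
                if color(n + 1):              # recur on the next country
                    return True
                del assignment[country]       # undo the choice (backtrack)
        return False                          # no choices remain: failure

    return assignment if color(0) else None


if __name__ == "__main__":
    # hypothetical map: four countries that all border each other
    k4 = {"A": ["B", "C", "D"], "B": ["A", "C", "D"],
          "C": ["A", "B", "D"], "D": ["A", "B", "C"]}
    print(color_map(["A", "B", "C", "D"], k4))
    # {'A': 'red', 'B': 'green', 'C': 'blue', 'D': 'yellow'}
    print(color_map(["A", "B", "C", "D"], k4, colors=("red", "green", "blue")))
    # None
```

Undoing the assignment before trying the next color is what keeps stale colors from earlier, failed branches from blocking later choices. The slicing in `count` and `contains` copies the list at each step, which is fine for illustration but quadratic in the list length.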

Divide-and-conquer algorithms

A divide-and-conquer algorithm consists of two parts:
- Divide the problem into smaller subproblems of the same type and solve these subproblems recursively.
- Combine the solutions to the subproblems into a solution to the original problem.
Traditionally, an algorithm is only called divide-and-conquer if it contains two or more recursive calls. Two examples:
- Quicksort: Partition the array into two parts, and quicksort each of the parts. No additional work is required to combine the two sorted parts.
- Mergesort: Cut the array in half, and mergesort each half. Combine the two sorted arrays into a single sorted array by merging them.

Dynamic programming algorithms

A dynamic programming algorithm remembers past results and uses them to find new results. Dynamic programming is generally used for optimization problems in which multiple solutions exist and the best one has to be found. It requires optimal substructure and overlapping subproblems:
- Optimal substructure: an optimal solution contains optimal solutions to subproblems.
- Overlapping subproblems: the same subproblems occur repeatedly, so their solutions can be stored and reused in a bottom-up fashion.
This differs from divide-and-conquer, where subproblems generally do not overlap.

There are many examples in bioinformatics, for example:
- Computing an optimal pairwise alignment.
  Optimal substructure: the optimal alignment of two prefixes contains optimal alignments of shorter prefixes.
  Overlapping subproblems: the optimal alignment of two prefixes can be constructed from the stored solutions of three subproblems (in the linear gap model).
- Computing a Viterbi path in an HMM.
  Optimal substructure: the Viterbi path for an input prefix ending in a state of an HMM contains shorter Viterbi paths for shorter prefixes of the input, ending in other HMM states.
  Overlapping subproblems: the Viterbi path for an input prefix ending in a state of an HMM can be constructed from the stored Viterbi paths for the input prefix that is one symbol shorter, ending in each of the HMM states.

Greedy algorithms

A greedy algorithm sometimes works well for optimization problems. A greedy algorithm works in phases. At each phase:
- you take the best you can get right now, without regard for future consequences;
- you hope that by choosing a local optimum at each step, you will end up at a global optimum.
This strategy actually often works quite well, and for some classes of problems it always yields an optimal solution. Do you know a simple graph problem which is solved greedily to optimality?

Another example is the following. Suppose you want to count out a certain amount of money, using the fewest possible bills and coins. A greedy algorithm would do this by always taking the largest possible bill or coin that does not overshoot. For example, to make $6.39, you can choose:
- a $5 bill,
- a $1 bill, to make $6,
- a 25c coin, to make $6.25,
- a 10c coin, to make $6.35,
- four 1c coins, to make $6.39.
For US money, the greedy algorithm always gives the optimum solution. Caution: this does not hold for other money systems. Imagine a currency with units of 1, 7, and 10 and try the algorithm for 15 units: greedy pays 10 + 1 + 1 + 1 + 1 + 1 (six coins), whereas 7 + 7 + 1 needs only three.

Branch-and-bound algorithms

Branch-and-bound algorithms are generally used for optimization problems. As the algorithm progresses, a tree of subproblems is formed. The original problem is considered the root problem. A method is used to construct an upper and a lower bound for a given problem.
- At each node, apply the bounding methods.
- If the bounds match, it is deemed a feasible solution to that particular subproblem.
- If the bounds do not match, partition the problem represented by that node, and make the two subproblems into children nodes.
- Continue, using the best known feasible solution to trim sections of the tree, until all nodes have been solved or trimmed.

An example of a branch-and-bound algorithm is the following one for the Traveling Salesman Problem (TSP). A salesman has to visit each of n cities (at least) once, and wants to minimize the total distance traveled.
- Consider the root problem to be the problem of finding the shortest route through a set of cities visiting each city once.
- Split the node into two child problems:
  - shortest route visiting city A first,
  - shortest route not visiting city A first.
- Continue subdividing similarly as the tree grows.

Brute force algorithms

A brute force algorithm simply tries all possibilities until a satisfactory solution is found. Such an algorithm can be:
- Optimizing: find the best solution. This may require finding all solutions, or, if a value for the best solution is known, it may stop when any best solution is found (example: finding the best path for a traveling salesman).
- Satisficing: stop as soon as a solution is found that is good enough (example: finding a traveling salesman path that is within 10% of optimal).

Conclusion

I presented to you many categories with which you can classify or label algorithms. Such a classification gives you a clear understanding of how an algorithm works and an indication of how to analyze it. What algorithms do you know, and what labels would they get? In the following we will talk about how to analyze the different kinds of algorithms with appropriate techniques.

Concept: Run time analysis

The exposition is based on the following sources, which are all required reading:
1. Cormen, Leiserson, Rivest: Introduction to Algorithms, Chapter 18.

In this section we recall the basic notations for run time analyses and then describe the different concepts of worst-case run time, average-case run time, expected run time, amortized run time, and the analysis of competitiveness.

Let's start by recalling the definitions of the Landau symbols ($O, \Omega, \Omega^\infty, \Theta, o, \omega$):

$$O(f) := \{g\colon \mathbb{N}\to\mathbb{R}^+ \mid \exists c\in\mathbb{R}_{>0},\ n_0\in\mathbb{N}:\ \forall n\in\mathbb{N},\ n\ge n_0:\ g(n)\le c\cdot f(n)\} \tag{1.1}$$
$$\Omega(f) := \{g\colon \mathbb{N}\to\mathbb{R}^+ \mid \exists c\in\mathbb{R}_{>0},\ n_0\in\mathbb{N}:\ \forall n\in\mathbb{N},\ n\ge n_0:\ g(n)\ge c\cdot f(n)\} \tag{1.2}$$
$$\Omega^\infty(f) := \{g\colon \mathbb{N}\to\mathbb{R}^+ \mid \exists c\in\mathbb{R}_{>0}:\ \forall m\in\mathbb{N}:\ \exists n\in\mathbb{N},\ n> m:\ g(n)\ge c\cdot f(n)\} \tag{1.3}$$
$$\Theta(f) := \{g\colon \mathbb{N}\to\mathbb{R}^+ \mid g\in O(f) \text{ and } g\in\Omega(f)\} \tag{1.4}$$
$$o(f) := \Big\{g\colon \mathbb{N}\to\mathbb{R}^+ \;\Big|\; \lim_{n\to\infty} \frac{g(n)}{f(n)} = 0\Big\} \tag{1.5}$$
$$\omega(f) := \Big\{g\colon \mathbb{N}\to\mathbb{R}^+ \;\Big|\; \lim_{n\to\infty} \frac{f(n)}{g(n)} = 0\Big\} \tag{1.6}$$

In the following we list some commonly used adjectives describing classes of functions. Note that we use the more common $=$ sign instead of the (more correct) $\in$ sign. We say a function $f$
- is constant, if $f(n) = \Theta(1)$,
- grows logarithmically, if $f(n) = O(\log n)$,
- grows polylogarithmically, if $f(n) = O(\log^k n)$ for some $k\in\mathbb{N}$,
- grows linearly, if $f(n) = O(n)$,
- grows quadratically, if $f(n) = O(n^2)$.

We say a function $f$

- grows polynomially, if $f(n) = O(n^k)$ for some $k\in\mathbb{N}$,
- grows superpolynomially, if $f(n) = \omega(n^k)$ for all $k\in\mathbb{N}$,
- grows subexponentially, if $f(n) = o(2^{n^c})$ for all $0 < c\in\mathbb{R}$,
- grows exponentially, if $f(n) = O(2^{n^c})$ for some $0 < c\in\mathbb{R}$.

After that reminder, let's introduce the different run time definitions and explain them using an example.
- Worst-case analysis: We assume the worst case both for the input and for the execution of the algorithm. The latter is of course only applicable to non-deterministic (i.e. randomized) algorithms.
- Best-case analysis: We assume the best case both for the input and for the execution of the algorithm. Again, the latter is only applicable to randomized algorithms.
- Average-case analysis: We average the run time of our (deterministic) algorithm over all possible inputs.
- Expected run time analysis: The run of our algorithm depends on the values of some random variables whose distributions we know. Hence we try to estimate the expected run time of the algorithm.
- Amortized analysis: Sometimes an algorithm (usually an operation on a data structure) needs a long time to run, but changes the data structure such that subsequent operations are not costly. A worst-case run time analysis would then be inappropriate. An amortized analysis averages over a series of operations (not over the input).
- Competitiveness analysis: For online algorithms we need a new concept of run time analysis. The main idea is to compare the run time an algorithm needs in the worst case (i.e. for all possible inputs) without knowing the input in advance, with the run time of an optimal offline algorithm (which knows the input).

The well-known (deterministic) quicksort algorithm for sorting an array chooses a fixed element as its pivot element, let's say w.l.o.g. the first one. It arranges all smaller elements to the left of the pivot and all larger ones to the right, and recurses on the two halves.
- Worst-case analysis: In the worst case, the left (or the right) half is always empty.
  Hence the worst-case run time is the solution to the recurrence $f(n) = (n-1) + f(n-1)$, $f(0) = 0$, i.e. $f(n) = n(n-1)/2$. Obviously $f = O(n^2)$.
- Best-case analysis: In the best case, the left and the right half differ in size by at most one. Hence the best-case run time is the solution to the recurrence $f(n) = (n-1) + 2f(n/2)$, $f(0) = 0$. Obviously $f = O(n \log n)$.
- Average-case analysis: We average the run time of the deterministic quicksort over all possible inputs. The result is that on average quicksort needs $O(n \log n)$ comparisons.

The dependence on the input and the bad worst-case run time of quicksort are worrisome. Quicksort (like many other algorithms) can be made considerably more robust by randomizing it. In randomized quicksort we choose the pivot element randomly, using the value of a random variable uniformly distributed over $[1, n]$.
- Expected run time analysis: Randomized quicksort can be shown to run in expected time $O(n \log n)$ on every input; in fact, it even runs in time $O(n \log n)$ with high probability. We will now discuss such an analysis in detail (and a similar one using skip lists later in the lecture).

Example: randomized quicksort

Quicksort is a perfect example to demonstrate the power of randomization. First we create a randomized version of quicksort, called RandQS, by simply choosing as pivot not the first element but a random element of the sublist we have to sort in each recursive call. Hence RandQS is a Las Vegas style randomized, divide-and-conquer algorithm. By doing so, we created an algorithm whose execution (though not its output) depends on random choices or, put differently, on the values of some random variables. Hence we have to analyze its expected run time, or more particularly the expected number of comparisons in an execution of RandQS. Let's do that.

Let $S_{(i)}$ denote the element of rank $i$ in the set $S$ which we want to sort. We define the random variable $X_{ij}$ to assume the value 1 if $S_{(i)}$ and $S_{(j)}$ are compared in an execution of RandQS, and 0 otherwise. The run time of RandQS is then proportional to $X = \sum_{i=1}^{n}\sum_{j>i} X_{ij}$. The sum of random variables is itself a random variable. In the analysis of the expected run time we are hence interested in $E(X) = \sum_{i=1}^{n}\sum_{j>i} E(X_{ij})$, by linearity of expectation. Let $p_{ij}$ be the probability that $S_{(i)}$ and $S_{(j)}$ are compared in an execution of RandQS. Then
$$E(X_{ij}) = p_{ij}\cdot 1 + (1-p_{ij})\cdot 0 = p_{ij}.$$
So we have to concentrate on the question how large $p_{ij}$ is.

To analyze this, we view the execution of RandQS as a binary tree $T$, in which each node is labeled with a distinct element $y \in S$. The elements in the left subtree are all $< y$, the elements in the right subtree are all $> y$. Observe that the root of the tree is compared to all elements in the tree, but there is no comparison between an element of the left subtree and an element of the right subtree. Hence two elements $S_{(i)}$ and $S_{(j)}$ are only compared if one is an ancestor of the other. An in-order traversal of the tree outputs the elements of $S$ in sorted order. For the analysis we focus on the level-order traversal. This traversal goes level by level and left to right, and yields a permutation $\pi$ of the elements of $S$.
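Now we make two key observations:
1. There is a comparison between $S_{(i)}$ and $S_{(j)}$ if and only if $S_{(i)}$ or $S_{(j)}$ occurs earlier in $\pi$ than any element of rank strictly between $i$ and $j$. (Why?)
2. Any of the elements $S_{(i)}, S_{(i+1)}, \ldots, S_{(j)}$ is equally likely to be the first of them chosen as a pivot in the execution of RandQS, i.e. the first of them to appear in $\pi$. Hence the probability that either $S_{(i)}$ or $S_{(j)}$ is the first one is exactly $\frac{2}{j-i+1}$.

From these two observations it follows that $p_{ij} = \frac{2}{j-i+1}$. Using this value in our computation of $E(X)$ yields:
$$
\begin{aligned}
\sum_{i=1}^{n}\sum_{j>i} p_{ij} &= \sum_{i=1}^{n}\sum_{j>i} \frac{2}{j-i+1} && (1.7)\\
&\le \sum_{i=1}^{n}\sum_{k=1}^{n-i+1} \frac{2}{k} && (1.8)\\
&\le 2\sum_{i=1}^{n}\sum_{k=1}^{n} \frac{1}{k} && (1.9)\\
&= 2nH_n, && (1.10)
\end{aligned}
$$
where $H_n$ is the $n$-th harmonic number. Given that $H_n = \ln n + \Theta(1)$, it follows that the expected run time of RandQS is $O(n \log n)$.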
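The analysis above can be checked empirically with a short Python sketch. It is an illustration under stated assumptions, not a reference implementation: keys are assumed distinct (as in the rank-based argument), the helper names `rand_qs`, `det_qs_comparisons`, `expected_comparisons` and `harmonic` are hypothetical, and a comparison is counted once for every non-pivot element of a sublist, which is exactly what the indicator variables $X_{ij}$ count. `expected_comparisons` evaluates the exact sum from (1.7), and `det_qs_comparisons` counts the comparisons of the deterministic first-element-pivot version for contrast.

```python
import random
from typing import List, Tuple


def rand_qs(xs: List[int], rng: random.Random) -> Tuple[List[int], int]:
    """RandQS on distinct keys: returns (sorted list, number of comparisons)."""
    if len(xs) <= 1:
        return list(xs), 0
    pivot = xs[rng.randrange(len(xs))]       # pivot chosen uniformly at random
    left = [x for x in xs if x < pivot]
    right = [x for x in xs if x > pivot]
    comparisons = len(xs) - 1                # each non-pivot element meets the pivot once
    sorted_left, c_left = rand_qs(left, rng)
    sorted_right, c_right = rand_qs(right, rng)
    return sorted_left + [pivot] + sorted_right, comparisons + c_left + c_right


def det_qs_comparisons(xs: List[int]) -> int:
    """Comparisons of deterministic quicksort that always pivots on the first element."""
    if len(xs) <= 1:
        return 0
    pivot = xs[0]
    left = [x for x in xs[1:] if x < pivot]
    right = [x for x in xs[1:] if x > pivot]
    return len(xs) - 1 + det_qs_comparisons(left) + det_qs_comparisons(right)


def expected_comparisons(n: int) -> float:
    """Exact E(X) = sum_{i=1}^{n} sum_{j>i} 2/(j-i+1), cf. (1.7)."""
    return sum(2.0 / (j - i + 1) for i in range(1, n + 1) for j in range(i + 1, n + 1))


def harmonic(n: int) -> float:
    """n-th harmonic number H_n."""
    return sum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    rng = random.Random(42)
    n, trials = 200, 500
    keys = list(range(n))                    # sorted input: worst case for the first-element pivot
    avg = sum(rand_qs(keys, rng)[1] for _ in range(trials)) / trials
    print(f"RandQS, average comparisons: {avg:.1f}")
    print(f"exact expectation (1.7):     {expected_comparisons(n):.1f}")
    print(f"upper bound 2nH_n (1.10):    {2 * n * harmonic(n):.1f}")
    print(f"deterministic quicksort:     {det_qs_comparisons(keys)}")  # n(n-1)/2 = 19900
```

On already sorted input, the deterministic version needs $n(n-1)/2$ comparisons, while the RandQS average should land close to the exact expectation and below $2nH_n$. The exact sum also equals the closed form $2(n+1)H_n - 4n$, which is one way to see that the bound (1.10) is not tight.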

Comparing the expected $O(n \log n)$ bound of RandQS to the worst-case time of the deterministic quicksort, which is $\Theta(n^2)$, shows how powerful a coin throw can be.

As a second concept in run time analysis we have a look at amortized analysis. Often (no matter whether for randomized or deterministic algorithms) the worst-case analysis is not appropriate, because the worst case cannot happen often, no matter what the input is. If this is the case, then the total run time of a series of n operations is less than n times the worst-case run time of a single operation; indeed, the run time averaged over the series will often be much better. It is important to note that this is different from the average-case run time analysis. There one averages over the distribution of inputs, so it could still be that an algorithm presented with a bad input runs very slowly. Here, we average the cost over a series of operations, and the resulting bound holds for every such series, i.e. for any given input. To make this distinction clear, this type of analysis is called amortized analysis.

Concept: Amortized analysis

Imagine, for example, a stack. We have the following operations on the stack:
- Pop(S) pops the top element of the stack and returns it.
- Push(S,x) pushes element x onto the stack.
- MultiPop(S,k) pops and returns the top k elements of the stack, or all of them if the stack holds fewer than k (it calls Pop repeatedly).
Obviously the operations Pop and Push have worst-case time $O(1)$. However, the operation MultiPop can take time linear in the stack size. So if we assume that at most n objects are on the stack, a MultiPop operation can have a worst-case cost of $O(n)$. Hence a naive worst-case bound for a series of n stack operations is $O(n^2)$. We will now use this example to illustrate three different techniques for finding a more realistic amortized bound for operations. The three methods to conduct the analysis are:
- the aggregate method,
- the accounting method,
- the potential method.

The aggregate method

Here we show that for all n, a sequence of n operations takes worst-case time $T(n)$ in total. Hence, in the worst case, the average cost, or amortized cost, per operation is $T(n)/n$.
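To make the total cost $T(n)$ of a sequence of stack operations directly measurable, here is a minimal Python sketch of the stack defined above in which every operation returns its actual cost, assuming unit cost for a Push and for each single pop (so MultiPop costs the number of elements it removes). The class name `CostStack`, the driver `random_sequence_cost`, and the choice to return costs instead of the popped elements are illustrative assumptions; popping an empty stack raises an error, so the driver pushes instead whenever a Pop would hit an empty stack.

```python
import random
from typing import List


class CostStack:
    """Stack with Push, Pop and MultiPop; each method returns its actual cost
    (assumed: 1 per Push, 1 per single pop)."""

    def __init__(self) -> None:
        self.items: List[int] = []

    def push(self, x: int) -> int:
        self.items.append(x)
        return 1

    def pop(self) -> int:
        if not self.items:
            raise IndexError("Pop on an empty stack")
        self.items.pop()
        return 1

    def multipop(self, k: int) -> int:
        # removes min(k, s) elements, where s is the current stack size
        popped = 0
        while popped < k and self.items:
            self.items.pop()
            popped += 1
        return popped


def random_sequence_cost(n: int, seed: int) -> int:
    """Total actual cost T of n random operations on an initially empty stack."""
    rng = random.Random(seed)
    stack = CostStack()
    total = 0
    for i in range(n):
        op = rng.choice(("push", "pop", "multipop"))
        if op == "pop" and stack.items:
            total += stack.pop()
        elif op == "multipop":
            total += stack.multipop(rng.randint(1, n))
        else:
            total += stack.push(i)   # also used when a Pop would hit an empty stack
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, random_sequence_cost(n, seed=1))   # never exceeds 2n
```

Although a single MultiPop may cost up to n here, the measured totals stay at most $2n$, because every element is pushed once and removed at most once.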
Note that the aggregate method charges the same amortized cost to each operation in the sequence, even if the sequence contains different types of operations. The other two methods can assign individual amortized costs to each type of operation.

For the stack we argue as follows. In any sequence of n operations on an initially empty stack, each object can be popped at most once for each time it is pushed. Therefore, the number of times Pop can be called on a nonempty stack (including calls within MultiPop) is at most the number of Push operations, which is at most n. Hence, for any value of n, any sequence of n Push, Pop, and MultiPop operations takes a total of $O(n)$ time, and the amortized cost of any operation is $O(n)/n = O(1)$.

The accounting method

In the accounting method we assign differing charges to different operations, with some operations charged more or less than they actually cost. The amount we charge an operation is called its amortized cost.

When an operation's amortized cost exceeds its actual cost, the difference is assigned to specific objects in the data structure as credit. Credit can then be used later to pay for operations whose amortized cost is less than their actual cost. One must choose the amortized costs carefully. If we want the analysis with amortized costs to show that in the worst case the average cost per operation is small, then the total amortized cost of a sequence of operations must be an upper bound on the total actual cost of the sequence. Moreover, this relationship must hold for all sequences of operations; thus the total credit must be nonnegative at all times.

Let us return to our stack example. The actual costs are:
  Pop(S): 1
  Push(S,x): 1
  MultiPop(S,k): min(k,s), where s is the size of the stack.
Let us assign the following amortized costs:
  Pop(S): 0
  Push(S,x): 2
  MultiPop(S,k): 0
Note that the amortized cost of each operation is $O(1)$, and hence the amortized cost of n operations is $O(n)$. Note also that the actual cost of MultiPop is variable, whereas its amortized cost is constant. Using the same argument as in the aggregate method, it is easy to see that, starting with an empty stack, our account never becomes negative: each Push operation pays 2 credits, one for its own cost and one for the cost of later popping the element off the stack, either through a normal Pop or through a MultiPop operation. Please note that this method can assign individual, different amortized costs to each operation.

The potential method

Instead of representing prepaid work as credit stored with specific objects in the data structure, the potential method represents the prepaid work as potential energy, or simply potential, that can be released to pay for future operations. The potential is associated with the data structure as a whole rather than with specific objects within the data structure. It works as follows. We start with an initial data structure $D_0$, on which n operations are performed.
For each $i = 1, 2, \ldots, n$, we let $c_i$ be the actual cost of the $i$-th operation and $D_i$ be the data structure that results after applying the $i$-th operation to $D_{i-1}$. A potential function $\Phi$ maps each data structure $D_i$ to a real number $\Phi(D_i)$, which is the potential associated with the data structure $D_i$. The amortized cost $\hat{c}_i$ of the $i$-th operation with respect to the potential function $\Phi$ is defined by
$$\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}),$$
that is, the amortized cost is the actual cost plus the increase in potential due to the operation. Hence the total amortized cost of the n operations is
$$\sum_{i=1}^{n} \hat{c}_i = \sum_{i=1}^{n} \big(c_i + \Phi(D_i) - \Phi(D_{i-1})\big) = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0).$$
If we can define a potential function $\Phi$ such that $\Phi(D_n) \ge \Phi(D_0)$, then the total amortized cost $\sum_{i=1}^{n} \hat{c}_i$ is an upper bound on the total actual cost. In practice we do not know how many operations might be performed, and therefore

guarantee that $\Phi(D_i) \ge \Phi(D_0)$ for all $i$. It is often convenient to define $\Phi(D_0) = 0$ and then show that all other potentials are nonnegative.

Let's illustrate the method using our stack example. We define the potential function on the stack as the number of elements it contains. Hence the empty stack $D_0$ has $\Phi(D_0) = 0$. Since the number of objects on the stack is never negative, we have $\Phi(D_i) \ge 0 = \Phi(D_0)$ for all stacks $D_i$ resulting after the $i$-th operation. Let us now compute the amortized costs of the various stack operations. If the $i$-th operation on a stack containing $s$ objects is a Push operation, then the potential difference is
$$\Phi(D_i) - \Phi(D_{i-1}) = (s+1) - s = 1. \tag{1.11}$$
Hence the amortized cost is $\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) = 1 + 1 = 2$.

If the $i$-th operation on a stack containing $s$ objects is MultiPop(S,k), then $k' = \min(k, s)$ objects are popped off the stack. The actual cost of the operation is $k'$, and the potential difference is
$$\Phi(D_i) - \Phi(D_{i-1}) = (s - k') - s = -k'. \tag{1.12}$$
Hence the amortized cost is $\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) = k' - k' = 0$. A similar result is obtained for Pop. This shows that the amortized cost of each operation is $O(1)$, and hence the total amortized cost of a sequence of n operations is $O(n)$. Since $\Phi(D_n) \ge \Phi(D_0)$, the total actual cost is bounded by the total amortized cost and is therefore $O(n)$ as well.
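The potential-method calculation above can be replayed mechanically. The following Python sketch is illustrative only (the function name `potential_trace`, the random mix of operations, and the MultiPop parameter range 1 to 5 are assumptions): it runs random Push, Pop and MultiPop operations from the empty stack $D_0$, takes $\Phi(D)$ to be the number of elements on the stack, and records for every operation its actual cost $c_i$ and its amortized cost $\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})$.

```python
import random
from typing import List, Tuple


def potential_trace(n: int, seed: int) -> Tuple[List[Tuple[str, int, int]], int, int]:
    """Run n random stack operations starting from the empty stack D_0.

    Returns (log, total actual cost, total amortized cost), where each log entry is
    (operation, c_i, c_i + Phi(D_i) - Phi(D_{i-1})) and Phi(D) = number of elements.
    """
    rng = random.Random(seed)
    stack: List[int] = []
    log: List[Tuple[str, int, int]] = []
    total_actual = total_amortized = 0
    for i in range(n):
        phi_before = len(stack)                     # Phi(D_{i-1})
        op = rng.choice(("push", "pop", "multipop"))
        if op == "pop" and not stack:
            op = "push"                             # assumption: never Pop an empty stack
        if op == "push":
            stack.append(i)
            cost = 1
        elif op == "pop":
            stack.pop()
            cost = 1
        else:
            k = rng.randint(1, 5)
            cost = min(k, len(stack))               # k' = min(k, s) elements are removed
            del stack[len(stack) - cost:]
        amortized = cost + len(stack) - phi_before  # c_i + Phi(D_i) - Phi(D_{i-1})
        log.append((op, cost, amortized))
        total_actual += cost
        total_amortized += amortized
    return log, total_actual, total_amortized


if __name__ == "__main__":
    log, actual, amortized = potential_trace(1000, seed=7)
    print(sorted({(op, a) for op, _, a in log}))    # amortized cost per operation type
    print(actual, amortized)                       # actual <= amortized <= 2 * 1000
```

Every Push shows amortized cost 2 and every Pop or MultiPop amortized cost 0, matching (1.11) and (1.12). Because $\Phi(D_0) = 0$ and $\Phi(D_n) \ge 0$, the returned total amortized cost always bounds the total actual cost from above, mirroring the telescoping sum.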