CS 361, Lecture 28
Jared Saia
University of New Mexico

Outline
Administrative
Master Theorem
Search Engines

Administrative

Project Deliverables
Each group should turn in one group project consisting of:
About 6-12 pages of text (can be longer with an appendix)
6-12 figures (please put multiple figures on one page where possible)
A task list giving which tasks were performed by which group members. This should be signed by all members of the group.

Project Deliverables
Each individual should turn in the following:
An evaluation of the contribution to the group project of every member of your group on a scale of 1 (poor) to 10 (excellent). To do this, write down the name of each member of your group (besides yourself), and put a number between 1 and 10 next to each name. Please be honest and professional in your evaluation. Your evaluation will be one factor used to determine the project grade for each member of your group.

Project Comments
The project is due this Thursday in class. This deadline is strict: late projects will receive no credit. (A partially completed project turned in on time will get some credit, but a complete project turned in late will get no credit.)
To give you more time to work on the project, I'm not going to have a homework due on Thursday.
Before you turn in the project: reread the "Top Project Mistakes" section in the project description on the course web page. You will lose points if you make these same mistakes.
Master Method
The recurrence $T(n) = aT(n/b) + f(n)$ can be solved as follows:
If $a f(n/b) \le f(n)/K$ for some constant $K > 1$, then $T(n) = \Theta(f(n))$.
If $a f(n/b) \ge K f(n)$ for some constant $K > 1$, then $T(n) = \Theta(n^{\log_b a})$.
If $a f(n/b) = f(n)$, then $T(n) = \Theta(f(n) \log_b n)$.

Proof
If $f(n)$ is a constant factor larger than $a f(n/b)$, then the sum is a descending geometric series. The sum of any geometric series is a constant times its largest term. In this case, the largest term is the first term, $f(n)$.
If $f(n)$ is a constant factor smaller than $a f(n/b)$, then the sum is an ascending geometric series. Again the sum is a constant times its largest term; in this case, that is the last term, which by our earlier argument is $\Theta(n^{\log_b a})$.
Finally, if $a f(n/b) = f(n)$, then each of the $L$ terms in the summation is equal to $f(n)$, so the sum is $L \cdot f(n) = \Theta(f(n) \log_b n)$, since the recursion tree has $L = \Theta(\log_b n)$ levels.

Example
$T(n) = T(n/2) + n \log n$
If we write this as $T(n) = aT(n/b) + f(n)$, then $a = 1$, $b = 2$, $f(n) = n \log n$.
Here $a f(n/b) = (n/2) \log(n/2)$ is smaller than $f(n) = n \log n$ by a constant factor, so $T(n) = \Theta(n \log n)$.

Last In-Class Exercise
Consider the recurrence: $T(n) = 2T(n/4) + n \log n$
If we write this as $T(n) = aT(n/b) + f(n)$, then $a = 2$, $b = 4$, $f(n) = n \log n$.
$f(n) = n \log n$ and $a f(n/b) = (n/2) \log(n/4)$.
$a f(n/b)$ is a constant factor smaller than $f(n)$ (it is at most $f(n)/2$), so the root node dominates.
Thus the solution is $T(n) = \Theta(n \log n)$.

In-Class Exercise
Consider the recurrence: $T(n) = 4T(n/2) + n^2$
Q: What are $f(n)$ and $a f(n/b)$?
Q: Which of the three cases does the recurrence fall under (when $n$ is large)?
Q: What is the solution to this recurrence?

Limits of Master Theorem
Consider the recurrence: $T(n) = 2T(n/2) + n/\log n$
We can't apply the Master Theorem here: $a f(n/b) = n/(\log n - 1)$ isn't equal to $n/\log n$, but the difference isn't a constant factor either (the ratio $\log n/(\log n - 1)$ tends to 1 as $n$ grows).
We can use the recursion tree method instead.
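The three cases above all compare one level of the recursion tree with the level above it, via the ratio $a f(n/b)/f(n)$. A minimal Python sketch of that comparison follows, assuming $f$ is supplied as a Python function of $n$. The helpers `level_ratio` and `guess_case` are hypothetical names introduced for illustration, and the fixed margin of 0.1 stands in for the constant $K$. Because it only samples the ratio at a few large powers of two, it gives a numerical hint about which case applies, not a proof.

```python
import math
from typing import Callable, Iterable


def level_ratio(a: float, b: float, f: Callable[[float], float], n: float) -> float:
    """Cost of the level below divided by the level above: a*f(n/b) / f(n)."""
    return a * f(n / b) / f(n)


def guess_case(a: float, b: float, f: Callable[[float], float],
               exponents: Iterable[int] = range(20, 61, 10),
               margin: float = 0.1) -> str:
    """Numerical hint (not a proof) for T(n) = a T(n/b) + f(n).

    Samples the ratio at n = 2**k; `margin` plays the role of the constant K.
    """
    ratios = [level_ratio(a, b, f, 2.0 ** k) for k in exponents]
    if all(math.isclose(r, 1.0, rel_tol=1e-12) for r in ratios):
        return "balanced"      # a f(n/b) = f(n): Theta(f(n) log_b n)
    if max(ratios) <= 1 - margin:
        return "root"          # root dominates: Theta(f(n))
    if min(ratios) >= 1 + margin:
        return "leaves"        # leaves dominate: Theta(n^(log_b a))
    return "inconclusive"      # ratio near 1 but never equal to 1


lg = math.log2
print(guess_case(2, 4, lambda n: n * lg(n)))  # root  (Last In-Class Exercise)
print(guess_case(1, 2, lambda n: n * lg(n)))  # root  (Example)
print(guess_case(2, 2, lambda n: n / lg(n)))  # inconclusive  (Limits of Master Theorem)
```

For $T(n) = 2T(n/2) + n/\log n$ the sampled ratios are $\log n/(\log n - 1)$, which drift toward 1 without ever equaling it, so the sketch reports "inconclusive", matching the point that none of the three cases applies.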
Limits of Master Theorem
Consider the recurrence: $T(n) = 2T(n/2) + n/\log n$
It is not hard to see that the sum of the $i$-th level is $n/(\log n - i)$: level $i$ has $2^i$ subproblems of size $n/2^i$, each costing $(n/2^i)/\log(n/2^i) = (n/2^i)/(\log n - i)$.
The depth of the tree is $\log n$.

$$
\begin{aligned}
T(n) &= \sum_{i=0}^{\log n - 1} \frac{n}{\log n - i} && (1)\\
     &= \sum_{j=1}^{\log n} \frac{n}{j} && (2)\\
     &= n \sum_{j=1}^{\log n} \frac{1}{j} && (3)\\
     &= \Theta(n \log \log n) && (4)
\end{aligned}
$$

Step (2) substitutes $j = \log n - i$, and step (4) uses the fact that the harmonic sum $\sum_{j=1}^{m} 1/j$ is $\Theta(\log m)$, here with $m = \log n$.

Take Away
The Master Theorem is a fast way to solve certain types of recurrences. However, it is not as powerful as the recursion tree method.
When you see an appropriate recurrence, first try the Master Theorem; if this doesn't work, try the recursion tree method or annihilators.

The Deep End of the Sandbox
"That's the deep end of the sandbox. I don't like to go there. A leprechaun lives there and he tells me to burn things." - Ralph from The Simpsons
Beyond 361:
CS461: More advanced data structures and algorithms, different classes of algorithms (greedy, dynamic programming), advanced mathematical analysis of algorithms (amortization, proofs of correctness, graph theory)
Approximation Algorithms, Randomized Algorithms, Computational Geometry, Computational Biology, Graph Theoretic Algorithms, Cryptography, Online Algorithms, Experimental Algorithms, Smoothed Analysis, Complexity Theory, etc.

Example Algorithmic Application
Modern search engines are beginning to use algorithmic tools.
Google and Clever are two good search engines that use algorithmic techniques.
Their ranking algorithms are based on linear algebra (they make use of the eigenvectors of a certain type of matrix).
These algorithms could not have been developed without rigorous mathematical analysis.

The Two Types of Search Queries
Two types of search engine queries:
Specific Queries: e.g., "Does Netscape support the JDK 1.1 code-signing API?"
General Queries: e.g., "Find information about Java"

The Abundance Problem
For General Queries, we have the Abundance Problem: the number of relevant pages is far too large for a human to digest. More specifically:
There are now hundreds of millions of pages on the web.
For most search queries, the number of pages that contain the query words is on the order of thousands.
A person typically wants to look at fewer than five pages.
What We Want
One way to handle the abundance problem: filter out, from the huge set of relevant pages, those few pages which are most authoritative or definitive.
To do this, we need a notion of what makes a page authoritative.

An Example Problem
If we search for all pages containing the word "Microsoft", we get back over a million pages.
There is nothing about the content of the page www.microsoft.com that makes it stand out:
This page doesn't use the term "microsoft" most frequently.
This page doesn't use the term "microsoft" most prominently.
There is nothing completely unique in the textual content of the page that identifies it as being authoritative.
What is it that makes the page www.microsoft.com most authoritative?

Link Analysis
Link structure is one external measure of the authority of a page.
If page x links to page y, x is conferring authority on page y.
There are some potential pitfalls here:
Links don't always confer authority (e.g., they could be for navigation or to ads).
There is a balance between popularity and relevance.

Crash Course in Graph Theory
A graph is:
A set of nodes, V, and
A set of edges, E, where each edge is directed from some node in V to some other node in V.
Note: A tree is just a connected graph without cycles.
There are many useful algorithms known for graphs, so it's useful to frame problems in terms of graph theory where possible.

The Web as a Graph
Let each page in the web be a node of a graph.
Let there be an edge from node x to node y iff there is a link from page x to page y.
This formulation allows us to bring lots of great tools of graph theory to bear on the problem of link analysis.

Popularity and Relevance
Consider the simple heuristic for returning authoritative pages: of all pages containing the search term, return those which have the greatest number of in-links.
This heuristic fails:
Many authoritative pages for a search term do not contain that term. Examples: search engines, automobile manufacturers.
Many pages which are not authoritative for a given term would be incorrectly identified. Examples: www.yahoo.com and www.microsoft.com would be considered authoritative for any search term these pages contained.
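The web-as-a-graph formulation and the in-link heuristic can be sketched in a few lines of Python. This is a minimal illustration, assuming the graph is stored as a dictionary mapping each page to the pages it links to. The function names, page names, link structure, and the set of pages "containing the term" are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

# Adjacency list: page -> pages it links to (edge x -> y iff x links to y).
Graph = Dict[str, List[str]]


def in_degrees(graph: Graph) -> Counter:
    """Number of distinct in-links of every page."""
    counts: Counter = Counter({page: 0 for page in graph})
    for targets in graph.values():
        for dst in set(targets):      # count each distinct link x -> y once
            counts[dst] += 1
    return counts


def naive_authorities(graph: Graph, contains_term: Set[str], k: int) -> List[str]:
    """The simple heuristic: of the pages containing the term,
    return the k with the most in-links (ties broken by name)."""
    deg = in_degrees(graph)
    return sorted(contains_term, key=lambda p: (-deg[p], p))[:k]


# Hypothetical toy web.
web: Graph = {
    "hub-a": ["engine-1", "engine-2"],
    "hub-b": ["engine-1", "engine-2", "portal"],
    "fan-page": ["portal"],
    "blog": ["portal", "engine-1"],
    "engine-1": [],
    "engine-2": ["portal"],
    "portal": [],
}
# Hypothetical: pages whose text contains the query term "search engine".
pages_with_term = {"hub-a", "portal", "blog"}

print(naive_authorities(web, pages_with_term, 2))  # ['portal', 'blog']
```

The two search-engine pages have several in-links but are never returned, because their text does not contain the query term, while the popular portal is ranked first: both failure modes listed above show up in one small example.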
The Clever Approach
Key observation: certain pages point mainly to sites that are authoritative. These pages are called good Hubs.
Examples: www.yahoo.com, Jared's Big Page Of Links To Cool Sites
Question: But then how do we define what makes a page a good Hub?
The Clever algorithm and Google solve this problem.

The Clever Approach
We use a mutually recursive definition of the notions of Authoritative and Hub-like:
Good Hub pages point to many good Authoritative pages.
Good Authoritative pages are pointed to by many good Hub pages.

The Clever Algorithm
High-level idea:
Clever collects all pages containing the search term, or pages that are neighbors of pages containing the search term.
It then iteratively propagates authority scores and hub scores.
All pages start with equal authority and hub scores.
In each step:
The pages with hub weight confer authority weight on the pages they point to.
The pages with authority weight confer hub weight on the pages that point to them.
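The two update rules above can be sketched as a power-iteration loop in the style of Kleinberg's hubs-and-authorities (HITS) iteration. This is not Clever's actual implementation: the sketch skips the step that collects pages containing the search term and their neighbors, takes the resulting graph as given, and adds a normalization step (an assumption) so the scores do not grow without bound. The function name and the toy graph are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]   # page -> pages it links to


def hubs_and_authorities(graph: Graph, steps: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """HITS-style iteration (a sketch): returns (hub, authority) scores."""
    pages = set(graph) | {y for ys in graph.values() for y in ys}
    hub = {p: 1.0 for p in pages}      # all pages start with equal scores
    auth = {p: 1.0 for p in pages}
    for _ in range(steps):
        # Pages with hub weight confer authority weight on the pages they point to.
        auth = {p: 0.0 for p in pages}
        for x, ys in graph.items():
            for y in ys:
                auth[y] += hub[x]
        # Pages with authority weight confer hub weight on the pages that point to them.
        hub = {p: 0.0 for p in pages}
        for x, ys in graph.items():
            for y in ys:
                hub[x] += auth[y]
        # Assumed normalization step (Euclidean norm) so scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical toy web.
web: Graph = {
    "hub-a": ["engine-1", "engine-2"],
    "hub-b": ["engine-1", "engine-2", "portal"],
    "blog": ["portal"],
}
hub, auth = hubs_and_authorities(web)
# engine-1 and engine-2 share the top authority score; portal comes next.
print(sorted(auth.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```

With normalization, the authority scores converge to the principal eigenvector of $A^T A$, where $A$ is the adjacency matrix of the graph (and the hub scores to that of $A A^T$). This is one way to read the earlier remark that these ranking algorithms make use of the eigenvectors of a certain type of matrix.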