The Role of Science and Mathematics in Software Development
Robert Sedgewick
Princeton University

The scientific method is essential in applications of computation
A personal opinion formed on the basis of decades of experience as a
  CS educator
  author
  algorithm designer
  software engineer
  Silicon Valley contributor
  CS researcher
Personal opinion... or unspoken consensus?

Unfortunate facts
Many scientists lack basic knowledge of computer science
Many computer scientists lack basic knowledge of science
  1970s: Want to use the computer? Take intro CS
  2000s: Intro CS course relevant only to future cubicle-dwellers
One way to address the situation
  identify fundamentals
  teach them to all students who need to know them
  as early as possible

One way to address the situation
Teach the same course to all science/engineering students
www.cs.princeton.edu/introcs
All students learn the importance of
  modern programming models
  the scientific method in understanding program behavior
  fundamental precepts of computer science
  computation in a broad variety of applications
  preparing for a lifetime of engaging with computation
Science/engineering students at Princeton take the same intro CS course, most in the first year
modern programming model
  Basic control structures
  Standard input and output streams
  Drawings, images and sound
  Data abstraction
  Use any computer, and the web
relevant CS concepts
  Understanding of the costs
  Fundamental data types
  Computer architecture
  Computability and Intractability
Applications programming
Examples and assignments use familiar easy-to-motivate applications
Goals
  demystify computer systems
  empower students to exploit computation
  build awareness of intellectual underpinnings of CS

Ideal programming example/assignment
  teaches a basic CS concept
  solves an important problem
  appeals to students' intellectual interest
  illustrates modular programming
Examples: bouncing ball, N-body, Bose-Einstein
  simulation is easy
  OOP is helpful
  data-driven programs are useful
  efficient algorithms are necessary

Underlying message: performance matters in a large number of interesting applications
Simple fact: quadratic algorithms are useless in modern applications
  millions or billions of inputs
  10^12 nanoseconds is 15+ minutes
  10^18 nanoseconds is 31+ years
Applications
  Web commerce
  Bose-Einstein model
  String matching for genomics
  Natural language analysis
  N-body problem
  [ long list ]
Lessons:
  1. Efficient algorithms enable solution of problems that could not otherwise be addressed
  2. Scientific method is essential in understanding program performance
Important lessons for
  beginners
  software engineers
  scientists
  [everyone]

The scientific method is essential in understanding program performance
Scientific method
  create a model describing natural world
  use model to develop hypotheses
  run experiments to validate hypotheses
  refine model and repeat
  (cycle: model, hypothesis, experiment)
  1950s: uses scientific method
  2000s: uses scientific method?
Simple test: Doubling hypothesis
  Perform experiments, measure T(N) and T(2N)
  if T(2N)/T(N) ~ 4, need another algorithm
Algorithm designer who does not experiment gets lost in abstraction
Software developer who ignores cost risks catastrophic consequences
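The doubling hypothesis above can be tried without a stopwatch. The following is a minimal sketch, assuming that the number of inner-loop steps is an acceptable stand-in for running time T(N); the class and method names (DoublingTest, quadraticSteps, doublingRatio) are hypothetical and not from the talk.

```java
// Illustrative only: counts steps of a brute-force all-pairs loop
// instead of measuring wall-clock time.
class DoublingTest {
    // Number of inner-loop steps a brute-force all-pairs scan performs on n items.
    static long quadraticSteps(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                steps++;                      // one "unit of work" per pair
        return steps;
    }

    // Ratio T(2N)/T(N) for the step count above.
    static double doublingRatio(int n) {
        return (double) quadraticSteps(2 * n) / quadraticSteps(n);
    }

    public static void main(String[] args) {
        // A ratio near 4 signals quadratic growth: time to look for another algorithm.
        for (int n = 250; n <= 4000; n *= 2)
            System.out.println(n + " " + doublingRatio(n));
    }
}
```

With real programs the same loop would wrap a timer around the code under study; the ratio, not the absolute time, is what the hypothesis is about.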
Preliminary hypothesis (needs checking)
Modern software development requires huge amounts of code
but performance-critical code implements relatively few fundamental algorithms

Warmup: random number generation
Problem: write a program to generate random numbers
  model: classical probability and statistics
  hypothesis: frequency values should be uniform
  weak experiment:
    generate random numbers
    check for uniform frequencies
  better experiment:
    generate random numbers
    use $\chi^2$ test to check frequency values against uniform distribution
  better hypotheses/experiments still needed
    many documented disasters
    active area of scientific research
    applications: simulation, cryptography
    connects to core issues in theory of computation

  int k = 0;
  while (true)
    System.out.print(k++ % V);

  V = 10:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 ...   random?

  int k = 0;
  while (true)
  { k = k*1664525 + 1013904223;
    System.out.print(k % V);
  }

  textbook algorithm that flunks $\chi^2$ test

Warmup (continued)
Q. Is a given sequence of numbers random?
A. No.
Q. Does a given sequence exhibit some property that random number sequences exhibit?
Birthday paradox
  Average count of random numbers generated until a duplicate happens is about $\sqrt{\pi V/2}$
Example of a better experiment:
  generate numbers until duplicate
  check that count is close to $\sqrt{\pi V/2}$
  even better: repeat many times, check against distribution
  still better: run many similar tests for other properties
V = 365: average probes until duplicate is about 24

"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin" (John von Neumann)
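The birthday-paradox experiment described above can be sketched as follows. This is an illustration under stated assumptions, not the talk's code: the class BirthdayTest and its methods are hypothetical names, java.util.Random stands in for "a generator under test", and the counter generator mirrors the k++ % V loop from the slide.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.function.IntSupplier;

class BirthdayTest {
    // Number of values drawn from gen until one repeats (the repeat itself is counted).
    static int drawsUntilDuplicate(IntSupplier gen) {
        HashSet<Integer> seen = new HashSet<>();
        int draws = 0;
        while (true) {
            draws++;
            if (!seen.add(gen.getAsInt())) return draws;
        }
    }

    // Average of drawsUntilDuplicate over many independent trials.
    static double averageDraws(IntSupplier gen, int trials) {
        long total = 0;
        for (int i = 0; i < trials; i++) total += drawsUntilDuplicate(gen);
        return (double) total / trials;
    }

    // The counter "generator" from the slide: k++ % V. Its frequencies are uniform.
    static IntSupplier counter(int V) {
        int[] k = {0};
        return () -> k[0]++ % V;
    }

    public static void main(String[] args) {
        int V = 365;
        Random rnd = new Random(42);
        System.out.println("predicted, about " + Math.sqrt(Math.PI * V / 2));
        System.out.println("java.util.Random: " + averageDraws(() -> rnd.nextInt(V), 20000));
        System.out.println("counter k++ % V:  " + averageDraws(counter(V), 100));
    }
}
```

The counter passes a uniform-frequency check yet always needs V + 1 draws to repeat, far from the predicted value, which is why the birthday test is the better experiment.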
Detailed example: paths in graphs
A lecture within a lecture

Finding an st-path in a graph is a fundamental operation that demands understanding
Ground rules for this talk
  work in progress (more questions than answers)
  basic research
  save "deep dive" for the right problem
Applications
  graph-based optimization models
  networks
  percolation
  computer vision
  social networks
  (many more)
Basic research
  fundamental abstract operation with numerous applications
  worth doing even if no immediate application
  resist temptation to prematurely study impact

Maxflow
Ford-Fulkerson maxflow scheme
  find any s-t path in a (residual) graph
  augment flow along path (may create or delete edges)
  iterate until no path exists
Goal: compare performance of two basic implementations
  shortest augmenting path
  maximum capacity augmenting path
Key steps in analysis
  How many augmenting paths? (research literature)
  What is the cost of finding each path? (this talk)

Max flow
Compare performance of Ford-Fulkerson implementations
  shortest augmenting path
  maximum-capacity augmenting path
Graph parameters
  number of vertices V
  number of edges E
  maximum capacity C
How many augmenting paths?
                  worst case upper bound
  shortest        VE/2, VC
  max capacity    2E lg C
How many steps to find each path? E (worst-case upper bound)
Max flow
Compare performance of Ford-Fulkerson implementations
  shortest augmenting path
  maximum-capacity augmenting path
Graph parameters for example graph
  number of vertices V = 177
  number of edges E = 2000
  maximum capacity C = 100
How many augmenting paths?
                  worst case upper bound    for example        actual
  shortest        VE/2, VC                  177,000, 17,700    37
  max capacity    2E lg C                   26,575             7
How many steps to find each path?
  2000 (worst-case upper bound)
  < 20, on average
Total is a factor of 1 million high for thousand-node graphs!

Max flow
Compare performance of Ford-Fulkerson implementations
  shortest augmenting path
  maximum-capacity augmenting path
Graph parameters
  number of vertices V
  number of edges E
  maximum capacity C
Total number of steps?
                  worst case upper bound
  shortest        $VE^2/2$, $VEC$
  max capacity    $2E^2 \lg C$
WARNING: The Algorithm General has determined that using such results to predict performance or to compare algorithms may be hazardous

Lessons
Goals of algorithm analysis
  predict performance (running time)
  guarantee that cost is below specified bounds
Common wisdom
  random graph models are unrealistic
  average-case analysis of algorithms is too difficult
  worst-case performance bounds are the standard
Unfortunate truth about worst-case bounds
  often useless for prediction (fictional)
  often useless for guarantee (too high)
  often misused to compare algorithms
Bounds are useful in some applications: which ones??
[Figure: worst-case bounds versus actual costs]
Open problem: Do better!
Surely, we can do better
An actual exchange with a theoretical computer scientist:
  TCS (in a talk): Algorithm A is bad. Google should be interested in my new Algorithm B.
  RS: What's the matter with Algorithm A?
  TCS: It is not optimal. It has an extra O(log log N) factor.
  RS: But Algorithm B is very complicated, lg lg N is less than 6 in this universe, and that is just an upper bound. Algorithm A is certainly going to run 10 to 100 times faster in any conceivable real-world situation. Why should Google care about Algorithm B?
  TCS: Well, I like it. I don't care about Google.

Finding an st-path in a graph is a basic operation in a great many applications
Q. What is the best way to find an st-path in a graph?
A. Several well-studied textbook algorithms are known
  Breadth-first search (BFS) finds the shortest path
  Depth-first search (DFS) is easy to implement
  Union-Find (UF) needs two passes
BUT
  all three process all E edges in the worst case
  diverse kinds of graphs are encountered in practice
Worst-case analysis is useless for predicting performance
Which basic algorithm should a practitioner use? ??

Algorithm performance depends on the graph model
  complete
  random
  grid
  neighbor
  small-world
  (many appropriate candidates)
Initial choice: grid graphs
  sufficiently challenging to be interesting
  found in practice (or similar to graphs found in practice)
  scalable
  potential for analysis
Ground rules
  algorithms should work for all graphs
  algorithms should not use any special properties of the model
Ex: easy to find short paths quickly with A* in geometric graphs (stay tuned)

Applications of grid graphs
  conductivity, concrete, granular materials, porous media, polymers, forest fires, epidemics, Internet, resistor networks, evolution, social influence, Fermi paradox, fractal geometry, stereo vision, image restoration, object segmentation, scene reconstruction
Example 1: Percolation
  widely-studied model
  few answers from analysis
  arbitrarily huge graphs
Example 2: Image processing
  model pixels in images
  DFS, maxflow/mincut, and other algs
  huge graphs
Finding an st-path in a grid graph
M by M grid of vertices
undirected edges connecting each vertex to its HV neighbors
source vertex s at center of top boundary
destination vertex t at center of bottom boundary
Find any path connecting s to t
$M^2$ vertices, about $2M^2$ edges
Cost measure: number of graph edges examined

  M      vertices   edges
  7      49         84
  15     225        420
  31     961        1860
  63     3969       7812
  127    16129      32004
  255    65025      129540
  511    261121     521220

Finding an st-path in a grid graph
Similar problems are covered extensively in the literature
  Percolation
  Random walk
  Nonselfintersecting paths in grids
  Graph covering
Elementary algorithms are found in textbooks
  Depth-first search (DFS)
  Breadth-first search (BFS)
  Union-find
Which basic algorithm should a practitioner use to find a path in a grid-like graph? ??
Literature is no help, so
  Implement elementary algorithms
  Use scientific method to study performance

Data abstraction: a modern tool to separate clients from implementations
Implementing a GRAPH data type is an exercise in software engineering
A data type is a set of values and the operations performed on them
An abstract data type (ADT) is a data type whose representation is hidden
An applications programming interface (API) is a specification
  Clients: invoke operations
  Interface: API specifies how to invoke operations
  Implementations: code that implements operations
Implementations should not be tailored to particular clients
Develop implementations that work properly for all clients
Study their performance for the client at hand

Sample design pattern (for this talk)
GRAPH API
  public class GRAPH
    GRAPH(Edge[] a)                construct a GRAPH from an array of edges
    void findPath(int s, int t)    conduct a search from s to t
    int st(int v)                  return predecessor of v on path found
Vertices are integers in [0, V)
Edges are vertex pairs

Client code for grid graphs
  int e = 0;
  Edge[] a = new Edge[E];
  for (int i = 0; i < V; i++)
  { if (i < V-M) a[e++] = new Edge(i, i+M);
    if (i >= M) a[e++] = new Edge(i, i-M);
    if ((i+1) % M != 0) a[e++] = new Edge(i, i+1);
    if (i % M != 0) a[e++] = new Edge(i, i-1);
  }
  GRAPH G = new GRAPH(a);
  G.findPath(V-1-M/2, M/2);
  for (int k = t; k != s; k = G.st(k))
    System.out.println(k + "-" + G.st(k));

M = 5
  20 21 22 23 24
  15 16 17 18 19
  10 11 12 13 14
   5  6  7  8  9
   0  1  2  3  4
Three standard ways to find a path
Depth-first search (DFS): recursive (stack-based) search
Breadth-first search (BFS): queue-based shortest-path search
Union-find (UF): use classic set-equivalence algorithms

  DFS
    DFS(s)
    DFS(v):
      done if v = t
      if v unmarked
        mark v
        DFS(w) for each w adj to v

  BFS
    put s on Q
    while Q is nonempty
      get x from Q
      done if x = t
      for each v adj to x
        if v unmarked
          put v on Q
          mark v

  UF
    for each edge u-v
      union (u, v)
      done if s and t are in the same set
    run DFS or BFS on set containing s and t

First step: Implement GRAPH using each algorithm

Depth-first search: a standard implementation
GRAPH constructor code
  for (int k = 0; k < E; k++)
  { int v = a[k].v, w = a[k].w;
    adj[v] = new Node(w, adj[v]);
    adj[w] = new Node(v, adj[w]);
  }
graph representation
  vertex-indexed array of linked lists
  two nodes per edge
DFS implementation (code to save path omitted)
  void findPathR(int s, int t)
  { if (s == t) return;
    visited[s] = true;
    for (Node x = adj[s]; x != null; x = x.next)
      if (!visited[x.v]) findPathR(x.v, t);
  }
  void findPath(int s, int t)
  { visited = new boolean[V];
    findPathR(s, t);
  }
[Figure: example on a 3-by-3 grid with vertices 0 to 8]

Basic flaw in standard DFS scheme
cost strongly depends on arbitrary decision in client (!!)
  for (int i = 0; i < V; i++)
  { if ((i+1) % M != 0) a[e++] = new Edge(i, i+1);
    if (i % M != 0) a[e++] = new Edge(i, i-1);
    if (i < V-M) a[e++] = new Edge(i, i+M);
    if (i >= M) a[e++] = new Edge(i, i-M);
  }
  west, east, north, south:   ~E/2
  south, north, east, west:   ~$E^{1/2}$
order of these statements determines order in lists
order in lists has drastic effect on running time
bad news for ANY graph model

Addressing the basic flaw
Advise the client to randomize the edges?
  no, very poor software engineering
  leads to nonrandom edge lists (!)
Randomize each edge list before use?
  no, may not need the whole list
Solution: Use a randomized iterator
standard iterator
  int N = adj[x].length;
  for (int i = 0; i < N; i++)
  { process vertex adj[x][i]; }
randomized iterator
  int N = adj[x].length;
  for (int i = 0; i < N; i++)
  { exch(adj[x], i, i + (int) (Math.random()*(N-i)));
    process vertex adj[x][i];
  }
  exchange random vertex from adj[x][i..N-1] with adj[x][i]
  (note the parentheses: (int) Math.random()*(N-i) would always be 0, because the cast binds before the multiplication)
represent graph with arrays, not lists
Use of randomized iterators turns every graph algorithm into a randomized algorithm
Important practical effect: stabilizes algorithm performance
  cost depends on problem, not its representation
Yields well-defined and fundamental analytic problems
  Average-case analysis of algorithm X for graph family Y(N)?
  Distributions?
  Full employment for algorithm analysts

(Revised) standard DFS implementation
graph ADT constructor code
  for (int k = 0; k < E; k++)
  { int v = a[k].v, w = a[k].w;
    adj[v][deg[v]++] = w;
    adj[w][deg[w]++] = v;
  }
graph representation: vertex-indexed array of variable-length arrays
DFS implementation (code to save path omitted)
  void findPathR(int s, int t)
  { int N = adj[s].length;
    if (s == t) return;
    visited[s] = true;
    for (int i = 0; i < N; i++)
    { int v = exch(adj[s], i, i + (int) (Math.random()*(N-i)));
      if (!visited[v]) findPathR(v, t);
    }
  }
  void findPath(int s, int t)
  { visited = new boolean[V];
    findPathR(s, t);
  }
[Figure: example on a 3-by-3 grid with vertices 0 to 8]

BFS: standard implementation
Use a queue to hold fringe vertices
  put s on Q
  while Q is nonempty
    get x from Q
    done if x = t
    for each unmarked v adj to x
      put v on Q
      mark v
[Figure legend: tree vertex, fringe vertex, unseen vertex]
  void findPath(int s, int t)
  { Queue Q = new Queue();              // FIFO queue for BFS
    Q.put(s); visited[s] = true;
    while (!Q.empty())
    { int x = Q.get();
      int N = adj[x].length;
      if (x == t) return;
      for (int i = 0; i < N; i++)       // randomized iterator
      { int v = exch(adj[x], i, i + (int) (Math.random()*(N-i)));
        if (!visited[v])
        { Q.put(v); visited[v] = true; }
      }
    }
  }
Generalized graph search: other queues yield DFS, A* and other algorithms

Animations give intuition on performance and suggest hypotheses to verify with experimentation
[Animations: BFS, DFS, UF (code omitted)]
Aside: Are you using animations like this regularly? Why not?
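The revised DFS above can be assembled into a small self-contained experiment. The following is a sketch under assumptions, not the talk's GRAPH implementation: the class GridDFS, its fields, and pathLength are hypothetical names, the grid is built directly as neighbor arrays rather than through an Edge[] client, java.util.Random with a seed replaces Math.random() so runs are repeatable, and the search stops as soon as t is reached.

```java
import java.util.Arrays;
import java.util.Random;

class GridDFS {
    final int M, V;
    final int[][] adj;        // vertex-indexed array of neighbor arrays
    final boolean[] visited;
    final int[] pred;         // pred[v] = vertex from which v was first reached
    final Random rnd;
    long examined = 0;        // cost measure: adjacency entries examined
    boolean found = false;

    GridDFS(int M, long seed) {
        this.M = M;
        this.V = M * M;
        adj = new int[V][];
        for (int i = 0; i < V; i++) {
            int[] nb = new int[4];
            int d = 0;
            if (i < V - M) nb[d++] = i + M;        // neighbor one row up
            if (i >= M) nb[d++] = i - M;           // neighbor one row down
            if ((i + 1) % M != 0) nb[d++] = i + 1; // neighbor to the right
            if (i % M != 0) nb[d++] = i - 1;       // neighbor to the left
            adj[i] = Arrays.copyOf(nb, d);
        }
        visited = new boolean[V];
        pred = new int[V];
        rnd = new Random(seed);
    }

    // Recursive DFS from s toward t using the randomized iterator.
    void search(int s, int t) {
        visited[s] = true;
        if (s == t) { found = true; return; }
        int[] a = adj[s];
        for (int i = 0; i < a.length && !found; i++) {
            int r = i + rnd.nextInt(a.length - i);  // random index in [i, length)
            int v = a[r]; a[r] = a[i]; a[i] = v;    // exchange, then process a[i]
            examined++;
            if (!visited[v]) { pred[v] = s; search(v, t); }
        }
    }

    // Follows pred[] from t back to s; returns the edge count,
    // or -1 if some step does not differ by 1 or M.
    int pathLength(int s, int t) {
        int len = 0;
        for (int k = t; k != s; k = pred[k]) {
            int diff = Math.abs(k - pred[k]);
            if (diff != 1 && diff != M) return -1;
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        int M = 31, V = M * M;
        GridDFS g = new GridDFS(M, 1L);
        int s = V - 1 - M / 2, t = M / 2;   // same endpoints as the client code
        g.search(s, t);
        System.out.println("found=" + g.found + " examined=" + g.examined
            + " path edges=" + g.pathLength(s, t));
    }
}
```

Running it with several seeds and several values of M gives the kind of data the experiments call for: the fraction of adjacency entries examined, independent of the order in which the client happened to list the edges.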
Experimental results show that DFS is faster than BFS and UF on the average

  M      V        E         BFS    DFS    UF
  7      49       168       0.75   0.32   1.05
  15     225      840       0.75   0.45   1.02
  31     961      3720      0.75   0.36   1.14
  63     3969     15624     0.75   0.32   1.05
  127    16129    64008     0.75   0.40   0.99
  255    65025    259080    0.75   0.42   1.08

Analytic proof?
Faster algorithms available?

A faster algorithm for finding an st-path in a graph
Use two depth-first searches
  one from the source
  one from the destination
  interleave the two

  M      V        E         BFS    DFS    UF     two
  7      49       168       0.75   0.32   1.05   0.18
  15     225      840       0.75   0.45   1.02   0.13
  31     961      3720      0.75   0.36   1.14   0.15
  63     3969     15624     0.75   0.32   1.05   0.14
  127    16129    64008     0.75   0.40   0.99   0.13
  255    65025    259080    0.75   0.42   1.08   0.12

Examines 13% of the edges
3-8 times faster than standard implementations
Not bad (but still apparently linear)

Are other approaches faster?
Other search algorithms
  randomized?
  farthest-first?
Multiple searches?
  interleaving strategy?
  merge strategy?
  how many?
  which algorithm?
Hybrid algorithms
  which combination?
  probabilistic restart?
  merge strategy?
  randomized choice?
Better than constant-factor improvement possible? Proof?

Experiments with other approaches
Randomized search
  use random queue in BFS
  easy to implement
  Result: not much different from BFS
Multiple searchers
  use N searchers
  one from the source
  one from the destination
  N-2 from random vertices
  Additional factor of 2 for N>2
  Result: not much help anyway
Best method found (by far): DFS with 2 searchers
[Plot: BFS and DFS results with multiple searchers]
Small-world graphs are a widely studied graph model with many applications
A small-world graph has
  large number of vertices
  low average vertex degree (sparse)
  low average path length
  local clustering
Examples:
  Add random edges to grid graph
  Add random edges to any sparse graph with local clustering
  Many scientific models
Q. How do we find an st-path in a small-world graph?
[Figure: a tiny portion of the movie-performer relationship graph]

Applications of small-world graphs
  social networks, airlines, roads, neurobiology, evolution, social influence, protein interaction, percolation, internet, electric power grids, political trends
Example 1: Social networks
  infectious diseases
  extensive simulations
  some analytic results
  huge graphs
Example 2: Protein interaction
  small-world model
  natural process
  experimental validation

Finding a path in a small-world graph is a heavily studied problem
Milgram experiment (1960s)
Small-world graph models
  Random (many variants)
  Watts-Strogatz
  Kleinberg
    add V random shortcuts to grid graphs and others
    A* uses ~ log E steps to find a path
How does 2-way DFS do in this model?
  no change at all in graph code
  just a different graph model
Experiment:
  add M ~ $E^{1/2}$ random edges to an M-by-M grid graph
  use 2-way DFS to find path
Surprising result: Finds short paths in ~ $E^{1/2}$ steps!

Finding a path in a small-world graph is much easier than finding a path in a grid graph
Conjecture: Two-way DFS finds a short st-path in sublinear time in any small-world graph
Evidence in favor
  1. Experiments on many graphs
  2. Proof sketch for grid graphs with V shortcuts
     step 1: $2E^{1/2}$ steps ~ $2V^{1/2}$ random vertices
     step 2: like birthday paradox
             two sets of $2V^{1/2}$ randomly chosen vertices are highly unlikely to be disjoint
Path length?
Multiple searchers revisited?
Next steps: refine model, more experiments, detailed proofs
Detailed example: paths in graphs
End of lecture-within-a-lecture

Concluding remarks on the role of mathematics in understanding performance
Lessons
  We know much less about graph algorithms than you might think
  The scientific method is essential in understanding performance
Worrisome point
  Complicated mathematics seems to be needed for models
  Do all programmers need to know the math?
Good news
  Many people are working on the problem
  Simple universal underlying models are emerging

Appropriate mathematical models are essential for scientific studies of program behavior
Pioneering work by Don Knuth
Large and active analysis of algorithms research community is actively studying models and methods
Caution: Not all mathematical models are appropriate!
Example (from beginning of talk): O-notation in the theory of algorithms
  hides details of implementation
  takes input out by doing worst-case
  useful for classifying algorithms and complexity classes
  not at all useful for predicting or comparing performance

Analytic Combinatorics is a modern basis for studying discrete structures
Developed by Philippe Flajolet and many coauthors
based on classical combinatorics and analysis
  Analytic Combinatorics, Philippe Flajolet and Robert Sedgewick
  Cambridge University Press
  Coming in 2008, now available on the web
Generating functions (GFs) encapsulate sequences
Symbolic methods treat GFs as formal objects
  formal definition of combinatorial constructions
  direct association with generating functions
Complex asymptotics treat GFs as functions in the complex plane
  Study them with singularity analysis and other techniques
  Accurately approximate original sequence
Analysis of algorithms: classic example
A binary tree is a node connected to two binary trees
How many binary trees with N nodes?
Given a recurrence relation
  $B_N = B_0 B_{N-1} + \cdots + B_k B_{N-1-k} + \cdots + B_{N-1} B_0$
introduce a generating function
  $B(z) \equiv B_0 z^0 + B_1 z^1 + B_2 z^2 + B_3 z^3 + \cdots$
multiply both sides by $z^N$ and sum to get an equation
  $B(z) = 1 + z B(z)^2$
that we can solve algebraically (quadratic equation)
  $B(z) = \dfrac{1 - \sqrt{1 - 4z}}{2z}$
and expand to get coefficients (binomial theorem)
  $B_N = \dfrac{1}{N+1} \dbinom{2N}{N}$
that we can approximate (Stirling's approximation)
  $B_N \sim \dfrac{4^N}{N \sqrt{\pi N}}$
  The factor $\sqrt{\pi N}$ appears in birthday paradox (and countless other problems). Coincidence?
Basic challenge: need a new derivation for each problem

Classic example
A tree is a node connected to a sequence of trees
How many trees with N nodes?
Combinatorial constructions (a node attached to a sequence of trees)
  $\langle G \rangle = \bullet \times (\epsilon + \langle G \rangle + \langle G \rangle \times \langle G \rangle + \langle G \rangle \times \langle G \rangle \times \langle G \rangle + \cdots)$
directly map to GFs
  $G(z) = z\,(1 + G(z) + G(z)^2 + G(z)^3 + \cdots)$
that we can manipulate algebraically
  since $G(z) = \dfrac{z}{1 - G(z)}$, so $G(z)^2 - G(z) + z = 0$
  $G(z) = \dfrac{1 - \sqrt{1 - 4z}}{2}$ by quadratic equation
and treat as a complex function to approximate growth
  $G_N \sim \dfrac{4^N}{-2\,\Gamma(-\tfrac12)\, N \sqrt{N}} = \dfrac{4^N}{4N\sqrt{\pi N}}$
First principle: location of singularity determines exponential growth
Second principle: nature of singularity determines subexponential factor

Singularity analysis is a key to extracting coefficient asymptotics
Exponential growth factor
  depends on location of dominant singularity
  is easily extracted
  Ex: $[z^N](1 - bz)^c = b^N [z^N](1 - z)^c$
Polynomial growth factor
  depends on nature of dominant singularity
  can often be computed via contour integration
  Ex:
  $[z^N](1 - z)^c = \dfrac{1}{2\pi i} \oint_C \dfrac{(1 - z)^c}{z^{N+1}}\, dz$   (Cauchy coefficient formula, contour C)
  $\sim \dfrac{1}{2\pi i} \int_H \dfrac{(1 - z)^c}{z^{N+1}}\, dz$   (Hankel contour H)
  $\sim \dfrac{1}{\Gamma(-c)\, N^{c+1}}$   (many details omitted!)

Universal laws of sweeping generality derive from the same technology
Ex. Context free constructions
  $\langle G_0 \rangle = OP_0(\langle G_0 \rangle, \langle G_1 \rangle, \ldots, \langle G_t \rangle)$
  $\langle G_1 \rangle = OP_1(\langle G_0 \rangle, \langle G_1 \rangle, \ldots, \langle G_t \rangle)$
  ...
  $\langle G_t \rangle = OP_t(\langle G_0 \rangle, \langle G_1 \rangle, \ldots, \langle G_t \rangle)$
  like context-free language (or Java data type)
directly map to a system of GFs
  $G_0(z) = F_0(G_0(z), G_1(z), \ldots, G_t(z))$
  $G_1(z) = F_1(G_0(z), G_1(z), \ldots, G_t(z))$
  ...
  $G_t(z) = F_t(G_0(z), G_1(z), \ldots, G_t(z))$
that we can manipulate algebraically to get a single complex function (Groebner-basis elimination)
  $G(z) \equiv G_0(z) = F(G_0(z), \ldots, G_t(z))$
that is amenable to singularity analysis (Drmota-Lalley-Woods)
  $G(z) \sim (1 - z)^{-c}$
  $G_N \sim a\, b^N N^c$ for any context-free construction!
Good news: Several such laws have been discovered
Better news: Distributions also available (typically normal, small sigma)
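The binary-tree counting example above can be checked numerically. The following is a minimal sketch, assuming 64-bit arithmetic is sufficient (it is for N up to about 30); the class name BinaryTreeCount and its method names are hypothetical. It computes B_N from the recurrence, from the closed form (1/(N+1)) C(2N, N), and from the Stirling-based approximation 4^N / (N sqrt(pi N)).

```java
class BinaryTreeCount {
    // B_N from the recurrence B_N = sum over k of B_k * B_(N-1-k), with B_0 = 1.
    static long byRecurrence(int n) {
        long[] b = new long[n + 1];
        b[0] = 1;
        for (int N = 1; N <= n; N++)
            for (int k = 0; k < N; k++)
                b[N] += b[k] * b[N - 1 - k];
        return b[n];
    }

    // B_N = (1/(N+1)) * binomial(2N, N), built up with the exact integer step
    // B_(i+1) = B_i * 2(2i+1) / (i+2).
    static long byFormula(int n) {
        long c = 1;
        for (int i = 0; i < n; i++)
            c = c * 2 * (2 * i + 1) / (i + 2);
        return c;
    }

    // Approximation 4^N / (N * sqrt(pi * N)).
    static double approximation(int n) {
        return Math.pow(4, n) / (n * Math.sqrt(Math.PI * n));
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 30; n += 5)
            System.out.println(n + " " + byRecurrence(n) + " " + byFormula(n)
                + " ratio to approximation: " + byRecurrence(n) / approximation(n));
    }
}
```

The ratio of the exact count to the approximation creeps toward 1 as N grows, which is what "accurately approximate original sequence" means in practice.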
A general hypothesis from analytic combinatorics
The running time of your program is ~ $a\, b^N N^c (\lg N)^d$
  the constant a depends on both complex functions and properties of machine and implementation
  the exponential growth factor b should be 1
  the exponent c depends on singularities
  the log factor d is reconciled in detailed studies
Why?
  data structures evolve from combinatorial constructions
  universal laws from analytic combinatorics have this form
To compute values:
  $\lg(T(2N)/T(N)) \to c$   (the doubling test that we teach to beginners!)
  $T(N)/(b^N N^c) \to a$
Plenty of caveats, but provides a basis for studying program performance

Final remarks
Writing a program without understanding performance is like
  not knowing where a rocket will go
  not knowing the strength of a bridge
  not knowing the dosage of a drug
We need to
  teach the scientific method throughout the curriculum
  use the scientific method whenever developing software
  do the research necessary to develop underlying models
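The two "to compute values" rules above translate into a few lines of arithmetic. The sketch below assumes b = 1 and ignores the log factor, as the doubling test does; the class PowerLawFit, its method names, and the measurements in main are hypothetical (the data is synthetic, generated from 3e-9 * N^1.5, not taken from the talk).

```java
class PowerLawFit {
    // Exponent estimate from one doubling: c is about lg(T(2N) / T(N)).
    static double exponent(double tN, double t2N) {
        return Math.log(t2N / tN) / Math.log(2);
    }

    // Constant estimate once c is known (with b = 1): a is about T(N) / N^c.
    static double constant(double tN, double n, double c) {
        return tN / Math.pow(n, c);
    }

    public static void main(String[] args) {
        double[] n = {1000, 2000, 4000, 8000};
        double[] t = new double[n.length];
        for (int i = 0; i < n.length; i++)
            t[i] = 3e-9 * Math.pow(n[i], 1.5);   // synthetic "running times"
        for (int i = 1; i < n.length; i++) {
            double c = exponent(t[i - 1], t[i]);
            double a = constant(t[i], n[i], c);
            System.out.println("N=" + (long) n[i] + "  c=" + c + "  a=" + a);
        }
    }
}
```

With measured times in place of the synthetic ones, estimates of c that settle down as N doubles support the hypothesis; estimates that drift suggest a log factor or a different model.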