The Role of the Scientific Method in Software Development
Robert Sedgewick, Princeton University
The scientific method is necessary in algorithm design and software development
Scientific method
  create a model describing natural world
  use model to develop hypotheses
  run experiments to validate hypotheses
  refine model and repeat
[Figure labels: model, hypothesis, experiment; 1950, 2000]
Algorithm designer who does not experiment gets lost in abstraction
Software developer who ignores cost risks catastrophic consequences
First hypothesis (needs checking)
  Modern software development requires huge amounts of code
  but performance-critical code implements relatively few fundamental algorithms
Warmup: random number generation
Problem: write a program to generate random numbers
  model: classical probability and statistics
  hypothesis: frequency values should be uniform
  weak experiment: generate random numbers, check for uniform frequencies
  better experiment: generate random numbers, use chi-square test to check frequency values against uniform distribution
  better hypotheses/experiments still needed
    many documented disasters
    active area of scientific research
    applications: simulation, cryptography
    connects to core issues in theory of computation
Passes the weak experiment. Random?
  int k = 0, V = 10;
  while (true)
    System.out.print(k++ % V);
  output: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7...
Textbook algorithm that flunks the chi-square test
  int k = 0, V = 10;
  while (true)
  { k = k*1664525 + 1013904223;
    System.out.print(k % V);
  }
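The "better experiment" on the warmup slide can be sketched as follows. This is a minimal illustration, not the code used in the talk: the class and method names are hypothetical, and `Math.floorMod` is an added assumption to keep the residue non-negative after `int` overflow. Note that the counter produces a statistic of exactly 0 whenever the number of draws is a multiple of V; a careful chi-square test treats values that are too small as suspicious as well as values that are too large.

```java
import java.util.function.IntSupplier;

class ChiSquareDemo {
    // Chi-square statistic of n draws in [0, V) against the uniform distribution.
    static double chiSquare(IntSupplier gen, int V, int n) {
        long[] freq = new long[V];
        for (int i = 0; i < n; i++) freq[gen.getAsInt()]++;
        double expected = (double) n / V, sum = 0.0;
        for (long f : freq) sum += (f - expected) * (f - expected) / expected;
        return sum;
    }

    // The counter "generator" from the slide: 0 1 2 ... V-1 0 1 ...
    static IntSupplier counter(int V) {
        int[] k = {0};
        return () -> k[0]++ % V;
    }

    // The linear congruential generator from the slide, reduced mod V.
    static IntSupplier lcg(int V) {
        int[] k = {0};
        return () -> {
            k[0] = k[0] * 1664525 + 1013904223;  // wraps around on int overflow
            return Math.floorMod(k[0], V);       // assumption: keep the result in [0, V)
        };
    }

    public static void main(String[] args) {
        int V = 10, n = 100_000;
        System.out.println("counter: " + chiSquare(counter(V), V, n));
        System.out.println("LCG:     " + chiSquare(lcg(V), V, n));
    }
}
```

Comparing the printed statistics with a chi-square table for V-1 degrees of freedom is left to the reader.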
Warmup (continued)
Q. Is a given sequence of numbers random?
A. No.
Q. Does a given sequence exhibit some property that random number sequences exhibit?
Birthday paradox
  Average count of random numbers generated until a duplicate happens is about sqrt(pi V/2)
  V = 365: average probes until duplicate is about 24
Example of a better experiment
  generate numbers until duplicate
  check that count is close to sqrt(pi V/2)
  even better: repeat many times, check against distribution
  still better: run many similar tests for other properties
"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." John von Neumann
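The birthday experiment can be sketched in a few lines. This is an illustrative sketch with hypothetical names; it uses `java.util.Random` as the generator under test, and it counts the draw that produces the duplicate, so the observed average can sit slightly above sqrt(pi V/2).

```java
import java.util.Random;

class BirthdayExperiment {
    // Number of draws from [0, V) until some value repeats (the repeat is counted).
    static int drawsUntilDuplicate(Random rnd, int V) {
        boolean[] seen = new boolean[V];
        int count = 0;
        while (true) {
            int x = rnd.nextInt(V);
            count++;
            if (seen[x]) return count;
            seen[x] = true;
        }
    }

    // Average over many trials ("even better: repeat many times").
    static double average(Random rnd, int V, int trials) {
        long total = 0;
        for (int i = 0; i < trials; i++) total += drawsUntilDuplicate(rnd, V);
        return (double) total / trials;
    }

    static double predicted(int V) { return Math.sqrt(Math.PI * V / 2); }

    public static void main(String[] args) {
        int V = 365;
        System.out.println("predicted: " + predicted(V));
        System.out.println("observed:  " + average(new Random(), V, 100_000));
    }
}
```

Checking the whole distribution of counts, rather than only the average, would be the next refinement the slide suggests.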
Detailed example: paths in graphs
A lecture within a lecture
Finding an st-path in a graph is a fundamental operation that demands understanding
Ground rules for this talk
  work in progress (more questions than answers)
  basic research
  save "deep dive" for the right problem
Applications
  graph-based optimization models
  networks
  percolation
  computer vision
  social networks
  (many more)
Basic research
  fundamental abstract operation with numerous applications
  worth doing even if no immediate application
  resist temptation to prematurely study impact
Maxflow
Ford-Fulkerson maxflow scheme
  find any s-t path in a (residual) graph
  augment flow along path (may create or delete edges)
  iterate until no path exists
Goal: compare performance of two basic implementations
  shortest augmenting path
  maximum capacity augmenting path
Key steps in analysis
  How many augmenting paths? (research literature)
  What is the cost of finding each path? (this talk)
Max flow
Compare performance of Ford-Fulkerson implementations
  shortest augmenting path
  maximum-capacity augmenting path
Graph parameters (with values for an example graph)
  number of vertices  V = 177
  number of edges     E = 2000
  maximum capacity    C = 100
How many augmenting paths?
                  worst-case upper bound   for example        actual
  shortest        VE/2, VC                 177,000, 17,700    37
  max capacity    2E lg C                  26,575             7
How many steps to find each path?
  worst-case upper bound: E (2000 for the example graph)
  actual: < 20, on average
Total is a factor of a million high for thousand-node graphs!
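The example column can be checked with a few lines of arithmetic. The sketch below only evaluates the bound formulas from the slide for V = 177, E = 2000, C = 100; the class and method names are hypothetical.

```java
class AugmentingPathBounds {
    static double lg(double x) { return Math.log(x) / Math.log(2); }

    // Worst-case bounds on the number of augmenting paths, as listed on the slide.
    static long shortestBoundVE(long V, long E) { return V * E / 2; }
    static long shortestBoundVC(long V, long C) { return V * C; }
    static long maxCapacityBound(long E, long C) { return Math.round(2 * E * lg(C)); }

    public static void main(String[] args) {
        long V = 177, E = 2000, C = 100;
        System.out.println("shortest, VE/2:        " + shortestBoundVE(V, E));
        System.out.println("shortest, VC:          " + shortestBoundVC(V, C));
        System.out.println("max capacity, 2E lg C: " + maxCapacityBound(E, C));
    }
}
```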
Max flow
Compare performance of Ford-Fulkerson implementations
  shortest augmenting path
  maximum-capacity augmenting path
Graph parameters
  number of vertices V
  number of edges E
  maximum capacity C
Total number of steps? (worst-case upper bound)
  shortest        VE^2/2, VEC
  max capacity    2E^2 lg C
WARNING: The Algorithm General has determined that using such results to predict performance or to compare algorithms may be hazardous.
Lessons
Goals of algorithm analysis
  predict performance (running time)
  guarantee that cost is below specified bounds
Common wisdom
  random graph models are unrealistic
  average-case analysis of algorithms is too difficult
  worst-case performance bounds are the standard
Unfortunate truth about worst-case bounds
  often useless for prediction (fictional)
  often useless for guarantee (too high)
  often misused to compare algorithms
Bounds are useful in many applications: which ones??
Open problem: Do better!
[Figure labels: worst-case bounds, actual costs]
Finding an st-path in a graph is a basic operation in a great many applications
Q. What is the best way to find an st-path in a graph?
A. Several well-studied textbook algorithms are known
  Breadth-first search (BFS) finds the shortest path
  Depth-first search (DFS) is easy to implement
  Union-Find (UF) needs two passes
BUT
  all three process all E edges in the worst case
  diverse kinds of graphs are encountered in practice
Worst-case analysis is useless for predicting performance
Which basic algorithm should a practitioner use???
Algorithm performance depends on the graph model
  complete, random, grid, neighbor, small-world ... (many appropriate candidates)
Initial choice: grid graphs
  sufficiently challenging to be interesting
  found in practice (or similar to graphs found in practice)
  scalable
  potential for analysis
Ground rules
  algorithms should work for all graphs
  algorithms should not use any special properties of the model
    (if vertices have positions we can find short paths quickly with A*; stay tuned)
Applications of grid graphs
  conductivity, concrete, granular materials, porous media, polymers, forest fires, epidemics, Internet, resistor networks, evolution, social influence, Fermi paradox, fractal geometry, stereo vision, image restoration, object segmentation, scene reconstruction...
Example 1: Statistical physics
  percolation model
  extensive simulations
  some analytic results
  arbitrarily huge graphs
Example 2: Image processing
  model pixels in images
  maxflow/mincut
  energy minimization
  huge graphs
Finding an st-path in a grid graph
M by M grid of vertices
  undirected edges connecting each vertex to its HV neighbors
  source vertex s at center of top boundary
  destination vertex t at center of bottom boundary
Find any path connecting s to t
M^2 vertices, about 2M^2 edges
  M     vertices   edges
  7     49         84
  15    225        420
  31    961        1860
  63    3969       7812
  127   16129      32004
  255   65025      129540
  511   261121     521220
Cost measure: number of graph edges examined
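The table entries follow from counting: an M-by-M grid has M^2 vertices, and each of its M rows and M columns contributes M-1 edges, for 2M(M-1) edges in all. A short sketch (hypothetical class name) that regenerates the table:

```java
class GridSizes {
    static long vertices(int M) { return (long) M * M; }

    // M rows with M-1 horizontal edges each, plus M columns with M-1 vertical edges each.
    static long edges(int M) { return 2L * M * (M - 1); }

    public static void main(String[] args) {
        for (int M = 7; M <= 511; M = 2 * M + 1)
            System.out.println(M + " " + vertices(M) + " " + edges(M));
    }
}
```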
Finding an st-path in a grid graph
Similar problems are covered extensively in the literature
  Percolation
  Random walk
  Nonselfintersecting paths in grids
  Graph covering
?? Which basic algorithm should a practitioner use to find a path in a grid-like graph?
Finding an st-path in a grid graph
Elementary algorithms are found in textbooks
  Depth-first search (DFS)
  Breadth-first search (BFS)
  Union-find
?? Which basic algorithm should a practitioner use to find a path in a grid-like graph?
Abstract data types separate clients from implementations
A data type is a set of values and the operations performed on them
An abstract data type is a data type whose representation is hidden
  Client: invokes operations
  Interface: specifies how to invoke ops
  Implementation: code that implements ops
Implementations should not be tailored to particular clients
Develop implementations that work properly for all clients
Study their performance for the client at hand
Graph abstract data type
Vertices are integers between 0 and V-1
Edges are vertex pairs
Graph ADT implements
  Graph(Edge[]) to construct graph from array of edges
  findPath(int, int) to conduct search from s to t
  st(int) to return predecessor of v on path found
Example: client code for grid graphs
  int e = 0;
  Edge[] a = new Edge[E];
  for (int i = 0; i < V; i++)
  { if (i < V-M)        a[e++] = new Edge(i, i+M);
    if (i >= M)         a[e++] = new Edge(i, i-M);
    if ((i+1) % M != 0) a[e++] = new Edge(i, i+1);
    if (i % M != 0)     a[e++] = new Edge(i, i-1);
  }
  Graph G = new Graph(a);
  int s = V-1-M/2, t = M/2;   // top-center source, bottom-center destination
  G.findPath(s, t);
  for (int k = t; k != s; k = G.st(k))
    System.out.println(k + "-" + G.st(k));   // one path edge per line, from t back to s
Vertex numbering for M = 5
  20 21 22 23 24
  15 16 17 18 19
  10 11 12 13 14
   5  6  7  8  9
   0  1  2  3  4
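For readers who want to run the client, here is one possible implementation of the three ADT operations. It is a sketch under stated assumptions, not the implementation from the talk: it uses `ArrayList` adjacency lists, infers V from the largest vertex number, uses plain (non-randomized) DFS, and the helper names `gridEdges` and `path` are hypothetical. Unlike the slide's client, `gridEdges` lists each undirected edge once, and the constructor inserts it in both directions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Edge {
    final int v, w;
    Edge(int v, int w) { this.v = v; this.w = w; }
}

class Graph {
    private final List<List<Integer>> adj = new ArrayList<>();
    private boolean[] visited;
    private int[] st;   // st[v] = predecessor of v on the path found, or -1

    Graph(Edge[] a) {
        int V = 0;
        for (Edge e : a) V = Math.max(V, Math.max(e.v, e.w) + 1);
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (Edge e : a) { adj.get(e.v).add(e.w); adj.get(e.w).add(e.v); }
    }

    void findPath(int s, int t) {
        visited = new boolean[adj.size()];
        st = new int[adj.size()];
        Arrays.fill(st, -1);
        dfs(s, t);
    }

    private boolean dfs(int v, int t) {
        visited[v] = true;
        if (v == t) return true;
        for (int w : adj.get(v))
            if (!visited[w]) { st[w] = v; if (dfs(w, t)) return true; }
        return false;
    }

    int st(int v) { return st[v]; }
}

class GridClient {
    // Edge list for an M-by-M grid, listing each undirected edge once.
    static Edge[] gridEdges(int M) {
        int V = M * M, e = 0;
        Edge[] a = new Edge[2 * M * (M - 1)];
        for (int i = 0; i < V; i++) {
            if (i < V - M)        a[e++] = new Edge(i, i + M);  // north
            if ((i + 1) % M != 0) a[e++] = new Edge(i, i + 1);  // east
        }
        return a;
    }

    // Vertices on the path found, from t back to s (assumes t was reached).
    static List<Integer> path(Graph G, int s, int t) {
        List<Integer> p = new ArrayList<>();
        for (int k = t; k != s; k = G.st(k)) p.add(k);
        p.add(s);
        return p;
    }

    public static void main(String[] args) {
        int M = 5, V = M * M, s = V - 1 - M / 2, t = M / 2;
        Graph G = new Graph(gridEdges(M));
        G.findPath(s, t);
        System.out.println(path(G, s, t));
    }
}
```

Because the representation is hidden behind `findPath` and `st`, the client code stays the same when the search strategy inside `Graph` changes.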
DFS: standard implementation
graph ADT constructor code
  for (int k = 0; k < E; k++)
  { int v = a[k].v, w = a[k].w;
    adj[v] = new Node(w, adj[v]);
    adj[w] = new Node(v, adj[w]);
  }
graph representation
  vertex-indexed array of linked lists
  two nodes per edge
DFS implementation (code to save path omitted)
  void findPathR(int s, int t)
  { if (s == t) return;
    visited[s] = true;
    for (Node x = adj[s]; x != null; x = x.next)
      if (!visited[x.v]) findPathR(x.v, t);
  }
  void findPath(int s, int t)
  { visited = new boolean[V];
    findPathR(s, t);
  }
[Figure: 3-by-3 grid graph (vertices 0 to 8) with its adjacency lists]
Basic flaw in standard DFS scheme
cost strongly depends on arbitrary decision in client code!
  ...
  for (int i = 0; i < V; i++)
  { if ((i+1) % M != 0) a[e++] = new Edge(i, i+1);
    if (i % M != 0)     a[e++] = new Edge(i, i-1);
    if (i < V-M)        a[e++] = new Edge(i, i+M);
    if (i >= M)         a[e++] = new Edge(i, i-M);
  }
  ...
order of these statements determines order in lists
order in lists has drastic effect on running time
  west, east, north, south: ~E/2
  south, north, east, west: ~E^(1/2)
bad news for ANY graph model
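The effect of list order can be observed directly. The sketch below is illustrative only: instead of building linked lists, it fixes the order in which each vertex's neighbors are tried with a four-letter string ("SNEW" means south, north, east, west), and it counts the adjacency entries examined before the search from top center reaches bottom center. The class name and the string encoding are hypothetical.

```java
class OrderDependence {
    final int M, V;
    final char[] order;   // direction letters in adjacency-list order, e.g. "SNEW"
    boolean[] visited;
    int examined;         // number of adjacency entries looked at

    OrderDependence(int M, String order) {
        this.M = M;
        this.V = M * M;
        this.order = order.toCharArray();
    }

    // Neighbor of v in direction d, or -1 if v is on that boundary.
    int neighbor(int v, char d) {
        switch (d) {
            case 'N': return v < V - M ? v + M : -1;
            case 'S': return v >= M ? v - M : -1;
            case 'E': return (v + 1) % M != 0 ? v + 1 : -1;
            default:  return v % M != 0 ? v - 1 : -1;   // 'W'
        }
    }

    boolean dfs(int v, int t) {
        if (v == t) return true;
        visited[v] = true;
        for (char d : order) {
            int w = neighbor(v, d);
            if (w < 0) continue;
            examined++;
            if (!visited[w] && dfs(w, t)) return true;
        }
        return false;
    }

    // Cost of finding a path from the top-center vertex to the bottom-center vertex.
    static int cost(int M, String order) {
        OrderDependence g = new OrderDependence(M, order);
        g.visited = new boolean[g.V];
        g.dfs(g.V - 1 - M / 2, M / 2);
        return g.examined;
    }

    public static void main(String[] args) {
        for (int M : new int[] {7, 15, 31, 63})
            System.out.println(M + "  SNEW: " + cost(M, "SNEW") + "  WENS: " + cost(M, "WENS"));
    }
}
```

With south tried first the search walks straight down the center column; with west tried first it snakes across most of the grid before reaching the destination.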
Addressing the basic flaw
Advise the client to randomize the edges?
  no, very poor software engineering
  leads to nonrandom edge lists (!)
Randomize each edge list before use?
  no, may not need the whole list
Solution: Use a randomized iterator
standard iterator (represent graph with arrays, not lists)
  int N = adj[x].length;
  for (int i = 0; i < N; i++)
  { process vertex adj[x][i]; }
randomized iterator
  int N = adj[x].length;
  for (int i = 0; i < N; i++)
  { // exchange random vertex from adj[x][i..N-1] with adj[x][i]
    // (the inner parentheses matter: (int) Math.random() by itself is always 0)
    exch(adj[x], i, i + (int) (Math.random()*(N-i)));
    process vertex adj[x][i];
  }
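A runnable sketch of the randomized iterator follows. It is an illustration with hypothetical names: `exch` is assumed to swap two array entries and return the value now at the first index, and `visit` stops after a given number of entries to show that only the entries actually visited pay for any shuffling.

```java
import java.util.Arrays;
import java.util.Random;

class RandomizedIterator {
    // Swap a[i] and a[j]; return the value now at a[i].
    static int exch(int[] a, int i, int j) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        return a[i];
    }

    // Visit up to 'limit' entries of a in random order and return them.
    static int[] visit(int[] a, int limit, Random rnd) {
        int N = a.length;
        int[] seen = new int[Math.min(limit, N)];
        for (int i = 0; i < seen.length; i++)
            seen[i] = exch(a, i, i + rnd.nextInt(N - i));   // random index in [i, N-1]
        return seen;
    }

    public static void main(String[] args) {
        int[] adjOfX = {10, 20, 30, 40};
        System.out.println(Arrays.toString(visit(adjOfX, 2, new Random())));
        System.out.println(Arrays.toString(adjOfX));   // still the same four vertices
    }
}
```

Each prefix of the visit is a uniformly random selection without replacement, which is the same step a Fisher-Yates shuffle performs, done lazily.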
Use of randomized iterators turns every graph algorithm into a randomized algorithm
Important practical effect: stabilizes algorithm performance
  cost depends on problem, not its representation
Yields well-defined and fundamental analytic problems
  Average-case analysis of algorithm X for graph family Y(N)?
  Distributions?
  Full employment for algorithm analysts
(Revised) standard DFS implementation
graph ADT constructor code
  for (int k = 0; k < E; k++)
  { int v = a[k].v, w = a[k].w;
    adj[v][deg[v]++] = w;
    adj[w][deg[w]++] = v;
  }
graph representation
  vertex-indexed array of variable-length arrays
DFS implementation (code to save path omitted)
  void findPathR(int s, int t)
  { int N = adj[s].length;
    if (s == t) return;
    visited[s] = true;
    for (int i = 0; i < N; i++)
    { // exch swaps the two entries and is assumed to return the vertex now at index i
      int v = exch(adj[s], i, i + (int) (Math.random()*(N-i)));
      if (!visited[v]) findPathR(v, t);
    }
  }
  void findPath(int s, int t)
  { visited = new boolean[V];
    findPathR(s, t);
  }
[Figure: 3-by-3 grid graph (vertices 0 to 8) with its adjacency arrays]
BFS: standard implementation
Use a queue to hold fringe vertices
  while Q is nonempty
    get x from Q
    done if x = t
    for each unmarked v adj to x
      put v on Q
      mark v
[Figure legend: tree vertex, fringe vertex, unseen vertex]
  void findPath(int s, int t)
  { Queue Q = new Queue();   // FIFO queue for BFS
    Q.put(s); visited[s] = true;
    while (!Q.empty())
    { int x = Q.get(); int N = adj[x].length;
      if (x == t) return;
      for (int i = 0; i < N; i++)   // randomized iterator
      { int v = exch(adj[x], i, i + (int) (Math.random()*(N-i)));
        if (!visited[v])
        { Q.put(v); visited[v] = true; }
      }
    }
  }
Generalized graph search: other queues yield A* and other graph-search algorithms
Union-Find implementation
1. Run union-find to find component containing s and t
  initialize array of iterators
  initialize UF array
  while s and t not in same component
    choose random iterator
    choose random edge for union
2. Build subgraph with edges from that component
3. Use DFS to find st-path in that subgraph
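Step 1 can be sketched as follows. This is a simplified illustration, not the scheme on the slide: rather than choosing a random per-vertex iterator and then a random edge from it, it processes the whole edge list in a lazily shuffled random order. The class name, the `int[][]` edge representation, and the use of path halving in `find` are all assumptions.

```java
import java.util.Random;

class UnionFindPath {
    final int[] parent;

    UnionFindPath(int V) {
        parent = new int[V];
        for (int i = 0; i < V; i++) parent[i] = i;
    }

    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];   // path halving
            x = parent[x];
        }
        return x;
    }

    void union(int a, int b) { parent[find(a)] = find(b); }

    // Add edges in random order until s and t are in the same component.
    // Returns the number of edges examined, or -1 if s and t never connect.
    static int connect(int V, int[][] edges, int s, int t, Random rnd) {
        UnionFindPath uf = new UnionFindPath(V);
        int[][] e = edges.clone();
        for (int i = 0; i < e.length; i++) {
            int j = i + rnd.nextInt(e.length - i);   // pick a random unused edge
            int[] tmp = e[i]; e[i] = e[j]; e[j] = tmp;
            uf.union(e[i][0], e[i][1]);
            if (uf.find(s) == uf.find(t)) return i + 1;
        }
        return -1;
    }
}
```

Steps 2 and 3 would then restrict a DFS to the edges whose endpoints lie in the component of s, which is why the slide earlier notes that union-find needs two passes.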
Animations give intuition on performance
  BFS, DFS, UF
and suggest hypotheses to verify with experimentation
Experimental results for basic algorithms
DFS is substantially faster than BFS and UF on the average
  M     V       E        BFS   DFS   UF
  7     49      168      .75   .32   1.05
  15    225     840      .75   .45   1.02
  31    961     3720     .75   .36   1.14
  63    3969    15624    .75   .32   1.05
  127   16129   64008    .75   .40   .99
  255   65025   259080   .75   .42   1.08
[Plot: UF, DFS, BFS]
Analytic proof?
Faster algorithms available?
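An experiment of this kind can be set up as sketched below. Everything here is an assumption made for illustration: the class name, the seeding, the number of trials, and in particular the normalization (adjacency entries examined divided by the 4M(M-1) entries in the grid's adjacency arrays). The slides do not state how their ratios were normalized, so the output of this sketch need not match the table.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Random;

class GridSearchExperiment {
    final int M, V;
    final int[][] adj;
    final Random rnd;
    boolean[] visited;
    long examined;

    GridSearchExperiment(int M, long seed) {
        this.M = M;
        this.V = M * M;
        this.rnd = new Random(seed);
        adj = new int[V][];
        for (int i = 0; i < V; i++) {
            int[] nb = new int[4];
            int d = 0;
            if (i < V - M)        nb[d++] = i + M;
            if (i >= M)           nb[d++] = i - M;
            if ((i + 1) % M != 0) nb[d++] = i + 1;
            if (i % M != 0)       nb[d++] = i - 1;
            adj[i] = Arrays.copyOf(nb, d);
        }
    }

    // Randomized iterator step: move a random remaining neighbor of x to index i.
    int next(int x, int i) {
        int[] a = adj[x];
        int j = i + rnd.nextInt(a.length - i);
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        examined++;
        return a[i];
    }

    boolean dfs(int s, int t) {
        if (s == t) return true;
        visited[s] = true;
        for (int i = 0; i < adj[s].length; i++) {
            int v = next(s, i);
            if (!visited[v] && dfs(v, t)) return true;
        }
        return false;
    }

    boolean bfs(int s, int t) {
        ArrayDeque<Integer> q = new ArrayDeque<>();
        q.add(s);
        visited[s] = true;
        while (!q.isEmpty()) {
            int x = q.remove();
            if (x == t) return true;
            for (int i = 0; i < adj[x].length; i++) {
                int v = next(x, i);
                if (!visited[v]) { q.add(v); visited[v] = true; }
            }
        }
        return false;
    }

    // Adjacency entries examined by one search from top center to bottom center.
    long run(boolean useDfs) {
        visited = new boolean[V];
        examined = 0;
        int s = V - 1 - M / 2, t = M / 2;
        boolean found = useDfs ? dfs(s, t) : bfs(s, t);
        if (!found) throw new IllegalStateException("grid should be connected");
        return examined;
    }

    public static void main(String[] args) {
        int trials = 100;
        for (int M : new int[] {7, 15, 31, 63}) {
            double total = 4.0 * M * (M - 1), dfsSum = 0, bfsSum = 0;
            for (int k = 0; k < trials; k++) {
                GridSearchExperiment g = new GridSearchExperiment(M, k);
                dfsSum += g.run(true) / total;
                bfsSum += g.run(false) / total;
            }
            System.out.println(M + "  BFS " + bfsSum / trials + "  DFS " + dfsSum / trials);
        }
    }
}
```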
A faster algorithm for finding an st-path in a graph
Use two depth-first searches
  one from the source
  one from the destination
  interleave the two
  M     V       E        BFS   DFS   UF     two
  7     49      168      .75   .32   1.05   .18
  15    225     840      .75   .45   1.02   .13
  31    961     3720     .75   .36   1.14   .15
  63    3969    15624    .75   .32   1.05   .14
  127   16129   64008    .75   .40   .99    .13
  255   65025   259080   .75   .42   1.08   .12
Examines 13% of the edges
3-8 times faster than standard implementations
Not loglog E, but not bad!
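One way to interleave two depth-first searches is sketched below. This is an assumption-laden illustration rather than the implementation measured in the talk: the searches use explicit stacks so that they can take turns one step at a time, each vertex is owned by whichever search reaches it first, and the search stops as soon as one side examines a vertex owned by the other. Like the slide code, it omits the bookkeeping needed to save the path. All names are hypothetical, and `search` reorders the adjacency arrays it is given.

```java
import java.util.Arrays;
import java.util.Random;

class TwoWayDfs {
    final int[][] adj;
    final int[] owner, pos;   // owner[v]: 0 = unseen, 1 = source search, 2 = destination search
    final int[][] stack = new int[3][];
    final int[] top = new int[3];
    final Random rnd;
    long examined;            // adjacency entries examined by both searches

    TwoWayDfs(int[][] adj, long seed) {
        this.adj = adj;
        owner = new int[adj.length];
        pos = new int[adj.length];
        stack[1] = new int[adj.length];
        stack[2] = new int[adj.length];
        rnd = new Random(seed);
    }

    void push(int k, int v) { owner[v] = k; stack[k][top[k]++] = v; }

    // One step of searcher k; true if it touches a vertex owned by the other searcher.
    boolean step(int k) {
        if (top[k] == 0) return false;
        int x = stack[k][top[k] - 1];
        int N = adj[x].length, i = pos[x];
        if (i == N) { top[k]--; return false; }   // x is exhausted: backtrack
        int j = i + rnd.nextInt(N - i);           // randomized iterator
        int v = adj[x][j]; adj[x][j] = adj[x][i]; adj[x][i] = v;
        pos[x]++;
        examined++;
        if (owner[v] == 3 - k) return true;
        if (owner[v] == 0) push(k, v);
        return false;
    }

    // Alternate steps of the two searches until they meet or both are exhausted.
    boolean search(int s, int t) {
        if (s == t) return true;
        push(1, s);
        push(2, t);
        while (top[1] > 0 || top[2] > 0)
            if (step(1) || step(2)) return true;
        return false;
    }

    // Adjacency arrays for an M-by-M grid.
    static int[][] grid(int M) {
        int V = M * M;
        int[][] adj = new int[V][];
        for (int i = 0; i < V; i++) {
            int[] nb = new int[4];
            int d = 0;
            if (i < V - M)        nb[d++] = i + M;
            if (i >= M)           nb[d++] = i - M;
            if ((i + 1) % M != 0) nb[d++] = i + 1;
            if (i % M != 0)       nb[d++] = i - 1;
            adj[i] = Arrays.copyOf(nb, d);
        }
        return adj;
    }

    public static void main(String[] args) {
        int M = 63, V = M * M;
        TwoWayDfs g = new TwoWayDfs(grid(M), 1L);
        System.out.println(g.search(V - 1 - M / 2, M / 2) + " after examining " + g.examined);
    }
}
```

Strict alternation is only one interleaving strategy; the slides that follow raise other choices as open questions.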
Are other approaches faster?
Other search algorithms
  randomized?
  farthest-first?
Multiple searches?
  interleaving strategy?
  merge strategy?
  how many?
  which algorithm?
Hybrid algorithms
  which combination?
  probabilistic restart?
  merge strategy?
  randomized choice?
Better than constant-factor improvement possible? Proof?
Experiments with other approaches
Randomized search
  use random queue in BFS
  easy to implement
  Result: not much different from BFS
Multiple searchers
  use N searchers
  one from the source
  one from the destination
  N-2 from random vertices
  Additional factor of 2 for N > 2
  Result: not much help anyway
Best method found (by far): DFS with 2 searchers
[Plot: cost of BFS and DFS against number of searchers (1, 2, 3, 4, 5, 10, 20); cost axis marked 1.40, .70, .40, .12]
Small-world graphs are a widely studied graph model with many applications
A small-world graph has
  large number of vertices
  low average vertex degree (sparse)
  low average path length
  local clustering
Examples:
  Add random edges to grid graph
  Add random edges to any sparse graph with local clustering
  Many scientific models
Q. How do we find an st-path in a small-world graph?
Small-world graphs model the six degrees of separation phenomenon
[Figure: a tiny portion of the movie-performer relationship graph]
Example: Kevin Bacon number
Applications of small-world graphs
  social networks, airlines, roads, neurobiology, evolution, social influence, protein interaction, percolation, internet, electric power grids, political trends...
Example 1: Social networks
  infectious diseases
  extensive simulations
  some analytic results
  huge graphs
Example 2: Protein interaction
  small-world model
  natural process
  experimental validation
[Figure: a tiny portion of the movie-performer relationship graph]
Finding a path in a small-world graph is a heavily studied problem
Milgram experiment (1960s)
Small-world graph models
  Random (many variants)
  Watts-Strogatz
  Kleinberg
  add V random shortcuts to grid graphs and others
  A* uses ~ log E steps to find a path
How does 2-way DFS do in this model?
  no change at all in graph code
  just a different graph model
Experiment:
  add M ~ E^(1/2) random edges to an M-by-M grid graph
  use 2-way DFS to find path
Surprising result: Finds short paths in ~ E^(1/2) steps!
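The graph model for this experiment can be generated as sketched below: an M-by-M grid plus a chosen number of edges between uniformly random vertex pairs. The class name and the `int[][]` edge representation are hypothetical, self-loops are skipped as a design choice, and M is assumed to be at least 2.

```java
import java.util.Random;

class SmallWorldGrid {
    // Edge list of an M-by-M grid plus 'shortcuts' extra edges between random vertex pairs.
    static int[][] edges(int M, int shortcuts, Random rnd) {
        int V = M * M, e = 0;
        int[][] a = new int[2 * M * (M - 1) + shortcuts][];
        for (int i = 0; i < V; i++) {
            if (i < V - M)        a[e++] = new int[] {i, i + M};   // north
            if ((i + 1) % M != 0) a[e++] = new int[] {i, i + 1};   // east
        }
        while (e < a.length) {
            int v = rnd.nextInt(V), w = rnd.nextInt(V);
            if (v != w) a[e++] = new int[] {v, w};   // skip self-loops; duplicates are allowed
        }
        return a;
    }

    public static void main(String[] args) {
        int M = 31;
        int[][] a = edges(M, M, new Random());   // M shortcuts, as in the experiment
        System.out.println(a.length + " edges");
    }
}
```

Because only the edge list changes, the same search code runs unmodified on this model, which is the point the slide makes about the graph code.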
Finding a path in a small-world graph is much easier than finding a path in a grid graph
Conjecture: Two-way DFS finds a short st-path in sublinear time in any small-world graph
Evidence in favor
1. Experiments on many graphs
2. Proof sketch for grid graphs with V shortcuts
  step 1: 2 E^(1/2) steps ~ 2 V^(1/2) random vertices
  step 2: like birthday paradox
    two sets of 2 V^(1/2) randomly chosen vertices are highly unlikely to be disjoint
Path length?
Multiple searchers revisited?
Next steps: refine model, more experiments, detailed proofs
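The birthday-paradox step can be made concrete with a rough estimate. Assume, as a simplification, that the first set holds 2 V^(1/2) distinct vertices and that the second set's 2 V^(1/2) vertices are drawn independently and uniformly from all V vertices. Each draw then misses the first set with probability 1 - 2/V^(1/2), so:

```latex
\Pr[\text{disjoint}]
  \approx \left(1 - \frac{2\sqrt{V}}{V}\right)^{2\sqrt{V}}
  = \left(1 - \frac{2}{\sqrt{V}}\right)^{2\sqrt{V}}
  \approx e^{-4} \approx 0.018
```

Under these assumptions the two sets intersect with probability of roughly 98 percent, independent of V for large V.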
More questions than answers
Answers
  Randomization makes cost depend on graph, not representation.
  DFS is faster than BFS or UF for finding paths in grid graphs.
  Two DFSs are faster than 1 DFS, or N of them, in grid graphs.
  We can find short paths quickly in small-world graphs.
Questions
  What are the BFS, UF, and DFS constants in grid graphs?
  Is there a sublinear algorithm for grid graphs?
  Which methods adapt to directed graphs?
  Can we precisely analyze and quantify costs for small-world graphs?
  What is the cost distribution for DFS for any interesting graph family?
  How effective are these methods for other graph families?
  Do these methods lead to faster maxflow algorithms?
  How effective are these methods in practice?
  ...
Lessons
We know much less than you might think about most of the algorithms that we use
The scientific method is necessary in algorithm design and software development