The Role of the Scientific Method in Software Development. Robert Sedgewick Princeton University




The scientific method is necessary in algorithm design and software development

Scientific method
- create a model describing the natural world
- use the model to develop hypotheses
- run experiments to validate hypotheses
- refine the model and repeat
(figure: the cycle model → hypothesis → experiment)

Algorithm designers who do not experiment get lost in abstraction.
Software developers who ignore cost risk catastrophic consequences.

First hypothesis (needs checking)
Modern software development requires huge amounts of code, but performance-critical code implements relatively few fundamental algorithms.

Warmup: random number generation
Problem: write a program to generate random numbers.
- model: classical probability and statistics
- hypothesis: frequency values should be uniform
- weak experiment: generate random numbers; check for uniform frequencies
- better experiment: generate random numbers; use a $\chi^2$ test to check frequency values against the uniform distribution
- better hypotheses/experiments still needed: many documented disasters; an active area of scientific research; applications: simulation, cryptography; connects to core issues in the theory of computation

    int k = 0, V = 10;
    while (true) System.out.print(k++ % V);

prints 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 ... (uniform frequencies, but random?)

    int k = 0, V = 10;
    while (true)
    {
        k = k*1664525 + 1013904223;             // int overflow: arithmetic mod 2^32
        System.out.print(Math.floorMod(k, V));  // keeps the digit in 0..V-1 when k < 0
    }

a textbook algorithm that flunks the $\chi^2$ test
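To make the warmup concrete, here is a small Java sketch. It runs a $\chi^2$ frequency test and a $\chi^2$ serial test over three digit streams: the counter, the slide's linear congruential generator, and java.util.Random. The serial test counts non-overlapping pairs of consecutive digits. The slide does not say which $\chi^2$ test the generator flunks; the serial test is one plausible choice, and the class and method names are illustrative. The multiplier 1664525 and the increment 1013904223 are both odd, so the parity of k flips at every step. As a result k mod 10 alternates between even and odd digits, and half of the 100 pair cells stay empty. For random digits the serial statistic should land near its 99 degrees of freedom.

```java
import java.util.Random;

// Sketch: chi-square frequency and serial (pair) tests for streams of decimal digits.
class DigitTests {
    static final int V = 10;

    // The slide's linear congruential generator: k = k*1664525 + 1013904223.
    // Java int arithmetic wraps around, so this is arithmetic mod 2^32.
    static int[] lcgDigits(int n) {
        int[] d = new int[n];
        int k = 0;
        for (int i = 0; i < n; i++) {
            k = k * 1664525 + 1013904223;
            d[i] = Math.floorMod(k, V);   // digit in 0..V-1 even when k is negative
        }
        return d;
    }

    // The obviously nonrandom counter 0 1 2 ... 9 0 1 2 ...
    static int[] counterDigits(int n) {
        int[] d = new int[n];
        for (int i = 0; i < n; i++) d[i] = i % V;
        return d;
    }

    // Digits from java.util.Random with a fixed seed, for comparison.
    static int[] libraryDigits(int n, long seed) {
        Random r = new Random(seed);
        int[] d = new int[n];
        for (int i = 0; i < n; i++) d[i] = r.nextInt(V);
        return d;
    }

    // Chi-square statistic of observed cell counts against a uniform expectation.
    static double chiSquare(long[] counts, double expected) {
        double x2 = 0;
        for (long c : counts) x2 += (c - expected) * (c - expected) / expected;
        return x2;
    }

    // Frequency test: one cell per digit value (V - 1 degrees of freedom).
    static double frequencyTest(int[] d) {
        long[] counts = new long[V];
        for (int x : d) counts[x]++;
        return chiSquare(counts, (double) d.length / V);
    }

    // Serial test: one cell per non-overlapping pair of digits (V*V - 1 degrees of freedom).
    static double serialTest(int[] d) {
        long[] counts = new long[V * V];
        int pairs = d.length / 2;
        for (int i = 0; i < pairs; i++) counts[d[2 * i] * V + d[2 * i + 1]]++;
        return chiSquare(counts, (double) pairs / (V * V));
    }

    public static void main(String[] args) {
        int n = 100000;
        int[][] streams = {counterDigits(n), lcgDigits(n), libraryDigits(n, 42L)};
        String[] names = {"counter", "LCG mod 2^32", "java.util.Random"};
        for (int i = 0; i < streams.length; i++)
            System.out.printf("%-18s frequency %10.1f   serial %10.1f%n",
                names[i], frequencyTest(streams[i]), serialTest(streams[i]));
    }
}
```

The counter passes the frequency test perfectly, yet it fails the serial test. Checking more than one property is exactly the point of the "better hypotheses/experiments" bullet.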

Warmup (continued)
Q. Is a given sequence of numbers random?
A. No.
Q. Does a given sequence exhibit some property that random number sequences exhibit?
Birthday paradox: the average count of random numbers generated until a duplicate happens is about $\sqrt{\pi V/2}$. For V = 365, the average number of probes until a duplicate is about 24.
Example of a better experiment:
- generate numbers until a duplicate; check that the count is close to $\sqrt{\pi V/2}$
- even better: repeat many times, check against the distribution
- still better: run many similar tests for other properties
"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." (John von Neumann)
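The birthday-paradox experiment fits in a few lines of Java. The sketch below assumes the generator is supplied as an IntSupplier that yields values in [0, V); the class and method names are illustrative. For V = 365, $\sqrt{\pi V/2} \approx 23.9$, and a good generator should average close to that over many trials. The counter cannot repeat before draw V + 1, which fails the test by a wide margin.

```java
import java.util.Random;
import java.util.function.IntSupplier;

// Sketch of the birthday-paradox experiment: draw values in [0, V) until one repeats.
class BirthdayTest {
    // Number of draws up to and including the first repeated value.
    // Assumes gen only produces values in [0, V).
    static int countUntilDuplicate(IntSupplier gen, int V) {
        boolean[] seen = new boolean[V];
        int count = 0;
        while (true) {
            int x = gen.getAsInt();
            count++;
            if (seen[x]) return count;
            seen[x] = true;
        }
    }

    // Average of countUntilDuplicate over many independent trials.
    static double averageCount(IntSupplier gen, int V, int trials) {
        long total = 0;
        for (int i = 0; i < trials; i++) total += countUntilDuplicate(gen, V);
        return (double) total / trials;
    }

    public static void main(String[] args) {
        int V = 365;
        Random r = new Random(12345L);
        System.out.printf("java.util.Random: average %.2f, sqrt(pi V / 2) = %.2f%n",
            averageCount(() -> r.nextInt(V), V, 100000), Math.sqrt(Math.PI * V / 2));
        // The counter 0, 1, 2, ... cannot repeat before draw V + 1.
        int[] k = {0};
        System.out.println("counter: " + countUntilDuplicate(() -> k[0]++ % V, V));
    }
}
```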

Detailed example: paths in graphs
A lecture within a lecture

Finding an st-path in a graph is a fundamental operation that demands understanding
Ground rules for this talk
- work in progress (more questions than answers)
- basic research
- save deep dive for the right problem
Applications
- graph-based optimization models
- networks
- percolation
- computer vision
- social networks
- (many more)
Basic research
- fundamental abstract operation with numerous applications
- worth doing even if no immediate application
- resist temptation to prematurely study impact

Maxflow
Ford-Fulkerson maxflow scheme
- find any st-path in a (residual) graph
- augment flow along the path (may create or delete edges)
- iterate until no path exists
Goal: compare performance of two basic implementations
- shortest augmenting path
- maximum-capacity augmenting path
Key steps in analysis
- How many augmenting paths? (research literature)
- What is the cost of finding each path? (this talk)
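A compact Java sketch of the Ford-Fulkerson scheme can swap between the two path-selection rules under comparison. For brevity it assumes a V-by-V capacity matrix, so each path search costs about $V^2$ steps rather than E. The slides do not specify a representation, and the names and the small test network are illustrative. The shortest augmenting path comes from BFS. The maximum-capacity path comes from a widest-path variant of Dijkstra's algorithm. Both augment along residual edges and count how many augmenting paths they use.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Sketch: Ford-Fulkerson on a dense capacity matrix, with two path-selection rules.
// Assumes s != t.
class MaxFlowSketch {
    // Shortest augmenting path (fewest edges): BFS in the residual graph.
    static boolean shortestPath(int[][] res, int s, int t, int[] parent) {
        Arrays.fill(parent, -1);
        parent[s] = s;
        ArrayDeque<Integer> q = new ArrayDeque<>();
        q.add(s);
        while (!q.isEmpty()) {
            int x = q.poll();
            if (x == t) return true;
            for (int v = 0; v < res.length; v++)
                if (res[x][v] > 0 && parent[v] == -1) { parent[v] = x; q.add(v); }
        }
        return false;
    }

    // Maximum-capacity augmenting path: widest-path variant of Dijkstra's algorithm, O(V^2).
    static boolean maxCapacityPath(int[][] res, int s, int t, int[] parent) {
        int n = res.length;
        int[] width = new int[n];          // best bottleneck capacity found so far
        boolean[] done = new boolean[n];
        Arrays.fill(parent, -1);
        parent[s] = s;
        width[s] = Integer.MAX_VALUE;
        while (true) {
            int x = -1;
            for (int v = 0; v < n; v++)
                if (!done[v] && width[v] > 0 && (x == -1 || width[v] > width[x])) x = v;
            if (x == -1) return false;     // t is unreachable in the residual graph
            if (x == t) return true;
            done[x] = true;
            for (int v = 0; v < n; v++) {
                int w = Math.min(width[x], res[x][v]);
                if (!done[v] && w > width[v]) { width[v] = w; parent[v] = x; }
            }
        }
    }

    // Returns {max flow value, number of augmenting paths used}.
    static int[] maxFlow(int[][] cap, int s, int t, boolean shortest) {
        int n = cap.length;
        int[][] res = new int[n][];
        for (int i = 0; i < n; i++) res[i] = cap[i].clone();
        int[] parent = new int[n];
        int flow = 0, paths = 0;
        while (shortest ? shortestPath(res, s, t, parent) : maxCapacityPath(res, s, t, parent)) {
            int b = Integer.MAX_VALUE;                     // bottleneck capacity of the path
            for (int v = t; v != s; v = parent[v]) b = Math.min(b, res[parent[v]][v]);
            for (int v = t; v != s; v = parent[v]) {
                res[parent[v]][v] -= b;                    // use forward residual capacity
                res[v][parent[v]] += b;                    // allow the flow to be undone later
            }
            flow += b;
            paths++;
        }
        return new int[] {flow, paths};
    }

    public static void main(String[] args) {
        int[][] cap = new int[6][6];                       // small example network, s = 0, t = 5
        cap[0][1] = 16; cap[0][2] = 13; cap[1][3] = 12; cap[2][1] = 4; cap[2][4] = 14;
        cap[3][2] = 9;  cap[3][5] = 20; cap[4][3] = 7;  cap[4][5] = 4;
        for (boolean shortest : new boolean[] {true, false}) {
            int[] r = maxFlow(cap, 0, 5, shortest);
            System.out.println((shortest ? "shortest" : "max capacity")
                + ": flow " + r[0] + ", augmenting paths " + r[1]);
        }
    }
}
```

Both rules must reach the same flow value. What differs is the number of augmenting paths and the cost of finding each one, which are the two quantities the slide singles out.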

Max flow
Compare performance of Ford-Fulkerson implementations
- shortest augmenting path
- maximum-capacity augmenting path
Graph parameters for the example graph: number of vertices V = 177, number of edges E = 2000, maximum capacity C = 100

How many augmenting paths?

                    worst-case upper bound    for example         actual
    shortest        $VE/2$, $VC$              177,000, 17,700     37
    max capacity    $2E \lg C$                26,575              7

How many steps to find each path? $E$ = 2000 (worst-case upper bound); < 20, on average.

Total is a factor of a million too high for thousand-node graphs!

Max flow
Compare performance of Ford-Fulkerson implementations
- shortest augmenting path
- maximum-capacity augmenting path
Graph parameters: number of vertices V, number of edges E, maximum capacity C

Total number of steps?

                    worst-case upper bound
    shortest        $VE^2/2$, $VEC$
    max capacity    $2E^2 \lg C$

WARNING: The Algorithm General has determined that using such results to predict performance or to compare algorithms may be hazardous.
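The arithmetic behind the two bound tables can be checked with a few lines of Java; the class and method names are illustrative. As on the slide, the per-path cost is taken as its worst-case bound E. The total-step bounds are therefore the path-count bounds multiplied by E.

```java
// Worked arithmetic for the worst-case bounds on the example graph (V = 177, E = 2000, C = 100).
class FlowBounds {
    static double lg(double x) { return Math.log(x) / Math.log(2); }

    // Augmenting-path bounds: {shortest VE/2, shortest VC, max capacity 2E lg C}.
    static double[] pathBounds(double V, double E, double C) {
        return new double[] {V * E / 2, V * C, 2 * E * lg(C)};
    }

    // Total-step bounds, with E steps per path: {VE^2/2, VEC, 2E^2 lg C}.
    static double[] stepBounds(double V, double E, double C) {
        double[] p = pathBounds(V, E, C);
        return new double[] {p[0] * E, p[1] * E, p[2] * E};
    }

    public static void main(String[] args) {
        double[] p = pathBounds(177, 2000, 100);
        double[] s = stepBounds(177, 2000, 100);
        System.out.printf("paths: %.0f, %.0f, %.0f%n", p[0], p[1], p[2]);
        System.out.printf("steps: %.0f, %.0f, %.0f%n", s[0], s[1], s[2]);
    }
}
```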

Max flow: lessons
Goals of algorithm analysis
- predict performance (running time)
- guarantee that cost is below specified bounds (worst-case bound)
Common wisdom
- random graph models are unrealistic
- average-case analysis of algorithms is too difficult
- worst-case performance bounds are the standard
Unfortunate truth about worst-case bounds
- often useless for prediction (fictional)
- often useless for guarantee (too high)
- often misused to compare algorithms
Bounds are useful in many applications: which ones?? (figure: bounds compared with actual costs)
Open problem: Do better!

Finding an st-path in a graph is a basic operation in a great many applications
Q. What is the best way to find an st-path in a graph?
A. Several well-studied textbook algorithms are known:
- breadth-first search (BFS) finds the shortest path
- depth-first search (DFS) is easy to implement
- union-find (UF) needs two passes
BUT
- all three process all E edges in the worst case
- diverse kinds of graphs are encountered in practice
Worst-case analysis is useless for predicting performance.
Which basic algorithm should a practitioner use???

Algorithm performance depends on the graph model
Graph models: complete, random, grid, neighbor, small-world, ... (many appropriate candidates)
Initial choice: grid graphs
- sufficiently challenging to be interesting
- found in practice (or similar to graphs found in practice)
- scalable
- potential for analysis
Ground rules
- algorithms should work for all graphs
- algorithms should not use any special properties of the model
(If vertices have positions, we can find short paths quickly with A*; stay tuned.)

Applications of grid graphs
conductivity, concrete, granular materials, porous media, polymers, forest fires, epidemics, Internet, resistor networks, evolution, social influence, Fermi paradox, fractal geometry, stereo vision, image restoration, object segmentation, scene reconstruction, ...
Example 1: Statistical physics
- percolation model
- extensive simulations
- some analytic results
- arbitrarily huge graphs
Example 2: Image processing
- model pixels in images
- maxflow/mincut
- energy minimization
- huge graphs

Finding an st-path in a grid graph
- M-by-M grid of vertices
- undirected edges connecting each vertex to its horizontal and vertical neighbors
- source vertex s at the center of the top boundary
- destination vertex t at the center of the bottom boundary
Find any path connecting s to t.
$M^2$ vertices, about $2M^2$ edges

      M   vertices    edges
      7         49       84
     15        225      420
     31        961     1860
     63       3969     7812
    127      16129    32004
    255      65025   129540
    511     261121   521220

Cost measure: number of graph edges examined
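The edge column of the table follows from a short count, assuming edges join horizontal and vertical neighbors only. Each of the M rows has M − 1 horizontal edges, and each of the M columns has M − 1 vertical edges:

```latex
V = M^2, \qquad
E_{\text{grid}} = \underbrace{M(M-1)}_{\text{horizontal}} + \underbrace{M(M-1)}_{\text{vertical}} = 2M(M-1) \approx 2M^2,
\qquad \text{e.g. } M = 7:\; E_{\text{grid}} = 2 \cdot 7 \cdot 6 = 84
```

The grid client code lists every edge once from each endpoint, so its edge array has 4M(M − 1) entries (168 for M = 7). This matches the E column of the experimental tables.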

Finding an st-path in a grid graph
Similar problems are covered extensively in the literature:
- percolation
- random walks
- nonself-intersecting paths in grids
- graph covering
- ??
Which basic algorithm should a practitioner use to find a path in a grid-like graph?

Finding an st-path in a grid graph
Elementary algorithms are found in textbooks:
- depth-first search (DFS)
- breadth-first search (BFS)
- union-find
- ??
Which basic algorithm should a practitioner use to find a path in a grid-like graph?

Abstract data types separate clients from implementations
A data type is a set of values and the operations performed on them.
An abstract data type is a data type whose representation is hidden.
- Client: invokes operations
- Interface: specifies how to invoke ops
- Implementation: code that implements ops
Implementations should not be tailored to particular clients.
Develop implementations that work properly for all clients.
Study their performance for the client at hand.

Graph abstract data type
Vertices are integers between 0 and V-1. Edges are vertex pairs.
Graph ADT implements
- Graph(Edge[]) to construct a graph from an array of edges
- findPath(int, int) to conduct a search from s to t
- st(int) to return the predecessor of v on the path found

Example: client code for grid graphs

    int e = 0;
    Edge[] a = new Edge[E];   // E = 4*M*(M-1): each grid edge appears once from each endpoint
    for (int i = 0; i < V; i++)
    {
        if (i < V-M) a[e++] = new Edge(i, i+M);
        if (i >= M) a[e++] = new Edge(i, i-M);
        if ((i+1) % M != 0) a[e++] = new Edge(i, i+1);
        if (i % M != 0) a[e++] = new Edge(i, i-1);
    }
    Graph G = new Graph(a);
    int s = V-1-M/2, t = M/2;
    G.findPath(s, t);
    for (int k = t; k != s; k = G.st(k))
        System.out.println(k + "-" + G.st(k));

(figure: the M = 5 grid, vertices numbered row by row from 0 1 2 3 4 on the bottom to 20 21 22 23 24 on the top)
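Below is a self-contained, runnable sketch of the ADT and client above. It makes a few assumptions. The constructor also takes V, which the slide's Graph(Edge[]) leaves implicit. Adjacency is stored in arrays. findPath returns whether t was reached. It uses a plain, unrandomized recursive DFS that saves predecessors for st(), the path-saving code the slides omit. Recursion depth grows with the path, so this version is meant for small grids.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the slides' Graph ADT and grid client (names follow the slides where possible).
class Edge {
    final int v, w;
    Edge(int v, int w) { this.v = v; this.w = w; }
}

class Graph {
    private final int[][] adj;   // vertex-indexed adjacency arrays
    private boolean[] visited;
    private int[] st;            // st[v] = predecessor of v on the path found

    // Assumption: V is passed explicitly alongside the edge array.
    Graph(int V, Edge[] a) {
        int[] deg = new int[V];
        for (Edge e : a) { deg[e.v]++; deg[e.w]++; }
        adj = new int[V][];
        for (int v = 0; v < V; v++) adj[v] = new int[deg[v]];
        int[] fill = new int[V];
        for (Edge e : a) { adj[e.v][fill[e.v]++] = e.w; adj[e.w][fill[e.w]++] = e.v; }
    }

    // Depth-first search from s; returns true if t was reached.
    boolean findPath(int s, int t) {
        visited = new boolean[adj.length];
        st = new int[adj.length];
        st[s] = s;
        return findPathR(s, t);
    }

    private boolean findPathR(int v, int t) {
        visited[v] = true;
        if (v == t) return true;
        for (int w : adj[v])
            if (!visited[w]) {
                st[w] = v;                        // save the path as we go
                if (findPathR(w, t)) return true; // stop as soon as t is found
            }
        return false;
    }

    int st(int v) { return st[v]; }
}

class GridClient {
    // Edge array for an M-by-M grid, using the slide's four tests in the same order.
    static Edge[] gridEdges(int M) {
        int V = M * M;
        List<Edge> a = new ArrayList<>();
        for (int i = 0; i < V; i++) {
            if (i < V - M) a.add(new Edge(i, i + M));
            if (i >= M) a.add(new Edge(i, i - M));
            if ((i + 1) % M != 0) a.add(new Edge(i, i + 1));
            if (i % M != 0) a.add(new Edge(i, i - 1));
        }
        return a.toArray(new Edge[0]);
    }

    public static void main(String[] args) {
        int M = 5, V = M * M;
        int s = V - 1 - M / 2, t = M / 2;   // centers of the top and bottom boundaries
        Graph G = new Graph(V, gridEdges(M));
        if (G.findPath(s, t))
            for (int k = t; k != s; k = G.st(k))
                System.out.println(k + "-" + G.st(k));
    }
}
```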

DFS: standard implementation
Graph ADT constructor code

    for (int k = 0; k < E; k++)
    {
        int v = a[k].v, w = a[k].w;
        adj[v] = new Node(w, adj[v]);
        adj[w] = new Node(v, adj[w]);
    }

Graph representation: vertex-indexed array of linked lists, two nodes per edge.

DFS implementation (code to save path omitted)

    boolean findPathR(int s, int t)
    {
        if (s == t) return true;
        visited[s] = true;
        for (Node x = adj[s]; x != null; x = x.next)
            if (!visited[x.v] && findPathR(x.v, t)) return true;   // stop once t is found
        return false;
    }
    void findPath(int s, int t)
    {
        visited = new boolean[V];
        findPathR(s, t);
    }

Basic flaw in standard DFS scheme
Cost strongly depends on arbitrary decisions in client code!

    ...
    for (int i = 0; i < V; i++)
    {
        if ((i+1) % M != 0) a[e++] = new Edge(i, i+1);
        if (i % M != 0) a[e++] = new Edge(i, i-1);
        if (i < V-M) a[e++] = new Edge(i, i+M);
        if (i >= M) a[e++] = new Edge(i, i-M);
    }
    ...

The order of these statements determines the order in the adjacency lists, and the order in the lists has a drastic effect on running time (figure: lists ordered west, east, north, south: ~$E/2$; lists ordered south, north, east, west: ~$E^{1/2}$).
Bad news for ANY graph model.
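One way to measure the effect is to build every adjacency list in a single fixed direction order and count the edges a standard (non-randomized) DFS examines before reaching t. This is a simplification: it does not reproduce the linked-list construction order of the client code. The sketch follows the slide's M = 5 figure, with row 0 at the bottom, so north is +M and east is +1. The names are illustrative. With south first in every list, DFS walks straight down from s to t and examines exactly M − 1 edges. West first sends it wandering.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: standard DFS on an M-by-M grid where every adjacency list follows one direction order.
// Numbering follows the slide's figure: vertices 0..M-1 form the bottom row.
class OrderExperiment {
    static final int WEST = 0, EAST = 1, NORTH = 2, SOUTH = 3;
    static int examined;  // edges examined by the last search

    static int[][] build(int M, int[] order) {
        int V = M * M;
        int[][] adj = new int[V][];
        for (int v = 0; v < V; v++) {
            List<Integer> list = new ArrayList<>();
            for (int dir : order) {
                if (dir == WEST && v % M != 0) list.add(v - 1);
                if (dir == EAST && (v + 1) % M != 0) list.add(v + 1);
                if (dir == NORTH && v < V - M) list.add(v + M);
                if (dir == SOUTH && v >= M) list.add(v - M);
            }
            adj[v] = list.stream().mapToInt(Integer::intValue).toArray();
        }
        return adj;
    }

    static boolean dfs(int[][] adj, boolean[] visited, int v, int t) {
        visited[v] = true;
        if (v == t) return true;
        for (int w : adj[v]) {
            examined++;                        // cost measure: graph edges examined
            if (!visited[w] && dfs(adj, visited, w, t)) return true;
        }
        return false;
    }

    // Edges examined by DFS from the center of the top boundary to the center of the bottom one.
    static int cost(int M, int[] order) {
        int V = M * M;
        examined = 0;
        dfs(build(M, order), new boolean[V], V - 1 - M / 2, M / 2);
        return examined;
    }

    public static void main(String[] args) {
        int M = 63;
        System.out.println("west, east, north, south: " + cost(M, new int[] {WEST, EAST, NORTH, SOUTH}));
        System.out.println("south, north, east, west: " + cost(M, new int[] {SOUTH, NORTH, EAST, WEST}));
        System.out.println("adjacency entries 4M(M-1): " + 4 * M * (M - 1));
    }
}
```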

Addressing the basic flaw
Advise the client to randomize the edges?
- no, very poor software engineering
- leads to nonrandom edge lists (!)
Randomize each edge list before use?
- no, may not need the whole list
Solution: use a randomized iterator.
Represent the graph with arrays, not lists.

Standard iterator

    int N = adj[x].length;
    for (int i = 0; i < N; i++)
    {
        // process vertex adj[x][i]
    }

Randomized iterator: exchange a random vertex from adj[x][i..N-1] with adj[x][i], then process it.

    int N = adj[x].length;
    for (int i = 0; i < N; i++)
    {
        exch(adj[x], i, i + (int) (Math.random()*(N-i)));   // cast the product, not Math.random() alone
        // process vertex adj[x][i]
    }
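A minimal Java sketch of the randomized iterator as a reusable helper follows; the names are illustrative. Each step is one step of a Fisher-Yates shuffle. It swaps a uniformly random entry of a[i..N-1] into position i and then visits it. The visitor can stop the iteration early, so a search that quits after a few neighbors does not pay to shuffle the rest. RNG.nextInt(N - i) gives the same index range as the slide's (int) (Math.random()*(N-i)) without the cast-precedence pitfall.

```java
import java.util.Random;
import java.util.function.IntPredicate;

// Sketch of a randomized iterator over one adjacency array (names are illustrative).
class RandomizedIterator {
    static final Random RNG = new Random();

    static void exch(int[] a, int i, int j) { int tmp = a[i]; a[i] = a[j]; a[j] = tmp; }

    // Visits the entries of a in uniformly random order, one Fisher-Yates step at a time.
    // Stops as soon as visit returns false.
    static void forEachRandom(int[] a, IntPredicate visit) {
        int N = a.length;
        for (int i = 0; i < N; i++) {
            exch(a, i, i + RNG.nextInt(N - i));
            if (!visit.test(a[i])) return;
        }
    }
}
```

In a search, the loop body becomes the visitor, for example forEachRandom(adj[x], v -> { if (!visited[v]) { ... } return true; }).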

Use of randomized iterators turns every graph algorithm into a randomized algorithm
Important practical effect: stabilizes algorithm performance
- cost depends on the problem, not its representation
Yields well-defined and fundamental analytic problems
- average-case analysis of algorithm X for graph family Y(N)?
- distributions?
- full employment for algorithm analysts

(Revised) standard DFS implementation
Graph ADT constructor code

    for (int k = 0; k < E; k++)
    {
        int v = a[k].v, w = a[k].w;
        adj[v][deg[v]++] = w;
        adj[w][deg[w]++] = v;
    }

Graph representation: vertex-indexed array of variable-length arrays.

DFS implementation (code to save path omitted)

    boolean findPathR(int s, int t)
    {
        int N = adj[s].length;
        if (s == t) return true;
        visited[s] = true;
        for (int i = 0; i < N; i++)
        {
            exch(adj[s], i, i + (int) (Math.random()*(N-i)));   // randomized iterator
            int v = adj[s][i];
            if (!visited[v] && findPathR(v, t)) return true;
        }
        return false;
    }
    void findPath(int s, int t)
    {
        visited = new boolean[V];
        findPathR(s, t);
    }

BFS: standard implementation
Use a queue to hold fringe vertices

    while Q is nonempty
        get x from Q
        done if x = t
        for each unmarked v adjacent to x
            put v on Q
            mark v

(figure legend: tree vertex, fringe vertex, unseen vertex)

    void findPath(int s, int t)
    {
        visited = new boolean[V];
        Queue Q = new Queue();               // FIFO queue for BFS
        Q.put(s); visited[s] = true;
        while (!Q.empty())
        {
            int x = Q.get();
            int N = adj[x].length;
            if (x == t) return;
            for (int i = 0; i < N; i++)      // randomized iterator
            {
                exch(adj[x], i, i + (int) (Math.random()*(N-i)));
                int v = adj[x][i];
                if (!visited[v]) { Q.put(v); visited[v] = true; }
            }
        }
    }

Generalized graph search: other queues yield A* and other graph-search algorithms.
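Here is a runnable Java version of the BFS above. It uses java.util.ArrayDeque as the FIFO queue and an inline randomized iterator, and it counts adjacency entries examined before t leaves the queue, following the slides' cost measure. The grid builder and method names are illustrative. Each undirected grid edge is stored in both endpoints' arrays, so the count is bounded by 4M(M − 1).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Random;

// Sketch: BFS from s to t on an M-by-M grid with a randomized iterator, counting edges examined.
class RandomizedBFS {
    static final Random RNG = new Random();

    // Adjacency arrays for the grid, numbered as in the slides (bottom row 0..M-1).
    static int[][] grid(int M) {
        int V = M * M;
        int[][] adj = new int[V][];
        for (int v = 0; v < V; v++) {
            int[] nbr = new int[4];
            int d = 0;
            if (v < V - M) nbr[d++] = v + M;
            if (v >= M) nbr[d++] = v - M;
            if ((v + 1) % M != 0) nbr[d++] = v + 1;
            if (v % M != 0) nbr[d++] = v - 1;
            adj[v] = Arrays.copyOf(nbr, d);
        }
        return adj;
    }

    // Returns the number of edges examined before t is taken from the queue, or -1 if unreachable.
    static int findPath(int[][] adj, int s, int t) {
        boolean[] visited = new boolean[adj.length];
        ArrayDeque<Integer> Q = new ArrayDeque<>();   // FIFO queue for BFS
        Q.add(s);
        visited[s] = true;
        int examined = 0;
        while (!Q.isEmpty()) {
            int x = Q.poll();
            if (x == t) return examined;
            int[] a = adj[x];
            int N = a.length;
            for (int i = 0; i < N; i++) {
                int j = i + RNG.nextInt(N - i);        // randomized iterator step
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                int v = a[i];
                examined++;
                if (!visited[v]) { Q.add(v); visited[v] = true; }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int M = 63, V = M * M;
        int cost = findPath(grid(M), V - 1 - M / 2, M / 2);
        System.out.printf("examined %d of %d adjacency entries%n", cost, 4 * M * (M - 1));
    }
}
```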

Union-find implementation
1. Run union-find to find the component containing s and t:
   - initialize the UF array and an array of iterators
   - while s and t are not in the same component: choose a random iterator, then choose a random edge for union
2. Build a subgraph with the edges from that component.
3. Use DFS to find an st-path in that subgraph.
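The three steps can be sketched in Java under two simplifying assumptions that are not taken from the slides. First, "choose a random iterator, then a random edge" is replaced by drawing edges in uniformly random order. Second, the subgraph keeps only the edges that actually caused a union. Those edges form a forest, so the st-path inside it is unique and any stack-based search finds it. Names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Sketch of the three-step union-find approach (simplified; see the assumptions above).
class UnionFindPath {
    static final Random RNG = new Random();
    static int[] id;  // union-find parent links

    static int find(int v) {
        while (id[v] != v) { id[v] = id[id[v]]; v = id[v]; }  // path halving
        return v;
    }

    // edges[k] = {v, w}; returns the vertices of an s-t path, or null if s and t are not connected.
    static List<Integer> findPath(int V, int[][] edges, int s, int t) {
        id = new int[V];
        for (int v = 0; v < V; v++) id[v] = v;
        List<List<Integer>> sub = new ArrayList<>();   // subgraph built from union edges
        for (int v = 0; v < V; v++) sub.add(new ArrayList<>());
        int[][] e = edges.clone();                     // shuffle a copy, not the caller's array
        // Step 1: union random edges until s and t are in the same component.
        for (int i = 0; i < e.length && find(s) != find(t); i++) {
            int j = i + RNG.nextInt(e.length - i);
            int[] tmp = e[i]; e[i] = e[j]; e[j] = tmp;
            int p = find(e[i][0]), q = find(e[i][1]);
            if (p == q) continue;
            id[p] = q;
            // Step 2: keep the union edge in the subgraph.
            sub.get(e[i][0]).add(e[i][1]);
            sub.get(e[i][1]).add(e[i][0]);
        }
        if (find(s) != find(t)) return null;
        // Step 3: stack-based (DFS-style) search from s in the forest of union edges.
        int[] prev = new int[V];
        Arrays.fill(prev, -1);
        prev[s] = s;
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(s);
        while (!stack.isEmpty()) {
            int x = stack.pop();
            if (x == t) break;
            for (int w : sub.get(x))
                if (prev[w] == -1) { prev[w] = x; stack.push(w); }
        }
        List<Integer> path = new ArrayList<>();
        for (int v = t; v != s; v = prev[v]) path.add(0, v);
        path.add(0, s);
        return path;
    }
}
```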

Animations give intuition on performance (figure: BFS, DFS, and UF animations) and suggest hypotheses to verify with experimentation.

Experimental results for basic algorithms
DFS is substantially faster than BFS and UF on the average.

                                 edges examined / E
      M       V        E       BFS    DFS    UF
      7      49      168       .75    .32   1.05
     15     225      840       .75    .45   1.02
     31     961     3720       .75    .36   1.14
     63    3969    15624       .75    .32   1.05
    127   16129    64008       .75    .40    .99
    255   65025   259080       .75    .42   1.08

(figure: UF, DFS, BFS)
Analytic proof? Faster algorithms available?

A faster algorithm for finding an st-path in a graph
Use two depth-first searches
- one from the source s
- one from the destination t
- interleave the two

                                 edges examined / E
      M       V        E       BFS    DFS    UF   2-way
      7      49      168       .75    .32   1.05    .18
     15     225      840       .75    .45   1.02    .13
     31     961     3720       .75    .36   1.14    .15
     63    3969    15624       .75    .32   1.05    .14
    127   16129    64008       .75    .40    .99    .13
    255   65025   259080       .75    .42   1.08    .12

Examines 13% of the edges.
3-8 times faster than standard implementations.
Not log log E, but not bad!
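The slides do not spell out how the two searches are interleaved, so the Java sketch below makes one assumption. The searches take turns, and each examines one edge per turn, using explicit stacks and a per-vertex "next neighbor" position so that each search keeps true depth-first order. It stops when an examined edge joins the two search trees and then stitches the two tree paths together at that edge. The randomized iterator is applied at each step. Names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Sketch of two interleaved depth-first searches, one from s and one from t.
class TwoWayDFS {
    static final Random RNG = new Random();
    static int examined;  // edges examined by the last call

    // Returns the vertices of an s-t path, or null if s and t are not connected.
    static List<Integer> findPath(int[][] adj, int s, int t) {
        int V = adj.length;
        int[] owner = new int[V];  // 0 = unseen, 1 = reached from s, 2 = reached from t
        int[] par = new int[V];    // predecessor within its own search tree
        int[] next = new int[V];   // next adjacency position to examine
        examined = 0;
        List<Integer> path = new ArrayList<>();
        if (s == t) { path.add(s); return path; }
        owner[s] = 1; par[s] = s;
        owner[t] = 2; par[t] = t;
        ArrayDeque<Integer> fromS = new ArrayDeque<>(), fromT = new ArrayDeque<>();
        fromS.push(s);
        fromT.push(t);
        int side = 1;
        while (!fromS.isEmpty() && !fromT.isEmpty()) {
            ArrayDeque<Integer> stack = side == 1 ? fromS : fromT;
            int x = stack.peek();
            int[] a = adj[x];
            if (next[x] == a.length) stack.pop();                   // x is exhausted: backtrack
            else {
                int j = next[x] + RNG.nextInt(a.length - next[x]);  // randomized iterator
                int tmp = a[next[x]]; a[next[x]] = a[j]; a[j] = tmp;
                int w = a[next[x]++];
                examined++;
                if (owner[w] == 0) { owner[w] = side; par[w] = x; stack.push(w); }
                else if (owner[w] != side) {                        // the two trees meet at x-w
                    int u = side == 1 ? x : w, v = side == 1 ? w : x;
                    for (int k = u; k != s; k = par[k]) path.add(0, k);
                    path.add(0, s);
                    for (int k = v; k != t; k = par[k]) path.add(k);
                    path.add(t);
                    return path;
                }
            }
            side = 3 - side;                                        // take turns
        }
        return null;
    }

    // M-by-M grid adjacency arrays, numbered as in the slides (bottom row 0..M-1).
    static int[][] grid(int M) {
        int V = M * M;
        int[][] adj = new int[V][];
        for (int v = 0; v < V; v++) {
            int[] nbr = new int[4];
            int d = 0;
            if (v < V - M) nbr[d++] = v + M;
            if (v >= M) nbr[d++] = v - M;
            if ((v + 1) % M != 0) nbr[d++] = v + 1;
            if (v % M != 0) nbr[d++] = v - 1;
            adj[v] = Arrays.copyOf(nbr, d);
        }
        return adj;
    }

    public static void main(String[] args) {
        int M = 127, V = M * M, trials = 100;
        int[][] adj = grid(M);
        long total = 0;
        for (int i = 0; i < trials; i++) { findPath(adj, V - 1 - M / 2, M / 2); total += examined; }
        System.out.printf("average fraction of adjacency entries examined: %.3f%n",
            (double) total / trials / (4.0 * M * (M - 1)));
    }
}
```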

Are other approaches faster?
Other search algorithms
- randomized?
- farthest-first?
Multiple searches?
- interleaving strategy?
- merge strategy?
- how many?
- which algorithm?
Hybrid algorithms
- which combination?
- probabilistic restart?
- merge strategy?
- randomized choice?
Better than constant-factor improvement possible? Proof?

Experiments with other approaches
Randomized search
- use a random queue in BFS
- easy to implement
- result: not much different from BFS
Multiple searchers
- use N searchers: one from the source, one from the destination, N-2 from random vertices
- additional factor of 2 for N > 2
- result: not much help anyway
Best method found (by far): DFS with 2 searchers
(figure: cost versus number of searchers, N = 1, 2, 3, 4, 5, 10, 20, for BFS and DFS)
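The "random queue" in the randomized search is a queue whose get removes a uniformly random item. A minimal Java sketch follows; the class names are illustrative. The queue keeps items in a growable array and removes by swapping the chosen item with the last one, so get takes constant time and put takes amortized constant time. The search has the same shape as the BFS slide, with the FIFO queue swapped out.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: a randomized queue (get removes a uniformly random item).
class RandomQueue {
    private int[] items = new int[8];
    private int n = 0;
    private final Random rng;

    RandomQueue(Random rng) { this.rng = rng; }

    boolean empty() { return n == 0; }

    void put(int x) {
        if (n == items.length) items = Arrays.copyOf(items, 2 * n);  // grow when full
        items[n++] = x;
    }

    // Removes and returns a uniformly random item (assumes the queue is nonempty).
    int get() {
        int i = rng.nextInt(n);
        int x = items[i];
        items[i] = items[--n];   // fill the hole with the last item
        return x;
    }
}

class RandomizedSearch {
    // Same shape as the BFS findPath, with RandomQueue in place of the FIFO queue.
    static boolean findPath(int[][] adj, int s, int t, Random rng) {
        boolean[] visited = new boolean[adj.length];
        RandomQueue Q = new RandomQueue(rng);
        Q.put(s);
        visited[s] = true;
        while (!Q.empty()) {
            int x = Q.get();
            if (x == t) return true;
            for (int v : adj[x])
                if (!visited[v]) { Q.put(v); visited[v] = true; }
        }
        return false;
    }
}
```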

Small-world graphs are a widely studied graph model with many applications
A small-world graph has
- a large number of vertices
- low average vertex degree (sparse)
- low average path length
- local clustering
Examples
- add random edges to a grid graph
- add random edges to any sparse graph with local clustering
- many scientific models
Q. How do we find an st-path in a small-world graph?

Small-world graphs model the six degrees of separation phenomenon
(figure: a tiny portion of the movie-performer relationship graph)
Example: Kevin Bacon numbers

Applications of small-world graphs
social networks, airlines, roads, neurobiology, evolution, social influence, protein interaction, percolation, Internet, electric power grids, political trends, ...
Example 1: Social networks
- infectious diseases
- extensive simulations
- some analytic results
- huge graphs
Example 2: Protein interaction
- small-world model
- natural process
- experimental validation
(figure: a tiny portion of the movie-performer relationship graph)

Finding a path in a small-world graph is a heavily studied problem
Milgram experiment (1960s)
Small-world graph models
- random (many variants)
- Watts-Strogatz
- Kleinberg
Add V random shortcuts to grid graphs and others: A* uses ~$\log E$ steps to find a path.
How does 2-way DFS do in this model?
- no change at all in graph code
- just a different graph model
Experiment
- add $M \sim E^{1/2}$ random edges to an M-by-M grid graph
- use 2-way DFS to find the path
Surprising result: finds short paths in ~$E^{1/2}$ steps!
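The experiment's graph model can be generated with a short Java sketch: an M-by-M grid plus M random "shortcut" edges, matching the slide's $M \sim E^{1/2}$. Two details are assumptions, since the slide does not give them: shortcut endpoints are drawn uniformly at random with self-loops resampled, and a duplicate edge is simply kept. The output uses plain int[][] adjacency arrays, so it can be passed unchanged to an st-path search that takes that representation. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch: M-by-M grid graph plus a number of uniformly random shortcut edges.
class SmallWorldGrid {
    static int[][] build(int M, int shortcuts, Random rng) {
        int V = M * M;
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < V; v++) adj.add(new ArrayList<>());
        for (int v = 0; v < V; v++) {
            if ((v + 1) % M != 0) { adj.get(v).add(v + 1); adj.get(v + 1).add(v); }  // horizontal edge
            if (v < V - M) { adj.get(v).add(v + M); adj.get(v + M).add(v); }       // vertical edge
        }
        int added = 0;
        while (added < shortcuts) {
            int v = rng.nextInt(V), w = rng.nextInt(V);
            if (v == w) continue;              // resample self-loops
            adj.get(v).add(w);
            adj.get(w).add(v);
            added++;
        }
        int[][] a = new int[V][];
        for (int v = 0; v < V; v++) a[v] = adj.get(v).stream().mapToInt(Integer::intValue).toArray();
        return a;
    }

    public static void main(String[] args) {
        int M = 127;
        int[][] g = build(M, M, new Random());
        long entries = 0;
        for (int[] row : g) entries += row.length;
        System.out.println("vertices " + g.length + ", undirected edges " + entries / 2);
    }
}
```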

Finding a path in a small-world graph is much easier than finding a path in a grid graph
Conjecture: Two-way DFS finds a short st-path in sublinear time in any small-world graph.
Evidence in favor
1. Experiments on many graphs
2. Proof sketch for grid graphs with V shortcuts
   - step 1: $2E^{1/2}$ steps ~ $2V^{1/2}$ random vertices
   - step 2: like the birthday paradox, two sets of $2V^{1/2}$ randomly chosen vertices are highly unlikely to be disjoint
Path length? Multiple searchers revisited?
Next steps: refine model, more experiments, detailed proof
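Step 2 of the proof sketch can be made quantitative under an idealized assumption: one search has reached a fixed set A of k vertices, and the other has reached a uniformly random set B of k vertices. Picking B one vertex at a time, the i-th pick avoids A with probability (V − k − i)/(V − i), which is at most 1 − k/V. Then 1 − x ≤ e^{−x} gives:

```latex
\Pr[A \cap B = \emptyset]
  = \prod_{i=0}^{k-1} \frac{V-k-i}{V-i}
  \le \Bigl(1 - \frac{k}{V}\Bigr)^{k}
  \le e^{-k^2/V},
\qquad
k = 2V^{1/2} \;\Rightarrow\; \Pr[A \cap B = \emptyset] \le e^{-4} \approx 0.018
```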

More questions than answers
Answers
- Randomization makes cost depend on the graph, not the representation.
- DFS is faster than BFS or UF for finding paths in grid graphs.
- Two DFSs are faster than 1 DFS, or N of them, in grid graphs.
- We can find short paths quickly in small-world graphs.
Questions
- What are the BFS, UF, and DFS constants in grid graphs?
- Is there a sublinear algorithm for grid graphs?
- Which methods adapt to directed graphs?
- Can we precisely analyze and quantify costs for small-world graphs?
- What is the cost distribution for DFS for any interesting graph family?
- How effective are these methods for other graph families?
- Do these methods lead to faster maxflow algorithms?
- How effective are these methods in practice?
- ...

Lessons
We know much less than you might think about most of the algorithms that we use.
The scientific method is necessary in algorithm design and software development.
