Data-Driven Optimization John Turner Assistant Professor The Paul Merage School of Business University of California, Irvine October 24, 2014: UCI Data Science Initiative
Analytics Unlocking the Potential of Big Data è Descriptive What is happening? Querying, Reporting, Data Capturing, Filtering & Analysis è Predictive What will likely happen? Statistical Methods (Regression), Forecasting & Data Mining è Prescriptive What should we do? Optimization, Simulation, Quantitative Models John Turner - UCI Data Science Initiative 2
John Turner - UCI Data Science Initiative 3 Optimization min f(x) @ i=1..m@ i=1..p@ x X s.t. g i (x) 0, h i (x)=0,
Applica'ons for Op'miza'on chip design logistics supply chain management portfolio optimization scheduling network design air traffic routing timetabling marketing campaign design production planning
John Turner - UCI Data Science Initiative 5 Problems, Formula'ons, and Instances Define Problem Algebraic Manipulation Construct Mathematical Model (Formulation) Solver (CPLEX, Gurobi, KNITRO, etc.) Solve one or more Instances Communicate Results / Automate the Process
John Turner - UCI Data Science Initiative 6 Computational Challenges (x) @.m@.p@ x X Nonconvex objective s.t. g i (x) 0, h i (x)=0, Too many constraints Nonlinear constraints Too many variables Nonconvex domain (e.g., discrete X) à Large-Scale Instances Require Specialized Decomposition or Approximation Schemes
Two Examples 1. Planning Online Advertising Turner, J. 2012. The Planning of Guaranteed Targeted Display Advertising. Operations Research, 60(1) 18--33. 2. Locating Trauma Centers at Nation-Scale Cho, S-H., H. Jang, T. Lee, J. Turner. 2014. Simultaneous Location of Trauma Centers and Helicopters for Emergency Medical Service Planning. Operations Research, 62(4) 751--771. John Turner - UCI Data Science Initiative 7
John Turner - UCI Data Science Initiative 8
Different viewer types John Turner - UCI Data Science Initiative 9
Different viewer types John Turner - UCI Data Science Initiative 10
Different targeting requirements John Turner - UCI Data Science Initiative 11
John Turner - UCI Data Science Initiative 12 Impression Goals 1 5 2 Impression goals for each advertiser 1
John Turner - UCI Data Science Initiative 13 4 Supply 1 Impression Goals 1 3 5 2 1 2 5 2 We use optimization to find an optimal allocation 1
Combinatorial Explosion Targeting can specify: gender age geographic region income level day of week / time of day the sports you like, etc. Trillions of possible combinations!! John Turner - UCI Data Science Initiative 14
John Turner - UCI Data Science Initiative 15 Aggregate Solve Disaggregate Supply highly variable Expected supply close to zero! Questions: 1. What is best way to aggregate? 2. How to interpret aggregate solution? 3. How good is aggregate solution?
Implications Allow the optimization to segment viewers into buckets This becomes an iterative process Solve à Learn à Revise à Repeat Data requirements change at each iteration John Turner - UCI Data Science Initiative 16
Data- Driven Op'miza'on Sequen&al Process 1. Construct Es&mates of Model Parameters (Forecas&ng / Data Mining) 2. Solve Op&miza&on Model Take-away: Can often get to within 1% of optimality with less than 100 judiciously-chosen audience segments Itera&ve Process 1. Start with Coarse Approxima&ons for Model Parameters (Cheap Forecasts) 2. Solve Op&miza&on Model 3. Learn & Revise Model 4. Re- sample Data to Improve Forecasts 5. Repeat Steps 2-4 17
Example 2: Demand for Trauma Care in Korea l Popula&on ~ 50 million l Area 100,210 km 2 (~ Indiana) l 7 metro ci&es & 9 provinces l Number of trauma cases** = 190,196* l ~ 382.1 trauma / 100k 0 10 50 100 200 400 600 600~ * Stats in year 2008 ** Patients with EMR-ISS score 15 18
Candidate Sites Candidate Trauma Center Sites (n=38) Candidate Heliport Sites (n=54 total: 16 NEMA* + 38 @ TC s) *NEMA = Korean National Emergency Management Agency sites (heliports currently used for fire & other EMS missions) 19
T10H5 T10H15 T10H25 20 Decoupled No-Congestion Integrated Take-away: Solutions of varying quality can look visually similar. For each approach (decoupled, no-congestion, and integrated), indicates the sites chosen for T10H5; indicates the sites chosen for T10H15 but not for T10H5; indicates the sites chosen for T10H25 but not for T10H15
21 Successful Pa'ents (%) Take-away: An integrated model solved approximately is often better than solving two moreprecise models in sequence. 100% 90% 80% 10 trauma centers 12 trauma centers 14 trauma centers 70% 60% 50% Integrated NoCongestion Decoupled T10H5 T10H10 T10H15 T10H20 T10H25 T12H5 T12H10 T12H15 T12H20 T12H25 T14H5 T14H10 T14H15 T14H20 T14H25 # Helicopters # Helicopters # Helicopters
22 Big Data, Big Challenges Conclusions: l Predic&on and Op&miza&on work well together l Predict AND Op&mize, instead of Predict THEN Op&mize.
Data-Driven Optimization John Turner Assistant Professor The Paul Merage School of Business University of California, Irvine October 24, 2014: UCI Data Science Initiative