Recent Progresses on Linear Programming and the Simplex Method

Yinyu Ye
www.stanford.edu/~yyye
K.T. Li Professor of Engineering
Management Science and Engineering, and Institute of Computational and Mathematical Engineering
Stanford University

Linear Programming started

with the simplex method

Outline

Counterexamples to the Hirsch conjecture
Linear Programming (LP) and the simplex method
Pivoting rules and their exponential behavior
Simplex and policy-iteration methods for Markov Decision Process (MDP) and Zero-Sum Game with fixed discounts
Simplex method for deterministic MDP with variable discounts
Remarks and comments

Hirsch's Conjecture

Warren Hirsch conjectured in 1957 that the diameter of the graph of a (convex) polyhedron defined by n inequalities in d dimensions is at most n - d.
The diameter of the graph is the maximum, over all pairs of vertices, of the length of the shortest path between them.
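As a small illustration of the definition (not taken from the slides), the sketch below computes the graph diameter of the d-dimensional 0/1 cube by breadth-first search. The cube is described by n = 2d inequalities, so the Hirsch bound n - d equals d. The function names `cube_graph`, `eccentricity` and `diameter` are hypothetical helpers introduced only for this example.

```python
from collections import deque
from itertools import product


def cube_graph(d):
    """Vertex-edge graph of the d-dimensional 0/1 cube as an adjacency dict."""
    graph = {}
    for v in product((0, 1), repeat=d):
        # Two vertices are adjacent when they differ in exactly one coordinate.
        graph[v] = [v[:k] + (1 - v[k],) + v[k + 1:] for k in range(d)]
    return graph


def eccentricity(graph, source):
    """Largest shortest-path distance from source, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def diameter(graph):
    """Maximum over all vertices of the eccentricity."""
    return max(eccentricity(graph, v) for v in graph)


d = 4
n = 2 * d  # the cube 0 <= x_k <= 1 is defined by 2d inequalities
cube_diameter = diameter(cube_graph(d))
print(cube_diameter, n - d)
```

For the cube the diameter meets the bound n - d with equality. Santos' counterexample is a far larger object and is not reproduced here.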

Counterexamples to Hirsch's conjecture

Francisco Santos (2010): There is a 43-dimensional polytope with 86 facets and of diameter at least 44.
There is an infinite family of non-Hirsch polytopes with diameter $(1 + \epsilon)n$, even in fixed dimension.
Santos' construction is an extension of a result of Klee and Walkup (1967), who proved that the Hirsch conjecture would follow in general from just the case n = 2d.

LP and the Simplex Method

Optimize a linear objective function over a convex polyhedron.

Pivoting rules

The simplex method is governed by a pivot rule, i.e. a method of choosing adjacent vertices with a better objective function value.
Klee and Minty (1972) showed that Dantzig's original greedy pivot rule may require exponentially many steps.
The random edge pivot rule chooses, from among all improving pivoting steps (or edges) from the current basic feasible solution (or vertex), one uniformly at random.
The Zadeh pivot rule chooses the decreasing edge, or entering variable, that has been entered least often in the previous pivot steps.
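The rules above differ only in how they pick the entering variable among the improving candidates. The following is a minimal sketch of that selection step for a minimization problem, assuming the reduced costs are already available as a dictionary from variable index to value. The function names, the dictionary layout and the tie-breaking by lowest index are assumptions made for illustration; the slides do not specify them.

```python
import random


def improving_candidates(reduced_costs):
    """Indices with negative reduced cost (improving for minimization), sorted."""
    return sorted(j for j, r in reduced_costs.items() if r < 0)


def dantzig_rule(reduced_costs):
    """Greedy rule: the most negative reduced cost enters."""
    candidates = improving_candidates(reduced_costs)
    if not candidates:
        return None
    return min(candidates, key=lambda j: reduced_costs[j])


def lowest_index_rule(reduced_costs):
    """The improving variable with the smallest index enters."""
    candidates = improving_candidates(reduced_costs)
    return candidates[0] if candidates else None


def random_edge_rule(reduced_costs, rng):
    """One improving variable chosen uniformly at random enters."""
    candidates = improving_candidates(reduced_costs)
    return rng.choice(candidates) if candidates else None


def zadeh_rule(reduced_costs, entry_counts):
    """The improving variable that has entered least often so far enters."""
    candidates = improving_candidates(reduced_costs)
    if not candidates:
        return None
    return min(candidates, key=lambda j: entry_counts.get(j, 0))


# Hypothetical reduced costs and entry history at some vertex.
reduced = {0: -1.0, 1: -3.0, 2: 0.5, 3: -2.0}
history = {0: 2, 1: 5, 3: 1}

greedy_choice = dantzig_rule(reduced)
lowest_choice = lowest_index_rule(reduced)
random_choice = random_edge_rule(reduced, random.Random(0))
zadeh_choice = zadeh_rule(reduced, history)
print(greedy_choice, lowest_choice, random_choice, zadeh_choice)
```

Only the choice of entering variable is shown; the ratio test and the basis update of a full simplex implementation are omitted.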

... and they fall as well

No non-polynomial lower bounds were known until now for these two pivot rules.
Friedmann, Hansen and Zwick (2011) gave an example on which the random edge pivot rule needs exponentially many steps.
Friedmann (2011) developed an example on which the Zadeh pivot rule needs exponentially many steps.
These examples exploit the connection between linear programming and the Markov Decision Process (MDP), and the close relation between the simplex method for solving linear programs and the policy iteration method for MDP. (The diameter of MDP polytopes is bounded by d.)

Markov Decision Process

The Markov decision process provides a mathematical framework for modeling sequential decision-making in situations where outcomes are partly random and partly under the control of a decision maker.
MDPs are useful for studying a wide range of optimization problems solved via dynamic programming, and have been known since at least the 1950s (cf. Shapley 1953, Bellman 1957).
Modern applications include dynamic planning, reinforcement learning, social networking, and almost all other dynamic/sequential decision-making problems in the Mathematical, Physical, Management, Economic, and Social Sciences.

States and Actions

At each time step, the process is in some state $i = 1, \ldots, m$, and the decision maker chooses an action $j \in A_i$ that is available for state $i$; there are n actions in total.
The process responds at the next time step by randomly moving into a new state $i'$, and giving the decision maker an immediate corresponding cost $c_j$.
The probability that the process enters $i'$ as its new state is influenced by the chosen action $j$. Specifically, it is given by the state transition probability distribution $p_j$.
But given action $j$, the probability is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP possess the Markov property.

A Simple MDP Problem I

Simplified Representation

Policy and Discount Factor

A policy of an MDP is a set function $\pi = \{\pi_1, \pi_2, \ldots, \pi_m\}$ that specifies one action $\pi_i \in A_i$ that the decision maker will choose for each state $i$.
The MDP problem is to find an optimal (stationary) policy that minimizes the expected discounted sum of costs over an infinite horizon with a discount factor $0 \le \gamma < 1$.
One can obtain an LP that models the MDP problem in such a way that there is a one-to-one correspondence between policies of the MDP and extreme-point solutions of the (dual) LP, and between improving switches and improving pivots: de Ghellinck (1960), D'Epenoux (1960) and Manne (1960).

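Cost-to-Go values and LP formulation

Let $y \in R^m$ represent the expected present cost-to-go values of the m states, respectively, for a given policy. Then, the cost-to-go vector of the optimal policy is a fixed point of

$$ y_i = \min\{ c_j + \gamma p_j^T y,\ j \in A_i \}, \quad \pi_i = \arg\min\{ c_j + \gamma p_j^T y,\ j \in A_i \}, \quad \forall i. $$

Such a fixed-point computation can be formulated as an LP:

$$ \max \sum_{i=1}^{m} y_i \quad \text{s.t.} \quad y_i \le c_j + \gamma p_j^T y, \ \forall j \in A_i; \ \forall i. $$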
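The fixed point can also be approximated by repeatedly applying the minimization on the right-hand side (value iteration), a standard technique that is named here as a substitute for solving the LP. The sketch below does this on a made-up three-action, two-state MDP. The tuple layout `(state, cost, transition_probabilities)`, the function names and the data are assumptions for illustration only.

```python
def bellman_update(y, actions, gamma):
    """One application of y_i <- min over j in A_i of c_j + gamma * p_j^T y."""
    new_y = [float("inf")] * len(y)
    policy = [None] * len(y)
    for j, (i, cost, probs) in enumerate(actions):
        value = cost + gamma * sum(p * yk for p, yk in zip(probs, y))
        if value < new_y[i]:
            new_y[i], policy[i] = value, j
    return new_y, policy


def value_iteration(actions, m, gamma, tol=1e-10, max_iter=10000):
    """Iterate the update until successive cost-to-go vectors agree within tol."""
    y, policy = [0.0] * m, None
    for _ in range(max_iter):
        new_y, policy = bellman_update(y, actions, gamma)
        if max(abs(a - b) for a, b in zip(new_y, y)) < tol:
            return new_y, policy
        y = new_y
    return y, policy


# Hypothetical MDP: each action is (state it belongs to, cost c_j, distribution p_j).
example_actions = [
    (0, 1.0, [1.0, 0.0]),  # action 0: stay in state 0, cost 1
    (0, 2.0, [0.0, 1.0]),  # action 1: move from state 0 to state 1, cost 2
    (1, 0.0, [0.0, 1.0]),  # action 2: stay in state 1, cost 0
]
y_star, pi_star = value_iteration(example_actions, m=2, gamma=0.9)
print(y_star, pi_star)
```

In this example staying in state 0 forever would cost 1/(1 - 0.9) = 10, so the computed policy pays 2 once to reach the free state 1.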

Cost-to-Go values (chosen actions in red)

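The dual of the MDP-LP

$$ \min \sum_{j=1}^{n} c_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} (e_{ij} - \gamma p_{ij}) x_j = 1, \ \forall i; \quad x_j \ge 0, \ \forall j, $$

where $e_{ij} = 1$ if $j \in A_i$ and 0 otherwise.
Dual variable $x_j$ represents the expected action flow or visit-frequency, that is, the expected present value of the number of times action $j$ is used.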
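A short worked step, added here as an illustration, shows why the dual variables behave like discounted visit counts. It assumes the dual constraints read $\sum_j (e_{ij} - \gamma p_{ij}) x_j = 1$ for every state $i$, that each action belongs to exactly one state, and that each $p_j$ is a probability distribution. Summing the m constraints gives the total flow:

```latex
\sum_{i=1}^{m} \sum_{j=1}^{n} (e_{ij} - \gamma p_{ij})\, x_j
  = \sum_{j=1}^{n} \Bigl( \underbrace{\sum_{i=1}^{m} e_{ij}}_{=1}
      - \gamma \underbrace{\sum_{i=1}^{m} p_{ij}}_{=1} \Bigr) x_j
  = (1-\gamma) \sum_{j=1}^{n} x_j = m
\quad\Longrightarrow\quad
\sum_{j=1}^{n} x_j = \frac{m}{1-\gamma}.
```

Every feasible dual solution therefore carries the same total flow $m/(1-\gamma)$, which is why $1/(1-\gamma)$ appears in the pivot-step bounds.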

Greedy Simplex Rule (chosen actions in red)

Lowest-Index Simplex Rule (chosen actions in red)

Policy Iteration Rule (Howard 1960) (chosen actions in red)

Efficiency of simplex/policy methods

Melekopoglou and Condon (1990) showed that the simplex method with the smallest-index pivot rule needs an exponential number of iterations to compute an optimal policy for a specific MDP problem, regardless of discount factors.
Fearnley (2010) showed that the policy-iteration method needs an exponential number of iterations for an undiscounted finite-horizon MDP. These add to the negative theoretical results mentioned earlier.
In practice, the policy-iteration method, including the simplex method with the greedy pivot rule, has been remarkably successful and has been shown to be most effective and widely used.
Any good news in theory?

Bound on the simplex/policy methods

Ye (2011): The classic simplex and policy iteration methods, with the greedy pivoting rule, terminate in no more than

$$ \frac{mn}{1-\gamma} \log\left( \frac{m^2}{1-\gamma} \right) $$

pivot steps, where n is the total number of actions in an m-state MDP with discount factor $\gamma$.
This is a strongly polynomial-time upper bound when $\gamma$ is bounded above by a constant less than one.
CIPA (Ye, 2005): [second formula garbled in the source]
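To make the statement concrete, here is a minimal sketch of the simplex method on an MDP with the greedy rule: evaluate the current policy, compute the reduced cost $c_j + \gamma p_j^T y - y_i$ of every action, and switch the single most negative one. It counts pivot steps and compares them with the bound, taken here in the form $\frac{mn}{1-\gamma}\log\frac{m^2}{1-\gamma}$ (an assumption about the slide's exact expression). The MDP data, the tuple layout `(state, cost, distribution)` and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def evaluate(policy, actions, gamma):
    """Cost-to-go of a fixed policy: solve (I - gamma * P_pi) y = c_pi."""
    m = len(policy)
    a = [[(1.0 if i == k else 0.0) - gamma * actions[policy[i]][2][k]
          for k in range(m)] for i in range(m)]
    b = [actions[policy[i]][1] for i in range(m)]
    return solve(a, b)


def greedy_simplex(actions, policy, gamma, eps=1e-9):
    """Switch one action per step, always the one with most negative reduced cost."""
    policy, pivots = list(policy), 0
    while True:
        y = evaluate(policy, actions, gamma)
        best_j, best_r = None, -eps
        for j, (i, cost, probs) in enumerate(actions):
            r = cost + gamma * sum(p * yk for p, yk in zip(probs, y)) - y[i]
            if r < best_r:
                best_j, best_r = j, r
        if best_j is None:  # no improving action: the policy is optimal
            return policy, y, pivots
        policy[actions[best_j][0]] = best_j
        pivots += 1


def pivot_bound(m, n, gamma):
    """The bound mn/(1-gamma) * log(m^2/(1-gamma)), as assumed in the lead-in."""
    return m * n / (1 - gamma) * math.log(m * m / (1 - gamma))


# Hypothetical 3-state, 5-action MDP: (state, cost, transition distribution).
mdp = [
    (0, 1.0, [1.0, 0.0, 0.0]),  # stay in state 0
    (0, 2.0, [0.0, 1.0, 0.0]),  # state 0 -> state 1
    (1, 1.0, [0.0, 1.0, 0.0]),  # stay in state 1
    (1, 3.0, [0.0, 0.0, 1.0]),  # state 1 -> state 2
    (2, 0.0, [0.0, 0.0, 1.0]),  # stay in state 2 for free
]
final_policy, final_y, steps = greedy_simplex(mdp, [0, 2, 4], gamma=0.9)
print(final_policy, steps, pivot_bound(3, 5, 0.9))
```

On such a tiny instance the method needs only a couple of pivots, far below the worst-case bound; the example only shows what a "pivot step" is, not that the bound is tight.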

Roadmap of proof

Define a combinatorial event that cannot repeat more than n times. More precisely, at any step of the pivot process, there exists a non-optimal action that will never re-enter future policies or bases after

$$ \frac{m}{1-\gamma} \log\left( \frac{m^2}{1-\gamma} \right) $$

pivot steps.
There are at most (n - m) such non-optimal actions to eliminate from appearance in any future policies generated by the simplex or policy-iteration method.
The proof relies on duality: the reduced-cost vector at the current policy and the optimal reduced-cost vector provide a lower and an upper bound for a non-optimal action when the greedy rule is used.

Improvement and extension

Hansen, Miltersen and Zwick (2011): The policy iteration method terminates in no more than

$$ \frac{n}{1-\gamma} \log\left( \frac{m^2}{1-\gamma} \right) $$

steps.
The simplex and policy iteration methods, with the greedy pivoting rule, are strongly polynomial-time algorithms for the Turn-Based Two-Person Zero-Sum Stochastic Game with any fixed discount factor, a problem that cannot even be formulated as an LP.

A Turn-Based Zero-Sum Game

Improvement and extension

Kitahara and Mizuno (2011) extended the bound to solving general non-degenerate LPs:

$$ \min \sum_{j=1}^{n} c_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} a_j x_j = b; \quad x_j \ge 0, \ \forall j. $$

The simplex method terminates in at most

$$ \frac{mn}{\sigma} \log\left( \frac{m}{\sigma} \right) $$

pivot steps, when the ratio of the minimum value over the maximum value, in all basic feasible solution entries, is bounded below by $\sigma$.

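Deterministic MDP with discounts

Distribution vector $p_j \in R^m$ contains exactly one 1 and 0 everywhere else.

$$ y_i = \min\{ c_j + \gamma_j p_j^T y,\ j \in A_i \}, \quad \pi_i = \arg\min\{ c_j + \gamma_j p_j^T y,\ j \in A_i \}, \quad \forall i. $$

$$ \max \sum_{i=1}^{m} y_i \quad \text{s.t.} \quad y_i \le c_j + \gamma_j p_j^T y, \ \forall j \in A_i; \ \forall i. $$

It has uniform discounts if all $\gamma_j$ are identical.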

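The dual resembles generalized flow

$$ \min \sum_{j=1}^{n} c_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} (e_{ij} - \gamma_j p_{ij}) x_j = 1, \ \forall i; \quad x_j \ge 0, \ \forall j, $$

where $e_{ij} = 1$ if $j \in A_i$ and 0 otherwise.
Dual variable $x_j$ represents the expected action flow or frequency, that is, the expected present value of the number of times action $j$ is chosen.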

Efficiency of simplex/policy methods

They are not known to be polynomial-time algorithms for deterministic MDP, even with uniform discounts.
There are quadratic lower bounds on these methods for solving MDP with uniform discounts.
Ian Post and Ye (2012): The simplex method with the greedy pivot rule terminates in at most $O(m^3 n^2 \log^2 m)$ pivot steps when discount factors are uniform, or in at most $O(m^5 n^3 \log^2 m)$ pivot steps with non-uniform discounts.
We are not yet able to prove that such results hold for the policy iteration method.

Policy structures with uniform factors

Each chosen action can be either a path-edge or a cycle-edge.
$x_j$ is in $[1, m]$ if it is a path-action, and in $[1/(1-\gamma), m/(1-\gamma)]$ if it is a cycle-action, so that they form two possible polynomial layers.

Roadmap of proof

1. There are two types of pivots: the newly chosen action is either on a path or on a cycle of the new policy.
2. In every $m^2 n \log(m)$ consecutive pivot steps, there must be at least one step that is a cycle pivot.
3. After every $m \log(m)$ cycle pivot steps, there is an action that would never re-enter as a cycle or path action.
4. There are at most n actions for such a downgrade.

The result in item 2 remains true when discounts are not uniform, but the others do not hold.
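As a check on the arithmetic, and assuming the three counts in the roadmap simply multiply, the uniform-discount bound is recovered as follows (constants are absorbed into the $O(\cdot)$):

```latex
\underbrace{n}_{\text{actions to downgrade}}
\times
\underbrace{m \log m}_{\text{cycle pivots per downgrade}}
\times
\underbrace{m^{2} n \log m}_{\text{pivot steps per cycle pivot}}
= m^{3} n^{2} \log^{2} m .
```

This matches the $O(m^3 n^2 \log^2 m)$ pivot-step bound stated for uniform discount factors.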

Policy structures of general factors

The flow value of $x_j$ depends on the smallest discount factor (the dominating factor $\gamma_a$) on the same cycle.
There are n different discount factors, so there are n possible different polynomial layers of the $x_j$'s.

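Decomposed s-dual of MDP-LP

$$ \min \sum_{j=1}^{n} c_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} (e_{sj} - \gamma_j p_{sj}) x_j = 1; \quad \sum_{j=1}^{n} (e_{ij} - \gamma_j p_{ij}) x_j = 0, \ \forall i \ne s; \quad x_j \ge 0, \ \forall j. $$

There are m such dual LPs, one for each state s, and the optimal policy is also optimal for each of them.
The flow x of a given policy on each s-dual forms a single path+cycle or a single cycle.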

Roadmap of proof

Let $(s, \gamma_a)$ denote a policy where the cycle for the s-dual is dominated by $\gamma_a$.
In every $m^2 n \log(m)$ consecutive pivot steps, there must be at least one step that is a cycle pivot.
After every $m^2 \log(m)$ cycle pivot steps, there is an action that would never re-enter to form an $(s, \gamma_a)$ policy.
There are at most nm such combinations, and at most n actions for such a downgrade.
This gives the overall pivot step bound.
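Again assuming the counts in the roadmap multiply directly, the non-uniform bound is recovered as follows (constants are absorbed into the $O(\cdot)$):

```latex
\underbrace{nm}_{(s,\gamma_a)\ \text{combinations}}
\times
\underbrace{n}_{\text{actions to downgrade}}
\times
\underbrace{m^{2} \log m}_{\text{cycle pivots per downgrade}}
\times
\underbrace{m^{2} n \log m}_{\text{pivot steps per cycle pivot}}
= m^{5} n^{3} \log^{2} m .
```

This agrees with the $O(m^5 n^3 \log^2 m)$ pivot-step bound stated for non-uniform discount factors.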

Remarks and Open Problems

Is the policy iteration method a strongly polynomial-time algorithm for deterministic MDP?
Is there a strongly polynomial simplex method for the deterministic turn-based stochastic game?
Is there a strongly polynomial-time algorithm for MDP with variable discounts, generalized network flow, or even LP?
Can LPs of huge size (billions of dimensions) be solved in practice?

The Linear Programming and Simplex Method story continues.