Online Convex Programming and Generalized Infinitesimal Gradient Ascent




Martin Zinkevich
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213 USA
maz@cs.cmu.edu

Abstract

Convex programming involves a convex set F ⊆ R^n and a convex cost function c : F → R. The goal of convex programming is to find a point in F which minimizes c. In online convex programming, the convex set is known in advance, but in each step of some repeated optimization problem, one must select a point in F before seeing the cost function for that step. This can be used to model factory production, farm production, and many other industrial optimization problems where one is unaware of the value of the items produced until they have already been constructed. We introduce an algorithm for this domain. We also apply this algorithm to repeated games, and show that it is really a generalization of infinitesimal gradient ascent, and that the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.

1. Introduction

Convex programming is a generalization of linear programming, with many applications to machine learning. For example, one wants to find a hypothesis in a hypothesis space H that minimizes absolute error (Boyd & Vandenberghe, 2003), minimizes squared error (Hastie et al., 2001), or maximizes the margin (Boser et al., 1992) for the training set X. If the hypothesis space consists of linear functions, then these problems can be solved using linear programming, least-squares regression, and support vector machines respectively. These are all examples of convex programming problems. Convex programming has other applications, such as nonlinear facility location problems (Boyd & Vandenberghe, 2003), network routing problems (Bansal et al., 2003), and consumer optimization problems (Boot, 2003). Other examples of linear programming problems are meeting nutritional requirements, balancing production and consumption in the national economy, and production planning (Cameron, 1985, pages 36–39).
Convex programming consists of a convex feasible set F ⊆ R^n and a convex (valley-shaped) cost function c : F → R. In this paper, we discuss online convex programming, in which an algorithm faces a sequence of convex programming problems, each with the same feasible set but different cost functions. Each time, the algorithm must choose a point before it observes the cost function. This models a number of optimization problems, including industrial production and network routing, in which decisions must be made before true costs or values are known [1]. This is a generalization both of work in minimizing error online (Cesa-Bianchi et al., 1994; Kivinen & Warmuth, 1997; Gordon, 1999; Herbster & Warmuth, 2001; Kivinen & Warmuth, 2001) and of the experts problem (Freund & Schapire, 1999; Littlestone & Warmuth, 1989). In the experts problem, one has n experts, each of which has a plan at each step with some cost. At each round, one selects a probability distribution over experts. If x ∈ R^n is defined such that x_i is the probability that one selects expert i, then the set of all probability distributions is a convex set. Also, the cost function on this set is linear, and therefore convex. Repeated games are closely related to the experts problem. In minimizing error online, one sees an unlabeled instance, assigns it a label, and then receives some error based on how divergent the given label was from the true label. The divergences used in previous work are fixed Bregman divergences, e.g. squared error.

In this paper, we make no distributional assumptions about the convex cost functions. Also, we make no assumptions about any relationships between successive cost functions [2]. Thus, expecting to choose the optimal point at each time step is unrealistic. Instead, as in the analysis of the experts problem, we compare our cost to the cost of some other offline algorithm that selects a fixed vector. However, this other algorithm knows in advance all of the cost functions before it selects this single fixed vector. We formalize this in Section 2.1.

We present an algorithm for general convex functions based on gradient descent, called greedy projection. The algorithm applies gradient descent in R^n, and then moves back to the set of feasible points. There are three advantages to this algorithm. The first is that gradient descent is a simple, natural algorithm that is widely used, and studying its behavior is of intrinsic value. Secondly, this algorithm is more general than the experts setting, in that it can handle an arbitrary sequence of convex functions, which has yet to be solved. Finally, in online linear programs this algorithm can in some circumstances perform better than an experts algorithm. While the bounds on the performance of most experts algorithms depend on the number of experts, the bounds here are based on other criteria, which may sometimes be lower. This relationship is discussed further in Section 4, and further comments on related work can be found in Section 5.

The main theorem is stated and proven in Section 2.1. Another measure of the performance of gradient projection is found in Section 2.2, where we establish results unlike those usually found in online algorithms. We establish that the algorithm can perform well even in comparison to an agent that knows the sequence in advance and can move for some short distance. This result establishes that greedy projection can handle environments that are slowly changing over time and require frequent but small modifications to handle well.

[1] We expand on the network routing domain at the end of this section. In particular, our results have been applied (Bansal et al., 2003) to solve the online oblivious routing problem.

Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
The algorithm that motivated this study was infinitesimal gradient ascent (Singh et al., 2000), which is an algorithm for repeated games. First, this result shows that infinitesimal gradient ascent is universally consistent (Fudenberg & Levine, 1995), and secondly it shows that GIGA, a nontrivial extension of infinitesimal gradient ascent developed here for games with more than two actions, is universally consistent. GIGA is defined in Section 3.2, and the proof is similar to that in (Freund & Schapire, 1999).

[2] The assumptions we do make are listed in the beginning of Section 2.

Bansal et al. (2003) formulate an online oblivious routing problem as an online convex programming problem, and apply greedy projection to obtain good performance. In online oblivious routing, one is in charge of minimizing network congestion by programming a variety of routers. At the beginning of each day, one chooses a flow for each source-destination pair. The set of all such flows is convex. Then, an adversary chooses a demand (number of packets) for each source-destination pair. The cost is the maximum congestion along any edge that the algorithm has, divided by the maximum congestion of the optimal routing given the demand.

The contribution of this paper is a general solution for a wide variety of problems, some solved, some unsolved. Sometimes these results show new properties of existing algorithms, like IGA, and sometimes this work has resulted in new algorithms, like GIGA. Finally, the flexibility to choose arbitrary convex functions has already resulted in a solution to a practical online problem (Bansal et al., 2003).

2. Online Convex Programming

Definition 1. A set of vectors S ⊆ R^n is convex if for all x, x′ ∈ S, and all λ ∈ [0, 1], λx + (1 − λ)x′ ∈ S.

Definition 2. For a convex set F, a function f : F → R is convex if for all x, y ∈ F, for all λ ∈ [0, 1],

λf(x) + (1 − λ)f(y) ≥ f(λx + (1 − λ)y).

If one were to imagine a convex function R → R, where the function described the altitude, then the function would look like a valley.
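As a quick numerical sanity check of Definition 2 (our own example, not part of the paper), one can verify the convexity inequality for the valley-shaped function f(x) = x² over a small grid of points and mixing weights:

```python
# Check Definition 2 numerically for f(x) = x^2, a convex function R -> R.
def f(x):
    return x * x

def convexity_holds(f, x, y, lam):
    # lam*f(x) + (1-lam)*f(y) >= f(lam*x + (1-lam)*y)
    return lam * f(x) + (1 - lam) * f(y) >= f(lam * x + (1 - lam) * y)

checks = [convexity_holds(f, x, y, lam)
          for x in (-2.0, 0.0, 1.5)
          for y in (-1.0, 3.0)
          for lam in (0.0, 0.25, 0.5, 1.0)]
```

Every chord of the function lies on or above its graph, which is exactly the valley picture described above.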
Definition 3. A convex programming problem consists of a convex feasible set F and a convex cost function c : F → R. The optimal solution is the solution that minimizes the cost.

Definition 4. An online convex programming problem consists of a feasible set F ⊆ R^n and an infinite sequence {c^1, c^2, ...} where each c^t : F → R is a convex function. At each time step t, an online convex programming algorithm selects a vector x^t ∈ F. After the vector is selected, it receives the cost function c^t.

Because all information is not available before decisions are made, online algorithms do not reach solutions, but instead achieve certain goals. See Section 2.1.

Define ‖x‖ = √(x · x) and d(x, y) = ‖x − y‖. Throughout the remainder of the paper we will make seven assumptions:

1. The feasible set F is bounded. There exists N ∈ R such that for all x, y ∈ F, d(x, y) ≤ N.
2. The feasible set F is closed. For all sequences {x^1, x^2, ...} where x^t ∈ F for all t, if there exists an x ∈ R^n such that x = lim_{t→∞} x^t, then x ∈ F.
3. The feasible set F is nonempty. There exists an x ∈ F.
4. For all t, c^t is differentiable [3].
5. There exists an N ∈ R such that for all t and for all x ∈ F, ‖∇c^t(x)‖ ≤ N.
6. For all t, there exists an algorithm which, given x, produces ∇c^t(x).
7. For all y ∈ R^n, there exists an algorithm which can produce argmin_{x∈F} d(x, y). We define the projection P(y) = argmin_{x∈F} d(x, y).

Given this machinery, we can describe our algorithm.

Algorithm 1 (Greedy Projection). Select an arbitrary x^1 ∈ F and a sequence of learning rates η_1, η_2, ... ∈ R^+. In time step t, after receiving a cost function, select the next vector x^{t+1} according to:

x^{t+1} = P(x^t − η_t ∇c^t(x^t)).

The basic principle at work in this algorithm is quite clear if we consider the case where the sequence {c^1, c^2, ...} is constant. In this case, our algorithm is operating in an unchanging valley. The boundary of the feasible set is the edge of the valley. By proceeding along the direction opposite the gradient, we walk down into the valley. By projecting back into the convex set, we skirt the edges of the valley.

2.1. Analyzing the Performance of the Algorithm

What we would like to do is to prove that greedy projection works for any sequence of convex functions, even if these convex functions are unrelated to one another. In this scenario, we cannot hope to choose a point x^i that minimizes c^i, because c^i can be anything. Instead we try to minimize regret.

[3] Although we make the assumption that c^t is differentiable, the algorithm can also work if there exists an algorithm that, given x, can produce a vector g such that for all y, g · (y − x) ≤ c^t(y) − c^t(x).

We calculate our regret by comparing ourselves to an offline algorithm that has all of the information before it has to make any decisions, but is more restricted than we are in the choices it can make.
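Algorithm 1 is short enough to sketch in code. The following is a minimal illustration, not from the paper: the feasible set F is the unit ball, the cost function is held constant at c(x) = ‖x − p‖² with p outside F, and η_t = 1/√t, so the iterates walk down the valley and then skirt its edge, settling at the boundary point of F nearest p.

```python
import numpy as np

def project_ball(y, radius=1.0):
    # P(y) = argmin_{x in F} d(x, y) for F = {x : ||x|| <= radius}.
    norm = np.linalg.norm(y)
    return y if norm <= radius else (radius / norm) * y

def greedy_projection(grad, x1, T, project):
    # Algorithm 1 with learning rates eta_t = 1/sqrt(t).
    x = np.asarray(x1, dtype=float)
    for t in range(1, T + 1):
        x = project(x - (1.0 / np.sqrt(t)) * grad(x))
    return x

# Constant cost c(x) = ||x - p||^2, with p = (2, 0) outside the unit ball.
p = np.array([2.0, 0.0])
x_final = greedy_projection(lambda x: 2.0 * (x - p), np.zeros(2), 500, project_ball)
```

With a constant cost, the run settles at the constrained minimizer (1, 0); with a different cost function each round, the same loop is the online algorithm whose regret is analyzed in Section 2.1.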
For example, imagine that the offline algorithm has access to the first T cost functions at the beginning. However, it can only choose one vector x ∈ F. Then the offline algorithm can attempt to minimize the cost function c(x) = Σ_{i=1}^T c^i(x). This is an offline convex programming problem. Regret is the difference between our cost and the cost of the offline algorithm. Average regret is the regret divided by T, the number of rounds.

Definition 5. Given an algorithm A, and a convex programming problem (F, {c^1, c^2, ...}), if {x^1, x^2, ...} are the vectors selected by A, then the cost of A until time T is

C_A(T) = Σ_{t=1}^T c^t(x^t).

The cost of a static feasible solution x ∈ F until time T is

C_x(T) = Σ_{t=1}^T c^t(x).

The regret of algorithm A until time T is

R_A(T) = C_A(T) − min_{x∈F} C_x(T).

As the sequences get longer, the offline algorithm finds itself at less of an advantage. If the sequence of cost functions is relatively stationary, then an online algorithm can learn what the cost functions will look like in the future. If the sequence of cost functions varies drastically, then the offline algorithm will not be able to take advantage of this, because it selects a fixed point. Our goal is to prove that the average regret of Greedy Projection approaches zero. In order to state our results about bounding the regret of this algorithm, we need to specify some parameters. First, let us define:

‖F‖ = max_{x,y∈F} d(x, y)
‖∇c‖ = max_{x∈F, t∈{1,2,...}} ‖∇c^t(x)‖.

Here is the first result derived in this paper:

Theorem 1. If η_t = t^{−1/2}, the regret of the Greedy Projection algorithm is:

R_G(T) ≤ (‖F‖² √T)/2 + (√T − 1/2) ‖∇c‖².

Therefore, lim sup_{T→∞} R_G(T)/T ≤ 0.
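Theorem 1 is easy to probe numerically. The sketch below is our own construction, not from the paper: Greedy Projection with η_t = 1/√t runs on the interval F = [−1, 1] (so ‖F‖ = 2) against random linear costs c^t(x) = g_t·x, and its regret against the best static point in hindsight is compared to the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
F_lo, F_hi = -1.0, 1.0               # feasible interval, so ||F|| = 2
grads = rng.uniform(-1.0, 1.0, T)    # linear costs c^t(x) = g_t * x

# Greedy Projection with eta_t = 1/sqrt(t); projection is clipping to [-1, 1].
x, alg_cost = 0.0, 0.0
for t, g in enumerate(grads, start=1):
    alg_cost += g * x                 # pay c^t(x^t), then update
    x = min(max(x - g / np.sqrt(t), F_lo), F_hi)

# Cost of the best static point in hindsight (an endpoint, since costs are linear).
static_cost = min(grads.sum() * F_lo, grads.sum() * F_hi)
regret = alg_cost - static_cost

# Theorem 1: R_G(T) <= ||F||^2 sqrt(T)/2 + (sqrt(T) - 1/2) ||grad c||^2
norm_F = F_hi - F_lo
norm_grad = np.abs(grads).max()
bound = norm_F**2 * np.sqrt(T) / 2 + (np.sqrt(T) - 0.5) * norm_grad**2
```

The bound grows like √T, so the average regret bound/T shrinks like 1/√T, matching the lim sup statement.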

The first part of the bound is because we might begin on the wrong side of F. The second part is a result of the fact that we always respond after we see the cost function.

Proof: First we show that, without loss of generality, for all t there exists a g^t ∈ R^n such that for all x, c^t(x) = g^t · x. Begin with arbitrary {c^1, c^2, ...}, run the algorithm, and compute {x^1, x^2, ...}. Then define g^t = ∇c^t(x^t). If we were to change c^t such that for all x, c^t(x) = g^t · x, the behavior of the algorithm would be the same. Would the regret be the same? Because c^t is convex, for all x:

c^t(x) ≥ (∇c^t(x^t)) · (x − x^t) + c^t(x^t).

Set x* to be a statically optimal vector. Then, because x* ∈ F:

c^t(x*) ≥ g^t · (x* − x^t) + c^t(x^t).

Thus:

c^t(x^t) − c^t(x*) ≤ c^t(x^t) − (g^t · (x* − x^t) + c^t(x^t)) = g^t · x^t − g^t · x*.

Thus the regret would be at least as much with the modified sequence of functions. We define for all t, y^{t+1} = x^t − η_t g^t. Observe that x^{t+1} = P(y^{t+1}). We will attempt to bound the regret of not playing action x* on round t:

(y^{t+1} − x*)² = (x^t − x*)² − 2η_t g^t · (x^t − x*) + η_t² (g^t)².

Observe that in the expression a² − 2ab + b², a² is a potential, 2ab is the immediate cost, and b² is the error (within a factor of η_t). We will now begin to fully flush out these properties. For all y ∈ R^n and all x ∈ F, (y − x)² ≥ (P(y) − x)². Also, ‖g^t‖ ≤ ‖∇c‖. So:

(x^{t+1} − x*)² ≤ (x^t − x*)² − 2η_t g^t · (x^t − x*) + η_t² ‖∇c‖²

g^t · (x^t − x*) ≤ (1/(2η_t)) ((x^t − x*)² − (x^{t+1} − x*)²) + (η_t/2) ‖∇c‖².

Now, by summing we get:

R_G(T) = Σ_{t=1}^T g^t · (x^t − x*)
  ≤ Σ_{t=1}^T [ (1/(2η_t)) ((x^t − x*)² − (x^{t+1} − x*)²) + (η_t/2) ‖∇c‖² ]
  ≤ (1/(2η_1)) (x^1 − x*)² − (1/(2η_T)) (x^{T+1} − x*)² + (1/2) Σ_{t=2}^T (1/η_t − 1/η_{t−1}) (x^t − x*)² + (‖∇c‖²/2) Σ_{t=1}^T η_t
  ≤ ‖F‖² (1/(2η_1) + (1/2) Σ_{t=2}^T (1/η_t − 1/η_{t−1})) + (‖∇c‖²/2) Σ_{t=1}^T η_t
  = ‖F‖²/(2η_T) + (‖∇c‖²/2) Σ_{t=1}^T η_t.

Now, if we define η_t = 1/√t, then:

Σ_{t=1}^T η_t = 1 + Σ_{t=2}^T 1/√t ≤ 1 + ∫_1^T (1/√t) dt = 1 + [2√t]_1^T = 2√T − 1.

Plugging this into the above equation yields the result.

2.2. Regret Against a Dynamic Strategy

Another possibility for the offline algorithm is to allow a small amount of change. For example, imagine that the path that the offline algorithm follows is of limited size.

Definition 6. The path length of a sequence x^1, ..., x^T is:

Σ_{i=1}^{T−1} d(x^i, x^{i+1}).

Define A(T, L) to be the set of sequences with T vectors and a path length less than or equal to L.

Definition 7. Given an algorithm A and a maximum path length L, the dynamic regret R_A(T, L) is:

R_A(T, L) = C_A(T) − min_{A′∈A(T,L)} C_{A′}(T).

Theorem 2. If η is fixed, the dynamic regret of the Greedy Projection algorithm is:

R_G(T, L) ≤ 7‖F‖²/(4η) + L‖F‖/η + (Tη‖∇c‖²)/2.

The proof is similar to the proof of Theorem 1, and is included in the full version of the paper (Zinkevich, 2003).

3. Generalized Infinitesimal Gradient Ascent

In this section, we establish that repeated games are online linear programming problems, and that an application of our algorithm is universally consistent.

3.1. Repeated Games

From the perspective of one player, a repeated game is two sets of actions A and B, and a utility function u : A × B → R. A pair in A × B is called a joint action. For the example in this section, we will think of a matching game: A = {a_1, a_2, a_3}, B = {b_1, b_2, b_3}, where u(a_1, b_1) = u(a_2, b_2) = u(a_3, b_3) = 1, and everywhere else u is zero. As a game is being played, at each step the player will be selecting an action at random based on past joint actions, and the environment will be selecting an action at random based on past joint actions. We will formalize this later. A history is a sequence of joint actions. H_t = (A × B)^t is the set of all histories of length t. Define H = ∪_{t=0}^∞ H_t to be the set of all histories, and for any history h ∈ H, define |h| to be the length of that history. An example of a history is:

h = {(a_3, b_1), (a_1, b_2), (a_2, b_3), (a_2, b_2), (a_2, b_2)}

In order to access the history, we define h_i to be the ith joint action. Thus, h_3 = (a_2, b_3), h_{1,1} = a_3 and h_{5,2} = b_2. The utility of a history h ∈ H is:

u_total(h) = Σ_{i=1}^{|h|} u(h_{i,1}, h_{i,2}).

The utility of the above example is u_total(h) = 2. We can define what the history would look like if we replaced the action of the player with a at each time step:

h ← a = {(a, b_1), (a, b_2), (a, b_3), (a, b_2), (a, b_2)}

Now, u_total(h ← a_2) = 3. Thus we would have done better playing this action all the time.
The regret of not playing action a is defined, for all h ∈ H and all a ∈ A, as:

R_a(h) = u_total(h ← a) − u_total(h).

In this example, the regret of not playing action a_2 is R_{a_2}(h) = 3 − 2 = 1. The regret of not playing an action need not be positive. For example, R_{a_1}(h) = 1 − 2 = −1. Now, we define the maximum regret, or just regret, to be:

R(h) = max_{a∈A} R_a(h).

Here R(h) = 1. The most important aspect of this definition of regret is that regret is a function of the resulting history, independent of the strategies that generated that history. Now, we introduce the definitions of the behavior and the environment. For any set S, define Δ(S) to be the set of all probabilities over S. For a distribution D and a boolean predicate P, we use the notation Pr_{x∼D}[P(x)] to indicate the probability that P(x) is true given that x was selected from D. A behavior σ : H → Δ(A) is a function from histories of past actions to distributions over the next action of the player. An environment ρ : H → Δ(B) is a function from the history of past actions to distributions over the next action of the environment. Define H_∞ = (A × B)^∞ to be the set of all infinite histories. We define F_{σ,ρ} ∈ Δ(H_∞) to be the distribution over infinite histories where the player chooses its next action according to σ and the environment chooses its next action according to ρ. For all h ∈ H_∞, define h(t) to be the first t joint actions of h.

Definition 8. A behavior σ is universally consistent if for any ε > 0 there exists a T such that for all ρ:

Pr_{h∼F_{σ,ρ}}[∃t > T : R(h(t))/t > ε] < ε.

After some time, with high probability the average regret is low. Observe that this convergence over time is uniform over all environments.
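The bookkeeping of the matching-game example can be reproduced directly. A minimal sketch (our code, not from the paper; actions are 0-indexed, so a_1 and b_1 are both index 0):

```python
# The matching game: u(a, b) = 1 when the actions match, 0 otherwise.
def u(a, b):
    return 1 if a == b else 0

# h = {(a3,b1), (a1,b2), (a2,b3), (a2,b2), (a2,b2)} with 0-indexed actions.
h = [(2, 0), (0, 1), (1, 2), (1, 1), (1, 1)]

def u_total(history):
    return sum(u(a, b) for a, b in history)

def regret_of_action(history, a):
    # R_a(h) = u_total(h <- a) - u_total(h)
    replaced = [(a, b) for _, b in history]
    return u_total(replaced) - u_total(history)

max_regret = max(regret_of_action(h, a) for a in range(3))
```

As in the text, always playing a_2 would have earned 3 instead of 2, so R(h) = 1.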

3.2. Formulating a Repeated Game as an Online Linear Program

For simplicity, suppose that we consider the case where A = {1, ..., n}. Before each time step in a repeated game, we select a distribution over actions. This can be represented as a vector in the n-standard closed simplex, the set of all points x ∈ R^n such that for all i, x_i ≥ 0, and Σ_{i=1}^n x_i = 1. Define this to be F. Since we have a utility u instead of a cost c, we will perform gradient ascent instead of descent. The utility u is a linear function when the environment's action becomes known.

Algorithm 2 (Generalized Infinitesimal Gradient Ascent). Choose a sequence of learning rates {η_1, η_2, ...}. Begin with an arbitrary vector x^1 ∈ F. Then for each round t:

1. Play according to x^t: play action i with probability x^t_i.
2. Observe the action h_{t,2} of the other player and calculate:

y^{t+1}_i = x^t_i + η_t u(i, h_{t,2})
x^{t+1} = P(y^{t+1})

where P(y) = argmin_{x∈F} d(x, y), as before.

Theorem 3. Setting η_t = t^{−1/2}, GIGA is universally consistent.

The proof is in the full version of the paper (Zinkevich, 2003), and we provide a sketch here. As a direct result of Theorem 1, we can prove that if the environment plays any fixed sequence of actions, our regret goes to zero. Using a technique similar to Section 3.1 of (Freund & Schapire, 1999), we can move from this result to convergence with respect to an arbitrary, adaptive environment.

4. Converting Old Algorithms

In this section, in order to compare our work with that of others, we show how one can naïvely translate algorithms for mixing experts into algorithms for online linear programs, and online linear programming algorithms into algorithms for online convex programs. This section is a discussion, and no formal proofs are given.

4.1. Formal Definitions

We begin by defining the experts problem.

Definition 9. An experts problem is a set of experts E = {e_1, ..., e_n} and a sequence of cost vectors c^1, c^2, ... where for all i, c^i ∈ R^n. On each round t, an experts algorithm (EA) first selects a distribution D^t ∈ Δ(E), and then observes a cost vector c^t. We assume that the EA can handle both positive and negative values.
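Returning to Algorithm 2, here is a minimal sketch of the GIGA update (our code, not from the paper; the sort-based Euclidean projection onto the simplex is a standard method and an implementation choice here). It plays the matching game of Section 3.1 with η_t = 1/√t against an environment that always plays b_1; the mixed strategy quickly concentrates on the best response a_1.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto {x : x_i >= 0, sum x_i = 1} (sort-based method).
    u_sorted = np.sort(v)[::-1]
    css = np.cumsum(u_sorted)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u_sorted + (1.0 - css) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def u(a, b):
    # The matching game of Section 3.1 with 0-indexed actions.
    return 1.0 if a == b else 0.0

n = 3
x = np.full(n, 1.0 / n)          # x^1: uniform over actions
for t in range(1, 201):
    b = 0                        # the environment always plays b_1
    y = x + (1.0 / np.sqrt(t)) * np.array([u(i, b) for i in range(n)])
    x = project_simplex(y)       # gradient *ascent* step, then projection
```

Against this fixed opponent, the iterate converges to the pure strategy on action 0, in line with the regret guarantee of Theorem 3.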
If the EA cannot, it can easily be extended by shifting the values into the positive range.

Definition 10. An online linear programming problem is a closed convex polytope F ⊆ R^n and a sequence of cost vectors c^1, c^2, ... where for all i, c^i ∈ R^n. On each round t, an online linear programming algorithm (OLPA) first plays a distribution D^t ∈ Δ(F), and then observes a cost vector c^t.

An OLPA can be constructed from an EA, as described below.

Algorithm 3. Define v_1, ..., v_k to be the vertices of the polytope for an online linear program. Choose E = {e_1, ..., e_k} to be the experts, one for each vertex. On each round, receive a distribution D from the EA, and select vector v_i if expert e_i is selected. Define c′ ∈ R^k such that c′_i = c^t · v_i. Send the EA the cost vector c′ ∈ R^k.

The optimal static vector must be a vertex of the polytope, because a linear program always has a solution at a vertex of the polytope. If the original EA can do almost as well as the best expert, this OLPA can do at least as well as the best static vector. The second observation is that most EAs have bounds that depend on the number of experts. The number of vertices of the convex polytope is totally unrelated to the diameter, so any normal experts bound is incomparable to our bound on Greedy Projection. There are some EAs that begin with a distribution or uneven weighting over the experts. These EAs may perform better in this scenario, because one might be able to tweak the distribution such that it is spread evenly over the space (in some way) and not over the experts, giving more weight to lonely vertices and less weight to clustered vertices.

4.2. Converting an OLPA to an Online Convex Programming Algorithm

There are two reasons that the algorithm described above will not work for an online convex program. The first is that an online convex program can have an arbitrary convex shape as a feasible region, such as a circle, which cannot be described as the convex hull of any finite number of points. The second reason is that a convex function may not have a minimum on the edge of the feasible set. For example, if F = {x : x · x ≤ 1} and c(x) = x · x, the minimum is in the center of the feasible set. Now, this first issue is difficult to handle directly [4], so we will simply assume that the OLPA can handle the feasible region of the online convex programming problem. This can be either because the OLPA can handle an arbitrary convex region, as in Kalai and Vempala (2002), or because the convex region of the convex programming problem is a convex polytope. We handle the second issue by converting the cost function to a linear one. In Theorem 1, we find that the worst case is when the cost function is linear. This assumption depends on two properties of the algorithm: the algorithm is deterministic, and the only property of the cost function c^t that is observed is ∇c^t(x^t). Now, we form an online convex programming algorithm in the following way.

Algorithm 4. On each round t, receive D^t from the OLPA, and play x^t = E_{X∼D^t}[X]. Send the OLPA the cost vector ∇c^t(x^t).

The algorithm is deterministic and only observes the gradient at the point x^t; thus we can assume that the cost function is linear. If the cost function is linear, then:

E_{X∼D}[c^t(X)] = c^t(E_{X∼D}[X]).

x^t may be difficult to compute, and we address this issue in the full version of the paper.

5. Related Work

Kalai and Vempala (2002) have developed algorithms to solve online linear programming, which is a specific type of online convex programming. They are attempting to make the algorithm behave in a lazy fashion, changing its vector slowly, whereas here we are attempting to be more dynamic, as is highlighted in Section 2.2. These algorithms were motivated by the algorithm of (Singh et al., 2000), which applies gradient ascent to repeated games. We extend their algorithm to games with an arbitrary number of actions, and prove universal consistency.
There has been extensive work on regret in repeated games and in the experts domain, such as (Blackwell, 1956; Foster & Vohra, 1999; Foster, 1999; Freund & Schapire, 1999; Fudenberg & Levine, 1995; Fudenberg & Levine, 1997; Hannan, 1957; Hart & Mas-Colell, 2000; Hart & Mas-Colell, 2001; Littlestone & Warmuth, 1989).

[4] One can approximate a convex region by a series of increasingly complex convex polytopes, but this solution is very undesirable.

What makes this work noteworthy in a very old field is that it proves that a widely-used technique in artificial intelligence, gradient ascent, has a property that is of interest to those in game theory. As stated in Section 4, experts algorithms can be used to solve online linear programs and online convex programming problems, but the bounds may become significantly worse. There are several studies of online gradient descent and related update functions, for example (Cesa-Bianchi et al., 1994; Kivinen & Warmuth, 1997; Gordon, 1999; Herbster & Warmuth, 2001; Kivinen & Warmuth, 2001). These studies focus on prediction problems where the loss functions are convex Bregman divergences. In this paper, we are considering arbitrary convex functions, in problems that may or may not involve prediction. Finally, in the offline case, (Della Pietra et al., 1999) have done work on proving that gradient descent and projection for arbitrary Bregman distances converges to the optimal solution.

6. Future Work

Here we deal with a Euclidean geometry: what if one considered gradient descent on a non-Euclidean geometry, as in (Amari, 1998; Mahony & Williamson, 2001)? It is also possible to conceive of cost functions that do not just depend on the most recent vector, but on every previous vector.

7. Conclusions

In this paper, we have defined an online convex programming problem. We have established that gradient descent is a very effective algorithm for this problem, because the average regret approaches zero.
This work was motivated by trying to better understand the infinitesimal gradient ascent algorithm, and the techniques developed here were applied to that problem to establish an extension of infinitesimal gradient ascent that is universally consistent.

Acknowledgements

This work was supported in part by NSF grants CCR-0105488, NSF-ITR CCR-0122581, and NSF-ITR IIS-0121678. Any errors or omissions in the work are the sole responsibility of the author. We would like to thank Pat Riley for great help in developing the algorithm for the case of repeated games, Adam Kalai for improving the proof and bounds of the main theorem, and Michael Bowling, Avrim Blum, Nikhil Bansal, Geoff Gordon, and Manfred Warmuth for their help and suggestions with this research.

References

Amari, S. (1998). Natural gradient works efficiently in learning. Neural Computation, 10, 251–276.

Bansal, N., Blum, A., Chawla, S., & Meyerson, A. (2003). Online oblivious routing. Fifteenth ACM Symposium on Parallelism in Algorithms and Architectures.

Blackwell, D. (1956). An analog of the minimax theorem for vector payoffs. South Pacific J. of Mathematics, 1–8.

Boot, J. (2003). Quadratic programming: Algorithms, anomalies, applications. Rand McNally & Co.

Boser, B., Guyon, I., & Vapnik, V. (1992). A training algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Conference on Computational Learning Theory.

Boyd, S., & Vandenberghe, L. (2003). Convex optimization. In press, available at http://www.stanford.edu/~boyd/cvxbook.html.

Cameron, N. (1985). Introduction to linear and convex programming. Cambridge University Press.

Cesa-Bianchi, N., Long, P., & Warmuth, M. K. (1994). Worst-case quadratic bounds for on-line prediction of linear functions by gradient descent. IEEE Transactions on Neural Networks, 7, 604–619.

Della Pietra, S., Della Pietra, V., & Lafferty, J. (1999). Duality and auxiliary functions for Bregman distances (Technical Report CMU-CS-01-109). Carnegie Mellon University.

Foster, D. (1999). A proof of calibration via Blackwell's approachability theorem. Games and Economic Behavior (pp. 73–79).

Foster, D., & Vohra, R. (1999). Regret in the on-line decision problem. Games and Economic Behavior, 29, 7–35.

Freund, Y., & Schapire, R. (1999). Adaptive game playing using multiplicative weights. Games and Economic Behavior (pp. 79–103).

Fudenberg, D., & Levine, D. (1995). Universal consistency and cautious fictitious play. Journal of Economic Dynamics and Control, 19, 1065–1089.

Fudenberg, D., & Levine, D. (1997). Conditional universal consistency. Available at http://ideas.repec.org/s/cla/levarc.html.

Gordon, G. (1999). Approximate solutions to Markov decision processes. Doctoral dissertation, Carnegie Mellon University.

Hannan, J. (1957). Approximation to Bayes risk in repeated play. Annals of Mathematics Studies, 39, 97–139.

Hart, S., & Mas-Colell, A. (2000). A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68, 1127–1150.

Hart, S., & Mas-Colell, A. (2001). A general class of adaptive strategies. Journal of Economic Theory, 98, 26–54.

Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer.

Herbster, M., & Warmuth, M. K. (2001). Tracking the best linear predictor. Journal of Machine Learning Research, 1, 281–309.

Kalai, A., & Vempala, S. (2002). Geometric algorithms for online optimization (Technical Report). MIT.

Kivinen, J., & Warmuth, M. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132, 1–64.

Kivinen, J., & Warmuth, M. (2001). Relative loss bounds for multidimensional regression problems. Machine Learning Journal, 45, 301–329.

Littlestone, N., & Warmuth, M. K. (1989). The weighted majority algorithm. Proceedings of the Second Annual Conference on Computational Learning Theory.

Mahony, R., & Williamson, R. (2001). Prior knowledge and preferential structures in gradient descent algorithms. Journal of Machine Learning Research, 1, 311–355.

Singh, S., Kearns, M., & Mansour, Y. (2000). Nash convergence of gradient dynamics in general-sum games. Proceedings of the Sixteenth Conference in Uncertainty in Artificial Intelligence (pp. 541–548).

Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent (Technical Report CMU-CS-03-110). Carnegie Mellon University.
Fudenberg, D., & Levine, D. (1997). Condiional universal consisency. Available a hp://ideas.repec.org/s/cla/levarc.hml. Gordon, G. (1999). Approximae soluions o markov decision processes. Docoral disseraion, Carnegie Mellon Universiy. Hannan, J. (1957). Approximaion o bayes risk in repeaed play. Annals of Mahemaics Sudies, 39, 97 139. Har, S., & Mas-Colell, A. (000). A simple adapive procedure leading o correlaed equilibrium. Economerica, 68, 117 1150. Har, S., & Mas-Colell, A. (001). A general class of adapive sraegies. Journal of Economic Theory, 98, 6 54. Hasie, T., Tibishirani, R., & Friedman, J. (001). The elemens of saisical learning. Springer. Herbser, M., & Warmuh, M. K. (001). Tracking he bes linear predicor. Journal of Machine Learning Research, 1, 81 309. Kalai, A., & Vempala, S. (00). Geomeric algorihms for online opimizaion (Technical Repor). MIT. Kivinen, J., & Warmuh, M. (1997). Exponeniaed gradien versus gradien descen for linear predicors. Informaion and Compuaion, 13, 1 64. Kivinen, J., & Warmuh, M. (001). Relaive loss bounds for mulidimensional regression problems. Machine Learning Journal, 45, 301 39. Lilesone, N., & Warmuh, M. K. (1989). The weighed majoriy algorihm. Proceedings of he Second Annual Conference on Compuaional Learning Theory. Mahony, R., & Williamson, R. (001). Prior knowledge and preferenial srucures in gradien descen algorihms. Journal of Machine Learning Research, 1, 311 355. Singh, S., Kearns, M., & Mansour, Y. (000). Nash convergence of gradien dynamics in general-sum games. Proceedings of he Sixeenh Conference in Uncerainy in Arificial Inelligence (pp. 541 548). Zinkevich, M. (003). Online convex programming and generalized infiniesimal gradien ascen (Technical Repor CMU-CS-03-110). CMU.