Published online ahead of print August 15, 2008. Articles in Advance, pp. 1-20. ISSN 0041-1655, EISSN 1526-5447. INFORMS, doi 10.1287/trsc.1080.0238. © 2008 INFORMS.

An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application

Hugo P. Simão, Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544, hpsimao@princeton.edu
Jeff Day, Schneider National, Green Bay, Wisconsin 54306, dayj@schneider.com
Abraham P. George, Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544, ageorge@princeton.edu
Ted Gifford, John Nienow, Schneider National, Green Bay, Wisconsin 54306 {gifford@schneider.com, nienowj@schneider.com}
Warren B. Powell, Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544, powell@princeton.edu

We addressed the problem of developing a model to simulate at a high level of detail the movements of over 6,000 drivers for Schneider National, the largest truckload motor carrier in the United States. The goal of the model was not to obtain a better solution but rather to closely match a number of operational statistics. In addition to the need to capture a wide range of operational issues, the model had to match the performance of a highly skilled group of dispatchers while also returning the marginal value of drivers domiciled at different locations. These requirements dictated that it was not enough to optimize at each point in time (something that could be easily handled by a simulation model) but also over time. The project required bringing together years of research in approximate dynamic programming, merging math programming with machine learning, to solve dynamic programs with extremely high-dimensional state variables. The result was a model that closely calibrated against real-world operations and produced accurate estimates of the marginal value of 300 different types of drivers.
Key words: fleet management; truckload trucking; approximate dynamic programming; driver management
History: Received: February 2007; revision received: August 2007; accepted: April 2008. Published online in Articles in Advance.

In 2003, Schneider National, the largest truckload motor carrier in the United States, contracted with CASTLE Laboratory at Princeton University, Princeton, New Jersey, to develop a model that would simulate its long-haul truckload operations to perform analyses to answer questions ranging from the size and mix of its driver pool to questions about valuing contracts and getting drivers home. The requirements for the simulator seemed quite simple: it had to capture the dynamics of the real problem, producing behaviors that closely matched corporate performance along several dimensions, and it had to provide estimates of the marginal value of different types of drivers. If the model accurately matched historical performance, the company would be able to use the system to test changes in the mix of drivers, the mix of freight, and other operating policies.

The major challenge we faced was that these requirements meant that we had to do much more than just develop a classical simulator. It was not enough to optimize decisions (in the form of matching drivers to loads) at a point in time. The model had to optimize decisions over time to take into account downstream impacts. Formulating the problem as a deterministic, time-space network problem was both computationally intractable (the problem is huge) and too limiting (we needed to model different forms of uncertainty as well as a high degree of realism that

Transportation Science, Articles in Advance, pp. 1-20, © 2008 INFORMS

was beyond the capabilities of classical math programs). Classical techniques from Markov decision processes applied to this setting are limited to problems with only a small number of identical trucks moving between a few locations (see Powell 1988 or Kleywegt, Nori, and Savelsbergh 2004). Our problem involved modeling thousands of drivers at a high level of detail. We solved the problem using approximate dynamic programming (ADP), but even classical ADP techniques (Bertsekas and Tsitsiklis 1996; Sutton and Barto 1998) would not handle the requirements of this project.

Three years of development produced a model that closely matches a range of historical metrics. Achieving this goal required drawing on the research of three Ph.D. dissertations (Spivey 2001; Marar 2002; George 2005) and depended on the extensive participation of the sponsor to produce a model that accurately simulated operations. The model is able to handle a host of engineering details to allow the sponsor to run a broad range of simulations. To establish credibility, the model had to match the historical performance of a dozen major operating statistics. Two of particular importance to our presentation included matching the average length of haul for different types of drivers and getting drivers home with the same frequency as the company. A central hypothesis of the research, which is supported by the evidence we present in this paper, was that the behavior of a group of dispatchers could be described by an optimization model using a suitably designed objective function.

The contributions of this paper include:

(1) We show, for the first time in a production setting for a truckload motor carrier, that approximate dynamic programming can provide high-quality solutions while capturing operational issues at a high level of detail, including all business rules such as hours of service, returning drivers home, and operational restrictions on the use of specific driver types.
This appears to be the first optimization model of any form that captures the complex dynamics of a truckload motor carrier where decisions produce behavior that optimizes over time.

(2) We demonstrate that the framework of approximate dynamic programming, with methods adapted to this problem class, produces a model that accurately captures the performance of a well-run company based on comparisons with historical metrics. This appears to be the first demonstrated calibration of an optimization model for truckload trucking for planning purposes.

(3) We show that the value function approximations used in the dynamic programming formulation produce accurate estimates of the marginal value of particular driver types (for example, the value of adding additional team drivers domiciled in a particular region) over the entire simulation when compared against brute-force derivatives computed using the model (adding additional drivers and running the simulation again). These marginal values would not be available from a traditional simulator (which does not use the framework of dynamic programming to capture the value of a driver over the entire simulation). They mimic dual variables from a linear program (which is not able to handle the complex dynamics of this system).

The presentation begins in §1 with a general description of the problem. Section 2 provides a formal model of the problem. Section 3 describes the algorithmic strategies that are used, focusing primarily on the use of approximate dynamic programming to solve the problem of optimizing over time. Section 4 describes the results of calibration experiments that show that the model closely matches historical performance, which required using recent research describing how to make cost-based models match rule-based patterns. Then, §5 shows that the model can be used to estimate the value of particular types of drivers, which is then used to change the mix of drivers.
The value of a particular type of driver, which requires estimating a derivative of the simulation, can only be achieved using the approximate dynamic programming strategies that were used to optimize over time. Section 6 concludes the paper.

1. Problem Description and Literature Review

On the surface, truckload trucking can appear to be a relatively simple operational problem. At any point in time, there will be a set of drivers available to be dispatched and a set of loads that need to be moved (typically from one city to another). The loads in this industry are typically quite long, generally requiring anywhere from one to four days to complete. As a result, at a point in time we will assign a driver to at most one load. This can easily be modeled as an assignment problem, where the cost of assigning a driver to a load includes both the cost of moving empty to pick up the load and the net revenue from moving the load.

In real applications, the problem is much richer. Whereas dispatchers do their best to minimize the empty miles and move the most profitable loads, real decisions have to balance profits now and in the future as well as accomplish objectives such as getting drivers home in a reasonable amount of time. An important issue in this project was matching historical behavior in terms of the average length of loads handled by different types of drivers. We modeled three capacity types (using the terminology of the

carrier): teams (two drivers in the same tractor who could trade off driving and resting), solos (a single driver who had to rest according to a schedule determined by federal law), and ICs (independent contractors who owned the tractors they drove). Drivers in each of these three fleets had different expectations regarding the lengths of the loads to which they were assigned. Teams were generally given the longest loads so that their total revenue per week would reasonably compensate two people. Solos exhibited the shortest average length of haul. Getting the model to match historical performance for length of haul for each of the three driver classes required special algorithmic measures.

The standard approach for modeling such large-scale problems (we worked with over 6,000 drivers) at a high level of detail would be to simply simulate decisions over time. In this setting, this would involve solving a series of network problems to assign drivers to loads at a point in time. Whereas such an approach would handle a high level of detail, the decisions would not be able to reflect the future impact of decisions made now. For example, this logic would not take into account that sending a driver whose home is in Dallas on loads to Chicago is a good way of getting him home. It is also unable to realize that a long (and high revenue) load from Maryland to Idaho is not as good as a shorter load from Maryland to Cleveland (which offers more opportunities for drivers once they unload).

In addition to producing an accurate simulation of the company, we also wanted to produce estimates of the marginal value of different types of drivers distinguished by their home domicile and capacity type. For example, we would like to know the marginal value of adding 10 teams with home domiciles in central Illinois. It is not practical to run a simulation, add 10 drivers of a particular type (there were 300 types), and simulate again. If this were repeated 10 times (to reduce statistical error), we would have to run 3,000 simulations.
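The brute-force alternative described above can be sketched in a few lines. This is only an illustration under stated assumptions: `simulate` is a hypothetical stand-in for a full fleet simulation returning total contribution, `toy_simulate` and its per-driver values are invented for the example, and the counts assume 300 driver types and 10 replications.

```python
from statistics import mean
from typing import Callable, Dict, Iterable

Fleet = Dict[str, int]  # driver type -> number of drivers (hypothetical encoding)


def brute_force_run_count(num_driver_types: int, replications: int) -> int:
    """Number of extra simulations needed to perturb every driver type."""
    return num_driver_types * replications


def finite_difference_value(
    simulate: Callable[[Fleet, int], float],
    base_fleet: Fleet,
    driver_type: str,
    extra_drivers: int,
    seeds: Iterable[int],
) -> float:
    """Estimate the marginal value of one driver type by re-running the simulation."""
    diffs = []
    for seed in seeds:
        base_value = simulate(base_fleet, seed)
        bumped = dict(base_fleet)
        bumped[driver_type] = bumped.get(driver_type, 0) + extra_drivers
        # Same seed for both runs, so both see the same sample path.
        diffs.append((simulate(bumped, seed) - base_value) / extra_drivers)
    return mean(diffs)


def toy_simulate(fleet: Fleet, seed: int) -> float:
    """Hypothetical simulator: total value is linear in the fleet (seed unused)."""
    per_driver_value = {"team/IL": 120.0, "solo/TX": 80.0}
    return sum(per_driver_value[k] * n for k, n in fleet.items())


runs = brute_force_run_count(num_driver_types=300, replications=10)
value = finite_difference_value(
    toy_simulate, {"team/IL": 5, "solo/TX": 7}, "team/IL", 10, seeds=[0, 1, 2]
)
```

Each call to `finite_difference_value` costs one pair of simulations per seed for a single driver type, which is why repeating it for every type becomes impractical when one simulation is expensive.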
There is fairly extensive literature on models and algorithms for the full truckload problem and, in particular, dynamic versions of the problem. Much of this work has solved sequences of deterministic problems that reflect only what is known at a point in time (for reviews, see Psaraftis 1995; Powell, Jaillet, and Odoni 1995; Gendreau and Potvin 1998; Larsen, Madsen, and Solomon 2002). This work has often focused on the algorithmic challenge of solving problems in real time (e.g., Gendreau et al. 1999; Taylor et al. 1999). A number of papers simulated dynamic operations to study questions such as the value of real-time information or other dynamic operating policies (Tjokroamidjojo and Kutanoglu 2001; Regan, Mahmassani, and Jaillet 1998; Chiu and Mahmassani 2002; Yang, Jaillet, and Mahmassani 2004). Ichoua, Gendreau, and Potvin (2006) also propose a policy for dynamically routing vehicles with the intent of optimizing over time. Their research focuses on myopic policies that adjust behavior now based on probabilistic estimates of future demands. Secomandi (2000, 2001) provides a more formal treatment of policies for solving stochastic vehicle routing problems. This line of research, however, is limited to single-vehicle routing problems.

The general problem of routing drivers so they return home on time has received very little attention. Caliskan and Hall (2003) propose a deterministic model for routing drivers in trucking, but this model does not capture either the complexity of drivers or the challenge of getting drivers home in the presence of the type of uncertainty that characterizes truckload trucking. There is a rich literature on planning pilot schedules capturing all the attributes of a pilot and a full set of work rules (see Desrosiers, Solomon, and Soumis 1995; Desaulniers et al. 1998). However, these problems are deterministic and benefit from the highly scheduled nature of airline operations. Also, these problems are much smaller than the problem we address here.

A separate line of research has focused on developing models that produce solutions that optimize over an entire planning horizon.
A summary of different modeling and algorithmic strategies for dynamic fleet management problems is given in Powell (1988) and Powell, Jaillet, and Odoni (1995). Early work in this area focused on managing large fleets of relatively similar vehicles such as would arise in the optimization of empty freight cars for railroads or in aggregate models of fleets for truckload motor carriers. Such problems could be formulated as space-time models (where each node represented a point in space and time) and solved as a network problem if there was a single vehicle type (see, for example, White 1972) or as a multicommodity flow problem if there were multiple vehicle types and substitution (Tapiero and Soliman 1972; Crainic, Ferland, and Rousseau 1984). These models do not allow us to model drivers with any of the richness needed for our project.

The research closest to this project is given in Spivey and Powell (2004), which provides a formal model of the stochastic dynamic driver management problem. We build on this model but introduce a number of new strategies to overcome challenges that arose when we made the transition from a laboratory experiment to a production application.

2. Problem Formulation

We model the problem using the language of dynamic resource management (see Powell, Shapiro, and

Simão 2001), where drivers are resources and loads are tasks. The state of a single resource is defined by an attribute vector $a$, composed of multiple attributes that may be numerical or categorical. For our model, we used

$$a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \\ a_8 \\ a_9 \\ a_{10} \end{pmatrix} = \begin{pmatrix} \text{Location} \\ \text{Domicile} \\ \text{Capacity type} \\ \text{Scheduled time at home} \\ \text{Days away from home} \\ \text{Available time} \\ \text{Geographical constraints} \\ \text{DOT road hours} \\ \text{DOT duty hours} \\ \text{Eight-day duty hours} \end{pmatrix}$$

$\mathcal{A}$ = Set of all possible driver attribute vectors $a$.

A brief discussion of the driver attributes (and the load attributes below) provides an appreciation of some of the complexities in an industrial-strength system. Driver locations were captured at a level that produced 400 locations around the country. Driver domiciles were also captured at a level that divided the country into 100 regions. As discussed earlier, there were three capacity types: team, solo, and IC (independent contractor). The three attributes (location, domicile, and capacity type) were particularly important and will play a major role throughout our analysis. Field $a_4$ is the time by which we would like to get the driver back home (e.g., next Saturday), but the cost of not doing this is also influenced by the number of days the driver has been away from home ($a_5$). Our ability to get drivers home on time was one of the major metrics to which we had to calibrate.

The remaining attributes were needed to produce an accurate simulation. For example, $a_6$ (available time) captured the fact that a driver might be headed to Chicago ($a_1$ = Chicago) but would not arrive until 3:17 p.m. tomorrow (all activities were modeled in continuous time). Field $a_7$ captured constraints such as the fact that Canadian drivers in the United States had to return to Canada, or that other drivers had to stay within 500 miles of their homes. Fields $a_8$ and $a_9$ (Department of Transportation (DOT) road hours and DOT duty hours) captured how many hours a driver had been behind the wheel (road hours) or on duty (duty hours) on a given day.
Field $a_{10}$ is actually an eight-element vector, capturing the number of hours a driver had worked on each of the last eight days.

Similarly, we let $b$ be the vector of attributes of a load, including elements such as origin, destination, appointment time and type, priority, revenue, and delivery window. Some windows are tight but many are fairly loose, providing some flexibility in when a load is served. We let $\mathcal{B}$ be the space of all load types.

We can think of $a_t$, the attribute vector of a driver at time $t$, as the state of the driver. We model the state of all the drivers using the resource state vector, which is defined using

$R_{ta}$ = The number of resources with attribute vector $a$ at time $t$.
$R_t$ = The resource state vector at time $t$, $R_t = (R_{ta})_{a \in \mathcal{A}}$.

We then let $D_{tb}$ be the number of loads with attribute $b$, and let $D_t = (D_{tb})_{b \in \mathcal{B}}$. Our system state vector is then given by

$$S_t = (R_t, D_t).$$

We measure the state $S_t$ just before we make a decision. These decision epochs are modeled in discrete time $t = 0, 1, 2, \ldots, T$, but the physical process occurs in continuous time. For example, the available time of a driver $a_6$ and the ready time (time at which it is available for pickup) of a load $b_6$ are both continuous.

There are two types of exogenous information processes: updates to the attributes of a driver and new customer demands. We let

$\hat{R}_{ta}$ = The change in the number of drivers with attribute $a$ due to information arriving between time $t-1$ and $t$.
$\hat{D}_{tb}$ = The number of new loads that first became known to the system with attribute $b$ between time $t-1$ and $t$.

For example, $\hat{D}_{tb} = +1$ if we have a new customer order with attribute vector $b$. If a driver attribute randomly changed from $a$ to $a'$ (arising, for example, from a delay), we would have $\hat{R}_{ta} = -1$ and $\hat{R}_{ta'} = +1$. We let $W_t = (\hat{R}_t, \hat{D}_t)$ be our generic variable for new information. We view information as arriving continuously in time, where the interval between time instant $t-1$ and $t$ is labeled as time interval $t$. Thus, $W_1$ is the information that arrives between now ($t = 0$) and the first decision epoch ($t = 1$).
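One way to picture the attribute vector and the resource state vector is as a hashable record plus a counter keyed by that record. The sketch below is an assumption-laden illustration, not the data structure used in the production system: the class name, field names, field types, and the helper `apply_exogenous_change` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)  # frozen makes instances hashable, so they can key a Counter
class DriverAttributes:
    location: str                       # a_1
    domicile: str                       # a_2
    capacity_type: str                  # a_3: "team", "solo", or "IC"
    scheduled_home_time: float          # a_4
    days_away: int                      # a_5
    available_time: float               # a_6 (continuous time)
    geo_constraint: str                 # a_7
    dot_road_hours: float               # a_8
    dot_duty_hours: float               # a_9
    eight_day_hours: Tuple[float, ...]  # a_10: hours worked on each of the last 8 days


def apply_exogenous_change(
    R: Counter, R_hat: Dict[DriverAttributes, int]
) -> Counter:
    """Add the change vector R_hat to the resource vector R (counts per attribute)."""
    updated = Counter(R)
    for a, delta in R_hat.items():
        updated[a] += delta
        if updated[a] < 0:
            raise ValueError("more drivers removed than exist for this attribute")
        if updated[a] == 0:
            del updated[a]  # keep the vector sparse
    return updated


driver = DriverAttributes(
    "Chicago", "Dallas", "solo", 120.0, 6, 39.3, "none", 4.0, 6.5, (8.0,) * 8
)
# A delay changes the available time: one driver leaves attribute a, one arrives at a'.
delayed = replace(driver, available_time=driver.available_time + 2.5)
R_t = Counter({driver: 1})
R_after = apply_exogenous_change(R_t, {driver: -1, delayed: +1})
```

Storing only attributes with nonzero counts matters here because the attribute space is far larger than the number of drivers, so most entries of the vector are zero.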
The major decision classes for this problem include whether a truck is to be used to move a load to a particular destination or whether it needs to move empty to another location in the anticipation of future loads with better rewards. When a decision is applied to a resource, it produces a contribution. A loaded move would generate revenue, whereas an empty move would incur some cost. Decisions are described using

$d$ = An elementary decision,

$\mathcal{D}^L$ = The set of all decisions to cover a type of load, where an element $d \in \mathcal{D}^L$ represents a decision to cover a load of type $b_d \in \mathcal{B}$,
$d^\phi$ = The decision to hold a driver,
$\mathcal{D} = \mathcal{D}^L \cup d^\phi$,
$x_{tad}$ = The number of times decision $d$ is applied to a resource with attribute vector $a$ at time $t$,
$x_t = (x_{tad})_{a \in \mathcal{A},\, d \in \mathcal{D}}$.

The decision variables $x_{tad}$ have to satisfy the following constraints:

$$\sum_{d \in \mathcal{D}} x_{tad} = R_{ta} \quad \forall a \in \mathcal{A}, \tag{1}$$

$$\sum_{a \in \mathcal{A}} x_{tad} \le D_{t b_d} \quad \forall d \in \mathcal{D}^L, \tag{2}$$

$$x_{tad} \ge 0 \quad \forall a \in \mathcal{A},\ d \in \mathcal{D}. \tag{3}$$

Equation (1) captures flow conservation for drivers (we cannot assign more than we have of a particular type) and Equation (2) is flow conservation on loads (we cannot assign more drivers to loads of type $b_d$ than there are loads of this type). We let $\mathcal{X}_t$ be the set of all $x_t$ that satisfy Equations (1)-(3). The feasible region $\mathcal{X}_t$ depends on $S_t$. Rather than write $\mathcal{X}(S_t)$, we let the subscript $t$ in $\mathcal{X}_t$ indicate the dependence on the information available at time $t$. Finally, we assume that decisions are determined by a decision function denoted

$X^\pi(S_t)$ = A function that determines $x_t \in \mathcal{X}_t$ given $S_t$, where $\pi \in \Pi$,
$\Pi$ = A set of decision functions (or policies).

We next need to model the dynamics of the system. Both $R_t$ and $D_t$ evolve over time, but for the moment we focus purely on the evolution of $R_t$. If we act on a driver with attribute $a$ using decision $d$, we represent the change in the attribute vector using

$$a' = a^M(a, d).$$

We model the transition function deterministically, which means that $a'$ is the attribute vector that we think results from a decision but before any new information has arrived. So, if we decide to move a truck from Dallas to Chicago leaving at time 12.2 with an expected travel time of 17.5, then immediately after the assignment, this would be a truck with the attribute that we expect it to be in Chicago at time 29.7 (later information may change this). For algebraic purposes, define

$$\delta_{a'}(a, d) = \begin{cases} 1 & \text{if } a^M(a, d) = a', \\ 0 & \text{otherwise.} \end{cases}$$

We now define the post-decision resource vector, which is the resource vector after we make a decision but before any new information arrives.
This can be written as:

$$R^x_{ta'} = \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \delta_{a'}(a, d)\, x_{tad}. \tag{4}$$

Finally, our next predecision resource vector would be given by

$$R_{t+1,a} = R^x_{ta} + \hat{R}_{t+1,a}. \tag{5}$$

It is more conventional in stochastic dynamic systems to write the transition from $R_t$ to $R_{t+1}$. Explicitly capturing the post-decision resource vector provides significant computational advantages, as we illustrate later.

The transition function for the demands is symmetrical. In addition to the state variable $D_t$, we would define the post-decision demand vector $D^x_t$ along with an indicator function similar to $\delta$ to describe how decisions change the attributes of a load. In the simplest model, a demand is either moved (in which case it leaves the system) or it waits until the next time period. In our project, it was possible to have a driver move to pick up a load, move the load to an intermediate location, and then drop it off so that a different driver could finish the move (this is known as a relay). Whereas such strategies are used for only a small percentage of the total demand, trucking companies will use such strategies to help get drivers home. A driver may pick up a load that takes him too far from his home. Instead, he may move the load part way so that a different driver can pick up the load and complete the trip.

We define the objective function using

$c_{tad}$ = The contribution generated by applying decision $d$ to a resource with attribute vector $a$ at time $t$.

The contributions were divided between hard dollar and soft dollar contributions. Hard dollar contributions include the revenue generated from moving a load minus the cost of actually moving the truck (first moving empty to pick up the load, followed by the actual cost of moving the load). The soft dollar costs capture bonuses for getting the driver home, penalties for early or late pick-up of a load, and penalties for getting a driver home before or after the time that he was scheduled to get home.
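The two-step transition in Equations (4) and (5) can be sketched on a deliberately reduced attribute vector. Everything below is a hypothetical simplification: only location and available time are tracked, a decision is a destination plus a travel time, and the driver is assumed to depart at his current available time.

```python
from collections import Counter
from typing import Dict, NamedTuple, Tuple


class Attr(NamedTuple):
    """Reduced attribute vector (assumption: only two of the ten fields)."""
    location: str
    available_time: float


class Decision(NamedTuple):
    """Hypothetical decision: move to a destination with an expected travel time."""
    destination: str
    travel_time: float


def attribute_transition(a: Attr, d: Decision) -> Attr:
    """a^M(a, d): deterministic attribute expected right after the decision."""
    # Rounded to avoid floating-point noise when the result is used as a dict key.
    return Attr(d.destination, round(a.available_time + d.travel_time, 6))


def post_decision_resources(x: Dict[Tuple[Attr, Decision], int]) -> Counter:
    """Equation (4): sum the flows x over all (a, d) pairs that lead to each a'."""
    R_x: Counter = Counter()
    for (a, d), count in x.items():
        R_x[attribute_transition(a, d)] += count
    return R_x


def next_pre_decision(R_x: Counter, R_hat_next: Dict[Attr, int]) -> Counter:
    """Equation (5): add exogenous changes that arrive before the next epoch."""
    R_next = Counter(R_x)
    for a, delta in R_hat_next.items():
        R_next[a] += delta
    return Counter({a: n for a, n in R_next.items() if n != 0})


dallas = Attr("Dallas", 12.2)
to_chicago = Decision("Chicago", 17.5)
R_x = post_decision_resources({(dallas, to_chicago): 1})
# Later information: the truck is delayed and will reach Chicago at 31.0 instead.
R_next = next_pre_decision(
    R_x, {Attr("Chicago", 29.7): -1, Attr("Chicago", 31.0): +1}
)
```

The post-decision vector is a deterministic function of the state and the decision, which is what later allows a value function to be attached to it without computing an expectation inside the optimization.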
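If we assume that the contributions are linear, the contribution function for period $t$ would be given by

$$C_t(S_t, x_t) = \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} c_{tad}\, x_{tad}. \tag{6}$$

The optimal policy maximizes the expected sum of contributions, discounted by a factor $\gamma$, over all the time periods:

$$F(S_0) = \max_{\pi \in \Pi} \mathbb{E} \left\{ \sum_{t=0}^{T} \gamma^t\, C_t\bigl(S_t, X^\pi(S_t)\bigr) \,\Big|\, S_0 \right\}. \tag{7}$$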
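As a small numerical illustration of Equations (6) and (7), the sketch below computes a single link contribution from hard and soft dollar terms, sums a period's contribution, and discounts a sequence of period contributions along one sample path. The function names, the per-mile cost model, and all dollar figures are hypothetical; the paper does not specify how individual cost terms are computed.

```python
from typing import Dict, Hashable, Sequence, Tuple

Key = Tuple[Hashable, Hashable]  # (attribute a, decision d)


def link_contribution(
    revenue: float,
    empty_miles: float,
    loaded_miles: float,
    cost_per_mile: float,
    home_bonus: float = 0.0,
    service_penalty: float = 0.0,
) -> float:
    """c_tad as hard dollars (revenue minus movement cost) plus soft dollars."""
    hard = revenue - cost_per_mile * (empty_miles + loaded_miles)
    soft = home_bonus - service_penalty
    return hard + soft


def period_contribution(c: Dict[Key, float], x: Dict[Key, int]) -> float:
    """Equation (6): linear contribution, sum of c_tad * x_tad."""
    return sum(c[key] * count for key, count in x.items())


def discounted_total(contributions: Sequence[float], gamma: float) -> float:
    """Inner sum of Equation (7) along one sample path, for one fixed policy."""
    return sum(gamma**t * C for t, C in enumerate(contributions))


c_move = link_contribution(1000.0, 50.0, 400.0, 1.0, home_bonus=100.0)
c = {("a1", "cover load"): c_move, ("a1", "hold"): 0.0}
x = {("a1", "cover load"): 2, ("a1", "hold"): 1}
C_0 = period_contribution(c, x)
total = discounted_total([C_0, 400.0], gamma=0.5)
```

Equation (7) additionally takes an expectation over sample paths and a maximum over policies; this sketch evaluates only the quantity inside the expectation.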

One policy for solving this problem is the myopic policy, given by

$$X^M(S_t) = \arg\max_{x_t \in \mathcal{X}_t} \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} c_{tad}\, x_{tad},$$

which involves assigning known drivers to known loads at each point in time. This is a straightforward assignment problem, involving the assigning of drivers (that is, attributes $a$ where $R_{ta} > 0$) to loads (attributes $b$ where $D_{tb} > 0$). One of the biggest challenges we faced was the sheer size of this problem, which involved over 2,000 available drivers and loads at each time period. Using careful engineering, we limited the number of links per driver (or load) to approximately 100, which still required generating about 200,000 links for each time period (the costing of each link required considerable calculations to enforce driver work rules and to handle the service constraints on each load). Given a solution $x_t = X^M(S_t)$, we would then use our transition functions to compute $(R^x_t, D^x_t)$ and then find $(R_{t+1}, D_{t+1})$ by sampling $\hat{R}_{t+1}$ and $\hat{D}_{t+1}$.

A central hypothesis of this research is that an algorithm that does a better job of solving Equation (7) will do a better job of matching the historical performance of the company. Although we use approximations, our strategy works from a formal statement of the objective function (something that is typically missing from most simulation papers) rather than heuristic policies. As we show, a by-product of this strategy is that we also obtain estimates of the derivative of $F(S_0)$ with respect to $R_{ta}$ (for $a$ at some level of aggregation) that would tell us the value of hiring additional drivers in a particular domicile. In the next section, we describe the strategies we tested for solving Equation (7).

3. Algorithmic Strategies

In dynamic programming, instead of solving Equation (7) in its entirety, we divide the problem into time stages. At each time period depending on our current state, we can search over the set of available actions to identify a subset that is optimal.
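The myopic policy for a single decision epoch can be illustrated with a tiny brute-force assignment. This is a sketch only: it enumerates every plan, which is feasible for a handful of drivers but not for the thousands of drivers and loads described above, where a network or assignment solver would be needed. The contribution matrix is invented, holding a driver is assumed to contribute zero, and each attribute is assumed to describe exactly one driver.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Plan = Tuple[Optional[int], ...]  # plan[i] = load index for driver i, or None (hold)


def myopic_assignment(c: Sequence[Sequence[float]]) -> Tuple[float, Plan]:
    """Maximize total contribution with each load covered by at most one driver."""
    num_drivers = len(c)
    num_loads = len(c[0]) if c else 0
    options: List[Optional[int]] = list(range(num_loads)) + [None]
    best_value, best_plan = float("-inf"), tuple([None] * num_drivers)
    for plan in product(options, repeat=num_drivers):
        covered = [j for j in plan if j is not None]
        if len(covered) != len(set(covered)):
            continue  # violates Equation (2): a load assigned to two drivers
        value = sum(c[i][j] for i, j in enumerate(plan) if j is not None)
        if value > best_value:
            best_value, best_plan = value, plan
    return best_value, best_plan


# Hypothetical contributions c[i][j] of driver i covering load j (3 drivers, 2 loads).
contributions = [
    [500.0, 300.0],
    [450.0, 100.0],
    [-50.0, 200.0],
]
value, plan = myopic_assignment(contributions)
```

In this example the best plan gives load 1 to driver 0 and load 0 to driver 1 while holding driver 2, even though driver 0 has the single largest contribution on load 0; the plan maximizes the total for the period, not each driver's own contribution.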
The value associated with each state can be computed using Bellman's optimality equations, which are typically written as

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \Bigl( C_t(S_t, x_t) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid S_t, x_t)\, V_{t+1}(s') \Bigr), \tag{8}$$

where $p(s' \mid S_t, x_t)$ is the one-step transition matrix giving the probability that $S_{t+1} = s'$, and $\mathcal{S}$ is the state space. Solving Equation (8) encounters three curses of dimensionality: the state vector $S_t$ (with dimensionality $|\mathcal{A}| + |\mathcal{B}|$, which can be extremely large), the outcome space (the expectation is over a vector of random variables measuring $|\mathcal{A}| + |\mathcal{B}|$), and the action space (the vector $x_t$ is dimensioned $|\mathcal{A}| \times |\mathcal{D}|$).

Section 3.1 provides a sketch of a basic approximate dynamic programming algorithm for approximating the solution of Equation (7). Section 3.2 describes how we update the value function. Section 3.3 shows how we solve the statistical problem of estimating the value of drivers with hundreds of thousands of attribute vectors. Section 3.4 briefly describes research on stepsizes that was motivated by this project. In §3.5, we describe how we implemented a backward pass to accelerate the rate of convergence. Finally, §3.6 reports on a series of comparisons of different algorithmic choices we had to make.

3.1. An Approximate Dynamic Programming Algorithm

Approximate dynamic programming has been emerging as a powerful technique for solving dynamic programs that would otherwise be computationally intractable. Our approach requires merging math programming with the techniques of machine learning used within approximate dynamic programming. Our algorithmic strategy differs markedly from what is presented in classic texts on approximate dynamic programming, particularly in our use of the post-decision state variable. A comprehensive treatment of our algorithmic strategy is contained in Powell (2007).

We solve Equation (7) by breaking the dynamic programming recursions into two steps:

$$V^x_{t-1}(S^x_{t-1}) = \mathbb{E}\bigl\{ V_t(S_t) \mid S^x_{t-1} \bigr\}, \tag{9}$$

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \bigl( C_t(S_t, x_t) + \gamma V^x_t(S^x_t) \bigr), \tag{10}$$

where $S_t = S^{M,W}(S^x_{t-1}, W_t)$ and $S^x_t = S^{M,x}(S_t, x_t)$.

The basic algorithmic strategy works as follows: At iteration $n$, assume we are following sample path $\omega^n$ and that we find ourselves at the post-decision state $S^{x,n}_{t-1}$ after making the decision $x^n_{t-1}$.
Now, compute the next predecision state $S^n_t$ using

$$S^n_t = S^{M,W}\big(S^{x,n}_{t-1}, W_t(\omega^n)\big).$$

From state $S^n_t$, we compute our feasible region $\mathcal{X}^n_t$ (which depends on information such as $R^n_t$ and $D^n_t$). Next, solve the optimization problem

$$\hat{v}^n_t = \max_{x_t \in \mathcal{X}^n_t} \Big( C_t(S^n_t, x_t) + \bar{V}^{n-1}_t\big(S^{M,x}(S^n_t, x_t)\big) \Big) \tag{11}$$

and assume that $x^n_t$ is the value of $x_t$ that solves Equation (11). We then compute the post-decision state $S^{x,n}_t = S^{M,x}(S^n_t, x^n_t)$ to continue the process.

We next wish to use the solution of Equation (11) to update our value function approximation. With traditional approximate dynamic programming (Bertsekas and Tsitsiklis 1996; Sutton and Barto 1998), we would use $\hat{v}^n_t$ to update a value function approximation around $S^n_t$. Using the post-decision state variable, we use $\hat{v}^n_t$ to update $\bar{V}^{n-1}_{t-1}(S^x_{t-1})$ around $S^{x,n}_{t-1}$. The updating strategy depends on the specific structure of $\bar{V}^{n-1}_{t-1}(S^x_{t-1})$.

To design our value function approximation, we took advantage of two properties of our problem. First, most loads are served at a given point in time. If we were to define a post-decision demand vector $D^x_t$ (comparable to the post-decision resource vector $R^x_t$) that gives the number of loads left over after assignment decisions have been made, we would find that most of the elements of $D^x_t$ were zero. Second, given the complexity of the attribute vector, $R_{ta}$ was typically zero or one. For this reason, we used a value function approximation that was linear in $R^x_t$, given by

$$\bar{V}^{n-1}_t(S^x_t) = \bar{V}^{n-1}_t(R^x_t) = \sum_{a' \in \mathcal{A}} \bar{v}^{n-1}_{ta'} R^x_{ta'}. \tag{12}$$

We have worked extensively with nonlinear (piecewise linear) approximations of the value function to capture nonlinear behavior such as "the fifth truck in a region is not as useful as the first" (see Topaloglu and Powell 2006, for example), but in this project the focus was less on determining how many drivers to move and more on what type of driver to use.

It is easy to rewrite Equation (12) using

$$\bar{V}^{n-1}_t(R^x_t) = \sum_{a' \in \mathcal{A}} \bar{v}^{n-1}_{ta'} \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \delta_{a'}(a, d)\, x_{tad}, \tag{13}$$

where Equation (13) is obtained by using the state transition equation (4). This enables us to write the problem of finding the optimal decision function using

$$X^\pi_t(S_t) = \arg\max_{x_t \in \mathcal{X}_t} \Big( \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} c_{tad}\, x_{tad} + \sum_{a' \in \mathcal{A}} \bar{v}^{n-1}_{ta'} \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \delta_{a'}(a, d)\, x_{tad} \Big) = \arg\max_{x_t \in \mathcal{X}_t} \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \Big( c_{tad} + \sum_{a' \in \mathcal{A}} \bar{v}^{n-1}_{ta'}\, \delta_{a'}(a, d) \Big) x_{tad}. \tag{14}$$

Recognizing that $\sum_{a' \in \mathcal{A}} \delta_{a'}(a, d) = \delta_{a^M(a,d)}(a, d) = 1$, we can write Equation (14) as

$$X^\pi_t(S_t) = \arg\max_{x_t \in \mathcal{X}_t} \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} \Big( c_{tad} + \bar{v}^{n-1}_{t, a^M(a,d)} \Big) x_{tad}. \tag{15}$$

Clearly, Equation (15) is no more difficult than solving the original myopic problem, with the only difference being that we have to solve it iteratively in order to estimate the value function approximation. Fortunately, it is neither necessary nor desirable to reestimate the value functions each time we undertake a policy study.
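To make Equation (15) concrete, the following minimal Python sketch solves a toy instance by enumeration. It is an illustration under stated assumptions, not the implementation used in the study: the function name `assign_with_values`, the data layout, and all numbers are hypothetical; every driver is assumed to take exactly one load (there is no option to hold a driver); and brute-force enumeration stands in for the large-scale assignment solver that a realistic instance would need.

```python
from itertools import permutations


def assign_with_values(c, future_attr, v_bar):
    """Toy version of Equation (15), solved by enumeration.

    c[i][j]           -- contribution of assigning driver i to load j
    future_attr[i][j] -- attribute a^M(a_i, d_j) the driver has after the move
    v_bar             -- dict: future attribute -> value estimate (default 0)

    Assumption: each driver covers exactly one load, so the number of
    drivers must not exceed the number of loads.
    """
    n_drivers, n_loads = len(c), len(c[0])
    assert n_drivers <= n_loads
    best_value, best_plan = float("-inf"), None
    for loads in permutations(range(n_loads), n_drivers):
        # Each link is costed as c_tad plus the value of the future attribute.
        value = sum(
            c[i][j] + v_bar.get(future_attr[i][j], 0.0)
            for i, j in enumerate(loads)
        )
        if value > best_value:
            best_value, best_plan = value, loads
    return best_plan, best_value


# Hypothetical data: two drivers (a team and a solo driver), two loads.
c = [[10, 8], [9, 4]]
future_attr = [["X-team", "Y-team"], ["X-solo", "Y-solo"]]
v_bar = {"X-team": 6.0, "Y-team": 0.0, "X-solo": 0.0, "Y-solo": 1.0}

myopic_plan, myopic_value = assign_with_values(c, future_attr, {})
adp_plan, adp_value = assign_with_values(c, future_attr, v_bar)
```

With an empty value function the sketch reduces to the myopic policy; with the hypothetical values above, the downstream value of the future attribute changes which driver takes which load.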
Figure 1. Driver Assignment Problem, Illustrating the Different Future Driver Attributes That Have to Be Evaluated (drivers $a_1, \ldots, a_5$ are matched to loads; for driver $a_3$, the future attributes $a^M(a_3, d_1), \ldots, a^M(a_3, d_5)$ are shown)

We face two challenges at this stage. First, we have to find a way to update the values of $\bar{v}^{n-1}_{t-1,a}$ using information derived from solving the decision problems. Section 3.2 describes a Monte Carlo-based approach, but this introduces a statistical problem. As illustrated in Figure 1, in order to decide which load a driver should move, we have to know the value of the driver at the end of each load. This means it is not enough to know the value of drivers with attributes that actually occur (that is, $R_{ta} > 0$); we must also know the value of attributes that we might visit.

3.2. Value Function Updates

Once we have settled on a value function approximation, we face the challenge of estimating it. The general idea in approximate dynamic programming is that we iteratively simulate the system forward in time. At iteration $n$, we follow a sample path $\omega^n$ that determines $\hat{R}^n_t = \hat{R}_t(\omega^n)$ and $\hat{D}^n_t = \hat{D}_t(\omega^n)$. The decision function in Equation (15) is computed using value functions $\bar{v}^{n-1}_t$, computed using information from iteration $n-1$. We then use information from iteration $n$ to update $\bar{v}^{n-1}_{t-1,a}$, giving us $\bar{v}^n_{t-1,a}$. This section describes how the updating is accomplished.

Assume that our previous decision (at time $t-1$) left us in the post-decision state $S^{x,n}_{t-1}$. Following the sample path $\omega^n$ then puts us in state $S^n_t = S^{M,W}(S^{x,n}_{t-1}, W_t(\omega^n))$, which determines our feasible region $\mathcal{X}^n_t$. We then make decisions at time $t$ by solving

$$F_t(S^n_t) = \max_{x_t \in \mathcal{X}^n_t} \big( C_t(S^n_t, x_t) + \bar{V}^{n-1}_t(S^x_t) \big), \tag{16}$$

where $S^x_t = S^{M,x}(S^n_t, x_t)$. We let $x^n_t$ be the value of $x_t$ that solves Equation (16). Note that $R^{x,n}_{t-1}$ affects Equation (16) through the flow conservation constraint (1).

Keep in mind that $R^n_{ta} = R^{x,n}_{t-1,a} + \hat{R}^n_{ta}$. $\hat{R}_{ta}$ may be the random arrival of a new driver but, for our work, it primarily captures random changes in the status of a driver (e.g., travel delays or equipment failures). If these transitions change the status of a driver from $a$ to $a'$, then we would have $\hat{R}_{ta} = -1$ and $\hat{R}_{ta'} = +1$. If there are no random changes of this sort (which means that $\hat{R}_t = 0$), then it is easy to see that

$$\hat{v}^n_{t-1,a} = \frac{\partial F_t(S_t)}{\partial R^x_{t-1,a}} = \frac{\partial F_t(S_t)}{\partial R_{ta}} = \hat{\nu}^n_{ta}, \tag{17}$$

where $\hat{\nu}^n_{ta}$ is the dual variable for the flow conservation constraint (1). If we do allow random changes (say, from $a$ to $a'$), we would use $\hat{\nu}^n_{ta'}$ to update $\bar{v}^{n-1}_{t-1,a}$.

We want to use information from Equation (16) to update the value functions used at time $t-1$, given by $\bar{v}^{n-1}_{t-1,a}$. Keeping in mind that these are estimates of slopes, what we need is the derivative of $F_t(S^n_t)$ with respect to each $R^{x,n}_{t-1,a}$, where $a = a^x_{t-1}$, which we compute using

$$\hat{v}^n_{t-1,a} = \frac{\partial F_t(S_t)}{\partial R^x_{t-1,a}} = \sum_{a' \in \mathcal{A}} \frac{\partial F_t(S_t)}{\partial R_{ta'}} \frac{\partial R_{ta'}}{\partial R^x_{t-1,a}}. \tag{18}$$

$\partial F_t(S_t) / \partial R_{ta'}$ is just the dual of the optimization problem (15) associated with the flow conservation constraint (1), which we denote by $\hat{\nu}^n_{ta'}$. For the second part of the derivative, we have

$$\frac{\partial R_{ta'}}{\partial R^x_{t-1,a}} = \begin{cases} 1 & \text{if } a' = a^{M,W}(a^x_{t-1}, W_t(\omega^n)), \\ 0 & \text{otherwise.} \end{cases}$$

This simply means that if we had a truck with attribute $a^x_{t-1}$, which then evolves (due to exogenous information) into a truck with attribute $a' = a_t = a^{M,W}(a^x_{t-1}, W_t(\omega^n))$, then $\hat{v}^n_{t-1,a} = \hat{\nu}^n_{ta'}$. We do not have to execute the summation in Equation (18). We just need to keep track of the transition from $a^x_{t-1}$ to $a_t$.

We note, however, that we are unable to compute $\hat{\nu}^n_{ta}$ for each attribute $a \in \mathcal{A}$ (the attribute space is too large). Instead, for each $a_{t-1}$ where $R^{x,n}_{t-1,a_{t-1}} > 0$, we found $a' = a_t = a^{M,W}(a_{t-1}, W_t(\omega^n))$ and computed $\hat{\nu}^n_{ta'}$. We then found $\hat{v}^n_{t-1,a}$ from Equation (18). Once we have computed $\hat{v}^n_{t-1,a}$, we update the value function approximation using

$$\bar{v}^n_{t-1,a} = (1 - \alpha_{n-1})\, \bar{v}^{n-1}_{t-1,a} + \alpha_{n-1}\, \hat{v}^n_{t-1,a}, \tag{19}$$

where $\alpha_{n-1}$ is a stepsize between zero and one (discussed in greater detail in Section 3.4).

Step 0. Initialization:
    Step 0a. Initialize $\bar{V}^0_t$ for all $t$.
    Step 0b. Initialize the state $S^1_0$.
    Step 0c. Set $n = 1$.
Step 1. Choose a sample path $\omega^n$.
Step 2. Do for $t = 0, 1, \ldots, T$:
    Step 2a. Solve the optimization problem
        $$\max_{x_t \in \mathcal{X}^n_t} \Big( C_t(S^n_t, x_t) + \bar{V}^{n-1}_t\big(S^{M,x}(S^n_t, x_t)\big) \Big). \tag{20}$$
        Let $x^n_t$ be the value of $x_t$ that solves Equation (20), and let $\hat{\nu}^n_{ta}$ be the dual corresponding to the resource conservation constraint for each $R^n_{ta}$ where $R^n_{ta} > 0$.
    Step 2b. Update the value function using
        $$\bar{v}^n_{t-1,a} = (1 - \alpha_{n-1})\, \bar{v}^{n-1}_{t-1,a} + \alpha_{n-1}\, \hat{\nu}^n_{ta}.$$
        Do this for each attribute $a$ for which we have computed $\hat{\nu}^n_{ta}$.
    Step 2c. Update the state:
        $$S^{x,n}_t = S^{M,x}(S^n_t, x^n_t), \qquad S^n_{t+1} = S^{M,W}\big(S^{x,n}_t, W_{t+1}(\omega^n)\big).$$
Step 3. Increment $n$. If $n \le N$, then set $S^{x,n}_0 = S^{x,n-1}_T$ and go to Step 1.
Step 4. Return the value functions $\bar{v}^n_{ta}$ for $t = 0, 1, \ldots, T$ and $a \in \mathcal{A}$.

Figure 2. An Approximate Dynamic Programming Algorithm to Solve the Driver Assignment Problem

We outline the steps of a typical approximate dynamic programming algorithm for solving the fleet management problem in Figure 2. This algorithm uses a single pass to simulate a sample trajectory using the current estimates of the value functions. We start from an initial state $S^1_0 = (R_0, D_0)$ of drivers and loads with a value function approximation $\bar{V}^0_t(S^x_t)$. From this, we determine an assignment of drivers to loads $x^1_0$. We then find the post-decision state $S^{x,1}_0$ and simulate our way to the next state $S^1_1 = S^{M,W}(S^{x,1}_0, W_1(\omega^1))$. This simulation includes new customer orders as well as random changes to the status of the drivers. All of the complexity of the physics of the problem is captured in the transition functions, which impose virtually no limits on our ability to handle the realism of the problem.

At an early stage of the project, the company expressed concern that the results might be overfitted to a particular set of drivers (input to the model as $R^x_0$) and loads. We took two steps in response to this concern. First, we randomized the loads, choosing a subset from a larger set of loads at each iteration. Second, we took the final resource state vector ($R^x_T$) and used this as the new initial resource state vector (see Step 3).

A major technical challenge in the algorithm is computing the value function approximation $\bar{V}_t = (\bar{v}_{ta})_{a \in \mathcal{A}}$. Even if the attribute vector $a$ has only a few dimensions, the attribute space is too large to update using Equation (19). Furthermore, we only obtain updates $\hat{v}^n_{ta}$ for a subset of attributes $a$ at each iteration.
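The smoothing update in Step 2b of Figure 2 (Equation (19)) can be sketched as follows. This is a minimal illustration, assuming that value estimates are kept in a dictionary keyed by (time, attribute) and that attributes never visited default to zero; the function name `update_values` and the attribute labels are hypothetical.

```python
def update_values(v_bar, t, observations, alpha):
    """Apply the smoothing update of Equation (19) for time period t.

    v_bar        -- dict: (time, attribute) -> current value estimate
    observations -- iterable of (a_prev, nu_hat) pairs, where a_prev is the
                    attribute a driver had in the post-decision state at
                    time t-1 and nu_hat is the dual observed at time t for
                    the attribute that driver evolved into
    alpha        -- stepsize between zero and one
    """
    for a_prev, nu_hat in observations:
        key = (t - 1, a_prev)
        old = v_bar.get(key, 0.0)  # assumed initial estimate of zero
        v_bar[key] = (1.0 - alpha) * old + alpha * nu_hat
    return v_bar


# Only the attribute that was actually visited receives an update.
v_bar = {}
update_values(v_bar, t=1, observations=[("NJ-solo", 100.0)], alpha=0.1)
update_values(v_bar, t=1, observations=[("NJ-solo", 100.0)], alpha=0.1)
```

Starting from zero with a small stepsize, the estimate climbs only gradually toward the observed dual, and any attribute that never appears in `observations` has no estimate at all, which is the statistical difficulty the text goes on to address.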

In principle, we could have solved our decision problem for a resource vector $R_t$ using all the attributes in $\mathcal{A}$. This is completely impractical. For our simulation, we only generated nodes for attributes $a$ where $R^n_{ta} > 0$ (as a rule, we generated a unique node for each driver), which means we obtain $\hat{v}^n_{ta}$ only for a subset of attributes. We need an estimate $\bar{v}_{ta}$ not just for where we have drivers (that is, $R_{ta} > 0$) but where we might want to send drivers. We address this problem in the next section.

3.3. Approximating the Value Function

The full attribute vector $a$ that is needed to completely capture the important characteristics of the driver produces an attribute space that is far too large to enumerate. Fortunately, it is not necessary to use all these attributes for the purpose of approximating the value function. In addition to time (we have a finite horizon model, so all value functions are indexed by time), three attributes were considered essential: the location of the driver, the home domicile of the driver, and his capacity type (team, solo, or independent contractor). The company divided the country into 100 regions for the purpose of representing location and domicile (this is only for the value function). Combined with three capacity types and 20 time periods, this produced a total of 600,000 attributes for which we would need an estimated value.

Although dramatically smaller than the original attribute space, this is still extremely large. Most of these attributes will never be visited, and many will be visited only a few times. As a result, we have serious statistical issues in our ability to estimate $\bar{v}$.

The standard approach to overcoming large state spaces is to use aggregation. We can use aggregation to create a hierarchy of state spaces $\{\mathcal{A}^{(g)},\ g = 0, 1, 2, \ldots\}$ with successively fewer elements. We illustrate four levels of aggregation in Table 1. At level 0, we have 20 time periods, 100 regions for location and domicile, and 3 capacity types, producing 600,000 attributes.
At aggregation level 1, we ignored the driver domicile; at aggregation level 2, we ignored the capacity type; and at aggregation level 3, we represented location as one of 10 areas, which had the effect of ensuring that we always had some type of estimate for any attribute.

Table 1. Levels of Aggregation Used to Approximate Value Functions

g    Time    Location    Domicile    Capacity type    Number of attributes
0    *       Region      Region      *                600,000
1    *       Region      -           *                6,000
2    *       Region      -           -                2,000
3    *       Area        -           -                200

Note. A * corresponding to a particular attribute indicates that the attribute is included in the attribute vector, and a - indicates that it is aggregated out.

Choosing the right level of aggregation to approximate the value functions involves a trade-off between statistical and structural errors. If $\{\bar{v}^{(g)}_a,\ g \in \mathcal{G}\}$ denotes estimates of a value $v_a$ at different levels of aggregation, we can compute an improved estimate as a weighted combination of estimates of the values at different levels of aggregation using

$$\bar{v}_a = \sum_{g \in \mathcal{G}} w^{(g)}_a\, \bar{v}^{(g)}_a, \tag{21}$$

where $\{w^{(g)}_a,\ g \in \mathcal{G}\}$ is a set of appropriately chosen weights. George, Powell, and Kulkarni (2005) show that good results can be achieved using a simple formula, called WIMSE, that weights the estimates at different levels of aggregation by the inverse of the estimates of their mean squared deviations (obtained as the sum of the variances and the biases) from the true value. These weights are easily computed from a series of simple calculations. We briefly summarize the equations without derivation. We first compute

$\bar{\beta}^{(g,n)}_a$ = estimate of bias due to smoothing a transient data series,

$$\bar{\beta}^{(g,n)}_a = (1 - \eta_{n-1})\, \bar{\beta}^{(g,n-1)}_a + \eta_{n-1} \big( \hat{v}^n_a - \bar{v}^{(g,n-1)}_a \big), \tag{22}$$

$\bar{\mu}^{(g,n)}_a$ = estimate of bias due to aggregation error,

$$\bar{\mu}^{(g,n)}_a = \bar{v}^{(g,n)}_a - \bar{v}^{(0,n)}_a,$$

$\bar{\bar{\beta}}^{(g,n)}_a$ = estimate of total squared variation,

$$\bar{\bar{\beta}}^{(g,n)}_a = (1 - \eta_{n-1})\, \bar{\bar{\beta}}^{(g,n-1)}_a + \eta_{n-1} \big( \hat{v}^n_a - \bar{v}^{(g,n-1)}_a \big)^2.$$

We are using two stepsize formulas here. $\alpha_{n-1}$ is the stepsize used in Equation (19) to update $\bar{v}^{n-1}_{t-1,a}$. This is discussed in more detail in Section 3.4. $\eta_n$ is typically a deterministic stepsize that might be a constant such as 0.1, although we used McClain's stepsize rule:

$$\eta_n = \frac{\eta_{n-1}}{1 + \eta_{n-1} - \bar{\eta}}, \tag{23}$$

where $\bar{\eta} = 0.1$ has been found to be very robust (George, Powell, and Kulkarni 2005).
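As a small numerical illustration of Equation (23), the sketch below iterates McClain's rule from a starting stepsize of 1. The function names and the starting value are assumptions made for the example. With a target of zero the recursion reduces to the familiar $1/n$ sequence, and with a positive target it levels off at that target instead of decaying to zero.

```python
def mcclain_step(eta_prev, eta_bar=0.1):
    """One application of McClain's stepsize rule, Equation (23)."""
    return eta_prev / (1.0 + eta_prev - eta_bar)


def mcclain_sequence(n_steps, eta0=1.0, eta_bar=0.1):
    """Return [eta_0, eta_1, ..., eta_{n_steps}] (eta0 = 1 is an assumption)."""
    etas = [eta0]
    for _ in range(n_steps):
        etas.append(mcclain_step(etas[-1], eta_bar))
    return etas


harmonic = mcclain_sequence(3, eta_bar=0.0)   # target of zero: 1, 1/2, 1/3, 1/4
robust = mcclain_sequence(200, eta_bar=0.1)   # levels off at the target 0.1
```

The nonzero limit is what lets the smoothed statistics keep adapting to data whose mean drifts as the value function estimates themselves change.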
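We estimate the variance of the observations at a particular level of aggregation using

$$\big(s^2_a\big)^{(g,n)} = \frac{\bar{\bar{\beta}}^{(g,n)}_a - \big(\bar{\beta}^{(g,n)}_a\big)^2}{1 + \lambda^{(g,n-1)}_a}, \tag{24}$$

where $\lambda^{(g,n)}_a$ is computed using

$$\lambda^{(g,n)}_a = \begin{cases} (\alpha_{n-1})^2 & n = 1, \\ (1 - \alpha_{n-1})^2\, \lambda^{(g,n-1)}_a + (\alpha_{n-1})^2 & n > 1. \end{cases}$$

This allows us to compute an estimate of the variance of $\bar{v}^{(g,n)}_a$ using

$$\big(\bar{\sigma}^2_a\big)^{(g,n)} = \mathrm{Var}\big[\bar{v}^{(g,n)}_a\big] = \lambda^{(g,n)}_a \big(s^2_a\big)^{(g,n)}. \tag{25}$$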
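The recursions in Equations (22), (24), and (25) need only a handful of running statistics per attribute and aggregation level. The sketch below is one hedged reading of those formulas, not the authors' code: the class `LevelStats`, the function `wimse_weights`, the small `eps` guard against division by zero, and the example numbers are all assumptions introduced here. The weighting function follows the verbal description of WIMSE given earlier (inverse of variance plus squared aggregation bias, normalized to sum to one).

```python
from dataclasses import dataclass


@dataclass
class LevelStats:
    """Running statistics for one attribute at one aggregation level."""
    v_bar: float = 0.0    # smoothed value estimate
    beta: float = 0.0     # smoothed error, Equation (22)
    beta_sq: float = 0.0  # smoothed squared error (total squared variation)
    lam: float = 0.0      # lambda coefficient used in Equations (24)-(25)
    n: int = 0            # number of observations so far

    def update(self, v_hat, alpha, eta):
        """Fold in one observation and return the variance estimate of v_bar."""
        self.n += 1
        err = v_hat - self.v_bar  # uses the estimate from iteration n-1
        self.beta = (1 - eta) * self.beta + eta * err
        self.beta_sq = (1 - eta) * self.beta_sq + eta * err * err
        lam_prev = self.lam
        if self.n == 1:
            self.lam = alpha ** 2
        else:
            self.lam = (1 - alpha) ** 2 * lam_prev + alpha ** 2
        self.v_bar = (1 - alpha) * self.v_bar + alpha * v_hat
        s_sq = (self.beta_sq - self.beta ** 2) / (1 + lam_prev)  # Eq. (24)
        return self.lam * s_sq                                   # Eq. (25)


def wimse_weights(variances, agg_biases, eps=1e-12):
    """Weights proportional to 1 / (variance + aggregation bias squared)."""
    raw = [1.0 / (var + mu * mu + eps) for var, mu in zip(variances, agg_biases)]
    total = sum(raw)
    return [r / total for r in raw]


stats = LevelStats()
first_var = stats.update(10.0, alpha=1.0, eta=1.0)
second_var = stats.update(14.0, alpha=0.5, eta=0.5)

# Two levels: a noisy but unbiased disaggregate estimate and a smoother,
# biased aggregate one; the combined value follows Equation (21).
weights = wimse_weights([4.0, 1.0], [0.0, 1.0])
combined = sum(w * v for w, v in zip(weights, [90.0, 120.0]))
```

In this toy case the aggregate level receives two-thirds of the weight because its total error (variance plus squared bias) is half that of the disaggregate level.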

The weight to be used at each level of aggregation is given by

$$w^{(g,n)}_a \propto \Big( \big(\bar{\sigma}^2_a\big)^{(g,n)} + \big(\bar{\mu}^{(g,n)}_a\big)^2 \Big)^{-1}, \tag{26}$$

where the weights are normalized so they sum to one. This formula is easy to compute even for very large-scale applications such as this. All the statistics have to be computed for each attribute $a$, for all levels of aggregation, that is actually visited. From this, we can compute an estimate of the value of any attribute regardless of whether we visited it or not.

Figure 3 shows the average weight put on each level of aggregation from one run of the model. As is apparent from the figure, higher weights are put on the more aggregate estimates, with the weight shifting to the more disaggregate estimates as the algorithm progresses. It is very important that the weights be adjusted as the algorithm progresses; using the final set of weights at the beginning produces very poor results.

Figure 3. Average Weight Put on Each Level of Aggregation by Iteration (vertical axis: average weight; horizontal axis: iterations; one curve per aggregation level)

3.4. Stepsizes

Stepsizes are often treated as the "soft science" of approximate dynamic programming, with people using simple formulas such as a constant (0.1 or 0.05 is typical) or a declining stepsize rule such as $a/(a + n)$ for some $a$. A popular rule is McClain's formula, given by Equation (23), which provides $1/n$ behavior initially and quickly converges to the constant $\bar{\eta}$ (we used 0.1, which is typical).

We genuinely struggled with stepsizes for this problem. If the stepsize was too small, the rate of convergence was much too slow. If the stepsize was too large, the performance was unstable and the variance of the estimates $\bar{v}$ was too large (later, we show that we use $\bar{v}$ in our policy studies). As a by-product of this research, we developed a new stepsize formula that significantly improved the performance of the algorithm (faster initial convergence, with better stability in the limit).
The stepsize rule is developed in George and Powell (2006), where it was named the optimal stepsize algorithm (OSA) and is given by

$$\alpha_n = 1 - \frac{\bar{\sigma}^2_n}{(1 + \lambda_{n-1})\, \bar{\sigma}^2_n + (\bar{\beta}_n)^2}, \tag{27}$$

where $\bar{\sigma}^2_n$ is computed using Equation (25) and $\bar{\beta}_n$ is given by Equation (22) (we have dropped the indexing by aggregation level $g$ and attribute $a$ for simplicity). The stepsize rule balances the estimate of the noise $\bar{\sigma}^2_n$ and the estimate of the bias $\bar{\beta}_n$ that is attributable to the transient nature of the data. If the data are found to be relatively stationary (low bias), then we want a smaller stepsize; as the estimate of the noise variance decreases, we want a larger stepsize.

3.5. ADP Using a Double-Pass Algorithm

The steps in Figure 2 describe the simplest implementation of an approximate dynamic programming algorithm that steps forward in time, updating value functions as we proceed. This is also known as a TD(0) algorithm (Bertsekas and Tsitsiklis 1996). Although easy to implement, this algorithm can suffer from slow convergence because $\hat{v}^n_t$ depends on $\bar{v}^{n-1}_t$, which is typically initialized to zero and slowly rises, producing a downward bias in all the value function estimates. This does not necessarily produce poor decisions, but it does mean that $\bar{v}^n_{ta}$ underestimates the value of a driver with attribute $a$ at time $t$.

A strategy for overcoming this slow convergence, which proved to be particularly valuable for this project, involves using a two-pass procedure (also known as TD(1)). In this procedure, we simulate decisions forward in time without updating the value functions. The derivative $\hat{v}^n_t$ is then computed in a backward pass. In the forward pass implementation, $\hat{v}^n_t$ depends on $\bar{v}^{n-1}_t$. With the backward pass, $\hat{v}^n_t$ depends on $\hat{v}^n_{t+1}$.

In classical discrete dynamic programs, implementing a backward pass (or backward traversal, as it is often referred to) is fairly straightforward (see Bertsekas and Tsitsiklis 1996; Sutton and Barto 1998). If we are in state $S^n_t$, we choose an action $x^n_t$ according to some policy, compute a contribution $C_t(S^n_t, x^n_t)$, then observe information $W^n_{t+1}$, which leads us to state $S^n_{t+1}$.
After following a path $(S^n_t, x^n_t, S^n_{t+1}, x^n_{t+1}, \ldots)$, we can compute $\hat{v}^n_t = C_t(S^n_t, x^n_t) + \hat{v}^n_{t+1}$ recursively by stepping backward through time.

This logic is completely intractable for the problem class that we are considering. Instead, we perform a numerical derivative for each driver, which means that after solving the original assignment problem (at time $t$), we loop over all the drivers (that is, all the attributes $a$ where $R_{ta} > 0$) and set $R_{ta} = 0$ and reoptimize. The process is illustrated in Figure 4, where 4(a) shows the initial assignment of four drivers. Because the downstream value from assigning a driver to a load in the future depends on the driver-load combination, we have duplicated each load for each driver, using an ellipse to indicate which driver-load combinations represent the same load. If the driver with attribute $a_{t,1}$ is assigned to the second load, then this creates a driver in the future with attribute $a_{t+1,12}$ and value $\bar{v}(a_{t+1,12})$. In Figure 4(b), we show the solution without driver $a_{t,1}$. Because driver $a_{t,2}$ shifts up to cover load 2, we no longer have a driver in the future with attribute $a_{t+1,12}$ but instead we have a driver with attribute $a_{t+1,22}$.

Figure 4. Illustration of Numerical Derivative
Note. (a) The base solution with four drivers, (b) the solution with driver $a_{t,1}$ dropped out, and (c) the difference in assignment costs and post-decision resource vector are shown.

Figure 4(c) shows the difference, where we are interested in the change in the immediate contribution, and the change in the future availability of drivers. To represent these quantities, let $X_t(R_t)$ be the initial driver-load assignments and let $X_t(R_t - e_a)$ be the perturbed solution, where $e_a$ is a vector of 0s with a 1 in the element corresponding to $R_{ta}$. Now let

$$\tilde{C}_{ta} = C_t\big(S_t, X_t(R_t)\big) - C_t\big(S_t, X_t(R_t - e_a)\big)$$

be the change in costs due to changes in flows over the driver-to-load assignment arcs as a result of the perturbation. Next, let $R^x_t(R_t)$ be the post-decision resource vector given $R_t$ and let

$$\tilde{R}^x_{ta} = R^x_t(R_t) - R^x_t(R_t - e_a)$$

be the change in the post-decision state vector due to the perturbation. Figure 4(c) indicates the change in flows that drive $\tilde{C}_{ta}$, and the vector $\tilde{R}^x_{ta}$, where $\tilde{R}^x_{taa'} = +1$ if including a driver with attribute $a$ produces an additional driver of type $a'$, or $\tilde{R}^x_{taa'} = -1$ if the change takes away a driver of type $a'$.
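The perturbation just described (drop one driver, reoptimize, and record the change in contribution and in the post-decision resource vector) can be sketched on a toy instance. This is an illustration only: the helper names `solve` and `numerical_derivative`, the attribute labels, and the numbers are hypothetical; each driver is assumed to take exactly one load; and enumeration replaces the assignment solver a realistic instance would use.

```python
from collections import Counter
from itertools import permutations


def solve(drivers, c, future_attr, v_bar):
    """Best one-load-per-driver plan, as a tuple of (driver, load) pairs."""
    n_loads = len(c[0])
    best_value, best_plan = float("-inf"), ()
    for loads in permutations(range(n_loads), len(drivers)):
        plan = tuple(zip(drivers, loads))
        value = sum(c[i][j] + v_bar.get(future_attr[i][j], 0.0) for i, j in plan)
        if value > best_value:
            best_value, best_plan = value, plan
    return best_plan


def numerical_derivative(driver, c, future_attr, v_bar):
    """Change in contribution and in post-decision resources when one
    driver is removed and the assignment problem is solved again."""
    everyone = list(range(len(c)))
    base = solve(everyone, c, future_attr, v_bar)
    perturbed = solve([i for i in everyone if i != driver], c, future_attr, v_bar)

    def contribution(plan):
        return sum(c[i][j] for i, j in plan)

    def post_decision(plan):
        return Counter(future_attr[i][j] for i, j in plan)

    delta_c = contribution(base) - contribution(perturbed)
    r_base, r_pert = post_decision(base), post_decision(perturbed)
    delta_r = {
        a: r_base[a] - r_pert[a]
        for a in set(r_base) | set(r_pert)
        if r_base[a] != r_pert[a]
    }
    return delta_c, delta_r


# Hypothetical data: two drivers and two loads, no downstream values yet.
c = [[10, 8], [9, 4]]
future_attr = [["a11", "a12"], ["a21", "a22"]]
delta_c, delta_r = numerical_derivative(1, c, future_attr, v_bar={})
```

Here removing the second driver lets the first driver switch loads, so the change in the post-decision vector contains both +1 and -1 entries, mirroring the shift shown in Figure 4(c).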
In the double-pass algorithm, we compute $\tilde{C}_{ta}$ (which is a scalar) and $\tilde{R}^x_{ta}$ (which is a vector of +1s and -1s) for each attribute $a$ (which we have chosen to represent). After we have completed the forward pass, we obtain $\hat{v}^n_{ta}$ in a backward pass using

$$\hat{v}^n_{ta} = \tilde{C}_{ta} + \sum_{a' \in \mathcal{A}} \tilde{R}^x_{taa'}\, \hat{v}^n_{t+1,a'},$$

where we have made a slight notational simplification by assuming that $a^x_t = a_{t+1}$ (that is, there is no noise in the attribute transition function), which means that $R^x_t = R_{t+1}$.

3.6. Comparisons

This section has produced a number of algorithmic choices: Should we use the forward pass (Figure 2) or backward pass (Section 3.5)? Should we compute $\hat{v}^n$ using numerical derivatives or dual variables? And should we perform smoothing using the OSA stepsize (Equation (27)) or a deterministic formula such as McClain (Equation (23))? Figure 5 compares all of these strategies.

We can draw several conclusions from this figure. First, it is apparent that the value functions computed from the backward pass show much faster convergence in the early iterations than those computed from using a forward pass. This is a well-known property of dynamic programming when we start with initial value function approximations equal to zero. However, the difference between these approaches disappears after 50 iterations. We also have to consider that the backward pass is much harder to implement. The real value of the backward pass is that we appear to be obtaining good value functions after as few as 25 iterations (a result supported by other experiments reported below). For very large-scale applications such as this (where each iteration requires considerable CPU time on a 3 GHz Pentium processor), reducing the number of iterations needed from 50 to 25 is a significant benefit.

Figure 5. Average Value Function When We Use Forward and Backward Passes, Numerical Derivatives and Dual Variables, and the OSA Stepsize or the McClain Stepsize (vertical axis: average value function; horizontal axis: iterations)

The figure also compares value functions computed using the OSA stepsize versus the McClain stepsize. The OSA stepsize produces faster convergence (this is particularly noticeable when using a forward pass) as well as more stable estimates (this is primarily apparent when using gradients computed using a backward pass).

Finally, we also see that there is a significant difference between value functions computed using dual variables versus numerical derivatives. It is easy to verify that the numerical derivative is greater than or equal to the dual variable, but it is not at all obvious that the difference would be as large as that shown in Figure 5. Of course, this comes at a significant price computationally. Run times using numerical derivatives are 30%–40% greater than if we used dual variables. We have found, however, that although numerical derivatives produce much more accurate value functions (important in our study), they do not produce better dispatching decisions. If the interest is in a realistic simulation of the fleet (and not the value functions themselves), then we have found that dual variables work fine. In this paper, we wish to use the value functions to estimate the value of different types of drivers.

4. Model Calibration

Before the model could be used for policy analyses, the company insisted that it closely replicate a number of operating statistics including the average length of haul (the length of a load to which a driver is assigned), the average revenue per truck per day, equipment utilization (miles per day), and the percentage of drivers who were sent home on a weekend. These statistics had to fall between historical minimums and maximums for each of the three capacity types.
Model calibration meant matching the performance of the collective decisions made by the company's dispatchers (see Figure 6). Perhaps one of the surprising (and significant) outcomes of the research is that a properly calibrated optimization model was required to closely match the performance of an experienced group of dispatchers.

Average length of haul is particularly important because drivers are only paid while they are driving and longer loads mean less idle time. For this application, it was important to match the average length of haul for each of the three types of drivers (known as "capacity types"). Of the three capacity types, teams (drivers that work in pairs) prefer the longest loads because they pay the most. The company was not willing to consider the results of a simulation that produced an average length of haul that was significantly different (for each capacity type) from historical performance. This could have an impact on driver turnover, which was not captured in the objective function.

When we look at the historical pattern of loads for a particular driver class, we obtain a distribution such as that shown in Table 2. Thus, whereas this driver class may have an 800-mile average length of haul, this average will include a number of loads that are significantly longer or shorter. Using penalties to discourage assignments to loads that differ from the average would seriously distort the model.

Section 4.1 describes an algorithmic strategy to produce assignments that match historical patterns of behavior. Section 4.2 then describes how well the model matched the historical metrics, where we depend on both the contribution of the value functions as well as the pattern-matching logic described in the next section. Section 4.3 compares the contribution of value functions against the pattern-matching logic.

Simão et al.: An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management
Copyright: INFORMS holds copyright to this Articles in Advance version, which is made available to institutional subscribers.

Figure 6. The Schneider Dispatch Center in Green Bay, Wisconsin

Table 2. Illustrative Length-of-Haul (LOH) Distribution for a Single Driver Type

LOH (miles)        Relative frequency (%)
Under 390          8.3
390–689            33.9
690–1,089          36.6
1,090–1,589        15.6
1,590 and over     5.4

4.1. Pattern Matching

The problem of matching historical averages for length of haul (LOH) by capacity type can be viewed as an example where the results of a model need to match exogenous patterns of behavior. Our presentation follows the work in Marar, Powell, and Kulkarni (2006) and Marar and Powell (2004). In this work, we assume that we are given a pattern vector $\rho^e$, where $\rho^e = (\rho^e_{\bar{a}\bar{d}})_{\bar{a}, \bar{d}}$ and

$\rho^e_{\bar{a}\bar{d}}$ = the exogenous pattern, representing the percentage of time that resources with attribute $\bar{a}$ are acted on by decisions of type $\bar{d}$ based on historical data.

We refer to $\rho^e$ as an exogenous pattern because it describes desired behaviors rather than a specific cost for a decision. In most applications, the indices $\bar{a}$ and $\bar{d}$ for $\rho^e_{\bar{a}\bar{d}}$ are aggregations of the original attribute $a$ and decision $d$. For the purpose of matching the length of haul, $\bar{a}$ consists only of the capacity type and $\bar{d}$ represents a decision to assign a driver of type $\bar{a}$ to a load whose length is within some range.

We next have to determine the degree to which the model is matching the exogenous pattern. Let $\rho_{\bar{a}\bar{d}}(x)$ be the average pattern flow from a solution $X(R)$ of the model corresponding to the attribute-decision pair $(\bar{a}, \bar{d})$. Also, let $R_{\bar{a}}$ be the total number of resources with attribute $\bar{a}$ over the entire horizon. The goal is for the model pattern flow $\rho_{\bar{a}\bar{d}}(x)$ to closely match the desired exogenous pattern flows $\rho^e_{\bar{a}\bar{d}}$. The deviation from the desired frequency is captured in the objective function using a penalty function. The actual term included in the objective function for each pattern is denoted as $H(\rho(x), \rho^e)$, where we used the square of the difference of the two, given by

$$H(\rho(x), \rho^e) = \sum_{\bar{a}} \sum_{\bar{d}} R_{\bar{a}} \big( \rho_{\bar{a}\bar{d}}(x) - \rho^e_{\bar{a}\bar{d}} \big)^2. \tag{28}$$

The aim is to penalize the square of the deviations of the observed frequencies from the desired frequencies. In practice, a quadratic approximation of $H$ is used. The pattern matching term $H$ is multiplied by a weighting parameter $\theta$ and subtracted from the standard net revenue function. The objective function that incorporates the patterns is written as follows:

$$x_t(\theta) = \arg\max_{x_t \in \mathcal{X}_t} \Big[ \sum_{a \in \mathcal{A}} \sum_{d \in \mathcal{D}} c_{tad}\, x_{tad} - \theta\, H(\rho(x), \rho^e) \Big]. \tag{29}$$

$\theta$ permits control over how much emphasis is put on the patterns relative to the remainder of the objective function. Setting $\theta$ to zero turns the patterns off. We use an algorithm proposed by Marar and Powell (2004) (and modified by Powell, Wu, and Whisman 2004) that incorporates this feature.

4.2. Comparison to History

We are finally ready to compare the model to historical measures. We have 4 types of statistics that are measured for each of the 3 capacity types, giving us 12 statistics altogether. The company derived what it considered to be acceptable ranges for each statistic. Figures 7(a) to 7(d) give the length of haul, revenue per driver, utilization (miles per driver per day), and the percentage of drivers who are sent home on a weekend. The last statistic reflects the ability of the model to get drivers home on weekends, which was viewed as being important to the drivers.

All the results of the model closely matched historical averages. The units of the vertical axis have been eliminated due to the confidentiality of the data, but the graphs accurately show the relative error (the bottom of the vertical axis is zero in all the plots). The bands were developed by company management before the model was run. It is easy to see that three of the four sets of max/min bands are quite tight. We also note that although we used specific pattern logic to match the length of haul statistics, the other statistics were a natural output of the model, calibrated through the use of cost-based rules.
At this point, company management felt comfortable concluding that the model was well calibrated and could be used for policy studies. Although the model has many applications, in Section 5 we focus specifically on the ability of the model to evaluate the value of drivers by capacity type and domicile. This study required that the value functions do more than simply produce good driver assignment decisions; the value functions themselves had to accurately estimate the value of each driver type.

Figure 7. Simulation Results Compared Against Historical Extremes for Various Patterns (panels: (a) LOH, (b) revenue per driver, (c) utilization, (d) % drivers home on weekends; each panel shows the historical minimum, the simulation, and the historical maximum for capacity categories Type 1, Type 2, and Type 3)

4.3. Value Function Approximations vs. Patterns

We have introduced two major algorithmic strategies for improving the performance of the model: value function approximations (VFAs), which produce the behavior of optimizing over time, and pattern matching, which is specifically designed to help the model match the length of haul for each driver class. These strategies introduce two questions: How do value function approximations and patterns each contribute to the ability of the model to match historical performance? And how do they individually affect the quality of the solution as measured by the objective function?

Figure 8 shows the average length of haul as a function of the number of iterations for four configurations: with patterns and VFAs, with patterns and without VFAs, without patterns and with VFAs, and without patterns or VFAs. We show the results for two different driver classes because the behavior is somewhat different. In both figures, we show upper and lower bounds specified by management as the limit of what they consider acceptable (the middle of this range is considered the best). Both figures show that we obtain the worst results when we do not use VFAs or patterns, and we obtain the best results with patterns and VFAs. Of interest is the individual contribution of VFAs versus patterns. In Figure 8(a), the use of VFAs alone improves the ability of the model to match history, whereas in Figure 8(b) VFAs actually make the match worse. Even in Figure 8(b), VFAs and patterns together outperform either alone.

Figure 8. Length of Haul for Two Driver Classes With Patterns and VFAs, With Patterns and Without VFAs, Without Patterns and With VFAs, and Without Patterns or VFAs (panels: (a) Driver type 1, (b) Driver type 2; vertical axis: miles; horizontal axis: iterations)
Note. Upper and lower bounds (UB and LB, respectively) represent the acceptable range set by management.

We next examine the effect of VFAs and patterns on the objective function. We define the objective function as the total contribution earned by following the policy determined by using patterns and value functions. The contributions include the revenues from covering loads minus the cost of moving the truck and any penalties for activities such as arriving late to a service appointment or allowing a driver to be away from home for too long (the "soft costs"). Figure 9 shows the objective function for the same four combinations (with and without patterns, with and without value functions). The figure shows that the results using the value functions significantly outperform the results without the value functions. Including the patterns with the value function does not seem to change the objective function (although it obviously improves our ability to match historic performance measures). Interestingly, using patterns without the value functions produces a noticeable improvement over the results without the patterns (or value functions), suggesting that the patterns do, in fact, contribute to the overall objective function. However, the point of the patterns is to achieve goals that are not captured by the objective function, so this benefit appears to be incidental.

Figure 9. Objective Function Without Patterns and VFAs, With Patterns and Without VFAs, Without Patterns and With VFAs, and With Patterns and VFAs (vertical axis: objective function; horizontal axis: iterations)

5. Fleet Mix Studies

All truckload motor carriers are continuously hiring drivers just to maintain their fleet size. It is not unusual for companies to experience over 100% turnover (that is, if the fleet has 1,000 drivers, they have to hire 1,000 drivers a year to maintain the fleet).
Because companies are constantly advertising and processing applications, it is necessary to decide each week how many jobs to offer drivers based on their home domicile and which of the three capacity types they would belong to. We studied the ability of the model to help guide the driver hiring process. We divided the study into two parts. In Section 5.1, we assessed our ability to estimate the marginal value of a driver type (defined by the driver domicile and capacity type) using the value function approximations. Then, we report in Section 5.2 on the results of simulations where we used the value functions to change the mix of drivers (while holding the fleet size constant).

5.1. Driver Valuations
For our project, the 100 domicile regions and 3 capacity types produced 300 driver types. If this were a traditional simulator, we could estimate the value of each of these driver types by starting from a single base run, then incrementing the number of drivers of a particular type and running the simulation again. There is a fair amount of noise inherent in the results of a single simulation run, so it might be reasonable to replicate this 10 times and take an average. With 300 driver types, this implies 3,000 runs of the simulation model.

We can avoid this by simply using the value functions. If we run $N$ iterations of our ADP algorithm, we might expect that the final value functions for time $t = 0$, given by $\bar{v}^N = (\bar{v}^N_a)_{a \in \mathcal{A}}$, could be used to estimate the value of each driver type. The value functions are indexed by three attributes (location, domicile, and capacity type); however, we only need values indexed by domicile and capacity type. The value indexed by domicile and capacity type is nothing more than an aggregation on location and was estimated using the same methods we used to estimate the value functions at different levels of aggregation. For the remainder of this section, we use the attribute $a$ to represent driver domicile and capacity type only. In addition, we will let $\bar{v}_a = \bar{v}^N_a$ be our final estimate of the marginal value of a driver (at time zero) with attribute $a$.

Whereas it is certainly reasonable to expect $\bar{v}_a$ to be the marginal value of a driver of type $a$, we needed to verify that this was, in fact, an accurate estimate. We ran experiments adding 10, 20, 30, 40, and 50 drivers for four different driver classes. These experiments convinced us that the model produced relatively linear behavior when we add up to 20 drivers. We then estimated the value of adding 20 different types of drivers (different domiciles and capacity types) by adding 20 drivers and averaging the marginal value over 10 repetitions of the experiment. In each case, we computed a 95% confidence interval for the slope (based on the estimated mean and standard deviation of both the base case and the results of the 10 iterations with 20 additional drivers). Figure 10 shows the confidence intervals for the slope estimated from adding 20 additional drivers and the point estimate from the value function for the 20 different driver types. For 18 driver types, the value function estimate fell within the confidence interval (with a 95% confidence interval, we would expect 19 of the driver types to fall within the confidence interval).

5.2. Driver Remix Experiments
In this section, we attempt to optimize the number of drivers belonging to each class so that there is an increase in the objective function. The method that we adopt for this purpose is to redistribute the drivers between the various driver types such that there are more drivers of types with higher marginal values as compared with the ones with lower values.
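To make the validation described in Section 5.1 concrete, the following sketch computes a confidence interval for the per-driver slope from replicated base runs and replicated runs with additional drivers, and then checks whether a value function estimate falls inside it. It is only an illustration under stated assumptions: the function name `slope_confidence_interval`, the normal-quantile interval, the unpooled standard error, and every number in the example are hypothetical and are not the procedure or data used in the study.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def slope_confidence_interval(base_runs, perturbed_runs, added_drivers, level=0.95):
    """Confidence interval for the marginal value per added driver.

    base_runs      -- objective values from replicated base simulations
    perturbed_runs -- objective values from replications with extra drivers
    added_drivers  -- number of drivers added in the perturbed runs
    Assumes independent replications and a normal approximation.
    """
    if added_drivers <= 0:
        raise ValueError("added_drivers must be positive")
    # Difference of sample means estimates the total gain from the added drivers.
    diff = mean(perturbed_runs) - mean(base_runs)
    # Standard error of a difference of two independent sample means.
    se = sqrt(variance(base_runs) / len(base_runs)
              + variance(perturbed_runs) / len(perturbed_runs))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Dividing by the number of added drivers converts the gain into a slope.
    return (diff - z * se) / added_drivers, (diff + z * se) / added_drivers


if __name__ == "__main__":
    # Hypothetical objective values, for illustration only.
    base = [100.0, 102.0, 98.0, 101.0, 99.0]
    perturbed = [140.0, 142.0, 138.0, 141.0, 139.0]
    low, high = slope_confidence_interval(base, perturbed, added_drivers=20)
    vfa_estimate = 2.05  # hypothetical value function estimate for this driver type
    print(low, high, low <= vfa_estimate <= high)
```

With only a handful of replications, a Student t quantile would give a somewhat wider interval than the normal quantile used here; the text does not say which was used.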
To find the number of drivers to be added or removed from each class, we apply a stochastic gradient algorithm where we use a correction term to smooth the original number of drivers of each class. The correction term is a function of the difference in the marginal value from the mean marginal value of all the driver classes. We define the following:

$\bar{v}^n_a$ = The marginal value of a driver with attribute $a$ at iteration $n$.
$R^n_a$ = The number of drivers with attribute $a$ at iteration $n$.
$\bar{v}^n$ = $\bar{v}^n_a$ averaged over all attribute vectors $a$.

The algorithm for computing the new number of drivers of class $a$ consists of the following step:

$$R^{n+1}_a = \max\{0,\; R^n_a + \beta(\bar{v}^n_a - \bar{v}^n)\}, \tag{3}$$

where $\beta$ is a scaling factor that we set, after some experimentation, to 0.1. After the update, we then rescale $R^{n+1}$ so that $\sum_a R^{n+1}_a = \sum_a R^n_a$.

Figure 10. Predicted Values Compared Against Observed Values from Actual Scenarios. Axes: marginal value of a driver versus attribute vector (driver type).
Note. The columns represent the approximations of the marginal values for different driver types. The error bars denote a 95% confidence interval around the mean marginal value, computed from observed scenarios.
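The update in Equation (3), followed by the rescaling that holds the fleet size constant, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `remix_drivers`, the dictionary representation, the unweighted mean over driver types, and the example numbers are all assumptions. Driver counts are left fractional because the text does not say how they are rounded.

```python
def remix_drivers(counts, marginal_values, beta=0.1):
    """Apply one remix step and rescale to the original fleet size.

    counts          -- dict mapping driver type to current number of drivers
    marginal_values -- dict mapping driver type to its marginal value estimate
    beta            -- scaling factor applied to the deviation from the mean value
    """
    if counts.keys() != marginal_values.keys():
        raise ValueError("counts and marginal_values must cover the same driver types")
    if not counts:
        raise ValueError("at least one driver type is required")

    # Mean marginal value over all driver types (unweighted, an assumption).
    mean_value = sum(marginal_values.values()) / len(marginal_values)

    # Shift each class toward higher-valued types, never below zero drivers.
    updated = {
        a: max(0.0, counts[a] + beta * (marginal_values[a] - mean_value))
        for a in counts
    }

    # Rescale so the total number of drivers is unchanged.
    total_before = sum(counts.values())
    total_after = sum(updated.values())
    if total_after == 0:
        raise ValueError("update removed every driver; choose a smaller beta")
    scale = total_before / total_after
    return {a: r * scale for a, r in updated.items()}


if __name__ == "__main__":
    # Hypothetical fleet with three driver types.
    fleet = {"A": 100.0, "B": 100.0, "C": 100.0}
    values = {"A": 1500.0, "B": 1000.0, "C": 500.0}
    print(remix_drivers(fleet, values, beta=0.1))
```

Because the clamp at zero can remove less than the correction term asks for, the rescaling step is what guarantees that the fleet size stays constant; without the clamp, the corrections alone would sum to zero.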

In Figure 11, we show the effect of shifting to a new mix of drivers. Figure 11(a) shows the improvement in the objective function when we used value functions to adjust the mix of drivers. We did not adjust the driver mix until iteration 400 so that the value functions had a chance to stabilize. Figure 11(b) shows the percentage of drivers who did not get home within the simulation. This figure shows a significant improvement in our ability to get drivers home when we shift the fleet based on the value function approximations.

Figure 11. Result of Driver Remix Experiments. Panels: (a) objective function (millions) versus number of iterations; (b) percentage of drivers not getting home versus number of iterations. Series: original; remix.
Note. (a) The change in the objective function, and (b) change in the percentage of drivers not getting home are shown.

6. Conclusions
This paper has demonstrated that approximate dynamic programming allows us to produce an accurate simulation of a large-scale fleet that (a) allowed us to capture real-world operations at a very high level of detail, (b) produced operating statistics that closely matched historical performance, and (c) provided accurate estimates of the marginal value of 300 different driver types from a single simulation. The technology of approximate dynamic programming allows us to capture all the relevant features of drivers and loads to produce a very realistic simulation, including decisions that balance immediate contributions against downstream impacts. The logic is able to handle different types of uncertainty including random customer demands and travel times. Value function approximations produced not only more realistic behaviors (measured in terms of our ability to match historical performance) but also the marginal value of different types of drivers from a single run of the model.

This project motivated other important results.
Although the value functions were approximated in terms of only four driver attributes (location, driver type, domicile, and time), this still produced 600,000 parameters to be estimated, creating a significant statistical problem. A new approach for estimating parameters using a weighted average of estimates at different levels of aggregation was developed specifically for this project. This method was shown to produce better, more stable estimates. This project also motivated the development of a new stepsize formula that eliminated the need to constantly tune parameters in deterministic formulas. Finally, we used novel pattern-matching logic to produce behaviors (the average length of a load for different driver types) that matched historical performance.

The simulation has been adopted at Schneider National as a planning tool that, as of this writing, is used continually to perform studies of policies that affect the performance of the network. A partial list of benefits from studies that have been undertaken using the simulation follows:

Getting drivers home. A major component for retaining drivers in a long-haul carrier is the ability to return them home in a predictable way. Schneider had developed a plan to make stronger commitments to drivers, but the simulation showed that the plan would have cost the company $30 million per year. Using the model, an alternative strategy was developed that provided 93% of the proposed self-scheduling flexibility for only $6 million per year.

Quantifying the cost of hours-of-service rules. Using the model, Schneider has been able to quantify the cost of changes in the hours-of-service rules set by the Department of Transportation. With this information, we are able to effectively negotiate adjustments in customer billing rates and freight tendering/handling procedures, leading to margin improvements of 2% to 3%.

Setting appointments. The model has been used to evaluate the value of new policies for setting appointments. Preliminary results suggest margin impacts from improved utilization are in the range of 4% to 10%, and the number of late deliveries was reduced by half.

Cross-border driver management. With recent changes in security and border policies, it is necessary to maintain a pool of drivers who are trained with these policies. Using the model, Schneider was able to reduce the number of drivers engaged in border crossing by 91% and restrict relays to three designated points. This has resulted in an initial avoidance of $3.8 million in training/identification/certification costs and ongoing annual cost avoidance of $2.3 million.

Hiring drivers. The home location of long-haul truck drivers has a significant impact on network operating efficiency. Schneider is continually hiring drivers and can control the number of drivers hired in each region. Using the model, Schneider has been able to quantify the marginal contribution of changes in regional driver populations, leading to an estimated annual profit improvement of $5 million.

The next step with the model is to focus on the loads. The model currently does not model the difference between tendered loads (loads offered to the company that may be refused), committed loads (tendered loads that the carrier has made a commitment to move), and contracted loads (loads that are offered to the carrier under a standing contract). Our goal is to use the model to identify good policies for making commitments to loads as they are tendered, taking into consideration the state of the system. Once this policy is in place, the next goal would be to determine how to evaluate customer contracts as a foundation for determining contractual commitments. We are not aware of any existing technology that can evaluate loads in the presence of driver management issues.

This project has offered an important insight into the process of implementing optimization models for operational problems. The research community has traditionally focused on developing optimization models that produce the best possible solutions, presumably better than what can be achieved by a company.
Our experience with this and other similar projects is that the first and most important goal is to produce a model that calibrates against history. Of particular importance was the ability of the model to handle a high level of detail, allowing the model to accurately represent hours-of-service rules, detailed service commitments, and complex rules governing driver relays and foreign drivers. Only after the model proved to be realistic did the carrier begin to believe the results. Perhaps the most remarkable conclusion was that an optimization model that used optimal solutions at a point in time and near-optimal solutions over time accurately reproduced (at an aggregate level) the performance of a well-run company.