RECURSIVE DYNAMIC PROGRAMMING: HEURISTIC RULES, BOUNDING AND STATE SPACE REDUCTION. Henrik Kure




RECURSIVE DYNAMIC PROGRAMMING: HEURISTIC RULES, BOUNDING AND STATE SPACE REDUCTION

Henrik Kure
Dina, Danish Informatics Network in the Agricultural Sciences
Royal Veterinary and Agricultural University
Bulowsvej 13, DK-1870 Frb., Copenhagen. E-mail: kure@dina.kvl.dk

Abstract: An alternative approach towards dynamic programming (DP) is presented: Recursions. A basic deterministic model and solution function are defined and the model is generalized to include stochastic processes, with the traditional stochastic DP model as a special case. Heuristic rules are included in the models simply as restrictions on decision spaces, and a small slaughter pig marketing planning example illustrates the potential state space reductions induced by such rules.

Keywords: Dynamic programming, recursive algorithms, heuristics, stochastic modelling.

1. INTRODUCTION

Dynamic Programming (DP) is a general optimization principle, formulated by Bellman (1957). The principle is usually related to the field of Operations Research (OR), but the underlying and rather intuitive "Principle of Optimality" (Bellman, 1957) is essential to several specific optimization techniques, including AI algorithms such as A* (Winston, 1992) and the solution methods of Influence Diagrams and Valuation Networks (Shenoy, 1992). Value Iteration (VI) is (at least in OR) the best known and most widely used DP technique.

In this paper an alternative and more general approach towards DP is presented: Recursions. The approach (Recursive Dynamic Programming, RDP) is basically a direct implementation of the fundamental functional equation of DP (as defined by e.g. Nemhauser (1966)):

    f_n(x_n) = min_{d_n ∈ D_n} [ r_n(x_n, d_n) + f_{n-1}(t_n(x_n, d_n)) ]    (1)

where n is the stage number, x_n is the state (n = 0 indicates a goal state), f_n is the solution function at stage n, which returns the solution to the problem associated with x_n (f_0(x_0) is per definition a known value), d_n is a decision, chosen from the set of valid decisions, D_n, r_n is the gain function and t_n is the transition function.

It must be emphasized that this approach neither is nor intends to be novel, but it appears to be relatively unknown to OR and agricultural scientists (Kure, 1995).

The purpose of this paper is to show the simplicity and the potential computational benefits of RDP. The approach has, as will be shown, several theoretical and applicational implications. The methods presented may be applied to virtually any problem that might be solved using VI, including agricultural planning/optimization problems like facility management, replacement problems and farm operations scheduling. The methods (as defined in this paper) have been applied to simple illustrative problems and a slaughter pig marketing decision support system which is under construction.

All models and functions in this paper are specified and defined using the syntax of the specification language VDM (see e.g. Jones (1990) for an introduction), which in many aspects is equivalent to functional languages such as SML.
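The functional equation (1) translates almost directly into code. As a minimal illustrative sketch (not from the paper; Python rather than the paper's VDM/SML, and the toy cost, transition and decision functions are invented assumptions):

```python
# Direct recursive reading of the functional equation (1):
# f_n(x_n) = min over d in D_n of [ r_n(x_n, d) + f_{n-1}(t_n(x_n, d)) ]
# with f_0(x_0) a known value (here 0). All concrete functions are toy
# assumptions, not taken from the paper.

def f(n, x, D, r, t):
    if n == 0:
        return 0  # f_0 is per definition known
    return min(r(n, x, d) + f(n - 1, t(n, x, d), D, r, t)
               for d in D(n, x))

# Toy instance: in every state the valid decisions are {1, 2}; a decision d
# costs d*d and moves the state from x to x + d.
cost = lambda n, x, d: d * d
trans = lambda n, x, d: x + d
decisions = lambda n, x: [1, 2]

print(f(3, 0, decisions, cost, trans))  # 3: choosing d = 1 at all three stages
```

This is exactly the "recursion first, bookkeeping later" reading of (1) that the rest of the paper develops.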

2. RECURSIVE DYNAMIC PROGRAMMING

An example might illustrate the model, the problem, the method and some implications of RDP: Consider a road map represented as a directed graph as shown in fig. 1. Each node (or state) in the graph represents a crossroads and each (directed) edge represents a piece of one-way road. A decision prescribes what edge to follow from a given node. Node A is defined as the start node. Node B has no successors; it is a goal node. Edges might be interpreted as a pair of functions: a) a transformation function, t: State × Decision → State, returning a new node/state, and b) a gain function, g: State × Decision → Gain, returning a gain (i.e. the length of an edge). A policy is (in this presentation) defined as an assignment of a (valid) decision to every node in a subset of nodes in the graph, and a complete policy assigns a decision to every node (except goal nodes) in the graph. The optimization problem to be considered here is: What is the shortest path from A to B?

[Fig. 1. A road map: a directed graph with start node A, intermediate nodes N1-N7 and goal node B.]

2.1 The Data Model

Based on these basic components a deterministic data model/data type of a system is formulated as a finite mapping from states into a finite mapping from decisions into gains and (succeeding) states:

    DPModel = State -m-> (Decision -m-> (Gain × State))
    inv (dp) ≜ ...

where the invariant must restrict the model to be finite and cycle free (see Kure (1995) for a definition). The model will later be modified to be based on functions instead of finite mappings. In addition to the data model of the system a representation of a policy is required:

    Policy = State -m-> Decision

2.2 The Problem

In order to formulate and solve the problem, a set of basic functions is specified (see Jones (1990)):

    complete_pol (dp: DPModel, p: Policy) r: boolean
    post r = (dom p = dom dp ∧ ∀ st ∈ dom p · p(st) ∈ dom dp(st))

    all_pol (dp: DPModel) r: Policy-set
    post r = { p ∈ Policy | complete_pol (dp, p) }

    comb: Gain × Gain → Gain

all_pol returns all complete policies applicable to a given instance, dp, of the model. The function comb is not yet defined (in the example it is defined as a simple addition of the arguments), but is used in the definition (see Jones (1990) for an introduction to direct definitions in VDM) of the objective function:

    objfct: DPModel × Policy × State → Gain
    objfct (dp, p, st) ≜
        let (g, n_st) = dp(st)(p(st)) in
        if n_st ∉ dom p then g
        else comb (g, objfct (dp, p, n_st))

Given an instance of the model (dp: DPModel) and a state (st: State), the problem is to minimize the objective function objfct with respect to all complete policies (p ∈ all_pol (dp)):

    min { objfct (dp, p, st) | p ∈ all_pol (dp) }

In other words the goal is to find a solution function, solution, that satisfies the specification:

    solution (dp: DPModel, st: State) r: Gain
    pre st ∈ dom dp
    post r = min { objfct (dp, p, st) | p ∈ all_pol (dp) }

The state space is defined as the total set of states (except goal states) in the model, and a decision space as the total set of valid decisions in a given state:

    state_space (dp) ≜ dom dp
    decision_space (dp, st) ≜ dom dp(st)

2.3 A solution function

By introducing two new functions a solution function may be derived (see Kure (1995), where a proof based on a mathematical interpretation of Bellman's Principle of Optimality is given):

    min_two: Y × Y → Y
    min_two (a, b) ≜ if a < b then a else b
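To make the map-based model concrete, a small Python sketch might help (an assumption-laden stand-in for the VDM, not the paper's code: dicts play the role of finite mappings, comb is addition, and the edge lengths are invented, since fig. 1 gives no distances):

```python
# DPModel as a dict: state -> {decision: (gain, next_state)}.
# A state absent from the dict (here "B") is a goal state, mirroring the
# test "n_st not in dom dp". Graph shape and edge lengths are invented.

dp = {
    "A":  {"left": (2, "N1"), "right": (4, "N2")},
    "N1": {"left": (1, "N3"), "right": (2, "N4")},
    "N2": {"left": (3, "N4")},
    "N3": {"left": (2, "B")},
    "N4": {"left": (5, "B")},
}

def solution(dp, st):
    """Minimum total gain (path length) from st to a goal state."""
    best = None
    for g, n_st in dp[st].values():
        total = g if n_st not in dp else g + solution(dp, n_st)
        best = total if best is None else min(best, total)
    return best

print(solution(dp, "A"))  # 5, via A -> N1 -> N3 -> B
```

Note that the recursion never enumerates policies; minimizing locally over each decision space is what the derivation of section 2.3 justifies.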

    minimum: (X → Y) × X-set → Y
    minimum (g, as) ≜
        let l_min (m, l_as) =
                if l_as = {} then m
                else let a ∈ l_as in
                     l_min (min_two (m, g(a)), l_as \ {a})
        and fst_a ∈ as
        in l_min (g(fst_a), as \ {fst_a})
    pre as ≠ {}

    solution: DPModel × State → Gain
    solution (dp, st) ≜
        let f (d) =
                let (g, n_st) = dp(st)(d) in
                if n_st ∉ dom dp then g
                else comb (g, solution (dp, n_st))
        in minimum (f, dom dp(st))
    pre st ∈ dom dp

The function minimum minimizes the returns of the function g over as, a finite set of arguments to g. In solution, minimum is utilized to minimize f over the finite set of valid decisions, dom dp(st), where f returns the optimal gain of executing a decision, d. The problem of finding the shortest path from A to B in fig. 1 is solved recursively, starting in the start state (A).

2.4 Memory functions

As shown in the example, RDP consists of a recursive decomposition of a problem into, and solving of, subproblems, but the solution function, solution, does not prevent the same problem from being solved more than once (i.e. in the example different paths may lead to the same node). This (possibly inefficient) resolving of the same problem may be eliminated by keeping a record of known solutions: If a solution is recorded, it is already known and may be used directly. Otherwise the problem must be solved and the result saved in the record for later use. This use of record keeping, or use of a memory function (Brassard and Bratley, 1988), is an (or the) essential part of DP and may result in great savings in computing time.

The record of solutions might be represented as a map:

    Solutions = State -m-> Gain

If knowledge of the policy leading to the solution is required, this optimal policy might be represented as an extension to the record of solutions:

    SolutionsOptPolicy = State -m-> (Gain × Decision)

All solution functions in this paper might be extended with memory functions (see Kure (1995) for direct definitions).

2.5 Function based model

The basic data model may be modified to be based on functions instead of (finite) maps:

    NotInDomain = State → boolean
    DecisionSpace = State → Decision-set
    Consq = State × Decision → Gain × State

    DPModelFunc = NotInDomain × DecisionSpace × Consq
    inv (dp) ≜ ...

where NotInDomain restricts the domain of the model (or the problem space), DecisionSpace returns the decision space of a given state and Consq returns the consequences (i.e. gain and succeeding state) of executing a decision. Based on this data model the basic solution function may be changed into:

    solution_func: DPModelFunc × State → Gain
    solution_func (dp, st) ≜
        let (not_in_dom, decs, consq) = dp
        and f (d) =
                let (g, n_st) = consq (st, d) in
                if not_in_dom (n_st) then g
                else comb (g, solution_func (dp, n_st))
        in minimum (f, decs (st))

The flexibility of this model and solution function will be exemplified in section 4. Selected parts of an SML implementation (see Paulson (1991) for an introduction) are shown in the appendix. The function not_in_dom (of type NotInDomain) is in this implementation a global function and not a part of the data model.

3. GENERALIZED RDP

The basic deterministic model, DPModel, is easily generalized to include stochastic processes. A general summation function used in the definitions is defined as:

    sum: (X → Y) × X-set → Y
    sum (f, as) ≜
        let l_sum (s, l_as) =
                if l_as = {} then s
                else let a ∈ l_as in
                     l_sum (s + f(a), l_as \ {a})
        and fst_a ∈ as
        in l_sum (f(fst_a), as \ {fst_a})
    pre as ≠ {}
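The memory-function idea of section 2.4 can be sketched in a few lines of Python (an illustrative assumption, as before: dict-of-dicts model, comb as addition):

```python
# Sketch of a memory function: the same recursion as the basic solution,
# but with a record of known solutions so each state is solved at most
# once. The model representation (dict of dicts, comb = '+') is assumed.

def solution_memo(dp, st, record=None):
    if record is None:
        record = {}                     # Solutions = State -m-> Gain
    if st in record:                    # already solved: reuse directly
        return record[st]
    best = min(g if n_st not in dp else g + solution_memo(dp, n_st, record)
               for g, n_st in dp[st].values())
    record[st] = best                   # save in the record for later use
    return best

# Two paths from A reach the same node C, but C is solved only once.
dp = {"A": {"up": (1, "C"), "down": (2, "C")}, "C": {"on": (3, "B")}}
print(solution_memo(dp, "A"))  # 4 = 1 + 3
```

Passing the record explicitly (rather than a global) keeps the memory function "distinct and excludable", matching the paper's point that it can be dropped when it does not pay off.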

3.1 The model

Let random events (or states) be represented by the data type ChanceState. The general model might then be defined as:

    Prob = { x ∈ ℝ | 0 < x ≤ 1 }
    GenState = State | ChanceState

    DecConsq = Decision -m-> (Gain × GenState)
    ChanceConsq = (Prob × Gain × GenState)-set
    inv (c) ≜ sum (fst, c) = 1

    GenConsq = GenState → (DecConsq | ChanceConsq)
    inv (gc) ≜ ...

    GDPModel = NotInDomain × DecisionSpace × GenConsq
    inv (dp) ≜ ...

The graph in fig. 2 is an example of an instance of this model and it illustrates an important feature of the model: Decision nodes and chance nodes might appear in any order.

[Fig. 2. An instance of a General DP model, with decision nodes and chance nodes. Nodes represent values (and not variables as in Bayesian Networks).]

A stochastic DP model (see e.g. Nemhauser, 1966) is seen to be a special case of GDPModel, in which a State always (unless it is a goal state) is succeeded by ChanceStates and vice versa. In this special case the representation of chance states is superfluous and the model might be reformulated as:

    StochConsq = State × Decision → (Prob × Gain × State)-set

    SDPModel = NotInDomain × DecisionSpace × StochConsq
    inv (sdp) ≜ ...

3.2 The solution function

A solution function for the general model is defined as:

    solution_gen: GDPModel × GenState → Gain
    solution_gen (gdp, st) ≜
        let (not_in_dom, decs, consq) = gdp in
        if st ∈ State then
            let f (d) =
                    let (g, n_st) = consq(st)(d) in
                    if not_in_dom (n_st) then g
                    else comb (g, solution_gen (gdp, n_st))
            in minimum (f, decs (st))
        else
            let f ((prob, g, n_st)) =
                    if not_in_dom (n_st) then prob * g
                    else prob * comb (g, solution_gen (gdp, n_st))
            in sum (f, consq(st))

which is seen to be an extended version of solution_func: if the state st is a ChanceState the weighted gains are summed up; if not, the gains are (exactly as in solution_func) minimized.

4. HEURISTIC RULES AND BOUNDING

Dynamic programming is in itself a very powerful technique, but in some situations heuristic reasoning might (or even has to) be applied in order to reduce state spaces (and to make incomputable problems computable). The solution functions presented in sections 2 and 3 might be extended to include heuristic reasoning.

4.1 Heuristic rules

Without significantly affecting the optimal solution, a heuristic rule for each state in the state space reduces the decision space and consequently the state space itself. These rules (and restrictions on decision spaces in general) are simply included in the models as modifications of DecisionSpace.

A road map (continued). The example might illustrate how restrictions on decision spaces might reduce the state space. The rule/restriction applied to the example is as follows: "Always follow the road at the left if you have the option." Fig. 3 illustrates this situation; roads "at the right" have been erased from the original graph (fig. 1) and only one path from A to B remains. The state space has been reduced to 4 states (A, N1, N3 and N5) and the recursive solution function is "reduced" to a simple summation function; min_two is never called.
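A heuristic rule as a modification of DecisionSpace can be sketched as follows (an invented Python illustration, not the paper's VDM; the graph is a smaller stand-in for fig. 1, so here the rule leaves 3 represented states rather than the paper's 4):

```python
# Sketch of section 4.1: a heuristic rule included simply as a restriction
# of the decision space. The invented rule "always follow the road at the
# left if you have the option" filters each decision set before the
# recursion, shrinking the reachable state space. Model layout assumed as
# in the earlier sketches: state -> {decision: (gain, next_state)}.

dp = {
    "A":  {"left": (2, "N1"), "right": (4, "N2")},
    "N1": {"left": (1, "N3"), "right": (2, "N4")},
    "N2": {"left": (3, "N4")},
    "N3": {"left": (2, "B")},
    "N4": {"left": (5, "B")},
}

def restrict(decisions):
    """Heuristic rule: keep only 'left' whenever 'left' is an option."""
    return {"left": decisions["left"]} if "left" in decisions else decisions

def solution_restricted(dp, st, visited=None):
    if visited is None:
        visited = set()           # states actually represented and solved
    visited.add(st)
    return min(g if n_st not in dp
               else g + solution_restricted(dp, n_st, visited)
               for g, n_st in restrict(dp[st]).values())

visited = set()
print(solution_restricted(dp, "A", visited))  # 5: the single left-only path
print(sorted(visited))                        # ['A', 'N1', 'N3']
```

Only the states on the single remaining path are ever visited; the minimization degenerates to a summation, just as in the paper's example.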

[Fig. 3. The graph of fig. 1 partitioned into stages (4, 3, 2, 1, 0) and applied the rule: "always follow the road at the left if you have the option."]

Slaughter pig marketing. Another example might illustrate the potential magnitude of the state space reductions: A farmer has a barn which is partitioned into 3 sections, each section containing from 0 to 5 pigs (of the same age). Sections have to be emptied after no more than 15 weeks (this is actually a decision space restriction) and sections are assumed equal. Pigs might be marketed at any time. The problem is: when is it time to market individual pigs and when is it time to empty a section and insert a new group of 5 piglets? The size of the state space of a section is 15*6 = 90 and the size of the state space of the barn per stage is: 90*89*88 + 90*1*89 + 90*1*1 = 712,980. A set of restrictions/heuristic rules might describe the fact that many of these states (like e.g. "no pigs in any section") under normal conditions are very rare:

- Max 1 insertion per week (labor restrictions).
- No deliveries before week 13 (from experience/growth models).
- One pig is delivered per section per week from the age of 13 weeks (from experience).

Given these restrictions the total number of states per stage is 1,690, but after optimization (40 weeks) the actual maximum number of states per stage is 462.

Value Iteration. In VI, or backward multistage problem solving (Nemhauser, 1966), the problem space is partitioned into stages. Each stage contains problems that might be solved independently and in any order. The stages are solved in an order which guarantees that all sub-problems (i.e. problems associated with succeeding states) already have been solved and recorded (problems associated with goal states are initially known). Fig. 3 illustrates one valid partitioning of the graph of fig. 1; problems in stage 1 are solved first, followed by problems in stage 2, etc.

Fig. 3 also illustrates one of the main disadvantages of VI: Redundant problem solving. Problems (like those associated with nodes N2, N4, N6 and N7) which are not on a path leading from the start node will be solved, but they will never be reused. In contrast to RDP (where only problems on paths leading from the start node are represented and solved) there is in general no easy way (except processing the reduced state space prior to the optimization) to avoid this redundant problem solving in the VI algorithm.

The memory usage due to the recursive nature of the algorithm is the main disadvantage of RDP compared to VI. However, the memory complexity is linear in the depth of the problem, or the number of stages, and will therefore rarely cause any significant problems.

4.2 Bounding

Heuristic functions (which return an under (over) estimate of the solution to a minimization (maximization) problem, see e.g. Rich and Knight, 1991) might, without affecting the solution to the problem, reduce the problem space. The problem space is reduced by bounding subgraphs which with certainty will not lead to a better possible global solution than the current best known possible solution. Kure (1995) defines a solution function for the basic model (RDPModel) which utilizes heuristic functions. The function is a special case of an algorithm known in AI as A*: The search strategy is a depth first search instead of an "intelligent" mixture of breadth first and depth first search as in A*. A similar solution function might be defined for the general DP model, and based on that function an A* algorithm for stochastic problems might be derived.

5. CONCLUSION

Several aspects of DP have been examined and discussed in this paper. Some of these need special consideration. The basic solution function, solution, described in section 2 is seen to simply be a reformulation (with comb substituted by the infix operator '+' and the stage number interpreted as a state variable) of (1).
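The bounding idea of section 4.2 can be sketched as a depth-first search with an admissible heuristic (a Python illustration under invented assumptions; graph, heuristic values and the bound bookkeeping are not from the paper):

```python
# Sketch of bounding: depth-first search that prunes any subgraph whose
# lower bound (gain accumulated so far + heuristic underestimate of the
# remaining gain) cannot beat the best known complete solution.
# Graph and heuristic values are invented assumptions.
import math

dp = {"A":  {"u": (2, "N1"), "d": (9, "N2")},
      "N1": {"u": (2, "B")},
      "N2": {"u": (1, "B")}}
h = {"A": 3, "N1": 2, "N2": 5, "B": 0}   # underestimates of remaining gain

def solve_bounded(dp, h, st, acc=0, best=math.inf):
    if acc + h[st] >= best:   # bound: cannot improve on best known solution
        return best
    if st not in dp:          # goal state: a complete solution found
        return min(best, acc)
    for g, n_st in dp[st].values():
        best = solve_bounded(dp, h, n_st, acc + g, best)
    return best

print(solve_bounded(dp, h, "A"))  # 4: A -u-> N1 -u-> B; N2 branch is pruned
```

After the first complete path (cost 4) is found, the N2 branch is cut immediately: 9 + h["N2"] = 14 ≥ 4, so its subgraph is never expanded, which is exactly the pruning the section describes.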
Despite this very close and obvious relationship, the recursive approach towards DP presented in this paper might seem new to most people dealing with DP, and it appears difficult to find the approach described in the literature (Kure, 1995).

The stepwise development of solution functions presented in this paper has shown that the use of memory functions is a distinct and excludable part of DP. In certain situations it might even appear more (time- and/or space-) efficient to exclude a memory function from the solution function than to include it.

The recursive approach implies easy assessment of heuristic rules (and of decision space restricting rules in general). This feature might appear very beneficial in practical applications: Any appropriate rule is easily applied to the model, and only states which according to the rules are on a path from the start state will be represented and solved. This easy application of heuristic rules is, from an applicational point of view, the most interesting feature. From a theoretical point of view the flexibility and generality of the models and solution functions is of greater interest, and as indicated the approach might lead to a stochastic A* algorithm and/or algorithms for solving problems that traditionally are represented as Bayesian Networks.

REFERENCES

Bellman, R. (1957): Dynamic Programming. Princeton University Press, Princeton, New Jersey.
Brassard, G. and P. Bratley (1988): Algorithmics. Theory and Practice. Prentice-Hall International, Inc.
Jones, C. B. (1990): Systematic Software Development using VDM. Prentice Hall International (UK) Ltd.
Kure, H. (1995): Recursive Dynamic Programming. Specified and defined using VDM. Dina Notat No. 29. Dina, The Royal Veterinary and Agricultural University, Copenhagen.
Nemhauser, G. L. (1966): Dynamic Programming. John Wiley and Sons, Inc., NY.
Paulson, L. C. (1991): ML for the Working Programmer. Cambridge University Press, Cambridge.
Shenoy, P. P. (1992): Valuation-based systems for Bayesian decision analysis. Operations Research 40, p. 463-484.
Winston, P. H. (1992): Artificial Intelligence. Addison-Wesley Publishing Company, NY.

APPENDIX: AN SML IMPLEMENTATION

Fragments of program:

    type State = Stage * StateVar
    type DPModel = State -> Decision list * (Decision -> Gain * State)

    fun minimum f [] = raise Error
      | minimum f (fst_d :: d_set) =
          let
            fun l_min m [] = m
              | l_min m (d :: l_d_set) =
                  l_min (min_of_two m (f d)) l_d_set
          in
            l_min (f fst_d) d_set
          end

    fun solution dpm st =
      let
        fun f d =
          let
            val (g, n_st) = snd (dpm st) d
          in
            if not_in_dom dpm n_st then g
            else comb (g, solution dpm n_st)
          end
      in
        minimum f (fst (dpm st))
      end

Corresponding fragments of output:

    type State = Stage * StateVar
    type DPModel = State -> Decision list * (Decision -> Gain * State)
    val minimum = fn : ('a -> Gain) -> 'a list -> Gain
    val solution = fn : ((int * 'a) -> 'b list * ('b -> real * (int * 'a)))
                        -> (int * 'a) -> real