Efficient Reinforcement Learning in Factored MDPs
Michael Kearns, AT&T Labs
Daphne Koller, Stanford University

Abstract

We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E³ algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning and the graphical structure (but not the parameters) of the DBN. Unlike the original E³ algorithm, our new algorithm exploits the DBN structure to achieve a running time that scales polynomially in the number of parameters of the DBN, which may be exponentially smaller than the number of global states.

1 Introduction

Kearns and Singh (1998) recently presented a new algorithm for reinforcement learning in Markov decision processes (MDPs). Their E³ algorithm (for Explicit Explore or Exploit) achieves near-optimal performance in a running time and a number of actions that are polynomial in the number of states and a parameter $T$, which is the horizon time in the case of discounted return, and the mixing time of the optimal policy in the case of infinite-horizon average return. The E³ algorithm makes no assumptions on the structure of the unknown MDP, and the resulting polynomial dependence on the number of states makes E³ impractical for very large MDPs. In particular, it cannot be easily applied to MDPs in which the transition probabilities are represented in the factored form of a dynamic Bayesian network (DBN). MDPs with very large state spaces, and such DBN-MDPs in particular, are becoming increasingly important as reinforcement learning methods are applied to problems of growing difficulty [Boutilier et al., 1999]. In this paper, we extend the E³ algorithm to the case of DBN-MDPs.

The original E³ algorithm relies on the ability to find optimal strategies in a given MDP, that is, to perform planning.
This ability is readily provided by algorithms such as value iteration in the case of small state spaces. While the general planning problem is intractable in large MDPs, significant progress has been made recently on approximate solution algorithms both for DBN-MDPs in particular [Boutilier et al., 1999] and for large state spaces in general [Kearns et al., 1999; Koller and Parr, 1999]. Our new DBN-E³ algorithm therefore assumes the existence of a procedure for finding approximately optimal policies in any given DBN-MDP. Our algorithm also assumes that the qualitative structure of the transition model, i.e., the underlying graphical structure of the DBN, is known. This assumption is often reasonable, as the qualitative properties of a domain are often understood. Using the planning procedure as a subroutine, DBN-E³ explores the state space, learning the parameters it considers relevant. It achieves near-optimal performance in a running time and a number of actions that are polynomial in $T$ and the number of parameters in the DBN-MDP, which in general is exponentially smaller than the number of global states. We further examine conditions under which the mixing time $T$ of a policy in a DBN-MDP is polynomial in the number of parameters of the DBN-MDP. The anytime nature of DBN-E³ allows it to compete with such policies in a total running time that is bounded by a polynomial in the number of parameters.

2 Preliminaries

We begin by introducing some of the basic concepts of MDPs and factored MDPs. A Markov Decision Process (MDP) is defined as a tuple $(S, A, R, P)$ where: $S$ is a set of states; $A$ is a set of actions; $R$ is a reward function $R : S \to [0, R_{\max}]$, such that $R(s)$ represents the reward obtained by the agent in state $s$;¹ $P$ is a transition model $P : S \times A \to \Delta(S)$ that maps each state-action pair to a distribution over $S$, such that $P(s' \mid s, a)$ represents the probability of landing in state $s'$ if the agent takes action $a$ in state $s$.

Most simply, MDPs are described explicitly, by writing down a set of transition matrices and reward vectors, one for each action $a$. However, this approach is impractical for describing complex processes.
Here, the set of states is typically described via a set of random variables $\mathbf{X} = \{X_1, \dots, X_n\}$, where each $X_i$ takes on values in some finite domain $\mathrm{Val}(X_i)$. In general, for a set of variables $\mathbf{Y} \subseteq \mathbf{X}$, an instantiation $\mathbf{y}$ assigns a value $x \in \mathrm{Val}(X)$ to every $X \in \mathbf{Y}$; we use $\mathrm{Val}(\mathbf{Y})$ to denote the set of possible instantiations to $\mathbf{Y}$. A state in this MDP is an assignment $\mathbf{x} \in \mathrm{Val}(\mathbf{X})$; the total number of states is therefore exponentially large in the number of variables. Thus, it is impractical to represent the transition model explicitly using transition matrices.

¹ A reward function is sometimes associated with (state, action) pairs rather than with states. Our assumption that the reward depends only on the state is made purely to simplify the presentation; it has no effect on our results.

The framework of dynamic Bayesian networks (DBNs) allows us to describe a certain important class of such MDPs in a compact way. Processes whose state is described via a set of variables typically exhibit a weak form of decoupling: not all of the variables at time $t$ directly influence the transition of a variable $X_i$ from time $t$ to time $t+1$. For example, in a simple robotics domain, the location of the robot at time $t+1$ may depend on its position, velocity, and orientation at time $t$, but not on what it is carrying, or on the amount of paper in the printer. DBNs are designed to represent such processes compactly.

Let $a \in A$ be an action. We first want to specify the transition model $P(\mathbf{x}' \mid \mathbf{x}, a)$. Let $X_i$ denote the variable $X_i$ at the current time and $X'_i$ denote the variable at the next time step. The transition model for action $a$ will consist of two parts: an underlying transition graph associated with $a$, and parameters associated with that graph. The transition graph is a 2-layer directed acyclic graph whose nodes are $\{X_1, \dots, X_n, X'_1, \dots, X'_n\}$. All edges in this graph are directed from nodes in $\{X_1, \dots, X_n\}$ to nodes in $\{X'_1, \dots, X'_n\}$; note that we are assuming that there are no edges between variables within a time slice. We denote the parents of $X'_i$ in the graph by $\mathrm{Pa}_a(X'_i)$. Intuitively, the transition graph for $a$ specifies the qualitative nature of probabilistic dependencies in a single time step: namely, the new setting of $X_i$ depends only on the current setting of the variables in $\mathrm{Pa}_a(X'_i)$. To make this dependence quantitative, each node $X'_i$ is associated with a conditional probability table (CPT) $P_a(X'_i \mid \mathrm{Pa}_a(X'_i))$. The transition probability $P(\mathbf{x}' \mid \mathbf{x}, a)$ is then defined to be

$$P(\mathbf{x}' \mid \mathbf{x}, a) = \prod_{i=1}^{n} P_a(x'_i \mid \mathbf{u}_i),$$

where $\mathbf{u}_i$ is the setting in $\mathbf{x}$ of the variables in $\mathrm{Pa}_a(X'_i)$. We also need to provide a compact representation of the reward function.
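Before turning to the reward, the factored transition model can be illustrated with a small sketch. It assumes a hypothetical dictionary layout that is not from the paper: for one fixed action $a$, `parents[i]` lists the indices of the variables in $\mathrm{Pa}_a(X'_i)$, and `cpts[i]` maps each parent setting $\mathbf{u}_i$ to a distribution over the next value of $X_i$ (values missing from a row have probability zero).

```python
import math
from itertools import product

def transition_prob(x, x_next, parents, cpts):
    """P(x' | x, a) as the product of CPT factors P_a(x'_i | u_i), for one action a.

    parents[i]: tuple of indices of the variables in Pa_a(X'_i)   (hypothetical layout)
    cpts[i]:    dict mapping a parent setting u_i to {value of X'_i: probability}
    """
    prob = 1.0
    for i, cpt in enumerate(cpts):
        u_i = tuple(x[j] for j in parents[i])   # setting of Pa_a(X'_i) in x
        prob *= cpt[u_i].get(x_next[i], 0.0)    # factor P_a(x'_i | u_i)
    return prob

# Toy two-variable DBN: X'_1 depends on X_1; X'_2 depends on X_1 and X_2.
parents = [(0,), (0, 1)]
cpts = [
    {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
    {(0, 0): {0: 1.0}, (0, 1): {0: 0.5, 1: 0.5},
     (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {1: 1.0}},
]

x = (1, 0)
# The product of CPT rows is still a proper distribution over all 2^n next states.
total = sum(transition_prob(x, x_next, parents, cpts)
            for x_next in product((0, 1), repeat=2))
print(math.isclose(total, 1.0))  # True
```

Only the CPT rows are stored, so the size of this representation grows with the number of parameters rather than with the number of joint states.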
As in the transition model, explicitly specifying a reward for each of the exponentially many states is impractical. Again, we use the idea of factoring the representation of the reward function into a set of localized reward functions, each of which depends only on a small set of variables. In our robot example, our reward might be composed of several subrewards: for example, one associated with location (for getting too close to a wall), one associated with the printer status (for letting paper run out), and so on. More precisely, let $\mathcal{R}$ be a set of functions $R_1, \dots, R_k$; each function $R_i$ is associated with a cluster of variables $C_i \subseteq \{X_1, \dots, X_n\}$, such that $R_i$ is a function from $\mathrm{Val}(C_i)$ to $\mathbb{R}$. Abusing notation, we will use $R_i(\mathbf{x})$ to denote the value that $R_i$ takes for the part of the state vector corresponding to $C_i$. The reward function associated with the DBN-MDP at a state $\mathbf{x}$ is then defined to be $R(\mathbf{x}) = \sum_{i=1}^{k} R_i(\mathbf{x}) \in [0, R_{\max}]$.

The following definitions for finite-length paths in MDPs will be of repeated technical use in the analysis. Let $M$ be a Markov decision process, and let $\pi$ be a policy in $M$. A $T$-path in $M$ is a sequence $p$ of $T+1$ states (that is, $T$ transitions) of $M$: $p = \mathbf{x}_1, \dots, \mathbf{x}_T, \mathbf{x}_{T+1}$. The probability that $p$ is traversed in $M$ upon starting in state $\mathbf{x}_1$ and executing policy $\pi$ is denoted

$$P^{\pi}_M[p] = \prod_{k=1}^{T} P(\mathbf{x}_{k+1} \mid \mathbf{x}_k, \pi(\mathbf{x}_k)).$$

There are three standard notions of the expected return enjoyed by a policy in an MDP: the asymptotic discounted return, the asymptotic average return, and the finite-time average return. Like the original E³ algorithm, our new generalization will apply to all three cases, and to convey the main ideas it suffices for the most part to concentrate on the finite-time average return. This is because our finite-time average return result can be applied to the asymptotic returns through either the horizon time $1/(1-\gamma)$ for the discounted case (where $\gamma$ is the discount factor), or the mixing time of the optimal policy in the average case. (We examine the properties of mixing times in a DBN-MDP in Section 5.)

Let $M$ be a Markov decision process, let $\pi$ be a policy in $M$, and let $p$ be a $T$-path in $M$.
The average return along $p$ in $M$ is

$$U_M(p) = \frac{1}{T}\big(R(\mathbf{x}_1) + \cdots + R(\mathbf{x}_{T+1})\big).$$

The $T$-step (expected) average return from state $\mathbf{x}$ is $U^{\pi}_M(\mathbf{x}, T) = \sum_p P^{\pi}_M[p]\, U_M(p)$, where the sum is over all $T$-paths $p$ in $M$ that start at $\mathbf{x}$. Furthermore, we define the optimal $T$-step average return from $\mathbf{x}$ in $M$ by $U^*_M(\mathbf{x}, T) = \max_{\pi} \{U^{\pi}_M(\mathbf{x}, T)\}$.

An important problem in MDPs is planning: finding the policy that achieves optimal return in a given MDP. In our case, we are interested in achieving the optimal $T$-step average return. The complexity of all exact MDP planning algorithms depends polynomially on the number of states; this property renders all of these algorithms impractical for DBN-MDPs, where the number of states grows exponentially in the size of the representation. However, there has been recent progress on algorithms for approximately solving MDPs with large state spaces [Kearns et al., 1999], particularly for MDPs represented in a factored way [Boutilier et al., 1999; Koller and Parr, 1999]. The focus of our work is on the reinforcement learning task, so we simply assume that we have access to a black box that performs approximate planning for a DBN-MDP.

Definition 2.1: An $\alpha$-approximation $T$-step planning algorithm for a DBN-MDP is one that, given a DBN-MDP $M$, produces a (compactly represented) policy $\pi$ such that $U^{\pi}_M(\mathbf{x}, T) \ge (1-\alpha)\, U^*_M(\mathbf{x}, T)$.

We will charge our learning algorithm a single step of computation for each call to the assumed approximate planning algorithm. One way of thinking about our result is as a reduction of the problem of efficient learning in DBN-MDPs to the problem of efficient planning in DBN-MDPs.

Our goal is to perform model-based reinforcement learning. Thus, we wish to learn an approximate model from experience, and then exploit it (or explore it) by planning given the approximate model. In this paper, we focus on the problem of learning the model parameters (the CPTs), assuming that the model structure (the transition graphs) is given to us. It is therefore useful to consider the set of parameters that we wish to estimate.
As we assumed that the rewards are deterministic, we can focus on the probabilistic parameters. (Our results easily extend to the case of stochastic rewards.) We define a transition component of the DBN-MDP to be a distribution $P_a(X'_i \mid \mathbf{u})$ for some action $a$ and some particular instantiation $\mathbf{u}$ to the parents $\mathrm{Pa}_a(X'_i)$ in the transition model. Note that the number of transition components is at most $\sum_{a,i} |\mathrm{Val}(\mathrm{Pa}_a(X'_i))|$, but may be much lower when a variable's behavior is identical for several actions.

3 Overview of the Original E³

Since our algorithm for learning in DBN-MDPs will be a direct generalization of the E³ algorithm of Kearns and Singh (hereafter abbreviated KS), we begin with an overview of that algorithm and its analysis. It is important to bear in mind that the original algorithm is designed only for the case where the total number of states $N$ is small, and the algorithm runs in time polynomial in $N$.

E³ is what is commonly referred to as an indirect or model-based algorithm: rather than maintaining only a current policy or value function, the algorithm maintains a model for the transition probabilities and the rewards for some subset of the states of the unknown MDP $M$. Although the algorithm maintains a partial model of $M$, it may choose never to build a complete model of $M$, if doing so is not necessary to achieve high return.

The algorithm starts off by doing balanced wandering: upon arriving in a state, the algorithm takes the action it has tried the fewest times from that state (breaking ties randomly). At each state it visits, the algorithm maintains the obvious statistics: the reward received at that state, and for each action, the empirical distribution of next states reached (that is, the estimated transition probabilities).

A crucial notion is that of a known state: a state that the algorithm has visited so many times that the transition probabilities for that state are very close to their true values in $M$. This definition is carefully balanced so that "so many times" is still polynomially bounded, yet "very close" suffices to meet the simulation requirements below. An important observation is that we cannot do balanced wandering indefinitely before at least one state becomes known: by the Pigeonhole Principle, we will soon start to accumulate accurate statistics at some state.
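The balanced-wandering bookkeeping of the original E³ can be sketched for a small MDP with explicitly enumerated states. The class name, the `known_threshold` parameter, and the rule "a state is known once every action has been tried there at least `known_threshold` times" are illustrative assumptions; KS derive the actual visit threshold from the accuracy that the Simulation Lemma requires.

```python
import random
from collections import defaultdict

class BalancedWanderer:
    """Balanced-wandering statistics for a small MDP with explicitly enumerated states.

    known_threshold (hypothetical parameter): a state counts as known once every
    action has been tried there at least this many times.
    """

    def __init__(self, actions, known_threshold, seed=0):
        self.actions = list(actions)
        self.m = known_threshold
        self.tries = defaultdict(lambda: {a: 0 for a in self.actions})
        self.next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward = {}
        self.rng = random.Random(seed)

    def choose(self, s):
        # Take the action tried fewest times from s, breaking ties at random.
        tries = self.tries[s]
        fewest = min(tries.values())
        return self.rng.choice([a for a in self.actions if tries[a] == fewest])

    def record(self, s, a, r, s_next):
        # Reward received at s and empirical next-state counts for (s, a).
        self.reward[s] = r
        self.tries[s][a] += 1
        self.next_counts[(s, a)][s_next] += 1

    def estimate(self, s, a):
        # Estimated transition probabilities P_hat(s' | s, a).
        counts = self.next_counts[(s, a)]
        n = sum(counts.values())
        return {s2: c / n for s2, c in counts.items()} if n else {}

    def is_known(self, s):
        return min(self.tries[s].values()) >= self.m

w = BalancedWanderer(actions=["a", "b"], known_threshold=2)
for _ in range(4):
    act = w.choose("s0")
    w.record("s0", act, 0.0, "s1" if act == "a" else "s0")
print(w.tries["s0"], w.is_known("s0"))  # {'a': 2, 'b': 2} True
```

Because the least-tried action is always chosen, four visits to `s0` with two actions necessarily split evenly, which is what makes the pigeonhole argument go through.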
The most important construction of the analysis is the known-state MDP. If $S$ is the set of currently known states, the known-state MDP is simply an MDP $M_S$ that is naturally induced on $S$ by the full MDP $M$. Briefly, all transitions in $M$ between states in $S$ are preserved in $M_S$, while all other transitions in $M$ are redirected in $M_S$ to lead to a single new, absorbing state that intuitively represents all of the unknown and unvisited states. Although E³ does not have direct access to $M_S$, by virtue of the definition of the known states, it does have a good approximation $\hat{M}_S$.

The KS analysis hinges on two central technical lemmas. The first is called the Simulation Lemma, and it establishes that $\hat{M}_S$ has good simulation accuracy: that is, the expected $T$-step return of any policy in $\hat{M}_S$ is close to its expected $T$-step return in $M_S$. Thus, at any time, $\hat{M}_S$ is a useful partial model of $M$, for that part of $M$ that the algorithm knows very well.

The second central technical lemma is the Explore or Exploit Lemma. It states that either the optimal ($T$-step) policy in $M$ achieves its high return by staying (with high probability) in the set $S$ of currently known states, or the optimal policy has significant probability of leaving $S$ within $T$ steps. Most importantly, the algorithm can detect which of these two is the case; in the first case, it can simulate the behavior of the optimal policy by finding a high-return exploitation policy in the partial model $\hat{M}_S$, and in the second case, it can replicate the behavior of the optimal policy by finding an exploration policy that quickly reaches the additional absorbing state of the partial model $\hat{M}_S$. Thus, by performing two off-line planning computations on $\hat{M}_S$, the algorithm is guaranteed to find either a way to get near-optimal return for the next $T$ steps, or a way to improve the statistics at an unknown or unvisited state within the next $T$ steps. KS show that this algorithm ensures near-optimal return in time polynomial in $N$.
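A sketch of the known-state construction, applied to the estimated model so that the result plays the role of $\hat{M}_S$. The dictionary layout, the `ABSORB` label, and the `absorb_reward` parameter are hypothetical; this section does not fix the reward of the absorbing state, so it is left as a parameter here because the exploitation and exploration computations need different reward structures.

```python
ABSORB = "<absorbing>"  # hypothetical label for the single new absorbing state

def known_state_mdp(known, P_hat, R, actions, absorb_reward=0.0):
    """Build (P_S, R_S) for the known-state MDP on the set `known`.

    Transitions between known states are kept; every transition that would
    leave `known` is redirected to ABSORB, which then loops on itself forever.
    P_hat[(s, a)] is the empirical next-state distribution at known state s.
    """
    P_S = {}
    R_S = {s: R[s] for s in known}
    R_S[ABSORB] = absorb_reward
    for s in known:
        for a in actions:
            dist = {}
            for s_next, p in P_hat[(s, a)].items():
                target = s_next if s_next in known else ABSORB
                dist[target] = dist.get(target, 0.0) + p
            P_S[(s, a)] = dist
    for a in actions:
        P_S[(ABSORB, a)] = {ABSORB: 1.0}
    return P_S, R_S

P_hat = {("s1", "go"): {"s1": 0.5, "s2": 0.25, "s3": 0.25},
         ("s2", "go"): {"s1": 1.0}}
P_S, R_S = known_state_mdp({"s1", "s2"}, P_hat, {"s1": 1.0, "s2": 0.0}, ["go"])
print(P_S[("s1", "go")])  # {'s1': 0.5, 's2': 0.25, '<absorbing>': 0.25}
```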
4 The DBN-E³ Algorithm

Our goal is to derive a generalization of E³ for DBN-MDPs, and to prove for it a result analogous to that of KS, but with a polynomial dependence not on the number of states $N$, but on the number of CPT parameters $\ell$ in the DBN model. Our analysis closely mirrors the original, but requires a significant generalization of the Simulation Lemma that exploits the structure of a DBN-MDP, a modified construction of $\hat{M}_S$ that can be represented as a DBN-MDP, and a number of alterations of the details.

Like the original E³ algorithm, DBN-E³ will build a model of the unknown DBN-MDP on the basis of its experience, but now the model will be represented in a compact, factorized form. More precisely, suppose that our algorithm is in state $\mathbf{x}$, executes action $a$, and arrives in state $\mathbf{x}'$. This experience will be used to update all the appropriate CPT entries of our model: namely, all the estimates $\hat{P}_a(x'_i \mid \mathbf{u}_i)$ are updated in the obvious way, where as usual $\mathbf{u}_i$ is the setting of $\mathrm{Pa}_a(X'_i)$ in $\mathbf{x}$. We will also maintain counts $C_a(x'_i, \mathbf{u}_i)$ of the number of times $\hat{P}_a(x'_i \mid \mathbf{u}_i)$ has been updated.

Recall that a crucial element of the original E³ analysis was the notion of a known state. In the original analysis, it was observed that if $N$ is the total number of states, then after $O(N)$ experiences some state must become known by the Pigeonhole Principle. We cannot hope to use the same logic here, as we are now in a DBN-MDP with an exponentially large number of states. Rather, we must pigeonhole not on the number of states, but on the number of parameters required to specify the DBN-MDP. Towards this goal, we will say that the CPT entry $\hat{P}_a(x'_i \mid \mathbf{u}_i)$ is known if it has been visited enough times to ensure that, with high probability,

$$|P_a(x'_i \mid \mathbf{u}_i) - \hat{P}_a(x'_i \mid \mathbf{u}_i)| \le \beta.$$

We now would like to establish that if, for an appropriate choice of $\beta$, all CPT entries are known, then our approximate DBN-MDP can be used to accurately estimate the expected return of any policy in the true DBN-MDP. This is the desired generalization of the original Simulation Lemma.
As in the original analysis, we will eventually apply it to a generalization of the induced MDP $M_S$, in which we deliberately restrict attention to only the known CPT entries.
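The per-CPT bookkeeping described above can be sketched as follows. Two assumptions are labeled in the code: the data layout (`parents[a][i]` lists the indices of $\mathrm{Pa}_a(X'_i)$) and the concrete threshold, which instantiates the $O((1/\beta^2)\log(1/\delta))$ Chernoff-type bound for a known entry via Hoeffding's inequality for a single entry, $m \ge \ln(2/\delta)/(2\beta^2)$. Because one experience updates every entry of the row $\hat{P}_a(\cdot \mid \mathbf{u}_i)$ at once, all entries of a row share one count, so a row being known is the same as its transition component being known.

```python
import math
from collections import defaultdict

def known_threshold(beta, delta):
    """Count after which one empirical probability is within beta of the truth
    with probability >= 1 - delta (Hoeffding: 2*exp(-2*m*beta^2) <= delta).
    This is one concrete instance of the O((1/beta^2) log(1/delta)) bound."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * beta ** 2))

class FactoredCounts:
    """Counts C_a(x'_i, u_i) and estimates P_hat_a(x'_i | u_i) for DBN-E^3."""

    def __init__(self, parents, beta, delta):
        self.parents = parents                 # hypothetical layout: parents[a][i]
        self.m = known_threshold(beta, delta)
        self.row_count = defaultdict(int)      # (a, i, u_i) -> number of updates
        self.value_count = defaultdict(int)    # (a, i, u_i, x'_i) -> times observed

    def update(self, x, a, x_next):
        # A single experience (x, a, x') updates one CPT row per variable.
        for i, pa in enumerate(self.parents[a]):
            u_i = tuple(x[j] for j in pa)
            self.row_count[(a, i, u_i)] += 1
            self.value_count[(a, i, u_i, x_next[i])] += 1

    def estimate(self, a, i, u_i, value):
        n = self.row_count.get((a, i, u_i), 0)
        return self.value_count.get((a, i, u_i, value), 0) / n if n else None

    def is_known(self, a, i, u_i):
        # Every entry of the row has the same count, so this also says whether
        # the transition component P_a(X'_i | u_i) is known.
        return self.row_count.get((a, i, u_i), 0) >= self.m

fc = FactoredCounts(parents={"a": [(0,), (0, 1)]}, beta=0.5, delta=0.5)
for x_next in [(1, 1), (1, 1), (0, 1)]:
    fc.update((0, 1), "a", x_next)
print(fc.m, fc.is_known("a", 0, (0,)), fc.estimate("a", 0, (0,), 1))  # 3 True 0.6666666666666666
```

Counting per row rather than per global state is the point of the construction: the number of counters is the number of CPT rows, not the number of joint states.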
4.1 The DBN-MDP Simulation Lemma

Let $M$ and $\hat{M}$ be two DBN-MDPs over the same state space with the same transition graphs for every action $a$, and with the same reward functions. Then we say that $\hat{M}$ is a $\beta$-approximation of $M$ if for every action $a$ and node $X'_i$ in the transition graphs, for every setting $\mathbf{u}$ of $\mathrm{Pa}_a(X'_i)$, and for every possible value $x'_i$ of $X'_i$,

$$|P_a(x'_i \mid \mathbf{u}) - \hat{P}_a(x'_i \mid \mathbf{u})| \le \beta,$$

where $P_a(\cdot \mid \cdot)$ and $\hat{P}_a(\cdot \mid \cdot)$ are the CPTs of $M$ and $\hat{M}$, respectively.

Lemma 4.1: Let $M$ be any DBN-MDP over $n$ state variables with $\ell$ CPT entries in the transition model, and let $\hat{M}$ be a $\beta$-approximation of $M$, where $\beta = O\big((\epsilon/(T^2 \ell R_{\max}))^2\big)$. Then for any policy $\pi$, and for any state $\mathbf{x}$, $|U^{\pi}_M(\mathbf{x}, T) - U^{\pi}_{\hat{M}}(\mathbf{x}, T)| \le \epsilon$.

Proof (sketch): Let us fix a policy $\pi$ and state $\mathbf{x}$. Recall that for any next state $\mathbf{x}'$ and any action $a$, the transition probability factorizes via the CPTs as $P(\mathbf{x}' \mid \mathbf{x}, a) = \prod_i P_a(x'_i \mid \mathbf{u}_i)$, where $\mathbf{u}_i$ is the setting of $\mathrm{Pa}_a(X'_i)$ in $\mathbf{x}$. Let us say that $P(\mathbf{x}' \mid \mathbf{x}, a)$ contains a $\rho$-small factor if any of its CPT factors $P_a(x'_i \mid \mathbf{u}_i)$ is smaller than $\rho$. Note that a transition probability may actually be quite small itself (exponentially small in $n$) without necessarily containing a $\rho$-small factor.

Our first goal is to show that trajectories in $M$ and $\hat{M}$ that cross transitions containing a $\rho$-small CPT factor can be thrown away without much error. Consider a random trajectory of $T$ steps in $M$ from state $\mathbf{x}$ following policy $\pi$. It can be shown that the probability that such a trajectory will cross at least one transition $P(\mathbf{x}' \mid \mathbf{x}, a)$ that contains a $\rho$-small factor is at most $T \ell \rho$. Essentially, the probability that at any step, any particular $\rho$-small transition (CPT factor) will be taken by any particular variable $X_i$ is at most $\rho$. A simple union bound over the CPT entries and the $T$ time steps gives the desired bound. Therefore, the total contribution of these trajectories to the difference $|U^{\pi}_M(\mathbf{x}, T) - U^{\pi}_{\hat{M}}(\mathbf{x}, T)|$ can be shown to be at most $T^2 R_{\max} \ell (\rho + \beta)$. We will thus ignore such trajectories for now.
The key advantage of eliminating $\rho$-small factors is that we can convert additive approximation guarantees into multiplicative ones. Let $p$ be any path of length $T$. If all the relevant CPT factors are greater than $\rho$, and we let $\Delta = \beta/\rho$, it can be shown that

$$(1-\Delta)^{Tn} P^{\pi}_M[p] \le P^{\pi}_{\hat{M}}[p] \le (1+\Delta)^{Tn} P^{\pi}_M[p].$$

In other words, ignoring $\rho$-small CPT factors, the distributions on paths induced by $\pi$ in $M$ and $\hat{M}$ are quite similar. From this it follows that, for the upper bound,²

$$U^{\pi}_{\hat{M}}(\mathbf{x}, T) \le (1+\Delta)^{Tn}\, U^{\pi}_M(\mathbf{x}, T) + T^2 R_{\max} \ell (\rho + 2\beta).$$

For the choices $\rho = \sqrt{\beta}$ and $\beta = O\big((\epsilon/(T^2 \ell R_{\max}))^2\big)$, the lemma is obtained.

² The lower bound argument is entirely symmetric.

Returning to the main development, we can now give a precise definition of a known CPT entry. It is a simple application of Chernoff bounds to show that, provided the count $C_a(x'_i, \mathbf{u}_i)$ exceeds $O((1/\beta^2)\log(1/\delta))$, $\hat{P}_a(x'_i \mid \mathbf{u}_i)$ has additive error at most $\beta$ with probability at least $1-\delta$. We thus say that this CPT entry is known if its count exceeds the given bound for the choice of $\beta$ specified by the DBN-MDP Simulation Lemma. The DBN-MDP Simulation Lemma shows that if all CPT entries are known, then our approximate model $\hat{M}$ can be used to find a near-optimal policy in the true DBN-MDP $M$.

Note that we can identify which CPT entries are known via the counts $C_a(x'_i, \mathbf{u}_i)$. Thus, if we are at a state $\mathbf{x}$ for which at least one of the associated CPT entries $\hat{P}_a(x'_i \mid \mathbf{u}_i)$ is unknown, then by taking action $a$ we obtain an experience that will increase the corresponding count $C_a(x'_i, \mathbf{u}_i)$. Thus, in analogy with the original E³, as long as we are encountering unknown CPT entries, we can continue taking actions that increase the quality of our model; but now, rather than increasing counts on a per-state basis, the DBN-MDP Simulation Lemma shows why it suffices to increase the counts on a per-CPT-entry basis, which is crucial for obtaining the running time we desire.
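Both ingredients of the proof sketch can be checked numerically on toy numbers. The helper names and the CPT layout are hypothetical, and the path factors are synthetic: they are drawn only so that each true factor is at least $\rho$ and each estimated factor is within $\beta$ of it, which is exactly the hypothesis of the multiplicative bound.

```python
import math
import random

def is_beta_approximation(P, P_hat, beta):
    """True if every CPT entry of P_hat is within beta of the matching entry of P.
    Both map a row key (a, i, u) to {value: probability}; the keys are assumed equal."""
    return all(
        abs(P[row].get(v, 0.0) - P_hat[row].get(v, 0.0)) <= beta
        for row in P
        for v in set(P[row]) | set(P_hat[row])
    )

def path_ratio_interval(num_factors, beta, rho):
    """If each of num_factors CPT factors is >= rho and is perturbed by at most beta,
    then P_hat[p] / P[p] lies in [(1 - D)^K, (1 + D)^K], D = beta / rho, K = num_factors."""
    d = beta / rho
    return (1 - d) ** num_factors, (1 + d) ** num_factors

# A T-path has T transitions, each the product of n CPT factors.
T, n, beta = 5, 4, 1e-4
rho = math.sqrt(beta)                       # the choice rho = sqrt(beta) from the proof sketch
rng = random.Random(0)
true_factors = [rng.uniform(rho, 1.0) for _ in range(T * n)]
est_factors = [min(1.0, max(0.0, f + rng.uniform(-beta, beta))) for f in true_factors]
ratio = math.prod(est_factors) / math.prod(true_factors)
lo, hi = path_ratio_interval(T * n, beta, rho)
print(lo <= ratio <= hi)  # True

P = {("a", 0, (0,)): {0: 0.9, 1: 0.1}}
P_hat = {("a", 0, (0,)): {0: 0.85, 1: 0.15}}
print(is_beta_approximation(P, P_hat, 0.1), is_beta_approximation(P, P_hat, 0.01))  # True False
```

The first check cannot fail: each ratio of estimated to true factor lies in $[1-\beta/\rho, 1+\beta/\rho]$, and the path ratio is the product of $Tn$ such ratios.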
We can thus show that if we encounter unknown CPT entries for a number of steps that is polynomial in the total number $\ell$ of CPT entries and $1/\epsilon$, there can no longer be any unknown CPT entries, and we know the true DBN-MDP well enough to solve for a near-optimal policy. However, similar to the original algorithm, the real difficulty arises when we are in a state with no unknown CPT entries, yet there do remain unknown CPT entries elsewhere. Then we have no guarantee that we can improve our model at the next step. In the original algorithm, this was solved by defining the known-state MDP $M_S$ and proving the aforementioned Explore or Exploit Lemma. Duplicating this step for DBN-MDPs will require another new idea.

4.2 The DBN-MDP Explore or Exploit Lemma

In our context, when we construct a known-state MDP, we must satisfy the additional requirement that the known-state MDP preserve the DBN structure of the original problem, so that if we have a planning algorithm for DBN-MDPs that exploits the structure, we can then apply it to the known-state MDP.³ Therefore, we cannot just introduce a new sink state to represent that part of $M$ that is unknown to us; we must also show how this sink state can be represented as a setting of the state variables of a DBN-MDP.

³ Certain approaches to approximate planning in large MDPs do not require any structural assumptions [Kearns et al., 1999], but we anticipate that the most effective DBN-MDP planning algorithms eventually will.

We present a new construction, which extends the idea of known states to the idea of known transitions. We say that a transition component $P_a(X'_i \mid \mathbf{u})$ is known if all of its CPT entries are known. The basic idea is that, while it is impossible to check locally whether a state is known, it is easy to check locally whether a transition component is known.

Let $\mathcal{T}$ be the set of known transition components. We define the known-transition DBN-MDP $M_{\mathcal{T}}$ as follows. The model behaves identically to $M$ as long as only known transitions are taken. As soon as an unknown transition is taken for some variable $X'_i$, the variable $X'_i$ takes on a new wandering value $w$, which we introduce into the model. The transition model is defined so that, once a variable takes on the value $w$, its value never changes. The reward function is defined so that, once at least one variable takes on the wandering value, the total reward is nonpositive. These two properties give us the same overall behavior that KS got by making a sink state for the set of unknown states.

Definition 4.2: Let $M$ be a DBN-MDP and let $\mathcal{T}$ be any subset of the transition components in the model. The induced DBN-MDP on $\mathcal{T}$, denoted $M_{\mathcal{T}}$, is defined as follows. $M_{\mathcal{T}}$ has the same set of state variables as $M$; however, in $M_{\mathcal{T}}$, each variable $X_i$ has, in addition to its original set of values $\mathrm{Val}_M(X_i)$, a new value $w$. $M_{\mathcal{T}}$ has the same transition graphs as $M$. For each $a$, $i$, and $\mathbf{u} \in \mathrm{Val}_M(\mathrm{Pa}_a(X'_i))$, we have that $P^{M_{\mathcal{T}}}_a(X'_i \mid \mathbf{u}) = P^M_a(X'_i \mid \mathbf{u})$ if the corresponding transition component is in $\mathcal{T}$; in all other cases, $P^{M_{\mathcal{T}}}_a(w \mid \mathbf{u}) = 1$ and $P^{M_{\mathcal{T}}}_a(x \mid \mathbf{u}) = 0$ for all $x \in \mathrm{Val}_M(X_i)$. $M_{\mathcal{T}}$ has the same set $\mathcal{R}$ of reward functions as $M$. For each $i = 1, \dots, k$ and $\mathbf{c} \in \mathrm{Val}_M(C_i)$, we have that $R^{M_{\mathcal{T}}}_i(\mathbf{c}) = R^M_i(\mathbf{c})$. For other vectors $\mathbf{c}$, we have that $R^{M_{\mathcal{T}}}_i(\mathbf{c}) = -R_{\max}$.

With this definition, we can prove the analogue to the Explore or Exploit Lemma (details omitted).

Lemma 4.3: Let $M$ be any DBN-MDP, let $\mathcal{T}$ be any subset of the transition components of $M$, and let $M_{\mathcal{T}}$ be the induced DBN-MDP on $\mathcal{T}$. For any $\mathbf{x} \in S$, any $T$, and any $1 > \epsilon > 0$, either there exists a policy $\pi$ in $M_{\mathcal{T}}$ such that $U^{\pi}_{M_{\mathcal{T}}}(\mathbf{x}, T) \ge U^*_M(\mathbf{x}, T) - \epsilon$, or there exists a policy $\pi$ in $M_{\mathcal{T}}$ such that the probability that a walk of $T$ steps following $\pi$ will take at least one transition not in $\mathcal{T}$ exceeds $\epsilon/((k+1) T R_{\max})$.

This lemma essentially asserts that either there exists a policy that already achieves near-optimal (global) return by staying only in the local model $M_{\mathcal{T}}$, or there exists a policy that quickly exits the local model.
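Definition 4.2 can be sketched at the level of single CPT rows and local reward functions. The names `induced_cpt_row` and `induced_reward`, the string `"w"` for the wandering value, and the dictionary layout are hypothetical.

```python
W = "w"  # the new wandering value added to every variable's domain

def induced_cpt_row(a, i, u, true_cpt, known_components, domain):
    """Row P_a^{M_T}(X'_i | u) of the induced DBN-MDP M_T (Definition 4.2).

    true_cpt[(a, i, u)]: the CPT row of M for a parent setting u in Val_M(Pa_a(X'_i))
    known_components:    set of (a, i, u) keys, i.e. the components in T
    domain[i]:           Val_M(X_i), the original values of X_i
    """
    if (a, i, u) in known_components:
        return dict(true_cpt[(a, i, u)])        # known component: copy M's row
    # "All other cases": an unknown component, or a u that involves w.
    row = {v: 0.0 for v in domain[i]}
    row[W] = 1.0
    return row

def induced_reward(i, c, reward_fns, r_max):
    """R_i^{M_T}(c): R_i^M(c) on original values, -R_max if c contains w."""
    if W in c:
        return -r_max
    return reward_fns[i](c)

true_cpt = {("a", 0, (0,)): {0: 0.7, 1: 0.3}, ("a", 0, (1,)): {0: 0.4, 1: 0.6}}
known = {("a", 0, (0,))}
print(induced_cpt_row("a", 0, (1,), true_cpt, known, {0: [0, 1]}))  # {0: 0.0, 1: 0.0, 'w': 1.0}
```

Passing the current estimates $\hat{P}$ in place of `true_cpt` gives the corresponding rows of the approximate model $\hat{M}_{\mathcal{T}}$.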
4.3 Putting It All Together

We now have all the pieces to finish the description and analysis of the DBN-E³ algorithm. The algorithm initially executes balanced wandering for some period of time. After some number of steps, by the Pigeonhole Principle one or more transition components become known. When the algorithm reaches a known state $\mathbf{x}$, one where all the transition components are known, it can no longer perform balanced wandering. At that point, the algorithm performs approximate off-line policy computations for two different DBN-MDPs. The first corresponds to attempted exploitation, and the second to attempted exploration.

Let $\mathcal{T}$ be the set of known transitions at this step. In the attempted exploitation computation, the DBN-E³ algorithm would like to find the optimal policy on the induced DBN-MDP $M_{\mathcal{T}}$. Clearly, this DBN-MDP is not known to the algorithm. Thus, we use its approximation $\hat{M}_{\mathcal{T}}$, where the true transition probabilities are replaced with their current approximation in the model. The definition of $M_{\mathcal{T}}$ uses only the CPT entries of known transition components. The Simulation Lemma now tells us that, for an appropriate choice of $\beta$ (a choice that will result in a definition of known transition that requires the corresponding count to be only polynomial in $1/\epsilon$, $n$, $v$, and $T$), the return of any policy in $\hat{M}_{\mathcal{T}}$ is within $\epsilon$ of its return in $M_{\mathcal{T}}$. We will specify a choice for $\epsilon$ later (which in turn sets the choice of $\beta$ and the definition of a known transition).

Let us now consider the two cases in the Explore or Exploit Lemma. In the exploitation case (applying Lemma 4.3 with slack $\epsilon'$), there exists a policy $\pi$ in $M_{\mathcal{T}}$ such that $U^{\pi}_{M_{\mathcal{T}}}(\mathbf{x}, T) \ge U^*_M(\mathbf{x}, T) - \epsilon'$. (Again, we will discuss the choice of $\epsilon'$ below.) From the Simulation Lemma, we have that $U^{\pi}_{\hat{M}_{\mathcal{T}}}(\mathbf{x}, T) \ge U^*_M(\mathbf{x}, T) - (\epsilon' + \epsilon)$. Our approximate planning algorithm returns a policy $\pi'$ whose value in $\hat{M}_{\mathcal{T}}$ is guaranteed to be a multiplicative factor of at most $1-\alpha$ away from the optimal policy in $\hat{M}_{\mathcal{T}}$. Thus, we are guaranteed that $U^{\pi'}_{\hat{M}_{\mathcal{T}}}(\mathbf{x}, T) \ge (1-\alpha)\big(U^*_M(\mathbf{x}, T) - (\epsilon' + \epsilon)\big)$. Therefore, in the exploitation case, our approximate planner is guaranteed to return a policy whose value is close to the optimal value.
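The test that separates balanced wandering from the two planning computations can be sketched as below. Everything here is an illustrative assumption: the function names, the callables `is_known` and `visits`, and in particular the selection rule (among actions that reach an unknown transition component from $\mathbf{x}$, pick one whose least-visited unknown component has the lowest count), which is one plausible component-level analogue of "take the least-tried action" rather than a rule stated in the paper.

```python
def unknown_components(x, actions, parents, is_known):
    """Transition components (a, i, u_i) met by taking some action in state x
    that are not yet known. parents[a][i] lists the indices of Pa_a(X'_i)."""
    out = []
    for a in actions:
        for i, pa in enumerate(parents[a]):
            u_i = tuple(x[j] for j in pa)       # setting of Pa_a(X'_i) in x
            if not is_known(a, i, u_i):
                out.append((a, i, u_i))
    return out

def next_move(x, actions, parents, is_known, visits):
    """("wander", action) while some action at x reaches an unknown component;
    ("plan", None) once x is a known state, at which point the exploitation and
    exploration computations on the approximate induced models take over."""
    unknown = unknown_components(x, actions, parents, is_known)
    if not unknown:
        return ("plan", None)

    def fewest(a):
        # Hypothetical selection rule: count of a's least-visited unknown component.
        return min(visits(b, i, u) for b, i, u in unknown if b == a)

    return ("wander", min(sorted({a for a, _, _ in unknown}), key=fewest))

counts = {("a", 0, (1,)): 5, ("b", 0, (1, 0)): 2}
parents = {"a": [(0,)], "b": [(0, 1)]}

def visits(a, i, u):
    return counts.get((a, i, u), 0)

def is_known(a, i, u):
    return visits(a, i, u) >= 5                 # hypothetical threshold m = 5

print(next_move((1, 0), ["a", "b"], parents, is_known, visits))  # ('wander', 'b')
```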
In the exploration case, there exists a policy in $M_{\mathcal{T}}$ (and therefore in $\hat{M}_{\mathcal{T}}$) that is guaranteed to take an unknown transition within $T$ steps with some minimum probability. Our goal now is to use our approximate planner to find such a policy. In order to do that, we need to use a slightly different construction $M'_{\mathcal{T}}$ (respectively $\hat{M}'_{\mathcal{T}}$). The transition structure of $M'_{\mathcal{T}}$ is identical to that of $M_{\mathcal{T}}$. However, the rewards are now different. Here, for each $i = 1, \dots, k$ and $\mathbf{c} \in \mathrm{Val}_M(C_i)$, we have that $R^{M'_{\mathcal{T}}}_i(\mathbf{c}) = 0$; for other vectors $\mathbf{c}$, we have that $R^{M'_{\mathcal{T}}}_i(\mathbf{c}) = 1$. Now let $\pi'$ be the policy returned by our approximate planner on the DBN-MDP $\hat{M}'_{\mathcal{T}}$. It can be shown that the probability that a $T$-step walk following $\pi'$ will take at least one unknown transition is at least $(1-\alpha)\big(\epsilon'/((k+1) T R_{\max}) - \epsilon\big)/(kT)$.

To summarize: our approximate planner either finds an exploitation policy $\pi$ in $\hat{M}_{\mathcal{T}}$ that enjoys actual return $U^{\pi}_M(\mathbf{x}, T) \ge (1-\alpha)\big(U^*_M(\mathbf{x}, T) - (\epsilon' + \epsilon)\big)$ from our current state $\mathbf{x}$, or it finds an exploration policy $\pi'$ in $\hat{M}'_{\mathcal{T}}$ that has probability at least $p = (1-\alpha)\big(\epsilon'/((k+1) T R_{\max}) - \epsilon\big)/(kT)$ of improving our statistics at an unknown transition in the next $T$ steps. Appropriate choices for $\epsilon$ and $\epsilon'$ yield our main theorem, which we are now finally ready to describe.

Recall that for expository purposes we have concentrated on the case of $T$-step average return. However, as for the original E³, our main result can be stated in terms of the asymptotic discounted and average return cases. We omit the details of this translation, but it is a simple matter of arguing that it suffices to set $T$ to be either $(1/(1-\gamma))\log(1/\epsilon)$ (discounted) or the mixing time of the optimal policy (average).

Theorem 4.4 (Main Theorem): Let $M$ be a DBN-MDP with $\ell$ total entries in the CPTs.

(Undiscounted case) Let $T$ be the mixing time of the policy achieving the optimal average asymptotic return $U^*$ in $M$. There exists an algorithm DBN-E³ that, given access to an $\alpha$-approximation planning algorithm for DBN-MDPs, and given inputs $\epsilon$, $\delta$, $\ell$, $T$ and $U^*$, takes a number of actions and computation time bounded by a polynomial in $1/(1-\alpha)$, $1/\epsilon$, $1/\delta$, $\ell$, $T$, and $R_{\max}$, and with probability at least $1-\delta$, achieves total actual return exceeding $U^* - \epsilon$.

(Discounted case) Let $V^*$ denote the value function for the policy with the optimal expected discounted return in $M$. There exists an algorithm DBN-E³ that, given access to an $\alpha$-approximation planning algorithm for DBN-MDPs, and given inputs $\epsilon$, $\delta$, $\ell$ and $V^*$, takes a number of actions and computation time bounded by a polynomial in $1/(1-\alpha)$, $1/\epsilon$, $1/\delta$, $\ell$, the horizon time $T = 1/(1-\gamma)$, and $R_{\max}$, and with probability at least $1-\delta$, will halt in a state $\mathbf{x}$ and output a policy $\hat{\pi}$ such that $V^{\hat{\pi}}_M(\mathbf{x}) \ge V^*(\mathbf{x}) - \epsilon$.

Some remarks:

The loss in policy quality induced by the approximate planning subroutine translates into degradation in the running time of our algorithm.

As with the original E³, we can eliminate knowledge of the optimal returns in both cases via search techniques.

Although we have stated our asymptotic undiscounted average return result in terms of the mixing time of the optimal policy, we can instead give an anytime algorithm that competes against policies with longer and longer mixing times the longer it is run. (We omit details, but the analysis is analogous to the original E³ analysis.) This extension is especially important in light of the results of the following section, where we examine properties of mixing times in DBN-MDPs.

5 Mixing Time Bounds for DBN-MDPs

As in the original E³ paper, our average-case result depends on the amount of time $T$ that it takes the target policy to mix. This dependence is unavoidable. If some of the probabilities are very small, so that the optimal policy cannot easily reach the high-reward parts of the space, it is unrealistic to expect the reinforcement learning algorithm to do any better.

In the context of a DBN-MDP, however, this dependence is more troubling.
The size of the state space is exponentially large, and virtually all of the probabilities for transitioning from one state to the next will be exponentially small (because a transition probability is the product of $n$ numbers that are $\le 1$). Indeed, one can construct very reasonable DBN-MDPs that have an exponentially long mixing time. For example, a DBN representing the Markov chain of an Ising model [Jerrum and Sinclair, 1993] has small parent sets (at most four parents per node), and CPT entries that are reasonably large. Nevertheless, the mixing time of such a DBN can be exponentially large in $n$.

Given that even reasonable DBNs such as this can have exponential mixing times, one might think that this is the typical situation, that is, that most DBN-MDPs have an exponentially long mixing time, reintroducing the exponential dependence on $n$ that we have been trying so hard to avoid. We now show that this is not always the case. We provide a tool for analyzing the mixing time of a policy in a DBN-MDP, which can give us much better bounds on the mixing time. In particular, we demonstrate a class of DBN-MDPs and associated policies for which we can guarantee rapid mixing.

Note that any fixed policy in a DBN-MDP defines a Markov chain whose transition model is represented as a DBN. We therefore begin by considering the mixing time of a pure DBN, with no actions. We then extend that analysis to the mixing rate for a fixed policy in a DBN-MDP.

Definition 5.1: Let $Q$ be a transition model for a Markov chain, and let $\{X^{(t)}\}_{t=1}^{\infty}$ represent the state of the chain. Let $S = \{x_1, \dots, x_s\}$. Let $\pi_j$ be the stationary probability of $x_j$ in this Markov chain. We say that the Markov chain $Q$ is $\epsilon$-mixed at time $m$ if $\max_{i,j} |P(X^{(m)} = x_j \mid X^{(1)} = x_i) - \pi_j| \le \epsilon$.

Our bounds on mixing times make use of the coupling method [Lindvall, 1992]. The idea of the coupling method is as follows: we run two copies of the Markov chain in parallel, from different starting points. Our goal is to make the states of the two processes coalesce.
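For a small chain given as an explicit matrix, Definition 5.1 can be checked directly, and the idea of running two copies until they coalesce can be simulated. This is an illustration under stated assumptions, not the paper's construction: `coupled_step` uses one standard way to correlate the copies (a maximal coupling of the two next-state distributions, which puts probability $\min[Q(x_i, x_k), Q(x_j, x_k)]$ on both copies moving to $x_k$ and samples the remaining mass from the residual distributions), and all helper names are hypothetical.

```python
import random

def is_eps_mixed(Q, pi, m, eps):
    """Definition 5.1: max_{i,j} |P(X^(m) = x_j | X^(1) = x_i) - pi_j| <= eps.
    Q is a row-stochastic list of lists; times 1 and m are m - 1 transitions apart."""
    s = len(Q)
    dist = [[1.0 if i == j else 0.0 for j in range(s)] for i in range(s)]
    for _ in range(m - 1):
        dist = [[sum(row[k] * Q[k][j] for k in range(s)) for j in range(s)]
                for row in dist]
    return max(abs(dist[i][j] - pi[j]) for i in range(s) for j in range(s)) <= eps

def coupled_step(Q, y, z, rng):
    """One step of a maximal coupling: each copy still moves according to Q,
    and both move to the same x_k with probability min(Q[y][k], Q[z][k])."""
    s = len(Q)
    overlap = [min(Q[y][k], Q[z][k]) for k in range(s)]
    rest_y = [Q[y][k] - overlap[k] for k in range(s)]
    rest_z = [Q[z][k] - overlap[k] for k in range(s)]
    # The first test guards against rounding when the two rows are identical.
    if min(sum(rest_y), sum(rest_z)) <= 0 or rng.random() < sum(overlap):
        k = rng.choices(range(s), weights=overlap)[0]
        return k, k
    return (rng.choices(range(s), weights=rest_y)[0],
            rng.choices(range(s), weights=rest_z)[0])

def coupling_time(Q, y, z, rng, max_steps=100_000):
    """Smallest m with Y^(m) = Z^(m), starting from Y^(1) = y and Z^(1) = z."""
    t = 1
    while y != z and t < max_steps:
        y, z = coupled_step(Q, y, z, rng)
        t += 1
    return t

Q = [[0.9, 0.1], [0.2, 0.8]]     # stationary distribution (2/3, 1/3)
pi = [2 / 3, 1 / 3]
print(is_eps_mixed(Q, pi, 2, 0.1), is_eps_mixed(Q, pi, 50, 1e-6))  # False True

# From (0, 1) the uncoupled copies stay at (0, 1) and couple with probability
# 0.2 + 0.1 = 0.3 per step, so P(tau > m) = 0.7 ** (m - 1) for every start pair.
m = 10
print(is_eps_mixed(Q, pi, m, 0.7 ** (m - 1)))  # True, consistent with Lemma 5.2

rng = random.Random(0)
times = [coupling_time(Q, 0, 1, rng) for _ in range(2000)]
tail = sum(t > m for t in times) / len(times)   # Monte Carlo estimate of P(tau > m)
print(tail)  # should be near 0.7 ** 9, roughly 0.04
```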
Intutvely, the frst tme the states of the two copes are the same, the ntal states have been forgotten, whch corresponds to the processes havng mxed. More precsely, consder a transton matrx Q over some state space S. Let Q be a transton matrx over the state space S S, such that f f(y (t) ;Z (t) )g 1 t=1 s the Markov chan for Q, then the separated Markov chans fy (t) g 1 t=1 and fz (t) g 1 t=1 both evolve accordng to Q. Let be the random varable that represents the couplng tme the smallest m for whch Y (m) = Z (m). The followng lemma establshes the correspondence between mxng and couplng tmes. Lemma 5.2: For any, letm be such that for any ; j = 1;:::;s, P ( >mj Y (1) = x ;Z (1) = x j ). ThenQ s -mxed at tme m. Thus, to show that a Markov chan s -mxed by some tme m, we need only construct a coupled chan and show that the probablty that ths chan has not coupled by tme m decreases very rapdly n m. The couplng method allows us to construct the jont chan over (Y (t) ;Z (t) ) n any way that we want, as long as each of the two chans n solaton has the same dynamcs as the orgnal Markov chan Q. In partcular, we can correlate the transtons of the two processes, so as to make ther states concde faster than they would f each was pcked ndependently of the other. That s, we choose Y (t+1) and Z (t+1) to be equal to each other whenever possble, subject to the constrants on the transton probabltes. More precsely, let Y (t) = x and Z (t) = x j. For any value x 2 S, we can make the event Y (t+1) = x ;Z (t+1) = x j have a probablty that s the smaller of P (X 0 = x k j X = x ) and P (X 0 = x k j X = x j ). Compare ths to the probablty of ths event f the two processes were ndependent, whch s the product of these two numbers rather than ther mnmum. Overall, by correlatng the two processes as much as possble, and consderng the worst case over the current state
of the process, we can guarantee that, at every step, the two processes couple with probability at least
$$\min_{i,j} \sum_{k} \min\left[ P(X' = x_k \mid X = x_i),\; P(X' = x_k \mid X = x_j) \right].$$
This quantity represents the amount of probability mass that any two transition distributions are guaranteed to have in common. It is called the Dobrushin coefficient, and is the contraction rate for the $L_1$-norm [Dobrushin, 1956] in Markov chains.

Now, consider a DBN over the state variables $X = \{X_1, \ldots, X_n\}$. As above, we create two copies of the process, letting $Y_1, \ldots, Y_n$ denote the variables in the first component of the coupled Markov chain, and $Z_1, \ldots, Z_n$ denote those in the second component. Our goal is to construct a Markov chain over Y, Z such that both Y and Z separately have the same dynamics as X in the original DBN. Our construction of the joint Markov chain is very similar to the one used above, except that we will now choose the transition of each variable pair $Y_i$ and $Z_i$ so as to maximize the probability that they couple (assume the same value). As above, we can guarantee that $Y_i$ and $Z_i$ couple at any time t with probability at least
$$\gamma_i = \min_{u, u' \in \mathrm{Val}(\mathrm{Pa}(X_i'))} \left\{ \sum_{x_i \in \mathrm{Val}(X_i)} \min\left[ P(x_i \mid u),\; P(x_i \mid u') \right] \right\}.$$
This coefficient was defined by [Boyen and Koller, 1998] in their analysis of the contraction rate of DBNs. Note that $\gamma_i$ depends only on the numbers in a single CPT of the DBN. Assuming that the transition probabilities in each CPT are not too extreme, the probability that any single variable couples will be reasonably high.

Unfortunately, this bound is not enough to show that all of the variable pairs couple within a short time. The problem is that it is not enough for two variables $Y_i^{(t)}$ and $Z_i^{(t)}$ to couple, as the process dynamics may force us to decouple them at subsequent time slices. To understand this issue, consider a simple process with two variables $X_1, X_2$, and a transition graph with the edges $X_1 \to X_1'$, $X_2 \to X_2'$, $X_1 \to X_2'$. Assume that at time t, the variable pair $Y_2^{(t)}, Z_2^{(t)}$ has coupled with value $x_2$, but $Y_1^{(t)}, Z_1^{(t)}$ has not, so that $Y_1^{(t)} = x_1$ and $Z_1^{(t)} = x_1'$.
At the next time slice, we must select $Y_2^{(t+1)}, Z_2^{(t+1)}$ from two different distributions, $P(X_2' \mid x_1, x_2)$ and $P(X_2' \mid x_1', x_2)$, respectively. Thus, our sampling process may be forced to give them different values, decoupling them again.

As this example clearly illustrates, it is not enough for a variable pair to couple momentarily. In order to eventually couple the two processes as a whole, we need to make each variable pair a stable pair, i.e., we need to guarantee that our sampling process can keep them coupled from then on. In our example, the pair $Y_1, Z_1$ is stable as soon as it first couples. And once $Y_1, Z_1$ is stable, then $Y_2, Z_2$ will also be stable as soon as it couples. However, if $Y_2, Z_2$ couples while $Y_1, Z_1$ is not yet stable, then the sampling process cannot guarantee stability.

In general, a variable pair can only be stable if its parents are also stable. So what happens if we add the edge $X_2 \to X_1'$ to our transition model? In this case, neither $Y_1, Z_1$ nor $Y_2, Z_2$ can stabilize in isolation. They can only stabilize if they couple simultaneously. This discussion leads to the following definition.

Definition 5.3: Consider a DBN over the state variables $X_1, \ldots, X_n$. The dependency graph D for the DBN is a directed (possibly cyclic) graph whose nodes are $X_1, \ldots, X_n$ and where there is a directed edge from $X_i$ to $X_j$ if there is an edge in the transition graph of the DBN from $X_i$ to $X_j'$. Hence, there is a directed path from $X_i$ to $X_j$ in D iff $X_i^{(t)}$ influences $X_j^{(t')}$ for some $t' > t$. We assume that the transition graph of the DBN always has arcs $X_i \to X_i'$, so that every node in D has a self-loop.

Let $\Gamma_1, \ldots, \Gamma_l$ be the maximal strongly connected components in D, sorted so that if $i < j$, there are no directed edges from $\Gamma_j$ to $\Gamma_i$. Our analysis will be based on stabilizing the $\Gamma_i$'s in succession. (We note that we provide only a rough bound; a more refined analysis is possible.) Let $\gamma = \min_i \gamma_i$ and $g = \max_j |\Gamma_j|$. Assume that $\Gamma_1, \ldots, \Gamma_{i-1}$ have all stabilized by time t. In order for $\Gamma_i$ to stabilize, all of its variables need to couple at exactly the same time.
This event happens at time t with probability at least $\gamma^g$. As soon as $\Gamma_i$ stabilizes, we can move on to stabilizing $\Gamma_{i+1}$. When all the $\Gamma_i$'s have stabilized, we are done.

Theorem 5.4: For any $\epsilon > 0$, the Markov chain corresponding to a DBN as described above is $\epsilon$-mixed at time m provided
$$m \ge \frac{8l}{\gamma^g} \log(1/\epsilon).$$

Thus, the mixing time of a DBN grows exponentially with the size of the largest component in the dependency graph, which may be significantly smaller than the total number of variables in a DBN. Indeed, in two real-life DBNs, BAT [Forbes et al., 1995] with ten state variables and WATER [Jensen et al., 1989] with eight, the maximal cluster size is 3-4.

It remains only to extend this analysis to DBN-MDPs, where we have a policy. Our stochastic coupling scheme must now deal with the fact that the actions taken at time t in the two copies of the process may be different. The difficulty is that different actions at time t correspond to different transition models. If a variable $X_i$ has a different transition model in different transition graphs $P_a$, it will use a different transition distribution if the action is not the same. Hence $X_i$ cannot stabilize until we are guaranteed that the same action is taken in both copies. That is, the action must also stabilize. The action is only guaranteed to have stabilized when all of the variables on which the choice of action can possibly depend have stabilized. Otherwise, we might encounter a pair of states in which we are forced to use different actions in the two copies.

We can analyze this behavior by extending the dependency graph to include a new node corresponding to the choice of action. We then see what assumptions allow us to bound the set of incoming and outgoing edges. We can then use
the same analysis described above to bound the mixing time.

The outgoing edges correspond to the effect of an action. In many processes, the action only directly affects the transition model of a small number of state variables in the process. In other words, for many variables $X_i$, we have that $\mathrm{Pa}_a(X_i)$ and $P_a(X_i \mid \mathrm{Pa}_a(X_i))$ are the same for all a. In this case, the new action node will only have outgoing edges to the remaining variables (those for which the transition model might differ). We note that such localized influence models have a long history, both for influence diagrams [Howard and Matheson, 1984] and for DBN-MDPs [Boutilier et al., 1999].

Now, consider incoming edges. In general, the optimal policy might well be such that the action depends on every variable. However, the mere representation of such a policy may be very complex, rendering its use impractical in a DBN-MDP with many variables. Therefore, we often want to restrict attention to a simpler class of policies, such as a small finite state machine or a small decision tree. If our target policy is such that the choice of action only depends on a small number of variables, then there will only be a small number of incoming edges into the action node in the dependency graph.

Having integrated the action node into the dependency graph, our analysis above holds unchanged. The only difference from a random variable is that we do not have to include the action node when computing the size of the $\Gamma_i$ that contains it, as we do not have to stochastically make it couple; rather, it couples immediately once its parents have coupled. Finally, we note that this analysis easily accommodates DBN-MDPs where the decision about the action is also decomposed into several independent decisions (e.g., as in [Meuleau et al., 1998]). Different component decisions can influence different subsets of variables, and the choice of action in each one can depend on different subsets of variables. Each decision forms a separate node in the dependency graph, and can stabilize independently of the other decisions.
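The recipe of this section (compute each $\gamma_i$ from its CPT, find the strongly connected components of the dependency graph D, then apply Theorem 5.4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: CPTs are dictionaries from parent assignments to distributions, D is a hand-built adjacency list (an action node, if any, is added by hand), and the helper names `gamma_i`, `strongly_connected_components` and `mixing_time_bound` are hypothetical. Like the theorem itself, it yields a rough sufficient bound, not a tight mixing-time estimate.

```python
import math
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# A CPT maps each assignment u to Pa(X_i') onto the distribution P(X_i' | u).
CPT = Dict[Tuple, Dict[Hashable, float]]
Graph = Dict[str, List[str]]  # dependency graph D: node -> list of successors


def gamma_i(cpt: CPT) -> float:
    """min over parent assignments u, u' of sum_x min[P(x | u), P(x | u')]."""
    rows = list(cpt.values())
    best = 1.0
    for a in rows:
        for b in rows:
            overlap = sum(min(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b))
            best = min(best, overlap)
    return best


def reachable(graph: Graph, start: str) -> Set[str]:
    """All nodes reachable from start along directed edges (start included)."""
    seen, stack = {start}, [start]
    while stack:
        for w in graph.get(stack.pop(), []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def strongly_connected_components(graph: Graph) -> List[Set[str]]:
    """Maximal SCCs via pairwise reachability (quadratic, fine for small DBNs)."""
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    reach = {v: reachable(graph, v) for v in nodes}
    comps: List[Set[str]] = []
    assigned: Set[str] = set()
    for v in sorted(nodes):
        if v not in assigned:
            comp = {w for w in reach[v] if v in reach[w]}
            comps.append(comp)
            assigned |= comp
    return comps


def mixing_time_bound(graph: Graph, gammas: Dict[str, float], eps: float,
                      action_nodes: Iterable[str] = ()) -> int:
    """Smallest integer m with m >= (8 l / gamma^g) * log(1/eps), as in Theorem 5.4.

    Action nodes are left out when measuring component sizes (they couple as soon
    as their parents do) but still counted in l, which can only loosen the bound.
    """
    actions = set(action_nodes)
    comps = strongly_connected_components(graph)
    l = len(comps)
    g = max(len(c - actions) for c in comps)
    gamma = min(gammas.values())  # gammas holds state variables only
    return math.ceil(8 * l / gamma ** g * math.log(1 / eps))


if __name__ == "__main__":
    # Hypothetical binary CPTs for the two-variable example in the text.
    cpt_x1 = {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}}
    cpt_x2 = {(0, 0): {0: 0.7, 1: 0.3}, (0, 1): {0: 0.4, 1: 0.6},
              (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.3, 1: 0.7}}
    gammas = {"X1": gamma_i(cpt_x1), "X2": gamma_i(cpt_x2)}  # 0.3 and 0.6
    d = {"X1": ["X1", "X2"], "X2": ["X2"]}  # X1 -> X1', X2 -> X2', X1 -> X2'
    print(mixing_time_bound(d, gammas, 0.01))  # 246
    # Hypothetical action node A that reads X2 and affects X1.
    d_act = {"X1": ["X1", "X2"], "X2": ["X2", "A"], "A": ["X1"]}
    print(mixing_time_bound(d_act, gammas, 0.01, action_nodes=["A"]))  # 410
```

In the first graph the components are {X1} and {X2}, so l = 2 and g = 1. Adding the action node closes the cycle X1, X2, A into a single component: l drops to 1, but g rises to 2 (A is not counted), and the bound grows from 246 to 410 steps.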
The analysis above gives us techniques for estimating the mixing rate of policies in DBN-MDPs. In particular, if we want to focus on getting a good steady-state return from DBN-$E^3$ in a reasonable amount of time, this analysis shows us how to restrict attention to policies that are guaranteed to mix rapidly given the structure of the DBN-MDP.

6 Conclusions

Structured probabilistic models, and particularly Bayesian networks, have revolutionized the field of reasoning under uncertainty by allowing compact representations of complex domains. Their success is built on the fact that this structure can be exploited effectively by inference and learning algorithms. This success leads one to hope that similar structure can be exploited in the context of planning and reinforcement learning under uncertainty. This paper, together with the recent work on representing and reasoning with factored MDPs [Boutilier et al., 1999], demonstrates that substantial computational gains can indeed be obtained from these compact, structured representations.

This paper leaves many interesting problems unaddressed. Of these, the most intriguing one is to allow the algorithm to learn the model structure as well as the parameters. The recent body of work on learning Bayesian networks from data [Heckerman, 1995] lays much of the foundation, but the integration of these ideas with the problems of exploration/exploitation is far from trivial.

Acknowledgements

We are grateful to the members of the DAGS group for useful discussions, and particularly to Brian Milch for pointing out a problem in an earlier version of this paper. The work of Daphne Koller was supported by the ARO under the MURI program "Integrated Approach to Intelligent Systems", by ONR contract N C-8554 under DARPA's HPKB program, and by the generosity of the Powell Foundation and the Sloan Foundation.

References

[Boutilier et al., 1999] C. Boutilier, T. Dean, and S. Hanks. Decision theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 1999. To appear.

[Boyen and Koller, 1998] X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In Proc. UAI, pages 33-42, 1998.

[Dobrushin, 1956] R.L. Dobrushin. Central limit theorem for nonstationary Markov chains. Theory of Probability and its Applications, pages 65-80, 1956.

[Forbes et al., 1995] J. Forbes, T. Huang, K. Kanazawa, and S.J. Russell. The BATmobile: Towards a Bayesian automated taxi. In Proc. IJCAI, 1995.

[Heckerman, 1995] D. Heckerman. A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, 1995.

[Howard and Matheson, 1984] R. A. Howard and J. E. Matheson. Influence diagrams. In R. A. Howard and J. E. Matheson, editors, Readings on the Principles and Applications of Decision Analysis. Strategic Decisions Group, Menlo Park, California, 1984.

[Jensen et al., 1989] F.V. Jensen, U. Kjærulff, K.G. Olesen, and J. Pedersen. An expert system for control of waste water treatment: a pilot project. Technical report, Judex Datasystemer A/S, Aalborg, 1989. In Danish.

[Jerrum and Sinclair, 1993] M. Jerrum and A. Sinclair. Polynomial-time approximation algorithms for the Ising model. SIAM Journal on Computing, 22, 1993.

[Kearns and Singh, 1998] M. Kearns and S.P. Singh. Near-optimal performance for reinforcement learning in polynomial time. In Proc. ICML, 1998.

[Kearns et al., 1999] M. Kearns, Y. Mansour, and A. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In these proceedings, 1999.

[Koller and Parr, 1999] D. Koller and R. Parr. Computing factored value functions for policies in structured MDPs. In these proceedings, 1999.

[Lindvall, 1992] T. Lindvall. Lectures on the Coupling Method. Wiley, 1992.

[Meuleau et al., 1998] N. Meuleau, M. Hauskrecht, K-E. Kim, L. Peshkin, L.P. Kaelbling, T. Dean, and C. Boutilier. Solving very large weakly coupled Markov decision processes. In Proc. AAAI, 1998.
How To Calculate An Approxmaton Factor Of 1 1/E
Approxmaton algorthms for allocaton problems: Improvng the factor of 1 1/e Urel Fege Mcrosoft Research Redmond, WA 98052 [email protected] Jan Vondrák Prnceton Unversty Prnceton, NJ 08540 [email protected]
The literature on many-server approximations provides significant simplifications toward the optimal capacity
Publshed onlne ahead of prnt November 13, 2009 Copyrght: INFORMS holds copyrght to ths Artcles n Advance verson, whch s made avalable to nsttutonal subscrbers. The fle may not be posted on any other webste,
Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters
Frequency Selectve IQ Phase and IQ Ampltude Imbalance Adjustments for OFDM Drect Converson ransmtters Edmund Coersmeer, Ernst Zelnsk Noka, Meesmannstrasse 103, 44807 Bochum, Germany [email protected],
7.5. Present Value of an Annuity. Investigate
7.5 Present Value of an Annuty Owen and Anna are approachng retrement and are puttng ther fnances n order. They have worked hard and nvested ther earnngs so that they now have a large amount of money on
Brigid Mullany, Ph.D University of North Carolina, Charlotte
Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte
Credit Limit Optimization (CLO) for Credit Cards
Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt
NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6
PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has
