Working Paper

Coordinating Pricing and Inventory Replenishment with Nonparametric Demand Learning

Boxiao Chen, Department of Industrial and Operations Engineering, University of Michigan
Xiuli Chao, Department of Industrial and Operations Engineering, University of Michigan
Hyun-Soo Ahn, Stephen M. Ross School of Business, University of Michigan

Ross School of Business Working Paper Series, Working Paper No. 194, June 2015

This paper can be downloaded without charge from the Social Sciences Research Network Electronic Paper Collection: http://ssrn.com/abstract=2694633

UNIVERSITY OF MICHIGAN
Coordinating Pricing and Inventory Replenishment with Nonparametric Demand Learning

Boxiao Chen¹, Xiuli Chao² and Hyun-Soo Ahn³

Abstract

We consider a firm (e.g., a retailer) selling a single nonperishable product over a finite-period planning horizon. Demand in each period is stochastic and price-dependent, and unsatisfied demands are backlogged. At the beginning of each period, the firm determines its selling price and inventory replenishment quantity, but it knows neither the form of the demand's dependency on selling price nor the distribution of demand uncertainty a priori; hence it has to make pricing and ordering decisions based on historical demand data. We propose a nonparametric data-driven policy that learns about the demand on the fly and, concurrently, applies the learned information to determine replenishment and pricing decisions. The policy integrates learning and action in the sense that the firm actively experiments on pricing and inventory levels to collect demand information with the least possible profit loss. Besides convergence of the policies, we show that the regret, defined as the average profit loss compared with that of the optimal solution when the firm has complete information about the underlying demand, vanishes at the fastest possible rate as the planning horizon increases.

Keywords: dynamic pricing, inventory control, demand learning, nonparametric estimation, nonperishable products, asymptotic optimality.

¹ Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109. Email: boxchen@umich.edu
² Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109. Email: xchao@umich.edu
³ Department of Technology and Operations, Ross School of Business, University of Michigan, Ann Arbor, MI 48109. Email: hsahn@umich.edu
1 Introduction

Balancing supply and demand is a challenge for all firms, and failure to do so can directly affect the bottom line of a company. From the supply side, firms can use operational levers such as production and inventory decisions to adjust inventory levels in pace with uncertain demand. From the demand side, firms can deploy marketing levers such as pricing and promotional decisions to shape the demand so as to allocate the (limited or excess) inventory in the most profitable way. With the increasing availability of demand data and new technologies, e.g., electronic data interchange, point-of-sale devices, click-stream data, etc., deploying both operational and marketing levers simultaneously is now possible. Indeed, both academics and practitioners have recognized that substantial benefits can be obtained from coordinating operational and pricing decisions. As a result, the research literature on joint pricing and inventory decisions has grown rapidly in recent years; see, e.g., the survey papers by Petruzzi and Dada (1999), Elmaghraby and Keskinocak (2003), Yano and Gilbert (2003), and Chen and Simchi-Levi (2012).

Despite the voluminous literature, the majority of the papers on joint optimization of pricing and inventory control have assumed that the firm knows how the market responds to its selling prices and the exact distribution of uncertainty in customer demand for any given price. This is not true in many applications, particularly for the demand of new products. In such settings, the firm needs to learn demand information during the dynamic decision-making process while simultaneously trying to maximize its profit.

In this paper, we consider a firm selling a nonperishable product over a finite-period planning horizon in a make-to-stock setting that allows backlogs. In each period, the firm sets its price and inventory level in anticipation of price-sensitive and uncertain demand. The version of this problem in which the firm has complete information about the underlying demand distribution has been studied by, e.g., Federgruen and Heching (1999), among others.
The point of departure of this paper is that the firm possesses limited or even no prior knowledge about customer demand, such as its dependency on selling price or the distribution of uncertainty in demand fluctuation. We develop a nonparametric data-driven algorithm that learns the demand-price relationship and the random error distribution on the fly. We also establish the convergence rate of the regret, defined as the average profit loss per period of time compared with that of the optimal solution had the firm known the random demand information, and this rate is the fastest possible for any learning algorithm. This work is the first to present a nonparametric data-driven algorithm for the classic joint pricing and inventory control problem that establishes not only the convergence of the proposed policies but also the convergence rate of the regret.
1.1 Related literature

Almost all early papers on joint pricing and inventory control, e.g., Whitin (1955), Federgruen and Heching (1999), and Chen and Simchi-Levi (2004), among others, assume that the firm has complete knowledge about the distribution of the underlying stochastic demand for any given selling price. The complete information assumption provides the analytic tractability necessary for characterizing the optimal policy. The extension to the parametric case (the firm knows the class of distributions but not the parameters) has been studied by, for example, Subrahmanyan and Shoemaker (1996), Petruzzi and Dada (2002), and Zhang and Chen (2006). Chung et al. (2011) also consider the problem of dynamic pricing and inventory planning with demand learning; they develop learning algorithms using Bayesian methods and Markov chain Monte Carlo (MCMC) algorithms, and numerically evaluate the importance of dynamic pricing.

An alternative to the parametric approach is to model the firm's problem in a nonparametric setting. Under this framework, the firm does not make specific assumptions about the underlying demand. Instead, the firm makes decisions solely based on the collected demand data; see Burnetas and Smith (2000). Our work falls into this category. To the best of our knowledge, Burnetas and Smith (2000) is the only paper that considers the joint pricing and inventory control problem in a nonparametric setting. The authors consider a make-to-stock system for a perishable product with lost sales and linear costs, and propose an adaptive policy to maximize average profit. They assume that the price is chosen from a finite set, formulate the pricing problem as a multi-armed bandit problem, and show that the average profit under their approximation policy converges in probability. No convergence rate or performance bound is obtained for their algorithm.

Other approaches in the literature on developing nonparametric data-driven algorithms include online convex optimization (Agarwal et al. 2011, Zinkevich 2003, Hazan et al. 2006), continuum-armed bandit problems (Auer et al.
2007, Kleinberg 2005, Cope 2009), and stochastic approximation (Kiefer and Wolfowitz 1952, Lai and Robbins 1981, and Robbins and Monro 1951). In fact, Burnetas and Smith (2000) is an example of implementing such algorithms for the joint pricing and inventory control problem. However, these methodologies require that the proposed solution be reachable in each and every period, which is not the case in our problem. This is because, in a demand learning algorithm for the joint pricing/inventory control problem, in each period the algorithm utilizes the past demand data to prescribe a pricing decision and an order-up-to level. However, if the starting inventory level of the period is already higher than the prescribed order-up-to level, then the prescribed inventory level for that period cannot be reached. Indeed, that is precisely the reason that Burnetas and Smith (2000) focused on the case of a perishable product (hence the
firm has no carry-over inventory, and the inventory decision obtained by Burnetas and Smith (2000) based on a multi-armed bandit process can be implemented in each period). Agarwal et al. (2011), Auer et al. (2007), and Kleinberg (2005) propose learning algorithms and obtain regrets that are not as good as ours in this paper. Zinkevich (2003) and Hazan et al. (2007) present machine learning algorithms in which the exact gradient of the unknown objective function at the current decision can be computed, and their results have been applied to dynamic inventory control in Huh and Rusmevichientong (2009). However, in the joint pricing and inventory control problem with unknown demand response, the gradient of the unknown objective function cannot be obtained, so the method cannot be applied.

1.2 Positioning of this paper

The research works closest to ours are Besbes and Zeevi (2015), Levi et al. (2007) and Levi et al. (2010), offering nonparametric approaches to the pure pricing problem (with no inventory) and the pure inventory control problem (with no pricing), respectively. Besbes and Zeevi (2015) consider a dynamic pricing problem in which a firm chooses its selling price to maximize expected revenue. The firm does not know the deterministic demand curve (i.e., how the average demand changes in price) and learns it through noisy demand realizations, and the authors establish the sufficiency of linear approximations in maximizing revenue. They assume that the firm has an infinite supply of inventory, or, alternatively, that the seller has no inventory constraint. In this case, since the expected revenue in each period depends only on its mean demand, the distribution of the random error is immaterial in their learning algorithm and analysis. On the other hand, in the dynamic newsvendor problem considered in Levi et al. (2007, 2010), the essence of effective inventory management is to strike a balance between overage cost and underage cost, for which the distribution of uncertain demand plays a key role. Levi et al. (2007) and Levi et al.
(2010) apply Sample Average Approximation (SAA) to estimate the demand distribution and average cost function, and they analyze the relationship between sample sizes and the accuracy of estimations and inventory decisions.

Our problem has both dynamic pricing and inventory control, and the firm knows neither the relationship between demand and selling price nor the distribution of demand uncertainty. In Besbes and Zeevi (2015), the authors only need to estimate the average demand curve in order to maximize revenue, and demand distribution information is irrelevant. In a remark, Besbes and Zeevi (2015) state that their method of learning the demand curve can be applied to maximizing more general forms of objective functions beyond the expected revenue, which, however, does not apply to our setting. This is because, in the general form presented in Besbes and Zeevi (2015),
the objective function still has to be a known function in terms of price and the demand curve for any given price and demand curve. Thus the firm must know the exact expression of the objective function once the estimate of the demand curve is given. In our problem, even with a given price and inventory level and a given demand curve, the objective function cannot be written as a known deterministic function. Indeed, this function contains the expected inventory holding and backorder costs, which depend on the distribution of demand fluctuation, and this distribution is also unknown to the firm. In fact, the latter is a major technical challenge encountered in this paper because, as we will explain below, the estimation of the demand uncertainty, and therefore also of the expected holding/shortage cost, cannot be decoupled from the estimation of the average demand curve, which is learned through price experimentation.

The standard SAA method is applied to the newsvendor problem by Levi et al. (2007) and Levi et al. (2010), but it cannot be applied in our setting for determining inventory decisions. In Levi et al. (2007) and Levi et al. (2010), dynamic inventory control is studied in which pricing is not a decision and it is (implicitly) assumed to be given. The only information the firm is uncertain about is the distribution of the random fluctuation. Therefore, the firm can observe true realizations of the demand fluctuation, which are used to build an empirical distribution. In our model, however, the firm knows neither how average demand responds to the selling price (the demand curve) nor the distribution of the fluctuating demand, but both of them affect demand realizations. For any estimate of the average demand curve, the error of this estimate will affect the estimation of the distribution of the random demand fluctuation. Hence, through the realization of random demand we are unable to obtain a true realization of the random demand error without knowing the exact average demand function. As a result, the standard SAA analysis is not applicable in our setting because unbiased samples of the random error cannot be obtained.
Because the firm does not know the exact demand curve a priori, its estimate of the error distribution using demand data is inevitably biased, and as a result, the data-driven optimization problem constructed to compute the pricing and ordering strategies is also biased. Because of this bias, it is no longer true that the solution of the data-driven problem using SAA must converge to the true optimal solution. Fortunately, we are able to show that as the learning algorithm proceeds, the biases gradually diminish, and that allows us to prove that our learning algorithm still converges to the true optimal solution. This is done by establishing several important properties of the newsvendor problem that bound the errors of biased samples. One main contribution of this paper is to explicitly prove that the solution obtained from a biased data-driven optimization problem still converges to the true optimal solution.

Finally, we highlight the result on the convergence rate of the regret. Besbes and Zeevi (2015) obtain a convergence rate of T^{-1/2}(log T)^2 for their dynamic pricing problem, where T is the length of
the planning horizon. For the pure dynamic inventory control problem, Huh and Rusmevichientong (2009) present a machine learning algorithm with convergence rate T^{-1/2}. For the joint pricing and inventory problem, we show that the regret of our learning algorithm converges to zero at rate T^{-1/2}, which is also the theoretical lower bound. Thus, this paper strengthens and extends the existing work by achieving the tightest convergence rate for the problem with joint pricing and inventory control. One important implication of our finding is that the linear demand approximation scheme of Besbes and Zeevi (2015) actually achieves the best possible convergence rate of regret, which further improves the result of Besbes and Zeevi (2015). That is, nothing is lost in the learning algorithm by approximating the demand curve with a linear model.

1.3 Organization

The rest of this paper is organized as follows. Section 2 formulates the problem and describes the data-driven learning algorithm for pricing and inventory control decisions. The following two sections (Sections 3 and 4) present our major theoretical results together with a numerical study, and the main steps of the technical proofs, respectively. The paper concludes with a few remarks in Section 5. Finally, the details of the mathematical proofs are given in the Appendix.

2 Formulation and Learning Algorithm

We consider an inventory system in which a firm (e.g., a retailer) sells a nonperishable product over a planning horizon of T periods. At the beginning of each period t, the firm makes a replenishment decision, denoted by the order-up-to level y_t, and a pricing decision, denoted by p_t, where y_t ∈ Y = [y^l, y^h] and p_t ∈ P = [p^l, p^h] for some known lower and upper bounds on the inventory level and selling price, respectively. We assume p^h > p^l since otherwise the problem reduces to the pure inventory control problem, for which learning algorithms have been developed in Huh and Rusmevichientong (2009), Levi et al. (2007), and Levi et al. (2010).
During period t, when the selling price is set to p_t, a random demand, denoted by D_t(p_t), is realized and fulfilled from on-hand inventory. Any leftover inventory is carried over to the next period, and in case the demand exceeds y_t, the unsatisfied demand is backlogged. The replenishment leadtime is zero, i.e., an order placed at the beginning of a period can be used to satisfy demand in the same period. Let h and b be the unit holding and backlog costs per period, and the unit purchasing cost is assumed, without loss of generality, to be zero.

The model as described above is the well-known joint inventory and pricing decision problem studied in Federgruen and Heching (1999), in which it is assumed that the firm has complete
information about the distribution of D_t(p_t). In this paper we consider a setting where the firm does not have prior knowledge about the demand distribution. In general, the demand in period t is a function of the selling price p_t in that period and some random variable ε_t, and it is stochastically decreasing in p_t. The most popular demand models in the literature are the additive demand model D_t(p_t) = λ(p_t) + ε_t and the multiplicative demand model D_t(p_t) = λ(p_t) ε_t, where λ(·) is a strictly decreasing deterministic function and ε_t, t = 1, 2, ..., T, are independent and identically distributed random variables. In this paper, we shall study both the additive and the multiplicative demand models. However, the firm knows neither the function λ(p_t) nor the distribution function of the random variable ε_t. The firm has to learn from historical demand data, which are the realizations of market responses to offered prices, and use that information as a basis for decision making. Suppose ε_t has finite support [l, u], with l ≥ 0 for the case of multiplicative demand.

To define the firm's problem, we let x_t denote the inventory level at the beginning of period t before the replenishment decision. We assume that the system is initially empty, i.e., x_1 = 0. The system dynamics are x_{t+1} = y_t − D_t(p_t) for all t = 1, ..., T. An admissible policy is represented by a sequence of prices and order-up-to levels, {(p_t, y_t), t ≥ 1}, where (p_t, y_t) depends only on realized demand and decisions made prior to period t, and y_t ≥ x_t; i.e., (p_t, y_t) is adapted to the filtration generated by {(p_s, y_s), D_s(p_s); s = 1, ..., t−1}. The firm's objective is to find an admissible policy to maximize its total profit. If both the function λ(·) and the distribution of ε_t are known a priori to the firm (the complete information scenario), then the optimization problem the firm wishes to solve is

max_{(p_t, y_t) ∈ P × Y, y_t ≥ x_t} Σ_{t=1}^T ( p_t E[D_t(p_t)] − h E[(y_t − D_t(p_t))^+] − b E[(D_t(p_t) − y_t)^+] ),   (1)

where E stands for mathematical expectation with respect to the random demand D_t(p_t), and x^+ = max{x, 0} for any real number x.
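As a concrete illustration of the model primitives, the following sketch simulates the multiplicative demand model and the system dynamics, and accumulates the per-period terms of objective (1). The curve λ(p) = e^{3−0.5p}, the lognormal shock, and all cost and price values are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins (not specified in the paper): an exponential
# demand-price curve and a lognormal shock eps_t with E[log eps_t] = 0.
lam = lambda p: np.exp(3.0 - 0.5 * p)           # deterministic part lambda(p)
h, b = 1.0, 4.0                                 # unit holding / backlog costs
T = 5
x = 0.0                                         # x_1 = 0: the system starts empty
profit = 0.0
for t in range(T):
    p_t = 4.0                                   # an admissible price in P = [p^l, p^h]
    y_t = max(25.0, x)                          # order-up-to level, y_t >= x_t
    eps_t = rng.lognormal(mean=0.0, sigma=0.3)  # multiplicative shock
    d_t = lam(p_t) * eps_t                      # D_t(p_t) = lambda(p_t) * eps_t
    # one term of objective (1): revenue minus holding and backlog costs
    profit += p_t * d_t - h * max(y_t - d_t, 0.0) - b * max(d_t - y_t, 0.0)
    x = y_t - d_t                               # backlog carried over if negative
```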
However, since in our setting the firm does not know the demand distribution, the firm is unable to evaluate the objective function of this optimization problem. We develop a data-driven learning algorithm to compute the inventory control and pricing policy. It will be shown in Section 3 that the average profit of the algorithm converges to that of the case when complete demand distribution information is known a priori, and that the pricing and inventory control parameters also converge to those of the optimal control policy for the case with complete information as the planning horizon becomes long. To save space we shall only present the algorithm and analytical results for the multiplicative demand model. The results and analyses for the additive demand case are analogous, and we only highlight the main differences at the end of this section.
Remark 1. For ease of exposition, in this paper we assume the support of the uncertainty ε_t is bounded. This can be relaxed, and all the results hold as long as we assume the moment generating functions of the relevant random variables are finite in a small neighborhood of 0, i.e., the random variables are light-tailed.

Case of complete information about demand. In the case of complete information, in which the firm knows λ(·) and the distribution of ε_t, it follows from (1) that, if (p*, y*) is the optimal solution of each individual term

max_{p ∈ P, y ∈ Y} { p E[D_t(p)] − h E[(y − D_t(p))^+] − b E[(D_t(p) − y)^+] },   (2)

and this solution is reachable in every period, i.e., x_t ≤ y* for all t, then (p*, y*) is the optimal policy for each period. We refer to p* and y* as the optimal price and optimal order-up-to level (or optimal base-stock level), respectively. It is clear that the reachability condition is satisfied if the system is initially empty, which we assume.

We find it convenient to analyze (2) using a slightly different but equivalent form. Taking logarithms on both sides of D_t(p_t) = λ(p_t) ε_t, we obtain

log D_t(p_t) = log λ(p_t) + log ε_t,  t = 1, ..., T.

Denote D̃_t(p_t) = log D_t(p_t), λ̃(p_t) = log λ(p_t) and ε̃_t = log ε_t. Then, the logarithm of demand can be written as

D̃_t(p_t) = λ̃(p_t) + ε̃_t,  t = 1, ..., T.   (3)

We shall refer to λ̃(·) as the demand-price function (or demand-price curve) and ε̃_t as the random error (or random shock). Clearly, λ̃(·) is also strictly decreasing in p ∈ P. Hence, in the case of complete information, the firm knows the function λ̃(·) and the distribution of ε̃_t; when the firm does not know the function λ̃(·) and the distribution of ε̃_t, which is our case, the firm will need to learn about them. Without loss of generality, we assume E[ε̃_t] = E[log ε_t] = 0. If this is not the case, i.e., E[log ε_t] = a ≠ 0, then E[log(e^{−a} ε_t)] = 0; thus if we let λ̂(·) = e^{a} λ(·) and ε̂_t = e^{−a} ε_t, then D_t(p_t) = λ̂(p_t) ε̂_t, and λ̂(·) and ε̂_t satisfy the desired properties. For convenience, let ε̃ be a random variable distributed as ε̃_1.
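The recentring argument above can be checked numerically; the lognormal shock with E[log ε] = 0.4 below is an arbitrary illustrative choice, not a quantity from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# A shock with E[log eps] = a = 0.4 != 0 (arbitrary illustrative choice).
eps = rng.lognormal(mean=0.4, sigma=0.3, size=100_000)
a = np.log(eps).mean()              # sample estimate of E[log eps]
eps_hat = np.exp(-a) * eps          # recentred shock: E[log eps_hat] = 0
# lambda_hat(p) = e^a * lambda(p) absorbs the factor, so demand is
# unchanged: lambda_hat(p) * eps_hat = lambda(p) * eps.
```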
In terms of λ̃(·) and ε̃, we define

G(p, y) = p e^{λ̃(p)} E[e^{ε̃}] − { h E[(y − e^{λ̃(p)} e^{ε̃})^+] + b E[(e^{λ̃(p)} e^{ε̃} − y)^+] }.

Then problem (2) can be re-written as

Problem CI:  max_{p ∈ P, y ∈ Y} G(p, y)   (4)
 = max_{p ∈ P} { p e^{λ̃(p)} E[e^{ε̃}] − min_{y ∈ Y} { h E[(y − e^{λ̃(p)} e^{ε̃})^+] + b E[(e^{λ̃(p)} e^{ε̃} − y)^+] } }.
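The two-layer structure of problem CI can be illustrated with a small Monte Carlo sketch: for each candidate price, the inner minimization is a newsvendor problem solved by a quantile of the demand distribution, and the outer maximization is carried out over a price grid. The demand-price curve, the error distribution, and all numerical values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative stand-ins: lambda_tilde(p) = log lambda(p) = 3 - 0.5 p,
# and eps_tilde ~ Normal(0, 0.3^2); h, b and the grid are also assumed.
lam_tilde = lambda p: 3.0 - 0.5 * p
eps_tilde = rng.normal(0.0, 0.3, size=200_000)   # Monte Carlo sample of eps~
h, b = 1.0, 4.0

def demand_sample(p):
    return np.exp(lam_tilde(p)) * np.exp(eps_tilde)

def inner_y(p):
    # Inner minimization: a newsvendor problem whose solution is the
    # b/(b+h) quantile of the demand e^{lam~(p)} e^{eps~}.
    return np.quantile(demand_sample(p), b / (b + h))

def G(p, y):
    d = demand_sample(p)
    return p * d.mean() - (h * np.maximum(y - d, 0.0)
                           + b * np.maximum(d - y, 0.0)).mean()

p_grid = np.linspace(1.0, 8.0, 141)              # grid over P = [p^l, p^h]
p_star = max(p_grid, key=lambda p: G(p, inner_y(p)))
y_star = inner_y(p_star)
```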
The inner optimization problem (the minimization) determines the optimal order-up-to level that minimizes the expected inventory and backlog cost for a given price p, and we denote it by y(e^{λ̃(p)}). The outer optimization solves for the optimal price p. Let the optimal solution of (4) be denoted by p* and y*; then they satisfy y* = y(e^{λ̃(p*)}). The analysis above stipulates that the firm knows the demand-price curve λ̃(p) and the distribution of ε̃, thus we refer to it as problem CI (complete information).

Learning algorithm. In the absence of prior knowledge about the demand process, the firm needs to collect the demand information necessary to estimate λ̃(p) and the empirical distribution of the random error ε̃, thus price and inventory decisions affect not only the profit but also the demand information realized. The major difficulty lies in that the estimations of the demand-price curve λ̃(p) and the distribution of the random error cannot be decoupled. This is because the firm only observes realized demands, hence with any estimate of the demand-price curve, the estimation error transfers to the estimation of the random error distribution. Indeed, we are not even able to obtain unbiased samples of the random error ε̃_t. In our algorithm below we approximate λ̃(p) by an affine function, and construct an empirical (but biased) error distribution using the collected data.

We divide the planning horizon into stages (whose lengths are exponentially increasing in the stage index). At the start of each stage, the firm sets two pairs of prices and order-up-to levels based on its current linear estimate of the demand-price curve and (biased) empirical distribution of the random error, and the demand data collected in this stage are used to update the linear estimate of the demand-price curve and the biased empirical distribution of the random error. These are then utilized to find the pricing and inventory decisions for the next stage. The algorithm requires some input parameters v, ρ and I_0, with v > 1, I_0 > 0, and 0 < ρ^{−1/4} ≤ 2^{−3/4}(p^h − p^l) I_0^{1/4}.
To initiate the algorithm, it sets {p̂_1, ŷ_{11}, ŷ_{12}}, where p̂_1 ∈ P, ŷ_{11} ∈ Y, ŷ_{12} ∈ Y are the starting price and order-up-to levels. For i ≥ 1, let

I_i = ⌊I_0 v^i⌋,  δ_i = (ρ I_{i−1})^{−1/4},  and  t_i = Σ_{k=1}^{i−1} I_k  with t_1 = 0,   (5)

where ⌊I_0 v^i⌋ is the largest integer less than or equal to I_0 v^i. The following is the detailed procedure of the algorithm. Recall that x_t is the starting inventory level at the beginning of period t, p_t is the selling price set for period t, and y_t (≥ x_t) is the order-up-to inventory level for period t, t = 1, ..., T. The number of learning stages is n = ⌈ log_v ( (v−1)(T+1)/(I_0 v) + 1 ) ⌉, where ⌈x⌉ denotes the smallest integer greater than or equal to x.

Data-Driven Algorithm (DDA)
Step 0. Initialization. Choose v > 1, ρ > 0 and I_0 > 0, and p̂_1, ŷ_{11}, ŷ_{12}. Compute I_1 = ⌊I_0 v⌋, δ_1 = (ρ I_0)^{−1/4}, and p̂_1 + δ_1.

Step 1. Setting prices and order-up-to levels for stage i. For i = 1, ..., n, set the prices p_t, t = t_i + 1, ..., t_i + I_i, to

p_t = p̂_i,  t = t_i + 1, ..., t_i + I_i/2,
p_t = p̂_i + δ_i,  t = t_i + I_i/2 + 1, ..., t_i + I_i;

and for t = t_i + 1, ..., t_i + I_i, raise the inventory levels to

y_t = max{ŷ_{i1}, x_t},  t = t_i + 1, ..., t_i + I_i/2,
y_t = max{ŷ_{i2}, x_t},  t = t_i + I_i/2 + 1, ..., t_i + I_i.

Step 2. Estimating the demand-price function and random errors using data from stage i. Let D̃_t = log D_t(p_t) be the logarithm of the demand realizations for t = t_i + 1, ..., t_i + I_i, and compute

(α̂_{i+1}, β̂_{i+1}) = argmin_{α, β} Σ_{t=t_i+1}^{t_i+I_i} ( D̃_t − (α − β p_t) )^2,   (6)

η_t = D̃_t − (α̂_{i+1} − β̂_{i+1} p_t),  for t = t_i + 1, ..., t_i + I_i.   (7)

Step 3. Defining and maximizing the proxy profit function, denoted by G^{DD}_{i+1}(p, y). Define

G^{DD}_{i+1}(p, y) = p e^{α̂_{i+1} − β̂_{i+1} p} (1/I_i) Σ_{t=t_i+1}^{t_i+I_i} e^{η_t} − (1/I_i) Σ_{t=t_i+1}^{t_i+I_i} { h ( y − e^{α̂_{i+1} − β̂_{i+1} p} e^{η_t} )^+ + b ( e^{α̂_{i+1} − β̂_{i+1} p} e^{η_t} − y )^+ }.

Then the data-driven optimization is defined by

Problem DD:  max_{(p, y) ∈ P × Y} G^{DD}_{i+1}(p, y)   (8)
 = max_{p ∈ P} { p e^{α̂_{i+1} − β̂_{i+1} p} (1/I_i) Σ_{t=t_i+1}^{t_i+I_i} e^{η_t} − min_{y ∈ Y} (1/I_i) Σ_{t=t_i+1}^{t_i+I_i} { h ( y − e^{α̂_{i+1} − β̂_{i+1} p} e^{η_t} )^+ + b ( e^{α̂_{i+1} − β̂_{i+1} p} e^{η_t} − y )^+ } }.

Solve problem DD and set the first pair of price and inventory level to

(p̂_{i+1}, ŷ_{i+1,1}) = arg max_{(p, y) ∈ P × Y} G^{DD}_{i+1}(p, y),
and set the second price to p̂_{i+1} + δ_{i+1} and the second order-up-to level to

ŷ_{i+1,2} = arg max_{y ∈ Y} G^{DD}_{i+1}(p̂_{i+1} + δ_{i+1}, y).

In case p̂_{i+1} + δ_{i+1} ∉ P, set the second price to p̂_{i+1} − δ_{i+1}.

Remark 2. When β̂_{i+1} > 0, the objective function in (8), after minimizing over y ∈ Y, is unimodal in p. To see why this is true, let d = e^{α̂_{i+1} − β̂_{i+1} p} and thus p = (α̂_{i+1} − log d)/β̂_{i+1}, with d ∈ D = [d^l, d^h], where d^l = e^{α̂_{i+1} − β̂_{i+1} p^h} and d^h = e^{α̂_{i+1} − β̂_{i+1} p^l}. Then the optimization problem (8) is equivalent to

max_{d ∈ D} { d (α̂_{i+1} − log d)/β̂_{i+1} (1/I_i) Σ_{t=t_i+1}^{t_i+I_i} e^{η_t} − min_{y ∈ Y} (1/I_i) Σ_{t=t_i+1}^{t_i+I_i} { h (y − d e^{η_t})^+ + b (d e^{η_t} − y)^+ } }.

The objective function of this optimization problem is jointly concave in (y, d), hence it is concave in d after minimizing over y ∈ Y. Thus, it follows from the fact that p = (α̂_{i+1} − log d)/β̂_{i+1} is strictly decreasing in d that the objective function in (8) (after minimization over y) is unimodal in p ∈ P.

Remark 3. In Step 3 of DDA, the second price is set to p̂_{i+1} − δ_{i+1} when p̂_{i+1} + δ_{i+1} > p^h. In this case our condition on ρ ensures that p̂_{i+1} − δ_{i+1} ≥ p^l, thus p̂_{i+1} − δ_{i+1} ∈ P. This is because, when p̂_{i+1} > p^h − δ_{i+1}, we have p̂_{i+1} − δ_{i+1} > p^h − 2δ_{i+1} ≥ p^h − 2δ_1 = p^h − 2(ρ I_0)^{−1/4} ≥ p^l, where the last inequality follows from the condition on ρ.

Discussion of the algorithm and its connections with the literature. In our algorithm above, iteration i focuses on stage i, which consists of I_i periods. In Step 1, the algorithm sets the ordering quantity and selling price for each period in stage i, and they are derived from the previous iteration. In Step 2, the algorithm uses the realized demand data and the least-squares method to update the linear approximation, α̂_{i+1} − β̂_{i+1} p, of λ̃(p), and computes a biased sample η_t of the random error ε̃_t, for t = t_i + 1, ..., t_i + I_i. Note that η_t is not a sample of the random error ε̃_t. This is because ε̃_t = D̃_t(p_t) − λ̃(p_t) and the (logarithm of the) observed demand is D̃_t(p_t). However, as we do not know λ̃(p), it is approximated by α̂_{i+1} − β̂_{i+1} p, therefore

η_t = D̃_t(p_t) − (α̂_{i+1} − β̂_{i+1} p_t) ≠ D̃_t(p_t) − λ̃(p_t) = ε̃_t.
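Steps 2 and 3 of DDA can be sketched on one stage of synthetic data as follows; the true demand curve (which is unknown to the algorithm), the stage prices, the cost parameters, and the grid search standing in for the optimizer of problem DD are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

h, b = 1.0, 4.0
# One stage of data: half at p_hat_i, half at p_hat_i + delta_i (values assumed).
prices = np.array([3.0] * 8 + [3.5] * 8)
# Demands come from a curve lambda(p) = e^{3-0.5p} unknown to the algorithm.
demand = np.exp(3.0 - 0.5 * prices) * rng.lognormal(0.0, 0.3, size=prices.size)

# Step 2: least squares of log-demand on price, D~_t ~ alpha - beta p_t, as
# in (6), and biased residuals eta_t standing in for samples of eps~_t, as in (7).
D_log = np.log(demand)
design = np.column_stack([np.ones(prices.size), -prices])
(alpha, beta), *_ = np.linalg.lstsq(design, D_log, rcond=None)
eta = D_log - (alpha - beta * prices)

# Step 3: the proxy profit replaces expectations over eps~ with averages
# over the biased residuals eta_t.
def G_DD(p, y):
    d = np.exp(alpha - beta * p) * np.exp(eta)
    return p * d.mean() - (h * np.maximum(y - d, 0.0)
                           + b * np.maximum(d - y, 0.0)).mean()

def y_DD(p):   # inner minimization: empirical newsvendor quantile
    return np.quantile(np.exp(alpha - beta * p) * np.exp(eta), b / (b + h))

p_grid = np.linspace(1.0, 8.0, 141)
p_next = max(p_grid, key=lambda p: G_DD(p, y_DD(p)))   # first price of next stage
y_next = y_DD(p_next)                                  # first order-up-to level
```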
For the same reason, the constructed objective function for the holding and shortage costs is not a sample average of the newsvendor problem.

In the traditional SAA, mathematical expectations are replaced by sample means; see, e.g., Kleywegt et al. (2001). Levi et al. (2007) and Levi et al. (2010) apply the SAA method in dynamic
newsvendor problems. The argument above shows that the traditional analysis establishing that SAA leads to the optimal solution is not applicable to our setting. Indeed, in our inner-layer optimization, we face a newsvendor problem for which the firm needs to balance holding and shortage costs, and knowledge about the demand distribution is critical. However, the lack of samples of the random error ε̃_t makes the inner-loop optimization problem significantly different from Levi et al. (2007) and Levi et al. (2010), which consider pure inventory control problems in which samples of the random errors are available for application of the SAA result and analysis. Because of this, it is not guaranteed that the SAA method will lead to a true optimal solution.

The DDA algorithm integrates a process of earning (exploitation) and learning (exploration) in each stage. The earning phase consists of the first I_i/2 periods starting at t_i + 1, during which the algorithm implements the optimal strategy for the proxy problem G^{DD}_i(p, y). In the next I_i/2 periods of the learning phase, which starts from t_i + I_i/2 + 1, the algorithm uses a different price p̂_i + δ_i and its corresponding order-up-to level. The purpose of this phase is to allow the firm to obtain demand data to estimate the rate of change of the demand with respect to the selling price. Note that, even though the firm deviates from the optimal strategy of the proxy problem in the second phase, the policies (p̂_i + δ_i, ŷ_{i2}) and (p̂_i, ŷ_{i1}) will be very close to each other as δ_i diminishes to zero. We will show that they both converge to the true optimal solution and that the loss of profit from this deviation converges to zero.

The pricing part of our algorithm is similar to the pure pricing problem considered by Besbes and Zeevi (2015), as we also use a linear approximation to estimate the demand-price function and then maximize the resulting proxy profit function. Although our algorithm is heavily influenced by their work, there is a key difference.
Besbes and Zeevi (2015) consider a revenue management problem, and they only need to estimate the deterministic demand-price function; the distribution of the random errors is immaterial in their analysis. In our model, however, due to the holding and backlogging costs, the distribution of the random error is critical and has to be learned during the decision process, but it cannot be separated from the estimation of the demand-price curve, as discussed above.

Therefore, due to the lack of unbiased samples of the random error and the fact that the learning of the demand-price curve and the random error distribution cannot be decoupled, we are not able to prove that the DDA algorithm converges to the true optimal solution by using the approaches developed in Besbes and Zeevi (2015) for the pricing problem and in Levi et al. (2007) for the newsvendor problem. To overcome this difficulty, we construct several intermediate bridging problems between the data-driven problem and the complete information problem, and perform a series of convergence analyses to establish the main results.
Performance metrics. To measure the performance of a policy, we use two metrics proposed in Besbes and Zeevi (2015): consistency and regret. An admissible policy π = ((p_t, y_t), t ≥ 1) is said to be consistent if (p_t, y_t) → (p*, y*) in probability as t → ∞. The (average per-period) regret of a policy π, denoted by R(π, T), is defined as the average profit loss per period, given by

R(π, T) = G(p*, y*) − (1/T) E[ Σ_{t=1}^T G(p_t, y_t) ].   (9)

Obviously, the faster the regret converges to 0 as T → ∞, the better the policy. In the next section, we will show that the DDA policy is consistent, and we will also characterize the rate at which the regret converges to zero.

3 Main Results

In this section, we analyze the performance of the DDA policy proposed in the previous section. We will show that under a fairly general assumption on the underlying demand process, which covers a number of well-known demand models including the logit and exponential demand functions, the regret of the DDA policy converges to 0 at rate O(T^{-1/2}). We also present a numerical study to illustrate its effectiveness.

Recall that the demand in period t is D_t(p_t) = λ(p_t) ε_t. As λ(p) is strictly decreasing, it has a strictly decreasing inverse function. Let λ^{-1}(d) be the inverse function of λ(p), which is defined on d ∈ [d^l, d^h] = [λ(p^h), λ(p^l)]. We make the following assumption.

Assumption 1. The function λ(p) satisfies the following conditions:

(i) The revenue function d λ^{-1}(d) is concave in d ∈ [d^l, d^h].

(ii) 0 < 2(λ′(p))^2 − λ(p) λ″(p) < ∞ for p ∈ [p^l, p^h].

The first condition is a standard assumption in the literature on joint optimization of pricing and inventory control (see, e.g., Federgruen and Heching 1999, and Chen and Simchi-Levi 2004), and it guarantees that the objective function in problem CI, after minimizing over y, is unimodal in p. The second condition imposes a shape restriction on the underlying demand function, and a similar assumption has been made in Besbes and Zeevi (2015). Technically, this condition ensures that the prices converge to a fixed point through a contraction mapping. Some examples that satisfy both conditions of Assumption 1 are given below.
Example 1. The following functions satisfy Assumption 1.
(i) Exponential models: λ(p) = e^{k−mp}, m > 0.
(ii) Logit models: λ(p) = a e^{k−mp}/(1 + e^{k−mp}) for a > 0, m > 0, and k − mp < 0 for p ∈ P.
(iii) Iso-elastic (constant elasticity) models: λ(p) = k p^{−m} for k > 0 and m > 1.

We now present the main results of this paper. Recall that p* and y* are the optimal pricing and inventory decisions for the case with complete information.

Theorem 1 (Policy Convergence). Under Assumption 1, the DDA policy is consistent, i.e., (p_t, y_t) → (p*, y*) in probability as t → ∞.

Theorem 1 states that both the pricing and the ordering decisions from the DDA algorithm converge to the true optimal solution (p*, y*) in probability. Note that the convergence of the inventory decisions y_t → y* is stronger than the convergence of the order-up-to levels ŷ_{i,1} → y* and ŷ_{i,2} → y*. This is because the order-up-to levels may or may not be achievable in each period, so the resulting inventory levels may overshoot the target order-up-to levels. Theorem 1 shows that, despite these overshoots, the realized inventory levels converge to the true optimal solution in probability.

Convergence of the inventory and pricing decisions alone does not guarantee that the performance of the DDA policy is close to optimal. Our next result shows that DDA is asymptotically optimal in terms of maximizing the expected profit.

Theorem 2 (Regret Convergence Rate). Under Assumption 1, the DDA policy is asymptotically optimal. More specifically, there exists some constant K > 0 such that

    R(DDA, T) = G(p*, y*) − (1/T) E[ Σ_{t=1}^{T} G(p_t, y_t) ] ≤ K T^{−1/2}.    (10)

Theorem 2 shows that as the length of the planning horizon T grows, the regret of the DDA policy vanishes at the rate O(T^{−1/2}); hence the DDA policy is asymptotically optimal as T goes to infinity. Thus, even though the firm does not have prior knowledge about the demand process, the performance of the data-driven algorithm approaches the theoretical maximum as the planning horizon becomes long.
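The rate in Theorem 2 admits a quick back-of-the-envelope illustration: if the per-period profit loss of a consistent policy decays like c·t^{−1/2}, then the average loss over T periods is of order T^{−1/2}, since Σ_{t=1}^{T} t^{−1/2} ≤ 2√T. A minimal numeric sketch (the constant c and the loss profile are illustrative, not from the paper):

```python
# Toy illustration: if the per-period profit loss decays like c*t^{-1/2},
# the average loss over T periods is O(T^{-1/2}), matching the rate in Theorem 2.
# The constant c and the loss profile are illustrative, not from the paper.
import math

c = 5.0
for T in (100, 500, 1000, 5000, 10000):
    avg_loss = sum(c / math.sqrt(t) for t in range(1, T + 1)) / T
    # since sum_{t<=T} t^{-1/2} <= 2*sqrt(T), the average is at most 2c*T^{-1/2}
    assert avg_loss <= 2 * c / math.sqrt(T)
```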
In Keskin and Zeevi (2014), the authors consider a parametric data-driven pricing problem (with no inventory decision) in which the demand error term is additive and the average demand function is linear, and they prove that no learning algorithm can achieve a convergence rate better than O(T^{−1/2}). Our problem involves both pricing and inventory decisions, and the firm does not have prior knowledge about the parametric form of the underlying demand-price function
or the distribution of the random error, yet our algorithm achieves O(T^{−1/2}), which is the theoretical lower bound. One interesting implication of this finding is that the linear model in demand learning achieves the best regret rate one can hope for; thus our result offers further evidence for the sufficiency of Besbes and Zeevi's linear model.

A Numerical Study. We perform a numerical study on the performance of the DDA algorithm and present our numerical results on the regret. We consider two demand-curve environments for λ(p):

(1) exponential, λ(p) = e^{k−mp}, with k drawn from [0.1, 1.7] and m drawn from [0.3, 2];
(2) logit, λ(p) = e^{k−mp}/(1 + e^{k−mp}), with k drawn from [−0.3, 1] and m drawn from [2, 2.5].

And we consider five error distributions for ε_t: (i) truncated normal on [0.5, 1.5] with mean 1 and variance 0.1; (ii) truncated normal on [0.5, 1.5] with mean 1 and variance 0.25; (iii) truncated normal on [0.5, 1.5] with mean 1 and variance 0.35; (iv) truncated normal on [0.5, 1.5] with mean 1 and variance 0.5; (v) uniform on [0.5, 1.5]. Here a truncated normal on [a, b] with mean µ and variance σ² is defined as the random variable X conditioned on X ∈ [a, b], where X is normally distributed with mean µ and variance σ².

Following Besbes and Zeevi (2015), for each combination of the above demand curve–error distribution specifications, we randomly draw 500 instances of the parameters k and m according to uniform distributions on the intervals above. For each draw, we compute the percentage profit loss per period, defined by

    R(π, T)/G(p*, y*) × 100%.

Then we compute the average profit loss per period over the 500 draws and report it in the tables below. In all the experiments, we set p^l = 0.51, p^h = 4, y^l = 0, y^h = 3, b = 1, h = 0.1, I_0 = 1, initial price p̂_1 = 1, and initial order-up-to levels ŷ_{1,1} = 1, ŷ_{1,2} = 0.3. We test two values of ρ, ρ = 0.5 and ρ = 0.75, and two values of v, namely v = 1.3 and v = 2. Table 1 summarizes the results when the underlying demand curve is exponential, and Table 2 displays the results when the underlying demand curve is logit.
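As a concrete sketch of this experimental setup, the following draws one exponential-demand instance and samples the truncated-normal error by rejection; the sampling method and random seed are our own choices, not specified in the paper:

```python
# Sketch of the numerical-study setup: draw one random exponential-demand
# instance and sample the truncated-normal multiplicative error on [0.5, 1.5].
# Rejection sampling is one simple way to realize the truncation; the paper
# does not prescribe a sampling method, and the seed is arbitrary.
import math
import random

def truncated_normal(mu, sigma, lo, hi, rng):
    """X ~ Normal(mu, sigma^2) conditioned on lo <= X <= hi."""
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

rng = random.Random(0)
# one instance draw, exponential environment: k ~ U[0.1, 1.7], m ~ U[0.3, 2]
k = rng.uniform(0.1, 1.7)
m = rng.uniform(0.3, 2.0)
lam = lambda p: math.exp(k - m * p)

# one sample path of demands at a fixed price, D_t = lam(p) * eps_t
p = 1.0
eps = [truncated_normal(1.0, math.sqrt(0.1), 0.5, 1.5, rng) for _ in range(10000)]
demands = [lam(p) * e for e in eps]
assert all(0.5 <= e <= 1.5 for e in eps)  # truncation respected
```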
Table 1: Exponential Demand

                       T = 100        T = 500        T = 1000       T = 5000       T = 10000
                ρ     v=1.3   v=2    v=1.3   v=2    v=1.3   v=2    v=1.3   v=2    v=1.3   v=2
Normal         0.5     6.83   6.1     3.39   2.46    2.54   1.71    1.5    0.86    0.87   0.6
 σ² = 0.1      0.75    6.84   6.31    3.65   2.59    2.89   1.84    1.39   1.06    0.95   0.76
Normal         0.5    15.36  12.75    8.73   6.55    6.74   4.76    3.48   2.31    2.67   1.69
 σ² = 0.25     0.75   11.70   9.74    6.48   4.58    5.1    3.39    2.60   1.78    1.8    1.7
Normal         0.5    18.0   15.1    11.04   8.09    8.65   5.77    4.55   3.03    3.39   2.4
 σ² = 0.35     0.75   13.6   10.83    7.64   5.18    5.91   3.76    3.08   2.03    2.6    1.51
Normal         0.5    20.03  16.55   12.07   9.47    9.40   6.87    5.11   3.54    3.88   2.64
 σ² = 0.5      0.75   14.84  12.15    8.41   6.1     6.59   4.44    3.51   2.41    2.54   1.76
Uniform        0.5    18.53  15.0     9.98   7.18    7.59   5.39    3.69   2.6     2.58   1.86
               0.75   14.08  11.11    8.1    5.57    6.49   4.2     3.41   2.54    2.40   1.85
Maximum               20.03  16.55   12.07   9.47    9.40   6.87    5.11   3.54    3.88   2.64
Average               14.00  11.58    7.95   5.78    6.19   4.2     3.1    2.2     2.34   1.6

Combining both tables, one sees that when T = 100 periods, the average profit loss of the DDA algorithm falls between 11% and 14% compared to the optimal profit under complete information, even though DDA starts with no prior knowledge about the underlying demand. When T = 500, the profit loss is further reduced to between 5% and 8%. The performance improves steadily as T becomes larger. It is also seen from the tables that the overall performance of the algorithm is better when the variance of the demand is smaller, which is intuitive.

As mentioned earlier, Theorems 1 and 2 continue to hold for the additive demand model D_t(p_t) = λ(p_t) + ε_t with minor modifications. Specifically, we need to modify Assumption 1 to Assumption 1A below.

Assumption 1A. The demand-price function λ(p) satisfies the following conditions:
(i) pλ(p) is unimodal in p on P.
(ii) −1 < λ″(p)λ(p)/(2(λ′(p))²) < 1 for all p ∈ P.

Note that these are exactly the same assumptions made in Besbes and Zeevi (2015) for the revenue management problem, and examples that satisfy Assumption 1A include (a) linear, λ(p) = k − mp with m > 0; (b) exponential, λ(p) = e^{k−mp} with m > 0; and (c) logit, λ(p) = e^{k−mp}/(1 + e^{k−mp}) with m > 0 and e^{k−mp} < 3 for all p ∈ P.

The learning algorithm for the additive demand model is similar to that of the multiplicative
demand case, except that there is no need to transform the problem using the logarithm of the deterministic portion of demand and the logarithm of the random demand error. Instead, the algorithm directly estimates λ(p) by an affine function and computes the biased samples of the random demand error in each iteration.

Table 2: Logit Demand

                       T = 100        T = 500        T = 1000       T = 5000       T = 10000
                ρ     v=1.3   v=2    v=1.3   v=2    v=1.3   v=2    v=1.3   v=2    v=1.3   v=2
Normal         0.5     6.80   5.6     4.35   2.30    2.63   1.63    1.6    0.89    0.85   0.63
 σ² = 0.1      0.75   10.09   8.34    3.4    3.67    4.4    2.67    2.15   1.60    1.45   1.15
Normal         0.5    13.7    9.57    6.83   4.44    4.98   3.17    2.34   1.56    1.66   1.10
 σ² = 0.25     0.75   12.58   9.86    6.89   4.51    5.4    3.30    2.67   1.87    1.81   1.35
Normal         0.5    17.13  12.5     8.65   6.01    6.5    4.10    3.04   1.98    2.1    1.41
 σ² = 0.35     0.75   13.84  10.49    7.49   4.85    5.8    3.55    2.85   2.00    1.96   1.43
Normal         0.5    19.38  13.75    9.99   6.5     7.31   4.57    3.35   2.18    2.34   1.57
 σ² = 0.5      0.75   14.49  11.30    7.84   5.4     6.07   3.79    3.00   2.11    2.05   1.51
Uniform        0.5    21.0   15.9     9.51   6.0     7.16   4.46    3.36   2.39    2.9    1.7
               0.75   17.46  14.63   10.44   6.97    8.74   5.35    4.81   3.63    3.38   2.73
Maximum               21.0   15.9    10.44   6.97    8.74   5.35    4.81   3.63    3.38   2.73
Average               14.67  11.14    7.54   5.07    5.91   3.66    2.88   2.0     1.99   1.46

4 Sketches of the Proof

In this section, we present the main ideas and steps in proving the main results of this paper. In the first subsection, we elaborate on the technical issues encountered in the proofs. The key ideas of the proofs are discussed in Subsection 4.2, and the major steps of the proofs of Theorems 1 and 2 are given in Subsections 4.3 and 4.4, respectively.

4.1 Technical issues encountered

To prove Theorem 1, we will need to show

    E[(p̂_{i+1} − p*)²] → 0,  E[(p̂_{i+1} + δ_{i+1} − p*)²] → 0,  as i → ∞;    (11)
    E[(y* − ŷ_{i+1,1})²] → 0,  E[(y* − ŷ_{i+1,2})²] → 0,  as i → ∞,    (12)
where p* is the optimal solution of

    max_{p∈P} Q(p, λ(p)) = max_{p∈P} { p e^{λ(p)} E[e^ε] − J(λ(p)) },

where J(λ(p)) is defined as

    J(λ(p)) = min_{y∈Y} { h E[y − e^{λ(p)} e^ε]^+ + b E[e^{λ(p)} e^ε − y]^+ }.

However, both Q(·,·) and J(·) are unknown to the firm because none of the expectations can be computed. To estimate J(·), in (8) of the learning algorithm we use the data-driven biased estimate

    J^{DD}_{i+1}(α̂_{i+1} − β̂_{i+1} p) = min_{y∈Y} (1/(2I_i)) Σ_{t=t_i+1}^{t_i+2I_i} { h (y − e^{α̂_{i+1} − β̂_{i+1} p} e^{η_t})^+ + b (e^{α̂_{i+1} − β̂_{i+1} p} e^{η_t} − y)^+ },

and p̂_{i+1} is the optimal solution of

    max_{p∈P} Q^{DD}_{i+1}(p, α̂_{i+1} − β̂_{i+1} p) = max_{p∈P} { p e^{α̂_{i+1} − β̂_{i+1} p} (1/(2I_i)) Σ_{t=t_i+1}^{t_i+2I_i} e^{η_t} − J^{DD}_{i+1}(α̂_{i+1} − β̂_{i+1} p) },

in which Q^{DD}_{i+1}(·,·) is random and is constructed from the biased random samples η_t.

To prove the convergence of the data-driven solutions to the true optimal solution, we face two major challenges. The first is the comparison between J^{DD}_{i+1}(α̂_{i+1} − β̂_{i+1} p) and J(λ(p)) as functions of p. In J^{DD}_{i+1}, the true demand-price function is replaced by a linear estimate and, due to the lack of knowledge about the distribution of the random error, the expectation is replaced by an arithmetic average over the biased samples η_t (not true samples of the random error ε_t). To put it differently, the objective function for J^{DD}_{i+1} is not a sample average approximation, but a biased-sample average approximation. The second challenge lies in the comparison of Q^{DD}_{i+1}(p, α̂_{i+1} − β̂_{i+1} p) and Q(p, λ(p)). Since Q^{DD}_{i+1} is a function of J^{DD}_{i+1}, which is the minimum of a biased-sample average approximation, the errors from replacing ε_t by η_t carry over to Q^{DD}_{i+1}, making it difficult to compare (p̂_{i+1}, ŷ_{i+1,1}) and (p̂_{i+1} + δ_{i+1}, ŷ_{i+1,2}) with (p*, y*).

To overcome the first difficulty, we establish several important properties of the newsvendor problem and bound the errors of the biased samples (Lemmas A2, A3, A4, and A8 in the Appendix). For the second, we identify high-probability events on which uniform convergence of the data-driven objective functions can be obtained (Lemmas A1, A5, A6, and A7 in the Appendix). We note that in the revenue management problem setting, Besbes and Zeevi (2015) also prove the convergence result (11).
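To make the biased-sample structure concrete, here is a generic sketch of how an affine fit to log demand yields residuals that play the role of the biased error samples η_t. It mirrors the structure of α̂, β̂, and η_t discussed above but is not the paper's exact estimator; all parameter values are illustrative:

```python
# Generic sketch: fit log D_t ≈ α − β·p_t by least squares over an exploration
# stage, and take the residuals η_t as (biased) samples of the log demand error.
# This is an illustration of the biased-sample idea, not the paper's estimator (8).
import random

rng = random.Random(1)
alpha_true, beta_true = 1.0, 0.8

# two test prices (price experimentation); log-demands with bounded noise
prices = [1.0] * 50 + [1.3] * 50
logD = [alpha_true - beta_true * p + rng.uniform(-0.2, 0.2) for p in prices]

# ordinary least squares of log D on p
n = len(prices)
pbar = sum(prices) / n
ybar = sum(logD) / n
beta_hat = -sum((p - pbar) * (y - ybar) for p, y in zip(prices, logD)) / sum(
    (p - pbar) ** 2 for p in prices
)
alpha_hat = ybar + beta_hat * pbar
# residuals: these are NOT i.i.d. copies of the true error -- they are biased
eta = [y - (alpha_hat - beta_hat * p) for p, y in zip(prices, logD)]

assert abs(sum(eta)) < 1e-9  # residuals sum to ~0 by construction of OLS
```

The last assertion makes the bias visible: by construction the residuals are centered at the fitted line rather than being independent draws of the true error, which is exactly why J^{DD} is a biased-sample average approximation.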
In Besbes and Zeevi (2015), p* is the optimal solution of max_{p∈P} Q(p, λ(p)), and p̂_{i+1} is the optimal solution of max_{p∈P} Q(p, α̂_{i+1} − β̂_{i+1} p), where Q(·,·) is a known and deterministic function, Q(p, λ(p)) = pλ(p). As Besbes and Zeevi (2015) point out,
their analysis extends to more general functions Q(p, λ(p)) in which Q(·,·) is a known deterministic function. This, however, is not true in our setting: Q(·,·) is not known and, as a matter of fact, one cannot even find an unbiased sample average to estimate Q(·,·). Therefore, the challenges discussed above were not present in Besbes and Zeevi (2015).

4.2 Main ideas of the proof

To compare the policy and the resulting profit of the DDA algorithm with those of the optimal solution, we first note that the two problems differ along several dimensions. For example, in DDA we approximate λ(p) by an affine function whose parameters are re-estimated in each iteration, and we approximate the expected revenue and the expected holding and shortage costs by biased sample averages. These differences make a direct comparison of the two problems difficult. Therefore, we introduce several intermediate bridging problems, and in each step we compare two adjacent problems that differ in just one dimension. For convenience, we follow Besbes and Zeevi (2015) and introduce the notation

    ᾰ(z) = λ(z) − λ′(z)z,  β̆(z) = −λ′(z),  z ∈ P.    (13)

We proceed to prove (11) as follows:

    E[(p* − p̂_{i+1})²] ≤ E[( |p* − p(ᾰ(p̂_i), β̆(p̂_i))| + |p(ᾰ(p̂_i), β̆(p̂_i)) − p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i))| + |p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) − p̂_{i+1}| )²],    (14)

where the first term inside the parentheses compares problems CI and B1 (Lemma A1), the second compares problems B1 and B2 (Lemma A5), and the third compares problems B2 and DD (Lemmas A6 and A7). Here the two new prices p(·,·) and p̃_{i+1}(·,·) are the optimal solutions of two bridging problems. Specifically, p(α, β) denotes the optimal solution of the first bridging problem, B1:

    Bridging Problem B1:  max_{p∈P} { p e^{α−βp} E[e^ε] − min_{y∈Y} { h E[y − e^{α−βp} e^ε]^+ + b E[e^{α−βp} e^ε − y]^+ } },    (15)

while p̃_{i+1}(α, β) denotes the optimal solution of the second bridging problem, B2:

    Bridging Problem B2:  max_{p∈P} { p e^{α−βp} (1/(2I_i)) Σ_{t=t_i+1}^{t_i+2I_i} e^{ε_t} − min_{y∈Y} (1/(2I_i)) Σ_{t=t_i+1}^{t_i+2I_i} [ h (y − e^{α−βp} e^{ε_t})^+ + b (e^{α−βp} e^{ε_t} − y)^+ ] }.    (16)
Moreover, for given p ∈ P, we let y(e^{α−βp}) denote the optimal order-up-to level for problem B1, and ỹ_{i+1}(e^{α−βp}) denote the optimal order-up-to level for problem B2. By Lemma A2 in the Appendix, the objective functions of problems B1 and B2 are unimodal in p after minimizing over y ∈ Y when β > 0. Comparing (15) with (4), it is seen that problem B1 simplifies problem CI by replacing the demand-price function λ(p) with a linear function α − βp, while problem B2 is obtained from problem B1 by replacing the mathematical expectations in problem B1 with their sample averages, i.e., problem B2 is the SAA of problem B1. Comparing (16) with (8), it is noted that problems B2 and DD differ in the coefficients of the linear function as well as in the arithmetic averages. More specifically, in B2 the real random error samples ε_t, t = t_i+1, ..., t_i+2I_i, are used, while in problem DD the biased error samples η_t are used in place of ε_t, t = t_i+1, ..., t_i+2I_i.

Furthermore, note that the optimal prices for problems CI and B1, p* and p(ᾰ(p̂_i), β̆(p̂_i)), are deterministic, but the optimal solutions of problems B2 and DD, p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) and p̂_{i+1}, are random. Specifically, p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) is random because ε_t is random, while p̂_{i+1} is random due to demand uncertainty from periods 1 to t_{i+1}. Hence, to show that the right-hand side of (14) converges to 0, we first develop an upper bound for |p* − p(ᾰ(p̂_i), β̆(p̂_i))| by comparing problems CI and B1; the result is presented in Lemma A1. Since p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) is random, we compare problems B1 and B2 and show that the probability that |p(ᾰ(p̂_i), β̆(p̂_i)) − p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i))| exceeds some small number diminishes to 0 (Lemma A5). Similarly, in Lemmas A6 and A7 we compare problems B2 and DD and show that the probability that |p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) − p̂_{i+1}| exceeds some small number also diminishes to 0. Finally, we combine these results to complete the proof of (11). The idea for proving (12) is similar, and it also relies heavily on the two bridging problems (Lemmas A6, A7, and A8). The detailed proofs of Theorems 1 and 2 are given in Subsections 4.3 and 4.4.
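The inner minimization over y in bridging problem B2 (and likewise in problem DD) is a sample-average newsvendor problem, whose minimizer is attained at an empirical b/(b+h) quantile of the sampled demand values. This closed-form quantile solution is a standard SAA fact, stated here for illustration rather than taken from the text above:

```python
# SAA of the newsvendor inner problem in B2:
#   min_y (1/N) * sum_t [ h*(y - d_t)^+ + b*(d_t - y)^+ ]
# The minimum is attained at the empirical b/(b+h) quantile of the samples d_t
# (standard newsvendor result; the demand samples below are illustrative).
import math
import random

def saa_cost(y, demands, h, b):
    return sum(h * max(y - d, 0.0) + b * max(d - y, 0.0) for d in demands) / len(demands)

def saa_order_up_to(demands, h, b):
    d = sorted(demands)
    k = math.ceil(len(d) * b / (b + h)) - 1   # order statistic index (0-based)
    return d[min(max(k, 0), len(d) - 1)]

rng = random.Random(2)
h, b = 0.1, 1.0                               # cost parameters from the numerical study
demands = [rng.uniform(0.5, 1.5) for _ in range(200)]
y_star = saa_order_up_to(demands, h, b)

# the SAA cost is piecewise linear and convex, so the best kink is the minimizer
best = min(demands, key=lambda y: saa_cost(y, demands, h, b))
assert saa_cost(y_star, demands, h, b) <= saa_cost(best, demands, h, b) + 1e-12
```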
In the subsequent analysis, we assume that the space of feasible prices, P, and the space of order-up-to levels, Y, are large enough so that, for problem CI, the optimal solution p* and the optimal y(e^{λ(p)}) over R_+ for given p ∈ P fall into P and Y, respectively; and, for problem B1 with given q ∈ P, the optimal solutions p(ᾰ(q), β̆(q)) and y(e^{ᾰ(q)−β̆(q)p}) over R_+ for given p ∈ P fall into P and Y, respectively. Note that both problem CI and problem B1 depend only on primitive data and do not depend on random samples; hence these are mild assumptions. We remark that our results and analyses continue to hold even if these assumptions are not satisfied, as long as we modify Assumption 1(ii) to |∂p(ᾰ(z), β̆(z))/∂z| < 1 for z ∈ P. This condition reduces to Assumption 1(ii) when the optimal solutions of problems CI and B1 satisfy the feasibility conditions described above. We end this subsection by listing some regularity conditions needed to prove the main theoretical
results.

Regularity Conditions:
(i) y(e^{λ(q)}) and y(e^{ᾰ(q)−β̆(q)p}) are Lipschitz continuous in q for given p ∈ P, i.e., there exists some constant K_1 > 0 such that for any q_1, q_2 ∈ P,

    |y(e^{λ(q_1)}) − y(e^{λ(q_2)})| ≤ K_1 |q_1 − q_2|,    (17)
    |y(e^{ᾰ(q_1)−β̆(q_1)p}) − y(e^{ᾰ(q_2)−β̆(q_2)p})| ≤ K_1 |q_1 − q_2|.    (18)

(ii) G(p, ȳ(e^{λ(p)})) has a bounded second-order derivative with respect to p ∈ P.
(iii) E[D_t(p)] > 0 for any price p ∈ P.
(iv) λ(p) is twice differentiable with bounded first- and second-order derivatives on p ∈ P.
(v) The probability density function f(·) of ε_t satisfies min{f(x) : x ∈ [l, u]} > 0.

It can be seen that all the functions in Example 1 satisfy the regularity conditions above with appropriate choices of p^l and p^h.

4.3 Proof of Theorem 1

The proofs of the convergence results are technical and rely on several lemmas that are provided in the Appendix. In this subsection, we outline the main steps in establishing the first main result, Theorem 1.

Convergence of pricing decisions. To prove the convergence of the pricing decisions, we continue the development in (14) as follows:

    E[(p* − p̂_{i+1})²]
      ≤ E[( |p* − p(ᾰ(p̂_i), β̆(p̂_i))| + |p(ᾰ(p̂_i), β̆(p̂_i)) − p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i))| + |p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) − p̂_{i+1}| )²]
      ≤ E[( γ|p* − p̂_i| + |p(ᾰ(p̂_i), β̆(p̂_i)) − p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i))| + |p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) − p̂_{i+1}| )²]
      ≤ ((1 + γ²)/2) E[(p* − p̂_i)²] + K_2 E[( |p(ᾰ(p̂_i), β̆(p̂_i)) − p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i))| + |p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) − p̂_{i+1}| )²]
      ≤ ((1 + γ²)/2) E[(p* − p̂_i)²] + K_3 E[|p(ᾰ(p̂_i), β̆(p̂_i)) − p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i))|²] + K_3 E[|p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) − p̂_{i+1}|²],    (19)
where the first inequality follows from the expansion in (14), the second inequality follows from Lemma A1, the third inequality is justified by γ < 1 in Lemma A1 and some constant K_2, and the last inequality holds for some appropriately chosen K_3 because of the inequality (a + b)² ≤ 2(a² + b²) for any real numbers a and b.

To bound E[|p(ᾰ(p̂_i), β̆(p̂_i)) − p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i))|²] in (19), by Lemma A5 one has, for some constant K_4,

    E[|p(ᾰ(p̂_i), β̆(p̂_i)) − p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i))|²] ≤ K_4 ∫_0^∞ 5 e^{−4 I_i ξ²} dξ = (5√π/4) K_4 I_i^{−1/2}.    (20)

And to bound E[|p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) − p̂_{i+1}|²] in (19), by Lemmas A6 and A7, when i is large enough (greater than or equal to the threshold i₂ defined in the proof of Lemma A7), one has, for some positive constants K_5, K_6, and K_7,

    E[|p̃_{i+1}(ᾰ(p̂_i), β̆(p̂_i)) − p̂_{i+1}|²]
      ≤ K_5 E[( |ᾰ(p̂_i) − α̂_{i+1}| + |β̆(p̂_i) − β̂_{i+1}| + |ᾰ(p̂_i + δ_i) − α̂_{i+1}| + |β̆(p̂_i + δ_i) − β̂_{i+1}| )²] + 8 I_i^{−2} (p^h − p^l)²
      ≤ K_6 E[ |ᾰ(p̂_i) − α̂_{i+1}|² + |β̆(p̂_i) − β̂_{i+1}|² + |ᾰ(p̂_i + δ_i) − α̂_{i+1}|² + |β̆(p̂_i + δ_i) − β̂_{i+1}|² ] + 8 I_i^{−2} (p^h − p^l)²
      ≤ K_7 I_i^{−1/2}.    (21)

Substituting (20) and (21) into (19), one has

    E[(p* − p̂_{i+1})²] ≤ ((1 + γ²)/2) E[(p* − p̂_i)²] + K_8 I_i^{−1/2}.

Letting θ² = (1 + γ²)/2, we further obtain

    E[(p̂_{i+1} − p*)²] ≤ θ^{2i} (p̂_1 − p*)² + K_8 Σ_{j=0}^{i−1} θ^{2j} I_{i−j}^{−1/2} ≤ K_9 v^{−i/2} Σ_{j=0}^{i−1} θ^{2j} v^{j/2}.    (22)

We choose v > 1 satisfying θ² v^{1/2} < 1; then there exists a positive constant K_10 such that Σ_{j=0}^{i−1} θ^{2j} v^{j/2} ≤ K_10, and therefore, for some constants K_11 and K_12,

    E[(p̂_{i+1} − p*)²] ≤ K_11 v^{−i/2} ≤ K_12 I_i^{−1/2}.    (23)

Moreover, we have, for some positive constant K_13,

    E[(p̂_{i+1} + δ_{i+1} − p*)²] ≤ 2 E[(p̂_{i+1} − p*)²] + 2 δ_{i+1}² ≤ K_13 I_i^{−1/2} → 0,  as i → ∞.    (24)
This completes the proof of (11). Because mean-square convergence implies convergence in probability, this shows that the pricing decisions from DDA converge to p* in probability.

Convergence of inventory decisions. To prove that y_t converges to y* in probability, we first prove the convergence of the order-up-to levels, (12). For some constant K_14, we have

    E[(y* − ŷ_{i+1,1})²] ≤ K_14 E[( |y(e^{λ(p*)}) − y(e^{λ(p̂_{i+1})})| + |y(e^{λ(p̂_{i+1})}) − y(e^{ᾰ(p̂_{i+1}) − β̆(p̂_{i+1}) p̂_{i+1}})| + |y(e^{ᾰ(p̂_{i+1}) − β̆(p̂_{i+1}) p̂_{i+1}}) − y(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_{i+1}})| + |y(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_{i+1}}) − ỹ_{i+1}(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_{i+1}})| + |ỹ_{i+1}(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_{i+1}}) − ŷ_{i+1,1}| )²],    (25)

where the first term captures the difference between p* and p̂_{i+1}; the second term is zero; the third captures the difference between p̂_{i+1} and p̂_i; the fourth compares problems B1 and B2 (Lemma A8); and the fifth compares problems B2 and DD (Lemmas A6 and A7).

We want to upper bound each term on the right-hand side of (25). First, it follows from (17) that, for some constant K_15, it holds that E[|y(e^{λ(p*)}) − y(e^{λ(p̂_{i+1})})|²] ≤ K_15 E[|p* − p̂_{i+1}|²]. By the definitions of ᾰ(p) and β̆(p) in (13), one has ᾰ(p̂_{i+1}) − β̆(p̂_{i+1}) p̂_{i+1} = λ(p̂_{i+1}); thus the second term on the right-hand side of (25) vanishes. For the third term, we apply the Lipschitz condition on y(e^{ᾰ(q)−β̆(q)p}) in (18) to obtain, for some constants K_16 and K_17,

    E[|y(e^{ᾰ(p̂_{i+1}) − β̆(p̂_{i+1}) p̂_{i+1}}) − y(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_{i+1}})|²] ≤ K_16 E[|p̂_{i+1} − p̂_i|²] ≤ K_17 E[|p* − p̂_i|² + |p* − p̂_{i+1}|²].

By Lemma A8, we have, for some constants K_18 and K_19,

    E[|y(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_{i+1}}) − ỹ_{i+1}(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_{i+1}})|²] ≤ K_18 ∫_0^∞ e^{−4 I_i ξ²} dξ ≤ K_19 I_i^{−1/2},    (26)
and by Lemmas A6 and A7 one has, for some constant K_20,

    E[|ỹ_{i+1}(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_{i+1}}) − ŷ_{i+1,1}|²] ≤ K_20 E[ |ᾰ(p̂_i) − α̂_{i+1}|² + |β̆(p̂_i) − β̂_{i+1}|² + |ᾰ(p̂_i + δ_i) − α̂_{i+1}|² + |β̆(p̂_i + δ_i) − β̂_{i+1}|² ] ≤ K_20 I_i^{−1/2}.

Summarizing the analyses above, we obtain, for some constants K_21 and K_22,

    E[(y* − ŷ_{i+1,1})²] ≤ K_21 E[|p* − p̂_{i+1}|² + |p* − p̂_i|²] + K_21 I_i^{−1/2} ≤ K_22 I_i^{−1/2} → 0,  as i → ∞,    (27)

where the second inequality follows from the convergence rate of the pricing decisions. Similarly, we obtain

    E[(y* − ŷ_{i+1,2})²] ≤ K_22 I_i^{−1/2} → 0,  as i → ∞.

We next show that E[(y* − y_t)²] → 0 as t → ∞. It suffices to prove this for (a) t ∈ {t_{i+1} + 1, ..., t_{i+1} + I_{i+1}}, i = 1, 2, ..., and for (b) t ∈ {t_{i+1} + I_{i+1} + 1, ..., t_{i+1} + 2I_{i+1}}, i = 1, 2, .... We will only provide the proof for (a). The inventory order-up-to level prescribed by DDA for periods t ∈ {t_{i+1} + 1, ..., t_{i+1} + I_{i+1}} is ŷ_{i+1,1}. This, however, may not be achievable in some period t. Consider the event that the second order-up-to level of learning stage i, ŷ_{i,2}, is achieved during periods {t_i + I_i + 1, ..., t_i + 2I_i}. Since λ(p^h)l ≤ D_t ≤ λ(p^l)u, it follows from the Hoeffding inequality⁴ that for any ζ > 0,

    P( | Σ_{t=t_i+I_i+1}^{t_i+2I_i} D_t − E[ Σ_{t=t_i+I_i+1}^{t_i+2I_i} D_t ] | ≤ ζ ) ≥ 1 − 2 exp( −2ζ² / (I_i (λ(p^l)u − λ(p^h)l)²) ).    (28)

Let ζ = (λ(p^l)u − λ(p^h)l) I_i^{1/2} (log I_i)^{1/2} in (28); then one has

    P( Σ_{t=t_i+I_i+1}^{t_i+2I_i} D_t ≥ I_i E[D_{t_i+I_i+1}] − (λ(p^l)u − λ(p^h)l) I_i^{1/2} (log I_i)^{1/2} ) ≥ 1 − 2 I_i^{−2}.    (29)

By regularity condition (iii), E[D_{t_i+I_i+1}] > 0; thus when i is large enough, we will have

    (1/2) I_i E[D_{t_i+I_i+1}] ≥ (λ(p^l)u − λ(p^h)l) I_i^{1/2} (log I_i)^{1/2}.

⁴ If the random demand is not bounded, then the same result is obtained under the condition that the moment generating function of the random demand is finite around 0.
Hence it follows from (29) that, when i is large enough, we will have

    P( Σ_{t=t_i+I_i+1}^{t_i+2I_i} D_t ≥ (1/2) I_i E[D_{t_i+I_i+1}] ) ≥ 1 − 2 I_i^{−2}.    (30)

Define the event A_1 = { ω : Σ_{t=t_i+I_i+1}^{t_i+2I_i} D_t ≥ (1/2) I_i E[D_{t_i+I_i+1}] }; then (30) can be rewritten as P(A_1) ≥ 1 − 2 I_i^{−2}. Note that when i is large enough, (1/2) I_i E[D_{t_i+I_i+1}] > y^h − y^l, which means that on the event A_1, the accumulated demand during {t_i + I_i + 1, ..., t_i + 2I_i} is high enough to consume the initial on-hand inventory of period t_i + I_i + 1, and ŷ_{i,2} will be achieved. Therefore, for t ∈ {t_{i+1} + 1, ..., t_{i+1} + I_{i+1}}, y_t will satisfy y_t ∈ [ŷ_{i,2}, ŷ_{i+1,1}] if ŷ_{i+1,1} ≥ ŷ_{i,2}, and y_t ∈ [ŷ_{i+1,1}, ŷ_{i,2}] otherwise. Thus,

    E[(y* − y_t)²] = P(A_1) E[(y* − y_t)² | A_1] + P(A_1^c) E[(y* − y_t)² | A_1^c]
      ≤ max{ E[(y* − ŷ_{i,2})²], E[(y* − ŷ_{i+1,1})²] } + (2/I_i²)(y^h − y^l)².

As shown above, E[(y* − ŷ_{i,2})²] → 0 and E[(y* − ŷ_{i+1,1})²] → 0 as i → ∞. Hence it follows from 1/I_i → 0 as i → ∞ that E[(y* − y_t)²] → 0 for t ∈ {t_{i+1} + 1, ..., t_{i+1} + I_{i+1}} as i → ∞. Similarly one can prove that E[(y* − y_t)²] → 0 for t ∈ {t_{i+1} + I_{i+1} + 1, ..., t_{i+1} + 2I_{i+1}} as i → ∞. This proves E[(y* − y_t)²] → 0 as t → ∞. Again, since convergence in probability is implied by mean-square convergence, we conclude that the inventory decisions y_t of DDA also converge to y* in probability as t → ∞. This completes the proof of Theorem 1.

4.4 Proof of Theorem 2

We next prove the second main result, the convergence rate of the regret. By definition, the regret of the DDA policy is

    R(DDA, T) = (1/T) E[ Σ_{t=1}^{T} ( G(p*, y*) − G(p_t, y_t) ) ].
We have

    E[ Σ_{t=1}^{T} ( G(p*, y*) − G(p_t, y_t) ) ]
      ≤ E[ Σ_{i=1}^{n} ( Σ_{t=t_i+1}^{t_i+I_i} ( G(p*, y*) − G(p̂_i, ŷ_{i,1}) + G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) ) + Σ_{t=t_i+I_i+1}^{t_i+2I_i} ( G(p*, y*) − G(p̂_i + δ_i, ŷ_{i,2}) + G(p̂_i + δ_i, ŷ_{i,2}) − G(p_t, y_t) ) ) ]
      = E[ Σ_{i=1}^{n} I_i ( G(p*, y*) − G(p̂_i, ŷ_{i,1}) + G(p*, y*) − G(p̂_i + δ_i, ŷ_{i,2}) ) ]
        + E[ Σ_{i=1}^{n} ( Σ_{t=t_i+1}^{t_i+I_i} ( G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) ) + Σ_{t=t_i+I_i+1}^{t_i+2I_i} ( G(p̂_i + δ_i, ŷ_{i,2}) − G(p_t, y_t) ) ) ],    (31)

where n is the smallest number of stages that covers T, i.e., n is the smallest integer such that 2I_0 Σ_{i=1}^{n} v^i ≥ T, and it satisfies log_v( ((v−1)T)/(2I_0 v) + 1 ) ≤ n < log_v( ((v−1)T)/(2I_0 v) + 1 ) + 1. The inequality in (31) follows from the fact that the right-hand side includes 2I_0 Σ_{i=1}^{n} v^i periods, which is greater than or equal to T.

The first expectation on the right-hand side of (31) concerns the sum of the differences between the profit values of the DDA decisions and the optimal solution; hence its analysis relies on the convergence rates of the DDA policies, demonstrated in (23), (24), and (27). The second expectation on the right-hand side of (31) stems from the fact that, in the process of implementing DDA, the inventory decisions from DDA may not be implementable. This issue arises in learning algorithms for nonperishable inventory systems, and it presents additional challenges in evaluating the regret. We note that in Huh and Rusmevichientong (2009), a queueing approach is employed to resolve this issue for a pure inventory system with no pricing decisions.

To develop an upper bound for G(p*, y*) − G(p̂_i, ŷ_{i,1}) in (31), we first apply a Taylor expansion of G(p, y(e^{λ(p)})) at the point p*. Using the fact that the first-order derivative vanishes at p = p* and the assumption that the second-order derivative is bounded (regularity condition (ii)), we obtain, for some constant K_23 > 0, that

    G(p*, y(e^{λ(p*)})) − G(p̂_i, y(e^{λ(p̂_i)})) ≤ K_23 (p* − p̂_i)².    (32)

Noticing that y(e^{λ(p̂_i)}) maximizes the concave function G(p̂_i, y) for given p̂_i, we apply a Taylor expansion with respect to y at the point y = y(e^{λ(p̂_i)}) to yield, for some constant K_24,

    G(p̂_i, y(e^{λ(p̂_i)})) − G(p̂_i, ŷ_{i,1}) ≤ K_24 ( y(e^{λ(p̂_i)}) − ŷ_{i,1} )².    (33)
In addition, we have

    E[( y(e^{λ(p̂_i)}) − ŷ_{i,1} )²]
      ≤ E[( |y(e^{λ(p̂_i)}) − y(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_i})| + |y(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_i}) − y(e^{ᾰ(p̂_{i−1}) − β̆(p̂_{i−1}) p̂_i})| + |y(e^{ᾰ(p̂_{i−1}) − β̆(p̂_{i−1}) p̂_i}) − ỹ_i(e^{ᾰ(p̂_{i−1}) − β̆(p̂_{i−1}) p̂_i})| + |ỹ_i(e^{ᾰ(p̂_{i−1}) − β̆(p̂_{i−1}) p̂_i}) − ŷ_{i,1}| )²]
      ≤ K_25 E[ |y(e^{λ(p̂_i)}) − y(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_i})|² + |y(e^{ᾰ(p̂_i) − β̆(p̂_i) p̂_i}) − y(e^{ᾰ(p̂_{i−1}) − β̆(p̂_{i−1}) p̂_i})|² + |y(e^{ᾰ(p̂_{i−1}) − β̆(p̂_{i−1}) p̂_i}) − ỹ_i(e^{ᾰ(p̂_{i−1}) − β̆(p̂_{i−1}) p̂_i})|² + |ỹ_i(e^{ᾰ(p̂_{i−1}) − β̆(p̂_{i−1}) p̂_i}) − ŷ_{i,1}|² ].

This is similar to the right-hand side of (25), except that i+1 is replaced by i. Thus, using the same analysis as that for (25), we obtain, for some constant K_26,

    E[( y(e^{λ(p̂_i)}) − ŷ_{i,1} )²] ≤ K_26 I_{i−1}^{−1/2}.    (34)

Applying the results above, we obtain, for some constants K_27, K_28, and K_29, that

    E[ G(p*, y*) − G(p̂_i, ŷ_{i,1}) ] = E[ ( G(p*, y(e^{λ(p*)})) − G(p̂_i, y(e^{λ(p̂_i)})) ) + ( G(p̂_i, y(e^{λ(p̂_i)})) − G(p̂_i, ŷ_{i,1}) ) ]
      ≤ K_27 ( E[(p* − p̂_i)²] + E[( y(e^{λ(p̂_i)}) − ŷ_{i,1} )²] ) ≤ K_28 ( I_{i−1}^{−1/2} + I_{i−1}^{−1/2} ) = K_29 I_{i−1}^{−1/2},

where the first inequality follows from (32) and (33), and the second inequality follows from the convergence rate of the pricing decisions (23) and (34). Similarly, we establish, for some constants K_30, K_31 and K_32, that

    E[ G(p*, y*) − G(p̂_i + δ_i, ŷ_{i,2}) ] ≤ K_30 ( E[(p* − p̂_i − δ_i)²] + E[( y(e^{λ(p̂_i + δ_i)}) − ŷ_{i,2} )²] )
      ≤ K_30 E[ 2(p* − p̂_i)² + 2δ_i² ] + K_31 I_{i−1}^{−1/2} ≤ K_32 I_{i−1}^{−1/2}.

Note that, as seen from Lemma A7 in the Appendix, these results hold when i is greater than or equal to some number i₂.
Consequently, we have, for some constants K_33, K_34 and K_35,

    E[ Σ_{i=1}^{n} I_i ( G(p*, y*) − G(p̂_i, ŷ_{i,1}) + G(p*, y*) − G(p̂_i + δ_i, ŷ_{i,2}) ) ]
      ≤ K_34 + Σ_{i=i₂+1}^{n} K_33 I_{i−1}^{−1/2} I_i
      ≤ K_34 + Σ_{i=i₂+1}^{n} K_33 (I_0 v^{i−1})^{−1/2} I_0 v^i
      = K_34 + K_33 I_0^{1/2} v Σ_{i=i₂+1}^{n} v^{(i−1)/2}
      ≤ K_34 + K_33 I_0^{1/2} ( v/(v^{1/2} − 1) ) v^{n/2}
      ≤ K_34 + K_33 I_0^{1/2} ( v/(v^{1/2} − 1) ) v^{( log_v(((v−1)T)/(2I_0 v) + 1) + 1 )/2}
      ≤ K_35 T^{1/2},    (35)

where K_34 = Σ_{i=1}^{i₂} ( G(p*, y*) − G(p̂_i, ŷ_{i,1}) + G(p*, y*) − G(p̂_i + δ_i, ŷ_{i,2}) ) I_i.

We next evaluate the second term of (31), i.e.,

    E[ Σ_{i=1}^{n} ( Σ_{t=t_i+1}^{t_i+I_i} ( G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) ) + Σ_{t=t_i+I_i+1}^{t_i+2I_i} ( G(p̂_i + δ_i, ŷ_{i,2}) − G(p_t, y_t) ) ) ].    (36)

Recall from DDA that p_t = p̂_i for t = t_i+1, ..., t_i+I_i and p_t = p̂_i + δ_i for t = t_i+I_i+1, ..., t_i+2I_i, and that DDA sets two order-up-to levels for stage i, ŷ_{i,1} and ŷ_{i,2}, for the first and second I_i periods, respectively. The order-up-to levels may not be achievable, which happens when x_t > ŷ_{i,1} for some t = t_i+1, ..., t_i+I_i, or x_t > ŷ_{i,2} for some t = t_i+I_i+1, ..., t_i+2I_i. In such cases, y_t = x_t. If the inventory level before ordering at the beginning of the first I_i periods (in period t_i+1) or at the beginning of the second I_i periods (in period t_i+I_i+1) of stage i is higher than the DDA order-up-to level, then the inventory level will gradually decrease during those I_i periods until it drops to or below the order-up-to level.

We start with the analysis of the first I_i periods of stage i, i.e., E[ Σ_{t=t_i+1}^{t_i+I_i} ( G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) ) ]. A main issue in the analysis of this part is that, if x_{t_i+1} > ŷ_{i,1}, then ŷ_{i,1} is not achievable. To resolve this issue, we apply an argument similar to that in the proof of the second part of Theorem 1 to show that, if this is the case, then with very high probability, after a (relatively) small number of periods, the prescribed inventory order-up-to level will become achievable.
Consider the accumulative demands during periods t_i + 1 to t_i + I_i^{1/2}. If these accumulative demands consume at least x_{t_i+1} − ŷ_{i,1}, then at period t_i + I_i^{1/2}, ŷ_{i,1} will surely be achieved. Since λ(p^h)l ≤ D_t ≤ λ(p^l)u for t = 1, ..., T, by the Hoeffding inequality, for any ζ > 0 one has

    P( | Σ_{t=t_i+1}^{t_i+I_i^{1/2}} D_t − E[ Σ_{t=t_i+1}^{t_i+I_i^{1/2}} D_t ] | ≤ ζ ) ≥ 1 − 2 exp( −2ζ² / (I_i^{1/2} (λ(p^l)u − λ(p^h)l)²) ).    (37)

Let ζ = (λ(p^l)u − λ(p^h)l) (I_i^{1/2})^{1/2} (log I_i^{1/2})^{1/2}; then it follows from (37) that

    P( Σ_{t=t_i+1}^{t_i+I_i^{1/2}} D_t ≥ I_i^{1/2} E[D_{t_i+1}] − (λ(p^l)u − λ(p^h)l) (I_i^{1/2})^{1/2} (log I_i^{1/2})^{1/2} ) ≥ 1 − 2 I_i^{−1}.    (38)

By regularity condition (iii), E[D_{t_i+1}] > 0. Thus, when i is large enough, say greater than or equal to some number i₃, we will have

    (1/2) I_i^{1/2} E[D_{t_i+1}] ≥ (λ(p^l)u − λ(p^h)l) (I_i^{1/2})^{1/2} (log I_i^{1/2})^{1/2}  and  (1/2) I_i^{1/2} E[D_{t_i+1}] ≥ y^h − y^l ≥ x_{t_i+1} − ŷ_{i,1}.

Based on (38), we define the event A_2 as

    A_2 = { Σ_{t=t_i+1}^{t_i+I_i^{1/2}} D_t ≥ I_i^{1/2} E[D_{t_i+1}] − (λ(p^l)u − λ(p^h)l) (I_i^{1/2})^{1/2} (log I_i^{1/2})^{1/2} }.    (39)

Then (38) can be restated as

    P(A_2) ≥ 1 − 2 I_i^{−1}.    (40)

On the event A_2, the inventory order-up-to level ŷ_{i,1} will be achieved after periods
{t_i + 1, ..., t_i + I_i^{1/2}}. By (40), we have

    E[ Σ_{t=t_i+1}^{t_i+I_i} ( G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) ) ]
      = P(A_2) E[ Σ_{t=t_i+1}^{t_i+I_i} ( G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) ) | A_2 ] + P(A_2^c) E[ Σ_{t=t_i+1}^{t_i+I_i} ( G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) ) | A_2^c ]
      ≤ max{h, b}(y^h − y^l) I_i^{1/2} + (2/I_i) max{h, b}(y^h − y^l) I_i
      ≤ 2 max{h, b}(y^h − y^l) I_i^{1/2},

where the first inequality follows from the fact that, for periods t = t_i+1, ..., t_i+I_i,

    G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) = G(p̂_i, ŷ_{i,1}) − G(p̂_i, y_t) ≤ max{h, b}(y^h − y^l),

and P(A_2^c) ≤ 2/I_i. Similarly, for i large enough (greater than or equal to i₃), we can establish

    E[ Σ_{t=t_i+I_i+1}^{t_i+2I_i} ( G(p̂_i + δ_i, ŷ_{i,2}) − G(p_t, y_t) ) ] ≤ 2 max{h, b}(y^h − y^l) I_i^{1/2}.

Based on the analysis above, we upper bound (36). Let K_36 = Σ_{i=1}^{i₃} 2 max{h, b}(y^h − y^l) I_i; it can be seen that there exist some constants K_37 and K_38 such that

    E[ Σ_{i=1}^{n} ( Σ_{t=t_i+1}^{t_i+I_i} ( G(p̂_i, ŷ_{i,1}) − G(p_t, y_t) ) + Σ_{t=t_i+I_i+1}^{t_i+2I_i} ( G(p̂_i + δ_i, ŷ_{i,2}) − G(p_t, y_t) ) ) ]
      ≤ K_36 + Σ_{i=i₃+1}^{n} 4 max{h, b}(y^h − y^l) I_0^{1/2} v^{i/2}
      ≤ K_36 + 4 max{h, b}(y^h − y^l) I_0^{1/2} ( v^{1/2}/(v^{1/2} − 1) ) v^{n/2}
      ≤ K_36 + K_37 v^{(n+1)/2}
      ≤ K_36 + K_37 v^{( log_v(((v−1)T)/(2I_0 v) + 1) + 2 )/2}
      ≤ K_38 T^{1/2}.    (41)

By combining (35) and (41), we conclude that

    R(DDA, T) ≤ (1/T) ( K_35 T^{1/2} + K_38 T^{1/2} ) ≤ K_39 T^{−1/2}

for some constant K_39. The proof of Theorem 2 is thus complete.
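The achievability argument above leans on Hoeffding's inequality for bounded demands, as in (28) and (37). A small simulation sanity-checks the bound; the demand distribution, bounds, and window length below are illustrative only:

```python
# Empirical check of the Hoeffding bound used in (28)/(37): for i.i.d. demands
# bounded in [a, c],  P(|S_n - E[S_n]| >= zeta) <= 2*exp(-2*zeta^2/(n*(c-a)^2)).
# Distribution, bounds, and window length n are illustrative, not from the paper.
import math
import random

rng = random.Random(3)
a, c, n = 0.5, 1.5, 100
mean = (a + c) / 2
zeta = 0.5 * math.sqrt(n * math.log(n))   # the sqrt(n log n) scaling used in the proof

trials, exceed = 5000, 0
for _ in range(trials):
    s = sum(rng.uniform(a, c) for _ in range(n))
    if abs(s - n * mean) >= zeta:
        exceed += 1

bound = 2 * math.exp(-2 * zeta**2 / (n * (c - a) ** 2))
assert exceed / trials <= bound + 0.01    # empirical tail sits below the bound
```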
5 Conclusion

In this paper, we consider a joint pricing and inventory control problem in which the firm has no prior knowledge about the demand distribution or about customer response to selling prices. We impose virtually no explicit assumption on how the average demand changes with price (other than the fact that it is decreasing) or on the distribution of uncertainty in demand. This paper is the first to design a nonparametric (data-driven) learning algorithm for the dynamic joint pricing and inventory control problem and to present the convergence rates of its policies and profits to the optimal ones. The regret of the learning algorithm converges to zero at the rate O(T^{−1/2}), which is the theoretical lower bound.

There are a number of follow-up research topics. One is to develop an asymptotically optimal algorithm for the problem with lost sales and censored data. In the lost-sales case, the DDA algorithm proposed here cannot be directly applied, and the estimation and optimization problems are more challenging because the profit function of the data-driven problem is neither concave nor unimodal and the demand data are censored. Another interesting direction for research is to develop a data-driven learning algorithm for dynamic pricing and stocking decisions for multiple products in an assortment.

Acknowledgment: The authors are grateful to the Department Editor, the Associate Editor, and two referees for their constructive comments on an earlier version of this paper, which have helped to significantly improve the clarity and exposition of this paper.

References

[1] Agarwal A, Foster DP, Hsu DJ, Kakade SM, Rakhlin A (2011) Stochastic convex optimization with bandit feedback. Advances in Neural Information Processing Systems, 1035-1043.

[2] Auer P, Ortner R, Szepesvari C (2007) Improved rates for the stochastic continuum-armed bandit problem. Proceedings of the 20th International Conference on Learning Theory (COLT), 454-468.

[3] Besbes O, Zeevi A (2015) On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Sci. 61(4):723-739.
[4] Burnetas AN, Smith CE (2000) Adaptive ordering and pricing for perishable products. Oper. Res. 48(3):436-443.

[5] Chen X, Simchi-Levi D (2004) Coordinating inventory control and pricing strategies with random demand and fixed ordering cost: the finite horizon case. Oper. Res. 52(6):887-896.
[6] Chen X, Simchi-Levi D (2012) Pricing and inventory management. Philips R, Ozalp O, eds. The Oxford Handbook of Pricing Management. Oxford University Press, 784-822.

[7] Chung BD, Li J, Yao T (2011) Dynamic pricing and inventory control with nonparametric demand learning. Int. J. Services Operations and Informatics 6(3):259-271.

[8] Cope EW (2009) Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Transactions on Automatic Control 54(6):1243-1253.

[9] Elmaghraby W, Keskinocak P (2003) Dynamic pricing in the presence of inventory considerations: research overview, current practices, and future directions. Management Sci. 49(10):1287-1309.

[10] Federgruen A, Heching A (1999) Combined pricing and inventory control under uncertainty. Oper. Res. 47(3):454-475.

[11] Hazan E, Kalai A, Kale S, Agarwal A (2006) Logarithmic regret algorithms for online convex optimization. Learning Theory. Springer Berlin Heidelberg, 499-513.

[12] Heyman D, Sobel M (1984) Stochastic Models in Operations Research, Vol. II: Stochastic Optimization. McGraw-Hill, New York.

[13] Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58:13-30.

[14] Huh WT, Rusmevichientong P (2009) A nonparametric asymptotic analysis of inventory planning with censored demand. Math. Oper. Res. 34(1):103-123.

[15] Keskin NB, Zeevi A (2014) Dynamic pricing with an unknown demand model: asymptotically optimal semi-myopic policies. Oper. Res. 62(5):1142-1167.

[16] Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462-466.

[17] Kleinberg R (2005) Nearly tight bounds for the continuum-armed bandit problem. Advances in Neural Information Processing Systems, 697-704.

[18] Kleywegt AJ, Shapiro A, Homem-de-Mello T (2001) The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization 12(2):479-502.

[19] Lai TL, Robbins H (1981) Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes. Probability Theory and Related Fields 56(3):329-360.
[0] Lev R, Peraks G, Uchanco J 010) The data-drven newsvendor problem: new bounds and nsghts. Workng Paper. Massachusetts Insttute of Technology, Cambrdge, MA. [1] Lev R, Roundy RO, Shmoys DB 007) Provably near-optmal samplng-based polces for stochastc nventory control models. Math. Oper. Res. 34):81-839. 3
[] Petruzz NC, Dada M 1999) Prcng and the newsvendor problem: A revew wth extensons. Operatons Research 47):183-194. [3] Petruzz NC, Dada M 00) Dynamc prcng and nventory control wth learnng. Naval Res. Logst. 49:303-35. [4] Robbns H, Monro S 1951) A stochastc approxmaton method. Ann. Math. Statst. ):400-407. [5] Subrahmanyan S, Shoemaker R 1996) Developng optmal prcng and nventory polces for retalers who face uncertan demand. Journal of Retalng 71):7-30. [6] Whtn TM 1955) Inventory control and prce theory. Management Sc. 1):61-68. [7] Yano CA and Glbert SM 003) Coordnated prcng and producton/procurement decsons: A revew. J. Elashberg, A. Chakravarty, eds. Managng Busness Interfaces: Marketng, Engneerng, and Manufacturng Perspectves. Kluwer, Norwell, MA. [8] Zhang L, Chen J 006) Bayesan soluton to prcng and nventory control under unknown demand dstrbuton. Oper. Res. Lett. 345):517-54. [9] Znkevch M 003) Onlne convex programmng and generalzed nfntesmal gradent ascent. Proc. 0th Internat. Conf. Machne Learn. ICML-003) Washngton, D.C. 33
Appendix

In this Appendix, we provide the technical lemmas and proofs omitted in the main text. Lemma A1 compares the optimal solutions of problem (CI) and bridging problem (B1), i.e., $p^*$ and $p(\bar\alpha(\hat p_i), \bar\beta(\hat p_i))$.

Lemma A1. Under Assumption 1, there exists some number $\gamma \in [0, 1)$ such that for any $\hat p_i \in \mathcal{P}$, we have
$$ \big| p^* - p\big(\bar\alpha(\hat p_i), \bar\beta(\hat p_i)\big) \big| \le \gamma\, |p^* - \hat p_i|. $$

Proof. First we make the observation that
$$ p^* = p\big(\bar\alpha(p^*), \bar\beta(p^*)\big). \quad (42) $$
This result links the optimal solutions of (CI) and (B1) with parameters $(\bar\alpha(p^*), \bar\beta(p^*))$, and it shows that $p^*$ is a fixed point of $p(\bar\alpha(z), \bar\beta(z)) = z$. To see why it is true, let
$$ G(p, \lambda(p)) = p\, e^{\lambda(p)} E[e^{\epsilon}] - \min_{y \in Y} \Big\{ h\, E\big[y - e^{\lambda(p)} e^{\epsilon}\big]^+ + b\, E\big[e^{\lambda(p)} e^{\epsilon} - y\big]^+ \Big\}. \quad (43) $$
Then Assumption 1(ii) implies that $G(p, \lambda(p))$ is unimodal in $p$. Assuming that $G$ has a unique maximizer and that $p(\bar\alpha(z), \bar\beta(z))$ is the unique optimal solution for problem (B1) with parameters $(\bar\alpha(z), \bar\beta(z))$, (42) follows from Lemma A1 of Besbes and Zeevi (2015) by letting their function $G$ be (43).

When the optimal solution $y$ over $\mathbb{R}_+$ for problem (CI) for a given $p$ falls in $Y$, $p(\alpha, \beta)$ is the maximizer of $p\, e^{\alpha - \beta p} E[e^{\epsilon}] - A\, e^{\alpha - \beta p}$, where $A = \min_z \big\{ h\, E[z - e^{\epsilon}]^+ + b\, E[e^{\epsilon} - z]^+ \big\}$ is a constant. Thus $p(\alpha, \beta)$ satisfies
$$ \big(1 - \beta\, p(\alpha, \beta)\big) E[e^{\epsilon}] + A \beta = 0. $$
Letting $\alpha = \bar\alpha(z)$, $\beta = \bar\beta(z)$ and taking the derivative of $p(\bar\alpha(z), \bar\beta(z))$ with respect to $z$ yields
$$ \frac{d\, p(\bar\alpha(z), \bar\beta(z))}{dz} = \frac{\lambda''(z)}{(\lambda'(z))^2}. $$
By Assumption 1(iii), we have $\big| \lambda''(z) / (\lambda'(z))^2 \big| < 1$ for any $z \in \mathcal{P}$. This shows that
$$ \big| p\big(\bar\alpha(p^*), \bar\beta(p^*)\big) - p\big(\bar\alpha(\hat p_i), \bar\beta(\hat p_i)\big) \big| \le \gamma\, |p^* - \hat p_i|, $$
where $\gamma = \max_{z \in \mathcal{P}} \big| d\, p(\bar\alpha(z), \bar\beta(z)) / dz \big| < 1$. This proves Lemma A1. ∎
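The contraction argument in Lemma A1 can be illustrated numerically: iterating any map whose derivative is bounded by $\gamma < 1$ drives the iterates to the fixed point geometrically. The map g below is a hypothetical stand-in with slope 0.4, not the paper's actual pricing map; it is only a sketch of the fixed-point mechanism.

```python
# Toy illustration of the contraction behind Lemma A1: a map with |g'| = 0.4 < 1.
# g and its fixed point (p* = 2.0) are illustrative assumptions, not from the paper.

def g(z):
    return 0.4 * z + 1.2  # hypothetical contraction; fixed point solves g(p) = p

p = 10.0
gaps = []
for _ in range(6):
    gaps.append(abs(p - 2.0))  # distance to the fixed point
    p = g(p)

# Each iteration shrinks the distance to the fixed point by the factor gamma = 0.4.
ratios = [gaps[k + 1] / gaps[k] for k in range(5)]
print(ratios)
```

Each ratio equals the contraction modulus $\gamma$, mirroring the bound $|p^* - p(\bar\alpha(\hat p_i), \bar\beta(\hat p_i))| \le \gamma |p^* - \hat p_i|$.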
To compare the optimal solutions of problems (B1) and (B2), we need several technical lemmas. To that end, we change the decision variables in (B1) and (B2). For given parameters $\alpha$ and $\beta > 0$, define $d = e^{\alpha - \beta p}$ and $D = [d_l, d_h]$, where $d_l = e^{\alpha - \beta p_h}$ and $d_h = e^{\alpha - \beta p_l}$. Then problem (B1) can be rewritten as
$$ \max_{d \in D} \Big\{ d\, \frac{\alpha - \log d}{\beta}\, E[e^{\epsilon}] - \min_{y \in Y} \big\{ h\, E[y - d e^{\epsilon}]^+ + b\, E[d e^{\epsilon} - y]^+ \big\} \Big\}. $$
Define
$$ W(d, y) = h\, E(y - d e^{\epsilon})^+ + b\, E(d e^{\epsilon} - y)^+ \quad (44) $$
and
$$ G(\alpha, \beta, d) = d\, \frac{\alpha - \log d}{\beta}\, E[e^{\epsilon}] - \min_{y \in Y} W(d, y) = d\, \frac{\alpha - \log d}{\beta}\, E[e^{\epsilon}] - W(d, y(d)), \quad (45) $$
where $y(d)$ is the optimal solution of (44) in $Y$ for given $d$. Let $F(\cdot)$ be the cumulative distribution function (CDF) of $e^{\epsilon}$; then it can be verified that
$$ y(d) = d\, F^{-1}\Big( \frac{b}{b + h} \Big), \quad (46) $$
where $F^{-1}(\cdot)$ is the inverse function of $F(\cdot)$. Also, we let $d(\alpha, \beta)$ denote the optimal solution maximizing (45) in $D$.

Similarly, we reformulate problem (B2) with decision variables $d$ and $y$ as
$$ \max_{d \in D} \Big\{ d\, \frac{\alpha - \log d}{\beta} \Big( \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} e^{\epsilon_t} \Big) - \min_{y \in Y} \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big( h (y - d e^{\epsilon_t})^+ + b (d e^{\epsilon_t} - y)^+ \big) \Big\}. $$
Define
$$ \widetilde W_{i+1}(d, y) = \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big( h (y - d e^{\epsilon_t})^+ + b (d e^{\epsilon_t} - y)^+ \big) \quad (47) $$
and
$$ \widetilde G_{i+1}(\alpha, \beta, d) = d\, \frac{\alpha - \log d}{\beta} \Big( \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} e^{\epsilon_t} \Big) - \min_{y \in Y} \widetilde W_{i+1}(d, y) = d\, \frac{\alpha - \log d}{\beta} \Big( \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} e^{\epsilon_t} \Big) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)), \quad (48) $$
where $\tilde y_{i+1}(d)$ denotes the optimal solution of $\widetilde W_{i+1}(d, y)$ in (47) on $Y$. Let $\tilde d_{i+1}(\alpha, \beta)$ be the optimal solution for $\widetilde G_{i+1}(\alpha, \beta, d)$ in (48) on $D$. Also, let $\tilde y^u_{i+1}(d)$ denote the optimal order-up-to
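The critical-fractile solution (46) can be checked numerically: for a given $d$, the minimizer of $W(d, y)$ is $d$ times the $b/(b+h)$-quantile of $e^{\epsilon}$. The sketch below assumes, purely for illustration, that $e^{\epsilon}$ is uniform on $[l, u]$ (so the quantile is available in closed form) and compares it against a brute-force grid minimization of the sample-average cost.

```python
import random

# Sketch of (46): y(d) = d * F^{-1}(b/(b+h)) minimizes W(d, y).
# Assumption for illustration only: e^eps ~ Uniform[l, u].

def fractile_solution(d, b, h, l, u):
    """Closed-form y(d) when e^eps ~ Uniform[l, u]: F^{-1}(q) = l + q*(u - l)."""
    q = b / (b + h)
    return d * (l + q * (u - l))

def empirical_cost(y, d, b, h, sample):
    """Sample-average estimate of W(d, y) = h*E[(y - d*e)^+] + b*E[(d*e - y)^+]."""
    n = len(sample)
    return sum(h * max(y - d * e, 0.0) + b * max(d * e - y, 0.0) for e in sample) / n

random.seed(0)
d, b, h, l, u = 2.0, 9.0, 1.0, 0.5, 1.5
sample = [random.uniform(l, u) for _ in range(5000)]
y_star = fractile_solution(d, b, h, l, u)  # 2 * (0.5 + 0.9 * 1.0) = 2.8
# Brute-force check: y_star should approximately minimize the empirical cost.
grid = [d * l + k * d * (u - l) / 100 for k in range(101)]
y_grid = min(grid, key=lambda y: empirical_cost(y, d, b, h, sample))
print(round(y_star, 3), round(y_grid, 3))
```

The grid minimizer agrees with the fractile formula up to sampling and grid error, which is the finite-sample gap that Lemma A3 below controls.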
level for problem (B2) on $\mathbb{R}_+$ for given $p \in \mathcal{P}$ (here the superscript $u$ stands for "unconstrained"). Then
$$ \tilde y^u_{i+1}(d) = \min\Big\{ d e^{\epsilon_j} : \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} 1\{ e^{\epsilon_t} \le e^{\epsilon_j} \} \ge \frac{b}{b + h} \Big\}, \quad (49) $$
where $1\{A\}$ is the indicator function taking value 1 if $A$ is true and 0 otherwise. It can be checked that
$$ \tilde y_{i+1}(d) = \min\big\{ \max\{ \tilde y^u_{i+1}(d), y_l \}, y_h \big\}. \quad (50) $$
Since $\tilde y^u_{i+1}(d)$ is random, it is possible for $\tilde y_{i+1}(d)$ to take value at the boundary, $y_h$ or $y_l$.

We first compare the profit functions defined for the two problems, (44)-(45) and (47)-(48). To this end, we need the following properties.

Lemma A2. If $\beta > 0$, then both $G(\alpha, \beta, d)$ and $\widetilde G_{i+1}(\alpha, \beta, d)$ are concave in $d \in D$, and both $G(\alpha, \beta, e^{\alpha - \beta p})$ and $\widetilde G_{i+1}(\alpha, \beta, e^{\alpha - \beta p})$ are unimodal in $p \in \mathcal{P}$.

Proof. It is easily seen that $W(d, y)$ and $\widetilde W_{i+1}(d, y)$ are both jointly convex in $(d, y)$, hence $\min_{y \in Y} W(d, y)$ and $\min_{y \in Y} \widetilde W_{i+1}(d, y)$ are convex in $d$ (Proposition B-4 of Heyman and Sobel (1984)). Therefore, the results follow from the fact that the first term of $G$ (and of $\widetilde G_{i+1}$) is concave in $d$ when $\beta > 0$. The unimodality of $G(\alpha, \beta, e^{\alpha - \beta p})$ and $\widetilde G_{i+1}(\alpha, \beta, e^{\alpha - \beta p})$ follows from the concavity of $G$ and $\widetilde G_{i+1}$, and the fact that $e^{\alpha - \beta p}$ is strictly decreasing in $p$ when $\beta > 0$. ∎

The following important result shows that, for any given $d$, $W(d, y(d))$ and $\widetilde W_{i+1}(d, \tilde y_{i+1}(d))$ are close to each other with high probability.

Lemma A3. There exists a positive constant $K_{40}$ such that, for any $\xi > 0$,
$$ P\Big\{ \max_{d \in D} \big| W(d, y(d)) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)) \big| \le K_{40}\, \xi \Big\} \ge 1 - 4 e^{-2 I_i \xi^2}. $$

Proof. By the triangle inequality, we have
$$ \max_{d \in D} \big| W(d, y(d)) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)) \big| \le \max_{d \in D} \big| W(d, y(d)) - \widetilde W_{i+1}(d, y(d)) \big| + \max_{d \in D} \big| \widetilde W_{i+1}(d, y(d)) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)) \big|. \quad (51) $$
In what follows we develop upper bounds for $\max_{d \in D} | W(d, y(d)) - \widetilde W_{i+1}(d, y(d)) |$ and $\max_{d \in D} | \widetilde W_{i+1}(d, y(d)) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)) |$ separately. For any $d \in D$ and $y \in Y$, we define $z = y/d$. Then, from (46), the optimal $z$ minimizing $W(d, dz)$ is
$$ z^* = \frac{y(d)}{d} = F^{-1}\Big( \frac{b}{b + h} \Big). $$
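The sample-based order-up-to level in (49)-(50) is just the empirical $b/(b+h)$-quantile of the demand sample, projected onto $[y_l, y_h]$. The sketch below implements that two-step rule; the demand sample and parameter values are illustrative assumptions.

```python
import random

# Sketch of (49)-(50): empirical-quantile order-up-to level, then projection.
# The uniform sample for e^eps and all parameter values are illustrative.

def order_up_to(d, b, h, eps_sample, y_l, y_h):
    vals = sorted(d * e for e in eps_sample)
    q = b / (b + h)
    n = len(vals)
    # Smallest d*e^{eps_j} at which the empirical CDF reaches b/(b+h), as in (49).
    k = next(j for j in range(n) if (j + 1) / n >= q)
    y_u = vals[k]
    # Projection onto [y_l, y_h], as in (50).
    return min(max(y_u, y_l), y_h)

random.seed(1)
sample = [random.uniform(0.5, 1.5) for _ in range(4000)]
y = order_up_to(d=2.0, b=9.0, h=1.0, eps_sample=sample, y_l=0.0, y_h=10.0)
print(round(y, 2))  # near 2 * (0.5 + 0.9) = 2.8 for this uniform sample
```

A tighter upper bound $y_h$ makes the projection bind, which is exactly the boundary case the text notes for $\tilde y_{i+1}(d)$.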
Moreover, we have $W(d, y(d)) = W(d, d z^*) = d \big( h\, E(z^* - e^{\epsilon})^+ + b\, E(e^{\epsilon} - z^*)^+ \big)$, and
$$ \widetilde W_{i+1}(d, y(d)) = \widetilde W_{i+1}(d, d z^*) = \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big( h (z^* - e^{\epsilon_t})^+ + b (e^{\epsilon_t} - z^*)^+ \big). \quad (52) $$
For $t \in \{ t_i + 1, \ldots, t_i + 2 I_i \}$, denote
$$ \Delta_t = \big( h\, E[z^* - e^{\epsilon_t}]^+ + b\, E[e^{\epsilon_t} - z^*]^+ \big) - \big( h (z^* - e^{\epsilon_t})^+ + b (e^{\epsilon_t} - z^*)^+ \big). $$
Then $E[\Delta_t] = 0$. Since $\epsilon_t$ is bounded, so is $\Delta_t$; thus we apply the Hoeffding inequality (see Theorem 1 in Hoeffding 1963, and Levi et al. 2007 for its application to newsvendor problems) to obtain, for any $\xi > 0$,
$$ P\Big\{ \Big| \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \Delta_t \Big| > \xi \Big\} \le 2 e^{-4 I_i \xi^2}, \quad (53) $$
which deduces to
$$ P\Big\{ \max_{d \in D} \big| W(d, y(d)) - \widetilde W_{i+1}(d, y(d)) \big| > d_h\, \xi \Big\} \le 2 e^{-4 I_i \xi^2}. \quad (54) $$
This bounds the first term on the right-hand side of (51).

To bound the second term in (51), we use $\hat F(x) = \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} 1\{ e^{\epsilon_t} \le x \}$, $x \in [l, u]$, to denote the empirical distribution of $e^{\epsilon_t}$. For $\theta > 0$, we call $\hat F(z^*)$ a $\theta$-estimate of $F(z^*)$ ($= b/(b+h)$), or simply a $\theta$-estimate, if
$$ \Big| \hat F(z^*) - \frac{b}{b + h} \Big| \le \theta. \quad (55) $$
It can be verified that
$$ P\Big\{ \hat F(z^*) < \frac{b}{b + h} - \theta \Big\} = P\big\{ \hat F(z^*) < F(z^*) - \theta \big\} = P\big\{ \hat F(z^*) - F(z^*) < -\theta \big\} \le e^{-2 I_i \theta^2}, $$
where the last inequality follows from the Hoeffding inequality. Similarly, we have
$$ P\Big\{ \hat F(z^*) > \frac{b}{b + h} + \theta \Big\} \le e^{-2 I_i \theta^2}. $$
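The Hoeffding step in (53) can be checked by simulation: the empirical tail probability of a sample mean of bounded, mean-zero variables stays below the exponential bound. The uniform noise below is an illustrative stand-in for the bounded $\Delta_t$; the constants are not from the paper.

```python
import math
import random

# Numerical check of a Hoeffding bound of the type used in (53): for i.i.d.
# mean-zero variables bounded in [a, c],
#   P{ |sample mean| > xi } <= 2 * exp(-2 * n * xi^2 / (c - a)^2).
# Uniform noise is an illustrative stand-in for the bounded Delta_t.

random.seed(2)
n, xi, a, c = 200, 0.1, -1.0, 1.0
trials = 5000
exceed = 0
for _ in range(trials):
    mean = sum(random.uniform(a, c) for _ in range(n)) / n
    if abs(mean) > xi:
        exceed += 1
empirical = exceed / trials
bound = 2 * math.exp(-2 * n * xi**2 / (c - a) ** 2)
print(empirical <= bound)
```

As expected, the simulated tail is far below the bound, which is what lets the proof trade a small per-phase failure probability for a deterministic error bound.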
Combining the two results above, we obtain
$$ P\Big\{ \Big| \hat F(z^*) - \frac{b}{b + h} \Big| \le \theta \Big\} \ge 1 - 2 e^{-2 I_i \theta^2}. $$
Let $A_3(\theta)$ represent the event that $\hat F(z^*)$ is a $\theta$-estimate; then the result above states that
$$ P(A_3(\theta)) \ge 1 - 2 e^{-2 I_i \theta^2}. \quad (56) $$
For $d \in D$, let $\tilde z_{i+1}(d) = \tilde y_{i+1}(d)/d$ and $\tilde z^u_{i+1} = \tilde y^u_{i+1}(d)/d$; then it follows from (49) that
$$ \tilde z^u_{i+1} = \min\Big\{ e^{\epsilon_j} : \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} 1\{ e^{\epsilon_t} \le e^{\epsilon_j} \} \ge \frac{b}{b + h} \Big\}, $$
and it follows from (50) that
$$ \tilde z_{i+1}(d) = \min\Big\{ \max\Big\{ \tilde z^u_{i+1}, \frac{y_l}{d} \Big\}, \frac{y_h}{d} \Big\}. $$
By $\tilde y^u_{i+1}(d) = d\, \tilde z^u_{i+1}$, we have $\widetilde W_{i+1}(d, \tilde y^u_{i+1}(d)) = \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1})$. In the following, we develop an upper bound for $\widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1})$ when $\hat F(\cdot)$ is a $\theta$-estimate.

First, for any given $d \in D$, if $z^* \le \tilde z^u_{i+1}$, then it follows from (52) that
$$ \widetilde W_{i+1}(d, d z^*) = \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big[ b (e^{\epsilon_t} - z^*) 1\{ \tilde z^u_{i+1} < e^{\epsilon_t} \} + b (e^{\epsilon_t} - z^*) 1\{ z^* < e^{\epsilon_t} \le \tilde z^u_{i+1} \} + h (z^* - e^{\epsilon_t}) 1\{ e^{\epsilon_t} \le z^* \} \big] $$
$$ \le \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big[ b (e^{\epsilon_t} - z^*) 1\{ \tilde z^u_{i+1} < e^{\epsilon_t} \} + b (\tilde z^u_{i+1} - z^*) 1\{ z^* < e^{\epsilon_t} \le \tilde z^u_{i+1} \} + h (z^* - e^{\epsilon_t}) 1\{ e^{\epsilon_t} \le z^* \} \big], \quad (57) $$
where the inequality follows from replacing $e^{\epsilon_t}$ in the second term by its upper bound $\tilde z^u_{i+1}$, and
$$ \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1}) = \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big[ b (e^{\epsilon_t} - \tilde z^u_{i+1}) 1\{ \tilde z^u_{i+1} < e^{\epsilon_t} \} + h (\tilde z^u_{i+1} - e^{\epsilon_t}) 1\{ z^* < e^{\epsilon_t} \le \tilde z^u_{i+1} \} + h (\tilde z^u_{i+1} - e^{\epsilon_t}) 1\{ e^{\epsilon_t} \le z^* \} \big] $$
$$ \ge \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big[ b (e^{\epsilon_t} - \tilde z^u_{i+1}) 1\{ \tilde z^u_{i+1} < e^{\epsilon_t} \} + h (\tilde z^u_{i+1} - e^{\epsilon_t}) 1\{ e^{\epsilon_t} \le z^* \} \big], \quad (58) $$
with the inequality obtained by dropping the nonnegative middle term. Consequently, when $z^* \le \tilde z^u_{i+1}$,
we subtract (58) from (57) to obtain
$$ \widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1}) \le d \big[ b (\tilde z^u_{i+1} - z^*) \big( 1 - \hat F(\tilde z^u_{i+1}) \big) + b (\tilde z^u_{i+1} - z^*) \big( \hat F(\tilde z^u_{i+1}) - \hat F(z^*) \big) - h (\tilde z^u_{i+1} - z^*)\, \hat F(z^*) \big] $$
$$ = d (\tilde z^u_{i+1} - z^*) \big( b - (h + b)\, \hat F(z^*) \big) \le d (\tilde z^u_{i+1} - z^*) (b + h)\, \theta, \quad (59) $$
where the second inequality follows from $\hat F(z^*) \ge \frac{b}{b+h} - \theta$ when $\hat F(\cdot)$ is a $\theta$-estimate.

Similarly, if $z^* > \tilde z^u_{i+1}$, then
$$ \widetilde W_{i+1}(d, d z^*) = \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big[ b (e^{\epsilon_t} - z^*) 1\{ z^* < e^{\epsilon_t} \} + h (z^* - e^{\epsilon_t}) 1\{ \tilde z^u_{i+1} < e^{\epsilon_t} \le z^* \} + h (z^* - e^{\epsilon_t}) 1\{ e^{\epsilon_t} \le \tilde z^u_{i+1} \} \big] $$
$$ \le \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big[ b (e^{\epsilon_t} - z^*) 1\{ z^* < e^{\epsilon_t} \} + h (z^* - \tilde z^u_{i+1}) 1\{ \tilde z^u_{i+1} < e^{\epsilon_t} \le z^* \} + h (z^* - e^{\epsilon_t}) 1\{ e^{\epsilon_t} \le \tilde z^u_{i+1} \} \big], \quad (60) $$
where the inequality follows from replacing $e^{\epsilon_t}$ in the second term by its lower bound $\tilde z^u_{i+1}$, and
$$ \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1}) = \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big[ b (e^{\epsilon_t} - \tilde z^u_{i+1}) 1\{ z^* < e^{\epsilon_t} \} + b (e^{\epsilon_t} - \tilde z^u_{i+1}) 1\{ \tilde z^u_{i+1} < e^{\epsilon_t} \le z^* \} + h (\tilde z^u_{i+1} - e^{\epsilon_t}) 1\{ e^{\epsilon_t} \le \tilde z^u_{i+1} \} \big] $$
$$ \ge \frac{d}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \big[ b (e^{\epsilon_t} - \tilde z^u_{i+1}) 1\{ z^* < e^{\epsilon_t} \} + h (\tilde z^u_{i+1} - e^{\epsilon_t}) 1\{ e^{\epsilon_t} \le \tilde z^u_{i+1} \} \big], \quad (61) $$
again the inequality follows from dropping the nonnegative second term. Subtracting (61) from (60), we obtain
$$ \widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1}) \le d \big[ -b (z^* - \tilde z^u_{i+1}) \big( 1 - \hat F(z^*) \big) + h (z^* - \tilde z^u_{i+1}) \big( \hat F(z^*) - \hat F(\tilde z^u_{i+1}) \big) + h (z^* - \tilde z^u_{i+1})\, \hat F(\tilde z^u_{i+1}) \big] $$
$$ = d (z^* - \tilde z^u_{i+1}) \big( (h + b)\, \hat F(z^*) - b \big) \le d (z^* - \tilde z^u_{i+1}) (b + h)\, \theta, \quad (62) $$
where the last inequality follows from $\hat F(z^*) \le \frac{b}{b+h} + \theta$ when $\hat F(\cdot)$ is a $\theta$-estimate.

The results (59) and (62) imply that, when $\hat F(\cdot)$ is a $\theta$-estimate, i.e., when (55) is satisfied, it holds that
$$ \widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1}) \le d\, | z^* - \tilde z^u_{i+1} |\, (b + h)\, \theta. $$
As demand is bounded, $d\, \tilde z^u_{i+1}$ is bounded too; hence it follows from $d z^* \in Y$ that there exists some constant $K_{41} > 0$ such that $d\, | z^* - \tilde z^u_{i+1} | \le K_{41}$. Thus
$$ \widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1}) \le K_{41} (b + h)\, \theta. $$
Since $\tilde z^u_{i+1}$ is the unconstrained minimizer of $\widetilde W_{i+1}(d, d z)$, it follows that
$$ \widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z_{i+1}(d)) \le \widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z^u_{i+1}) \le K_{41} (b + h)\, \theta. $$
As this inequality holds for any $d \in D$, it implies that, when $\hat F(\cdot)$ is a $\theta$-estimate, i.e., on the event $A_3(\theta)$,
$$ \max_{d \in D} \big\{ \widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z_{i+1}(d)) \big\} \le K_{41} (b + h)\, \theta. \quad (63) $$
Letting $\theta = \xi$ in (63), we obtain
$$ P\Big\{ \max_{d \in D} \big( \widetilde W_{i+1}(d, d z^*) - \widetilde W_{i+1}(d, d\, \tilde z_{i+1}(d)) \big) \le K_{41} (b + h)\, \xi \Big\} \ge P(A_3(\xi)) \ge 1 - 2 e^{-2 I_i \xi^2}, $$
where the last inequality follows from (56). This proves, by noting that $y(d) = d z^*$ and that $\widetilde W_{i+1}(d, y(d)) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)) \ge 0$ as $\tilde y_{i+1}(d)$ is the minimizer of $\widetilde W_{i+1}$ on $Y$, that
$$ P\Big\{ \max_{d \in D} \big| \widetilde W_{i+1}(d, y(d)) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)) \big| \le K_{41} (b + h)\, \xi \Big\} \ge 1 - 2 e^{-2 I_i \xi^2}. \quad (64) $$
Applying (54) and (64) to (51), we conclude that there exists a constant $K_{40} > 0$ such that, for any $\xi > 0$, when $I_i$ is sufficiently large,
$$ P\Big\{ \max_{d \in D} \big| W(d, y(d)) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)) \big| \le K_{40}\, \xi \Big\} \ge 1 - 2 e^{-2 I_i \xi^2} - 2 e^{-4 I_i \xi^2} \ge 1 - 4 e^{-2 I_i \xi^2}. $$
This completes the proof of Lemma A3. ∎

Having compared the functions $W$ and $\widetilde W_{i+1}$, we next compare $G$ with $\widetilde G_{i+1}$.

Lemma A4. Given parameters $\alpha$ and $\beta$, there exists a positive constant $K_{42}$ such that, for any $\xi > 0$,
$$ P\Big\{ \max_{d \in D} \big| G(\alpha, \beta, d) - \widetilde G_{i+1}(\alpha, \beta, d) \big| \le K_{42}\, \xi \Big\} \ge 1 - 5 e^{-2 I_i \xi^2}. $$

Proof. For any $d \in D$, a similar argument as that used in proving (53) of Lemma A3 shows that, for any $\xi > 0$,
$$ P\Big\{ \Big| E[e^{\epsilon}] - \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} e^{\epsilon_t} \Big| \le \xi \Big\} \ge 1 - 2 e^{-4 I_i \xi^2}. $$
Let $\bar r = \max_{d \in D} \big| d\, \frac{\alpha - \log d}{\beta} \big|$. Then we have
$$ P\Big\{ \max_{d \in D} \Big| d\, \frac{\alpha - \log d}{\beta}\, E[e^{\epsilon}] - d\, \frac{\alpha - \log d}{\beta} \Big( \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} e^{\epsilon_t} \Big) \Big| \le \bar r\, \xi \Big\} \ge P\Big\{ \bar r\, \Big| E[e^{\epsilon}] - \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} e^{\epsilon_t} \Big| \le \bar r\, \xi \Big\} \ge 1 - 2 e^{-4 I_i \xi^2}. \quad (65) $$
Hence, it follows from (45), (48) and the triangle inequality that, for any $\xi > 0$,
$$ P\Big\{ \max_{d \in D} \big| G(\alpha, \beta, d) - \widetilde G_{i+1}(\alpha, \beta, d) \big| \le (K_{40} + \bar r)\, \xi \Big\} $$
$$ \ge P\Big\{ \max_{d \in D} \Big| d\, \frac{\alpha - \log d}{\beta}\, E[e^{\epsilon}] - d\, \frac{\alpha - \log d}{\beta} \Big( \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} e^{\epsilon_t} \Big) \Big| \le \bar r\, \xi \ \text{ and } \ \max_{d \in D} \big| W(d, y(d)) - \widetilde W_{i+1}(d, \tilde y_{i+1}(d)) \big| \le K_{40}\, \xi \Big\} $$
$$ \ge 1 - 2 e^{-4 I_i \xi^2} - 4 e^{-2 I_i \xi^2} \ge 1 - 5 e^{-2 I_i \xi^2}, $$
where the second-to-last inequality follows from (65), Lemma A3 and the union bound, and the last inequality holds because either $e^{-2 I_i \xi^2} \le 1/2$, in which case $2 e^{-4 I_i \xi^2} \le e^{-2 I_i \xi^2}$, or the stated bound is trivial. Letting $K_{42} = K_{40} + \bar r$ completes the proof of Lemma A4. ∎
For any $\xi > 0$, we define the event
$$ A_4(\xi) = \Big\{ \omega : \max_{d \in D} \big| G(\alpha, \beta, d) - \widetilde G_{i+1}(\alpha, \beta, d) \big| \le K_{42}\, \xi \Big\}. \quad (66) $$
Then Lemma A4 can be restated as $P(A_4(\xi)) \ge 1 - 5 e^{-2 I_i \xi^2}$.

With the preparations above, we are now ready to compare the optimal solutions of problems (B1) and (B2). Different from (B1), in problem (B2) the distribution of $\epsilon$ in the objective function is unknown, hence the expectations are replaced by their sample averages, giving rise to the SAA problem. Lemma A5 below presents a useful result that bounds the probability that the optimal solution of problem (B2) is away from that of problem (B1). Since $I_i$ tends to infinity as $i$ goes to infinity, this shows that the probability that the two solutions, $p(\bar\alpha(\hat p_i), \bar\beta(\hat p_i))$ and $\tilde p_{i+1}(\bar\alpha(\hat p_i), \bar\beta(\hat p_i))$, are significantly different converges to zero as the length of the planning horizon $T$ increases.

Lemma A5. For any $p \in \mathcal{P}$ and any $\xi > 0$,
$$ P\Big\{ \big| p\big(\bar\alpha(p), \bar\beta(p)\big) - \tilde p_{i+1}\big(\bar\alpha(p), \bar\beta(p)\big) \big| \le K_{43}\, \xi^{1/2} \Big\} \ge 1 - 5 e^{-2 I_i \xi^2} $$
for some positive constant $K_{43}$.

Proof. To slightly simplify the notation, for given parameters $\alpha$ and $\beta$, in this proof we let
$$ G(d) = G(\alpha, \beta, d), \quad \widetilde G(d) = \widetilde G_{i+1}(\alpha, \beta, d), \quad d = d(\alpha, \beta), \quad \tilde d = \tilde d_{i+1}(\alpha, \beta). $$
By Taylor's expansion,
$$ G(\tilde d) = G(d) + G'(d)(\tilde d - d) + \frac{G''(q)}{2} (\tilde d - d)^2, \quad (67) $$
where $q \in [d, \tilde d]$ if $d \le \tilde d$ and $q \in [\tilde d, d]$ if $d > \tilde d$. Since we assume the minimizer of $W(d, y)$ over $\mathbb{R}_+$ falls into $Y$, it follows from (45) that
$$ G(d) = d\, \frac{\alpha - \log d}{\beta}\, E[e^{\epsilon}] - A d, $$
where $A = \min_z \big\{ h\, E(z - e^{\epsilon})^+ + b\, E(e^{\epsilon} - z)^+ \big\} > 0$ is a constant. Thus, we have
$$ G''(d) = -\frac{E[e^{\epsilon}]}{\beta d}. $$
Since $\lambda(\cdot)$ is assumed to be strictly decreasing, it follows that $\bar\beta(\cdot)$ is bounded below by a positive number, say $\bar a > 0$. On $\beta \ge \bar a$, let $m = \min_{d \in D} \frac{E[e^{\epsilon}]}{2 \beta d}$; then $m > 0$, and it follows from (67) that
$$ G(d) - G(\tilde d) \ge m\, (\tilde d - d)^2. \quad (68) $$
Now we prove, on event $A_4(\xi)$, that
$$ G(d) - G(\tilde d) \le 2 K_{42}\, \xi. \quad (69) $$
We prove this by contradiction. Suppose it is not true, i.e., $G(d) - G(\tilde d) > 2 K_{42}\, \xi$. Then it follows from (66) that
$$ \widetilde G(d) - \widetilde G(\tilde d) = \big( \widetilde G(d) - G(d) \big) + \big( G(d) - G(\tilde d) \big) + \big( G(\tilde d) - \widetilde G(\tilde d) \big) > -K_{42}\, \xi + 2 K_{42}\, \xi - K_{42}\, \xi = 0. $$
This leads to $\widetilde G(d) > \widetilde G(\tilde d)$, contradicting $\tilde d$ being optimal for problem (B2). Thus, (69) is satisfied on $A_4(\xi)$.

Using (68) and (69), we obtain that, on event $A_4(\xi)$,
$$ (\tilde d - d)^2 \le \frac{2 K_{42}}{m}\, \xi, $$
or equivalently, for some constant $K_{44}$, $| \tilde d - d | \le K_{44}\, \xi^{1/2}$.

Let $g(d) = \frac{\alpha - \log d}{\beta}$; then $p(\alpha, \beta) = g(d)$ and $\tilde p_{i+1}(\alpha, \beta) = g(\tilde d)$. Since the first-order derivative of $g(d)$ with respect to $d \in D$ is bounded, there exists a constant $K_{45} > 0$ such that, on $A_4(\xi)$, it holds that
$$ \big| p(\alpha, \beta) - \tilde p_{i+1}(\alpha, \beta) \big| = \big| g(d) - g(\tilde d) \big| \le K_{45}\, | d - \tilde d | \le K_{44} K_{45}\, \xi^{1/2}. $$
Letting $K_{43} = K_{44} K_{45}$, this shows that for any values of $\alpha$ and $\beta \ge \bar a$,
$$ P\Big\{ \big| p(\alpha, \beta) - \tilde p_{i+1}(\alpha, \beta) \big| \le K_{43}\, \xi^{1/2} \Big\} \ge P(A_4(\xi)) \ge 1 - 5 e^{-2 I_i \xi^2}. $$
Substituting $\alpha = \bar\alpha(p)$ and $\beta = \bar\beta(p)$, we obtain the desired result in Lemma A5. ∎

Lemma A6 shows that $(\hat\alpha_{i+1}, \hat\beta_{i+1})$, $(\bar\alpha(\hat p_i), \bar\beta(\hat p_i))$ and $(\bar\alpha(\hat p_i + \delta_i), \bar\beta(\hat p_i + \delta_i))$ approach each other as $i$ gets large.

Lemma A6. There exists a positive constant $K_{46}$ such that
$$ E\Big[ \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| + \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| \Big] \le K_{46}\, I_i^{-1/4}. $$

Proof. The proof of this result bears similarity with that of Besbes and Zeevi (2015), hence here we only present the differences. For convenience we define
$$ B^1_{i+1} = \frac{1}{I_i} \sum_{t = t_i + 1}^{t_i + I_i} \epsilon_t, \qquad B^2_{i+1} = \frac{1}{I_i} \sum_{t = t_i + I_i + 1}^{t_i + 2 I_i} \epsilon_t. $$
Recall that $\hat\alpha_{i+1}$ and $\hat\beta_{i+1}$ are derived from the least-squares method, and they are given by
$$ \hat\alpha_{i+1} = \frac{\lambda(\hat p_i) + \lambda(\hat p_i + \delta_i) + B^1_{i+1} + B^2_{i+1}}{2} + \hat\beta_{i+1} \Big( \hat p_i + \frac{\delta_i}{2} \Big), \quad (70) $$
$$ \hat\beta_{i+1} = -\frac{\lambda(\hat p_i + \delta_i) - \lambda(\hat p_i)}{\delta_i} + \frac{1}{\delta_i} \big( B^1_{i+1} - B^2_{i+1} \big). \quad (71) $$
Applying Taylor's expansion of $\lambda(\hat p_i + \delta_i)$ at the point $\hat p_i$ to the second order in (71), we obtain
$$ \hat\beta_{i+1} = -\lambda'(\hat p_i) - \frac{1}{2} \lambda''(\bar q)\, \delta_i + \frac{1}{\delta_i} \big( B^1_{i+1} - B^2_{i+1} \big) = \bar\beta(\hat p_i) - \frac{1}{2} \lambda''(\bar q)\, \delta_i + \frac{1}{\delta_i} \big( B^1_{i+1} - B^2_{i+1} \big), \quad (72) $$
where $\bar q \in [\hat p_i, \hat p_i + \delta_i]$. Substituting $\hat\beta_{i+1}$ in (70) by (72), and applying Taylor's expansion of $\lambda(\hat p_i + \delta_i)$ at the point $\hat p_i$ to the first order, we have
$$ \hat\alpha_{i+1} = \lambda(\hat p_i) + \frac{1}{2} \lambda'(q')\, \delta_i + \frac{B^1_{i+1} + B^2_{i+1}}{2} + \Big( \bar\beta(\hat p_i) - \frac{1}{2} \lambda''(\bar q)\, \delta_i + \frac{1}{\delta_i} \big( B^1_{i+1} - B^2_{i+1} \big) \Big) \Big( \hat p_i + \frac{\delta_i}{2} \Big) $$
$$ = \bar\alpha(\hat p_i) + \frac{1}{2} \lambda'(q')\, \delta_i + \frac{1}{2} \bar\beta(\hat p_i)\, \delta_i + \frac{B^1_{i+1} + B^2_{i+1}}{2} + \Big( {-\frac{1}{2}} \lambda''(\bar q)\, \delta_i + \frac{1}{\delta_i} \big( B^1_{i+1} - B^2_{i+1} \big) \Big) \Big( \hat p_i + \frac{\delta_i}{2} \Big), \quad (73) $$
where $q' \in [\hat p_i, \hat p_i + \delta_i]$ and $\bar\alpha(\hat p_i) = \lambda(\hat p_i) + \bar\beta(\hat p_i)\, \hat p_i$.

Since the error terms $\epsilon_t$ are assumed to be bounded, we apply the Hoeffding inequality to obtain
$$ P\big\{ |B^1_{i+1}| > \xi \big\} \le 2 e^{-2 I_i \xi^2}, \qquad P\big\{ |B^2_{i+1}| > \xi \big\} \le 2 e^{-2 I_i \xi^2}. $$
Hence,
$$ P\big\{ |B^1_{i+1} + B^2_{i+1}| > 2 \xi \big\} \le P\big\{ |B^1_{i+1}| > \xi \big\} + P\big\{ |B^2_{i+1}| > \xi \big\} \le 4 e^{-2 I_i \xi^2}. $$
Therefore, $P\big\{ |B^1_{i+1} + B^2_{i+1}| \le 2 \xi \big\} \ge 1 - 4 e^{-2 I_i \xi^2}$. A similar argument shows $P\big\{ |B^1_{i+1} - B^2_{i+1}| \le 2 \xi \big\} \ge 1 - 4 e^{-2 I_i \xi^2}$.

Since $\lambda'(\cdot)$ and $\lambda''(\cdot)$ are bounded and $\delta_i$ converges to 0, from (73) we conclude that there must exist a constant $K_{47}$ such that, on the event $\big\{ |B^1_{i+1} + B^2_{i+1}| \le 2 \xi \text{ and } |B^1_{i+1} - B^2_{i+1}| \le 2 \xi \big\}$, it holds that
$$ \big| \hat\alpha_{i+1} - \bar\alpha(\hat p_i) \big| \le K_{47} \Big( \delta_i + \frac{\xi}{\delta_i} + \xi \Big). $$
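The two-price least-squares step behind (70)-(71) can be sketched directly: with $I_i$ log-demand observations at each of the two prices, the slope estimate is the negative difference quotient of the two sample means, and the intercept follows from the pooled mean. The log-demand curve lam() and all parameter values below are illustrative assumptions.

```python
import random

# Sketch of the two-price least-squares estimates (70)-(71). The function lam()
# is a hypothetical true log-demand curve (here exactly affine, so the
# estimates should recover its intercept and slope up to noise).

def lam(p):
    return 5.0 - 0.8 * p  # illustrative lambda(p) = alpha - beta * p

def estimate(p_hat, delta, I, noise=0.05, rng=random.Random(3)):
    d1 = [lam(p_hat) + rng.uniform(-noise, noise) for _ in range(I)]
    d2 = [lam(p_hat + delta) + rng.uniform(-noise, noise) for _ in range(I)]
    m1, m2 = sum(d1) / I, sum(d2) / I
    beta_hat = -(m2 - m1) / delta                                # slope, as in (71)
    alpha_hat = (m1 + m2) / 2 + beta_hat * (p_hat + delta / 2)   # intercept, as in (70)
    return alpha_hat, beta_hat

a, b = estimate(p_hat=2.0, delta=0.5, I=2000)
print(round(a, 2), round(b, 2))  # near (5.0, 0.8)
```

Note the $1/\delta_i$ amplification of noise in the slope estimate: shrinking the price gap $\delta_i$ too fast degrades $\hat\beta_{i+1}$, which is exactly the $\xi/\delta_i$ term appearing in the bounds above.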
Therefore,
$$ P\Big\{ \big| \hat\alpha_{i+1} - \bar\alpha(\hat p_i) \big| \le K_{47} \Big( \delta_i + \frac{\xi}{\delta_i} + \xi \Big) \Big\} \ge P\big\{ |B^1_{i+1} + B^2_{i+1}| \le 2 \xi,\ |B^1_{i+1} - B^2_{i+1}| \le 2 \xi \big\} \ge 1 - 8 e^{-2 I_i \xi^2}, $$
which implies
$$ P\Big\{ \big| \hat\alpha_{i+1} - \bar\alpha(\hat p_i) \big| \le K_{48} \Big( \delta_i + \frac{\xi}{\delta_i} + \xi \Big) \Big\} \ge 1 - 8 e^{-2 I_i \xi^2}. \quad (74) $$
From (72) we have
$$ P\Big\{ \big| \hat\beta_{i+1} - \bar\beta(\hat p_i) \big| \le K_{49} \big( \delta_i + \xi\, \delta_i^{-1} \big) \Big\} \ge 1 - 4 e^{-2 I_i \xi^2}, $$
which implies
$$ P\Big\{ \big| \hat\beta_{i+1} - \bar\beta(\hat p_i) \big| \le K_{50} \Big( \delta_i + \frac{\xi}{\delta_i} \Big) \Big\} \ge 1 - 4 e^{-2 I_i \xi^2}. \quad (75) $$
Following the development of (74) and (75), we have
$$ P\Big\{ \big| \hat\alpha_{i+1} - \bar\alpha(\hat p_i + \delta_i) \big| \le K_{51} \Big( \delta_i + \frac{\xi}{\delta_i} + \xi \Big) \Big\} \ge 1 - 8 e^{-2 I_i \xi^2} \quad (76) $$
and
$$ P\Big\{ \big| \hat\beta_{i+1} - \bar\beta(\hat p_i + \delta_i) \big| \le K_{52} \Big( \delta_i + \frac{\xi}{\delta_i} \Big) \Big\} \ge 1 - 4 e^{-2 I_i \xi^2}. \quad (77) $$
Combining (74), (75), (76), and (77) by the union bound, we obtain
$$ P\Big\{ \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| + \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| \le K_{53} \Big( \delta_i + \frac{\xi}{\delta_i} + \xi \Big) \Big\} \ge 1 - 24 e^{-2 I_i \xi^2}, \quad (78) $$
which is
$$ P\Big\{ \Big( \frac{K_{54}}{\delta_i} + K_{55} \Big)^{-1} \Big( \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| + \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| - K_{53}\, \delta_i \Big) > \xi \Big\} < 24 e^{-2 I_i \xi^2}. $$
Therefore, noting that $E[X^+] = \int_0^{\infty} P\{ X > \xi \}\, d\xi$ for any random variable $X$,
$$ E\Big[ \Big( \frac{K_{54}}{\delta_i} + K_{55} \Big)^{-1} \Big( \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| + \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| - K_{53}\, \delta_i \Big)^+ \Big] \le \int_0^{\infty} 24\, e^{-2 I_i \xi^2}\, d\xi = 6 \sqrt{2 \pi}\; I_i^{-1/2}. $$
Hence one has
$$ E\Big[ \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| + \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| \Big] \le 6 \sqrt{2 \pi} \Big( \frac{K_{54}}{\delta_i} + K_{55} \Big) I_i^{-1/2} + K_{53}\, \delta_i \le K_{46}\, I_i^{-1/4}, \quad (79) $$
where the last inequality holds because the exploration step $\delta_i$ of the DDA policy is of order $I_i^{-1/4}$. This completes the proof of Lemma A6. ∎

Lemma A7 bounds the difference between the solution for problem (B2), $\tilde p_{i+1}(\bar\alpha(\hat p_i), \bar\beta(\hat p_i))$, and the solution for problem (DD), $\hat p_{i+1}$. Comparing the two problems, we note that there are two main differences: first, problem (DD) has an affine function with coefficients $\hat\alpha_{i+1}$ and $\hat\beta_{i+1}$, while problem (B2) has an affine function with coefficients $\bar\alpha(\hat p_i)$ and $\bar\beta(\hat p_i)$; second, in problem (DD), the biased sample of demand uncertainty, $\eta_t$, is used, while in problem (B2), an unbiased sample $\epsilon_t$ is used. Despite these differences, we have the following result.

Lemma A7. There exist some positive constants $K_{56}$ and $\bar i$ such that for any $i \ge \bar i$, one has
$$ P\Big\{ \big| \tilde p_{i+1}\big(\bar\alpha(\hat p_i), \bar\beta(\hat p_i)\big) - \hat p_{i+1} \big| \le K_{56} \big( \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| + \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| \big) \Big\} \ge 1 - 24\, I_i^{-2}, $$
$$ P\Big\{ \big| \tilde y_{i+1}\big(\bar\alpha(\hat p_i), \bar\beta(\hat p_i)\big) - \hat y_{i+1} \big| \le K_{56} \big( \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| + \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| \big) \Big\} \ge 1 - 24\, I_i^{-2}. $$

Proof. To compare the solutions of these two problems, we introduce a general function based on the data-driven problem (DD) and problem (B2): given selling price $p_t = \hat p_i$ for $t = t_i + 1, \ldots, t_i + I_i$
and $p_t = \hat p_i + \delta_i$ for $t = t_i + I_i + 1, \ldots, t_i + 2 I_i$, logarithmic demand data $D_t$, $t = t_i + 1, \ldots, t_i + 2 I_i$, and two sets of parameters $(\alpha_1, \beta_1)$, $(\alpha_2, \beta_2)$, define $\zeta_{t_i+1}^{t_i+I_i}(\alpha_1, \beta_1) = (\zeta_{t_i+1}, \ldots, \zeta_{t_i+I_i})$ and $\zeta_{t_i+I_i+1}^{t_i+2I_i}(\alpha_2, \beta_2) = (\zeta_{t_i+I_i+1}, \ldots, \zeta_{t_i+2I_i})$ by
$$ \zeta_t = D_t - (\alpha_1 - \beta_1 p_t) = \lambda(\hat p_i) + \epsilon_t - (\alpha_1 - \beta_1 \hat p_i), \quad t = t_i + 1, \ldots, t_i + I_i, $$
$$ \zeta_t = D_t - (\alpha_2 - \beta_2 p_t) = \lambda(\hat p_i + \delta_i) + \epsilon_t - \big( \alpha_2 - \beta_2 (\hat p_i + \delta_i) \big), \quad t = t_i + I_i + 1, \ldots, t_i + 2 I_i. $$
Then, we define a function $H_{i+1}$ by
$$ H_{i+1}\big( p, e^{\alpha_1 - \beta_1 p}, \zeta_{t_i+1}^{t_i+I_i}(\alpha_1, \beta_1), \zeta_{t_i+I_i+1}^{t_i+2I_i}(\alpha_2, \beta_2) \big) = p\, e^{\alpha_1 - \beta_1 p}\, \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} e^{\zeta_t} - \min_{y \in Y} \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \Big( h \big( y - e^{\alpha_1 - \beta_1 p} e^{\zeta_t} \big)^+ + b \big( e^{\alpha_1 - \beta_1 p} e^{\zeta_t} - y \big)^+ \Big). \quad (80) $$
Consider the optimization of $H_{i+1}$, and let its optimal price be denoted by
$$ p\big( (\alpha_1, \beta_1), (\alpha_2, \beta_2) \big) = \arg\max_{p \in \mathcal{P}} H_{i+1}\big( p, e^{\alpha_1 - \beta_1 p}, \zeta_{t_i+1}^{t_i+I_i}(\alpha_1, \beta_1), \zeta_{t_i+I_i+1}^{t_i+2I_i}(\alpha_2, \beta_2) \big) \quad (81) $$
and its optimal order-up-to level, for given price $p$, be denoted by
$$ y\big( e^{\alpha_1 - \beta_1 p}, (\alpha_1, \beta_1), (\alpha_2, \beta_2) \big) = \arg\min_{y \in Y} \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \Big( h \big( y - e^{\alpha_1 - \beta_1 p} e^{\zeta_t} \big)^+ + b \big( e^{\alpha_1 - \beta_1 p} e^{\zeta_t} - y \big)^+ \Big). \quad (82) $$
Similar to Besbes and Zeevi (2015), we make the assumption that the optimal solutions $p((\alpha_1, \beta_1), (\alpha_2, \beta_2))$ and $y(e^{\alpha_1 - \beta_1 p}, (\alpha_1, \beta_1), (\alpha_2, \beta_2))$ are differentiable with respect to $\alpha_1, \alpha_2$ and $\beta_1, \beta_2$, with bounded first-order derivatives. Then $p((\alpha_1, \beta_1), (\alpha_2, \beta_2))$ and $y(e^{\alpha_1 - \beta_1 p}, (\alpha_1, \beta_1), (\alpha_2, \beta_2))$ are both Lipschitz; in particular, there exists a constant $K_{57} > 0$ such that for any $\alpha_1, \alpha_2, \alpha_1', \alpha_2'$ and $\beta_1, \beta_2, \beta_1', \beta_2'$, it holds that
$$ \big| p\big( (\alpha_1, \beta_1), (\alpha_2, \beta_2) \big) - p\big( (\alpha_1', \beta_1'), (\alpha_2', \beta_2') \big) \big| \le K_{57} \big( |\alpha_1 - \alpha_1'| + |\beta_1 - \beta_1'| + |\alpha_2 - \alpha_2'| + |\beta_2 - \beta_2'| \big), \quad (83) $$
$$ \big| y\big( e^{\alpha_1 - \beta_1 p}, (\alpha_1, \beta_1), (\alpha_2, \beta_2) \big) - y\big( e^{\alpha_1' - \beta_1' p}, (\alpha_1', \beta_1'), (\alpha_2', \beta_2') \big) \big| \le K_{57} \big( |\alpha_1 - \alpha_1'| + |\beta_1 - \beta_1'| + |\alpha_2 - \alpha_2'| + |\beta_2 - \beta_2'| \big). \quad (84) $$
The optimization problem (80) will serve as yet another bridging problem between (DD) and (B2). To see that, observe that when $\alpha_1 = \alpha_2 = \hat\alpha_{i+1}$ and $\beta_1 = \beta_2 = \hat\beta_{i+1}$, problem (81) reduces to the data-driven problem (DD). That is,
$$ \hat p_{i+1} = p\big( (\hat\alpha_{i+1}, \hat\beta_{i+1}), (\hat\alpha_{i+1}, \hat\beta_{i+1}) \big). \quad (85) $$
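An objective of the form (80)-(81), revenue under the fitted demand curve minus the minimized empirical newsvendor cost, can be maximized by direct search. The sketch below does this on a price grid; the parameter values, residual sample, and grids are all illustrative assumptions, not the paper's DDA specification.

```python
import math
import random

# Sketch of maximizing an SAA profit objective of the form (80)-(81):
#   p * e^{alpha - beta*p} * mean(e^zeta)  -  min_y empirical newsvendor cost.
# All numbers below (alpha, beta, b, h, grids, residuals) are illustrative.

def saa_profit(p, alpha, beta, zetas, b, h, y_grid):
    demands = [math.exp(alpha - beta * p) * math.exp(z) for z in zetas]
    n = len(demands)
    revenue = p * sum(demands) / n
    cost = min(
        sum(h * max(y - dem, 0.0) + b * max(dem - y, 0.0) for dem in demands) / n
        for y in y_grid
    )
    return revenue - cost

random.seed(4)
alpha, beta, b, h = 3.0, 0.5, 4.0, 1.0
zetas = [random.uniform(-0.1, 0.1) for _ in range(300)]
prices = [1.0 + 0.1 * k for k in range(31)]   # candidate prices in [1, 4]
y_grid = [0.5 * k for k in range(81)]         # candidate order-up-to levels in [0, 40]
p_best = max(prices, key=lambda p: saa_profit(p, alpha, beta, zetas, b, h, y_grid))
print(p_best)
```

With these inputs the maximizer lands near $1/\beta$ plus a small markup for the newsvendor cost, consistent with the first-order condition derived in Lemma A1.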
On the other hand, when $\alpha_1 = \bar\alpha(\hat p_i)$, $\beta_1 = \bar\beta(\hat p_i)$, $\alpha_2 = \bar\alpha(\hat p_i + \delta_i)$, $\beta_2 = \bar\beta(\hat p_i + \delta_i)$, we deduce from the definitions of $\bar\alpha(\cdot)$ and $\bar\beta(\cdot)$ that for $t = t_i + 1, \ldots, t_i + I_i$, we have
$$ \zeta_t = D_t - (\alpha_1 - \beta_1 p_t) = \lambda(\hat p_i) + \epsilon_t - \big( \bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_i \big) = \epsilon_t, \quad (86) $$
and for $t = t_i + I_i + 1, \ldots, t_i + 2 I_i$, it holds that
$$ \zeta_t = D_t - (\alpha_2 - \beta_2 p_t) = \lambda(\hat p_i + \delta_i) + \epsilon_t - \big( \bar\alpha(\hat p_i + \delta_i) - \bar\beta(\hat p_i + \delta_i)(\hat p_i + \delta_i) \big) = \epsilon_t. \quad (87) $$
This shows that when the parameters are $(\bar\alpha(\hat p_i), \bar\beta(\hat p_i))$ and $(\bar\alpha(\hat p_i + \delta_i), \bar\beta(\hat p_i + \delta_i))$, problem (81) reduces to bridging problem (B2). This gives us
$$ \tilde p_{i+1}\big( \bar\alpha(\hat p_i), \bar\beta(\hat p_i) \big) = p\big( (\bar\alpha(\hat p_i), \bar\beta(\hat p_i)), (\bar\alpha(\hat p_i + \delta_i), \bar\beta(\hat p_i + \delta_i)) \big). \quad (88) $$
The two results (85) and (88) will enable us to compare the optimal solutions of the data-driven optimization problem (DD) and bridging problem (B2) through the single optimization problem (81).

In Lemma A6, letting $\xi = (I_i)^{-1/2} (\log I_i)^{1/2}$ in (78), we obtain
$$ P\Big\{ \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| + \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| + \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| \le K_{53} \big( I_i^{-1/4} + (I_i)^{-1/4} (\log I_i)^{1/2} + (I_i)^{-1/2} (\log I_i)^{1/2} \big) \Big\} \ge 1 - 24\, I_i^{-2}. \quad (89) $$
This implies
$$ P\Big\{ \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| \le 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2},\ \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| \le 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2}, $$
$$ \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| \le 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2},\ \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| \le 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2} \Big\} \ge 1 - 24\, I_i^{-2}. \quad (90) $$
For convenience, we define the event $A_5$ by
$$ A_5 = \Big\{ \omega : \big| \bar\alpha(\hat p_i) - \hat\alpha_{i+1} \big| \le 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2},\ \big| \bar\beta(\hat p_i) - \hat\beta_{i+1} \big| \le 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2}, $$
$$ \big| \bar\alpha(\hat p_i + \delta_i) - \hat\alpha_{i+1} \big| \le 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2},\ \big| \bar\beta(\hat p_i + \delta_i) - \hat\beta_{i+1} \big| \le 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2} \Big\}. \quad (91) $$
Then by (90) one has
$$ P(A_5^c) \le 24\, I_i^{-2}. \quad (92) $$
When $\beta_1 > 0$, similar to Remark 2 and Lemma A2, one can verify that $H_{i+1}(\cdot, \cdot, \cdot, \cdot)$ of (80) is unimodal in $p$, thus its optimal solution is well-defined. Define
$$ \bar i = \max\Big\{ \lceil \log_2 v \rceil,\ \min\big\{ i : 3 K_{53} (I_i)^{-1/4} (\log I_i)^{1/2} < \min_{p \in \mathcal{P}} \bar\beta(p) \big\} \Big\}, \quad (93) $$
where we need $\bar i$ to be no less than $\lceil \log_2 v \rceil$ to ensure that $(I_i)^{-1/4} (\log I_i)^{1/2}$ is decreasing in $i$ for $i \ge \bar i$. When $i \ge \bar i$, it follows that $\hat\beta_{i+1} > 0$ on $A_5$; hence, on event $A_5$, problem (DD) is unimodal in $p$ after minimizing over $y$, and the optimal price is well-defined. These properties will enable us to prove that the convergence of parameters translates into convergence of the optimal solutions.

The first part of Lemma A7, on $\hat p_{i+1}$, then follows directly from (85), (88) and (83). From equations (82), (86), and (87), we conclude
$$ \tilde y_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) = y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}}, (\bar\alpha(\hat p_i), \bar\beta(\hat p_i)), (\bar\alpha(\hat p_i + \delta_i), \bar\beta(\hat p_i + \delta_i)) \big), $$
and it follows from the DDA policy that
$$ \hat y_{i+1,1} = y\big( e^{\hat\alpha_{i+1} - \hat\beta_{i+1}\, \hat p_{i+1}}, (\hat\alpha_{i+1}, \hat\beta_{i+1}), (\hat\alpha_{i+1}, \hat\beta_{i+1}) \big). $$
Then, a similar analysis as that used for the first part, with (84) in place of (83), proves the second part of Lemma A7. ∎

To prepare for the convergence proof of the order-up-to levels in Theorem 1, we need another result. Recall that $y(e^{\alpha - \beta p})$ and $\tilde y_{i+1}(e^{\alpha - \beta p})$ are the optimal $y$ on $Y$ for problem (B1) and problem (B2), respectively, for given $p \in \mathcal{P}$. We have the following result.

Lemma A8. There exists some constant $K_{58}$ such that, for any $p \in \mathcal{P}$ and $\hat p_i \in \mathcal{P}$, and for any $\xi > 0$, it holds that
$$ P\Big\{ \big| y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) - \tilde y_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) \big| \le K_{58}\, \xi \Big\} \ge 1 - 2 e^{-4 I_i \xi^2}. $$

Proof. For $p \in \mathcal{P}$, the optimal solution for bridging problem (B1) is the same as (46); thus
$$ y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) = e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p}\, F^{-1}\Big( \frac{b}{b + h} \Big). \quad (94) $$
For given $p \in \mathcal{P}$, we follow (49) to define $\tilde y^u_{i+1}(e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p})$ as the unconstrained optimal order-up-to level for problem (B2) on $\mathbb{R}_+$; then it can be verified that
$$ \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) = e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \min\Big\{ e^{\epsilon_j} : \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} 1\{ e^{\epsilon_t} \le e^{\epsilon_j} \} \ge \frac{b}{b + h} \Big\}, \quad (95) $$
and, similar to (50), we have
$$ \tilde y_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) = \min\Big\{ \max\Big\{ \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big), y_l \Big\}, y_h \Big\}. $$
It is seen that
$$ \big| y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) - \tilde y_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) \big| \le \big| y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) - \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, p} \big) \big|. \quad (96) $$
Now, for any $z > 0$, we have
$$ P\Big\{ \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \big/ e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} > F^{-1}\Big( \frac{b}{b+h} + z \Big) \Big\} \le P\Big\{ \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} 1\Big\{ e^{\epsilon_t} \le F^{-1}\Big( \frac{b}{b+h} + z \Big) \Big\} < \frac{b}{b + h} \Big\} $$
$$ = P\Big\{ \frac{1}{2 I_i} \sum_{t = t_i + 1}^{t_i + 2 I_i} \Big( 1\Big\{ e^{\epsilon_t} \le F^{-1}\Big( \frac{b}{b+h} + z \Big) \Big\} - \Big( \frac{b}{b+h} + z \Big) \Big) < -z \Big\} \le e^{-4 I_i z^2}, \quad (97) $$
where the first inequality follows from (95). Since $E\big[ 1\{ e^{\epsilon_t} \le F^{-1}( \frac{b}{b+h} + z ) \} \big] = \frac{b}{b+h} + z$, we apply the Hoeffding inequality to obtain the last bound in (97). Combining this with (94) and (97), and noting $F\big( y( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} ) / e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) = \frac{b}{b+h}$, we obtain
$$ P\Big\{ F\Big( \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \big/ e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \Big) - F\Big( y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \big/ e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \Big) > z \Big\} \le e^{-4 I_i z^2}. \quad (98) $$
Similarly, we have
$$ P\Big\{ F\Big( \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \big/ e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \Big) - F\Big( y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \big/ e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \Big) < -z \Big\} \le e^{-4 I_i z^2}. \quad (99) $$
From regularity condition (iv), the probability density function $f(\cdot)$ of $e^{\epsilon_t}$ satisfies $r = \min\{ f(x) : x \in [l, u] \} > 0$. From calculus, it is known that, for any $x < y$, there exists a number $w \in [x, y]$ such that $F(y) - F(x) = f(w)(y - x) \ge r (y - x)$. Applying (98) and (99), for any $\xi > 0$, we obtain
$$ 2 e^{-4 I_i \xi^2} \ge P\Big\{ \Big| F\Big( \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \big/ e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \Big) - F\Big( y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \big/ e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \Big) \Big| > \xi \Big\} $$
$$ \ge P\Big\{ r\, \Big| \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) - y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \Big| \big/ e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} > \xi \Big\} = P\Big\{ \Big| \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) - y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \Big| > \frac{1}{r}\, e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}}\, \xi \Big\}. $$
Let $K_{58} = \max_{\hat p_i \in \mathcal{P},\, \hat p_{i+1} \in \mathcal{P}} \frac{1}{r}\, e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}}$; then $K_{58} > 0$. We have
$$ P\Big\{ \big| \tilde y^u_{i+1}\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) - y\big( e^{\bar\alpha(\hat p_i) - \bar\beta(\hat p_i)\, \hat p_{i+1}} \big) \big| > K_{58}\, \xi \Big\} \le 2 e^{-4 I_i \xi^2}, $$
and Lemma A8 follows from the inequality above and (96). ∎