Evaluating probabilities under high-dimensional latent variable models
Iain Murray and Ruslan Salakhutdinov
Department of Computer Science, University of Toronto
Toronto, ON. M5S 3G4. Canada.

Abstract

We present a simple new Monte Carlo algorithm for evaluating probabilities of observations in complex latent variable models, such as Deep Belief Networks. While the method is based on Markov chains, estimates based on short runs are formally unbiased. In expectation, the log probability of a test set will be underestimated, and this could form the basis of a probabilistic bound. The method is much cheaper than gold-standard annealing-based methods and only slightly more expensive than the cheapest Monte Carlo methods. We give examples of the new method substantially improving simple variational bounds at modest extra cost.

1 Introduction

Latent variable models capture underlying structure in data by explaining observations as part of a more complex, partially observed system. A large number of probabilistic latent variable models have been developed, most of which express a joint distribution P(v, h) over observed quantities v and their unobserved counterparts h. Although it is by no means the only way to evaluate a model, a natural question to ask is: what probability P(v) is assigned to a test observation? In some models the latent variables associated with a test input can be easily summed out: P(v) = Σ_h P(v, h). As an example, standard mixture models have a single discrete mixture component indicator for each data point; the joint probability P(v, h) can be explicitly evaluated for each setting of the latent variable. More complex graphical models explain data through the combination of many latent variables. This provides richer representations, but poses greater computational challenges. In particular, marginalizing out many latent variables can require complex integrals or exponentially large sums.
One popular latent variable model, the Restricted Boltzmann Machine (RBM), is unusual in that the posterior over hiddens, P(h|v), is fully factored, which allows efficient evaluation of P(v) up to a constant. Almost all other latent variable models have posterior dependencies amongst latent variables, even if they are independent a priori. Our current work is motivated by recent work on evaluating RBMs and their generalization to Deep Belief Networks (DBNs) [1]. For both types of models, a single constant was accurately approximated so that P(v, h) could be evaluated point-wise. For RBMs, the remaining sum over hidden variables was performed analytically. For DBNs, test probabilities were lower-bounded through a variational technique. Perhaps surprisingly, the bound was unable to reveal any significant improvement over RBMs in an experiment on MNIST digits. It was unclear whether this was due to looseness of the bound, or to there being no difference in performance. A more accurate method for summing over latent variables would enable better and broader evaluation of DBNs. In section 2 we consider existing Monte Carlo methods. Some of them are certainly
more accurate, but prohibitively expensive for evaluating large test sets. We then develop a new cheap Monte Carlo procedure for evaluating latent variable models in section 3. Like the variational method used previously, our method is unlikely to spuriously overstate test-set performance. Our presentation is for general latent variable models; however, for a running example, we use DBNs (see section 4 and [2]). The benefits of our new approach are demonstrated in section 5.

2 Probability of observations as a normalizing constant

The probability of a data vector, P(v), is the normalizing constant relating the posterior over hidden variables to the joint distribution in Bayes' rule, P(h|v) = P(h, v)/P(v). A large literature on computing normalizing constants exists in physics, statistics and computer science. In principle, there are many methods that could be applied to evaluating the probability assigned to data by a latent variable model. We review a subset of these methods, with notation and intuitions that will help motivate and explain our new algorithm. In what follows, all auxiliary distributions Q and transition operators T are conditioned on the current test case v; this is not shown in the notation to reduce clutter. Further, all of these methods assume that we can evaluate P(h, v). Graphical models with undirected connections will require the separate estimation of a single constant, as in [1].

2.1 Importance sampling

Importance sampling can in principle find the normalizing constant of any distribution. The algorithm involves averaging a simple ratio under samples from some convenient tractable distribution over the hidden variables, Q(h). Provided Q(h) > 0 whenever P(h, v) > 0, we obtain:

    P(v) = Σ_h P(h, v) = Σ_h Q(h) P(h, v)/Q(h) ≈ (1/S) Σ_{s=1}^S P(h^(s), v)/Q(h^(s)),   h^(s) ~ Q(h).   (1)

Importance sampling relies on the sampling distribution Q(h) being similar to the target distribution P(h|v). Specifically, the variance of the estimator is an α-divergence between the distributions [3].
Finding a tractable Q(h) with small divergence is difficult in high-dimensional problems.

2.2 The harmonic mean method

Using Q(h) = P(h|v) in (1) gives an estimator that requires knowing P(v). As an alternative, the harmonic mean method, also called the reciprocal method, gives an unbiased estimate of 1/P(v):

    1/P(v) = Σ_h P(h)/P(v) = Σ_h P(h|v)/P(v|h) ≈ (1/S) Σ_{s=1}^S 1/P(v|h^(s)),   h^(s) ~ P(h|v).   (2)

In practice correlated samples from MCMC are used; then the estimator is asymptotically unbiased. It was clear from the original paper and its discussion that the harmonic mean estimator can behave very poorly [4]. Samples in the tails of the posterior have large weights, which makes it easy to construct distributions where the estimator has infinite variance. A finite set of samples will rarely include any extremely large weights, so the estimator's empirical variance can be misleadingly low. In many problems, the estimate of 1/P(v) will be an underestimate with high probability. That is, the method will overestimate P(v) and often give no indication that it has done so. Sometimes the estimator will have manageable variance. Also, more expensive versions of the estimator exist with lower variance. However, it is still prone to overestimate test probabilities: if 1/P̂_HME(v) is the harmonic mean estimator in (2), Jensen's inequality gives P(v) = 1/E[1/P̂_HME(v)] ≤ E[P̂_HME(v)]. Similarly, log P(v) will be overestimated in expectation. Hence the average of a large number of test log probabilities is highly likely to be an overestimate. Despite these problems the estimator has received significant attention in statistics, and has been used for evaluating latent variable models in recent machine learning literature [5, 6]. This is understandable: all of the existing, more accurate methods are harder to implement and take considerably longer to run. In this paper we propose a method that is nearly as easy to use as the harmonic mean method, but with better properties.
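The contrast between the two estimators of sections 2.1 and 2.2 can be seen on a toy discrete model where P(v) is available in closed form. The model below is entirely made up (a single test case v, five hidden states, a uniform prior, and arbitrary joint weights); it is a minimal sketch, not part of the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint weights P(h, v) for one fixed test case v, h = 0..4.
joint = np.array([0.20, 0.05, 0.01, 0.40, 0.24])
K = len(joint)
true_pv = joint.sum()                    # P(v) = sum_h P(h, v)
posterior = joint / true_pv              # P(h | v)

S = 100_000

# Importance sampling, eq. (1): average P(h, v) / Q(h) under h ~ Q.
Q = np.full(K, 1.0 / K)                  # uniform proposal distribution
hs = rng.choice(K, size=S, p=Q)
is_estimate = np.mean(joint[hs] / Q[hs])

# Harmonic mean, eq. (2): average 1 / P(v | h) under h ~ P(h | v).
# With a uniform prior P(h) = 1/K, we have P(v | h) = P(h, v) / P(h).
prior = np.full(K, 1.0 / K)
hs = rng.choice(K, size=S, p=posterior)  # exact posterior samples
hme_estimate = 1.0 / np.mean(prior[hs] / joint[hs])
```

On a model this small both estimates land near the true P(v); the pathologies of the harmonic mean method only bite when the posterior has thin tails relative to the prior, which a five-state toy cannot exhibit.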
2.3 Importance sampling based on Markov chains

Paradoxically, introducing auxiliary variables and making a distribution much higher-dimensional than it was before can help find an approximating Q distribution that closely matches the target distribution. As an example we give a partial review of Annealed Importance Sampling (AIS) [7], a special case of a larger family of Sequential Monte Carlo (SMC) methods (see, e.g., [8]). Some of this theory will be needed in the new method we present in section 3. Annealing algorithms start with a sample from some tractable distribution P_1. Steps are taken with a series of operators T_2, T_3, ..., T_S, whose stationary distributions, P_s, are cooled towards the distribution of interest. The probability over the resulting sequence H = {h^(1), h^(2), ..., h^(S)} is:

    Q_AIS(H) = P_1(h^(1)) Π_{s=2}^S T_s(h^(s) ← h^(s−1)).   (3)

To compute importance weights, we need to define a target distribution on the same state-space:

    P_AIS(H) = P(h^(S)|v) Π_{s=2}^S T̃_s(h^(s−1) ← h^(s)).   (4)

Because h^(S) has marginal P(h^(S)|v) = P(h^(S), v)/P(v), P_AIS(H) has our target, P(v), as its normalizing constant. The T̃ operators are the reverse operators of those used to define Q_AIS. For any transition operator T that leaves a distribution P(h|v) stationary, there is a unique corresponding reverse operator T̃, which is defined for any point h in the support of P:

    T̃(h′ ← h) = T(h ← h′) P(h′|v) / Σ_{h″} T(h ← h″) P(h″|v) = T(h ← h′) P(h′|v) / P(h|v).   (5)

The sum in the denominator is known because T leaves the posterior stationary. Operators that are their own reverse operator are said to satisfy detailed balance and are also known as reversible. Many transition operators used in practice, such as Metropolis–Hastings, are reversible. Non-reversible operators are usually composed from a sequence of reversible operations, such as the component updates in a Gibbs sampler. The reverse of these (so-called) non-reversible operators is constructed from the same reversible base operations, but applied in reverse order.
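The reverse-operator identity (5), and the remark about reversing a composition of reversible base operations, can be checked numerically on a small discrete chain. Everything below (the three-state target, the two Metropolis operators) is an invented toy, not anything from the experiments:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])   # stand-in for the posterior P(h | v)
n = len(pi)

# Reversible operator 1: Metropolis with a uniform proposal over the
# other two states.  Convention: R1[i, j] = P(move to i | currently at j).
R1 = np.zeros((n, n))
for j in range(n):
    for i in range(n):
        if i != j:
            R1[i, j] = 0.5 * min(1.0, pi[i] / pi[j])
    R1[j, j] = 1.0 - R1[:, j].sum()

# Reversible operator 2: Metropolis with a +/-1 random-walk proposal
# (out-of-range proposals are rejected, i.e. the chain stays put).
R2 = np.zeros((n, n))
for j in range(n):
    for d in (-1, 1):
        i = j + d
        if 0 <= i < n:
            R2[i, j] = 0.5 * min(1.0, pi[i] / pi[j])
    R2[j, j] = 1.0 - R2[:, j].sum()

# Composition "apply R1, then R2" is a non-reversible operator overall.
T = R2 @ R1

# Reverse operator from eq. (5): T_rev(h' <- h) = T(h <- h') pi(h') / pi(h).
T_rev = T.T * pi[:, None] / pi[None, :]
```

`T_rev` comes out equal to `R1 @ R2`, the same reversible base operations applied in reverse order, exactly as the text states, and it leaves `pi` stationary while differing from `T` itself.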
The definitions above allow us to write:

    Q_AIS(H) = P_AIS(H) Q_AIS(H)/P_AIS(H)
             = P_AIS(H) [P_1(h^(1)) / P(h^(S)|v)] Π_{s=2}^S [T_s(h^(s) ← h^(s−1)) / T̃_s(h^(s−1) ← h^(s))]
             = P_AIS(H) P(v) [P_1(h^(1)) / P(h^(S), v)] Π_{s=2}^S [P̃_s(h^(s)) / P̃_s(h^(s−1))]
             ≡ P_AIS(H) P(v) / w(H).   (6)

We can usually evaluate the P̃_s, which are unnormalized versions of the stationary distributions of the Markov chain operators. Therefore the AIS importance weight w(H) = [P(h^(S), v)/P_1(h^(1))] Π_{s=2}^S [P̃_s(h^(s−1))/P̃_s(h^(s))] is tractable as long as we can evaluate P(h, v). The AIS importance weight provides an unbiased estimate:

    E_{Q_AIS(H)}[w(H)] = P(v) Σ_H P_AIS(H) = P(v).   (7)

As with standard importance sampling, the variance of the estimator depends on a divergence between P_AIS and Q_AIS. This can be made small, at large computational expense, by using hundreds or thousands of steps, allowing the neighboring intermediate distributions P_s(h) to be close.

2.4 Chib-style estimators

Bayes' rule implies that for any special hidden state h*,

    P(v) = P(h*, v)/P(h*|v).   (8)

This trivial identity suggests a family of estimators introduced by Chib [9]. First, we choose a particular hidden state h*, usually one with high posterior probability, and then estimate P(h*|v). We would like to obtain an estimator that is based on a sequence of states H = {h^(1), h^(2), ..., h^(S)} generated by a Markov chain that explores the posterior distribution P(h|v). The most naive estimate of P(h*|v) is the fraction of states in H that are equal to the special state, Σ_s I(h^(s) = h*)/S.
Obviously this estimator is impractical, as it equals zero with high probability when applied to high-dimensional problems. A Rao–Blackwellized version of this estimator, p̂(H), replaces the indicator function with the probability of transitioning from h^(s) to the special state under a Markov chain transition operator T that leaves the posterior stationary. This can be derived directly from the operator's stationarity condition:

    P(h*|v) = Σ_h T(h* ← h) P(h|v) ≈ p̂(H) = (1/S) Σ_{s=1}^S T(h* ← h^(s)),   {h^(s)} ~ P(H),   (9)

where P(H) is the joint distribution arising from S steps of a Markov chain. If the chain has stationary distribution P(h|v) and could be initialized at equilibrium, so that

    P(H) = P(h^(1)|v) Π_{s=2}^S T(h^(s) ← h^(s−1)),   (10)

then p̂(H) would be an unbiased estimate of P(h*|v). For ergodic chains the stationary distribution is achieved asymptotically and the estimator is consistent regardless of how it is initialized. If T is a Gibbs sampling transition operator, the only way of moving from h to h* is to draw each element of h* in turn. If updates are made in index order from 1 to M, the move has probability:

    T(h* ← h) = Π_{j=1}^M P(h*_j | h*_{1:(j−1)}, h_{(j+1):M}).   (11)

Equations (9, 11) have been used in schemes for monitoring the convergence of Gibbs samplers [10]. It is worth emphasizing that we have only outlined the simplest possible scheme inspired by Chib's general approach. For some Markov chains, there are technical problems with the above construction, which require an extension explained in the appendix. Moreover, the approach above is not what Chib recommended. In fact, [11] explicitly favors a more elaborate procedure involving sampling from a sequence of distributions. This opens up the possibility of many sophisticated developments, e.g. [12, 13]. However, our focus in this work is on obtaining more useful results from simple cheap methods. There are also well-known problems with the Chib approach [14], to which we will return.

3 A new estimator for evaluating latent-variable models

We start with the simplest Chib-inspired estimator based on equations (8, 9, 11).
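Before developing the new estimator, the ingredients (8), (9) and (11) can be exercised together on a tiny model. The two-bit hidden space and its joint weights below are invented for illustration, and the "equilibrium chain" is faked by drawing exact posterior samples, which is what a long, well-mixed Gibbs chain would provide:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up unnormalized joint weights P(h, v) for one test case v,
# over two binary hiddens h = (h_1, h_2).
w = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}
true_pv = sum(w.values())                       # the quantity being estimated
posterior = {h: p / true_pv for h, p in w.items()}

def gibbs_prob(h_to, h_from):
    """Eq. (11): probability that one in-order Gibbs sweep moves h_from
    to h_to, as the product of the conditionals used along the way."""
    prob, state = 1.0, list(h_from)
    for j in range(2):
        weights = {}
        for bit in (0, 1):
            s = list(state)
            s[j] = bit
            weights[bit] = w[tuple(s)]
        prob *= weights[h_to[j]] / (weights[0] + weights[1])
        state[j] = h_to[j]                      # component j now resampled
    return prob

h_star = (1, 1)                                 # highest posterior state

# Rao-Blackwellized estimate (9), averaging over "equilibrium" samples.
S = 50_000
states = list(w)
idx = rng.choice(len(states), size=S, p=[posterior[s] for s in states])
p_hat = np.mean([gibbs_prob(h_star, states[i]) for i in idx])

pv_estimate = w[h_star] / p_hat                 # Chib identity, eq. (8)
```

With genuine equilibrium samples the estimate lands close to the true P(v); the difficulties discussed in the text arise when the chain cannot actually reach equilibrium.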
Like many Markov chain Monte Carlo algorithms, (9) provides only (asymptotic) unbiasedness. For our purposes this is not sufficient. Jensen's inequality tells us

    P(v) = P(h*, v)/P(h*|v) = P(h*, v)/E[p̂(H)] ≤ E[P(h*, v)/p̂(H)].   (12)

That is, we will overestimate the probability of a visible vector in expectation. Jensen's inequality also says that we will overestimate log P(v) in expectation. Ideally we would like an accurate estimate of log P(v). However, if we must suffer some bias, then a lower bound that does not overstate performance will usually be preferred. An underestimate of P(v) would result from overestimating P(h*|v). The probability of the special state will often be overestimated in practice if we initialize our Markov chain at h*. There are, however, simple counter-examples where this does not happen. Instead we describe a construction, based on a sequence of Markov steps starting at h*, that does have the desired effect. We draw a state sequence from the following carefully designed distribution, using the algorithm in figure 1:

    Q(H) = (1/S) Σ_{s′=1}^S T(h^(s′) ← h*) Π_{s=s′+1}^S T(h^(s) ← h^(s−1)) Π_{s=1}^{s′−1} T̃(h^(s) ← h^(s+1)).   (13)

If the initial state h^(s′) were drawn from P(h|v) instead of T(h^(s′) ← h*), then the algorithm would give a sample from an equilibrium sequence with distribution P(H) defined in (10). This can be checked by repeated substitution of (5). This allows us to express Q in terms of P, as we did for AIS:

    Q(H) = (1/S) Σ_{s′=1}^S [T(h^(s′) ← h*) / P(h^(s′)|v)] P(H) = (1/P(h*|v)) [(1/S) Σ_{s′=1}^S T̃(h* ← h^(s′))] P(H).   (14)
Figure 1: Algorithm for the proposed method. (The graphical model in the figure shows Q(H | s′ = 3) for S = 4.)

    Inputs: v, observed test vector;
            h*, a (preferably high posterior probability) hidden state;
            S, number of Markov chain steps;
            T, Markov chain operator that leaves P(h|v) stationary.
    1. Draw s′ ~ Uniform({1, ..., S})
    2. Draw h^(s′) ~ T(h^(s′) ← h*)
    3. for s = (s′+1) : S
    4.     Draw h^(s) ~ T(h^(s) ← h^(s−1))
    5. for s = (s′−1) : −1 : 1
    6.     Draw h^(s) ~ T̃(h^(s) ← h^(s+1))
    7. P̂(v) ← P(v, h*) / [(1/S) Σ_{s′=1}^S T̃(h* ← h^(s′))]

At each generated state, T̃(h* ← h^(s′)) is evaluated (step 7), roughly doubling the cost of sampling.

The reverse operator, T̃, was defined in section 2.3. The quantity in square brackets is the estimator for P(h*|v) given in (9), applied with the reverse operator T̃, which also leaves the posterior stationary. The expectation of the reciprocal of this quantity under draws from Q(H) is exactly the quantity needed to compute P(v):

    E_{Q(H)}[1 / ((1/S) Σ_{s′=1}^S T̃(h* ← h^(s′)))] = Σ_H P(H) / P(h*|v) = 1/P(h*|v).   (15)

Although we are using the simple estimator from (9), by drawing H from a carefully constructed Markov chain procedure, the estimator is now unbiased in P(v). This is not an asymptotic result. As long as no division by zero has occurred in the above equations, the estimator is unbiased in P(v) for finite runs of the Markov chain. Jensen's inequality implies that log P(v) is underestimated in expectation. Neal noted that Chib's method will return incorrect answers in cases where the Markov chain does not mix well amongst modes [14]. Our new proposed method will suffer from the same problem. Even if no transition probabilities are exactly zero, unbiasedness does not exclude being on a particular side of the correct answer with very high probability. Poor mixing may cause P(h*|v) to be overestimated with high probability, which would result in an underestimate of P(v), i.e., an overly conservative estimate of test performance. The variance of the estimator is generally unknown, as it depends on the (generally unavailable) auto-covariance structure of the Markov chain.
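The whole procedure of figure 1 can be sketched end-to-end on a small discrete model. The four-state model and the Metropolis operator below are invented for the demonstration; note that the reverse operator in (5) only needs posterior ratios, which equal ratios of the joint weights, so the unknown P(v) cancels:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up unnormalized joint weights w[h] = P(h, v) for one test case v.
w = np.array([0.30, 0.10, 0.15, 0.35])
K, true_pv = len(w), w.sum()
h_star = int(np.argmax(w))            # a high posterior probability state

# Metropolis operator leaving P(h | v) stationary:
# T[i, j] = P(move to i | at j), uniform proposal over the other states.
T = np.zeros((K, K))
for j in range(K):
    for i in range(K):
        if i != j:
            T[i, j] = min(1.0, w[i] / w[j]) / (K - 1)
    T[j, j] = 1.0 - T[:, j].sum()

# Reverse operator via eq. (5).  This Metropolis operator is reversible,
# so T_rev equals T, but we follow the general recipe anyway.
T_rev = T.T * w[:, None] / w[None, :]

def estimate_pv(S):
    """One run of the algorithm in figure 1."""
    h = np.empty(S, dtype=int)
    s0 = rng.integers(S)                       # step 1: s' ~ Uniform
    h[s0] = rng.choice(K, p=T[:, h_star])      # step 2: h^(s') ~ T(. <- h*)
    for s in range(s0 + 1, S):                 # steps 3-4: forwards with T
        h[s] = rng.choice(K, p=T[:, h[s - 1]])
    for s in range(s0 - 1, -1, -1):            # steps 5-6: backwards, T_rev
        h[s] = rng.choice(K, p=T_rev[:, h[s + 1]])
    p_hat = T_rev[h_star, h].mean()            # bracketed estimator in (15)
    return w[h_star] / p_hat                   # step 7

runs = np.array([estimate_pv(S=10) for _ in range(4000)])
```

Averaging over many short runs illustrates the unbiasedness claim: the mean of `runs` sits close to the true P(v) even though each run uses only S = 10 Markov chain steps.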
We can note one positive property: for the ideal Markov chain operator that mixes in one step, the estimator has zero variance and gives the correct answer immediately. Although this extreme will not actually occur, it does indicate that on easy problems, good answers can be returned more quickly than by AIS.

4 Deep Belief Networks

In this section we provide a brief overview of Deep Belief Networks (DBNs), recently introduced by [2]. DBNs are probabilistic generative models that can contain many layers of hidden variables. Each layer captures strong high-order correlations between the activities of hidden features in the layer below. The top two layers of the DBN model form a Restricted Boltzmann Machine (RBM), which is an undirected graphical model, but the lower layers form a directed generative model. The original paper introduced a greedy, layer-by-layer unsupervised learning algorithm that consists of learning a stack of RBMs one layer at a time. Consider a DBN model with two layers of hidden features. The model's joint distribution is:

    P(v, h^1, h^2) = P(v|h^1) P(h^1, h^2),   (16)

where P(v|h^1) represents a sigmoid belief network, and P(h^1, h^2) is the joint distribution defined by the second-layer RBM. By explicitly summing out h^2, we can easily evaluate an unnormalized probability P̃(v, h^1) = Z·P(v, h^1). Using an approximating factorial posterior distribution Q(h^1|v),
[Figure 2 plots: left panel "MNIST digits", right panel "Image Patches"; x-axis: number of Markov chain steps; y-axis: estimated test log probability; curves: AIS estimator, our proposed estimator, and the estimate of the variational lower bound.]

Figure 2: AIS, our proposed estimator and a variational method were used to sum over the hidden states for each of 50 randomly sampled test cases to estimate their average log probability. The three methods shared the same AIS estimate of a single global normalization constant Z.

obtained as a byproduct of the greedy learning procedure, and an AIS estimate of the model's partition function Z, [1] proposed obtaining an estimate of a variational lower bound:

    log P(v) ≥ Σ_{h^1} Q(h^1|v) log P̃(v, h^1) − log Z + H(Q(h^1|v)).   (17)

The entropy term H(·) can be computed analytically, since Q is factorial, and the expectation term was estimated by a simple Monte Carlo approximation:

    Σ_{h^1} Q(h^1|v) log P̃(v, h^1) ≈ (1/S) Σ_{s=1}^S log P̃(v, h^(s)),   h^(s) ~ Q(h^1|v).   (18)

Instead of the variational approach, we could also adapt AIS to estimate P(v). This would be computationally very expensive, since we would need to run AIS for each test case. In the next section we show that variational lower bounds can be quite loose. Running AIS on the entire test set, containing many thousands of test cases, is computationally too demanding. Our proposed estimator requires the same single AIS estimate of Z as the variational method, so that we can evaluate P̃(v, h^1). It then provides better estimates of log P(v) by approximately summing over h^1 for each test case in a reasonable amount of computer time.

5 Experimental Results

We present experimental results on two datasets: the MNIST digits and a dataset of image patches, extracted from images of natural scenes taken from the collection of van Hateren (http://hlab.phys.rug.nl/imlib/). The MNIST dataset contains 60,000 training and 10,000 test images of ten handwritten digits (0 to 9), with 28×28 pixels.
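The variational bound of (17)–(18) is easy to reproduce on a toy model. The sketch below uses an invented two-bit joint and an invented factorial Q (with Z = 1, so P̃ = P); it is only meant to show the mechanics of the bound, i.e. that it sits below log P(v) and that the Monte Carlo expectation (18) agrees with the exact one:

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up joint weights P(h, v) over two binary hiddens (Z = 1 here).
w = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}
log_pv = np.log(sum(w.values()))

q = np.array([0.55, 0.70])      # factorial Q: Q(h_j = 1 | v), assumed given

def logw(h):
    return np.log(w[h])

# Exact bound (17): E_Q[log P(h, v)] + H(Q), with log Z = 0.
states = [(a, b) for a in (0, 1) for b in (0, 1)]
Qprob = {h: np.prod([q[j] if h[j] else 1 - q[j] for j in range(2)])
         for h in states}
entropy = -sum(p * np.log(p) for p in Qprob.values())
bound = sum(Qprob[h] * logw(h) for h in states) + entropy

# Monte Carlo version of the expectation term, eq. (18).
hs = (rng.random((100_000, 2)) < q).astype(int)
mc_bound = np.mean([logw(tuple(h)) for h in hs]) + entropy
```

Because this Q is not the true posterior, the bound is strictly below log P(v); the gap is exactly the KL divergence from Q to the posterior.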
The image dataset consisted of 30,000 training and 20,000 test patches. The raw image intensities were preprocessed and whitened as described in [15]. Gibbs sampling was used as a Markov chain transition operator throughout. All log probabilities quoted use natural logarithms, giving values in nats.

5.1 MNIST digits

In our first experiment we used a deep belief network (DBN) taken from [1]. The network had two hidden layers with 500 and 2000 hidden units, and was greedily trained by learning a stack of two RBMs one layer at a time. Each RBM was trained using the Contrastive Divergence (CD) learning rule. The estimate of the lower bound on the average test log probability, using (17), was . To estimate how loose the variational bound is, we randomly sampled 50 test cases, 5 of each class, and ran AIS for each test case to estimate the true test log probability. Computationally, this is equivalent to estimating 50 additional partition functions. Figure 2, left panel, shows the results. The estimate of the variational bound was  per test case, whereas the estimate of the true test log probability using AIS was . Our proposed estimator, averaged over 10 runs, provided an answer of . The special state h* for each test example v was obtained by first sampling from the approximating distribution Q(h^1|v), and then performing deterministic hill-climbing in log P̃(v, h^1) to get to a local mode.
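The hill-climbing step used to pick h* amounts to greedy coordinate ascent on the unnormalized log joint. The sketch below uses an invented quadratic (RBM-like) log p(v, h) over binary h and a random starting point standing in for a sample from Q(h|v); it is a minimal illustration, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented unnormalized log p(v, h) over binary h: a quadratic form.
M = 8
J = rng.normal(size=(M, M))
J = (J + J.T) / 2
b = rng.normal(size=M)

def log_joint(h):
    return h @ J @ h + b @ h

def hill_climb(h):
    """Deterministic coordinate ascent: repeatedly flip any single bit
    that increases log p(v, h), until no flip helps (a local mode)."""
    h = h.copy()
    improved = True
    while improved:
        improved = False
        for j in range(M):
            h2 = h.copy()
            h2[j] = 1 - h2[j]
            if log_joint(h2) > log_joint(h):
                h, improved = h2, True
    return h

h0 = (rng.random(M) < 0.5).astype(int)   # stand-in for a draw from Q(h | v)
h_star = hill_climb(h0)
```

The loop terminates because every accepted flip strictly increases the objective over a finite state space; the result is a local mode, not necessarily the global one.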
AIS used a hand-tuned temperature schedule designed to equalize the variance of the intermediate log weights [7]. We needed 10,000 intermediate distributions to get stable results, which took about 3.6 days on a Pentium Xeon 3.00GHz machine, whereas for our proposed estimator we only used S = 40, which took about 50 minutes. For a more direct comparison we tried giving AIS 50 minutes, which allows 100 temperatures. This run gave an estimate of −89.59, which is lower than the lower bound and tells us nothing. Giving AIS ten times more time, 1000 temperatures, gave . This is higher than the lower bound, but still worse than our estimator at S = 40, or even S = 5. Finally, using our proposed estimator, the average test log probability on the entire MNIST test data was . The difference of about 2 nats shows that the variational bound in [1] was rather tight, although a very small improvement of the DBN over the RBM is now revealed.

5.2 Image Patches

In our second experiment we trained a two-layer DBN model on the image patches of natural scenes. The first-layer RBM had 2000 hidden units and 400 Gaussian visible units. The second layer represented a semi-restricted Boltzmann machine (SRBM) with 500 hidden and 2000 visible units. The SRBM contained visible-to-visible connections, and was trained using Contrastive Divergence together with mean-field. Details of training can be found in [15]. The overall DBN model can be viewed as a directed hierarchy of Markov random fields with hidden-to-hidden connections. To estimate the model's partition function, we used AIS with 5,000 intermediate distributions and 100 annealing runs. The estimated lower bound on the average test log probability (see Eq. 17), using a factorial approximate posterior distribution Q(h^1|v), which we also get as a byproduct of the greedy learning algorithm, was . The estimate of the true test log probability, using our proposed estimator, was . In contrast to the model trained on MNIST, the difference of over 20 nats shows that, for model comparison purposes, the variational lower bound is quite loose.
For comparison, we also trained square ICA and a mixture of factor analyzers (MFA) using code from [16, 17]. Square ICA achieves a test log probability of 55.4, and MFA with 50 mixture components and a 30-dimensional latent space achieves , clearly outperforming DBNs.

6 Discussion

Our new Monte Carlo procedure is formally unbiased in estimating P(v). In practice it is likely to underestimate the (log-)probability of a test set. Although the algorithm involves Markov chains, importance sampling underlies the estimator. Therefore the methods discussed in [18] could be used to bound the probability of accidentally over-estimating a test-set probability. In principle our procedure is a general technique for estimating normalizing constants. It would not always be appropriate, however, as it would suffer the problems outlined in [14]. As an example, our method will not succeed in estimating the global normalizing constant of an RBM. For our method to work well, a state drawn from T(h^(s′) ← h*) should look like it could be part of an equilibrium sequence H ~ P(H). The details of the algorithm arose by developing existing Monte Carlo estimators, but the starting state h^(s′) could be drawn from any arbitrary distribution q:

    Q_var(H) = (1/S) Σ_{s′=1}^S [q(h^(s′)) / P(h^(s′)|v)] P(H) = P(v) [(1/S) Σ_{s′=1}^S q(h^(s′)) / P(h^(s′), v)] P(H).   (19)

As before, the reciprocal of the quantity in square brackets would give an estimate of P(v). If an approximation q(h) is available that captures more mass than T(h ← h*), this generalized estimator could perform better. We are hopeful that our method will be a natural next step in a variety of situations where improvements are sought over a deterministic approximation.

Acknowledgments

This research was supported by NSERC and CFI. Iain Murray was supported by the government of Canada. We thank Geoffrey Hinton and Radford Neal for useful discussions, Simon Osindero for providing preprocessed image patches of natural scenes, and the reviewers for useful comments.
References

[1] Ruslan Salakhutdinov and Iain Murray. On the quantitative analysis of Deep Belief Networks. In Proceedings of the International Conference on Machine Learning, volume 25, 2008.
[2] Geoffrey E. Hinton, Simon Osindero, and Yee Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[3] Tom Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005.
[4] Michael A. Newton and Adrian E. Raftery. Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B (Methodological), 56(1):3–48, 1994.
[5] Thomas L. Griffiths, Mark Steyvers, David M. Blei, and Joshua B. Tenenbaum. Integrating topics and syntax. In Advances in Neural Information Processing Systems (NIPS*17). MIT Press, 2005.
[6] Hanna M. Wallach. Topic modeling: beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning. ACM Press, New York, NY, USA, 2006.
[7] Radford M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.
[8] Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society B, 68(3), 2006.
[9] Siddhartha Chib. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90(432):1313–1321, December 1995.
[10] Christian Ritter and Martin A. Tanner. Facilitating the Gibbs sampler: the Gibbs stopper and the griddy-Gibbs sampler. Journal of the American Statistical Association, 87(419):861–868, 1992.
[11] Siddhartha Chib and Ivan Jeliazkov. Marginal likelihood from the Metropolis–Hastings output. Journal of the American Statistical Association, 96(453), 2001.
[12] Antonietta Mira and Geoff Nicholls. Bridge estimation of the probability density at a point. Statistica Sinica, 14:603–612, 2004.
[13] Francesco Bartolucci, Luisa Scaccia, and Antonietta Mira. Efficient Bayes factor estimation from the reversible jump output. Biometrika, 93(1):41–52, 2006.
[14] Radford M. Neal. Erroneous results in "Marginal likelihood from the Gibbs output", 1999.
Available from http://…/~radford/chib-letter.html.
[15] Simon Osindero and Geoffrey Hinton. Modeling image patches with a directed hierarchy of Markov random fields. In Advances in Neural Information Processing Systems (NIPS*20). MIT Press, 2008.
[16] Aapo Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634, 1999.
[17] Zoubin Ghahramani and Geoffrey E. Hinton. The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, 1997.
[18] Vibhav Gogate, Bozhena Bidyuk, and Rina Dechter. Studies in lower bounding probability of evidence using the Markov inequality. In 23rd Conference on Uncertainty in Artificial Intelligence (UAI), 2007.

A Real-valued latents and Metropolis–Hastings

There are technical difficulties with the original Chib-style approach applied to Metropolis–Hastings and continuous latent variables. The continuous version of equation (9),

    P(h*|v) = ∫ T(h* ← h) P(h|v) dh ≈ (1/S) Σ_{s=1}^S T(h* ← h^(s)),   h^(s) ~ P(H),   (20)

doesn't work if T is the Metropolis–Hastings operator. The Dirac delta function at h = h* contains a significant part of the integral, which is ignored by samples from P(h|v) with probability one. Following [11], the fix is to instead integrate over the generalized detailed balance relationship (5). Chib and Jeliazkov implicitly took out the h = h* point from all of their integrals. We do the same:

    P(h*|v) = ∫ T(h* ← h) P(h|v) dh / ∫ T̃(h ← h*) dh.   (21)

The numerator can be estimated as before. As both integrals omit h = h*, the denominator is less than one when T̃ contains a delta function. For Metropolis–Hastings, T(h′ ← h) = q(h′; h) min(1, a(h′; h)) for h′ ≠ h, where a(h′; h) is an easy-to-compute acceptance ratio. Sampling from q(h′; h*) and averaging min(1, a(h′; h*)) provides an estimate of the denominator. In our importance sampling approach there is no need to separately approximate an additional quantity. The algorithm in figure 1 still applies if the T's are interpreted as probability density functions. If, due to a rejection, h^(s′) = h* is drawn in step 2, then the sum in step 7
will contain an infinite term, giving a trivial underestimate P̂(v) = 0. (Steps 3–6 need not be performed in this case.) On repeated runs, the average estimate is still unbiased, or an underestimate for chains that can't mix. Alternatively, the variational approach (19) could be applied together with Metropolis–Hastings sampling.
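The corrected identity (21) can be sketched numerically for a one-dimensional continuous target. Everything below is invented for illustration: a Gaussian-shaped unnormalized target (so the true normalizer is known), a Gaussian random-walk proposal, and exact posterior samples standing in for a long Metropolis–Hastings chain:

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented unnormalized target p*(h) = 0.9 exp(-h^2/2), so that
# P(v) = integral of p* = 0.9 * sqrt(2 pi) is known exactly.
C = 0.9
def p_star(h):
    return C * np.exp(-0.5 * h**2)
true_pv = C * np.sqrt(2 * np.pi)

sigma = 1.5                               # MH Gaussian random-walk proposal
def q_pdf(h_to, h_from):
    return np.exp(-0.5 * ((h_to - h_from) / sigma)**2) / (sigma * np.sqrt(2 * np.pi))

h_star = 0.3                              # special state near the mode
S = 200_000

# Numerator of (21): E_{h ~ P(h|v)}[ q(h*; h) min(1, a(h*; h)) ],
# using exact posterior samples as a stand-in for an equilibrium chain.
hs = rng.normal(size=S)
num = np.mean(q_pdf(h_star, hs) * np.minimum(1.0, p_star(h_star) / p_star(hs)))

# Denominator of (21): E_{h' ~ q(.; h*)}[ min(1, a(h'; h*)) ],
# the overall acceptance probability when proposing moves away from h*.
props = h_star + sigma * rng.normal(size=S)
den = np.mean(np.minimum(1.0, p_star(props) / p_star(h_star)))

post_density = num / den                  # estimate of P(h* | v)
pv_estimate = p_star(h_star) / post_density
```

The ratio recovers the posterior density at h* despite the delta function in the Metropolis–Hastings kernel, and the Chib identity (8) then turns it into an estimate of P(v).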
Multivariate time series analysis: Some essential notions
Capter 2 Multivariate time series analysis: Some essential notions An overview of a modeling and learning framework for multivariate time series was presented in Capter 1. In tis capter, some notions on
Chapter 10: Refrigeration Cycles
Capter 10: efrigeration Cycles Te vapor compression refrigeration cycle is a common metod for transferring eat from a low temperature to a ig temperature. Te above figure sows te objectives of refrigerators
2 Limits and Derivatives
2 Limits and Derivatives 2.7 Tangent Lines, Velocity, and Derivatives A tangent line to a circle is a line tat intersects te circle at exactly one point. We would like to take tis idea of tangent line
Predicting the behavior of interacting humans by fusing data from multiple sources
Predicting te beavior of interacting umans by fusing data from multiple sources Erik J. Sclict 1, Ritcie Lee 2, David H. Wolpert 3,4, Mykel J. Kocenderfer 1, and Brendan Tracey 5 1 Lincoln Laboratory,
Theoretical calculation of the heat capacity
eoretical calculation of te eat capacity Principle of equipartition of energy Heat capacity of ideal and real gases Heat capacity of solids: Dulong-Petit, Einstein, Debye models Heat capacity of metals
An Intuitive Framework for Real-Time Freeform Modeling
An Intuitive Framework for Real-Time Freeform Modeling Mario Botsc Leif Kobbelt Computer Grapics Group RWTH Aacen University Abstract We present a freeform modeling framework for unstructured triangle
To motivate the notion of a variogram for a covariance stationary process, { Ys ( ): s R}
4. Variograms Te covariogram and its normalized form, te correlogram, are by far te most intuitive metods for summarizing te structure of spatial dependencies in a covariance stationary process. However,
ACT Math Facts & Formulas
Numbers, Sequences, Factors Integers:..., -3, -2, -1, 0, 1, 2, 3,... Rationals: fractions, tat is, anyting expressable as a ratio of integers Reals: integers plus rationals plus special numbers suc as
Dirichlet Processes A gentle tutorial
Dirichlet Processes A gentle tutorial SELECT Lab Meeting October 14, 2008 Khalid El-Arini Motivation We are given a data set, and are told that it was generated from a mixture of Gaussian distributions.
Derivatives Math 120 Calculus I D Joyce, Fall 2013
Derivatives Mat 20 Calculus I D Joyce, Fall 203 Since we ave a good understanding of its, we can develop derivatives very quickly. Recall tat we defined te derivative f x of a function f at x to be te
Lecture 10: What is a Function, definition, piecewise defined functions, difference quotient, domain of a function
Lecture 10: Wat is a Function, definition, piecewise defined functions, difference quotient, domain of a function A function arises wen one quantity depends on anoter. Many everyday relationsips between
Tutorial on Markov Chain Monte Carlo
Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,
For Sale By Owner Program. We can help with our for sale by owner kit that includes:
Dawn Coen Broker/Owner For Sale By Owner Program If you want to sell your ome By Owner wy not:: For Sale Dawn Coen Broker/Owner YOUR NAME YOUR PHONE # Look as professional as possible Be totally prepared
WORKING PAPER SERIES THE INFORMATIONAL CONTENT OF OVER-THE-COUNTER CURRENCY OPTIONS NO. 366 / JUNE 2004. by Peter Christoffersen and Stefano Mazzotta
WORKING PAPER SERIES NO. 366 / JUNE 24 THE INFORMATIONAL CONTENT OF OVER-THE-COUNTER CURRENCY OPTIONS by Peter Cristoffersen and Stefano Mazzotta WORKING PAPER SERIES NO. 366 / JUNE 24 THE INFORMATIONAL
Model Quality Report in Business Statistics
Model Quality Report in Business Statistics Mats Bergdal, Ole Blac, Russell Bowater, Ray Cambers, Pam Davies, David Draper, Eva Elvers, Susan Full, David Holmes, Pär Lundqvist, Sixten Lundström, Lennart
SHAPE: A NEW BUSINESS ANALYTICS WEB PLATFORM FOR GETTING INSIGHTS ON ELECTRICAL LOAD PATTERNS
CIRED Worksop - Rome, 11-12 June 2014 SAPE: A NEW BUSINESS ANALYTICS WEB PLATFORM FOR GETTING INSIGTS ON ELECTRICAL LOAD PATTERNS Diego Labate Paolo Giubbini Gianfranco Cicco Mario Ettorre Enel Distribuzione-Italy
What is Advanced Corporate Finance? What is finance? What is Corporate Finance? Deciding how to optimally manage a firm s assets and liabilities.
Wat is? Spring 2008 Note: Slides are on te web Wat is finance? Deciding ow to optimally manage a firm s assets and liabilities. Managing te costs and benefits associated wit te timing of cas in- and outflows
Note nine: Linear programming CSE 101. 1 Linear constraints and objective functions. 1.1 Introductory example. Copyright c Sanjoy Dasgupta 1
Copyrigt c Sanjoy Dasgupta Figure. (a) Te feasible region for a linear program wit two variables (see tet for details). (b) Contour lines of te objective function: for different values of (profit). Te
FINITE DIFFERENCE METHODS
FINITE DIFFERENCE METHODS LONG CHEN Te best known metods, finite difference, consists of replacing eac derivative by a difference quotient in te classic formulation. It is simple to code and economic to
CHAPTER TWO. f(x) Slope = f (3) = Rate of change of f at 3. x 3. f(1.001) f(1) Average velocity = 1.1 1 1.01 1. s(0.8) s(0) 0.8 0
CHAPTER TWO 2.1 SOLUTIONS 99 Solutions for Section 2.1 1. (a) Te average rate of cange is te slope of te secant line in Figure 2.1, wic sows tat tis slope is positive. (b) Te instantaneous rate of cange
Catalogue no. 12-001-XIE. Survey Methodology. December 2004
Catalogue no. 1-001-XIE Survey Metodology December 004 How to obtain more information Specific inquiries about tis product and related statistics or services sould be directed to: Business Survey Metods
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
Pre-trial Settlement with Imperfect Private Monitoring
Pre-trial Settlement wit Imperfect Private Monitoring Mostafa Beskar University of New Hampsire Jee-Hyeong Park y Seoul National University July 2011 Incomplete, Do Not Circulate Abstract We model pretrial
Tis Problem and Retail Inventory Management
Optimizing Inventory Replenisment of Retail Fasion Products Marsall Fiser Kumar Rajaram Anant Raman Te Warton Scool, University of Pennsylvania, 3620 Locust Walk, 3207 SH-DH, Piladelpia, Pennsylvania 19104-6366
SAT Subject Math Level 1 Facts & Formulas
Numbers, Sequences, Factors Integers:..., -3, -2, -1, 0, 1, 2, 3,... Reals: integers plus fractions, decimals, and irrationals ( 2, 3, π, etc.) Order Of Operations: Aritmetic Sequences: PEMDAS (Parenteses
Artificial Neural Networks for Time Series Prediction - a novel Approach to Inventory Management using Asymmetric Cost Functions
Artificial Neural Networks for Time Series Prediction - a novel Approac to Inventory Management using Asymmetric Cost Functions Sven F. Crone University of Hamburg, Institute of Information Systems [email protected]
Free Shipping and Repeat Buying on the Internet: Theory and Evidence
Free Sipping and Repeat Buying on te Internet: eory and Evidence Yingui Yang, Skander Essegaier and David R. Bell 1 June 13, 2005 1 Graduate Scool of Management, University of California at Davis ([email protected])
Operation go-live! Mastering the people side of operational readiness
! I 2 London 2012 te ultimate Up to 30% of te value of a capital programme can be destroyed due to operational readiness failures. 1 In te complex interplay between tecnology, infrastructure and process,
Simultaneous Location of Trauma Centers and Helicopters for Emergency Medical Service Planning
Simultaneous Location of Trauma Centers and Helicopters for Emergency Medical Service Planning Soo-Haeng Co Hoon Jang Taesik Lee Jon Turner Tepper Scool of Business, Carnegie Mellon University, Pittsburg,
Pioneer Fund Story. Searching for Value Today and Tomorrow. Pioneer Funds Equities
Pioneer Fund Story Searcing for Value Today and Tomorrow Pioneer Funds Equities Pioneer Fund A Cornerstone of Financial Foundations Since 1928 Te fund s relatively cautious stance as kept it competitive
Training Robust Support Vector Regression via D. C. Program
Journal of Information & Computational Science 7: 12 (2010) 2385 2394 Available at ttp://www.joics.com Training Robust Support Vector Regression via D. C. Program Kuaini Wang, Ping Zong, Yaoong Zao College
The Dynamics of Movie Purchase and Rental Decisions: Customer Relationship Implications to Movie Studios
Te Dynamics of Movie Purcase and Rental Decisions: Customer Relationsip Implications to Movie Studios Eddie Ree Associate Professor Business Administration Stoneill College 320 Wasington St Easton, MA
A hybrid model of dynamic electricity price forecasting with emphasis on price volatility
all times On a non-liquid market, te accuracy of a price A ybrid model of dynamic electricity price forecasting wit empasis on price volatility Marin Cerjan Abstract-- Accurate forecasting tools are essential
Bonferroni-Based Size-Correction for Nonstandard Testing Problems
Bonferroni-Based Size-Correction for Nonstandard Testing Problems Adam McCloskey Brown University October 2011; Tis Version: October 2012 Abstract We develop powerful new size-correction procedures for
Multigrid computational methods are
M ULTIGRID C OMPUTING Wy Multigrid Metods Are So Efficient Originally introduced as a way to numerically solve elliptic boundary-value problems, multigrid metods, and teir various multiscale descendants,
Math 113 HW #5 Solutions
Mat 3 HW #5 Solutions. Exercise.5.6. Suppose f is continuous on [, 5] and te only solutions of te equation f(x) = 6 are x = and x =. If f() = 8, explain wy f(3) > 6. Answer: Suppose we ad tat f(3) 6. Ten
h Understanding the safe operating principles and h Gaining maximum benefit and efficiency from your h Evaluating your testing system's performance
EXTRA TM Instron Services Revolve Around You It is everyting you expect from a global organization Te global training centers offer a complete educational service for users of advanced materials testing
The modelling of business rules for dashboard reporting using mutual information
8 t World IMACS / MODSIM Congress, Cairns, Australia 3-7 July 2009 ttp://mssanz.org.au/modsim09 Te modelling of business rules for dasboard reporting using mutual information Gregory Calbert Command, Control,
Overview of Component Search System SPARS-J
Overview of omponent Searc System Tetsuo Yamamoto*,Makoto Matsusita**, Katsuro Inoue** *Japan Science and Tecnology gency **Osaka University ac part nalysis part xperiment onclusion and Future work Motivation
In other words the graph of the polynomial should pass through the points
Capter 3 Interpolation Interpolation is te problem of fitting a smoot curve troug a given set of points, generally as te grap of a function. It is useful at least in data analysis (interpolation is a form
Determine the perimeter of a triangle using algebra Find the area of a triangle using the formula
Student Name: Date: Contact Person Name: Pone Number: Lesson 0 Perimeter, Area, and Similarity of Triangles Objectives Determine te perimeter of a triangle using algebra Find te area of a triangle using
OPTIMAL FLEET SELECTION FOR EARTHMOVING OPERATIONS
New Developments in Structural Engineering and Construction Yazdani, S. and Sing, A. (eds.) ISEC-7, Honolulu, June 18-23, 2013 OPTIMAL FLEET SELECTION FOR EARTHMOVING OPERATIONS JIALI FU 1, ERIK JENELIUS
CHAPTER 7. Di erentiation
CHAPTER 7 Di erentiation 1. Te Derivative at a Point Definition 7.1. Let f be a function defined on a neigborood of x 0. f is di erentiable at x 0, if te following it exists: f 0 fx 0 + ) fx 0 ) x 0 )=.
OPTIMAL DISCONTINUOUS GALERKIN METHODS FOR THE ACOUSTIC WAVE EQUATION IN HIGHER DIMENSIONS
OPTIMAL DISCONTINUOUS GALERKIN METHODS FOR THE ACOUSTIC WAVE EQUATION IN HIGHER DIMENSIONS ERIC T. CHUNG AND BJÖRN ENGQUIST Abstract. In tis paper, we developed and analyzed a new class of discontinuous
1. Case description. Best practice description
1. Case description Best practice description Tis case sows ow a large multinational went troug a bottom up organisational cange to become a knowledge-based company. A small community on knowledge Management
Tangent Lines and Rates of Change
Tangent Lines and Rates of Cange 9-2-2005 Given a function y = f(x), ow do you find te slope of te tangent line to te grap at te point P(a, f(a))? (I m tinking of te tangent line as a line tat just skims
Chapter 11. Limits and an Introduction to Calculus. Selected Applications
Capter Limits and an Introduction to Calculus. Introduction to Limits. Tecniques for Evaluating Limits. Te Tangent Line Problem. Limits at Infinit and Limits of Sequences.5 Te Area Problem Selected Applications
Strategic trading and welfare in a dynamic market. Dimitri Vayanos
LSE Researc Online Article (refereed) Strategic trading and welfare in a dynamic market Dimitri Vayanos LSE as developed LSE Researc Online so tat users may access researc output of te Scool. Copyrigt
Bayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
Introduction to Deep Learning Variational Inference, Mean Field Theory
Introduction to Deep Learning Variational Inference, Mean Field Theory 1 Iasonas Kokkinos [email protected] Center for Visual Computing Ecole Centrale Paris Galen Group INRIA-Saclay Lecture 3: recap
Three New Graphical Models for Statistical Language Modelling
Andriy Mnih Geoffrey Hinton Department of Computer Science, University of Toronto, Canada [email protected] [email protected] Abstract The supremacy of n-gram models in statistical language modelling
Math Test Sections. The College Board: Expanding College Opportunity
Taking te SAT I: Reasoning Test Mat Test Sections Te materials in tese files are intended for individual use by students getting ready to take an SAT Program test; permission for any oter use must be sougt
Heterogeneous firms and trade costs: a reading of French access to European agrofood
Heterogeneous firms and trade costs: a reading of Frenc access to European agrofood markets Cevassus-Lozza E., Latouce K. INRA, UR 34, F-44000 Nantes, France Abstract Tis article offers a new reading of
NAFN NEWS SPRING2011 ISSUE 7. Welcome to the Spring edition of the NAFN Newsletter! INDEX. Service Updates Follow That Car! Turn Back The Clock
NAFN NEWS ISSUE 7 SPRING2011 Welcome to te Spring edition of te NAFN Newsletter! Spring is in te air at NAFN as we see several new services cropping up. Driving and transport emerged as a natural teme
SWITCH T F T F SELECT. (b) local schedule of two branches. (a) if-then-else construct A & B MUX. one iteration cycle
768 IEEE RANSACIONS ON COMPUERS, VOL. 46, NO. 7, JULY 997 Compile-ime Sceduling of Dynamic Constructs in Dataæow Program Graps Soonoi Ha, Member, IEEE and Edward A. Lee, Fellow, IEEE Abstract Sceduling
Pressure. Pressure. Atmospheric pressure. Conceptual example 1: Blood pressure. Pressure is force per unit area:
Pressure Pressure is force per unit area: F P = A Pressure Te direction of te force exerted on an object by a fluid is toward te object and perpendicular to its surface. At a microscopic level, te force
Large-scale Virtual Acoustics Simulation at Audio Rates Using Three Dimensional Finite Difference Time Domain and Multiple GPUs
Large-scale Virtual Acoustics Simulation at Audio Rates Using Tree Dimensional Finite Difference Time Domain and Multiple GPUs Craig J. Webb 1,2 and Alan Gray 2 1 Acoustics Group, University of Edinburg
Pretrial Settlement with Imperfect Private Monitoring
Pretrial Settlement wit Imperfect Private Monitoring Mostafa Beskar Indiana University Jee-Hyeong Park y Seoul National University April, 2016 Extremely Preliminary; Please Do Not Circulate. Abstract We
2.12 Student Transportation. Introduction
Introduction Figure 1 At 31 Marc 2003, tere were approximately 84,000 students enrolled in scools in te Province of Newfoundland and Labrador, of wic an estimated 57,000 were transported by scool buses.
Chapter 7 Numerical Differentiation and Integration
45 We ave a abit in writing articles publised in scientiþc journals to make te work as Þnised as possible, to cover up all te tracks, to not worry about te blind alleys or describe ow you ad te wrong idea
FINANCIAL SECTOR INEFFICIENCIES AND THE DEBT LAFFER CURVE
INTERNATIONAL JOURNAL OF FINANCE AND ECONOMICS Int. J. Fin. Econ. 10: 1 13 (2005) Publised online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/ijfe.251 FINANCIAL SECTOR INEFFICIENCIES
Torchmark Corporation 2001 Third Avenue South Birmingham, Alabama 35233 Contact: Joyce Lane 972-569-3627 NYSE Symbol: TMK
News Release Torcmark Corporation 2001 Tird Avenue Sout Birmingam, Alabama 35233 Contact: Joyce Lane 972-569-3627 NYSE Symbol: TMK TORCHMARK CORPORATION REPORTS FOURTH QUARTER AND YEAR-END 2004 RESULTS
Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
Tis article as been accepted for publication in a future issue of tis journal, but as not been fully edited Content may cange prior to final publication Citation information: DOI 101109/TCC20152389842,
Bayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
