Mean Field Theory for Sigmoid Belief Networks

Journal of Artificial Intelligence Research 4 (1996). Submitted 11/95; published 3/96.

Lawrence K. Saul, Tommi Jaakkola, Michael I. Jordan
Center for Biological and Computational Learning
Massachusetts Institute of Technology
79 Amherst Street, E Cambridge, MA

© 1996 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

Abstract

We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition: the classification of handwritten digits.

1. Introduction

Bayesian belief networks (Pearl, 1988; Lauritzen & Spiegelhalter, 1988) provide a rich graphical representation of probabilistic models. The nodes in these networks represent random variables, while the links represent causal influences. These associations endow directed acyclic graphs (DAGs) with a precise probabilistic semantics. The ease of interpretation afforded by this semantics explains the growing appeal of belief networks, now widely used as models of planning, reasoning, and uncertainty.

Inference and learning in belief networks are possible insofar as one can efficiently compute (or approximate) the likelihood of observed patterns of evidence (Buntine, 1994; Russell, Binder, Koller, & Kanazawa, 1995). There exist provably efficient algorithms for computing likelihoods in belief networks with tree or chain-like architectures. In practice, these algorithms also tend to perform well on more general sparse networks. However, for networks in which nodes have many parents, the exact algorithms are too slow (Jensen, Kong, & Kjærulff, 1995). Indeed, in large networks with dense or layered connectivity, exact methods are intractable, as they require summing over an exponentially large number of hidden states.

One approach to dealing with such networks has been to use Gibbs sampling (Pearl, 1988), a stochastic simulation methodology with roots in statistical mechanics (Geman & Geman, 1984). Our approach in this paper relies on a different tool from statistical mechanics, namely mean field theory (Parisi, 1988). The mean field approximation is well known for probabilistic models that can be represented as undirected graphs, so-called Markov networks. For example, in Boltzmann machines (Ackley, Hinton, & Sejnowski, 1985), mean field learning rules have been shown to yield tremendous savings in time and computation over sampling-based methods (Peterson & Anderson, 1987).

The main motivation for this work was to extend the mean field approximation for undirected graphical models to their directed counterparts. Since belief networks can be transformed to Markov networks, and mean field theories for Markov networks are well known, it is natural to ask why a new framework is required at all. The reason is that probabilistic models which have compact representations as DAGs may have unwieldy representations as undirected graphs. As we shall see, avoiding this complexity and working directly on DAGs requires an extension of existing methods.

In this paper we focus on sigmoid belief networks (Neal, 1992), for which the resulting mean field theory is most straightforward. These are networks of binary random variables whose local conditional distributions are based on log-linear models. We develop a mean field approximation for these networks and use it to compute a lower bound on the likelihood of evidence. Our method applies to arbitrary partial instantiations of the variables in these networks and makes no restrictions on the network topology. Note that once a lower bound is available, a learning procedure can maximize the lower bound; this is useful when the true likelihood itself cannot be computed efficiently. A similar approximation for models of continuous random variables is discussed by Jaakkola et al. (1995).

The idea of bounding the likelihood in sigmoid belief networks was introduced in a related architecture known as the Helmholtz machine (Hinton, Dayan, Frey, & Neal, 1995). A fundamental advance of this work was to establish a framework for approximation that is especially conducive to learning the parameters of layered belief networks. The close connection between this idea and the mean field approximation from statistical mechanics, however, was not developed.

In this paper we hope not only to elucidate this connection, but also to convey a sense of which approximations are likely to generate useful lower bounds while, at the same time, remaining analytically tractable. We develop here what is perhaps the simplest such approximation for belief networks, noting that more sophisticated methods (Jaakkola & Jordan, 1996a; Saul & Jordan, 1995) are also available. It should be emphasized that approximations of some form are required to handle the multilayer neural networks used in statistical pattern recognition. For these networks, exact algorithms are hopelessly intractable; moreover, Gibbs sampling methods are impractically slow.

The organization of this paper is as follows. Section 2 introduces the problems of inference and learning in sigmoid belief networks. Section 3 contains the main contribution of the paper: a tractable mean field theory. Here we present the mean field approximation for sigmoid belief networks and derive a lower bound on the likelihood of instantiated patterns of evidence. Section 4 looks at a mean field algorithm for learning the parameters of sigmoid belief networks. For this algorithm, we give results on a benchmark problem in pattern recognition: the classification of handwritten digits. Finally, section 5 presents our conclusions, as well as future issues for research.

2. Sigmoid Belief Networks

The great virtue of belief networks is that they clearly exhibit the conditional dependencies of the underlying probability model. Consider a belief network defined over binary random variables $S = (S_1, S_2, \ldots, S_N)$. We denote the parents of $S_i$ by $\mathrm{pa}(S_i) \subseteq \{S_1, S_2, \ldots, S_{i-1}\}$; this is the smallest set of nodes for which
\[ P(S_i \mid S_1, S_2, \ldots, S_{i-1}) = P(S_i \mid \mathrm{pa}(S_i)). \tag{1} \]
In sigmoid belief networks (Neal, 1992), the conditional distributions attached to each node are based on log-linear models. In particular, the probability that the $i$th node is activated is given by
\[ P(S_i = 1 \mid \mathrm{pa}(S_i)) = \sigma\!\left( \sum_j J_{ij} S_j + h_i \right), \tag{2} \]
where $J_{ij}$ and $h_i$ are the weights and biases in the network, and
\[ \sigma(z) = \frac{1}{1 + e^{-z}} \tag{3} \]
is the sigmoid function shown in Figure 1. In sigmoid belief networks, we have $J_{ij} = 0$ for $S_j \notin \mathrm{pa}(S_i)$; moreover, $J_{ij} = 0$ for $j \ge i$, since the network's structure is that of a directed acyclic graph.

Figure 1: Sigmoid function $\sigma(z) = [1 + e^{-z}]^{-1}$. If $z$ is the sum of weighted inputs to node $S$, then $P(S = 1 \mid z) = \sigma(z)$ is the conditional probability that node $S$ is activated.

The sigmoid function in eq. (3) provides a compact parametrization of the conditional probability distributions[1] in eq. (2) used to propagate beliefs. In particular, $P(S_i \mid \mathrm{pa}(S_i))$ depends on $\mathrm{pa}(S_i)$ only through a sum of weighted inputs, where the weights may be viewed as the parameters in a logistic regression (McCullagh & Nelder, 1983). The conditional probability distribution for $S_i$ may be summarized as:
\[ P(S_i \mid \mathrm{pa}(S_i)) = \frac{\exp\!\left[ \left( \sum_j J_{ij} S_j + h_i \right) S_i \right]}{1 + \exp\!\left[ \sum_j J_{ij} S_j + h_i \right]}. \tag{4} \]
Note that substituting $S_i = 1$ in eq. (4) recovers the result in eq. (2). Combining eqs. (1) and (4), we may write the joint probability distribution over the variables in the network as:
\[ P(S) = \prod_i P(S_i \mid \mathrm{pa}(S_i)) \tag{5} \]
\[ \phantom{P(S)} = \prod_i \left\{ \frac{\exp\!\left[ \left( \sum_j J_{ij} S_j + h_i \right) S_i \right]}{1 + \exp\!\left[ \sum_j J_{ij} S_j + h_i \right]} \right\}. \tag{6} \]
The denominator in eq. (6) ensures that the probability distribution is normalized to unity.

[1] The relation to noisy-OR models is discussed in appendix A.

We now turn to the problem of inference in sigmoid belief networks. Absorbing evidence divides the units in the belief network into two types, visible and hidden. The visible units (or "evidence nodes") are those for which we have instantiated values; the hidden units are those for which we do not. When there is no possible ambiguity, we will use $H$ and $V$ to denote the subsets of hidden and visible units. Using Bayes' rule, inference is done under the conditional distribution
\[ P(H \mid V) = \frac{P(H, V)}{P(V)}, \tag{7} \]
where
\[ P(V) = \sum_H P(H, V) \tag{8} \]
is the likelihood of the evidence $V$. In principle, the likelihood may be computed by summing over all $2^{|H|}$ configurations of the hidden units. Unfortunately, this calculation is intractable in large, densely connected networks. This intractability presents a major obstacle to learning parameters for these networks, as nearly all procedures for statistical estimation require frequent estimates of the likelihood. The calculations for exact probabilistic inference are beset by the same difficulties.
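To make the cost of eq. (8) concrete, the following is a minimal sketch of exact likelihood computation by brute-force enumeration. The function names, the zero-based indexing, and the three-unit example network are illustrative assumptions and do not come from the paper; the weight matrix follows the convention that `J[i][j]` is the weight from parent `j` to child `i`, with `J[i][j] = 0` for `j >= i`.

```python
import itertools
import math


def sigmoid(z):
    """Sigmoid function of eq. (3)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_joint(s, J, h):
    """ln P(S) from eq. (6) for a full configuration s of 0/1 values."""
    total = 0.0
    for i in range(len(s)):
        # Weighted input to unit i; only lower-indexed units can be parents.
        z = h[i] + sum(J[i][j] * s[j] for j in range(i))
        total += z * s[i] - math.log1p(math.exp(z))
    return total


def likelihood(visible, J, h):
    """P(V) from eq. (8), summing over all 2^|H| hidden configurations.

    visible maps a unit index to its instantiated value (0 or 1).
    """
    n = len(h)
    hidden = [i for i in range(n) if i not in visible]
    total = 0.0
    for values in itertools.product((0, 1), repeat=len(hidden)):
        s = [0] * n
        for i, v in visible.items():
            s[i] = v
        for i, v in zip(hidden, values):
            s[i] = v
        total += math.exp(log_joint(s, J, h))
    return total


# Hypothetical three-unit network (values chosen only for illustration).
J = [[0.0, 0.0, 0.0],
     [1.5, 0.0, 0.0],
     [-0.7, 2.0, 0.0]]
h = [0.2, -0.4, 0.1]
print(likelihood({2: 1}, J, h))  # likelihood that the last unit is active
```

The loop over `itertools.product` visits every hidden configuration, so the running time doubles with each additional hidden unit; this is the exponential cost that motivates the approximation developed next.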

Unable to compute $P(V)$ or work directly with $P(H \mid V)$, we will resort to an approximation from statistical physics known as mean field theory.

3. Mean Field Theory

The mean field approximation appears under a multitude of guises in the physics literature; indeed, it is "almost as old as statistical mechanics" (Itzykson & Drouffe, 1991). Let us briefly explain how it acquired its name and why it is so ubiquitous. In the physical models described by Markov networks, the variables $S_i$ represent localized magnetic moments (e.g., at the sites of a crystal lattice), and the sums $\sum_j J_{ij} S_j + h_i$ represent local magnetic fields. Roughly speaking, in certain cases a central limit theorem may be applied to these sums, and a useful approximation is to ignore the fluctuations in these fields and replace them by their mean value; hence the name "mean field" theory. In some models, this is an excellent approximation; in others, a poor one. Because of its simplicity, however, it is widely used as a first step in understanding many types of physical phenomena.

Though this explains the philological origins of mean field theory, there are in fact many ways to derive what amounts to the same approximation (Parisi, 1988). In this paper we present the formulation most appropriate for inference and learning in graphical models. In particular, we view mean field theory as a principled method for approximating an intractable graphical model by a tractable one. This is done via a variational principle that chooses the parameters of the tractable model to minimize an entropic measure of error.

The basic framework of mean field theory remains the same for directed graphs, though we have found it necessary to introduce extra mean field parameters in addition to the usual ones. As in Markov networks, one finds a set of nonlinear equations for the mean field parameters that can be solved by iteration. In practice, we have found this iteration to converge fairly quickly and to scale well to large networks.

Let us now return to the problem posed at the end of the last section. There we found that for many belief networks, it was intractable to decompose the joint distribution as $P(S) = P(H \mid V) P(V)$, where $P(V)$ was the likelihood of the evidence $V$. For the purposes of probabilistic modeling, mean field theory has two main virtues. First, it provides a tractable approximation, $Q(H \mid V) \approx P(H \mid V)$, to the conditional distributions required for inference. Second, it provides a lower bound on the likelihoods required for learning.

Let us first consider the origin of the lower bound. Clearly, for any approximating distribution $Q(H \mid V)$, we have the equality:
\[ \ln P(V) = \ln \sum_H P(H, V) \tag{9} \]
\[ \phantom{\ln P(V)} = \ln \sum_H Q(H \mid V) \cdot \left[ \frac{P(H, V)}{Q(H \mid V)} \right]. \tag{10} \]
To obtain a lower bound, we now apply Jensen's inequality (Cover & Thomas, 1991), pushing the logarithm through the sum over hidden states and into the expectation:
\[ \ln P(V) \ge \sum_H Q(H \mid V) \ln \left[ \frac{P(H, V)}{Q(H \mid V)} \right]. \tag{11} \]
It is straightforward to verify that the difference between the left and right hand sides of eq. (11) is the Kullback-Leibler divergence (Cover & Thomas, 1991):
\[ \mathrm{KL}(Q \| P) = \sum_H Q(H \mid V) \ln \left[ \frac{Q(H \mid V)}{P(H \mid V)} \right]. \tag{12} \]
Thus, the better the approximation to $P(H \mid V)$, the tighter the bound on $\ln P(V)$.
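The relation between eqs. (11) and (12) can be checked numerically on a toy example. The sketch below is only an illustration: the joint probabilities `p_joint`, the approximating distribution `q`, and the function names are made-up assumptions, chosen so that the four hidden configurations can be listed by hand.

```python
import math

# Hypothetical values of P(H, V) for one fixed evidence pattern V and two
# binary hidden units; they sum to P(V), not to 1.
p_joint = {(0, 0): 0.10, (0, 1): 0.05, (1, 0): 0.20, (1, 1): 0.15}
p_v = sum(p_joint.values())
log_p_v = math.log(p_v)
p_post = {hid: p / p_v for hid, p in p_joint.items()}  # P(H|V), eq. (7)

# An arbitrary approximating distribution Q(H|V) (illustrative numbers).
q = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.1}


def lower_bound(q, p_joint):
    """Right hand side of eq. (11)."""
    return sum(q[hid] * math.log(p_joint[hid] / q[hid]) for hid in q if q[hid] > 0)


def kl_divergence(q, p_post):
    """KL(Q||P) of eq. (12)."""
    return sum(q[hid] * math.log(q[hid] / p_post[hid]) for hid in q if q[hid] > 0)


gap = log_p_v - lower_bound(q, p_joint)
print(gap, kl_divergence(q, p_post))  # the two values agree up to rounding
```

Choosing `q` equal to `p_post` closes the gap entirely, which is the sense in which a better approximation gives a tighter bound.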

Anticipating the connection to statistical mechanics, we will refer to $Q(H \mid V)$ as the mean field distribution. It is natural to divide the calculation of the bound into two components, both of which are particular averages over this approximating distribution. These components are the mean field entropy and energy; the overall bound is given by their difference:
\[ \ln P(V) \ge \left( -\sum_H Q(H \mid V) \ln Q(H \mid V) \right) - \left( -\sum_H Q(H \mid V) \ln P(H, V) \right). \tag{13} \]
Both terms have physical interpretations. The first measures the amount of uncertainty in the mean field distribution and follows the standard definition of entropy. The second measures the average value[2] of $-\ln P(H, V)$; the name "energy" arises from interpreting the probability distributions in belief networks as Boltzmann distributions[3] at unit temperature. In this case, the energy of each network configuration is given (up to a constant) by minus the logarithm of its probability under the Boltzmann distribution. In sigmoid belief networks, the energy has the form
\[ -\ln P(H, V) = -\sum_{ij} J_{ij} S_i S_j - \sum_i h_i S_i + \sum_i \ln \left[ 1 + \exp\!\left( \sum_j J_{ij} S_j + h_i \right) \right], \tag{14} \]
as follows from eq. (6). The first two terms in this equation are familiar from Markov networks with pairwise interactions (Hertz, Krogh, & Palmer, 1991); the last term is peculiar to sigmoid belief networks. Note that the overall energy is neither a linear function of the weights nor a polynomial function of the units. This is the price we pay in sigmoid belief networks for identifying $P(H \mid V)$ as a Boltzmann distribution and the likelihood $P(V)$ as its partition function. Note that this identification was made implicitly in the form of eqs. (7) and (8).

[2] A similar average is performed in the E-step of an EM algorithm (Dempster, Laird, & Rubin, 1977); the difference here is that the average is performed over the mean field distribution, $Q(H \mid V)$, rather than the true posterior, $P(H \mid V)$. For a related discussion, see Neal & Hinton (1993).

[3] Our terminology is as follows. Let $S$ denote the degrees of freedom in a statistical mechanical system. The energy of the system, $E(S)$, is a real-valued function of these degrees of freedom, and the Boltzmann distribution
\[ P(S) = \frac{e^{-\beta E(S)}}{\sum_S e^{-\beta E(S)}} \]
defines a probability distribution over the possible configurations of $S$. The parameter $\beta$ is the inverse temperature; it serves to calibrate the energy scale and will be fixed to unity in our discussion of belief networks. Finally, the sum in the denominator, known as the partition function, ensures that the Boltzmann distribution is normalized to unity.

The bound in eq. (11) is valid for any probability distribution $Q(H \mid V)$. To make use of it, however, we must choose a distribution that enables us to evaluate the right hand side of eq. (11). Consider the factorized distribution
\[ Q(H \mid V) = \prod_{i \in H} \mu_i^{S_i} (1 - \mu_i)^{1 - S_i}, \tag{15} \]
in which the binary hidden units $\{S_i\}_{i \in H}$ appear as independent Bernoulli variables with adjustable means $\mu_i$. A mean field approximation is obtained by substituting the factorized distribution, eq. (15), for the true Boltzmann distribution, eq. (7). It may seem that this approximation replaces the rich probabilistic dependencies in $P(H \mid V)$ by an impoverished assumption of complete factorizability. Though this is true to some degree, the reader should keep in mind that the values we choose for $\{\mu_i\}_{i \in H}$ (and hence the statistics of the hidden units) will depend on the evidence $V$.

The best approximation of the form, eq. (15), is found by choosing the mean values, $\{\mu_i\}_{i \in H}$, that minimize the Kullback-Leibler divergence, $\mathrm{KL}(Q \| P)$. This is equivalent to minimizing the gap between the true log-likelihood, $\ln P(V)$, and the lower bound obtained from mean field theory. The mean field bound on the log-likelihood may be calculated by substituting eq. (15) into the right hand side of eq. (11). The result of this calculation is
\[
\begin{aligned}
\ln P(V) \ge{} & \sum_{ij} J_{ij} \mu_i \mu_j + \sum_i h_i \mu_i - \sum_i \left\langle \ln \left[ 1 + e^{\sum_j J_{ij} S_j + h_i} \right] \right\rangle \\
& - \sum_i \left[ \mu_i \ln \mu_i + (1 - \mu_i) \ln (1 - \mu_i) \right],
\end{aligned} \tag{16}
\]
where $\langle \cdot \rangle$ indicates an expectation value over the mean field distribution, eq. (15). The terms in the first line of eq. (16) represent the mean field energy, derived from eq. (14); those in the second represent the mean field entropy. In a slight abuse of notation, we have defined mean values $\mu_i$ for the visible units; these of course are set to the instantiated values $\mu_i \in \{0, 1\}$.

Note that to compute the average energy in the mean field approximation, we must evaluate $\langle \ln [1 + e^{z_i}] \rangle$, where $z_i = \sum_j J_{ij} S_j + h_i$ is the sum of weighted inputs to the $i$th unit in the belief network. Unfortunately, even under the mean field assumption that the hidden units are uncorrelated, this average does not have a simple closed form. This term does not arise in the mean field theory for Markov networks with pairwise interactions; again, it is peculiar to sigmoid belief networks.

In principle, the average may be performed by enumerating the possible states of $\mathrm{pa}(S_i)$. The result of this calculation, however, would be an extremely unwieldy function of the parameters in the belief network. This reflects the fact that in general, the sigmoid belief network defined by the weights $J_{ij}$ has an equivalent Markov network with $N$th order interactions and not pairwise ones. To avoid this complexity, we must develop a mean field theory that works directly on DAGs.

How we handle $\langle \ln [1 + e^{z_i}] \rangle$ is what distinguishes our mean field theory from previous ones. Unable to compute this term exactly, we resort to another bound. Note that for any random variable $z$ and any real number $\xi$, we have the equality:
\[ \langle \ln [1 + e^{z}] \rangle = \left\langle \ln \left[ e^{\xi z} e^{-\xi z} (1 + e^{z}) \right] \right\rangle \tag{17} \]
\[ \phantom{\langle \ln [1 + e^{z}] \rangle} = \xi \langle z \rangle + \left\langle \ln \left[ e^{-\xi z} + e^{(1 - \xi) z} \right] \right\rangle. \tag{18} \]
We can upper bound the right hand side by applying Jensen's inequality in the opposite direction from before, pulling the logarithm outside the expectation:
\[ \langle \ln [1 + e^{z}] \rangle \le \xi \langle z \rangle + \ln \left\langle e^{-\xi z} + e^{(1 - \xi) z} \right\rangle. \tag{19} \]
Setting $\xi = 0$ in eq. (19) gives the standard bound: $\langle \ln (1 + e^{z}) \rangle \le \ln \langle 1 + e^{z} \rangle$. A tighter bound (Seung, 1995) can be obtained, however, by allowing non-zero values of $\xi$. This is illustrated in Figure 2 for the special case where $z$ is a Gaussian distributed random variable with zero mean and unit variance. The bound in eq. (19) has two useful properties which we state here without proof: (i) the right hand side is a convex function of $\xi$; (ii) the value of $\xi$ which minimizes this function occurs in the interval $\xi \in [0, 1]$. Thus, provided it is possible to evaluate eq. (19) for different values of $\xi$, the tightest bound of this form can be found by a simple one-dimensional minimization.

Figure 2: Bound in eq. (19), plotted against $\xi$ together with the exact value, for the case where $z$ is normally distributed with zero mean and unit variance. In this case, the exact result is $\langle \ln (1 + e^{z}) \rangle = 0.806$; the bound gives $\min_\xi \left\{ \ln \left[ e^{\frac{1}{2} \xi^2} + e^{\frac{1}{2} (1 - \xi)^2} \right] \right\} = 0.818$. The standard bound from Jensen's inequality occurs at $\xi = 0$ and gives $0.974$.

The above bound can be put to immediate use by attaching an extra mean field parameter $\xi_i$ to each unit in the belief network. We can then upper bound the intractable terms in the mean field energy by
\[ \left\langle \ln \left[ 1 + e^{\sum_j J_{ij} S_j + h_i} \right] \right\rangle \le \xi_i \left( \sum_j J_{ij} \mu_j + h_i \right) + \ln \left\langle e^{-\xi_i z_i} + e^{(1 - \xi_i) z_i} \right\rangle, \tag{20} \]
where $z_i = \sum_j J_{ij} S_j + h_i$. The expectations inside the logarithm can be evaluated exactly for the factorial distribution, eq. (15); for example,
\[ \left\langle e^{-\xi_i z_i} \right\rangle = e^{-\xi_i h_i} \prod_j \left[ 1 - \mu_j + \mu_j e^{-\xi_i J_{ij}} \right]. \tag{21} \]
A similar result holds for $\langle e^{(1 - \xi_i) z_i} \rangle$. Though these averages are tractable, we will tend not to write them out in what follows. The reader, however, should keep in mind that these averages do not present any difficulty; they are simply averages over products of independent random variables, as opposed to sums.

Assembling the terms in eqs. (16) and (20) gives a lower bound $\ln P(V) \ge \mathcal{L}_V$,
\[
\begin{aligned}
\mathcal{L}_V ={} & \sum_{ij} J_{ij} \mu_i \mu_j + \sum_i h_i \mu_i - \sum_i \xi_i \left( \sum_j J_{ij} \mu_j + h_i \right) - \sum_i \ln \left\langle e^{-\xi_i z_i} + e^{(1 - \xi_i) z_i} \right\rangle \\
& - \sum_i \left[ \mu_i \ln \mu_i + (1 - \mu_i) \ln (1 - \mu_i) \right],
\end{aligned} \tag{22}
\]
on the log-likelihood of the evidence $V$. So far we have not specified the parameters $\{\mu_i\}_{i \in H}$ and $\{\xi_i\}$; in particular, the bound in eq. (22) is valid for any choice of parameters. We naturally seek the values that maximize the right hand side of eq. (22).

Suppose we fix the mean values $\{\mu_i\}_{i \in H}$ and ask for the parameters $\{\xi_i\}$ that yield the tightest possible bound. Note that the right hand side of eq. (22) does not couple terms with $\xi_i$ that belong to different units in the network. The minimization over $\{\xi_i\}$ therefore reduces to $N$ independent minimizations over the interval $[0, 1]$. These can be done by any number of standard methods (Press, Flannery, Teukolsky, & Vetterling, 1986).

To choose the means, we set the gradients of the bound with respect to $\{\mu_i\}_{i \in H}$ equal to zero. To this end, let us define the intermediate matrix:
\[ K_{ij} = -\frac{\partial}{\partial \mu_j} \ln \left\langle e^{-\xi_i z_i} + e^{(1 - \xi_i) z_i} \right\rangle, \tag{23} \]
where $z_i$ is the weighted sum of inputs to the $i$th unit. Note that $K_{ij}$ is zero unless $S_j$ is a parent of $S_i$; in other words, it has the same connectivity as the weight matrix $J_{ij}$. Within the mean field approximation, $K_{ij}$ measures the parental influence of $S_j$ on $S_i$ given the instantiated evidence $V$. The degree of correlation (positive or negative) is measured relative to the other parents of $S_i$.

Figure 3: The Markov blanket of unit $S_i$ includes its parents and children, as well as the other parents of its children.

The matrix elements of $K$ may be evaluated by expanding the expectations as in eq. (21); a full derivation is given in appendix B. Setting the gradient $\partial \mathcal{L}_V / \partial \mu_i$ equal to zero gives the final mean field equation:
\[ \mu_i = \sigma\!\left( h_i + \sum_j \left[ J_{ij} \mu_j + J_{ji} (\mu_j - \xi_j) + K_{ji} \right] \right), \tag{24} \]
where $\sigma(\cdot)$ is the sigmoid function. The argument of the sigmoid function may be viewed as an effective input to the $i$th unit in the belief network. This effective input is composed of terms from the unit's Markov blanket (Pearl, 1988), shown in Figure 3; in particular, these terms take into account the unit's internal bias, the values of its parents and children, and, through the matrix $K_{ji}$, the values of its children's other parents. In solving these equations by iteration, the values of the instantiated units are propagated throughout the entire network. An analogous propagation of information occurs in exact algorithms (Lauritzen & Spiegelhalter, 1988) to compute likelihoods in belief networks.

While the factorized approximation to the true posterior is not exact, the mean field equations set the parameters $\{\mu_i\}_{i \in H}$ to values which make the approximation as accurate as possible. This in turn translates into the tightest mean field bound on the log-likelihood. The overall procedure for bounding the log-likelihood thus consists of two alternating steps: (i) update $\{\xi_i\}$ for fixed $\{\mu_i\}$; (ii) update $\{\mu_i\}_{i \in H}$ for fixed $\{\xi_i\}$. The first step involves $N$ independent minimizations over the interval $[0, 1]$; the second is done by iterating the mean field equations. In practice, the steps are repeated until the mean field bound on the log-likelihood converges[4] to a desired degree of accuracy.

[4] It can be shown that asynchronous updates of the mean field parameters lead to monotonic increases in the lower bound (just as in the case of Markov networks).

The quality of the bound depends on two approximations: the complete factorizability of the mean field distribution, eq. (15), and the logarithm bound, eq. (19). How reliable are these approximations in belief networks? To study this question, we performed numerical experiments on the three layer belief network shown in Figure 4. The advantage of working with such a small network (2x4x6) is that true likelihoods can be computed by exact enumeration. We considered the particular event that all the units in the bottom layer were instantiated to zero. For this event, we compared the mean field bound on the likelihood to its true value, obtained by enumerating the states in the top two layers. This was done for random networks whose weights and biases were uniformly distributed between -1 and 1.

Figure 4: Three layer belief network (2x4x6) with top-down propagation of beliefs. To model the images of handwritten digits in section 4, we used 8x24x64 networks where units in the bottom layer encoded pixel values in 8x8 bitmaps.

Figure 5 (left) shows the histogram of the relative error in log-likelihood, computed as $\mathcal{L}_V / \ln P(V) - 1$; for these networks, the mean relative error is 1.6%. Figure 5 (right) shows the histogram that results from assuming that all states in the bottom layer occur with equal probability; in this case the relative error was computed as $(\ln 2^{-6}) / \ln P(V) - 1$. For this "uniform" approximation, the root mean square relative error is 22.6%. The large discrepancy between these results suggests that mean field theory can provide a useful lower bound on the likelihood in certain belief networks. Of course, what ultimately matters is the behavior of mean field theory in networks that solve meaningful problems. This is the subject of the next section.

Figure 5: Histograms of relative error in log-likelihood over randomly generated three layer networks. At left: the relative error from the mean field approximation; at right: the relative error if all states in the bottom layer are assumed to occur with equal probability. The log-likelihood was computed for the event that all the nodes in the bottom layer were instantiated to zero.

4. Learning

One attractive use of sigmoid belief networks is to perform density estimation in high dimensional input spaces. This is a problem in parameter estimation: given a set of patterns over particular units in the belief network, find the set of weights $J_{ij}$ and biases $h_i$ that assign high probability to these patterns. Clearly, the ability to compute likelihoods lies at the crux of any algorithm for learning the parameters in belief networks.
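Since learning relies on a tractable stand-in for the likelihood, it may help to see how the bound of eq. (22) can be evaluated for given mean field parameters. The following is a minimal sketch, not the authors' implementation: the function names, zero-based indexing, the six-unit random network, and the untuned parameter choices $\mu_i = \xi_i = 0.5$ are all assumptions made for illustration. It evaluates the averages inside the logarithm with the product formula of eq. (21) and compares the result with the exact log-likelihood obtained by enumeration.

```python
import itertools
import math
import random


def expect_exp(a, i, J, h, mu):
    """<exp(a * z_i)> under the factorized distribution, as in eq. (21)."""
    value = math.exp(a * h[i])
    for j in range(i):  # parents have lower indices; J[i][j] = 0 otherwise
        value *= 1.0 - mu[j] + mu[j] * math.exp(a * J[i][j])
    return value


def binary_entropy(m):
    """Entropy of a Bernoulli variable; zero for instantiated units."""
    if m <= 0.0 or m >= 1.0:
        return 0.0
    return -(m * math.log(m) + (1.0 - m) * math.log(1.0 - m))


def mean_field_bound(J, h, mu, xi):
    """Lower bound of eq. (22) for given means mu and parameters xi."""
    bound = 0.0
    for i in range(len(h)):
        mean_z = h[i] + sum(J[i][j] * mu[j] for j in range(i))
        bound += mu[i] * mean_z   # sum_j J_ij mu_i mu_j + h_i mu_i
        bound -= xi[i] * mean_z   # first term of the bound in eq. (20)
        bound -= math.log(expect_exp(-xi[i], i, J, h, mu)
                          + expect_exp(1.0 - xi[i], i, J, h, mu))
        bound += binary_entropy(mu[i])
    return bound


def exact_log_likelihood(J, h, visible):
    """ln P(V) by enumerating all hidden configurations, eq. (8)."""
    n = len(h)
    hidden = [i for i in range(n) if i not in visible]
    total = 0.0
    for values in itertools.product((0, 1), repeat=len(hidden)):
        s = dict(visible)
        s.update(zip(hidden, values))
        log_p = 0.0
        for i in range(n):
            z = h[i] + sum(J[i][j] * s[j] for j in range(i))
            log_p += z * s[i] - math.log1p(math.exp(z))
        total += math.exp(log_p)
    return math.log(total)


# Hypothetical six-unit network with weights and biases drawn from [-1, 1].
random.seed(0)
n = 6
J = [[random.uniform(-1, 1) if j < i else 0.0 for j in range(n)] for i in range(n)]
h = [random.uniform(-1, 1) for _ in range(n)]
visible = {4: 0, 5: 0}                              # last two units set to zero
mu = [0.5 if i not in visible else float(visible[i]) for i in range(n)]
xi = [0.5] * n
bound = mean_field_bound(J, h, mu, xi)
exact = exact_log_likelihood(J, h, visible)
print(bound, exact)  # the bound should not exceed the exact value
```

In this sketch the parameters are not optimized, so the gap is larger than it would be after the alternating updates described in Section 3. When every unit is instantiated, each $z_i$ is deterministic and the bound coincides with the exact log-likelihood for any choice of `xi`.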

Figure 6: Relationship between the true log-likelihood and its lower bound during learning; both panels plot the two quantities against training time. One possibility (at left) is that both increase together. The other is that the true log-likelihood decreases, closing the gap between itself and the bound. The latter can be viewed as a form of regularization.

Mean field algorithms provide a strategy for discovering appropriate values of $J_{ij}$ and $h_i$ without resorting to Gibbs sampling. Consider, for instance, the following procedure. For each pattern in the training set, solve the mean field equations for $\{\mu_i, \xi_i\}$ and compute the associated bound on the log-likelihood, $\mathcal{L}_V$. Next, adapt the weights in the belief network by gradient ascent[5] in the mean field bound,
\[ \Delta J_{ij} = \eta \frac{\partial \mathcal{L}_V}{\partial J_{ij}}, \tag{25} \]
\[ \Delta h_i = \eta \frac{\partial \mathcal{L}_V}{\partial h_i}, \tag{26} \]
where $\eta$ is a suitably chosen learning rate. Finally, cycle through the patterns in the training set, maximizing their likelihoods[6] for a fixed number of iterations or until one detects the onset of overfitting (e.g., by cross-validation).

[5] Expressions for the gradients of $\mathcal{L}_V$ are given in appendix B.

[6] Of course, one can also incorporate prior distributions over the weights and biases and maximize an approximation to the log posterior probability of the training set.

The above procedure uses a lower bound on the log-likelihood as a cost function for training belief networks (Hinton, Dayan, Frey, & Neal, 1995). The fact that we have a lower bound on the log-likelihood, rather than an upper bound, is of course crucial to the success of this learning algorithm. Adjusting the weights to maximize this lower bound can affect the true log-likelihood in two ways (see Figure 6). Either the true log-likelihood increases, moving in the same direction as the bound, or the true log-likelihood decreases, closing the gap between these two quantities. For the purposes of maximum likelihood estimation, the first outcome is clearly desirable; the second, though less desirable, can also be viewed in a positive light. In this case, the mean field approximation is acting as a regularizer, steering the network toward simple, factorial solutions even at the expense of lower likelihood estimates.

We tested this algorithm by building a maximum-likelihood classifier for images of handwritten digits. The data consisted of examples of handwritten digits [0-9] compiled by the U.S. Postal Service Office of Advanced Technology. The examples were preprocessed to produce 8x8 binary images, as shown in Figure 7. For each digit, we divided the available data into a training set with 700 examples and a test set with 400 examples. We then trained a three layer network[7] (see Figure 4) on each digit, sweeping through each training set five times with learning rate $\eta = 0.05$. The networks had 8 units in the top layer, 24 units in the middle layer, and 64 units in the bottom layer, making them far too large to be treated with exact methods.

[7] There are many possible architectures that could be chosen for the purpose of density estimation; we used layered networks to permit a comparison with previous benchmarks on this data set.

Figure 7: Binary images of handwritten digits: two and five.

Table 1: Confusion matrix for digit classification. The entry in the $i$th row and $j$th column counts the number of times that digit $i$ was classified as digit $j$.

After training, we classified the digits in each test set by the network that assigned them the highest likelihood. Table 1 shows the confusion matrix in which the $ij$th entry counts the number of times digit $i$ was classified as digit $j$. There were 184 errors in classification (out of a possible 4000), yielding an overall error rate of 4.6%. Table 2 gives the performance of various other algorithms on the same partition of this data set.

Table 3 shows the average log-likelihood score of each network on the digits in its test set. (Note that these scores are actually lower bounds.) These scores are normalized so that a network with zero weights and biases (i.e., one in which all 8x8 patterns are equally likely) would receive a score of -1. As expected, digits with relatively simple constructions (e.g., zeros, ones, and sevens) are more easily modeled than the rest.

Both measures of performance, error rate and log-likelihood score, are competitive with previously published results (Hinton, Dayan, Frey, & Neal, 1995) on this data set. The success of the algorithm affirms both the strategy of maximizing a lower bound and the utility of the mean field approximation. Though similar results can be obtained via Gibbs sampling, this seems to require considerably more computation than methods based on maximizing a lower bound (Frey, Dayan, & Hinton, 1995).
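The classification rule and the score normalization described above amount to a few lines of arithmetic. The sketch below is illustrative only: the function names and the per-network scores in `scores` are hypothetical, while the figures 184, 4000, 400, and 64 are the ones quoted in the text.

```python
import math


def classify(scores):
    """Pick the digit whose network gives the highest log-likelihood bound."""
    return max(scores, key=scores.get)


def error_rate(num_errors, num_examples):
    """Fraction of test examples that were misclassified."""
    return num_errors / num_examples


def normalized_score(total_log_likelihood, num_examples, num_pixels=64):
    """Scale a summed log-likelihood so that the uniform model scores -1."""
    return total_log_likelihood / (num_examples * num_pixels * math.log(2.0))


# Hypothetical scores assigned to one test image by three of the networks.
scores = {0: -31.2, 1: -45.9, 2: -28.4}
print(classify(scores))

# Error rate from the counts reported in the text.
print(error_rate(184, 4000))

# A network with zero weights and biases gives each 8x8 pattern ln 2^-64.
uniform_total = 400 * 64 * math.log(0.5)
print(normalized_score(uniform_total, 400))
```

Dividing by the number of examples, the number of pixels, and $\ln 2$ makes scores comparable across test sets and gives a natural reference point: any trained network that scores above -1 models its digit better than the uniform distribution over bitmaps.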

algorithm            classification error
nearest neighbor     6.7%
back-propagation     5.6%
wake-sleep           4.8%
mean field           4.6%

Table 2: Classification error rates for the data set of handwritten digits. The first three were reported by Hinton et al. (1995).

Table 3: Normalized log-likelihood score for each network on the digits in its test set (columns: digit, log-likelihood score). To obtain the raw score, multiply by 400 × 64 × ln 2. The last row shows the score averaged across all digits.

5. Discussion

Endowing networks with probabilistic semantics provides a unified framework for incorporating prior knowledge, handling missing data, and performing inference under uncertainty. Probabilistic calculations, however, can quickly become intractable, so it is important to develop techniques that approximate probability distributions in a flexible manner. This is especially true for networks with multilayer architectures and large numbers of hidden units. Exact algorithms and Gibbs sampling methods are not generally practical for such networks; approximations are required.

In this paper we have developed a mean field approximation for sigmoid belief networks. As a computational tool, our mean field theory has two main virtues: first, it provides a tractable approximation to the conditional distributions required for inference; second, it provides a lower bound on the likelihoods required for learning.

The problem of computing exact likelihoods in belief networks is NP-hard (Cooper, 1990); the same is true for approximating likelihoods to within a guaranteed degree of accuracy (Dagum & Luby, 1993). It follows that one cannot establish universal guarantees for the accuracy of the mean field approximation. For certain networks, clearly, the mean field approximation is bound to fail: it cannot capture logical constraints or strong correlations between fluctuating units. Our preliminary results, however, suggest that these worst-case results do not apply to all belief networks. It is worth noting, moreover, that all the above qualifications apply to Markov networks, and that in this domain, mean field methods are already well-established.


More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).

benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ). REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or

More information

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence

How Sets of Coherent Probabilities May Serve as Models for Degrees of Incoherence 1 st Internatonal Symposum on Imprecse Probabltes and Ther Applcatons, Ghent, Belgum, 29 June 2 July 1999 How Sets of Coherent Probabltes May Serve as Models for Degrees of Incoherence Mar J. Schervsh

More information

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by

8.5 UNITARY AND HERMITIAN MATRICES. The conjugate transpose of a complex matrix A, denoted by A*, is given by 6 CHAPTER 8 COMPLEX VECTOR SPACES 5. Fnd the kernel of the lnear transformaton gven n Exercse 5. In Exercses 55 and 56, fnd the mage of v, for the ndcated composton, where and are gven by the followng

More information

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic

Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

J. Parallel Distrib. Comput.

J. Parallel Distrib. Comput. J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING

ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING ANALYZING THE RELATIONSHIPS BETWEEN QUALITY, TIME, AND COST IN PROJECT MANAGEMENT DECISION MAKING Matthew J. Lberatore, Department of Management and Operatons, Vllanova Unversty, Vllanova, PA 19085, 610-519-4390,

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

Inequality and The Accounting Period. Quentin Wodon and Shlomo Yitzhaki. World Bank and Hebrew University. September 2001.

Inequality and The Accounting Period. Quentin Wodon and Shlomo Yitzhaki. World Bank and Hebrew University. September 2001. Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

Joe Pimbley, unpublished, 2005. Yield Curve Calculations

Joe Pimbley, unpublished, 2005. Yield Curve Calculations Joe Pmbley, unpublshed, 005. Yeld Curve Calculatons Background: Everythng s dscount factors Yeld curve calculatons nclude valuaton of forward rate agreements (FRAs), swaps, nterest rate optons, and forward

More information

Project Networks With Mixed-Time Constraints

Project Networks With Mixed-Time Constraints Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa

More information

ErrorPropagation.nb 1. Error Propagation

ErrorPropagation.nb 1. Error Propagation ErrorPropagaton.nb Error Propagaton Suppose that we make observatons of a quantty x that s subject to random fluctuatons or measurement errors. Our best estmate of the true value for ths quantty s then

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems STAN-CS-73-355 I SU-SE-73-013 An Analyss of Central Processor Schedulng n Multprogrammed Computer Systems (Dgest Edton) by Thomas G. Prce October 1972 Techncal Report No. 57 Reproducton n whole or n part

More information

CS 2750 Machine Learning. Lecture 17a. Clustering. CS 2750 Machine Learning. Clustering

CS 2750 Machine Learning. Lecture 17a. Clustering. CS 2750 Machine Learning. Clustering Lecture 7a Clusterng Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Clusterng Groups together smlar nstances n the data sample Basc clusterng problem: dstrbute data nto k dfferent groups such that

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES

THE METHOD OF LEAST SQUARES THE METHOD OF LEAST SQUARES The goal: to measure (determne) an unknown quantty x (the value of a RV X) Realsaton: n results: y 1, y 2,..., y j,..., y n, (the measured values of Y 1, Y 2,..., Y j,..., Y n ) every result s encumbered

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

A Probabilistic Theory of Coherence

A Probabilistic Theory of Coherence A Probablstc Theory of Coherence BRANDEN FITELSON. The Coherence Measure C Let E be a set of n propostons E,..., E n. We seek a probablstc measure C(E) of the degree of coherence of E. Intutvely, we want

More information

1 Example 1: Axis-aligned rectangles

1 Example 1: Axis-aligned rectangles COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture # 6 Scrbe: Aaron Schld February 21, 2013 Last class, we dscussed an analogue for Occam s Razor for nfnte hypothess spaces that, n conjuncton

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts

Power-of-Two Policies for Single- Warehouse Multi-Retailer Inventory Systems with Order Frequency Discounts Power-of-wo Polces for Sngle- Warehouse Mult-Retaler Inventory Systems wth Order Frequency Dscounts José A. Ventura Pennsylvana State Unversty (USA) Yale. Herer echnon Israel Insttute of echnology (Israel)

More information

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)

1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP) 6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes

More information

Bayesian Cluster Ensembles

Bayesian Cluster Ensembles Bayesan Cluster Ensembles Hongjun Wang 1, Hanhua Shan 2 and Arndam Banerjee 2 1 Informaton Research Insttute, Southwest Jaotong Unversty, Chengdu, Schuan, 610031, Chna 2 Department of Computer Scence &

More information

greatest common divisor

greatest common divisor 4. GCD 1 The greatest common dvsor of two ntegers a and b (not both zero) s the largest nteger whch s a common factor of both a and b. We denote ths number by gcd(a, b), or smply (a, b) when there s no

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. Sequential Optimizing Investing Strategy with Neural Networks

MATHEMATICAL ENGINEERING TECHNICAL REPORTS. Sequential Optimizing Investing Strategy with Neural Networks MATHEMATICAL ENGINEERING TECHNICAL REPORTS Sequental Optmzng Investng Strategy wth Neural Networks Ryo ADACHI and Akmch TAKEMURA METR 2010 03 February 2010 DEPARTMENT OF MATHEMATICAL INFORMATICS GRADUATE

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Binomial Link Functions. Lori Murray, Phil Munz

Binomial Link Functions. Lori Murray, Phil Munz Bnomal Lnk Functons Lor Murray, Phl Munz Bnomal Lnk Functons Logt Lnk functon: ( p) p ln 1 p Probt Lnk functon: ( p) 1 ( p) Complentary Log Log functon: ( p) ln( ln(1 p)) Motvatng Example A researcher

More information

Support vector domain description

Support vector domain description Pattern Recognton Letters 20 (1999) 1191±1199 www.elsever.nl/locate/patrec Support vector doman descrpton Davd M.J. Tax *,1, Robert P.W. Dun Pattern Recognton Group, Faculty of Appled Scence, Delft Unversty

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

8 Algorithm for Binary Searching in Trees

8 Algorithm for Binary Searching in Trees 8 Algorthm for Bnary Searchng n Trees In ths secton we present our algorthm for bnary searchng n trees. A crucal observaton employed by the algorthm s that ths problem can be effcently solved when the

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña

A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION. Michael E. Kuhl Radhamés A. Tolentino-Peña Proceedngs of the 2008 Wnter Smulaton Conference S. J. Mason, R. R. Hll, L. Mönch, O. Rose, T. Jefferson, J. W. Fowler eds. A DYNAMIC CRASHING METHOD FOR PROJECT MANAGEMENT USING SIMULATION-BASED OPTIMIZATION

More information

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University Characterzaton of Assembly Varaton Analyss Methods A Thess Presented to the Department of Mechancal Engneerng Brgham Young Unversty In Partal Fulfllment of the Requrements for the Degree Master of Scence

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

On Mean Squared Error of Hierarchical Estimator

On Mean Squared Error of Hierarchical Estimator S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal peter.vortsch@ptv.de Peter Möhl, PTV AG,

More information

The Application of Fractional Brownian Motion in Option Pricing

The Application of Fractional Brownian Motion in Option Pricing Vol. 0, No. (05), pp. 73-8 http://dx.do.org/0.457/jmue.05.0..6 The Applcaton of Fractonal Brownan Moton n Opton Prcng Qng-xn Zhou School of Basc Scence,arbn Unversty of Commerce,arbn zhouqngxn98@6.com

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

Improved SVM in Cloud Computing Information Mining

Improved SVM in Cloud Computing Information Mining Internatonal Journal of Grd Dstrbuton Computng Vol.8, No.1 (015), pp.33-40 http://dx.do.org/10.1457/jgdc.015.8.1.04 Improved n Cloud Computng Informaton Mnng Lvshuhong (ZhengDe polytechnc college JangSu

More information

Time Series Analysis in Studies of AGN Variability. Bradley M. Peterson The Ohio State University

Time Series Analysis in Studies of AGN Variability. Bradley M. Peterson The Ohio State University Tme Seres Analyss n Studes of AGN Varablty Bradley M. Peterson The Oho State Unversty 1 Lnear Correlaton Degree to whch two parameters are lnearly correlated can be expressed n terms of the lnear correlaton

More information

x f(x) 1 0.25 1 0.75 x 1 0 1 1 0.04 0.01 0.20 1 0.12 0.03 0.60

x f(x) 1 0.25 1 0.75 x 1 0 1 1 0.04 0.01 0.20 1 0.12 0.03 0.60 BIVARIATE DISTRIBUTIONS Let be a varable that assumes the values { 1,,..., n }. Then, a functon that epresses the relatve frequenc of these values s called a unvarate frequenc functon. It must be true

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems

Joint Scheduling of Processing and Shuffle Phases in MapReduce Systems Jont Schedulng of Processng and Shuffle Phases n MapReduce Systems Fangfe Chen, Mural Kodalam, T. V. Lakshman Department of Computer Scence and Engneerng, The Penn State Unversty Bell Laboratores, Alcatel-Lucent

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

Multi-Conditional Learning for Joint Probability Models with Latent Variables

Multi-Conditional Learning for Joint Probability Models with Latent Variables Mult-Condtonal Learnng for Jont Probablty Models wth Latent Varables Chrs Pal, Xueru Wang, Mchael Kelm and Andrew McCallum Department of Computer Scence Unversty of Massachusetts Amherst Amherst, MA 01003

More information

1 De nitions and Censoring

1 De nitions and Censoring De ntons and Censorng. Survval Analyss We begn by consderng smple analyses but we wll lead up to and take a look at regresson on explanatory factors., as n lnear regresson part A. The mportant d erence

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

Section 2 Introduction to Statistical Mechanics

Section 2 Introduction to Statistical Mechanics Secton 2 Introducton to Statstcal Mechancs 2.1 Introducng entropy 2.1.1 Boltzmann s formula A very mportant thermodynamc concept s that of entropy S. Entropy s a functon of state, lke the nternal energy.

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Dropout: A Simple Way to Prevent Neural Networks from Overfitting Journal of Machne Learnng Research 15 (2014) 1929-1958 Submtted 11/13; Publshed 6/14 Dropout: A Smple Way to Prevent Neural Networks from Overfttng Ntsh Srvastava Geoffrey Hnton Alex Krzhevsky Ilya Sutskever

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

Implementation of Deutsch's Algorithm Using Mathcad

Implementation of Deutsch's Algorithm Using Mathcad Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"

More information

MAPP. MERIS level 3 cloud and water vapour products. Issue: 1. Revision: 0. Date: 9.12.1998. Function Name Organisation Signature Date

MAPP. MERIS level 3 cloud and water vapour products. Issue: 1. Revision: 0. Date: 9.12.1998. Function Name Organisation Signature Date Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller

More information

Data Visualization by Pairwise Distortion Minimization

Data Visualization by Pairwise Distortion Minimization Communcatons n Statstcs, Theory and Methods 34 (6), 005 Data Vsualzaton by Parwse Dstorton Mnmzaton By Marc Sobel, and Longn Jan Lateck* Department of Statstcs and Department of Computer and Informaton

More information

Solving Factored MDPs with Continuous and Discrete Variables

Solving Factored MDPs with Continuous and Discrete Variables Solvng Factored MPs wth Contnuous and screte Varables Carlos Guestrn Berkeley Research Center Intel Corporaton Mlos Hauskrecht epartment of Computer Scence Unversty of Pttsburgh Branslav Kveton Intellgent

More information

Quantization Effects in Digital Filters

Quantization Effects in Digital Filters Quantzaton Effects n Dgtal Flters Dstrbuton of Truncaton Errors In two's complement representaton an exact number would have nfntely many bts (n general). When we lmt the number of bts to some fnte value

More information

Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining
