Incorporating Domain Knowledge into Topic Modeling via Dirichlet Forest Priors


David Andrzejewski, Xiaojin Zhu, Mark Craven
Department of Computer Sciences, Department of Biostatistics and Medical Informatics
University of Wisconsin-Madison, Madison, WI 53706 USA

Abstract

Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model's ability to follow and generalize beyond user-specified domain knowledge.

1. Introduction

Topic modeling, using approaches such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), has enjoyed popularity as a way to model hidden topics in data. However, in many applications, a user may have additional knowledge about the composition of words that should have high probability in various topics. For example, in a biological application, one may prefer that the words "termination", "disassembly" and "release" appear with high probability in the same topic, because they all describe the same phase of biological processes. Furthermore, a biologist could automatically extract these preferences from an existing biomedical ontology, such as the Gene Ontology (GO) (The Gene Ontology Consortium, 2000). As another example, an analyst may run topic modeling on a corpus of people's wishes, inspect the resulting topics, and notice that "into, college" and "cure, cancer" all appear with high probability in the same topic. The analyst may want to interactively express the preference that the two sets of words should not appear together, rerun topic modeling, and incorporate additional preferences based on the new results. In both cases, we would like these preferences to guide the recovery of latent topics.

Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).
Standard LDA lacks a mechanism for incorporating such domain knowledge. In this paper, we propose a principled approach to the incorporation of such domain knowledge into LDA. We show that many types of knowledge can be expressed with two primitives on word pairs. Borrowing names from the constrained clustering literature (Basu et al., 2008), we call the two primitives Must-Links and Cannot-Links, although there are important differences. We then encode the set of Must-Links and Cannot-Links associated with the domain knowledge using a Dirichlet Forest prior, replacing the Dirichlet prior over the topic-word multinomials p(word | topic). The Dirichlet Forest prior is a mixture of Dirichlet tree distributions with very specific tree structures. Our approach has several advantages: (i) Dirichlet Forests can encode Must-Links and Cannot-Links, something impossible with Dirichlet distributions. (ii) The user can control the strength of the domain knowledge by setting a parameter η, allowing domain knowledge to be overridden if the data strongly suggests otherwise. (iii) The Dirichlet Forest lends itself to efficient inference via collapsed Gibbs sampling, a property inherited from the conjugacy of Dirichlet trees. We present experiments on several synthetic datasets and two real domains, demonstrating that the resulting topics not only successfully incorporate the specified domain knowledge, but also generalize beyond it by including/excluding other related words not explicitly mentioned in the Must-Links and Cannot-Links.
2. Related Work

We review LDA using the notation of Griffiths and Steyvers (2004). Let there be T topics. Let w = w_1...w_n represent a corpus of D documents, with a total of n words. We use d_i to denote the document of word w_i, and z_i the hidden topic from which w_i is generated. Let φ_j^(w) = p(w | z = j), and θ_j^(d) = p(z = j) for document d. The LDA generative model is then:

    θ ~ Dirichlet(α)                          (1)
    z_i | θ^(d_i) ~ Multinomial(θ^(d_i))      (2)
    φ ~ Dirichlet(β)                          (3)
    w_i | z_i, φ ~ Multinomial(φ_{z_i})       (4)

where α and β are hyperparameters for the document-topic and topic-word Dirichlet distributions, respectively. For simplicity we will assume symmetric α and β, but asymmetric hyperparameters are also possible.

Previous work has modeled correlations in the LDA document-topic mixture using the logistic Normal distribution (Blei & Lafferty, 2006), DAG (Pachinko) structures (Li & McCallum, 2006), or the Dirichlet Tree distribution (Tam & Schultz, 2007). In addition, the concept-topic model (Chemudugunta et al., 2008) employs domain knowledge through special concept topics, in which only a particular set of words can be present. Our work complements the previous work by encoding complex domain knowledge on words (especially arbitrary Cannot-Links) into a flexible and computationally efficient prior.

3. Topic Modeling with Dirichlet Forests

Our proposed model differs from LDA in the way φ is generated. Instead of (3), we have

    q ~ DirichletForest(β, η)
    φ ~ DirichletTree(q)

where q specifies a Dirichlet tree distribution, β plays a role analogous to the topic-word hyperparameter in standard LDA, and η is the strength parameter of the domain knowledge. Before discussing DirichletForest(β, η) and DirichletTree(q), we first explain how knowledge can be expressed using Must-Link and Cannot-Link primitives.

3.1. Must-Links and Cannot-Links

Must-Links and Cannot-Links were originally proposed for constrained clustering to encourage two instances to fall into the same cluster or into separate clusters, respectively. We borrow the notion for topic modeling.
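For concreteness, the standard LDA generative process reviewed above, which the Dirichlet Forest model modifies in the next section, can be sketched as follows. This is a minimal NumPy sketch with illustrative names, not the authors' implementation.

```python
import numpy as np

def lda_generate(D, T, V, n_d, alpha, beta, rng=None):
    """Sample a synthetic corpus from the LDA generative model:
    phi_t ~ Dirichlet(beta), theta_d ~ Dirichlet(alpha),
    z_i ~ Multinomial(theta_d), w_i ~ Multinomial(phi_{z_i})."""
    rng = np.random.default_rng(rng)
    # Topic-word multinomials, one per topic (symmetric beta prior).
    phi = rng.dirichlet([beta] * V, size=T)
    docs, topics = [], []
    for d in range(D):
        # Document-topic multinomial (symmetric alpha prior).
        theta = rng.dirichlet([alpha] * T)
        z = rng.choice(T, size=n_d, p=theta)                 # topic of each token
        w = np.array([rng.choice(V, p=phi[t]) for t in z])   # word of each token
        docs.append(w)
        topics.append(z)
    return docs, topics, phi

docs, topics, phi = lda_generate(D=6, T=2, V=5, n_d=4, alpha=0.5, beta=0.1, rng=0)
```

The Dirichlet Forest model below keeps this process intact except for the draw of φ, which is replaced by a draw from a mixture of Dirichlet trees.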
Informally, the Must-Link primitive prefers that two words tend to be generated by the same topic, while the Cannot-Link primitive prefers that two words tend to be generated by separate topics. However, since any topic φ is a multinomial over words, any two words (in general) always have some probability of being generated by the topic. We therefore propose the following definitions:

Must-Link(u, v): Two words u, v have similar probability within any topic, i.e., φ_j^(u) ≈ φ_j^(v) for j = 1...T. It is important to note that the probabilities can be both large or both small, as long as they are similar. For example, for the earlier biology example we could say Must-Link(termination, disassembly).

Cannot-Link(u, v): Two words u, v should not both have large probability within any topic. It is permissible for one to have a large probability and the other small, or both small. For example, one primitive for the wish example can be Cannot-Link(college, cure).

Many types of domain knowledge can be decomposed into a set of Must-Links and Cannot-Links. We demonstrate three types in our experiments. We can Split two or more sets of words from a single topic into different topics by placing Must-Links within the sets and Cannot-Links between them. We can Merge two or more sets of words from different topics into one topic by placing Must-Links among the sets. Given a common set of words which appear in multiple topics (such as stopwords in English, which tend to appear in all LDA topics), we can Isolate them by placing Must-Links within the common set, and then placing Cannot-Links between the common set and the other high-probability words from all topics. It is important to note that our Must-Links and Cannot-Links are preferences instead of hard constraints.

3.2. Encoding Must-Links

It is well known that the Dirichlet distribution is limited in that all words share a common variance parameter, and are mutually independent except for the normalization constraint (Minka, 1999). However, for Must-Link(u, v) it is crucial to control the two words u, v differently than other words.
The Dirichlet tree distribution (Dennis III, 1991) is a generalization of the Dirichlet distribution that allows such control. It is a tree with the words as leaf nodes; see Figure 1(a) for an example. Let γ^(k) be the Dirichlet tree edge weight leading into node k. Let C(k) be the immediate children of node k in the tree, L the leaves of the tree, I the internal nodes, and L(k) the
leaves in the subtree under k. To generate a sample φ ~ DirichletTree(γ), one first draws a multinomial at each internal node k ∈ I from Dirichlet(γ^(C(k))), i.e., using the weights from k to its children as the Dirichlet parameters. One can think of it as re-distributing the probability mass reaching k by this multinomial (initially, the mass is 1 at the root). The probability φ^(s) of a word s ∈ L is then simply the product of the multinomial parameters on the edges from s to the root, as shown in Figure 1(b). It can be shown (Dennis III, 1991) that this procedure gives

    p(φ | γ) = ( ∏_{s ∈ L} (φ^(s))^{γ^(s) − 1} ) ∏_{k ∈ I} [ Γ(∑_{j ∈ C(k)} γ^(j)) / ∏_{j ∈ C(k)} Γ(γ^(j)) ] ( ∑_{j ∈ L(k)} φ^(j) )^{Δ(k)}    (5)

where Γ(·) is the standard gamma function. The function Δ(k) ≡ γ^(k) − ∑_{j ∈ C(k)} γ^(j) is the difference between the in-degree and out-degree of internal node k. When Δ(k) = 0 for all internal nodes k ∈ I, the Dirichlet tree reduces to a Dirichlet distribution. Like the Dirichlet, the Dirichlet tree is conjugate to the multinomial. It is possible to integrate out φ to get the distribution over word counts directly, similar to the multivariate Pólya distribution:

    p(w | γ) = ∏_{k ∈ I} [ Γ(∑_{j ∈ C(k)} γ^(j)) / Γ(∑_{j ∈ C(k)} (γ^(j) + n^(j))) ] ∏_{j ∈ C(k)} [ Γ(γ^(j) + n^(j)) / Γ(γ^(j)) ]

Here n^(k) is the number of word tokens in w that appear in L(k).

We encode Must-Links using a Dirichlet tree. Note that our definition of Must-Link is transitive: Must-Link(u, v) and Must-Link(v, w) imply Must-Link(u, w). We thus first compute the transitive closure of the expressed Must-Links. Our Dirichlet tree for Must-Links has a very simple structure: each transitive closure is a subtree, with one internal node and the words in the closure as its leaves. The weights from the internal node to its leaves are ηβ. The root connects to each of these internal nodes k with weight |L(k)|β, where |·| represents the set size. In addition, the root directly connects to other words not in any closure, with weight β. For example, the transitive closure for a Must-Link(A, B) on vocabulary {A, B, C} is simply {A, B}, corresponding to the Dirichlet tree in Figure 1(a). To understand this encoding of Must-Links, consider first the case when the domain knowledge strength parameter is at its weakest, η = 1.
Then in-degree equals out-degree for any internal node k (both are |L(k)|β), and the tree reduces to a Dirichlet distribution with symmetric prior β: the Must-Links are turned off in this case. As we increase η, the re-distribution of probability mass at k (governed by a Dirichlet under k) has increasing concentration |L(k)|ηβ but the same uniform base measure. This tends to re-distribute the mass evenly within the transitive closure represented by k. Therefore, the Must-Links are turned on when η > 1. Furthermore, the mass reaching k is independent of η, and can still have a large variance. This properly encodes the fact that we want Must-Linked words to have similar, but not always large, probabilities. Otherwise, Must-Linked words would be forced to appear with large probability in all topics, which is clearly undesirable. This is impossible to represent with a Dirichlet distribution. For example, the blue dots in Figure 1(c) are φ samples from the Dirichlet tree in Figure 1(a), plotted on the probability simplex of dimension three. While it is always true that p(A) ≈ p(B), their total probability mass can be anywhere from 0 to 1. The most similar Dirichlet distribution is perhaps the one with matching mean parameters, which generates samples close to a single point on the simplex (Figure 1(d)).

3.3. Encoding Cannot-Links

Cannot-Links are considerably harder to handle. We first transform them into an alternative form that is amenable to Dirichlet trees. Note that Cannot-Links are not transitive: Cannot-Link(A, B) and Cannot-Link(B, C) do not entail Cannot-Link(A, C). We define a Cannot-Link-graph where the nodes are words, and the edges correspond to the Cannot-Links. The connected components of this graph are independent of each other when encoding Cannot-Links. We will use this property to factor the Dirichlet-tree selection probability later. For example, the two Cannot-Links (A, C) and (B, C) form the graph in Figure 1(e) with a single connected component {A, B, C}. Consider the subgraph on a connected component r. We define its complement graph by flipping the edges (on to off, off to on), as shown in Figure 1(f). Let there be Q^(r) maximal cliques M_{r1}...M_{rQ^(r)} in this complement graph.
In the following, we simply call them cliques, but it is important to remember that they are maximal cliques of the complement graph, not of the original Cannot-Link-graph. In our example, Q^(r) = 2, and M_{r1} = {A, B}, M_{r2} = {C}. These cliques have the following interpretation: each clique (e.g., M_{r1} = {A, B}) is a maximal subset of words in the connected component that can occur together. When there are Must-Links, all words in a Must-Link transitive closure form a single node in this graph.
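The preprocessing just described, collapsing Must-Link transitive closures into single nodes and enumerating the maximal cliques of the complement graph, can be sketched as follows. The function names are illustrative, and a plain Bron-Kerbosch recursion stands in for whatever clique enumeration the authors used.

```python
from itertools import combinations

def mustlink_closures(words, must_links):
    """Union-find over Must-Links: each transitive closure becomes a
    single node of the Cannot-Link-graph."""
    parent = {w: w for w in words}
    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w
    for u, v in must_links:
        parent[find(u)] = find(v)
    groups = {}
    for w in words:
        groups.setdefault(find(w), set()).add(w)
    return [frozenset(g) for g in groups.values()]

def maximal_cliques(nodes, edges):
    """Bron-Kerbosch enumeration of all maximal cliques."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    cliques = []
    def bk(R, P, X):
        if not P and not X:
            cliques.append(R)
            return
        for v in list(P):
            bk(R | {v}, P & adj[v], X & adj[v])
            P.remove(v); X.add(v)
    bk(set(), set(nodes), set())
    return cliques

# Cannot-Links (A,C) and (B,C) on {A,B,C}: the complement graph keeps the
# pairs that are NOT Cannot-Linked, giving maximal cliques {A,B} and {C}.
nodes = ["A", "B", "C"]
cannot = {("A", "C"), ("B", "C")}
complement = [(u, v) for u, v in combinations(nodes, 2)
              if (u, v) not in cannot and (v, u) not in cannot]
cliques = maximal_cliques(nodes, complement)
```

In the full construction, `mustlink_closures` would run first, and each closure would then stand in as one node of the Cannot-Link-graph before the complement and its cliques are computed.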
Figure 1. Encoding Must-Links and Cannot-Links with a Dirichlet Forest. (a) A Dirichlet tree encoding Must-Link(A, B) with symmetric β and strength η on vocabulary {A, B, C}. (b) A sample φ from this Dirichlet tree. (c) A large set of samples from the Dirichlet tree, plotted on the simplex. Note p(A) ≈ p(B), yet they remain flexible in actual value, which is desirable for a Must-Link. (d) In contrast, samples from a standard Dirichlet with comparable parameters force p(A) ≈ p(B) ≈ const., and cannot encode a Must-Link. (e) The Cannot-Link-graph for Cannot-Link(A, C) and Cannot-Link(B, C). (f) The complement graph, with two maximal cliques {A, B} and {C}. (g) The Dirichlet subtree for clique {A, B}. (h) The Dirichlet subtree for clique {C}. (i) Samples from the mixture model of (g) and (h), encoding both Cannot-Links, again with the same β and η.

That is, these words are allowed to simultaneously have large probabilities in a given topic without violating any Cannot-Link preference. By the maximality of these cliques, allowing any word outside the clique (e.g., C) to also have a large probability would violate at least one Cannot-Link (in this example, both). We discuss the encoding for this single connected component r now, deferring discussion of the complete encoding to Section 3.4. We create a mixture model of Q^(r) Dirichlet subtrees, one for each clique. Each topic selects exactly one subtree according to the probability

    p(q^(r) = q) ∝ |M_{rq}|,  q = 1...Q^(r).    (6)

Conceptually, the selected subtree indexed by q tends to re-distribute nearly all probability mass to the words within M_{rq}. Since there is no mass left for the other cliques, it is impossible for a word outside clique M_{rq} to have a large probability. Therefore, no Cannot-Link will be violated. In reality, the subtrees are soft rather than hard, because Cannot-Links are only preferences. The Dirichlet subtree for M_{rq} is structured as follows. The subtree root connects to an internal node k with weight η|M_{rq}|β. The node k connects to the words in M_{rq}, each with weight β. The subtree root also directly connects to words not in M_{rq} (but in the connected component r), each with weight β.
This will send most of the probability mass down to k, and then flexibly re-distribute it among the words in M_{rq}. For example, Figures 1(g) and 1(h) show the Dirichlet subtrees for M_{r1} = {A, B} and M_{r2} = {C}, respectively. Samples from this mixture model are shown in Figure 1(i), representing multinomials in which no Cannot-Link is violated. Such behavior is not achievable by a Dirichlet distribution, or by a single Dirichlet tree.¹ Finally, we mention that although in the worst case the number of maximal cliques Q^(r) in a connected component of size |r| can grow exponentially as O(3^{|r|/3}) (Griggs et al., 1988), in our experiments Q^(r) remains small, due in part to Must-Linked words collapsing into single nodes of the Cannot-Link-graph.

3.4. The Dirichlet Forest Prior

In general, our domain knowledge is expressed by a set of Must-Links and Cannot-Links. We first compute the transitive closure of the Must-Links. We then form a Cannot-Link-graph, where a node is either a Must-Link closure or a word not present in any Must-Link. Note that the domain knowledge must be consistent, in that no pair of words is simultaneously Cannot-Linked and Must-Linked (either explicitly or implicitly through the Must-Link transitive closure). Let R be the number of connected components in the Cannot-Link-graph. Our Dirichlet Forest consists of ∏_{r=1}^R Q^(r) Dirichlet trees, represented by the template in Figure 2.

¹ Dirichlet distributions with very small concentrations do have some selection effect. For example, Beta(.1, .1) tends to concentrate probability mass on one of the two variables. However, such priors are weak: the pseudo-counts in them are too small because of the small concentration. The posterior would be dominated by the data, and we would lose any encoded domain knowledge.

Each Dirichlet tree has
R branches beneath the root, one for each connected component. The trees differ in which subtrees they include under these branches. For the r-th branch, there are Q^(r) possible Dirichlet subtrees, corresponding to cliques M_{r1}...M_{rQ^(r)}. Therefore, a tree in the forest is uniquely identified by an index vector q = (q^(1)...q^(R)), where q^(r) ∈ {1...Q^(r)}.

Figure 2. Template of the Dirichlet trees in the Dirichlet Forest. Beneath the root there is one branch per connected component; the r-th branch holds one of the Q^(r) candidate clique subtrees M_{r1}...M_{rQ^(r)} together with the component's other words, and Must-Link closures appear as further subtrees; edges marked η are scaled by the strength parameter.

To draw a Dirichlet tree q from the prior DirichletForest(β, η), we select the subtrees independently, because the R connected components are independent with respect to the Cannot-Links: p(q) = ∏_{r=1}^R p(q^(r)). Each q^(r) is sampled according to (6), and corresponds to choosing a solid box for the r-th branch in Figure 2. The structure of the subtree within the solid box has been defined in Section 3.3. A black node may be a single word, or a Must-Link transitive closure having the subtree structure shown in the dotted box. The edge weight leading to most nodes k is γ^(k) = |L(k)|β, where L(k) is the set of leaves under k. However, for edges coming out of a Must-Link internal node or going into a Cannot-Link internal node, these weights are multiplied by the strength parameter η. These edges are marked by η in Figure 2.

We now define the complete Dirichlet Forest model, integrating out (collapsing) θ and φ. Let n_j^(d) be the number of word tokens in document d that are assigned to topic j, and n_·^(d) = ∑_j n_j^(d). z is generated the same as in LDA:

    p(z | α) = ( Γ(Tα) / Γ(α)^T )^D ∏_{d=1}^D [ ∏_{j=1}^T Γ(n_j^(d) + α) ] / Γ(n_·^(d) + Tα).

There is one Dirichlet tree q_j per topic j = 1...T, sampled from the Dirichlet Forest prior p(q_j) = ∏_{r=1}^R p(q_j^(r)). Each Dirichlet tree q_j implicitly defines its tree edge weights γ_j^(k) using β, η, and its tree structure L_j, I_j, C_j(·). Let n_j^(k) be the number of word tokens in the corpus assigned to topic j that appear under node k in the Dirichlet tree q_j.
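Drawing a tree index q = (q^(1), ..., q^(R)) from the forest prior therefore amounts to sampling one clique per connected component, independently. A small sketch, assuming, per (6), that the selection probability is proportional to clique size:

```python
import random

def sample_forest_index(cliques_per_component, rng=None):
    """Sample q = (q^(1), ..., q^(R)): the R connected components are
    independent, and within component r a clique M_rq is selected with
    probability proportional to its size |M_rq|."""
    rng = rng or random.Random()
    q = []
    for cliques in cliques_per_component:   # one list of cliques per component
        sizes = [len(M) for M in cliques]
        u = rng.random() * sum(sizes)       # inverse-CDF draw over cliques
        acc = 0.0
        for idx, size in enumerate(sizes):
            acc += size
            if u < acc:
                q.append(idx)
                break
    return tuple(q)

# Two components: cliques {A,B},{C} for the first and {X},{Y} for the second.
q = sample_forest_index([[{"A", "B"}, {"C"}], [{"X"}, {"Y"}]], random.Random(0))
```

Given q, the tree itself is deterministic: the r-th branch takes the subtree for clique M_{rq^(r)}, with edge weights fixed by β and η as described above.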
The probability of generating the corpus w, given the trees q_{1:T} = q_1...q_T and the topic assignments z, can be derived using the collapsed Dirichlet-tree distribution above:

    p(w | q_{1:T}, z, β, η) = ∏_{j=1}^T ∏_{k ∈ I_j} [ Γ(∑_{s ∈ C_j(k)} γ_j^(s)) / Γ(∑_{s ∈ C_j(k)} (γ_j^(s) + n_j^(s))) ] ∏_{s ∈ C_j(k)} [ Γ(γ_j^(s) + n_j^(s)) / Γ(γ_j^(s)) ].

Finally, the complete generative model is

    p(w, z, q_{1:T} | α, β, η) = p(w | q_{1:T}, z, β, η) p(z | α) ∏_{j=1}^T p(q_j).

4. Inference for Dirichlet Forests

Because a Dirichlet Forest is a mixture of Dirichlet trees, which are conjugate to the multinomial, we can efficiently perform inference by Markov chain Monte Carlo (MCMC). Specifically, we use collapsed Gibbs sampling similar to Griffiths and Steyvers (2004). However, in our case the MCMC state is defined by both the topic labels z and the tree indices q_{1:T}. An MCMC iteration in our case consists of a sweep through both z and q_{1:T}. We present the conditional probabilities for collapsed Gibbs sampling below.

(Sampling z_i): Let n_{−i,j}^(d) be the number of word tokens in document d assigned to topic j, excluding the word at position i. Similarly, let n_{−i,j}^(k) be the number of word tokens in the corpus that are under node k in topic j's Dirichlet tree, excluding the word at position i. For candidate topic labels v = 1...T, we have

    p(z_i = v | z_{−i}, q_{1:T}, w) ∝ (n_{−i,v}^(d_i) + α) ∏_{k ∈ I_v(↑i)} ( γ_v^{(C_v(k↓i))} + n_{−i,v}^{(C_v(k↓i))} ) / ( ∑_{k′ ∈ C_v(k)} (γ_v^{(k′)} + n_{−i,v}^{(k′)}) ),

where I_v(↑i) denotes the subset of internal nodes in topic v's Dirichlet tree that are ancestors of leaf w_i, and C_v(k↓i) is the unique node that is k's immediate child and an ancestor of w_i (including w_i itself).

(Sampling q_j^(r)): Since the connected components are independent, sampling the tree q_j factors into sampling the clique index for each connected component, q_j^(r). For candidate cliques q′ = 1...Q^(r), we have

    p(q_j^(r) = q′ | z, q_{−j}, q_j^{(−r)}, w) ∝ |M_{rq′}| ∏_{k ∈ I_{j,r=q′}} [ Γ(∑_{s ∈ C_j(k)} γ_j^(s)) / Γ(∑_{s ∈ C_j(k)} (γ_j^(s) + n_j^(s))) ] ∏_{s ∈ C_j(k)} [ Γ(γ_j^(s) + n_j^(s)) / Γ(γ_j^(s)) ],

where I_{j,r=q′} denotes the internal nodes below the r-th branch of tree q_j when clique M_{rq′} is selected.
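The tree-structured factor in the z_i conditional multiplies, over the ancestors of leaf w_i, the ratio of the smoothed count flowing toward w_i to the total over that node's children. A minimal sketch of that factor, with illustrative data structures rather than the authors' code:

```python
def tree_word_factor(word, topic_counts, tree, gamma):
    """Tree part of the collapsed Gibbs conditional p(z_i = v | ...) for one
    topic: the product over internal nodes k on the root-to-leaf path of
        (gamma[c] + n[c]) / sum_{c' in children(k)} (gamma[c'] + n[c'])
    where c is the child of k on the path toward the leaf `word`.
    `tree` maps each internal node id to its list of children (leaves are
    word ids); `topic_counts[k]` is the token count under node k for this
    topic, with the current token excluded by the caller."""
    # Recover parent pointers so we can walk leaf -> root.
    parent = {c: k for k, cs in tree.items() for c in cs}
    factor, node = 1.0, word
    while node in parent:
        k = parent[node]
        num = gamma[node] + topic_counts[node]
        den = sum(gamma[c] + topic_counts[c] for c in tree[k])
        factor *= num / den
        node = k
    return factor

# Flat one-level tree over 3 words: reduces to the usual LDA term
# (n_w + beta) / (n_total + V * beta).
flat = {"root": [0, 1, 2]}
gamma = {0: 0.1, 1: 0.1, 2: 0.1}
counts = {0: 2, 1: 1, 2: 0}
f = tree_word_factor(0, counts, flat, gamma)
```

Multiplying this factor by (n_{−i,v}^(d_i) + α) for each candidate topic v and normalizing gives the sampling distribution for z_i.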
(Estimating φ and θ): After running MCMC for sufficiently many iterations, we follow standard practice (e.g., Griffiths & Steyvers, 2004) and use the last sample (z, q_{1:T}) to estimate φ and θ. Because a Dirichlet tree is a conjugate distribution, its posterior is a Dirichlet tree with the same structure and updated edge weights. The posterior for the Dirichlet tree of the j-th topic is γ_{j,post}^(k) = γ_j^(k) + n_j^(k), where the counts n_j^(k) are collected from z, q_{1:T}, w. We estimate φ by the first moment under this posterior (Minka, 1999):

    φ_j^(w) = ∏_{k ∈ I_j(↑w)} γ_{j,post}^{(C_j(k↓w))} / ∑_{k′ ∈ C_j(k)} γ_{j,post}^{(k′)}.    (7)

The parameter θ is estimated the same way as in standard LDA: θ_j^(d) = (n_j^(d) + α) / (n_·^(d) + Tα).

5. Experiments

Synthetic Corpora: We present results on synthetic datasets to show how the Dirichlet Forest (DF) incorporates different types of knowledge. Recall that DF with η = 1 is equivalent to standard LDA (verified with the code of Griffiths & Steyvers, 2004). Previous studies often take the last MCMC sample (z and q_{1:T}) and discuss the topics φ_{1:T} derived from that sample. Because of the stochastic nature of MCMC, we argue that more insight can be gained if multiple independent MCMC samples are considered. For each dataset, and each DF with a different η, we run a long MCMC chain with a large number of burn-in iterations, and take out samples at regular intervals afterwards. We have some indication that our chain is well mixed, as we observe all expected modes, and samples with label switching (i.e., equivalent up to label permutation) occur with near-equal frequency. For each sample, we derive its topics φ_{1:T} with (7) and then greedily align the φ from different samples, permuting the T topic labels to remove the label-switching effect. Within a dataset, we perform PCA on the baseline (η = 1) φ and project all samples into the resulting space to obtain a common visualization (each row in Figure 3). Points are dithered to show overlap.

Must-Link(A, C): The corpus consists of six documents over a vocabulary of five words. The documents are ABAB, CDCD, and EEEE, each represented twice. We let T = 2 with small symmetric α and β.
LDA produces three kinds of φ_{1:T}: roughly a third of the time the topics are around [ABCD | E], which is shorthand for φ_1 = (.25, .25, .25, .25, 0), φ_2 = (0, 0, 0, 0, 1) on the vocabulary ABCDE. Another third are around [AB E | CD], and the final third around [AB | CD E]. They correspond to clusters 1, 2 and 3, respectively, in the upper-left panel of Figure 3. We add a single Must-Link(A, C). When η is small, the data still overrides our Must-Link somewhat, because clusters 2 and 3 do not disappear completely. As η increases, the Must-Link overrides the data and clusters 2 and 3 vanish, leaving only cluster 1. That is, running DF and taking the last sample is very likely to obtain the [ABCD | E] topics. This is what we want: A and C are present or absent together in the topics, and they also pull B, D along, even though B, D are not in the knowledge we added.

Cannot-Link(A, C): The corpus has four documents: ABAB and CDCD, twice each; T = 2 with symmetric α and β. LDA produces six kinds of φ_{1:T} evenly, corresponding to the clusters and lines in the second row of Figure 3. We add a single Cannot-Link(A, C). As the DF strength η increases, the cluster [AC | BD] disappears, because it involves a topic that violates the Cannot-Link. The other clusters become uniformly more likely.

Isolate(C): The corpus has four identical documents over the vocabulary ABC; T = 2. LDA produces three clusters evenly: [AB | C], [AC | B], [BC | A]. We add Isolate(C), which is compiled into Cannot-Link(C, A) and Cannot-Link(C, B). The DF samples concentrate on the cluster [AB | C], which indeed isolates C into its own topic.

Split(AB, CD): The corpus has six documents: ABCDEEEE and ABCDFFFF, each present three times, with symmetric α and β. LDA with T = 2 produces a large portion of topics around [ABCD | EF] (not shown). We add Split(AB, CD), which is compiled into Must-Link(A, B), Must-Link(C, D), and Cannot-Link(A, C), and increase T to 3. However, DF with η = 1 (i.e., LDA with T = 3) produces a large variety of topics, in which A, B, C, D remain mixed with E and F in varying proportions. That is, simply adding one more topic does not clearly separate AB from CD. On the other hand, with η increasing, DF eventually concentrates on the cluster in which A, B and C, D occupy separate topics, satisfying the Split operation.

Figure 3. PCA projection of permutation-aligned φ samples for the four synthetic data experiments (one row per experiment: Must-Link, Cannot-Link, Isolate, Split; columns show increasing η).

Wish Corpus: We now consider interactive topic modeling with DF. The corpus we use is a collection of 89,574 New Year's wishes submitted to the Times Square Alliance (Goldberg et al., 2009). Each wish is treated as a document, downcased but without stopword removal. For each step in our interactive example, we fix α, β, and η, and run MCMC for a fixed number of iterations before estimating the topics from the final sample. The domain knowledge in DF accumulates along the steps.

Step 1: We run standard LDA. Many of the most probable words in the topics are conventional stopwords (to, and) or corpus-specific stopwords (wish, 2008), which obscure the meaning of the topics.

Step 2: We manually create a 50-word stopword list and issue an Isolate preference. This is compiled into Must-Links among this set and Cannot-Links between this set and all other words in the top 50 of any topic. T is increased accordingly. After running DF, we end up with two stopword topics. Importantly, with the stopwords explained by these two topics, the top words for the other topics become much more meaningful.

Step 3: We notice that one topic conflates two concepts: enter college and cure disease (top 8 words: go school cancer into well free cure college). We issue Split({go, school, into, college}, {cancer, free, cure, well}) to separate the concepts.
This is compiled into Must-Links within each quadruple, and Cannot-Links between them. T is increased accordingly. After running DF, one of the topics clearly takes on the college concept, picking up related words which we did not explicitly encode in our prior. Another topic does likewise for the cure concept (many wishes are like "mom stay cancer free"). Other topics have minor changes.

Table 1. Wish topics from interactive topic modeling. Top words are sorted by φ^(w) = p(word | topic); operations are placed next to the most affected topics.

    Merge   | love lose weight together forever marry meet
            | success health happiness family good friends prosperity life
            | life happy best live time long wishes ever year
            | a do not what someone so like don much she
            | money out make money up house work able pay own lot
            | people no people stop less day every each other another
            | iraq home safe end troops iraq bring war return joy
            | love true peace happiness dreams joy everyone
            | family happy healthy family baby safe prosperous
            | vote better hope president paul ron than person bush
    Isolate | and to for a the year in new all my
            | god god bless jesus everyone loved now heart christ
            | peace peace world earth win lottery around save
            | spam com call if us www visit
    Isolate | is to wish my for and a be that the
    Split   | job go great school into good college hope move
    Split   | mom hope cancer free husband son well dad cure

Step 4: We then notice that two topics correspond to romance concepts. We apply Merge({love, forever, marry, together}, {love, meet, boyfriend, married, girlfriend, wedding}), which is compiled into Must-Links between these words. T is decreased by one. After running DF, one of the romance topics disappears, and the remaining one corresponds to the merged romance topic ("lose, weight" were in one of them, and remain so). Other previous topics survive with only minor changes. Table 1 shows the wish topics after these four steps, where we place the DF operations next to the most affected topics, and color-code the words explicitly specified in the domain knowledge.
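The Split, Merge, and Isolate operations used in these steps decompose into the two primitives mechanically; a hypothetical compiler (the function names are ours, not the authors') might look like:

```python
from itertools import combinations

def split_op(sets):
    """Split: Must-Link within each word set, Cannot-Link across sets."""
    must = [p for s in sets for p in combinations(sorted(s), 2)]
    cannot = [(u, v) for a, b in combinations(sets, 2) for u in a for v in b]
    return must, cannot

def merge_op(sets):
    """Merge: Must-Link among all words of all sets."""
    words = sorted(set().union(*sets))
    return list(combinations(words, 2)), []

def isolate_op(common, other_top_words):
    """Isolate: Must-Link within the common set, Cannot-Link between the
    common set and the other high-probability words."""
    must = list(combinations(sorted(common), 2))
    cannot = [(u, v) for u in sorted(common) for v in sorted(other_top_words)]
    return must, cannot

# The Step-3 Split: 6 Must-Links per quadruple, 16 Cannot-Links across them.
must, cannot = split_op([{"go", "school", "into", "college"},
                         {"cancer", "free", "cure", "well"}])
```

The resulting Must-Links then go through the transitive-closure step, and the Cannot-Links define the Cannot-Link-graph, exactly as in Sections 3.2-3.4.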
Yeast Corpus: Whereas the previous experiments illustrate the utility of our approach in an interactive setting, we now consider a case in which we use background knowledge from an ontology to guide topic modeling. Our prior knowledge is based on six concepts. The concepts transcription, translation and replication characterize three important processes that are carried out at the molecular level. The concepts initiation, elongation and termination describe phases of the three aforementioned processes. Combinations of concepts from these two sets correspond to concepts in the Gene Ontology (e.g., GO:0006414 is translational elongation, and GO:0006352 is transcription initiation). We guide our topic modeling using Must-Links among a small set of words for each concept. Moreover, we use Cannot-Links among words to specify that we prefer (i) transcription, translation and replication to be represented in separate topics, and (ii) initiation, elongation and termination to be represented in separate topics. We do not set any preferences between the process topics and the phase topics, however.
8 Table. Yeat topic. The left column how the eed word in the DF model. The middle column indicate the topic in which at leat eed word are among the highet probability word for LD, the o column give the number of other topic (not hared by another word. The right column how the ame topicword relationhip for the DF model. LD DF o trancription trancriptional template tranlation tranlational trn replication cycle diviion initiation tart aembly 7 elongation termination diaembly releae top The corpu that we ue for our experiment conit of 8,9 abtract elected from the MEDLINE databae for their relevance to yeat gene. We induce topic model uing DF to encode the MutLin and annotlin decribed above, and ue tandard LD a a control. We et T =,α =.,β =.,η =. For each word that we ue to eed a concept, Table how the topic that include it among their mot probable word. We mae everal obervation about the DFinduced topic. Firt, each concept i repreented by a mall number of topic and the Mut Lin word for each topic all occur a highly probable word in thee topic. Second, the annotlin preference are obeyed in the final topic. Third, the topic ue the proce and phae topic in a compoitionally. For example, DF Topic repreent trancription initiation and DF Topic 8 repreent replication initiation. Moreover, the topic that are ignificantly influenced by the prior typically include highly relevant term among their mot probable word. For example, the top word in DF Topic include TT, TFIID, promoter, and recruitment which are all pecifically germane to the compoite concept of trancription initiation. In the cae of tandard LD, the eed concept word are dipered acro a greater number of topic, and highly related word, uch a cycle and diviion often do not fall into the ame topic. Many of the topic induced by ordinary LD are emantically coherent, but the pecific concept uggeted by our prior do not naturally emerge without uing DF. 
Acknowledgments: This work was supported by NIH/NLM grants T15 LM07359 and R01 LM07050, and by the Wisconsin Alumni Research Foundation.

References

Basu, S., Davidson, I., & Wagstaff, K. (Eds.). (2008). Constrained clustering: Advances in algorithms, theory, and applications. Chapman & Hall/CRC.

Blei, D., & Lafferty, J. (2006). Correlated topic models. In Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press.

Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.

Chemudugunta, C., Holloway, A., Smyth, P., & Steyvers, M. (2008). Modeling documents by combining semantic concepts with unsupervised statistical learning. Intl. Semantic Web Conf. Springer.

Dennis III, S. Y. (1991). On the hyper-Dirichlet type 1 and hyper-Liouville distributions. Communications in Statistics - Theory and Methods, 20.

Goldberg, A., Fillmore, N., Andrzejewski, D., Xu, Z., Gibson, B., & Zhu, X. (2009). May all your wishes come true: A study of wishes and how to recognize them. Human Language Technologies: Proc. of the Annual Conf. of the North American Chapter of the Assoc. for Computational Linguistics. ACL Press.

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proc. of the National Academy of Sciences of the United States of America, 101, 5228-5235.

Griggs, J. R., Grinstead, C. M., & Guichard, D. R. (1988). The number of maximal independent sets in a connected graph. Discrete Mathematics, 68, 211-220.

Li, W., & McCallum, A. (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. Proc. of the 23rd Intl. Conf. on Machine Learning. ACM Press.

Minka, T. P. (1999). The Dirichlet-tree distribution (Technical Report). minka/papers/dirichlet/minka-dirtree.pdf.

Tam, Y.-C., & Schultz, T. (2007). Correlated latent semantic model for unsupervised LM adaptation. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing.

The Gene Ontology Consortium (2000). Gene Ontology: Tool for the unification of biology. Nature Genetics, 25, 25-29.
Introduction to Data Mining and Knowledge Discovery Third Edition by Two Crows Corporation RELATED READINGS Data Mining 99: Technology Report, Two Crows Corporation, 1999 M. Berry and G. Linoff, Data Mining
More informationSteering User Behavior with Badges
Steering User Behavior with Badges Ashton Anderson Daniel Huttenlocher Jon Kleinberg Jure Leskovec Stanford University Cornell University Cornell University Stanford University ashton@cs.stanford.edu {dph,
More informationGraphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations Jure Leskovec Carnegie Mellon University jure@cs.cmu.edu Jon Kleinberg Cornell University kleinber@cs.cornell.edu Christos
More informationScalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights
Seventh IEEE International Conference on Data Mining Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights Robert M. Bell and Yehuda Koren AT&T Labs Research 180 Park
More informationOnLine LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking
OnLine LDA: Adaptive Topic Models for Mining Text s with Applications to Topic Detection and Tracking Loulwah AlSumait, Daniel Barbará, Carlotta Domeniconi Department of Computer Science George Mason
More informationIntellectual Need and ProblemFree Activity in the Mathematics Classroom
Intellectual Need 1 Intellectual Need and ProblemFree Activity in the Mathematics Classroom Evan Fuller, Jeffrey M. Rabin, Guershon Harel University of California, San Diego Correspondence concerning
More informationIndexing by Latent Semantic Analysis. Scott Deerwester Graduate Library School University of Chicago Chicago, IL 60637
Indexing by Latent Semantic Analysis Scott Deerwester Graduate Library School University of Chicago Chicago, IL 60637 Susan T. Dumais George W. Furnas Thomas K. Landauer Bell Communications Research 435
More informationHow to Use Expert Advice
NICOLÒ CESABIANCHI Università di Milano, Milan, Italy YOAV FREUND AT&T Labs, Florham Park, New Jersey DAVID HAUSSLER AND DAVID P. HELMBOLD University of California, Santa Cruz, Santa Cruz, California
More informationCombating Web Spam with TrustRank
Combating Web Spam with TrustRank Zoltán Gyöngyi Hector GarciaMolina Jan Pedersen Stanford University Stanford University Yahoo! Inc. Computer Science Department Computer Science Department 70 First Avenue
More informationA First Encounter with Machine Learning. Max Welling Donald Bren School of Information and Computer Science University of California Irvine
A First Encounter with Machine Learning Max Welling Donald Bren School of Information and Computer Science University of California Irvine November 4, 2011 2 Contents Preface Learning and Intuition iii
More informationMaximizing the Spread of Influence through a Social Network
Maximizing the Spread of Influence through a Social Network David Kempe Dept. of Computer Science Cornell University, Ithaca NY kempe@cs.cornell.edu Jon Kleinberg Dept. of Computer Science Cornell University,
More informationNot Seeing is Also Believing: Combining Object and Metric Spatial Information
Not Seeing is Also Believing: Combining Object and Metric Spatial Information Lawson L.S. Wong, Leslie Pack Kaelbling, and Tomás LozanoPérez Abstract Spatial representations are fundamental to mobile
More informationFoundations of Data Science 1
Foundations of Data Science John Hopcroft Ravindran Kannan Version /4/204 These notes are a first draft of a book being written by Hopcroft and Kannan and in many places are incomplete. However, the notes
More informationBayesian Models of Graphs, Arrays and Other Exchangeable Random Structures
Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures Peter Orbanz and Daniel M. Roy Abstract. The natural habitat of most Bayesian methods is data represented by exchangeable sequences
More informationDiscovering objects and their location in images
Discovering objects and their location in images Josef Sivic Bryan C. Russell Alexei A. Efros Andrew Zisserman William T. Freeman Dept. of Engineering Science CS and AI Laboratory School of Computer Science
More informationDiscovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow
Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow Ashton Anderson Daniel Huttenlocher Jon Kleinberg Jure Leskovec Stanford University Cornell
More informationWhat are requirements?
2004 Steve Easterbrook. DRAFT PLEASE DO NOT CIRCULATE page 1 C H A P T E R 2 What are requirements? The simple question what are requirements? turns out not to have a simple answer. In this chapter we
More informationAn HDPHMM for Systems with State Persistence
Emily B. Fox ebfox@mit.edu Department of EECS, Massachusetts Institute of Technology, Cambridge, MA 9 Erik B. Sudderth sudderth@eecs.berkeley.edu Department of EECS, University of California, Berkeley,
More informationGOSSIP: IDENTIFYING CENTRAL INDIVIDUALS IN A SOCIAL NETWORK
GOSSIP: IDENTIFYING CENTRAL INDIVIDUALS IN A SOCIAL NETWORK ABHIJIT BANERJEE, ARUN G. CHANDRASEKHAR, ESTHER DUFLO, AND MATTHEW O. JACKSON Abstract. Can we identify the members of a community who are bestplaced
More informationCollective Intelligence and its Implementation on the Web: algorithms to develop a collective mental map
Collective Intelligence and its Implementation on the Web: algorithms to develop a collective mental map Francis HEYLIGHEN * Center Leo Apostel, Free University of Brussels Address: Krijgskundestraat 33,
More information