Incorporating Domain Knowledge into Topic Modeling via Dirichlet Forest Priors


David Andrzejewski, Xiaojin Zhu, Mark Craven
Department of Computer Sciences, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706 USA

Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).

Abstract

Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model's ability to follow and generalize beyond user-specified domain knowledge.

1. Introduction

Topic modeling, using approaches such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), has enjoyed popularity as a way to model hidden topics in data. However, in many applications, a user may have additional knowledge about the composition of words that should have high probability in various topics. For example, in a biological application, one may prefer that the words "termination", "disassembly" and "release" appear with high probability in the same topic, because they all describe the same phase of biological processes. Furthermore, a biologist could automatically extract these preferences from an existing biomedical ontology, such as the Gene Ontology (GO) (The Gene Ontology Consortium, 2000).

As another example, an analyst may run topic modeling on a corpus of people's wishes, inspect the resulting topics, and notice that "into", "college" and "cure", "cancer" all appear with high probability in the same topic. The analyst may want to interactively express the preference that the two sets of words should not appear together, re-run topic modeling, and incorporate additional preferences based on the new results. In both cases, we would like these preferences to guide the recovery of latent topics.
Standard LDA lacks a mechanism for incorporating such domain knowledge. In this paper, we propose a principled approach to the incorporation of such domain knowledge into LDA. We show that many types of knowledge can be expressed with two primitives on word pairs. Borrowing names from the constrained clustering literature (Basu et al., 2008), we call the two primitives Must-Links and Cannot-Links, although there are important differences. We then encode the set of Must-Links and Cannot-Links associated with the domain knowledge using a Dirichlet Forest prior, replacing the Dirichlet prior over the topic-word multinomial $p(\text{word} \mid \text{topic})$. The Dirichlet Forest prior is a mixture of Dirichlet tree distributions with very specific tree structures.

Our approach has several advantages: (i) A Dirichlet Forest can encode Must-Links and Cannot-Links, something impossible with Dirichlet distributions. (ii) The user can control the strength of the domain knowledge by setting a parameter $\eta$, allowing domain knowledge to be overridden if the data strongly suggest otherwise. (iii) The Dirichlet Forest lends itself to efficient inference via collapsed Gibbs sampling, a property inherited from the conjugacy of Dirichlet trees.

We present experiments on several synthetic datasets and two real domains, demonstrating that the resulting topics not only successfully incorporate the specified domain knowledge, but also generalize beyond it by including/excluding other related words not explicitly mentioned in the Must-Links and Cannot-Links.

2. Related Work

We review LDA using the notation of Griffiths and Steyvers (2004). Let there be $T$ topics. Let $\mathbf{w} = w_1 \ldots w_n$ represent a corpus of $D$ documents, with a total of $n$ words. We use $d_i$ to denote the document of word $w_i$, and $z_i$ the hidden topic from which $w_i$ is generated. Let $\phi_j^{(w)} = p(w \mid z = j)$, and $\theta_j^{(d)} = p(z = j)$ for document $d$. The LDA generative model is then:

$$\theta \sim \mathrm{Dirichlet}(\alpha) \qquad (1)$$
$$z_i \mid \theta^{(d_i)} \sim \mathrm{Multinomial}(\theta^{(d_i)}) \qquad (2)$$
$$\phi \sim \mathrm{Dirichlet}(\beta) \qquad (3)$$
$$w_i \mid z_i, \phi \sim \mathrm{Multinomial}(\phi_{z_i}) \qquad (4)$$

where $\alpha$ and $\beta$ are hyperparameters for the document-topic and topic-word Dirichlet distributions, respectively. For simplicity we will assume symmetric $\alpha$ and $\beta$, but asymmetric hyperparameters are also possible.

Previous work has modeled correlations in the LDA document-topic mixtures using the logistic Normal distribution (Blei & Lafferty, 2006), DAG (Pachinko) structures (Li & McCallum, 2006), or the Dirichlet Tree distribution (Tam & Schultz, 2007). In addition, the concept-topic model (Chemudugunta et al., 2008) employs domain knowledge through special "concept" topics, in which only a particular set of words can be present. Our work complements the previous work by encoding complex domain knowledge on words (especially arbitrary Cannot-Links) into a flexible and computationally efficient prior.

3. Topic Modeling with Dirichlet Forest

Our proposed model differs from LDA in the way $\phi$ is generated. Instead of (3), we have

$$q \sim \mathrm{DirichletForest}(\beta, \eta)$$
$$\phi \sim \mathrm{DirichletTree}(q)$$

where $q$ specifies a Dirichlet tree distribution, $\beta$ plays a role analogous to the topic-word hyperparameter in standard LDA, and $\eta$ is the strength parameter of the domain knowledge. Before discussing $\mathrm{DirichletForest}(\beta, \eta)$ and $\mathrm{DirichletTree}(q)$, we first explain how knowledge can be expressed using Must-Link and Cannot-Link primitives.

3.1. Must-Links and Cannot-Links

Must-Links and Cannot-Links were originally proposed for constrained clustering to encourage two instances to fall into the same cluster or into separate clusters, respectively. We borrow the notion for topic modeling.
Informally, the Must-Link primitive prefers that two words tend to be generated by the same topic, while the Cannot-Link primitive prefers that two words tend to be generated by separate topics. However, since any topic $\phi_j$ is a multinomial over words, any two words (in general) always have some probability of being generated by the topic. We therefore propose the following definitions:

Must-Link $(u, v)$: Two words $u, v$ have similar probability within any topic, i.e., $\phi_j^{(u)} \approx \phi_j^{(v)}$ for $j = 1 \ldots T$. It is important to note that the probabilities can be both large or both small, as long as they are similar. For example, for the earlier biology example we could say Must-Link (termination, disassembly).

Cannot-Link $(u, v)$: Two words $u, v$ should not both have large probability within any topic. It is permissible for one to have a large probability and the other small, or both small. For example, one primitive for the wish example can be Cannot-Link (college, cure).

Many types of domain knowledge can be decomposed into a set of Must-Links and Cannot-Links. We demonstrate three types in our experiments: we can Split two or more sets of words from a single topic into different topics by placing Must-Links within the sets and Cannot-Links between them. We can Merge two or more sets of words from different topics into one topic by placing Must-Links among the sets. Given a common set of words which appear in multiple topics (such as stopwords in English, which tend to appear in all LDA topics), we can Isolate them by placing Must-Links within the common set, and then placing Cannot-Links between the common set and the other high-probability words from all topics. It is important to note that our Must-Links and Cannot-Links are preferences instead of hard constraints.

3.2. Encoding Must-Links

It is well-known that the Dirichlet distribution is limited in that all words share a common variance parameter, and are mutually independent except for the normalization constraint (Minka, 1999). However, for Must-Link $(u, v)$ it is crucial to control the two words $u, v$ differently than other words.
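The Split, Merge and Isolate operations of Section 3.1 can be compiled into Must-Link and Cannot-Link pairs mechanically. The following is a minimal sketch, not the authors' implementation: the function names (`chain`, `split`, `merge`, `isolate`) are hypothetical, and it assumes that one Must-Link chain per set and a single Cannot-Link between sets suffice, because Must-Links are transitive and each Must-Link closure later behaves as a single node.

```python
from itertools import combinations


def chain(words):
    """Must-Links joining consecutive words; transitivity covers the rest."""
    words = list(dict.fromkeys(words))  # drop duplicates, keep order
    return {(words[i], words[i + 1]) for i in range(len(words) - 1)}


def split(*word_sets):
    """Must-Links within each set, one Cannot-Link between each pair of sets."""
    must, cannot = set(), set()
    for ws in word_sets:
        must |= chain(ws)
    for a, b in combinations(word_sets, 2):
        cannot.add((a[0], b[0]))
    return must, cannot


def merge(*word_sets):
    """Must-Links tying all words of all sets together."""
    return chain(w for ws in word_sets for w in ws), set()


def isolate(common, others):
    """Must-Links within `common`, Cannot-Links from it to every other word."""
    must = chain(common)
    cannot = {(common[0], o) for o in others if o not in common}
    return must, cannot


if __name__ == "__main__":
    must, cannot = split(["go", "school", "into", "college"],
                         ["cancer", "free", "cure", "well"])
    print(sorted(must))
    print(sorted(cannot))
```

Because these are preferences rather than hard constraints, the resulting pairs only shape the prior; how strongly they bind is controlled separately by the strength parameter.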
The Dirichlet tree distribution (Dennis III, 1991) is a generalization of the Dirichlet distribution that allows such control. It is a tree with the words as leaf nodes; see Figure 1(a) for an example. Let $\gamma^{(k)}$ be the Dirichlet tree edge weight leading into node $k$. Let $C(k)$ be the immediate children of node $k$ in the tree, $L$ the leaves of the tree, $I$ the internal nodes, and $L(k)$ the leaves in the subtree under $k$. To generate a sample $\phi \sim \mathrm{DirichletTree}(\gamma)$, one first draws a multinomial at each internal node $s \in I$ from $\mathrm{Dirichlet}(\gamma^{C(s)})$, i.e., using the weights from $s$ to its children as the Dirichlet parameters. One can think of it as re-distributing the probability mass reaching $s$ by this multinomial (initially, the mass is 1 at the root). The probability $\phi^{(k)}$ of a word $k \in L$ is then simply the product of the multinomial parameters on the edges from $k$ to the root, as shown in Figure 1(b). It can be shown (Dennis III, 1991) that this procedure gives $\phi \sim \mathrm{DirichletTree}(\gamma)$ with density

$$p(\phi \mid \gamma) = \left( \prod_{k}^{L} {\phi^{(k)}}^{\gamma^{(k)} - 1} \right) \prod_{s}^{I} \left[ \frac{\Gamma\left(\sum_{k}^{C(s)} \gamma^{(k)}\right)}{\prod_{k}^{C(s)} \Gamma\left(\gamma^{(k)}\right)} \left( \sum_{k}^{L(s)} \phi^{(k)} \right)^{\Delta(s)} \right]$$

where $\Gamma(\cdot)$ is the standard gamma function, and the notation $\prod_{k}^{L}$ means $\prod_{k \in L}$. The function $\Delta(s) \equiv \gamma^{(s)} - \sum_{k}^{C(s)} \gamma^{(k)}$ is the difference between the in-degree and out-degree of internal node $s$. When this difference $\Delta(s) = 0$ for all internal nodes $s \in I$, the Dirichlet tree reduces to a Dirichlet distribution.

Like the Dirichlet, the Dirichlet tree is conjugate to the multinomial. It is possible to integrate out $\phi$ to get a distribution over word counts directly, similar to the multivariate Pólya distribution:

$$p(\mathbf{w} \mid \gamma) = \prod_{s}^{I} \left[ \frac{\Gamma\left(\sum_{k}^{C(s)} \gamma^{(k)}\right)}{\Gamma\left(\sum_{k}^{C(s)} \left(\gamma^{(k)} + n^{(k)}\right)\right)} \prod_{k}^{C(s)} \frac{\Gamma\left(\gamma^{(k)} + n^{(k)}\right)}{\Gamma\left(\gamma^{(k)}\right)} \right] \qquad (5)$$

Here $n^{(k)}$ is the number of word tokens in $\mathbf{w}$ that appear in $L(k)$.

We encode Must-Links using a Dirichlet tree. Note that our definition of Must-Link is transitive: Must-Link $(u, v)$ and Must-Link $(v, w)$ imply Must-Link $(u, w)$. We thus first compute the transitive closures of the expressed Must-Links. Our Dirichlet tree for Must-Links has a very simple structure: each transitive closure is a subtree, with one internal node and the words in the closure as its leaves. The weights from the internal node to its leaves are $\eta\beta$. The root connects to those internal nodes $s$ with weight $|L(s)|\beta$, where $|\cdot|$ represents the set size. In addition, the root directly connects to other words not in any closure, with weight $\beta$. For example, the transitive closure for a Must-Link (A,B) on vocabulary {A,B,C} is simply {A,B}, corresponding to the Dirichlet tree in Figure 1(a).

To understand this encoding of Must-Links, consider first the case when the domain knowledge strength parameter is at its weakest, $\eta = 1$.
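As a rough illustration of the sampling procedure and the Must-Link tree described above, the sketch below draws $\phi$ over the vocabulary {A,B,C} from the tree for Must-Link (A,B) and compares the average gap $|\phi^{(A)} - \phi^{(B)}|$ for $\eta = 1$ and a larger $\eta$. The function names and the value $\eta = 50$ used in the demo are illustrative assumptions, and Dirichlet draws are obtained by normalizing Gamma draws from the standard library.

```python
import random


def dirichlet(weights, rng):
    """Draw from Dirichlet(weights) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(w, 1.0) for w in weights]
    total = sum(draws)
    return [d / total for d in draws]


def sample_must_link_tree(beta, eta, rng):
    """Sample phi over (A, B, C) from the tree encoding Must-Link(A, B).

    Root -> internal node s with weight |L(s)| * beta = 2 * beta,
    root -> C with weight beta, and s -> A, s -> B with weight eta * beta.
    """
    mass_s, mass_c = dirichlet([2 * beta, beta], rng)
    share_a, share_b = dirichlet([eta * beta, eta * beta], rng)
    # A leaf's probability is the product of the edge draws on its path.
    return mass_s * share_a, mass_s * share_b, mass_c


def mean_abs_gap(beta, eta, n=2000, seed=0):
    """Average |phi_A - phi_B| over n samples from the Must-Link tree."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a, b, _c = sample_must_link_tree(beta, eta, rng)
        total += abs(a - b)
    return total / n


if __name__ == "__main__":
    print("eta = 1 :", mean_abs_gap(beta=1.0, eta=1.0))
    print("eta = 50:", mean_abs_gap(beta=1.0, eta=50.0))
```

In this sketch the mass reaching the internal node is drawn independently of $\eta$, so a larger $\eta$ only pulls the two linked words toward each other without forcing their total probability to be large.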
At $\eta = 1$, the in-degree equals the out-degree for any internal node $s$ (both are $|L(s)|\beta$), and the tree reduces to a Dirichlet distribution with symmetric prior $\beta$: the Must-Links are turned off in this case. As we increase $\eta$, the re-distribution of probability mass at $s$ (governed by a Dirichlet under $s$) has increasing concentration $|L(s)|\eta\beta$ but the same uniform base-measure. This tends to redistribute the mass evenly in the transitive closure represented by $s$. Therefore, the Must-Links are turned on when $\eta > 1$. Furthermore, the mass reaching $s$ is independent of $\eta$, and can still have a large variance. This properly encodes the fact that we want Must-Linked words to have similar, but not always large, probabilities. Otherwise, Must-Linked words would be forced to appear with large probability in all topics, which is clearly undesirable. This is impossible to represent with Dirichlet distributions. For example, the blue dots in Figure 1(c) are $\phi$ samples from the Dirichlet tree in Figure 1(a), plotted on the probability simplex of dimension three. While it is always true that $p(A) \approx p(B)$, their total probability mass can be anywhere from 0 to 1. The most similar Dirichlet distribution is perhaps the one with parameters (50, 50, 1), which generates samples close to (0.5, 0.5, 0) (Figure 1(d)).

3.3. Encoding Cannot-Links

Cannot-Links are considerably harder to handle. We first transform them into an alternative form that is amenable to Dirichlet trees. Note that Cannot-Links are not transitive: Cannot-Link (A,B) and Cannot-Link (B,C) do not entail Cannot-Link (A,C). We define a Cannot-Link-graph where the nodes are words, and the edges correspond to the Cannot-Links. Then the connected components of this graph are independent of each other when encoding Cannot-Links. We will use this property to factor a Dirichlet-tree selection probability later. For example, the two Cannot-Links (A,B) and (B,C) form the graph in Figure 1(e) with a single connected component {A,B,C}. Consider the subgraph on connected component $r$. We define its complement graph by flipping the edges (on to off, off to on), as shown in Figure 1(f).
Let there be $Q^{(r)}$ maximal cliques $M_{r1} \ldots M_{rQ^{(r)}}$ in this complement graph. In the following, we simply call them "cliques", but it is important to remember that they are maximal cliques of the complement graph, not the original Cannot-Link-graph. In our example, $Q^{(r)} = 2$ and $M_{r1} = \{A, C\}$, $M_{r2} = \{B\}$. These cliques have the following interpretation: each clique (e.g., $M_{r1} = \{A, C\}$) is the maximal subset of words in the connected component that can "occur together".

Note: When there are Must-Links, all words in a Must-Link transitive closure form a single node in the Cannot-Link-graph.
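The cliques for one connected component can be enumerated directly from the Cannot-Links. The sketch below is only an illustration: the text does not say which clique-enumeration algorithm was used, so a basic Bron-Kerbosch recursion is assumed here, and the function names are hypothetical. It also computes selection weights proportional to clique size.

```python
def complement_cliques(words, cannot_links):
    """Maximal cliques of the complement of the Cannot-Link-graph on `words`.

    `words` should be the nodes of a single connected component.
    """
    words = list(words)
    linked = {frozenset(pair) for pair in cannot_links}
    # Complement adjacency: distinct words are adjacent iff NOT Cannot-Linked.
    adj = {u: {v for v in words if v != u and frozenset((u, v)) not in linked}
           for u in words}
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(words), set())
    return cliques


def clique_prior(cliques):
    """Selection probability proportional to clique size."""
    total = sum(len(c) for c in cliques)
    return {c: len(c) / total for c in cliques}


if __name__ == "__main__":
    cliques = complement_cliques("ABC", [("A", "B"), ("B", "C")])
    for clique, prob in clique_prior(cliques).items():
        print(sorted(clique), prob)
```

For the running example with Cannot-Links (A,B) and (B,C), this yields the two cliques {A,C} and {B}. The number of maximal cliques can grow exponentially in the worst case, so a plain recursion like this is only reasonable for small components.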

Figure 1. Encoding Must-Links and Cannot-Links with a Dirichlet Forest. (a) A Dirichlet tree encoding Must-Link (A,B) with $\beta = 1$, $\eta = 50$ on vocabulary {A,B,C}. (b) A sample $\phi$ from this Dirichlet tree. (c) A large set of samples from the Dirichlet tree, plotted on the probability simplex. Note $p(A) \approx p(B)$, yet they remain flexible in actual value, which is desirable for a Must-Link. (d) In contrast, samples from a standard Dirichlet with comparable parameters (50, 50, 1) force $p(A) \approx p(B) \approx 0.5$, and cannot encode a Must-Link. (e) The Cannot-Link-graph for Cannot-Link (A,B) and Cannot-Link (B,C). (f) The complementary graph, with two maximal cliques {A,C} and {B}. (g) The Dirichlet subtree for clique {A,C}. (h) The Dirichlet subtree for clique {B}. (i) Samples from the mixture model on (g,h), encoding both Cannot-Links, again with $\beta = 1$, $\eta = 50$.

That is, the words in a clique are allowed to simultaneously have large probabilities in a given topic without violating any Cannot-Link preferences. By the maximality of these cliques, allowing any word outside the clique (e.g., "B") to also have a large probability will violate at least 1 Cannot-Link (in this example 2).

We discuss the encoding for this single connected component $r$ now, deferring discussion of the complete encoding to Section 3.4. We create a mixture model of $Q^{(r)}$ Dirichlet subtrees, one for each clique. Each topic selects exactly one subtree according to probability

$$p(q) \propto |M_{rq}|, \quad q = 1 \ldots Q^{(r)}. \qquad (6)$$

Conceptually, the selected subtree indexed by $q$ tends to redistribute nearly all probability mass to the words within $M_{rq}$. Since there is no mass left for other cliques, it is impossible for a word outside clique $M_{rq}$ to have a large probability. Therefore, no Cannot-Link will be violated. In reality, the subtrees are soft rather than hard, because Cannot-Links are only preferences. The Dirichlet subtree for $M_{rq}$ is structured as follows. The subtree's root connects to an internal node $s$ with weight $\eta |M_{rq}| \beta$. The node $s$ connects to words in $M_{rq}$, with weight $\beta$.
The subtree's root also directly connects to words not in $M_{rq}$ (but in the connected component $r$) with weight $\beta$. This will send most probability mass down to $s$, and then flexibly redistribute it among words in $M_{rq}$. For example, Figures 1(g,h) show the Dirichlet subtrees for $M_{r1} = \{A, C\}$ and $M_{r2} = \{B\}$ respectively. Samples from this mixture model are shown in Figure 1(i), representing multinomials in which no Cannot-Link is violated. Such behavior is not achievable by a Dirichlet distribution, or a single Dirichlet tree. (Footnote: Dirichlet distributions with very small concentration do have some selection effect. For example, a Beta distribution with small parameters tends to concentrate probability mass on one of the two variables. However, such priors are weak: the "pseudo counts" in them are too small because of the small concentration. The posterior will be dominated by the data, and we would lose any encoded domain knowledge.)

Finally, we mention that although in the worst case the number of maximal cliques $Q^{(r)}$ in a connected component of size $|r|$ can grow exponentially as $O(3^{|r|/3})$ (Griggs et al., 1988), in our experiments $Q^{(r)}$ is no larger than 3, due in part to Must-Linked words "collapsing" to single nodes in the Cannot-Link graph.

3.4. The Dirichlet Forest Prior

In general, our domain knowledge is expressed by a set of Must-Links and Cannot-Links. We first compute the transitive closure of Must-Links. We then form a Cannot-Link-graph, where a node is either a Must-Link closure or a word not present in any Must-Link. Note that the domain knowledge must be "consistent" in that no pair of words is simultaneously Cannot-Linked and Must-Linked (either explicitly or implicitly through Must-Link transitive closure). Let $R$ be the number of connected components in the Cannot-Link-graph. Our Dirichlet Forest consists of $\prod_{r=1}^{R} Q^{(r)}$ Dirichlet trees, represented by the template in Figure 2. Each Dirichlet tree has $R$ branches beneath the root, one for each connected component. The trees differ in which subtrees they include under these branches. For the $r$-th branch, there are $Q^{(r)}$ possible Dirichlet subtrees, corresponding to cliques $M_{r1} \ldots M_{rQ^{(r)}}$. Therefore, a tree in the forest is uniquely identified by an index vector $\mathbf{q} = (q^{(1)} \ldots q^{(R)})$, where $q^{(r)} \in \{1 \ldots Q^{(r)}\}$.

Figure 2. Template of Dirichlet trees in the Dirichlet Forest. Beneath the root there is one branch per connected component $1 \ldots R$; a solid box on branch $r$ holds one of the $Q^{(r)}$ candidate subtrees (clique $M_{rq^{(r)}}$ plus the other words of the component), a dotted box shows the Must-Link subtree, and edges marked "$\eta$" carry the strength parameter.

To draw a Dirichlet tree $\mathbf{q}$ from the prior $\mathrm{DirichletForest}(\beta, \eta)$, we select the subtrees independently because the $R$ connected components are independent with respect to Cannot-Links: $p(\mathbf{q}) = \prod_{r=1}^{R} p(q^{(r)})$. Each $q^{(r)}$ is sampled according to (6), and corresponds to choosing a solid box for the $r$-th branch in Figure 2. The structure of the subtree within the solid box has been defined in Section 3.3. The black nodes may be a single word, or a Must-Link transitive closure having the subtree structure shown in the dotted box. The edge weight leading to most nodes $k$ is $\gamma^{(k)} = |L(k)|\beta$, where $L(k)$ is the set of leaves under $k$. However, for edges coming out of a Must-Link internal node or going into a Cannot-Link internal node, their weights are multiplied by the strength parameter $\eta$. These edges are marked by "$\eta$" in Figure 2.

We now define the complete Dirichlet Forest model, integrating out ("collapsing") $\theta$ and $\phi$. Let $n_j^{(d)}$ be the number of word tokens in document $d$ that are assigned to topic $j$. $\mathbf{z}$ is generated the same as in LDA:

$$p(\mathbf{z} \mid \alpha) = \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \right)^{D} \prod_{d=1}^{D} \frac{\prod_{j=1}^{T} \Gamma\left(n_j^{(d)} + \alpha\right)}{\Gamma\left(n_{\cdot}^{(d)} + T\alpha\right)}.$$

There is one Dirichlet tree $\mathbf{q}_j$ per topic $j = 1 \ldots T$, sampled from the Dirichlet Forest prior $p(\mathbf{q}_j) = \prod_{r=1}^{R} p(q_j^{(r)})$. Each Dirichlet tree $\mathbf{q}_j$ implicitly defines its tree edge weights $\gamma_j^{(\cdot)}$ using $\beta$, $\eta$, and its tree structure $L_j$, $I_j$, $C_j(\cdot)$. Let $n_j^{(k)}$ be the number of word tokens in the corpus assigned to topic $j$ that appear under the node $k$ in the Dirichlet tree $\mathbf{q}_j$.
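The probability of generating the corpus $\mathbf{w}$, given the trees $\mathbf{q}_{1:T} \equiv \mathbf{q}_1 \ldots \mathbf{q}_T$ and the topic assignment $\mathbf{z}$, can be derived using (5):

$$p(\mathbf{w} \mid \mathbf{q}_{1:T}, \mathbf{z}, \beta, \eta) = \prod_{j=1}^{T} \prod_{s}^{I_j} \left[ \frac{\Gamma\left(\sum_{k}^{C_j(s)} \gamma_j^{(k)}\right)}{\Gamma\left(\sum_{k}^{C_j(s)} \left(\gamma_j^{(k)} + n_j^{(k)}\right)\right)} \prod_{k}^{C_j(s)} \frac{\Gamma\left(\gamma_j^{(k)} + n_j^{(k)}\right)}{\Gamma\left(\gamma_j^{(k)}\right)} \right].$$

Finally, the complete generative model is

$$p(\mathbf{w}, \mathbf{z}, \mathbf{q}_{1:T} \mid \alpha, \beta, \eta) = p(\mathbf{w} \mid \mathbf{q}_{1:T}, \mathbf{z}, \beta, \eta)\, p(\mathbf{z} \mid \alpha) \prod_{j=1}^{T} p(\mathbf{q}_j).$$

4. Inference for Dirichlet Forest

Because a Dirichlet Forest is a mixture of Dirichlet trees, which are conjugate to multinomials, we can efficiently perform inference by Markov Chain Monte Carlo (MCMC). Specifically, we use collapsed Gibbs sampling similar to Griffiths and Steyvers (2004). However, in our case the MCMC state is defined by both the topic labels $\mathbf{z}$ and the tree indices $\mathbf{q}_{1:T}$. An MCMC iteration in our case consists of a sweep through both $\mathbf{z}$ and $\mathbf{q}_{1:T}$. We present the conditional probabilities for collapsed Gibbs sampling below.

(Sampling $z_i$): Let $n_{-i,j}^{(d)}$ be the number of word tokens in document $d$ assigned to topic $j$, excluding the word at position $i$. Similarly, let $n_{-i,j}^{(k)}$ be the number of word tokens in the corpus that are under node $k$ in topic $j$'s Dirichlet tree, excluding the word at position $i$. For candidate topic labels $v = 1 \ldots T$, we have

$$p(z_i = v \mid \mathbf{z}_{-i}, \mathbf{q}_{1:T}, \mathbf{w}) \propto \left(n_{-i,v}^{(d)} + \alpha\right) \prod_{s}^{I_v(\uparrow i)} \frac{\gamma_v^{(C_v(s \downarrow i))} + n_{-i,v}^{(C_v(s \downarrow i))}}{\sum_{k}^{C_v(s)} \left(\gamma_v^{(k)} + n_{-i,v}^{(k)}\right)},$$

where $I_v(\uparrow i)$ denotes the subset of internal nodes in topic $v$'s Dirichlet tree that are ancestors of leaf $w_i$, and $C_v(s \downarrow i)$ is the unique node that is $s$'s immediate child and an ancestor of $w_i$ (including $w_i$ itself).

(Sampling $q_j^{(r)}$): Since the connected components are independent, sampling the tree $\mathbf{q}_j$ factors into sampling the cliques for each connected component $q_j^{(r)}$. For candidate cliques $q' = 1 \ldots Q^{(r)}$, we have

$$p(q_j^{(r)} = q' \mid \mathbf{z}, \mathbf{q}_{-j}, \mathbf{q}_j^{(-r)}, \mathbf{w}) \propto \left( \sum_{k}^{M_{rq'}} \beta_k \right) \prod_{s}^{I_{j,r=q'}} \left[ \frac{\Gamma\left(\sum_{k}^{C_j(s)} \gamma_j^{(k)}\right)}{\Gamma\left(\sum_{k}^{C_j(s)} \left(\gamma_j^{(k)} + n_j^{(k)}\right)\right)} \prod_{k}^{C_j(s)} \frac{\Gamma\left(\gamma_j^{(k)} + n_j^{(k)}\right)}{\Gamma\left(\gamma_j^{(k)}\right)} \right],$$

where $I_{j,r=q'}$ denotes the internal nodes below the $r$-th branch of tree $\mathbf{q}_j$, when clique $M_{rq'}$ is selected. For symmetric $\beta$, the leading factor $\sum_{k}^{M_{rq'}} \beta_k$ equals $|M_{rq'}|\beta$, which is proportional to the selection probability (6).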
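The conditional for $z_i$ only needs the nodes on the path from the word's leaf to the root of each candidate topic's tree. The sketch below computes the unnormalized weight for a single candidate topic. It is an illustration under assumed data structures (dictionaries `parent`, `children`, `gamma` and `counts`, all hypothetical names), not the authors' code.

```python
def topic_weight(word, doc_count, alpha, parent, children, gamma, counts):
    """Unnormalized p(z_i = v | ...) for one candidate topic v.

    parent / children describe topic v's Dirichlet tree, gamma[k] is the
    weight of the edge into node k, counts[k] is the number of tokens
    assigned to v under node k (excluding position i), and doc_count is the
    number of tokens in the word's document assigned to v (excluding i).
    """
    weight = doc_count + alpha
    node = word
    # Walk from the leaf to the root; each step visits one internal ancestor.
    while parent.get(node) is not None:
        s = parent[node]
        numerator = gamma[node] + counts.get(node, 0)
        denominator = sum(gamma[k] + counts.get(k, 0) for k in children[s])
        weight *= numerator / denominator
        node = s
    return weight


if __name__ == "__main__":
    # Tree for Must-Link(A, B) on {A, B, C}; beta = 1 and eta = 50 are
    # illustrative values, not a recommendation.
    parent = {"A": "s", "B": "s", "s": "root", "C": "root"}
    children = {"root": ["s", "C"], "s": ["A", "B"]}
    gamma = {"s": 2.0, "C": 1.0, "A": 50.0, "B": 50.0}
    counts = {"A": 3, "s": 3}  # three tokens of word A already in this topic
    for w in "ABC":
        print(w, topic_weight(w, 0, 1.0, parent, children, gamma, counts))
```

Normalizing these weights over the $T$ candidate topics gives the distribution from which a new $z_i$ would be drawn. In the demo, observing tokens of A also raises the weight of the Must-Linked word B relative to C, because the counts accumulate at their shared internal node.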

(Estimating $\phi$ and $\theta$): After running MCMC for sufficient iterations, we follow standard practice (e.g., Griffiths & Steyvers, 2004) and use the last sample $(\mathbf{z}, \mathbf{q}_{1:T})$ to estimate $\phi$ and $\theta$. Because a Dirichlet tree is a conjugate distribution, its posterior is a Dirichlet tree with the same structure and updated edge weights. The posterior for the Dirichlet tree of the $j$-th topic is $\gamma_j^{\mathrm{post}(k)} = \gamma_j^{(k)} + n_j^{(k)}$, where the counts $n_j^{(k)}$ are collected from $\mathbf{z}, \mathbf{q}_{1:T}, \mathbf{w}$. We estimate $\phi_j$ by the first moment under this posterior (Minka, 1999):

$$\hat{\phi}_j^{(w)} = \prod_{s}^{I_j(\uparrow w)} \frac{\gamma_j^{\mathrm{post}(C_j(s \downarrow w))}}{\sum_{k}^{C_j(s)} \gamma_j^{\mathrm{post}(k)}}. \qquad (7)$$

The parameter $\theta$ is estimated the same way as in standard LDA: $\hat{\theta}_j^{(d)} = \left(n_j^{(d)} + \alpha\right) / \left(n_{\cdot}^{(d)} + T\alpha\right)$.

5. Experiments

Synthetic Corpora: We present results on synthetic datasets to show how the Dirichlet Forest (DF) incorporates different types of knowledge. Recall that DF with $\eta = 1$ is equivalent to standard LDA (verified with the code of Griffiths & Steyvers, 2004). Previous studies often take the last MCMC sample ($\mathbf{z}$ and $\mathbf{q}_{1:T}$), and discuss the topics $\phi_{1:T}$ derived from that sample. Because of the stochastic nature of MCMC, we argue that more insight can be gained if multiple independent MCMC samples are considered. For each dataset, and each DF with a different $\eta$, we run a long MCMC chain with [...] iterations of burn-in, and take out a sample every [...] iterations afterward, for a total of [...] samples. We have some indication that our chain is well-mixed, as we observe all expected modes, and that samples with "label switching" (i.e., equivalent up to label permutation) occur with near equal frequency. For each sample, we derive its topics $\phi_{1:T}$ with (7) and then greedily align the $\phi$'s from different samples, permuting the $T$ topic labels to remove the label switching effect. Within a dataset, we perform PCA on the baseline ($\eta = 1$) $\phi$ and project all samples into the resulting space to obtain a common visualization (each row in Figure 3). Points are dithered to show overlap.

Must-Link (B,C): The corpus consists of six documents over a vocabulary of five words. The documents are: ABAB, CDCD, and EEEE, each represented twice. We let $T = 2$, $\alpha = [...]$, $\beta = [...]$.
LDA produces three kinds of $\phi_{1:T}$: roughly a third of the time the topics are around $[\frac{A}{2}\,\frac{B}{2} \mid \frac{C}{4}\,\frac{D}{4}\,\frac{E}{2}]$, which is shorthand for $\phi_1 = (\frac{1}{2}, \frac{1}{2}, 0, 0, 0)$, $\phi_2 = (0, 0, \frac{1}{4}, \frac{1}{4}, \frac{1}{2})$ on the vocabulary ABCDE. Another third are around $[\frac{A}{4}\,\frac{B}{4}\,\frac{E}{2} \mid \frac{C}{2}\,\frac{D}{2}]$, and the final third around $[\frac{A}{4}\,\frac{B}{4}\,\frac{C}{4}\,\frac{D}{4} \mid E]$. They correspond to clusters 1, 2 and 3 respectively in the upper-left panel of Figure 3. We add a single Must-Link (B,C). When $\eta = [...]$, the data still override our Must-Link somewhat because clusters 1 and 2 do not disappear completely. As $\eta$ increases to [...], the Must-Link overrides the data and clusters 1 and 2 vanish, leaving only cluster 3. That is, running DF and taking the last sample is very likely to obtain the $[\frac{A}{4}\,\frac{B}{4}\,\frac{C}{4}\,\frac{D}{4} \mid E]$ topics. This is what we want: B and C are present or absent together in the topics and they also "pull" A, D along, even though A, D are not in the knowledge we added.

Figure 3. PCA projections of permutation-aligned $\phi$ samples for the four synthetic data experiments. Each row is one experiment (Must-Link, Cannot-Link, Isolate, Split) and each panel one value of $\eta$.

Cannot-Link (A,B): The corpus has four documents: [...], [...], twice each; $T = [...]$, $\alpha = [...]$, $\beta = [...]$. LDA produces six kinds of $\phi_{1:T}$ evenly: [...], corresponding to clusters [...] and the "lines". We add a single Cannot-Link (A,B). As DF $\eta$ increases, cluster [...] disappears, because it involves a topic that violates the Cannot-Link. Other clusters become uniformly more likely.

Isolate([...]): The corpus has four documents, all of which are [...]; $T = [...]$, $\alpha = [...]$, $\beta = [...]$. LDA produces three clusters evenly: [...]. We add Isolate([...]), which is compiled into two Cannot-Links between the isolated word and each of the other two words. The DF's samples concentrate to cluster [...], which indeed isolates the word into its own topic.

Split(AB,CD): The corpus has six documents: ABCDEEEE, ABCDFFFF, each present three times; $\alpha = [...]$, $\beta = [...]$. LDA with $T = 3$ produces a large portion of topics around $[\frac{A}{4}\,\frac{B}{4}\,\frac{C}{4}\,\frac{D}{4} \mid E \mid F]$ (not shown). We add Split(AB,CD), which is compiled into Must-Link (A,B), Must-Link (C,D), Cannot-Link ([...],C), and increase $T = 4$. However, DF with $\eta = 1$ (i.e., LDA with $T = 4$) produces a large variety of topics: e.g., cluster 1 is [...], cluster 2 is [...], and cluster 7 is $[\frac{A}{2}\,\frac{B}{2} \mid \frac{C}{2}\,\frac{D}{2} \mid E \mid F]$. That is, simply adding one more topic does not clearly separate AB and CD. On the other hand, with $\eta$ increasing, DF eventually concentrates on cluster 7, which satisfies the Split operation.

Wish Corpus: We now consider interactive topic modeling with DF. The corpus we use is a collection of 89,574 New Year's wishes submitted to The Times Square Alliance (Goldberg et al., 2009). Each wish is treated as a document, downcased but without stopword removal. For each step in our interactive example, we set $\alpha = [...]$, $\beta = [...]$, $\eta = [...]$, and run MCMC for [...] iterations before estimating the topics from the final sample. The domain knowledge in DF is accumulative along the steps.

Step 1: We run LDA with $T = 15$. Many of the most probable words in the topics are conventional ("to, and") or corpus-specific ("wish, 2008") stopwords, which obscure the meaning of the topics.

Step 2: We manually create a [...]-word stopword list, and issue an Isolate preference. This is compiled into Must-Links among this set and Cannot-Links between this set and all other words in the top [...] for all topics. $T$ is increased to 16. After running DF, we end up with two stopword topics. Importantly, with the stopwords explained by these two topics, the top words for the other topics become much more meaningful.

Step 3: We notice that one topic conflates two concepts: enter college and cure disease (top 8 words: "go school cancer into well free cure college"). We issue Split("go, school, into, college", "cancer, free, cure, well") to separate the concepts.
This is compiled into Must-Links within each quadruple, and a Cannot-Link between them. $T$ is increased to 18. After running DF, one of the topics clearly takes on the "college" concept, picking up related words which we did not explicitly encode in our prior. Another topic does likewise for the "cure" concept (many wishes are like "mom stays cancer free"). Other topics have minor changes.

Step 4: We then notice that two topics correspond to romance concepts. We apply Merge("love, forever, marry, together, loves", "meet, boyfriend, married, girlfriend, wedding"), which is compiled into Must-Links between these words. $T$ is decreased to 17. After running DF, one of the romance topics disappears, and the remaining one corresponds to the merged romance topic ("lose", "weight" were in one of them, and remain so). Other previous topics survive with only minor changes. Table 1 shows the wish topics after these four steps, where we place the DF operations next to the most affected topics, and color-code the words explicitly specified in the domain knowledge.

Table 1. Wish topics from interactive topic modeling.

Topic    | Top words sorted by φ = p(word | topic)
Merge    | love lose weight together forever marry meet
success  | health happiness family good friends prosperity
life     | life happy best live time long wishes ever years
-        | a do not what someone so like don much he
money    | out make money up house work able pay own lots
people   | no people stop less day every each other another
iraq     | home safe end troops iraq bring war return
joy      | love true peace happiness dreams joy everyone
family   | happy healthy family baby safe prosperous
vote     | better hope president paul ron than person bush
Isolate  | and to for a the year in new all my
god      | god bless jesus everyone loved know heart christ
peace    | peace world earth win lottery around save
spam     | com call if u www visit
Isolate  | i to wish my for and a be that the
Split    | job go great school into good college hope move
Split    | mom hope cancer free husband son well dad cure
Yeast Corpus: Whereas the previous experiment illustrates the utility of our approach in an interactive setting, we now consider a case in which we use background knowledge from an ontology to guide topic modeling. Our prior knowledge is based on six concepts. The concepts transcription, translation and replication characterize three important processes that are carried out at the molecular level. The concepts initiation, elongation and termination describe phases of the three aforementioned processes. Combinations of concepts from these two sets correspond to concepts in the Gene Ontology (e.g., GO:0006414 is translational elongation, and GO:0006352 is transcription initiation). We guide our topic modeling using Must-Links among a small set of words for each concept. Moreover, we use Cannot-Links among words to specify that we prefer (i) transcription, translation and replication to be represented in separate topics, and (ii) initiation, elongation and termination to be represented in separate topics. We do not set any preferences between the "process" topics and the "phase" topics, however.

Table 2. Yeast topics. The left column shows the seed words in the DF model. The middle columns indicate the topics in which at least [...] seed words are among the [...] highest probability words for LDA; the "o" column gives the number of other topics (not shared by another word). The right columns show the same topic-word relationships for the DF model.

Seed words, grouped by concept (the per-topic marks in the LDA and DF columns are not reproduced here):
transcription: transcription, transcriptional, template
translation: translation, translational, tRNA
replication: replication, cycle, division
initiation: initiation, start, assembly
elongation: elongation
termination: termination, disassembly, release, stop

The corpus that we use for our experiments consists of 18,193 abstracts selected from the MEDLINE database for their relevance to yeast genes. We induce topic models using DF to encode the Must-Links and Cannot-Links described above, and use standard LDA as a control. We set $T = [...]$, $\alpha = [...]$, $\beta = [...]$, $\eta = [...]$. For each word that we use to seed a concept, Table 2 shows the topics that include it among their [...] most probable words.

We make several observations about the DF-induced topics. First, each concept is represented by a small number of topics, and the Must-Link words for each topic all occur as highly probable words in these topics. Second, the Cannot-Link preferences are obeyed in the final topics. Third, the topics use the process and phase topics compositionally. For example, DF Topic [...] represents transcription initiation and DF Topic 8 represents replication initiation. Moreover, the topics that are significantly influenced by the prior typically include highly relevant terms among their most probable words. For example, the top words in DF Topic [...] include "TATA", "TFIID", "promoter", and "recruitment", which are all specifically germane to the composite concept of transcription initiation. In the case of standard LDA, the seed concept words are dispersed across a greater number of topics, and highly related words, such as "cycle" and "division", often do not fall into the same topic. Many of the topics induced by ordinary LDA are semantically coherent, but the specific concepts suggested by our prior do not naturally emerge without using DF.
Acknowledgments: This work was supported by NIH/NLM grants T15 LM07359 and R01 LM07050, and the Wisconsin Alumni Research Foundation.

References

Basu, S., Davidson, I., & Wagstaff, K. (Eds.). (2008). Constrained clustering: Advances in algorithms, theory, and applications. Chapman & Hall/CRC.

Blei, D., & Lafferty, J. (2006). Correlated topic models. In Advances in neural information processing systems 18, 147-154. Cambridge, MA: MIT Press.

Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.

Chemudugunta, C., Holloway, A., Smyth, P., & Steyvers, M. (2008). Modeling documents by combining semantic concepts with unsupervised statistical learning. Intl. Semantic Web Conf. (pp. 229-244). Springer.

Dennis III, S. Y. (1991). On the hyper-Dirichlet type 1 and hyper-Liouville distributions. Communications in Statistics - Theory and Methods, 20.

Goldberg, A., Fillmore, N., Andrzejewski, D., Xu, Z., Gibson, B., & Zhu, X. (2009). May all your wishes come true: A study of wishes and how to recognize them. Human Language Technologies: Proc. of the Annual Conf. of the North American Chapter of the Assoc. for Computational Linguistics. ACL Press.

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proc. of the Nat. Academy of Sciences of the United States of America, 101, 5228-5235.

Griggs, J. R., Grinstead, C. M., & Guichard, D. R. (1988). The number of maximal independent sets in a connected graph. Discrete Math., 68, 211-220.

Li, W., & McCallum, A. (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. Proc. of the 23rd Intl. Conf. on Machine Learning. ACM Press.

Minka, T. P. (1999). The Dirichlet-tree distribution (Technical Report). minka/papers/dirichlet/minka-dirtree.pdf.

Tam, Y.-C., & Schultz, T. (2007). Correlated latent semantic model for unsupervised LM adaptation. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (pp. [...]).

The Gene Ontology Consortium (2000). Gene Ontology: Tool for the unification of biology. Nature Genetics, 25, 25-29.