Dynamic Topic Models. Abstract. 1. Introduction


 Trevor Goodwin
 1 years ago
 Views:
Transcription
1 David M. Blei omputer Science Department, Princeton University, Princeton, NJ 08544, USA John D. Lafferty School of omputer Science, arnegie Mellon University, Pittsburgh PA 15213, USA Abstract A family of probabilistic time series models is developed to analye the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric avelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative indo into the contents of a large document collection. The models are demonstrated by analying the OR ed archives of the journal Science from 1880 through Introduction Managing the explosion of electronic document archives requires ne tools for automatically organiing, searching, indexing, and brosing large collections. Recent research in machine learning and statistics has developed ne techniques for finding patterns of ords in document collections using hierarchical probabilistic models Blei et al., 2003; Mcallum et al., 2004; RosenZvi et al., 2004; Griffiths and Steyvers, 2004; Buntine and Jakulin, 2004; Blei and Lafferty, These models are called topic models because the discovered patterns often reflect the underlying topics hich combined to form the documents. Such hierarchical probabilistic models are easily generalied to other kinds of data; for example, topic models have been used to analye images FeiFei and Perona, 2005; Sivic et al., 2005, biological data Pritchard et al., 2000, and survey data Erosheva, In an exchangeable topic model, the ords of each docu Appearing in Proceedings of the 23 rd International onference on Machine Learning, Pittsburgh, PA, opyright 2006 by the authors/oners. ment are assumed to be independently dran from a mixture of multinomials. The mixing proportions are randomly dran for each document; the mixture components, or topics, are shared by all documents. Thus, each document reflects the components ith different proportions. These models are a poerful method of dimensionality reduction for large collections of unstructured documents. Moreover, posterior inference at the document level is useful for information retrieval, classification, and topicdirected brosing. Treating ords exchangeably is a simplification that it is consistent ith the goal of identifying the semantic themes ithin each document. For many collections of interest, hoever, the implicit assumption of exchangeable documents is inappropriate. Document collections such as scholarly journals, , nes articles, and search query logs all reflect evolving content. For example, the Science article The Brain of Professor Laborde may be on the same scientific path as the article Reshaping the ortical Motor Map by Unmasking Latent Intracortical onnections, but the study of neuroscience looked much different in 1903 than it did in The themes in a document collection evolve over time, and it is of interest to explicitly model the dynamics of the underlying topics. In this paper, e develop a dynamic topic model hich captures the evolution of topics in a sequentially organied corpus of documents. We demonstrate its applicability by analying over 100 years of OR ed articles from the journal Science, hich as founded in 1880 by Thomas Edison and has been published through the present. Under this model, articles are grouped by year, and each year s articles arise from a set of topics that have evolved from the last year s topics. In the subsequent sections, e extend classical state space models to specify a statistical model of topic evolution. We then develop efficient approximate posterior inference techniques for determining the evolving topics from a sequential collection of documents. Finally, e present qualitative results that demonstrate ho dynamic topic models allo the exploration of a large document collection in ne
2 ays, and quantitative results that demonstrate greater predictive accuracy hen compared ith static topic models. 2. Dynamic Topic Models While traditional time series modeling has focused on continuous data, topic models are designed for categorical data. Our approach is to use state space models on the natural parameter space of the underlying topic multinomials, as ell as on the natural parameters for the logistic normal distributions used for modeling the documentspecific topic proportions. First, e revie the underlying statistical assumptions of a static topic model, such as latent Dirichlet allocation LDA Blei et al., Let 1:K be K topics, each of hich is a distribution over a fixed vocabulary. In a static topic model, each document is assumed dran from the folloing generative process: 1. hoose topic proportions from a distribution over the K 1simplex, such as a Dirichlet. 2. For each ord: a hoose a topic assignment Z Mult. b hoose a ord W Mult. This process implicitly assumes that the documents are dran exchangeably from the same set of topics. For many collections, hoever, the order of the documents reflects an evolving set of topics. In a dynamic topic model, e suppose that the data is divided by time slice, for example by year. We model the documents of each slice ith a K component topic model, here the topics associated ith slice t evolve from the topics associated ith slice t 1. For a Kcomponent model ith V terms, let t,k denote the V vector of natural parameters for topic k in slice t. The usual representation of a multinomial distribution is by its mean parameteriation. If e denote the mean parameter of a V dimensional multinomial by π, the ith component of the natural parameter is given by the mapping i = logπ i /π V. In typical language modeling applications, Dirichlet distributions are used to model uncertainty about the distributions over ords. Hoever, the Dirichlet is not amenable to sequential modeling. Instead, e chain the natural parameters of each topic t,k in a state space model that evolves ith Gaussian noise; the simplest version of such a model is t,k t 1,k N t 1,k,σ 2 I. 1 Our approach is thus to model sequences of compositional random variables by chaining Gaussian distributions in a dynamic model and mapping the emitted values to the simplex. This is an extension of the logistic normal distribu N N N A A A Figure 1. Graphical representation of a dynamic topic model for three time slices. Each topic s natural parameters t,k evolve over time, together ith the mean parameters t of the logistic normal distribution for the topic proportions. tion Aitchison, 1982 to timeseries simplex data West and Harrison, In LDA, the documentspecific topic proportions are dran from a Dirichlet distribution. In the dynamic topic model, e use a logistic normal ith mean to express uncertainty over proportions. The sequential structure beteen models is again captured ith a simple dynamic model t t 1 N t 1,δ 2 I. 2 For simplicity, e do not model the dynamics of topic correlation, as as done for static models by Blei and Lafferty By chaining together topics and topic proportion distributions, e have sequentially tied a collection of topic models. The generative process for slice t of a sequential corpus is thus as follos: 1. Dra topics t t 1 N t 1,σ 2 I. 2. Dra t t 1 N t 1,δ 2 I. 3. For each document: a Dra η N t,a 2 I b For each ord: i. Dra Z Multπη. ii. Dra W t,d,n Multπ t,. Note that π maps the multinomial natural parameters to the mean parameters, π k,t = exp k,t, P exp k,t,. The graphical model for this generative process is shon in Figure 1. When the horiontal arros are removed, breaking the time dynamics, the graphical model reduces to a set of independent topic models. With time dynamics, the kth K
3 topic at slice t has smoothly evolved from the kth topic at slice t 1. For clarity of presentation, e no focus on a model ith K dynamic topics evolving as in 1, and here the topic proportion model is fixed at a Dirichlet. The technical issues associated ith modeling the topic proportions in a time series as in 2 are essentially the same as those for chaining the topics together. 3. Approximate Inference Working ith time series over the natural parameters enables the use of Gaussian models for the time dynamics; hoever, due to the nonconjugacy of the Gaussian and multinomial models, posterior inference is intractable. In this section, e present a variational method for approximate posterior inference. We use variational methods as deterministic alternatives to stochastic simulation, in order to handle the large data sets typical of text analysis. While Gibbs sampling has been effectively used for static topic models Griffiths and Steyvers, 2004, nonconjugacy makes sampling methods more difficult for this dynamic model. The idea behind variational methods is to optimie the free parameters of a distribution over the latent variables so that the distribution is close in KullbackLiebler KL divergence to the true posterior; this distribution can then be used as a substitute for the true posterior. In the dynamic topic model, the latent variables are the topics t,k, mixture proportions t,d, and topic indicators t,d,n. The variational distribution reflects the group structure of the latent variables. There are variational parameters for each topic s sequence of multinomial parameters, and variational parameters for each of the documentlevel latent variables. The approximate variational posterior is K q k,1,..., k,t ˆ k,1,..., ˆ k,t 3 k=1 T Dt d=1 q t,d γ t,d N t,d n=1 q t,d,n φ t,d,n In the commonly used meanfield approximation, each latent variable is considered independently of the others. In the variational distribution of { k,1,..., k,t }, hoever, e retain the sequential structure of the topic by positing a dynamic model ith Gaussian variational observations {ˆ k,1,..., ˆ k,t }. These parameters are fit to minimie the KL divergence beteen the resulting posterior, hich is Gaussian, and the true posterior, hich is not Gaussian. A similar technique for Gaussian processes is described in Snelson and Ghahramani, The variational distribution of the documentlevel latent. N N N A A A Figure 2. A graphical representation of the variational approximation for the time series topic model of Figure 1. The variational parameters ˆ and ˆ are thought of as the outputs of a Kalman filter, or as observed data in a nonparametric regression setting. variables follos the same form as in Blei et al Each proportion vector t,d is endoed ith a free Dirichlet parameter γ t,d, each topic indicator t,d,n is endoed ith a free multinomial parameter φ t,d,n, and optimiation proceeds by coordinate ascent. The updates for the documentlevel variational parameters have a closed form; e use the conjugate gradient method to optimie the topiclevel variational observations. The resulting variational approximation for the natural topic parameters { k,1,..., k,t } incorporates the time dynamics; e describe one approximation based on a Kalman filter, and a second based on avelet regression Variational Kalman Filtering The vie of the variational parameters as outputs is based on the symmetry properties of the Gaussian density, f µ,σ x = f x,σ µ, hich enables the use of the standard forardbackard calculations for linear state space models. The graphical model and its variational approximation are shon in Figure 2. Here the triangles denote variational parameters; they can be thought of as hypothetical outputs of the Kalman filter, to facilitate calculation. To explain the main idea behind this technique in a simpler setting, consider the model here unigram models t in the natural parameteriation evolve over time. In this model there are no topics and thus no mixing parameters. The calculations are simpler versions of those e need for the more general latent variable models, but exhibit the es K
4 sential features. Our state space model is t t 1 N t 1,σ 2 I t,n t Multπ t and e form the variational state space model here ˆ t t N t, ˆν 2 t I The variational parameters are ˆ t and ˆν t. Using standard Kalman filter calculations Kalman, 1960, the forard mean and variance of the variational posterior are given by m t E t ˆ 1:t = ˆν 2 t V t 1 + σ 2 + ˆν 2 t m t V t E t m t 2 ˆ 1:t ˆν 2 t = V t 1 + σ 2 + ˆν t 2 V t 1 + σ 2 ˆν 2 t V t 1 + σ 2 + ˆν 2 t ˆ t ith initial conditions specified by fixed m 0 and V 0. The backard recursion then calculates the marginal mean and variance of t given ˆ 1:T as m t 1 E t 1 ˆ 1:T = σ 2 V t 1 + σ 2 m t σ 2 V t 1 + σ 2 m t Ṽ t 1 E t 1 m t 1 2 ˆ 1:T 2 Vt 1 = V t 1 + Ṽt V t 1 + σ 2 V t 1 + σ 2 ith initial conditions m T = m T and ṼT = V T. We approximate the posterior p 1:T 1:T using the state space posterior q 1:T ˆ 1:T. From Jensen s inequality, the loglikelihood is bounded from belo as log pd 1:T 4 q 1:T ˆ p 1:T pd 1:T 1:T 1:T log q 1:T ˆ d 1:T 1:T = E q log p 1:T + E q log pd t t + Hq Details of optimiing this bound are given in an appendix Variational Wavelet Regression The variational Kalman filter can be replaced ith variational avelet regression; for a readable introduction standard avelet methods, see Wasserman We rescale time so it is beteen 0 and 1. For 128 years of Science e take n = 2 J and J = 7. To be consistent ith our earlier notation, e assume that ˆ t = m t + ˆνɛ t here ɛ t N0,1. Our variational avelet regression algorithm estimates {ˆ t }, hich e vie as observed data, just as in the Kalman filter method, as ell as the noise level ˆν. For concreteness, e illustrate the technique using the Haar avelet basis; Daubechies avelets are used in our actual examples. The model is then J 1 ˆ t = φx t + 2 j 1 j=0 k=0 D jk ψ jk x t here x t = t/n, φx = 1 for 0 x 1, { 1 if 0 x 1 ψx = 2, 1 if 1 2 < x 1 and ψ jk x = 2 j/2 ψ2 j x k. Our variational estimate for the posterior mean becomes J 1 m t = ˆφx t + 2 j 1 j=0 k=0 ˆD jk ψ jk x t. here ˆ = n 1 n ˆ t, and ˆD jk are obtained by thresholding the coefficients Z jk = 1 n n ˆ t ψ jk x t. To estimate ˆ t e use gradient ascent, as for the Kalman filter approximation, requiring the derivatives m t / ˆ t. If soft thresholding is used, then e have that m t ˆ s = ˆ ˆ s φx t + J 1 2 j 1 j=0 k=0 ˆD jk ˆ s ψ jk x t. ith ˆ/ ˆ s = n 1 and { 1 ˆD jk / ˆ s = n ψ jkx s if Z jk > λ 0 otherise. Note also that Z jk > λ if and only if ˆD jk > 0. These derivatives can be computed using offtheshelf softare for the avelet transform in any of the standard avelet bases. Sample results of running this and the Kalman variational algorithm to approximate a unigram model are given in Figure 3. Both variational approximations smooth out the
5 Darin Einstein moon e+00 2e 04 4e 04 6e 04 0e+00 2e 04 4e 04 6e 04 8e 04 1e e+00 2e 04 4e 04 6e 04 0e+00 2e 04 4e 04 6e 04 8e 04 1e 03 Figure 3. omparison of the Kalman filter top and avelet regression bottom variational approximations to a unigram model. The variational approximations red and blue curves smooth out the local fluctuations in the unigram counts gray curves of the ords shon, hile preserving the sharp peaks that may indicate a significant change of content in the journal. The avelet regression is able to superresolve the double spikes in the occurrence of Einstein in the 1920s. The spike in the occurrence of Darin near 1910 may be associated ith the centennial of Darin s birth in local fluctuations in the unigram counts, hile preserving the sharp peaks that may indicate a significant change of content in the journal. While the fit is similar to that obtained using standard avelet regression to the normalied counts, the estimates are obtained by minimiing the KL divergence as in standard variational approximations. In the dynamic topic model of Section 2, the algorithms are essentially the same as those described above. Hoever, rather than fitting the observations from true observed counts, e fit them from expected counts under the documentlevel variational distributions in Analysis of Science We analyed a subset of 30,000 articles from Science, 250 from each of the 120 years beteen 1881 and Our data ere collected by JSTOR.jstor.org, a notforprofit organiation that maintains an online scholarly archive obtained by running an optical character recognition OR engine over the original printed journals. JS TOR indexes the resulting text and provides online access to the scanned images of the original content through keyord search. Our corpus is made up of approximately 7.5 million ords. We pruned the vocabulary by stemming each term to its root, removing function terms, and removing terms that occurred feer than 25 times. The total vocabulary sie is 15,955. To explore the corpus and its themes, e estimated a 20component dynamic topic model. Posterior inference took approximately 4 hours on a 1.5GHZ PoerP Macintosh laptop. To of the resulting topics are illustrated in Figure 4, shoing the top several ords from those topics in each decade, according to the posterior mean number of occurrences as estimated using the Kalman filter variational approximation. Also shon are example articles hich exhibit those topics through the decades. As illustrated, the model captures different scientific themes, and can be used to inspect trends of ord usage ithin them. To validate the dynamic topic model quantitatively, e consider the task of predicting the next year of Science given all the articles from the previous years. We compare the predictive poer of three 20topic models: the dynamic topic model estimated from all of the previous years, a static topic model estimated from all of the previous years, and a static topic model estimated from the single previous year. All the models are estimated to the same convergence criterion. The topic model estimated from all the previous data and dynamic topic model are initialied at the same point. The dynamic topic model performs ell; it alays assigns higher likelihood to the next year s articles than the other to models Figure 5. It is interesting that the predictive poer of each of the models declines over the years. We can tentatively attribute this to an increase in the rate of specialiation in scientific language.
6 9 : : 9 < D F E;;? = I D F E=? > D I D D? = =? J K GD 9 : L M < D I D F E;;? = F E=? > D =? J K GD 9 L M M I D? G? > D = E> D H? < = B J B J D? < D N < D F E;;? = 9 L 9 M I D D H? < =B? G? > D = E> J B J D? O < F B 9 L P M D H? < =B? G? > D = Q I GK? 9 L R M D H? < =B S I Q?? G? > D = E> Q I GK? 9 L T M N < D D H? < = B 9 L U M D S < I O J < = N D 9 L V M DS < < O J? = Q? 9 L X M < F? G I D 9 L : M < F? G D S < 9 L L = A B J D = K > D K = < F? G J D I D? D S < I D P M M M J D I D? I D J B J D? D S < Y K D K N H B J E> e f ggh i!! " # $ %% & ' ' Z [ \ ] ^ _ ` a b c d _ ` d Z h jh k gil m * ' +, n o f m go e / " 4 / 5 4 & ' " 5 6 ", ª «{ { { { } ~ ƒ ƒ } ~ ƒ ~ } ˆ }~ ƒ } ˆ ˆ ~ Š ƒ } ˆ Ž Š Ž Š Ž ~ } } Š } Ž Ž ˆ ~ }~ }ƒ ~ Ž Š }ƒ Š Œ ~ ƒ Œ ˆ ~ Š ƒ ƒ Ž ~ } }Š } Ž Ž } Œ ~ }ƒ Ž ~ } Ž } Ž Ž ~ Ž ~ Ž Ž Ž } } Š Ž }ƒ }ƒ }ƒ Ž Š Ž Š ƒ Ž Œ } }ƒ } Ž Š } ˆ } ~ Œ } } Š Š ~ } ˆ ~ ~ Ž } Ž } ~ Ž ~ } }~ ƒ ~ ƒ ~ ƒ } ~ } } ~ ƒ Œ } ~ } ƒ ~ }~ ~ ~ ~ }ƒ Ž } Ž ƒ Ž }~ } } Š Ž ƒ Ž Ž ƒ }~ } }ƒ ~ ~ }ƒ Ž Š Ž š } }Š Ž Ž } } Ž ˆ ~ Ž ~ Ž }~ }ƒ } } } Ž } } Ž } } Ž ~ Ž Ž ~ p q r s t u v x r y r p m h i h k f m h o il m œ ž œ Ÿ ž ž œ Ÿ ž œ Ÿ ž œ Ÿ ž œ Ÿ ž ž ž ž 8 " '!!, %% & 5 5. & * ' 0 / $ * & " 2 4 ± 4 ± " 8 5 ' $ ' ' 0 0 Figure 4. Examples from the posterior analysis of a 20topic dynamic model estimated from the Science corpus. For to topics, e illustrate: a the top ten ords from the inferred posterior distribution at ten year lags b the posterior estimate of the frequency as a function of year of several ords from the same to topics c example articles throughout the collection hich exhibit these topics. Note that the plots are scaled to give an idea of the shape of the trajectory of the ords posterior probability i.e., comparisons across ords are not meaningful. 5. Discussion We have developed sequential topic models for discrete data by using Gaussian time series on the natural parameters of the multinomial topics and logistic normal topic proportion models. We derived variational inference algorithms that exploit existing techniques for sequential data; e demonstrated a novel use of Kalman filters and avelet regression as variational approximations. Dynamic topic models can give a more accurate predictive model, and also offer ne ays of brosing large, unstructured document collections. There are many ays that the ork described here can be extended. One direction is to use more sophisticated state space models. We have demonstrated the use of a simple Gaussian model, but it ould be natural to include a drift term in a more sophisticated autoregressive model to explicitly capture the rise and fall in popularity of a topic, or in the use of specific terms. Another variant ould allo for heteroscedastic time series. Perhaps the most promising extension to the methods presented here is to incorporate a model of ho ne topics in the collection appear or disappear over time, rather than assuming a fixed number of topics. One possibility is to use a simple GaltonWatson or birthdeath process for the topic population. While the analysis of birthdeath or branching processes often centers on extinction probabilities, here a goal ould be to find documents that may be responsible for spaning ne themes in a collection.
7 Negative log likelihood log scale 1e+06 2e+06 4e+06 7e+06 LDA prev LDA all DTM data. PhD thesis, arnegie Mellon University, Department of Statistics. FeiFei, L. and Perona, P A Bayesian hierarchical model for learning natural scene categories. IEEE omputer Vision and Pattern Recognition. Griffiths, T. and Steyvers, M Finding scientific topics. Proceedings of the National Academy of Science, 101: Kalman, R A ne approach to linear filtering and prediction problems. Transaction of the AMSE: Journal of Basic Engineering, 82: Year Figure 5. This figure illustrates the performance of using dynamic topic models and static topic models for prediction. For each year beteen 1900 and 2000 at 5 year increments, e estimated three models on the articles through that year. We then computed the variational bound on the negative log likelihood of next year s articles under the resulting model loer numbers are better. DTM is the dynamic topic model; LDAprev is a static topic model estimated on just the previous year s articles; LDAall is a static topic model estimated on all the previous articles. Acknoledgments This research as supported in part by NSF grants IIS and IIS , the DARPA ALO project, and a grant from Google. References Aitchison, J The statistical analysis of compositional data. Journal of the Royal Statistical Society, Series B, 442: Blei, D., Ng, A., and Jordan, M Latent Dirichlet allocation. Journal of Machine Learning Research, 3: Blei, D. M. and Lafferty, J. D orrelated topic models. In Weiss, Y., Schölkopf, B., and Platt, J., editors, Advances in Neural Information Processing Systems 18. MIT Press, ambridge, MA. Buntine, W. and Jakulin, A Applying discrete PA in data analysis. In Proceedings of the 20th onference on Uncertainty in Artificial Intelligence, pages AUAI Press. Erosheva, E Grade of membership and latent structure models ith application to disability survey Mcallum, A., orradaemmanuel, A., and Wang, X The authorrecipienttopic model for topic and role discovery in social netorks: Experiments ith Enron and academic . Technical report, University of Massachusetts, Amherst. Pritchard, J., Stephens, M., and Donnelly, P Inference of population structure using multilocus genotype data. Genetics, 155: RosenZvi, M., Griffiths, T., Steyvers, M., and Smith, P The authortopic model for authors and documents. In Proceedings of the 20th onference on Uncertainty in Artificial Intelligence, pages AUAI Press. Sivic, J., Rusell, B., Efros, A., Zisserman, A., and Freeman, W Discovering objects and their location in images. In International onference on omputer Vision IV Snelson, E. and Ghahramani, Z Sparse Gaussian processes using pseudoinputs. In Weiss, Y., Schölkopf, B., and Platt, J., editors, Advances in Neural Information Processing Systems 18, ambridge, MA. MIT Press. Wasserman, L Springer. All of Nonparametric Statistics. West, M. and Harrison, J Bayesian Forecasting and Dynamic Models. Springer. A. Derivation of Variational Algorithm In this appendix e give some details of the variational algorithm outlined in Section 3.1, hich calculates a distribution q 1:T ˆ 1:T to maximie the loer bound on
8 log pd 1:T. The first term of the righthand side of 5 is E q log p t t 1 = V T 2 1 2σ 2 = V T 2 log σ 2 + log 2π E q t t 1 T t t 1 log σ 2 + log 2π 1 2σ 2 1 T σ 2 Tr m t m t 1 2 Ṽt + 1 2σ 2 TrṼ0 TrṼT using the Gaussian quadratic form identity E m,v x µ T Σ 1 x µ = The second term of 5 is m µ T Σ 1 m µ + TrΣ 1 V. E q log pd t t = n t E q t log exp t n t m t n ˆζ 1 t t + n t n t log ˆζ t exp m t + Ṽt/2 Next, e maximie ith respect to ˆ s : lˆ, ˆν ˆ = s 1 T mt σ 2 m t m t 1, ˆ m t 1, s ˆ s + n t n tˆζ 1 t exp m t + Ṽt/2 mt ˆ. s The forardbackard equations for m t can be used to derive a recurrence for m t / ˆ s. The forard recurrence is m t ˆ s = ˆν 2 t mt 1 v t 1 + σ 2 + ˆν t 2 ˆ + s ˆν 2 t 1 v t 1 + σ 2 + ˆν t 2 δ s,t, ith the initial condition m 0 / ˆ s = 0. The backard recurrence is then m t 1 σ 2 mt 1 ˆ = s V t 1 + σ 2 ˆ + s 1 σ 2 V t 1 + σ 2 mt ˆ s, ith the initial condition m T / ˆ s = m T / ˆ s. here n t = n t, introducing additional variational parameters ˆζ 1:T. The third term of 5 is the entropy Hq = = log Ṽt + T log 2π 2 log Ṽt + TV 2 log 2π. To maximie the loer bound as a function of the variational parameters e use a conjugate gradient algorithm. First, e maximie ith respect to ˆζ; the derivative is l ˆζ t = n t ˆζ 2 t Setting to ero and solving for ˆζ t gives exp m t + Ṽt/2 n t ˆζ t. ˆζ t = exp m t + Ṽt/2.
OnLine LDA: Adaptive Topic Models for Mining Text Streams with Applications to Topic Detection and Tracking
OnLine LDA: Adaptive Topic Models for Mining Text s with Applications to Topic Detection and Tracking Loulwah AlSumait, Daniel Barbará, Carlotta Domeniconi Department of Computer Science George Mason
More informationGenerative or Discriminative? Getting the Best of Both Worlds
BAYESIAN STATISTICS 8, pp. 3 24. J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West (Eds.) c Oxford University Press, 2007 Generative or Discriminative?
More informationTopics over Time: A NonMarkov ContinuousTime Model of Topical Trends
Topics over Time: A NonMarkov ContinuousTime Model of Topical Trends Xuerui Wang, Andrew McCallum Department of Computer Science University of Massachusetts Amherst, MA 01003 xuerui@cs.umass.edu, mccallum@cs.umass.edu
More informationExponential Family Harmoniums with an Application to Information Retrieval
Exponential Family Harmoniums with an Application to Information Retrieval Max Welling & Michal RosenZvi Information and Computer Science University of California Irvine CA 926973425 USA welling@ics.uci.edu
More informationDiscovering objects and their location in images
Discovering objects and their location in images Josef Sivic Bryan C. Russell Alexei A. Efros Andrew Zisserman William T. Freeman Dept. of Engineering Science CS and AI Laboratory School of Computer Science
More informationAn Alternative Prior Process for Nonparametric Bayesian Clustering
An Alternative Prior Process for Nonparametric Bayesian Clustering Hanna M. Wallach Department of Computer Science University of Massachusetts Amherst Shane T. Jensen Department of Statistics The Wharton
More informationGaussian Process Kernels for Pattern Discovery and Extrapolation
Andrew Gordon Wilson Department of Engineering, University of Cambridge, Cambridge, UK Ryan Prescott Adams School of Engineering and Applied Sciences, Harvard University, Cambridge, USA agw38@cam.ac.uk
More informationTHE adoption of classical statistical modeling techniques
236 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 28, NO. 2, FEBRUARY 2006 Data Driven Image Models through Continuous Joint Alignment Erik G. LearnedMiller Abstract This paper
More informationChoosing Multiple Parameters for Support Vector Machines
Machine Learning, 46, 131 159, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Choosing Multiple Parameters for Support Vector Machines OLIVIER CHAPELLE LIP6, Paris, France olivier.chapelle@lip6.fr
More informationOn Smoothing and Inference for Topic Models
On Smoothing and Inference for Topic Models Arthur Asuncion, Max Welling, Padhraic Smyth Department of Computer Science University of California, Irvine Irvine, CA, USA {asuncion,welling,smyth}@ics.uci.edu
More informationDirichlet Process. Yee Whye Teh, University College London
Dirichlet Process Yee Whye Teh, University College London Related keywords: Bayesian nonparametrics, stochastic processes, clustering, infinite mixture model, BlackwellMacQueen urn scheme, Chinese restaurant
More informationThree New Graphical Models for Statistical Language Modelling
Andriy Mnih Geoffrey Hinton Department of Computer Science, University of Toronto, Canada amnih@cs.toronto.edu hinton@cs.toronto.edu Abstract The supremacy of ngram models in statistical language modelling
More informationFlexible and efficient Gaussian process models for machine learning
Flexible and efficient Gaussian process models for machine learning Edward Lloyd Snelson M.A., M.Sci., Physics, University of Cambridge, UK (2001) Gatsby Computational Neuroscience Unit University College
More informationSumProduct Networks: A New Deep Architecture
SumProduct Networks: A New Deep Architecture Hoifung Poon and Pedro Domingos Computer Science & Engineering University of Washington Seattle, WA 98195, USA {hoifung,pedrod}@cs.washington.edu Abstract
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2013. ACCEPTED FOR PUBLICATION 1
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2013. ACCEPTED FOR PUBLICATION 1 ActiveSet Newton Algorithm for Overcomplete NonNegative Representations of Audio Tuomas Virtanen, Member,
More informationTop 10 algorithms in data mining
Knowl Inf Syst (2008) 14:1 37 DOI 10.1007/s1011500701142 SURVEY PAPER Top 10 algorithms in data mining Xindong Wu Vipin Kumar J. Ross Quinlan Joydeep Ghosh Qiang Yang Hiroshi Motoda Geoffrey J. McLachlan
More informationAn Introduction to Variable and Feature Selection
Journal of Machine Learning Research 3 (23) 11571182 Submitted 11/2; Published 3/3 An Introduction to Variable and Feature Selection Isabelle Guyon Clopinet 955 Creston Road Berkeley, CA 9478151, USA
More informationFast Set Intersection in Memory
Fast Set Intersection in Memory Bolin Ding University of Illinois at UrbanaChampaign 2 N. Goodin Avenue Urbana, IL 68, USA bding3@uiuc.edu Arnd Christian König Microsoft Research One Microsoft Way Redmond,
More informationProbability Product Kernels
Journal of Machine Learning Research 5 (004) 89 844 Submitted /04; Published 7/04 Probability Product Kernels Tony Jebara Risi Kondor Andrew Howard Computer Science Department Columbia University New York,
More informationConditional mean field
Conditional mean field Peter Carbonetto Department of Computer Science University of British Columbia Vancouver, BC, Canada V6T 1Z4 pcarbo@cs.ubc.ca Nando de Freitas Department of Computer Science University
More informationRepresentation Learning: A Review and New Perspectives
1 Representation Learning: A Review and New Perspectives Yoshua Bengio, Aaron Courville, and Pascal Vincent Department of computer science and operations research, U. Montreal also, Canadian Institute
More informationUsing Both Latent and Supervised Shared Topics for Multitask Learning
Using Both Latent and Supervised Shared Topics for Multitask Learning Ayan Acharya 1, Aditya Rawal 2, Raymond J. Mooney 2, and Eduardo R. Hruschka 3 1 Department of ECE, University of Texas at Austin,
More informationModeling Pixel Means and Covariances Using Factorized ThirdOrder Boltzmann Machines
Modeling Pixel Means and Covariances Using Factorized ThirdOrder Boltzmann Machines Marc Aurelio Ranzato Geoffrey E. Hinton Department of Computer Science  University of Toronto 10 King s College Road,
More informationTHE PROBLEM OF finding localized energy solutions
600 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1997 Sparse Signal Reconstruction from Limited Data Using FOCUSS: A Reweighted Minimum Norm Algorithm Irina F. Gorodnitsky, Member, IEEE,
More informationBayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
More informationExamplebased Synthesis of 3D Object Arrangements. Manolis Savva Stanford University
Examplebased Synthesis of 3D Object Arrangements Matthew Fisher Stanford University Daniel Ritchie Stanford University Manolis Savva Stanford University Database Input Scenes Thomas Funkhouser Princeton
More informationMultiway Clustering on Relation Graphs
Multiway Clustering on Relation Graphs Arindam Banerjee Sugato Basu Srujana Merugu Abstract A number of realworld domains such as social networks and ecommerce involve heterogeneous data that describes
More informationFrom Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose. Abstract
To Appear in the IEEE Trans. on Pattern Analysis and Machine Intelligence From Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose Athinodoros S. Georghiades Peter
More informationSegmenting Motion Capture Data into Distinct Behaviors
Segmenting Motion Capture Data into Distinct Behaviors Jernej Barbič Alla Safonova JiaYu Pan Christos Faloutsos Jessica K. Hodgins Nancy S. Pollard Computer Science Department Carnegie Mellon University
More informationTHE development of methods for automatic detection
Learning to Detect Objects in Images via a Sparse, PartBased Representation Shivani Agarwal, Aatif Awan and Dan Roth, Member, IEEE Computer Society 1 Abstract We study the problem of detecting objects
More information