Modeling Tweet Arrival Times using Log-Gaussian Cox Processes
|
|
|
- Silvia Fisher
- 10 years ago
- Views:
Transcription
1 Modeling Tweet Arrival Times using Log-Gaussian Cox Processes Michal Lukasik, 1 P.K. Srijith, 1 Trevor Cohn, 2 and Kalina Bontcheva 1 1 Department of Computer Science, The University of Sheffield 2 Department of Computing and Information Systems, The University of Melbourne {m.lukasik, pk.srijith, k.bontcheva}@shef.ac.uk [email protected] Abstract Research on modeling time series text corpora has typically focused on predicting what text will come next, but less well studied is predicting when the next text event will occur. In this paper we address the latter case, framed as modeling continuous inter-arrival times under a log- Gaussian Cox process, a form of inhomogeneous Poisson process which captures the varying rate at which the tweets arrive over time. In an application to rumour modeling of tweets surrounding the 214 Ferguson riots, we show how interarrival times between tweets can be accurately predicted, and that incorporating textual features further improves predictions. 1 Introduction Twitter is a popular micro-blogging service which provides real-time information on events happening across the world. Evolution of events over time can be monitored there with applications to disaster management, journalism etc. For example, Twitter has been used to detect the occurrence of earthquakes in Japan through user posts (Sakaki et al., 21). Modeling the temporal dynamics of tweets provides useful information about the evolution of events. Inter-arrival time prediction is a type of such modeling and has application in many settings featuring continuous time streaming text corpora, including journalism for event monitoring, real-time disaster monitoring and advertising on social media. For example, journalists track several rumours related to an event. Predicted arrival times of tweets can be applied for ranking rumours according to their activity and narrow the interest to investigate a rumour with a short interarrival time over that of a longer one. Modeling the inter-arrival time of tweets is a challenging task due to complex temporal patterns exhibited. Tweets associated with an event stream arrive at different rates at different points in time. For example, Figure 1a shows the arrival times (denoted by black crosses) of tweets associated with an example rumour around Ferguson riots in 214. Notice the existence of regions of both high and low density of arrival times over a one hour interval. We propose to address inter-arrival time prediction problem with log-gaussian Cox process (LGCP), an inhomogeneous Poisson process (IPP) which models tweets to be generated by an underlying intensity function which varies across time. Moreover, it assumes a non-parametric form for the intensity function allowing the model complexity to depend on the data set. We also provide an approach to consider textual content of tweets to model inter-arrival times. We evaluate the models using Twitter rumours from the 214 Ferguson unrest, and demonstrate that they provide good predictions for inter-arrival times, beating the baselines e.g. homogeneous Poisson Process, Gaussian Process regression and univariate Hawkes Process. Even though the central application is rumours, one could apply the proposed approaches to model the arrival times of tweets corresponding to other types of memes, e.g. discussions about politics. This paper makes the following contributions: 1. Introduces log-gaussian Cox process to predict tweet arrival times. 2. Demonstrates how incorporating text improves results of inter-arrival time prediction. 2 Related Work Previous approaches to modeling inter-arrival times of tweets (Perera et al., 21; Sakaki et al., 21; Esteban et al., 212; Doerr et al., 213) were not complex enough to consider their time varying characteristics. Perera et al. (21) modeled 25 Proceedings of the 215 Conference on Empirical Methods in Natural Language Processing, pages , Lisbon, Portugal, September 215. c 215 Association for Computational Linguistics.
2 Intensity function values LGCPTXT LGCP Intensity function values LGCPTXT LGCP 1h Time 2h 1h Time 2h (a) rumour #39 (b) rumour #6 Figure 1: Intensity functions and corresponding predicted arrival times for different methods across example Ferguson rumours. Arrival times predicted by LGCP are denoted by red pluses, LGCPTXT by blue dots, and ground truth by black crosses. Light regions denote uncertainty of predictions. inter-arrival times as independent and exponentially distributed with a constant rate parameter. A similar model is used by Sakaki et al. (21) to monitor the tweets related to earthquakes. The renewal process model used by Esteban et al. (212) assumes the inter-arrival times to be independent and identically distributed. Gonzalez et al. (214) attempts to model arrival times of tweets using a Gaussian process but assumes the tweet arrivals to be independent every hour. These approaches do not take into account the varying characteristics of arrival times of tweets. Point processes such as Poisson and Hawkess process have been used for spatio-temporal modeling of meme spread in social networks (Yang and Zha, 213; Simma and Jordan, 21). Hawkes processes (Yang and Zha, 213) were also found to be useful for modeling the underlying network structure. These models capture relevant network information in the underlying intensity function. We use a log-gaussian cox process which provides a Bayesian method to capture relevant information through the prior. It has been found to be useful e.g. for conflict mapping (Zammit-Mangion et al., 212) and for frequency prediction in Twitter (Lukasik et al., 215). 3 Data & Problem In this section we describe the data and we formalize the problem of modeling tweet arrival times. Data We consider the Ferguson rumour data set (Zubiaga et al., 215), consisting of tweets on rumours around 214 Ferguson unrest. It consists of conversational threads that have been manually labeled by annotators to correspond to rumours 1. Since some rumours have few posts, we consider only those with at least 15 posts in the first hour as they express interesting behaviour (Lukasik et al., 215). This results in 114 rumours consisting of a total of 498 tweets. Problem Definition Let us consider a time interval [, 2] measured in hours, a set of rumours R = {E i } n i=1, where rumour E i consists of a set of m i posts E i = {p i j }m i j=1. Posts are tuples p i j = (xi j, ti j ), where xi j is text (in our case a vector of Brown clusters counts, see section 5) and t i j is time of occurrence of post p i j, measured in time since the first post on rumour E i. We introduce the problem of predicting the exact time of posts in the future unobserved time interval, which is studied as inter-arrival time prediction. In our setting, we observe posts over a target rumour i for one hour and over reference rumours (other than i) for two hours. Thus, the training data set is R O = {Ei O}n i=1, where Ei O = {p i j }mo i 1 (m O i represents number of posts observed for i th rumour). We query the model for a complete set of times {t i j }m i of posts about m O i +1 rumour i in the future one hour time interval. 1 For a fully automated approach, a system for early detection of rumours (Zhao et al., 215) could be run first and our models then applied to the resulting rumours. 251
3 4 Model The problem of modeling the inter-arrival times of tweets can be solved using Poisson processes (Perera et al., 21; Sakaki et al., 21). A homogeneous Poisson process (HPP) assumes the intensity to be constant (with respect to time and the rumour statistics). It is not adequate to model the inter-arrival times of tweets because it assumes constant rate of point arrival across time. Inhomogeneous Poisson process (IPP) (Lee et al., 1991) can model tweets occurring at a variable rate by considering the intensity to be a function of time, i.e. λ(t). For example, in Figure 1a we show intensity functions learnt for two different IPP models. Notice how the generated arrival times vary according to the intensity function values. Log-Gaussian Cox process We consider a log-gaussian Cox process (LGCP) (Møller and Syversveen, 1998), a special case of IPP, where the intensity function is assumed to be stochastic. The intensity function λ(t) is modeled using a latent function f(t) sampled from a Gaussian process (Rasmussen and Williams, 25). To ensure positivity of the intensity function, we consider λ(t) = exp (f(t)). This provides a nonparametric Bayesian approach to model the intensity function, where the complexity of the model is learnt from the training data. Moreover, we can define the functional form of the intensity function through appropriate GP priors. Modeling inter-arrival time Inhomogeneous Poisson process (unlike HPP) uses a time varying intensity function and hence, the distribution of inter-arrival times is not independent and identically distributed (Ross, 21). In IPP, the number of tweets y occurring in an interval [s, e] is Poisson distributed with rate e s λ(t)dt. p(y λ(t), [s, e]) = Poisson(y e s λ(t)dt) = ( e s λ(t)dt)y exp( e s λ(t)dt) y! (1) Assume that n th tweet occurred at time E n = s and we are interested in the inter-arrival time T n of the next tweet. The arrival time of next tweet E n+1 can be obtained as E n+1 = E n + T n. The cumulative distribution for T n, which provides the probability that a tweet occurs by time s + u can be obtained as 2 p(t n u) = 1 p(t n > u λ(t), E n = s) = 1 p( events in [s, s + u] λ(t)) = 1 exp( = 1 exp( s+u s u λ(t)dt) λ(s + t)dt) (2) The derivation is obtained by considering a Poisson probability for counts with rate parameter given by s+u s λ(t)dt and applying integration by substitution to obtain (2). The probability density function of the random variable T n is obtained by taking the derivative of (2) with respect to u: p(t n = u) = λ(s + u) exp( u λ(s + t)dt). (3) The computational difficulties arising from integration are dealt by assuming the intensity function to be constant in an interval and approximating the inter-arrival time density as (Møller and Syversveen, 1998; Vanhatalo et al., 213) p(t n = u) = λ(s + u) exp( uλ(s + u )). (4) 2 We associate a distinct intensity function λ i (t) = exp(f i (t)) with each rumour E i as they have varying temporal profiles. The latent function f i is modelled to come from a zero mean Gaussian process (GP) (Rasmussen and Williams, 25) prior with covariance defined by a squared exponential (SE) kernel over time, k time (t, t ) = a exp( (t t ) 2 /l). We consider the likelihood of posts Ei O over the entire training period to be product of Poisson distribution (1) over equal length sub-intervals with the rate in a sub-interval [s, e] approximated as (e s) exp(f i ( 1 2 (s + e))). The likelihood of posts in the rumour data is obtained by taking the product of the likelihoods over individual rumours. The distribution of the posterior p(f i Ei O) is intractable and a Laplace approximation (Rasmussen and Williams, 25) is used to obtain the posterior. The predictive distribution f i (t i ) at time t i is obtained using the approximated posterior. The intensity function value at the point t i is then obtained as λ i (t i Ei O ) = exp ( f i (t i ) ) p ( f i (t i ) Ei O ) dfi (t i ). 2 We suppress the conditioning variables for brevity. 252
4 Algorithm 1 Importance sampling for predicting the next arrival time 1: Input: Intensity function λ(t), previous arrival time s, proposal distribution q(t) = exp(t; 2), number of samples N 2: for i = 1 to N do 3: Sample u i q(t). 4: Obtain weights w i = p(u i) q(u i ), where p(t) is given by (4). 5: end for 6: Predict expected inter-arrival time as ū = N i=1 u w i i N j=1 w j 7: Predict the next arrival time as t = s + ū. 8: Return: t Importance sampling We are interested in predicting the next arrival time of a tweet given the time at which the previous tweet was posted. This is achieved by sampling the inter-arrival time of occurrence of the next tweet using equation (4). We use the importance sampling scheme (Gelman et al., 23) where an exponential distribution is used as the proposal density. We set the rate parameter of this exponential distribution to 2 which generates points with a mean value around.5. Assuming the previous tweet occurred at time s, we obtain the arrival time of next tweet as outlined in Algorithm 1. We run this algorithm sequentially, i.e. the time t returned from Algorithm 1 becomes starting time s in the next iteration. We stop at the end of the interval of interest, for which a user wants to find times of post occurrences. Incorporating text We consider adding the kernel over text from posts to the previously introduced kernel over time. We join text from the observed posts together, so a different component is added to kernel values across different rumours. The full kernel then takes ( form k TXT ((t, i), (t, i )) = k time (t, t ) + k text p i x i j EO i j, ) x p i i j EO i j. We compare text via linear kernel with additive underlying base similarity, expressed by k text (x, x ) = b + cx T x. Optimization All model parameters (a, l, b, c) are obtained by maximizing the marginal likelihood p(e O i ) = p(e O i f i)p(f i )df i over all rumour data sets. 5 Experiments Data preprocessing In our experiments, we consider the first two hours of each rumour lifespan. The posts from the first hour of a target rumour is considered as observed (training data) and we predict the arrival times of tweets in the second hour. We consider observations over equal sized time intervals of length six minutes in the rumour lifespan for learning the intensity function. The text in the tweets is represented by using Brown cluster ids associated with the words. This is obtained using 1 clusters acquired on a large scale Twitter corpus (Owoputi et al., 213). Evaluation metrics Let the arrival times predicted by a model be (ˆt 1,..., ˆt M ) and let the actual arrival times be (t 1,..., t N ). We introduce two metrics based on root mean squared error (RMSE) for evaluating predicted inter-arrival times. First is aligned root mean squared error (ARMSE), where we align the initial K = min(m, N) arrival times and calculate the RMSE between such two subsequences. The second is called penalized root mean squared error (PRMSE). In this metric we penalize approaches which predict a different number of inter-arrival times than the actual number. The PRMSE metric is defined as the square root of the following expression. 1 K K (ˆt i t i ) 2 + I[M > N] i=1 +I[M < N] N i=m+1 M i=n+1 (T ˆt i ) 2 (T t i ) 2 (5) The second and third term in (5) respectively penalize for the excessive or insufficient number of points predicted by the model. Baselines We consider a homogeneous Poisson process (HPP) (Perera et al., 21) as a baseline which results in exponentially distributed interarrival times with rate λ. The rate parameter is set to the maximum likelihood estimate, the reciprocal of the mean of the inter-arrival times in the training data. The second baseline is a GP with a linear kernel (GPLIN), where the inter-arrival time is modeled as a function of time of occurrence of last tweet. This model tends to predict small inter-arrival times yielding a huge number of points. We limit the number of predicted points 253
5 method ARMSE PRMSE GPLIN 2.6± ±93.9 HPP 21.85± ±96.5 HP 15.94± ±59.1 LGCP 13.31± ±92.97 LGCP Pooled 19.18± ±12.2 LGCPTXT 15.52± ±115.7 Table 1: ARMSE and PRMSE between the true event times and the predicted event times expressed in minutes (lower is better) over the 114 Ferguson rumours, showing mean ± std. dev. Key denotes significantly worse than LGCPTXT method according to one-sided Wilcoxon signed rank test (p <.5). In case of ARMSE, LGCP is not significantly better than LGCP TXT according to Wilcoxon test. to 1 (above the maximum count yielded by any rumour from our dataset), thus reducing the error from this method. We also compare against Hawkes Process (HP) (Yang and Zha, 213), a self exciting point process where an occurrence of a tweet increases the probability of tweets arriving soon afterwards. We consider a univariate Hawkes process where the intensity function is modeled as λ i (t) = µ + t i j <t k time(t i j, t). The kernel parameters and µ are learnt by maximizing the likelihood. We apply the importance sampling algorithm discussed in Algorithm 1 for generating arrival times for Hawkess process. We consider this baseline only in the single-task setting, where reference rumours are not considered. LGCP settings In the case of LGCP, the model parameters of the intensity function associated with a rumour are learnt from the observed interarrival times from that rumour alone. LGCP Pooled and LGCPTXT consider a different setting where this is learnt additionally using the interarrival times of all other rumours observed over the entire two hour life-span. Results Table 1 reports the results of predicting arrival times of tweets in the second hour of the rumour lifecycle. In terms of ARMSE, LGCP is the best method, performing better than LGCP- TXT (though not statistically significantly) and outperforming other approaches. However, this metric does not penalize for the wrong number of predicted arrival times. Figure 1b depicts an example rumour, where LGCP greatly overestimates the number of points in the interval of interest. Here, the three points from the ground truth (denoted by black crosses) and the initial three points predicted by the LGCP model (denoted by red pluses), happen to lie very close, yielding a low ARMSE error. However, LGCP predicts a large number of arrivals in this interval making it a bad model compared to LGCPTXT which predicts only four points (denoted by blue dots). ARMSE fails to capture this and hence we use PRMSE. Note that Hawkes Process is performing worse than the LGCP approach. According to PRMSE, LGCPTXT is the most successful method, significantly outperforming all other according to Wilcoxon signed rank test. Figure 1a depicts the behavior of LGCP and LGCP- TXT on rumour 39 with a larger number of points from the ground truth. Here, LGCPTXT predicts relatively less number of arrivals than LGCP. The performance of Hawkes Process is again worse than the LGCP approach. The self excitory nature of Hawkes process may not be appropriate for this dataset and setting, where in the second hour the number of points tends to decrease as time passes. We also note, that GPLIN performs very poorly according to PRMSE. This is because the interarrival times predicted by GPLIN for several rumours become smaller as time grows resulting in a large number of arrival times. 6 Conclusions This paper introduced the log-gaussian Cox processes for the problem of predicting the interarrival times of tweets. We showed how text from posts helps to achieve significant improvements. Evaluation on a set of rumours from Ferguson riots showed efficacy of our methods comparing to baselines. The proposed approaches are generalizable to problems other than rumours, e.g. disaster management and advertisement campaigns. Acknowledgments Work partially supported by the European Union under grant agreement No PHEME. References Christian Doerr, Norbert Blenn, and Piet Van Mieghem Lognormal infection times of online information spread. PLOS ONE,
6 J. Esteban, A. Ortega, S. McPherson, and M. Sathiamoorthy Analysis of Twitter Traffic based on Renewal Densities. ArXiv e-prints. Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 23. Bayesian Data Analysis. Chapman and Hall/CRC. Roberto Gonzalez, Alfonso Muñoz, José Alberto Hernández, and Ruben Cuevas On the tweet arrival process at twitter: analysis and applications. Trans. Emerging Telecommunications Technologies, 25(2): S. H. Lee, M. M. Crawford, and J. R. Wilson Modeling and simulation of a nonhomogeneous poisson process having cyclic behavior. Communications in Statistics Simulation, 2(2): Michal Lukasik, Trevor Cohn, and Kalina Bontcheva Point process modelling of rumour dynamics in social media. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 215, pages Shuang-Hong Yang and Hongyuan Zha Mixture of mutually exciting processes for viral diffusion. In ICML (2), volume 28 of JMLR Proceedings, pages 1 9. Andrew Zammit-Mangion, Michael Dewar, Visakan Kadirkamanathan, and Guido Sanguinetti Point process modelling of the afghan war diary. In Proceedings of the National Academy of Sciences, Vol. 19, No. 31, pages Zhe Zhao, Paul Resnick, and Qiaozhu Mei Early detection of rumors in social media from enquiry posts. In International World Wide Web Conference Committee (IW3C2). Arkaitz Zubiaga, Maria Liakata, Rob Procter, Kalina Bontcheva, and Peter Tolmie Towards detecting rumours in social media. In AAAI Workshop on AI for Cities. Jesper Møller and Anne Randi Syversveen Log Gaussian Cox processes. Scandinavian Journal of Statistics, pages Olutobi Owoputi, Chris Dyer, Kevin Gimpel, Nathan Schneider, and Noah A. Smith Improved part-of-speech tagging for online conversational text with word clusters. In In Proceedings of NAACL. Rohan DW Perera, Sruthy Anand, KP Subbalakshmi, and R Chandramouli. 21. Twitter analytics: Architecture, tools and analysis. In Military Communications Conference, 21-MILCOM 21, pages Carl Edward Rasmussen and Christopher K. I. Williams. 25. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press. Sheldon M. Ross. 21. Introduction to Probability Models, Tenth Edition. Academic Press, Inc., Orlando, FL, USA. Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. 21. Earthquake shakes twitter users: Real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, WWW 1, pages Aleksandr Simma and Michael I. Jordan. 21. Modeling events with cascades of poisson processes. In UAI, pages Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari Gpstuff: Bayesian modeling with gaussian processes. J. Mach. Learn. Res., 14(1):
Twitter Analytics: Architecture, Tools and Analysis
Twitter Analytics: Architecture, Tools and Analysis Rohan D.W Perera CERDEC Ft Monmouth, NJ 07703-5113 S. Anand, K. P. Subbalakshmi and R. Chandramouli Department of ECE, Stevens Institute Of Technology
Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach
Modeling and Analysis of Call Center Arrival Data: A Bayesian Approach Refik Soyer * Department of Management Science The George Washington University M. Murat Tarimcilar Department of Management Science
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA
REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São
Christfried Webers. Canberra February June 2015
c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic
Gaussian Process Training with Input Noise
Gaussian Process Training with Input Noise Andrew McHutchon Department of Engineering Cambridge University Cambridge, CB PZ [email protected] Carl Edward Rasmussen Department of Engineering Cambridge University
A survey on click modeling in web search
A survey on click modeling in web search Lianghao Li Hong Kong University of Science and Technology Outline 1 An overview of web search marketing 2 An overview of click modeling 3 A survey on click models
BayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
Gaussian Processes in Machine Learning
Gaussian Processes in Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany [email protected] WWW home page: http://www.tuebingen.mpg.de/ carl
Bayesian Statistics: Indian Buffet Process
Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note
Gaussian Processes to Speed up Hamiltonian Monte Carlo
Gaussian Processes to Speed up Hamiltonian Monte Carlo Matthieu Lê Murray, Iain http://videolectures.net/mlss09uk_murray_mcmc/ Rasmussen, Carl Edward. "Gaussian processes to speed up hybrid Monte Carlo
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Latent variable and deep modeling with Gaussian processes; application to system identification. Andreas Damianou
Latent variable and deep modeling with Gaussian processes; application to system identification Andreas Damianou Department of Computer Science, University of Sheffield, UK Brown University, 17 Feb. 2016
Detection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
The Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
Multiple Kernel Learning on the Limit Order Book
JMLR: Workshop and Conference Proceedings 11 (2010) 167 174 Workshop on Applications of Pattern Analysis Multiple Kernel Learning on the Limit Order Book Tristan Fletcher Zakria Hussain John Shawe-Taylor
HT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
A Log-Robust Optimization Approach to Portfolio Management
A Log-Robust Optimization Approach to Portfolio Management Dr. Aurélie Thiele Lehigh University Joint work with Ban Kawas Research partially supported by the National Science Foundation Grant CMMI-0757983
Problem of Missing Data
VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;
Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering
IEICE Transactions on Information and Systems, vol.e96-d, no.3, pp.742-745, 2013. 1 Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering Ildefons
Bayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
The Chinese Restaurant Process
COS 597C: Bayesian nonparametrics Lecturer: David Blei Lecture # 1 Scribes: Peter Frazier, Indraneel Mukherjee September 21, 2007 In this first lecture, we begin by introducing the Chinese Restaurant Process.
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts
Page 1 of 20 ISF 2008 Models for Product Demand Forecasting with the Use of Judgmental Adjustments to Statistical Forecasts Andrey Davydenko, Professor Robert Fildes [email protected] Lancaster
SUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
Making Sense of the Mayhem: Machine Learning and March Madness
Making Sense of the Mayhem: Machine Learning and March Madness Alex Tran and Adam Ginzberg Stanford University [email protected] [email protected] I. Introduction III. Model The goal of our research
A random point process model for the score in sport matches
IMA Journal of Management Mathematics (2009) 20, 121 131 doi:10.1093/imaman/dpn027 Advance Access publication on October 30, 2008 A random point process model for the score in sport matches PETR VOLF Institute
**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.
**BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,
Statistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
Section 5. Stan for Big Data. Bob Carpenter. Columbia University
Section 5. Stan for Big Data Bob Carpenter Columbia University Part I Overview Scaling and Evaluation data size (bytes) 1e18 1e15 1e12 1e9 1e6 Big Model and Big Data approach state of the art big model
Volatility Density Forecasting & Model Averaging Imperial College Business School, London
Imperial College Business School, London June 25, 2010 Introduction Contents Types of Volatility Predictive Density Risk Neutral Density Forecasting Empirical Volatility Density Forecasting Density model
Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber 2011 1
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2011 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
Spatio-Temporal Patterns of Passengers Interests at London Tube Stations
Spatio-Temporal Patterns of Passengers Interests at London Tube Stations Juntao Lai *1, Tao Cheng 1, Guy Lansley 2 1 SpaceTimeLab for Big Data Analytics, Department of Civil, Environmental &Geomatic Engineering,
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
Applications of R Software in Bayesian Data Analysis
Article International Journal of Information Science and System, 2012, 1(1): 7-23 International Journal of Information Science and System Journal homepage: www.modernscientificpress.com/journals/ijinfosci.aspx
ANALYZING NETWORK TRAFFIC FOR MALICIOUS ACTIVITY
CANADIAN APPLIED MATHEMATICS QUARTERLY Volume 12, Number 4, Winter 2004 ANALYZING NETWORK TRAFFIC FOR MALICIOUS ACTIVITY SURREY KIM, 1 SONG LI, 2 HONGWEI LONG 3 AND RANDALL PYKE Based on work carried out
An Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia [email protected] Tata Institute, Pune,
Questions and Answers
GNH7/GEOLGG9/GEOL2 EARTHQUAKE SEISMOLOGY AND EARTHQUAKE HAZARD TUTORIAL (6): EARTHQUAKE STATISTICS Question. Questions and Answers How many distinct 5-card hands can be dealt from a standard 52-card deck?
More details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing
Probabilistic Latent Semantic Analysis (plsa)
Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg [email protected] www.multimedia-computing.{de,org} References
Linear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
Course: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 [email protected] Abstract Probability distributions on structured representation.
Package EstCRM. July 13, 2015
Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu
A Study on the Comparison of Electricity Forecasting Models: Korea and China
Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 675 683 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.675 Print ISSN 2287-7843 / Online ISSN 2383-4757 A Study on the Comparison
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
Local Gaussian Process Regression for Real Time Online Model Learning and Control
Local Gaussian Process Regression for Real Time Online Model Learning and Control Duy Nguyen-Tuong Jan Peters Matthias Seeger Max Planck Institute for Biological Cybernetics Spemannstraße 38, 776 Tübingen,
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
Hedging Options In The Incomplete Market With Stochastic Volatility. Rituparna Sen Sunday, Nov 15
Hedging Options In The Incomplete Market With Stochastic Volatility Rituparna Sen Sunday, Nov 15 1. Motivation This is a pure jump model and hence avoids the theoretical drawbacks of continuous path models.
Introduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza
Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
Network Architectures & Services
Network Architectures & Services Fernando Kuipers ([email protected]) Multi-dimensional analysis Network peopleware Network software Network hardware Individual: Quality of Experience Friends: Recommendation
A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING
A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India [email protected] 2 School of Computer
A Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
Anomaly detection for Big Data, networks and cyber-security
Anomaly detection for Big Data, networks and cyber-security Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with Nick Heard (Imperial College London),
Simulating a File-Sharing P2P Network
Simulating a File-Sharing P2P Network Mario T. Schlosser, Tyson E. Condie, and Sepandar D. Kamvar Department of Computer Science Stanford University, Stanford, CA 94305, USA Abstract. Assessing the performance
Sentiment Analysis Tool using Machine Learning Algorithms
Sentiment Analysis Tool using Machine Learning Algorithms I.Hemalatha 1, Dr. G. P Saradhi Varma 2, Dr. A.Govardhan 3 1 Research Scholar JNT University Kakinada, Kakinada, A.P., INDIA 2 Professor & Head,
Imperfect Debugging in Software Reliability
Imperfect Debugging in Software Reliability Tevfik Aktekin and Toros Caglar University of New Hampshire Peter T. Paul College of Business and Economics Department of Decision Sciences and United Health
Bootstrapping Big Data
Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Lecture notes: single-agent dynamics 1
Lecture notes: single-agent dynamics 1 Single-agent dynamic optimization models In these lecture notes we consider specification and estimation of dynamic optimization models. Focus on single-agent models.
CS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
The Optimality of Naive Bayes
The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most
Distribution (Weibull) Fitting
Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions
Load Balancing in Cellular Networks with User-in-the-loop: A Spatial Traffic Shaping Approach
WC25 User-in-the-loop: A Spatial Traffic Shaping Approach Ziyang Wang, Rainer Schoenen,, Marc St-Hilaire Department of Systems and Computer Engineering Carleton University, Ottawa, Ontario, Canada Sources
Tutorial on Markov Chain Monte Carlo
Tutorial on Markov Chain Monte Carlo Kenneth M. Hanson Los Alamos National Laboratory Presented at the 29 th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Technology,
BINOMIAL OPTIONS PRICING MODEL. Mark Ioffe. Abstract
BINOMIAL OPTIONS PRICING MODEL Mark Ioffe Abstract Binomial option pricing model is a widespread numerical method of calculating price of American options. In terms of applied mathematics this is simple
Efficient Cluster Detection and Network Marketing in a Nautural Environment
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection Jian Zhang School of Computer Science Cargenie Mellon University Pittsburgh, PA 15213 [email protected] Zoubin
7.1 The Hazard and Survival Functions
Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence
Invited Applications Paper
Invited Applications Paper - - Thore Graepel Joaquin Quiñonero Candela Thomas Borchert Ralf Herbrich Microsoft Research Ltd., 7 J J Thomson Avenue, Cambridge CB3 0FB, UK [email protected] [email protected]
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
Estimation and attribution of changes in extreme weather and climate events
IPCC workshop on extreme weather and climate events, 11-13 June 2002, Beijing. Estimation and attribution of changes in extreme weather and climate events Dr. David B. Stephenson Department of Meteorology
Final Mathematics 5010, Section 1, Fall 2004 Instructor: D.A. Levin
Final Mathematics 51, Section 1, Fall 24 Instructor: D.A. Levin Name YOU MUST SHOW YOUR WORK TO RECEIVE CREDIT. A CORRECT ANSWER WITHOUT SHOWING YOUR REASONING WILL NOT RECEIVE CREDIT. Problem Points Possible
Traffic Behavior Analysis with Poisson Sampling on High-speed Network 1
Traffic Behavior Analysis with Poisson Sampling on High-speed etwork Guang Cheng Jian Gong (Computer Department of Southeast University anjing 0096, P.R.China) Abstract: With the subsequent increasing
Poisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Predictive modelling around the world 28.11.13
Predictive modelling around the world 28.11.13 Agenda Why this presentation is really interesting Introduction to predictive modelling Case studies Conclusions Why this presentation is really interesting
On Correlating Performance Metrics
On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are
Maximum Likelihood Estimation
Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for
Predicting Information Popularity Degree in Microblogging Diffusion Networks
Vol.9, No.3 (2014), pp.21-30 http://dx.doi.org/10.14257/ijmue.2014.9.3.03 Predicting Information Popularity Degree in Microblogging Diffusion Networks Wang Jiang, Wang Li * and Wu Weili College of Computer
Chapter 14 Managing Operational Risks with Bayesian Networks
Chapter 14 Managing Operational Risks with Bayesian Networks Carol Alexander This chapter introduces Bayesian belief and decision networks as quantitative management tools for operational risks. Bayesian
2WB05 Simulation Lecture 8: Generating random variables
2WB05 Simulation Lecture 8: Generating random variables Marko Boon http://www.win.tue.nl/courses/2wb05 January 7, 2013 Outline 2/36 1. How do we generate random variables? 2. Fitting distributions Generating
Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization
Non Linear Dependence Structures: a Copula Opinion Approach in Portfolio Optimization Jean- Damien Villiers ESSEC Business School Master of Sciences in Management Grande Ecole September 2013 1 Non Linear
Classification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
Marshall-Olkin distributions and portfolio credit risk
Marshall-Olkin distributions and portfolio credit risk Moderne Finanzmathematik und ihre Anwendungen für Banken und Versicherungen, Fraunhofer ITWM, Kaiserslautern, in Kooperation mit der TU München und
1.5 / 1 -- Communication Networks II (Görg) -- www.comnets.uni-bremen.de. 1.5 Transforms
.5 / -- Communication Networks II (Görg) -- www.comnets.uni-bremen.de.5 Transforms Using different summation and integral transformations pmf, pdf and cdf/ccdf can be transformed in such a way, that even
