Selection Sampling from Large Data sets for Targeted Inference in Mixture Modeling


Ioanna Manolopoulou, Cliburn Chan and Mike West

December 29, 2009

Abstract

One of the challenges of Markov chain Monte Carlo in large datasets is the need to scan through the whole data at each iteration of the sampler, which can be computationally prohibitive. Several approaches have been developed to address this, typically drawing computationally manageable subsamples of the data. Here we consider the specific case where most of the data from a mixture model provide little or no information about the parameters of interest, and we aim to select subsamples such that the information extracted is most relevant. The motivating application arises in flow cytometry, where several measurements from a vast number of cells are available. Interest lies in identifying specific rare cell subtypes and characterizing them according to their corresponding markers. We present a Markov chain Monte Carlo approach where an initial subsample of the full data is used to draw a further set of observations from a low-probability region of interest, and describe how inferences can be made efficiently by reducing the dimensionality of the problem. Finally, we extend our method to a Sequential Monte Carlo framework whereby the targeted subsample is augmented sequentially as estimates improve, and introduce a stopping rule for determining the size of the targeted subsample. We implement our algorithm on a flow cytometry dataset, providing higher-resolution inferences for rare cell subtypes.

Ioanna Manolopoulou is Postdoctoral Fellow, Department of Statistical Science; Cliburn Chan is Professor, Department of Biostatistics and Bioinformatics; Mike West is Professor, Department of Statistical Science, Duke University, Durham NC.
1 Introduction

Following technological advances, in many biological fields a vast amount of data is available; take, for example, flow cytometry, where tens of thousands to millions of individual cells, each with multiple different fluorescent-tagged antibody labels, are assayed in a single blood or other fluid sample (see Chan et al., 2008). Although Markov chain Monte Carlo is a very powerful tool for drawing inferences, it requires calculating the likelihood of the full data at each iteration. This is a serious drawback in the case of big datasets, often rendering it computationally prohibitive. Several approaches have been developed in order to address this problem. In most cases, very large datasets are addressed by drawing inferences on computationally manageable subsamples which are drawn randomly from the full data. Ridgeway and Madigan (2002) proposed a two-step algorithm of drawing subsamples in a Sequential Monte Carlo sampler without a mutation step, which was then improved by Balakrishnan and Madigan (2006) by introducing a rejuvenation step based on a kernel smoothing approximation similar to Liu and West (2000). In this paper we are interested in drawing inferences about low-probability regions in sample space when large amounts of data from a mixture model are available, yielding few observations in the region of interest. Computational methods in mixture models have been studied extensively and provide a very flexible tool for modelling complex distributions; see, for example, MacEachern (1998), MacEachern et al. (1999) and Müller et al. (1996). The motivating application arises in flow cytometry, where a vast number of observations (corresponding to cells) is available, with several markers for each cell (see Chan et al., 2008). The data are assumed to follow a Gaussian mixture model, with individual components or groups of components representing cell types.
Specific interest lies in characterizing a given cell subtype, which may often be significantly rare. For example, polyfunctional lymphocyte subsets that are of interest in predicting vaccine efficacy (Seder et al., 2008) may have frequencies of 0.01% or less of the total peripheral blood cell population. As a result, random subsamples typically contain very few observations of the rare subtype. The key idea is to use an initial random subsample in order to construct a weight function directed around the region of interest, which is subsequently used to draw a targeted subsample. Using nonparametric Bayesian mixture models, we implement a two-step Markov chain Monte Carlo approach of first using the random subsample to draw inferences, and then combining it with the targeted
subsample. We extend the method to a Sequential Monte Carlo algorithm whereby the targeted subsample is augmented sequentially as more information becomes available, until no more informative data points appear to be present in the full data. The idea of selective sampling through a weight function has been used in the context of discovery models; see West (1994) and West (1996). We assume that the data follow the mixture distribution

f(x) = \sum_{j=1}^{J} \pi_j f_j(x).

Owing to the flow cytometry application, we assume data which follow a Gaussian mixture model as implemented by Chan et al. (2008) (see Appendix A), where cell subtypes correspond to groups of Gaussian components; our algorithms, however, may easily be adapted for non-Gaussian mixtures. The region about which we aim to draw inferences is determined by the scientific question at hand, and need not be a low-probability region. In this paper we focus on drawing inferences about the parameters φ_K = (µ_K, Σ_K) of a low-probability component K of a Gaussian mixture with Dirichlet process mixing weights characterized by θ = (µ, Σ, π, z, V, α), specified, e.g., as the component centered closest to a specific point.

2 Markov chain Monte Carlo approach

The objective is to identify and analyze subsamples of the data which contain information about the specific subset of the parameters of interest. The key idea is to obtain a rough estimate of the low-probability component K based on a random subsample, which is subsequently used to draw weighted subsamples of the data that are more likely to be relevant to our analysis, providing us with higher resolution about the structure of the distribution in the region of interest. The direct approach is to follow a two-step procedure of Markov chain Monte Carlo samplers. We use an initial, randomly drawn subsample from the data in order to obtain an estimate of the parameters, and use this estimate to draw a more informative subsample.
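To fix ideas, the mixture setting above can be simulated directly. The following is a minimal Python sketch (the authors' own implementation is in MATLAB and is not reproduced here); the component weights, means and covariances are purely illustrative, with a rare component playing the role of component K.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, weights, means, covs, rng):
    """Draw n points from a finite Gaussian mixture; returns data and labels z."""
    weights = np.asarray(weights, dtype=float)
    z = rng.choice(len(weights), size=n, p=weights / weights.sum())
    x = np.empty((n, len(means[0])))
    for k in range(len(weights)):
        idx = z == k
        # fill in the rows assigned to component k
        x[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.sum())
    return x, z

# Two abundant components and one rare one (component K = 2, weight 0.1%).
weights = [0.699, 0.300, 0.001]
means = [np.zeros(2), np.array([4.0, 0.0]), np.array([8.0, 8.0])]
covs = [np.eye(2), np.eye(2), 0.25 * np.eye(2)]
X, z = sample_mixture(50_000, weights, means, covs, rng)
```

With a weight of 0.1%, a random subsample of 5,000 points would be expected to contain only about 5 observations from the rare component, which is the difficulty the targeted subsample is designed to address.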
The two subsamples are then combined in a joint Markov chain Monte Carlo sampler to provide us with more accurate estimates of φ_K. Although interest specifically lies in estimating the parameters of component K, given by µ_K, Σ_K, inference on the full set of µ, Σ is required in order to carry out the analysis.
We denote the two subsamples (random and targeted) by X^R and X^T, of sizes n_R and n_T respectively. The first is drawn randomly from the data, whereas the second is drawn according to weights w_i, 1 ≤ i ≤ N. We aim to choose the weights so that the targeted subsample contains mostly observations from component K; thus we may choose w_i = w(x_i) = N(x_i | m, τS), where m, S are estimates of µ_K, Σ_K from the initial analysis (based on the random subsample), and τ is a tuning scalar. The constant τ determines how far from the initial estimates the targeted subsample may spread. In other words, we choose the second subsample of the data according to how well it fits the estimated distribution of component K, possibly allowing for a wider distribution through the constant τ. The likelihood of the data (X^R, X^T) in component k then takes the following form. For observations in the random subsample:

f(x_i^R \mid z_i = k, \mu, \Sigma) = N(x_i^R \mid \mu_k, \Sigma_k), \quad i = 1, \dots, n_R,

and

f(x_i^R \mid \mu, \Sigma) = \sum_{k=1}^{K} \pi_k N(x_i^R \mid \mu_k, \Sigma_k), \quad i = 1, \dots, n_R.

For observations in the targeted subsample:

f(x_i^T \mid X^R, z_i = k, \mu, \Sigma) \propto w(x_i^T) N(x_i^T \mid \mu_k, \Sigma_k) \propto N(x_i^T \mid \tilde\mu_k, \tilde\Sigma_k), \quad i = 1, \dots, n_T,

where

\tilde\Sigma_k = (\Sigma_k^{-1} + (\tau S)^{-1})^{-1} \quad \text{and} \quad \tilde\mu_k = \tilde\Sigma_k (\Sigma_k^{-1} \mu_k + (\tau S)^{-1} m),

and

f(x_i^T \mid X^R, \mu, \Sigma) = \sum_{k=1}^{K} \tilde\pi_k(\theta) N(x_i^T \mid \tilde\mu_k, \tilde\Sigma_k), \quad i = 1, \dots, n_T,
where

\tilde\pi_k(\theta) = \frac{\pi_k N(\mu_k \mid m, \tau S + \Sigma_k)}{\sum_{k'=1}^{K} \pi_{k'} N(\mu_{k'} \mid m, \tau S + \Sigma_{k'})}.

Note here that we are using only the unnormalized weights w(x_i) ∝ N(x_i | m, τS) even though we are drawing without replacement, assuming that \sum_{i=1}^{N} N(x_i \mid m, \tau S) remains unchanged after drawing each of the targeted data points; in other words, that the unnormalized weights sum to infinity. This means that we assume a very large number of data points within the region of non-negligible support of the weight function w(x). The first Markov chain Monte Carlo sampler is a standard blocked Gibbs sampler (see Ishwaran and James, 2002) with target distribution p(µ, Σ, π, z, V, α | X^R). In order to carry out the second Markov chain Monte Carlo sampler, based on the random and targeted subsamples combined, the posterior distributions of the parameters z, π, µ, Σ, α have to be recalculated so that efficient proposals can be constructed. The posterior for z is multinomial, with probabilities

p(z_i = k \mid X^R, X^T, \mu, \Sigma) \propto \pi_k f(x_i \mid z_i = k, \mu_k, \Sigma_k)

for both subsamples. The posterior distribution of π | X^R, X^T, z, µ, Σ does not follow a closed-form distribution; see Equation (A1) in Appendix B. The contribution of the targeted subsample to the posterior becomes more significant as τS increases, allowing observations in the targeted subsample to belong to components other than K. The posterior for α only depends on the data through V and thus has the usual posterior distribution (see Ishwaran and James, 2002):

\alpha \sim \text{Gamma}\Big(\eta_1 + K - 1,\; \eta_2 - \sum_{k=1}^{K-1} \log(1 - V_k)\Big). \quad (1)
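The weighted draw and the tilted component probabilities can be sketched numerically. The following Python functions are an illustrative implementation (function names are ours, not the paper's), with the weight function w(x_i) ∝ N(x_i | m, τS) and the tilted weights proportional to π_k N(µ_k | m, τS + Σ_k) as in the text.

```python
import numpy as np

def gauss_pdf(x, m, cov):
    """Multivariate normal density N(x | m, cov), evaluated row-wise on x."""
    d = x - m
    prec = np.linalg.inv(cov)
    quad = np.einsum('ij,jk,ik->i', d, prec, d)
    norm = np.sqrt((2 * np.pi) ** x.shape[1] * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def draw_targeted(x, m, S, tau, n_T, rng):
    """Draw n_T points without replacement with weights w_i ∝ N(x_i | m, tau*S)."""
    w = gauss_pdf(x, m, tau * S)
    p = w / w.sum()
    idx = rng.choice(len(x), size=n_T, replace=False, p=p)
    return x[idx]

def tilted_weights(pi, mu, Sigma, m, S, tau):
    """Tilted component probabilities ∝ pi_k N(mu_k | m, tau*S + Sigma_k)."""
    num = np.array([pi[k] * gauss_pdf(mu[k][None, :], m, tau * S + Sigma[k])[0]
                    for k in range(len(pi))])
    return num / num.sum()
```

In practice the weights for observations far from m underflow to zero, which is harmless here: the draw then concentrates on the region of non-negligible support of w(x), exactly as intended.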
The posterior for µ_k can be calculated exactly as µ_k | X, z, Σ_k ~ N(m_k^µ, S_k^µ), where S_k^µ, m_k^µ may be readily calculated through Equation (A2) in Appendix B. The posterior for Σ does not follow an Inverse Wishart and cannot be easily sampled from (see Equation (A3) in Appendix B). Due to the nonstandard posterior distributions and the dimensionality of the problem, approximating them in order to construct efficient proposals is crucial.

2.1 Markov chain Monte Carlo updates

After obtaining the targeted subsample, we construct a Markov chain Monte Carlo sampler with target distribution

p(\mu, \Sigma, \pi, z, V, \alpha \mid X^R, X^T) \propto p(X^T \mid z^T, X^R, \theta)\, p(z^T \mid X^R, \theta)\, p(X^R \mid z^R, \theta)\, p(z^R \mid \theta)\, p(\theta).

The chain is initialized by drawing µ, Σ, π, z, V, α from their priors, then iterates through the following steps.

1. Update z by generating from the posterior p(z | X^R, X^T, π, µ, Σ) ∝ π_z f(X^R, X^T | µ, Σ).

2. Update π through a Metropolis-Hastings step by generating from the posterior p(V | X^R), setting \pi_k = V_k \prod_{j=1}^{k-1} (1 - V_j), and accepting the proposed move with probability

\min\Big(1, \prod_{k=1}^{K} \big(\tilde\pi_k(\theta^*) / \tilde\pi_k(\theta)\big)^{n_k^T}\Big),

where θ* denotes the proposed values. If the targeted subsample is indeed drawn such that almost all of its points belong to component K, the acceptance probability will be close to 1.
3. Update α from its posterior given V, given in Equation (1).

4. Update µ through a Gibbs step using µ_k | X, z, Σ_k ~ N(m_k^µ, S_k^µ) above.

5. The posterior distribution of Σ does not take closed form. We construct a proposal distribution q(Σ_k | X^R, X^T, z, µ) for a Metropolis-Hastings step. Using that

f(x_i^T \mid X^R, z_i = k, \mu, \Sigma) = \tilde\pi_k(\theta) N(x_i^T \mid \tilde\mu_k, \tilde\Sigma_k), \quad \text{where} \quad \tilde\pi_k(\theta) = \frac{\pi_k N(\mu_k \mid m, \tau S + \Sigma_k)}{\sum_{k'=1}^{K} \pi_{k'} N(\mu_{k'} \mid m, \tau S + \Sigma_{k'})},

we can use the inverse transformation to obtain \tilde{X}_i^T \mid X^R, z_i = k, \mu, \Sigma \sim N(\mu_k, \Sigma_k), where

\tilde{X}_i^T = \Sigma_k \big(\tilde\Sigma_k^{-1} X_i^T - (\tau S)^{-1} m\big).

In practice, of course, Σ_k is not known, and the transformation of X^T can only be approximated using an estimate of Σ_k, e.g. the value from the previous iteration. We then propose

q(\Sigma_k \mid X^R, X^T, z, \mu) = IW(W_k + S_0,\; n_k + s_0 + p - 1), \quad \text{where} \quad W_k = \sum_{z_i = k} (\tilde{X}_i - \bar{\tilde{X}}_k)(\tilde{X}_i - \bar{\tilde{X}}_k)^T,

with \tilde{X}^R = X^R (observations in the random subsample are left untransformed). In addition, a discount factor may be used in order to increase the variance of the proposal kernel. The Markov chain Monte Carlo sampler sweeps through the updates described above, yielding estimates for the posterior distribution of the parameters of interest. However, due to the high number of parameters to be estimated and the difficulty in defining efficient proposals, the acceptance rate quickly drops to zero for targeted subsamples of moderate size.

3 Focusing on the low-probability component

The dimensionality of the problem, combined with the difficulty of constructing efficient proposals, results in Markov chain Monte Carlo samplers which require very long running times in order to
eventually be sampling from the true posterior. At the same time, the approach described above does not exploit the results from the initial run based on the random sample, except for extracting the estimates of µ_K, Σ_K. We describe how the dimensionality of the problem can be greatly reduced using the posterior distribution estimates obtained from the initial Markov chain Monte Carlo simulation. Notice that the objective is to draw inferences about a region in sample space which has very low probability. Consequently, very few points in the initial random sample will belong to that region. On the other hand, the targeted sample will, generally, contain observations from the low-probability region. This implies that the posterior distribution of the parameters based on both the random and targeted samples (X^R, X^T),

p(\mu, \Sigma \mid X^R, X^T) = \sum_{z^R} p(\mu, \Sigma \mid X^R, X^T, z^R)\, p(z^R \mid X^R, X^T),

can be approximated as

p(\pi, \mu, \Sigma \mid X^R, X^T) \approx \sum_{z^R} \underbrace{p(\pi, \mu, \Sigma \mid X^R, X^T, z^R)}_{(a)}\; \underbrace{p(z^R \mid X^R)}_{(b)},

using that p(z^R | X^R, X^T) ≈ p(z^R | X^R). Here (a) requires integrating over a much smaller set of parameters z^T and can be calculated much more efficiently, and (b) is known from the first Markov chain Monte Carlo run. This decouples the z-dependence of the random and the targeted sample, greatly reducing the dimensionality of the second analysis. The second Markov chain Monte Carlo sampler is then adapted to a set of chains, one for each of a set of particles drawn from the posterior distribution estimate of the first chain. For particles l = 1 : L, draw a sample of (z, π, µ, Σ)^l | X^R from the posterior distribution estimates obtained in the first Markov chain Monte Carlo sampler, and carry out the second sampler for each particle only on µ_K, Σ_K | X^R, X^T, (z^R, π, φ_{-K})^l, combining samples at the end.
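The particle-wise second stage can be sketched as below. As a deliberate simplification (a hypothetical sketch, not the paper's sampler), only µ_K is updated per particle, via random-walk Metropolis under a plain Gaussian likelihood for the targeted points; the full algorithm would also update Σ_K and account for the weight-function tilt and the prior.

```python
import numpy as np

def log_lik(x, mu, cov_inv, logdet):
    """Gaussian log-likelihood of the rows of x under N(mu, cov)."""
    d = x - mu
    return -0.5 * (np.einsum('ij,jk,ik->i', d, cov_inv, d).sum()
                   + len(x) * (logdet + x.shape[1] * np.log(2 * np.pi)))

def focused_mh(x_T, particles_mu, Sigma_K, n_steps, step, rng):
    """For each first-stage particle of mu_K, run a short random-walk Metropolis
    chain targeting p(mu_K | X^T, Sigma_K); returns the updated particle set."""
    cov_inv = np.linalg.inv(Sigma_K)
    logdet = np.linalg.slogdet(Sigma_K)[1]
    out = []
    for mu in particles_mu:
        ll = log_lik(x_T, mu, cov_inv, logdet)
        for _ in range(n_steps):
            prop = mu + step * rng.standard_normal(mu.shape)
            ll_prop = log_lik(x_T, prop, cov_inv, logdet)
            if np.log(rng.uniform()) < ll_prop - ll:  # flat prior in this sketch
                mu, ll = prop, ll_prop
        out.append(mu)
    return np.array(out)
```

Because each particle carries its own configuration of the remaining parameters, the chains are independent and trivially parallelizable, which is the computational point of the focused approach.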
This approach greatly reduces both the complexity of the calculations per sweep and the total number of samples required in order to obtain a good approximation of the posterior distribution. However, because the posteriors µ_K, Σ_K | X^R and µ_K, Σ_K | X^R, X^T may differ greatly, the sampler still suffers from very low acceptance rates, and with a moderately sized targeted subsample it can fail to reach the region of high posterior probability in parameter space.
4 Sequential Monte Carlo approach

The focused approach drastically reduces the dimensionality of the algorithm, and as a result the computational complexity. However, Metropolis-Hastings updates still show low acceptance rates, because the two posteriors, given X^R in the one case and X^R, X^T in the other, are very different. In addition, the size of the targeted subsample is chosen manually rather than through an automated procedure. Both drawbacks may be addressed by drawing the targeted sample through a Sequential Monte Carlo simulation rather than using a two-step procedure. Sequential Monte Carlo methods provide simulation-based inferences from a sequence of probability distributions. A large number of random samples (particles) is used to approximate the sequence of distributions, so that asymptotically it converges to the true target distribution; see Doucet et al. (2001) and Lopes et al. Here Sequential Monte Carlo can be used instead of the two-step procedure described above (whereby an initial random sample X^R is drawn, subsequently giving rise to the targeted sample X^T). We use a sequential scheme such that the targeted sample is selected one (or more) data point at a time, at each draw updating the estimates of the parameters of component K for a set of particles. In other words, we use the fact that the likelihood of the data may be expressed as

p(X_{1:n} \mid \mu, \Sigma) = \prod_{i=1}^{n} p(x_i \mid X_{1:i-1}, \mu, \Sigma).

For each of a set of particles, draw a sample of (z, π, µ, Σ) | X^R from the posterior distribution estimates obtained in the Markov chain Monte Carlo sampler. Then repeatedly augment the targeted subsample and mutate the parameter estimates through the following steps. For j = 1 : J and for a fixed sequence of τ_{1:J}:

1. Draw u ~ U{1 : J} and set m^{j-1} = {µ_K^{j-1}}_u and S^{j-1} = {Σ_K^{j-1}}_u, where {φ_k^j}_u is the sample of the u-th particle at step j for component k.

2.
Draw another batch of targeted observations X^{T_j} without replacement according to weights w_i ∝ N(x_i | m^{j-1}, τ_{j-1} S^{j-1}).

3. Update the configuration indicators z using the posterior weights π_k N(x | µ_k, Σ_k).
4. Using a fixed number of Metropolis-Hastings steps following the iterates described in the Markov chain Monte Carlo approach above, update

\mu_k, \Sigma_k, \pi_k, \alpha \mid X^R, X^{T_{1:j}}, z.

The posterior distribution of µ_k now becomes µ_k | X^R, X^{T_{1:j}}, z, Σ_k ~ N(m_k^µ, S_k^µ), where

S_k^\mu = \Big(\Sigma_k^{-1}/t_0 + n_k^R \Sigma_k^{-1} + \sum_{i=1}^{j} n_k^{T_i} \big((\tau_i S^i)^{-1} \Sigma_k + I\big)^{-1} \Sigma_k^{-1}\Big)^{-1},

m_k^\mu = S_k^\mu \Big(n_k \Sigma_k^{-1} \bar{x}_k - \sum_{i=1}^{j} n_k^{T_i} \big((\tau_i S^i)^{-1} \Sigma_k + I\big)^{-1} (\tau_i S^i)^{-1} m^i + \Sigma_k^{-1}\mu_0/t_0\Big),

where n_k is the total number of data points in component k and n_k^{T_i} is the number of data points in that component coming from the i-th targeted batch. It can be shown that, asymptotically (as the number of particles tends to infinity), the approximation of the target distribution converges to the true density, with the error being of order 1/\sqrt{N}. The parameter τ is a tuning parameter which allows monitoring both the dispersal of the targeted sample and the validity of the assumption of infinite weights. Although in the example presented here (see Subsection 4.1) the parameter τ is held fixed at τ_i = 1 for all i, values greater or smaller than 1 may be more beneficial (see Appendix C). Owing to the way in which the parameters m, S of the weight function are fixed at each step of the resampling, weight functions located around different regions of sample space may be chosen. When the low-probability component follows a mixture distribution across different regions of sample space, this will be reflected in the estimates obtained from each particle, resulting in each particle corresponding to a different draw. Through our adaptive algorithm, the sample space is explored flexibly and posterior estimates of the parameters are updated incrementally as the targeted subsample is augmented, allowing more efficient inferences. This approach immediately poses the question of when to stop drawing observations for the targeted subsample. Ideally, we would like the targeted sample to contain all data points of component
K. In order to address this, we introduce a decision rule such that the targeted sample stops being augmented when no more data points in the remaining original data show a high probability of belonging to component K. A natural quantity to use is the Bayes factor for that component; see West and Harrison (1997). In other words, we introduce an extra decision step.

5a. If there are no unsampled observations with Bayes factor

BF_K(x_i) = \frac{\pi_K^*(x_i)/(1 - \pi_K^*(x_i))}{\pi_K/(1 - \pi_K)}

exceeding a given threshold, where \pi_K^*(x_i) \propto \pi_K N(x_i \mid \mu_K, \Sigma_K), stop.

The calculation of the Bayes factor is computationally demanding; as an alternative, the stopping rule may be expressed purely as a function of the weights. In other words,

5b. If there are fewer than N_threshold unsampled observations within a c_threshold contour of the weight function, stop.

The Sequential Monte Carlo approach provides an efficient method of drawing inferences about parameters relevant to a low-probability region of sample space, while at the same time allowing the algorithm to automatically monitor the number of observations in the region of interest.

4.1 Example: flow cytometry

The motivating example for this study is a problem arising in flow cytometry, where cellular subtypes may be associated with one (or more) components of a Gaussian mixture model (see Chan et al., 2008). Flow cytometers detect fluorescent reporter markers that typically correspond to specific cell-surface or intracellular proteins on individual cells, and can assay millions of such cells in a fluid stream in minutes. Datasets are typically very large, and as a result inference on the full data is computationally prohibitive. Interest lies in identifying and characterizing rare cell subtypes using a mixture model fitted on those markers. The ability to identify such rare cell subsets plays an important role in many medical contexts; for example, the detection of antigen-specific cells with MHC
class I or class II markers, identification of polyfunctional T lymphocytes that correlate with vaccine efficacy or host resistance to pathogens, or in resolving variants of already low-frequency cell types, e.g. subtypes of conventional dendritic cells. We use a dataset of 50,000 data points from human peripheral blood cells, with 6 marker measurements each: Forward Scatter, Side Scatter, CD4, IFNg+IL2, CD8, CD3.[1] The objective is to provide higher resolution on the structure and patterns of covariation of cells of a specific cell subtype, specifically CD3+CD4+ and CD3+CD8+ cells secreting IL2/IFNg when challenged with a specific viral antigen. The data show a clear component structure for some of the markers (see Figure 1), whereas in others the rare cell subtypes of interest are not separated. We specify the statistical question as drawing inferences about the component centered closest to the markers corresponding to a specific cell of known rare subtype. To illustrate our methods, and for ease of exposition, we adapt our algorithm by targeting inferences towards the component with the highest CD4 centre. An initial sample of size 5,000 is drawn, providing us with initial estimates m, S for the mean and covariance of the component closest to the high-CD4+ region. Due to the strong covariation between the markers, several components are needed (see Figure 3) in order to capture the inhomogeneity of the data. Using initial weights w(x) ∝ N(x | m, S), we apply our Sequential Monte Carlo algorithm to obtain a complete targeted subsample in terms of the stopping rule, as well as posterior samples for all our parameters.
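The sequential augmentation with stopping rule 5b can be sketched as follows. This is an illustrative Python version (function names and the density-based contour criterion are our own parametrization of "a c_threshold contour"), and the parameter-mutation step of the full algorithm is omitted for brevity.

```python
import numpy as np

def gauss_logpdf(x, m, cov):
    """Row-wise multivariate normal log-density log N(x | m, cov)."""
    d = x - m
    prec = np.linalg.inv(cov)
    quad = np.einsum('ij,jk,ik->i', d, prec, d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (quad + logdet + x.shape[1] * np.log(2 * np.pi))

def augment_until_stopped(x, m, S, tau, batch, n_threshold, c_threshold, rng):
    """Sequentially draw targeted batches with weights ∝ N(x_i | m, tau*S); stop
    when fewer than n_threshold unsampled points remain inside the c_threshold
    contour (rule 5b). The mutation of (m, S) between batches is omitted here."""
    unsampled = np.ones(len(x), dtype=bool)
    targeted = []
    while True:
        logw = gauss_logpdf(x[unsampled], m, tau * S)
        # "inside the contour": density above c_threshold times the peak density
        peak = gauss_logpdf(m[None, :], m, tau * S)[0]
        inside = logw > peak + np.log(c_threshold)
        if inside.sum() < n_threshold:
            break
        p = np.exp(logw - logw.max())
        p /= p.sum()
        take = rng.choice(np.flatnonzero(unsampled),
                          size=min(batch, inside.sum()), replace=False, p=p)
        targeted.append(x[take])
        unsampled[take] = False
    return np.vstack(targeted) if targeted else np.empty((0, x.shape[1]))
```

Each pass removes the drawn points from the pool, so the loop terminates once the region of non-negligible weight has been exhausted, which is exactly the behaviour the stopping rule is designed to detect.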
Looking at the posterior distribution of the total number of components, based first on the initial MCMC sampler given the random subsample, and subsequently on the SMC sampler given both the random and targeted subsamples, we observe that the targeted approach has indeed provided a better fit for the structure of the data, reflected through the increased number of components (see Figure 2). More specifically, observing samples from the mixture model in the CD4 and IFNg markers before and after the targeted subsample (see Figure 3), we see that the targeted approach has led to the emergence of more Gaussian components around the region of the rare cell subtypes, providing higher resolution about the structure and covariation of their markers.

[1] Data from an NIAID/BD IntraCellular Staining Quality Assurance Panel (ICS QAP), kindly provided by the Duke Center for AIDS Research (CFAR) Immune Monitoring Core.
Figure 1: Pair plots for the last 4 markers: CD4, IFNg, CD8 and CD3. The complete data set is shown in yellow. We aim to use the random subsample (shown in red) in order to obtain samples from the initial posterior p(µ, Σ, π, α | X^R) and draw the targeted subsample (shown in blue) using estimates of the distribution of the data (superimposed as a contour plot).

More importantly, our targeted approach has revealed components in the low-probability subregion which emerge due to the covariation with the remaining markers. These findings agree with the biologists' expectation that cell subtypes may have a non-Gaussian structure.
Figure 2: Posterior distributions for the number of components in the Gaussian mixture model, given only the random subsample, p(K | X^R), shown in black, and given both the random and targeted subsamples, p(K | X^R, X^T), shown in white.

5 Additional comments

One of the key aspects of this work consists in defining the low-probability region of interest and specifying the weight function used to draw the targeted sample. Naturally, the low-probability region in sample space is strongly driven by the scientific question at hand. Based on that, and taking into account algorithmic tractability and efficiency, different weight functions may be used. In this work we presented methods relating to inferences about a specific component, defined in terms of an identifying criterion. In the flow cytometry example used in this paper, this was chosen as the component with mean closest to a specific point. Although the weight function used had a Gaussian shape, the analysis revealed a non-Gaussian structure in the region of interest; using mixtures of components as a weight function would be a straightforward extension of our methods. In fact, a hierarchical model using mixtures of mixtures may provide a better fit to the non-Gaussian, inhomogeneous structure of the flow cytometry data; our targeted subsampling approach can be implemented with such models at little additional computational cost.
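The extension to mixture weight functions mentioned above amounts to replacing the single Gaussian weight with a sum of Gaussians; a minimal sketch (the centres and covariances below are illustrative, not estimates from the paper's data):

```python
import numpy as np

def mixture_weight(x, means, covs):
    """Weight function w(x) = sum_j N(x | m_j, S_j), spreading the targeted
    draw over several regions of sample space at once."""
    w = np.zeros(len(x))
    for m, S in zip(means, covs):
        d = x - m
        prec = np.linalg.inv(S)
        quad = np.einsum('ij,jk,ik->i', d, prec, d)
        norm = np.sqrt((2 * np.pi) ** x.shape[1] * np.linalg.det(S))
        w += np.exp(-0.5 * quad) / norm
    return w
```

Since only the (unnormalized) weights enter the targeted draw, this drops into the earlier sampling step unchanged, which is why the extension carries little additional computational cost.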
Figure 3: Sample realizations of the mixture model fitted to the flow cytometry data using the Sequential Monte Carlo targeted resampling algorithm, (a) based on the random subsample and (b) based on both the random subsample and the targeted subsample. Crosses are shown at the mean of each component, with 50% contours drawn.

A natural extension to the weight functions used in this work stems from the fact that, in the original flow cytometry data, the identifying criterion for the component of interest is not defined on a fixed number of dimensions. Instead, it is defined in terms of the set of markers which are significant in identifying the component in the region of low probability in sample space, and this set is itself unknown. In other words, the Gaussian mixture may be defined only on an (unknown) subset of the p markers, such that we draw inferences about the parameters of the mixture p(θ_q | X), x_i ∈ R^q, for variable dimension q, q ≤ p. The targeted learning about θ_q can be incorporated in the analysis such that, within the sequential design, the weight function w(x) ∝ N(x | m, S) is updated at each round of resampling, both in terms of the mean m and covariance S of the Gaussian distribution, and in terms of the markers over which the weight function is defined. In the case of flow cytometry data, this can be viewed as soft gating of cells into cell subtypes, based both on the values of the individual markers and on the set of significant markers. One of the main challenges in drawing inferences about targeted subsamples is constructing efficient proposals for the parameters of interest, as the convergence of the algorithms is influenced by several factors. The size of the targeted subsample in relation to the random subsample plays
a significant role. This becomes especially important when the assumption of an infinite number of observations within the region of interest is breached, as this leads to a likelihood for the targeted subsample which deviates severely from the true likelihood because of sampling without replacement. The multiplicative constant τ also plays a significant role in constructing a weight function which is wide enough not to violate the infinite-weights assumption, while at the same time targeting the region of interest. Finally, our algorithms were implemented in MATLAB and the code is freely available upon request.
A Gaussian Mixture Model

We are given data X comprising a total of N data points from a p-dimensional Gaussian mixture distribution

f(x_i \mid \mu, \Sigma) = \sum_{k=1}^{K} \pi_k N(x_i \mid \mu_k, \Sigma_k),

using a standard truncated Dirichlet process mixing distribution (see Ishwaran and James, 2002). Here N(x | µ, Σ) represents the probability density function of a normal distribution with mean µ and covariance matrix Σ, and the parameters π_k represent the mixing weights. Let θ = {π_{1:K}, φ_{1:K}}, φ_j = {µ_j, Σ_j}. The mixture model can be realized through the configuration indicators z_i for each observation x_i, so that we obtain the standard hierarchical model

(x_i \mid z_i = k, \phi_k) \sim N(x_i \mid \mu_k, \Sigma_k), \quad (\phi_k \mid G) \sim G, \quad (G \mid \alpha, G_0) \sim DP(\alpha, G_0),

where G(·) is an uncertain distribution function, G_0(·) is the prior mean of G(·) and α > 0 is the total mass, or precision, of the DP. From the Pólya urn scheme,

\theta_i \mid \theta_1, \dots, \theta_{i-1} \sim \frac{\alpha}{i - 1 + \alpha}\, G_0(\cdot) + \frac{1}{i - 1 + \alpha} \sum_{j=1}^{i-1} \delta_{\theta_j}(\cdot).

For conditional conjugacy, it is convenient to use normal-inverse Wishart priors, i.e.,

G_0(\mu, \Sigma) = N(\mu \mid \mu_0, t_0 \Sigma)\, IW(\Sigma \mid s_0, S_0).

Finally, we assume a Gamma prior for the Dirichlet precision parameter, α ~ Gamma(η_1, η_2), and the mixing probabilities are such that \pi_k = V_k \prod_{i=1}^{k-1} (1 - V_i), where V_i ~ Beta(1, α).

B Posterior Distributions

Given both the random and the targeted subsample, the posterior distributions of the parameters take the following form.
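The truncated stick-breaking construction of the mixing weights can be sketched in a few lines of Python (an illustration, not the paper's MATLAB code):

```python
import numpy as np

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking: V_k ~ Beta(1, alpha), pi_k = V_k * prod_{i<k}(1 - V_i),
    with V_K = 1 so that the K weights sum to one."""
    V = rng.beta(1.0, alpha, size=K)
    V[-1] = 1.0  # truncation: the last break takes the remaining stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - V[:-1])))
    return V * remaining
```

Smaller values of the precision α concentrate the weights on the first few components, while larger values spread mass across many components, which is what makes the Gamma prior on α a prior on the effective number of components.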
The posterior for z is multinomial with probabilities

p(z_i = k \mid X^R, X^T, \mu, \Sigma) \propto \pi_k f(x_i \mid z_i = k, \mu_k, \Sigma_k)

for both subsamples. The π_k's can be realized through a set of stick-breaking weights V (see Ishwaran and James, 2002), such that, given the random subsample,

V_k \mid X^R, z \sim \text{Beta}(\gamma_1, \gamma_2), \quad \gamma_1 = 1 + n_k, \quad \gamma_2 = \alpha + \sum_{l=k+1}^{K} n_l,

with V_K = 1. The posterior distribution of π given both the random and targeted subsamples is given by

p(\pi \mid X^R, X^T, z, \mu, \Sigma) \propto p(\pi \mid X^R, z^R, \mu, \Sigma) \prod_{k=1}^{K} \tilde\pi_k^{n_k^T}, \quad \tilde\pi_k = \frac{\pi_k N\big(\mu_k \mid m, ((\tau S)^{-1} + \Sigma_k^{-1})^{-1}\big)}{\sum_{j=1}^{K} \pi_j N\big(\mu_j \mid m, ((\tau S)^{-1} + \Sigma_j^{-1})^{-1}\big)}. \quad (A1)

The contribution of the targeted subsample to the posterior distribution for π provides little additional information about the distribution of π when τS is small. The posterior for α depends on the data only through V and thus has the usual posterior distribution

\alpha \sim \text{Gamma}\Big(\eta_1 + K - 1,\; \eta_2 - \sum_{k=1}^{K-1} \log(1 - V_k)\Big).

The posterior for µ_k can be calculated exactly as µ_k | X, z, Σ_k ~ N(m_k^µ, S_k^µ),
where

S_k^\mu = \Sigma_k \Big( (1/t_0 + n_k^R)\, I + n_k^T \big((\tau S)^{-1} \Sigma_k + I\big)^{-1} \Big)^{-1},

m_k^\mu = S_k^\mu \Big( n_k \Sigma_k^{-1} \bar{x}_k - n_k^T \big((\tau S)^{-1} \Sigma_k + I\big)^{-1} (\tau S)^{-1} m + \Sigma_k^{-1}\mu_0/t_0 \Big), \quad (A2)

where n_k is the total number of data points in component k and n_k^T is the number of data points in that component coming from the targeted subsample. Notice that the contribution of the targeted subsample to the posterior precision of µ_k is n_k^T ((τS)^{-1}Σ_k + I)^{-1}, and since S is an estimate of Σ_k, this quantity is of the order n_k^T τ/(τ + 1), implying that the narrower the weight function, the less information about µ_k is available, which is intuitive. The posterior for Σ does not follow an Inverse Wishart distribution, and has the form

p(\Sigma_k \mid X, z, \mu_k) \propto |\Sigma_k|^{-s_0}\, |\Sigma_k|^{-n_k^R/2}\, \big|(\tau S)^{-1} + \Sigma_k^{-1}\big|^{n_k^T/2} \exp\Big\{ -\tfrac{1}{2}\mathrm{tr}(S_0 \Sigma_k^{-1}) - \tfrac{1}{2}\sum_{i=1}^{n_k} x_i^T \Sigma_k^{-1} x_i + n_k^R\, \mu_k^T \Sigma_k^{-1} \bar{x}_k^R + n_k^T\, \mu_k^T \Sigma_k^{-1} \bar{x}_k^T - \tfrac{n_k^R}{2}\, \mu_k^T \Sigma_k^{-1} \mu_k - \tfrac{n_k^T}{2}\, \mu_k^T \big((\tau S)^{-1}\Sigma_k + I\big)^{-1} \Sigma_k^{-1} \mu_k - n_k^T\, \mu_k^T \big((\tau S)^{-1}\Sigma_k + I\big)^{-1} (\tau S)^{-1} m - \tfrac{n_k^T}{2}\, m^T \big(\Sigma_k^{-1}\tau S + I\big)^{-1} (\tau S)^{-1} m \Big\}. \quad (A3)

C Weight functions

In both the Markov chain Monte Carlo and Sequential Monte Carlo approaches described above, the targeted sample was weighted proportionally to N(x_i | m, τS), where m and S are estimates of the mean and covariance of the low-probability component K. The multiplicative constant τ works as a tuning parameter. A larger value allows for wider dispersal of the targeted subsample, accounting for uncertainty in the initial estimate of φ_K. As τ decreases, the weights w_i in the targeted sample become heavily concentrated around a small number of data points. As a result, the assumption of an infinite number of points with non-negligible weight becomes invalid. If our initial estimate of µ_K, Σ_K is poor, a small τ will restrict the targeted sample to a region away from the full low-probability region of interest. Within the context of the Metropolis-Hastings updates,
Within the context of the Metropolis-Hastings updates, as \tau increases, the acceptance rate for (\mu, \Sigma) increases, since the targeted sample looks more like the random sample; the posterior distribution of \phi_K is then not pulled too far from the proposed values. At the same time, as \tau increases, the acceptance rate for \pi decreases, because the information about \pi carried by the targeted sample becomes significant, and the proposed values (which are based only on the random subsample) may be poor.

Consider the one-dimensional case where w(x_i) \propto N(x_i \mid m, \tau S) with p = 1, and assume that \mu, \Sigma, \pi are all known and that there is an infinite number of data points. The weight function becomes w(x_i) \propto N(x_i \mid \mu_K, \tau \Sigma_K), and the coefficient \tau may be chosen such that the probability of drawing data points from the low-probability component is maximized.

Figure 4: Example in one dimension. The blue curve represents the mixture f(x \mid \pi, \mu, \Sigma) and the red line the density of the low-probability component N(x \mid \mu_K, \Sigma_K). The black curve represents the weight function N(x \mid \mu_K, \tau \Sigma_K), and the green curve the density of the targeted sample. Ideally we want the common area under the green and red curves to be maximized.

Considering the overlap between the distribution of the targeted subsample and the low-probability component, we plot the common area for varying \tau, obtaining the graph shown in Figure 5.

Figure 5: Example of S(\tau) for several values of (\mu_K, \pi_K), using a numerical approximation of the integral to calculate the common area.

As seen from Figure 5, the value of \tau that maximizes the overlap between the low-probability component and the targeted subsample varies: the closer the remaining components are to the component of interest (and, similarly, the larger their variance), the lower the optimal \tau, and the same happens as the weight of the component of interest decreases. Combining these results with the fact that a large \tau improves the acceptance rate for (\mu, \Sigma) but reduces the acceptance rate for \pi, and taking into account the uncertainty in S = \hat{\Sigma}_K, it is apparent that the optimal coefficient \tau is not simply 1, and plays a significant role affecting many levels of the analysis.

Acknowledgements

Research was partially supported by grants to Duke University from the NSF (DMS ) and the National Institutes of Health (grant P50GM and contract HHSN C). Aspects of the research were also partially supported by the NSF grant DMS to the Statistical
and Applied Mathematical Sciences Institute. Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the NSF or NIH.

References

S. Balakrishnan and D. Madigan. A one-pass sequential Monte Carlo method for Bayesian analysis of massive datasets. Bayesian Analysis, 1(2).

C. Chan, F. Feng, J. Ottinger, D. Foster, M. West, and T. Kepler. Statistical mixture modeling for cell subtype identification in flow cytometry. Cytometry A, 73.

A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo Methods in Practice. Springer.

H. Ishwaran and L. James. Approximate Dirichlet process computing in finite normal mixtures: Smoothing and prior information. Journal of Computational and Graphical Statistics, 11.

J. Liu and M. West. Combined parameter and state estimation in simulation-based filtering. In A. Doucet, J. F. G. de Freitas, and N. J. Gordon, editors, Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York.

H. F. Lopes, N. G. Polson, and M. Taddy. Particle learning for general mixtures. Submitted.

S. N. MacEachern. Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics, 7(2).

S. N. MacEachern, M. Clyde, and J. S. Liu. Sequential importance sampling for nonparametric Bayes models: The next generation. The Canadian Journal of Statistics, 27(2).

P. Muller, A. Erkanli, and M. West. Bayesian curve fitting using multivariate normal mixtures. Biometrika, 83(1):67.

G. Ridgeway and D. Madigan. Bayesian analysis of massive datasets via particle filters. In KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 5-13, New York, NY, USA. ACM.

R. A. Seder, P. A. Darrah, and M. Roederer. T-cell quality in memory and protection: implications for vaccine design. Nature Reviews Immunology, 8(4).

M. West. Discovery sampling and selection models. In Decision Theory and Related Topics.

M. West. Inference in successive sampling discovery models. Journal of Econometrics, 75(1).

M. West and P. J. Harrison. Bayesian Forecasting and Dynamic Models. Springer-Verlag, New York.