Network analysis with the W-graph model


1 Network analysis with the W-graph model (via the Stochastic Block Model)
S. Robin, joint work with P. Latouche and S. Ouadah (INRA / AgroParisTech)
IMS, June 2015, Singapore

2 Outline
1. Modeling heterogeneity in interaction networks
2. Statistical inference of latent space models (focus on SBM)
3. From SBM to W-graph: Averaging models
4. Goodness-of-fit

3 Modeling heterogeneity in (biological) interaction networks

5 Heterogeneity in biological networks
Biological networks describe interactions between entities: genes, proteins, individuals, species, ...
Observed networks display heterogeneous topologies, which one would like to decipher and better understand.
[Figures: dolphin social network; H. pylori PPI network. [Newman and Girvan (2004)]]

6 Heterogeneity in biological networks
Heterogeneous means not homogeneous, that is, different from an Erdős-Rényi (ER) graph.
Erdős-Rényi random graph $G(n, p)$: consider $n$ nodes; node pairs $1 \le i < j \le n$ are independently connected with the same probability $p$: $(Y_{ij})$ iid, $Y_{ij} \sim \mathcal{B}(p)$.
Very intensively studied.
Fits very few real-life networks.
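
As a point of reference, the ER model can be simulated in a few lines. The sketch below is illustrative only: the function names `sample_er_graph` and `degrees` and the values n = 200, p = 0.1 are assumptions, not taken from the talk.

```python
import random


def sample_er_graph(n, p, seed=None):
    """Sample the edge set of an Erdős-Rényi graph G(n, p) as pairs (i, j), i < j."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is connected independently with the same probability p.
            if rng.random() < p:
                edges.add((i, j))
    return edges


def degrees(n, edges):
    """Degree of each node, computed from the edge set."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


# Illustrative values (assumed, not from the slides).
edges = sample_er_graph(200, 0.1, seed=0)
deg = degrees(200, edges)
mean_degree = sum(deg) / len(deg)
```

Under $G(n, p)$ every node has the same expected degree $(n - 1)p$ (19.9 for these values), which is the homogeneity that observed networks typically violate.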

9 Latent space models
Latent variables make it possible to capture some underlying structure of a network (see the review [Matias and R. (2014)]).
General setting for binary graphs [Bollobás et al. (2007)]:
A latent (unobserved) variable $Z_i$ is associated with each node: $\{Z_i\}$ iid $\sim \pi$.
Edges $Y_{ij} = \mathbb{I}\{i \sim j\}$ are independent conditionally on the $Z_i$'s: $\{Y_{ij}\}$ independent $\mid \{Z_i\}$, with $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
We focus here on model-based approaches, in contrast with, e.g., graph clustering [Girvan and Newman (2002)], [Newman (2004)] or spectral clustering [von Luxburg et al. (2008)].

14 Latent space models
State-space model: principle.
Consider $n$ nodes ($i = 1..n$).
$Z_i$ = unobserved position of node $i$, e.g. $\{Z_i\}$ iid $\sim \mathcal{N}(0, I)$.
Edges $\{Y_{ij}\}$ are independent given $\{Z_i\}$, e.g. $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.

16 A variety of state-space models
Latent position models:
[Hoff et al. (2002)]: $Z_i \in \mathbb{R}^d$, $\operatorname{logit} \gamma(z, z') = a - \|z - z'\|$
[Handcock et al. (2007)]: $Z_i \sim \sum_k p_k \mathcal{N}_d(\mu_k, \sigma_k^2 I)$
[Daudin et al. (2010)]: $Z_i \in \mathcal{S}_K$, $\gamma(z, z') = \sum_{k, l} z_k z'_l \gamma_{kl}$
In this talk, we focus on the Stochastic Block Model (SBM) and the W-graph model (and its associated graphon).

21 Stochastic Block Model (SBM)
A mixture model for random graphs [Nowicki and Snijders (2001)].
Consider $n$ nodes ($i = 1..n$).
$Z_i$ = unobserved label of node $i$: $\pi = (\pi_1, \dots, \pi_K)$; $\{Z_i\}$ iid $\sim \mathcal{M}(1; \pi)$.
Edge $Y_{ij}$ depends on the labels: $\{Y_{ij}\}$ independent given $\{Z_i\}$, $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
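
The generative description of the SBM translates directly into a sampler. The following is a minimal sketch, assuming labels coded as 0..K-1 and `gamma` stored as a K x K nested list; the name `sample_sbm` and the two-community parameter values are hypothetical and chosen for illustration only.

```python
import random


def sample_sbm(n, pi, gamma, seed=None):
    """Sample labels Z and a symmetric binary adjacency matrix Y from an SBM."""
    rng = random.Random(seed)
    K = len(pi)
    # Z_i iid ~ M(1; pi): one label in {0, ..., K-1} per node.
    z = rng.choices(range(K), weights=pi, k=n)
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Given the labels, Y_ij ~ Bernoulli(gamma[Z_i][Z_j]), independently.
            if rng.random() < gamma[z[i]][z[j]]:
                y[i][j] = y[j][i] = 1
    return z, y


# Hypothetical two-community example (values are assumptions).
pi = [0.4, 0.6]
gamma = [[0.8, 0.05],
         [0.05, 0.5]]
z, y = sample_sbm(30, pi, gamma, seed=1)
```

Only `y` would be observed in practice; `z` is returned here so that a simulation study can compare inferred labels with the true ones.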

22 W-graph model
Latent variables: $(Z_i)$ iid $\sim \mathcal{U}[0, 1]$.
Graphon function $\gamma$: $\gamma(z, z') : [0, 1]^2 \to [0, 1]$.
Edges: $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
[Figure: graphon function $\gamma(z, z')$.]
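
Sampling from a W-graph follows the same pattern as for the SBM, with uniform latent positions and an arbitrary graphon. This is a sketch under stated assumptions: `sample_w_graph` is a hypothetical helper, and `example_graphon` is an invented symmetric function used only to have something to sample from.

```python
import random


def sample_w_graph(n, graphon, seed=None):
    """Sample latent positions U_i ~ U[0, 1] and a binary graph with
    Pr{Y_ij = 1} = graphon(U_i, U_j)."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < graphon(u[i], u[j]):
                y[i][j] = y[j][i] = 1
    return u, y


def example_graphon(a, b):
    # Hypothetical graphon (an assumption, not from the talk): nodes with
    # latent positions close to 0 connect with higher probability.
    return 0.9 * (1 - a) * (1 - b)


u, y = sample_w_graph(40, example_graphon, seed=2)
```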

23 Interpreting the graphon function
The graphon function provides a global picture of the network's topology.
[Figure panels: Scale free; Community; Small world.]

25 A few words about the W-graph
Probabilistic point of view.
W-graphs have mostly been studied in the probability literature: [Lovász and Szegedy (2006)], [Diaconis and Janson (2008)].
Motif (sub-graph) frequencies are invariant characteristics of a W-graph.
The intrinsic non-identifiability of the graphon function $\gamma$ is often overcome by imposing that $u \mapsto \int \gamma(u, v) \, dv$ is monotonically increasing.
Statistical point of view.
Not much attention had been paid to its inference until recently: [Airoldi et al. (2013)], [Chatterjee et al. (2014)], [Olhede and Wolfe (2014)], ...
SBM can be used as a proxy for the W-graph.

28 Some generalizations of latent space graph models
Latent space models can be extended in various directions.
Weighted or directed networks. Edges may have values: count, real, $\{0, +, -, \pm\}$, ... The latent space model can be adapted as $Y_{ij} \mid Z_i, Z_j \sim F(\gamma(Z_i, Z_j))$, where $F$ can be any distribution: Poisson, normal, multinomial, etc.
Accounting for covariates. Latent space models can also accommodate covariates via a regression term: $Y_{ij} \mid Z_i, Z_j \sim F(\gamma(Z_i, Z_j) + x_{ij}^\top \beta)$, where $x_{ij} = (x_{ij}^1, \dots, x_{ij}^d)$.

29 Statistical inference of latent space models

31 Incomplete data models
Aim. Based on the observed network $Y = (Y_{ij})$, one typically wants to infer the parameters $\theta = (\pi, \gamma)$ and the hidden states $Z = (Z_i)$.
State-space models belong to the class of incomplete data models: the edges $(Y_{ij})$ are observed, while the latent positions (or statuses) $(Z_i)$ are not, and neither are the parameters.

34 Frequentist or Bayesian inference
Frequentist inference. $\theta$ is fixed and $Z$ is random. The aim is then to provide an estimate $\hat{\theta}$ of $\theta$ and the conditional distribution $P_\theta(Z \mid Y)$ (for classification purposes and as a by-product of the inference).
Bayesian inference. Both $\theta$ and $Z$ are random. The aim is then to provide the joint conditional distribution $P(\theta, Z \mid Y)$.
Whatever the approach, we have to deal with conditional distributions: $P_\theta(Z \mid Y)$ or $P(\theta, Z \mid Y)$.

40 Conditional distributions (1/2)
Graphical models describe the conditional independences between the random variables of a model [Lauritzen (1996)].
Frequentist setting: the $Z_i$'s are iid; each edge is drawn from $P(Y_{ij} \mid Z_i, Z_j)$; conditioning on $Y$ makes $Z_i$ and $Z_j$ dependent, $P(Z_i, Z_j \mid Y)$ (graph moralization); this holds for each pair $(i, j)$.
Conditional distribution. The dependency graph of $Z$ given $Y$ is a clique. No factorization can be hoped for (unlike for HMMs). $P_\theta(Z \mid Y)$ cannot be computed (efficiently).

46 Conditional distributions (2/2)
Bayesian perspective. Things get worse because $\theta = (\pi, \gamma)$ is also random.
Model: $P(\theta)$, $P(Z \mid \pi)$, $P(Y \mid \gamma, Z)$.
$P(\theta, Z \mid Y)$ is even more involved.
Both frequentist and Bayesian inference require the calculation of conditional distributions that cannot be computed. Either sampling (MCMC) or approximation is required.

47 Variational (Bayes) inference
Variational approximations aim at replacing an intractable exact distribution $P$ with a tractable approximate distribution $\tilde{P}$. Typically:
$P_\theta(Z \mid Y) \approx \prod_i \tilde{P}_{\theta, Y}(Z_i)$
$P(\theta, Z \mid Y) \approx \tilde{P}_Y(\theta) \, \tilde{P}_Y(Z)$
$P(\theta, Z \mid Y) \approx \tilde{P}_Y(\theta) \prod_i \tilde{P}_Y(Z_i)$
Popular strategy: minimize the Kullback-Leibler divergence between $\tilde{P}$ and $P$:
$\min_{\tilde{P}} KL[\tilde{P}(Z); P_\theta(Z \mid Y)]$ or $\min_{\tilde{P}} KL[\tilde{P}(\theta, Z); P(\theta, Z \mid Y)]$
Variational EM (VEM) algorithm [Wainwright and Jordan (2008)].
Variational Bayes EM (VBEM) algorithm [Beal and Ghahramani (2003)].
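
To make the factorized approximation $\prod_i \tilde{P}(Z_i)$ concrete, here is a minimal sketch of one VEM iteration for a binary, undirected SBM, using the usual mean-field updates. It is an illustration under stated assumptions, not the implementation used in the cited references: `vem_step`, the variational probabilities `tau[i][k]` (standing for $\tilde{P}(Z_i = k)$), the clipping constant `eps` and the toy two-clique graph are all hypothetical.

```python
import math


def vem_step(y, tau, eps=1e-10):
    """One VEM iteration: M-step for (pi, gamma), then a synchronous
    fixed-point update of tau (the mean-field E-step)."""
    n, K = len(y), len(tau[0])
    # M-step: proportions and connection probabilities given tau.
    pi = [sum(tau[i][k] for i in range(n)) / n for k in range(K)]
    gamma = [[0.5] * K for _ in range(K)]
    for k in range(K):
        for l in range(K):
            num = den = 0.0
            for i in range(n):
                for j in range(n):
                    if i != j:
                        w = tau[i][k] * tau[j][l]
                        num += w * y[i][j]
                        den += w
            if den > 0:
                # Clip away from 0 and 1 so that the logs below stay finite.
                gamma[k][l] = min(max(num / den, eps), 1 - eps)
    # E-step: tau_ik proportional to
    # pi_k * prod_{j != i} prod_l Bernoulli(y_ij; gamma_kl) ** tau_jl.
    new_tau = []
    for i in range(n):
        logw = []
        for k in range(K):
            s = math.log(max(pi[k], eps))
            for j in range(n):
                if j == i:
                    continue
                for l in range(K):
                    g = gamma[k][l]
                    s += tau[j][l] * (y[i][j] * math.log(g)
                                      + (1 - y[i][j]) * math.log(1 - g))
            logw.append(s)
        m = max(logw)  # subtract the max before exponentiating, for stability
        w = [math.exp(v - m) for v in logw]
        total = sum(w)
        new_tau.append([v / total for v in w])
    return pi, gamma, new_tau


# Toy graph (assumed): two disjoint cliques of 4 nodes each.
n = 8
y = [[int(i != j and (i < 4) == (j < 4)) for j in range(n)] for i in range(n)]
# Initialization close to the true partition (an assumption; in practice
# the starting point matters and several initializations are usually tried).
tau = [[0.9, 0.1] if i < 4 else [0.1, 0.9] for i in range(n)]
for _ in range(5):
    pi, gamma, tau = vem_step(y, tau)
```

The cost per iteration is of order $n^2 K^2$, which is what makes the approximation usable where the exact $P_\theta(Z \mid Y)$ is not.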

50 VBEM inference for SBM: E. coli's operon network
[Figures: meta-graph representation; parameter estimates, $K = 5$.] [Picard et al. (2009)]

52 Accuracy of VBEM estimates for SBM: Simulation study
[Figures: posterior credibility intervals, and width of the posterior credibility intervals, for $\pi_1$, $\gamma_{11}$, $\gamma_{12}$, $\gamma_{22}$.] [Gazal et al. (2012)]

54 First half summary
Latent space graph models are useful to describe network heterogeneity.
Their statistical inference raises some specific issues.
Variational approximations help to circumvent these issues.
And also:
Theoretical justifications of these approximations exist for SBM: [Celisse et al. (2012)], [Mariadassou and Matias (2014)].
VEM and VBEM algorithms have been specifically developed for SBM: [Daudin et al. (2008)], [Latouche et al. (2012)].
Model selection (choice of $K$) has also been addressed: [same refs as above].

55 From SBM to W-graph: Averaging models

56 SBM as a W-graph model
Latent variables: $(Z_i)$ iid $\sim \mathcal{M}(1, \pi)$.
Blockwise constant graphon: $\gamma(z, z') = \gamma_{kl}$.
Edges: $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
[Figure: graphon function $\gamma_K^{SBM}(z, z')$; block widths = $\pi_k$, block heights = $\gamma_{kl}$.]
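
The blockwise constant graphon of an SBM can be written as an ordinary function on $[0, 1]^2$ by cutting the unit interval into consecutive blocks of widths $\pi_k$. A small sketch, with `sbm_graphon` and the parameter values being assumptions made for illustration:

```python
from bisect import bisect_right
from itertools import accumulate


def sbm_graphon(pi, gamma):
    """Return the blockwise constant graphon (u, v) -> gamma[k][l], where
    k and l are the blocks of widths pi[k] containing u and v."""
    cuts = list(accumulate(pi))[:-1]  # interior block boundaries in (0, 1)

    def block(u):
        # Index of the block containing u (boundaries go to the right block).
        return bisect_right(cuts, u)

    def graphon(u, v):
        return gamma[block(u)][block(v)]

    return graphon


# Hypothetical parameters: block widths 0.3 and 0.7.
g = sbm_graphon([0.3, 0.7], [[0.8, 0.1],
                             [0.1, 0.4]])
```

For these values, `g(0.1, 0.2)` falls in block (0, 0) and `g(0.1, 0.9)` in block (0, 1). Permuting the blocks gives a different function but the same distribution for the graph, which is one face of the identifiability issue of the graphon.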

57 Variational Bayes estimation of $\gamma(z, z')$
VBEM inference provides the approximate posteriors: $(\pi \mid Y) \approx \text{Dir}(\tilde{\pi})$, $(\gamma_{kl} \mid Y) \approx \text{Beta}(\tilde{\gamma}^0_{kl}, \tilde{\gamma}^1_{kl})$.
Estimate of $\gamma(u, v)$. Due to the uncertainty on the $\pi_k$, the posterior mean of $\gamma_K^{SBM}$ is smooth (explicit integration using [Gouda and Szántai (2010)]).
[Figure: posterior mean $\tilde{E}(\gamma_K^{SBM}(z, z') \mid Y, K)$.]

60 Bayesian model averaging
Bayesian model averaging (BMA). Consider a series of models $1, \dots, K, \dots$ in which a certain function of the parameter, $f(\theta)$, can always be defined.
Bayesian inference within each model $K$ provides the posterior $P(\theta \mid K, Y)$, and hence $P(f(\theta) \mid K, Y)$.
BMA [Hoeting et al. (1999)] relies on the marginal posterior of $f(\theta)$: $P(f(\theta) \mid Y) = \sum_K P(K \mid Y) \, P(f(\theta) \mid K, Y)$.

62 Variational Bayes model averaging
Pushing it further: consider the model $K$ as an additional hidden variable:
$P(Z, \theta, K \mid Y) \approx \tilde{P}(Z, \theta, K) := \tilde{P}(Z \mid K) \, \tilde{P}(\theta \mid K) \, \tilde{P}(K)$
Note that no additional independence assumption is needed.
Variational Bayes model averaging (VBMA). The optimal approximation of $P(K \mid Y)$, in terms of Kullback-Leibler divergence, satisfies [Volant et al. (2012)]:
$\tilde{P}(K) \propto P(K) \, e^{\log P(Y \mid K) - KL(K)} \propto P(K \mid Y) \, e^{-KL(K)}$
where $KL(K) = KL[\tilde{P}(Z, \theta \mid K); P(Z, \theta \mid Y, K)]$.
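
In practice the exponent $\log P(Y \mid K) - KL(K)$ is the variational lower bound of $\log P(Y \mid K)$ returned by a VBEM fit, so the weights $\tilde{P}(K)$ reduce to a normalized exponential of (log prior + lower bound). A minimal sketch, where `vbma_weights` and the numerical lower bounds are hypothetical:

```python
import math


def vbma_weights(lower_bounds, log_prior=None):
    """Weights proportional to P(K) * exp(J_K), where J_K stands for
    log P(Y | K) - KL(K); computed in log space to avoid underflow."""
    if log_prior is None:
        # Uniform prior over the candidate models (up to a constant).
        log_prior = [0.0] * len(lower_bounds)
    scores = [lp + jk for lp, jk in zip(log_prior, lower_bounds)]
    m = max(scores)
    unnorm = [math.exp(s - m) for s in scores]
    total = sum(unnorm)
    return [w / total for w in unnorm]


# Hypothetical lower bounds for K = 1, 2, 3 (illustrative numbers only).
weights = vbma_weights([-1012.3, -1000.0, -1001.0])
```

Subtracting the maximum score before exponentiating matters here: lower bounds are typically large negative numbers, and a direct `exp` would underflow to zero for every model.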

64 Inferring the graphon function
Model averaging: there is no true $K$ in the W-graph model.
Apply the VBMA recipe to $\gamma(z, z')$. For $K = 1..K_{\max}$, fit an SBM via VBEM and compute
$\tilde{\gamma}_K^{SBM}(z, z') = \tilde{E}[\gamma_{C(z), C(z')} \mid Y, K]$.
Then perform model averaging as
$\tilde{\gamma}(z, z') = \tilde{E}[\gamma_{C(z), C(z')} \mid Y] = \sum_K \tilde{P}(K) \, \tilde{\gamma}_K^{SBM}(z, z')$
[Latouche and R. (2013)].
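
The averaging step itself is a pointwise weighted sum of the per-model graphon estimates. The sketch below assumes each estimate is available as a Python function of $(z, z')$ and that the weights already sum to one; `average_graphons` and the two toy estimates are hypothetical, and the toy estimates are plain blockwise constant functions rather than the smooth posterior means described in the talk.

```python
def average_graphons(weights, graphons):
    """Return the function (z, z') -> sum_K weights[K] * graphons[K](z, z')."""
    def averaged(u, v):
        return sum(w * g(u, v) for w, g in zip(weights, graphons))
    return averaged


def graphon_k1(u, v):
    # Toy K = 1 estimate: constant connection probability.
    return 0.2


def graphon_k2(u, v):
    # Toy K = 2 estimate: two blocks split at 0.5, denser within blocks.
    return 0.6 if (u < 0.5) == (v < 0.5) else 0.1


# Hypothetical model weights for K = 1 and K = 2.
gamma_avg = average_graphons([0.25, 0.75], [graphon_k1, graphon_k2])
```

Because the block boundaries differ across values of $K$, the averaged function has more steps than any single SBM graphon, which is how a series of blockwise constant fits can approximate a smoother graphon.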

66 PPI network
Like many PPI networks, E. coli's network is highly concentrated around a few nodes.

68 Ecological network between fungal species
Link between two fungi if they are observed on one common host.

70 Brain network
Links = connections between areas of the macaque's cortex.

72 Blog network (non-biological)
Links = connections between French political blogs.

73 Goodness-of-fit

74 Motifs frequency
Network motifs have a biological (or sociological) interpretation in terms of building blocks of the global network.
Triangles = friends of my friends are my friends.
Latent space graph models only describe binary interactions, conditional on the latent positions.
Goodness-of-fit criterion based on motif frequencies?

75 Moments of motif counts
Moments under SBM: the first moments $E N(m)$, $V N(m)$ of the count are known for exchangeable graph models (incl. SBM) [Picard et al. (2008)]:
$E_{SBM} N(m) \propto \mu_{SBM}(m) =: f(\theta_{SBM})$
where $\mu_{SBM}(m)$ is the motif occurrence probability under SBM.
Moments under W-graph: the motif probability under the W-graph can be estimated as
$\tilde{\mu}(m) = \sum_K \tilde{P}(K) \, \tilde{E}(\mu_{SBM}(m) \mid Y, K)$
Estimates of $E_W N(m)$ and $V_W N(m)$ can be derived accordingly [Latouche and R. (2013)].
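
For the triangle motif, the occurrence probability under an SBM has a simple closed form, $\mu_{SBM} = \sum_{k, l, m} \pi_k \pi_l \pi_m \gamma_{kl} \gamma_{lm} \gamma_{km}$, and the expected count is $\binom{n}{3}$ times that probability. The sketch below compares an observed triangle count with this expectation; the function names and the parameter values are assumptions, and the variance (needed for a proper goodness-of-fit assessment) is not computed here.

```python
from itertools import combinations, product
from math import comb


def count_triangles(y):
    """Number of triangles in a symmetric binary adjacency matrix."""
    n = len(y)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if y[i][j] and y[j][k] and y[i][k])


def triangle_probability(pi, gamma):
    """Probability that three distinct nodes form a triangle under an SBM,
    obtained by summing over the labels of the three nodes."""
    K = len(pi)
    return sum(pi[k] * pi[l] * pi[m]
               * gamma[k][l] * gamma[l][m] * gamma[k][m]
               for k, l, m in product(range(K), repeat=3))


def expected_triangles(n, pi, gamma):
    """Expected triangle count: number of node triples times the probability."""
    return comb(n, 3) * triangle_probability(pi, gamma)


# Hypothetical two-block parameters and a complete graph on 4 nodes.
pi = [0.5, 0.5]
gamma = [[0.8, 0.1],
         [0.1, 0.8]]
y = [[int(i != j) for j in range(4)] for i in range(4)]
observed = count_triangles(y)
expected = expected_triangles(4, pi, gamma)
```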

76 Network frequencies in the blog network
[Table: for each motif, observed count, mean and standard deviation (each $\times 10^3$).]
No specific structure seems to be exceptional with respect to the model's expectations.

77 Goodness-of-fit. Residual graphon. Covariates: tree interaction (valued) network
Data: $n = 51$ tree species, $Y_{ij}$ = number of shared parasites [Vacher et al. (2008)].
SBM: given $Z_i = k$, $Z_j = l$, $Y_{ij} \sim \mathcal{P}(e^{\gamma_{kl}})$, where $\gamma_{kl}$ is the log-mean number of shared parasites.
Results: ICL selects $K = 7$ groups that are partly related to phyla.
[Table: estimated $e^{\gamma_{kl}}$ for groups T1 to T7, with group proportions $\pi_k$; numerical values missing from the transcription.]
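To make the Poisson SBM concrete, the following Python sketch evaluates the log-likelihood of a symmetric count matrix when the group labels are treated as known. This is only the complete-data part of the model; the variational inference and the ICL criterion mentioned on the slide are not reproduced here. The function name and argument layout are assumptions.

```python
from math import exp, lgamma


def poisson_sbm_loglik(y, z, gamma):
    """Log-likelihood of counts y given labels z, with Y_ij ~ Poisson(exp(gamma[z_i][z_j])).

    y is a symmetric n x n matrix of non-negative integer counts, z a list of
    group indices, gamma a K x K matrix of log-means. Only pairs i < j are used.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            log_mean = gamma[z[i]][z[j]]
            # Poisson log-density: y * log(mean) - mean - log(y!)
            total += y[i][j] * log_mean - exp(log_mean) - lgamma(y[i][j] + 1)
    return total
```

Working on the log scale for `gamma` mirrors the slide's parametrisation and makes it straightforward to add a covariate term to the log-mean.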

82 Goodness-of-fit. Residual graphon. Accounting for the taxonomic distance
Model: $x_{ij} = \text{distance}(i, j)$, $Y_{ij} \sim \mathcal{P}(e^{\gamma_{kl} + \beta x_{ij}})$ [Mariadassou et al. (2010)].
Results: $\beta$ = (value missing from the transcription); for $x = 3.82$, $e^{\beta x} = .298$. The mean number of shared parasites decreases with taxonomic distance.
[Table: estimated $e^{\lambda_{kl}}$ for groups T1 to T4, with group proportions $\pi_k$ and $\beta$; numerical values missing from the transcription.]
Groups are no longer associated with the phylogenetic structure.
SBM = residual heterogeneity of the regression.
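The value of $\beta$ did not survive the transcription, but the two figures that did allow a rough back-calculation. The sketch below assumes that the reported $e^{\beta x} = .298$ was evaluated at $x = 3.82$; under that assumption $\beta = \log(.298) / 3.82$, which is about $-0.317$. This is an inferred value, not one quoted from the slides.

```python
from math import exp, log

x_value = 3.82      # taxonomic distance reported on the slide
multiplier = 0.298  # reported value of exp(beta * x)

# Implied regression coefficient, assuming both figures refer to the same beta
beta_implied = log(multiplier) / x_value

# Multiplicative effect on the mean count for one extra unit of distance
effect_per_unit = exp(beta_implied)
```

A negative coefficient is consistent with the slide's statement that the mean number of shared parasites decreases with taxonomic distance.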

87 Goodness-of-fit. Residual graphon
A simple graph model with covariates. When edge covariates $x_{ij}$ are available, simply fit a logistic regression [Pattison and Robins (2007)]: $(Y_{ij})$ independent, $Y_{ij} \sim \mathcal{B}(p_{ij})$, $\text{logit}\, p_{ij} = x_{ij}^\top \beta$.
Introducing a residual term. To assess the fit of the model, simply add a residual graphon-like term: $(Z_i)$ iid $\sim \mathcal{U}[0, 1]$, $Y_{ij} \mid Z_i, Z_j \sim \mathcal{B}(p_{ij})$, $\text{logit}\, p_{ij} = x_{ij}^\top \beta + \gamma(Z_i, Z_j)$.
A VBEM algorithm can be designed to get $\tilde{P}(\beta, \theta, Z) \approx P(\beta, \theta, Z \mid Y)$: on-going work + [Jaakkola and Jordan (2000)].
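The generative side of the model with a residual term can be sketched in a few lines of Python. The code below only simulates a graph from given covariates, a coefficient vector and a residual function; it does not implement the VBEM inference, which the slide describes as on-going work. Function names and the data layout (a nested list `x[i][j]` of covariate vectors) are assumptions.

```python
import math
import random


def edge_probability(x_ij, beta, z_i, z_j, residual):
    """p_ij with logit p_ij = x_ij' beta + residual(z_i, z_j)."""
    eta = sum(x * b for x, b in zip(x_ij, beta)) + residual(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_graph(x, beta, residual, seed=0):
    """Draw latent positions Z_i ~ U[0, 1], then independent Bernoulli edges given Z."""
    rng = random.Random(seed)
    n = len(x)
    z = [rng.random() for _ in range(n)]
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_probability(x[i][j], beta, z[i], z[j], residual)
            y[i][j] = y[j][i] = int(rng.random() < p)
    return z, y
```

Passing a residual function that is identically zero recovers the plain logistic regression, which corresponds to a flat residual graphon: the covariates alone explain the network.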

89 Goodness-of-fit. Residual graphon. Tree network
Binary version: two tree species are linked if they host at least one common fungal parasite.
Regression: covariates = genetic distance, taxonomic distance, geographic distance.
The residual graphon is not flat: some heterogeneity remains.

91 Goodness-of-fit. Residual graphon. Blog network
Blog network: already shown.
Regression: covariates = same political party, pair includes a journalist.
The residual graphon is still not flat.

93 Conclusion & future work
Some conclusions.
The graphon provides a representation of the network topology.
It can be estimated using variational Bayes inference (R packages mixer and blockmodels).
It can be combined with covariates as a residual term.
Future work.
Formal goodness-of-fit test.
Quality of variational Bayes estimates in SBM with covariates.
Thank you for your attention.

Airoldi, E. M., Costa, T. B. and Chan, S. H. (2013). Stochastic blockmodel approximation of a graphon: Theory and consistent estimation. In Advances in Neural Information Processing Systems.
Beal, M. J. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. Bayes. Statist.
Bollobás, B., Janson, S. and Riordan, O. (2007). The phase transition in inhomogeneous random graphs. Rand. Struct. Algo. 31 (1).
Celisse, A., Daudin, J.-J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Statis.
Chatterjee, S. et al. (2014). Matrix estimation by universal singular value thresholding. The Annals of Statistics. 43 (1).
Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Stat. Comput. 18 (2).
Daudin, J.-J., Pierre, L. and Vacher, C. (2010). Model for heterogeneous random networks using continuous latent variables and an application to a tree fungus network. Biometrics. 66 (4).
Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. 7 (28).
Gazal, S., Daudin, J.-J. and Robin, S. (2012). Accuracy of variational estimates for random graph mixture models. Journal of Statistical Computation and Simulation. 82 (6).
Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA. 99 (12).
Gouda, A. and Szántai, T. (2010). On numerical calculation of probabilities according to Dirichlet distribution. Ann. Oper. Res.
Handcock, M., Raftery, A. and Tantrum, J. (2007). Model-based clustering for social networks. JRSSA. 170 (2).
Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science. 14 (4).
Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 (460).
Jaakkola, T. S. and Jordan, M. I. (2000). Bayesian parameter estimation via variational methods. Statistics and Computing. 10 (1).
Latouche, P., Birmelé, E. and Ambroise, C. (2012). Variational Bayesian inference and complexity control for stochastic block models. Statis. Model. 12 (1).
Latouche, P. and Robin, S. (2013). Bayesian model averaging of stochastic block models to estimate the graphon function and motif frequencies in a W-graph model. Technical report, arXiv.
Lauritzen, S. (1996). Graphical Models. Oxford Statistical Science Series. Clarendon Press.
Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. Journal of Combinatorial Theory, Series B. 96 (6).
von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Stat. 36 (2).
Mariadassou, M., Robin, S. and Vacher, C. (2010). Uncovering structure in valued graphs: a variational approach. Ann. Appl. Statist. 4 (2).
Mariadassou, M. and Matias, C. (2014). Convergence of the groups posterior distribution in latent or stochastic block models. Bernoulli. To appear.
Matias, C. and Robin, S. (2014). Modeling heterogeneity in random graphs through latent space models: a selective review. ESAIM: Proc.
Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E.
Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Phys. Rev. E. 69.
Nowicki, K. and Snijders, T. (2001). Estimation and prediction for stochastic block-structures. J. Amer. Statist. Assoc.
Olhede, S. C. and Wolfe, P. J. (2014). Network histograms and universality of blockmodel approximation. Proceedings of the National Academy of Sciences. 111 (41).
Pattison, P. E. and Robins, G. L. (2007). Handbook of Probability Theory with Applications, chapter Probabilistic Network Theory. Sage Publications.
Picard, F., Daudin, J.-J., Koskas, M., Schbath, S. and Robin, S. (2008). Assessing the exceptionality of network motifs. J. Comp. Biol. 15 (1).
Picard, F., Miele, V., Daudin, J.-J., Cottret, L. and Robin, S. (2009). Deciphering the connectivity structure of biological networks using mixnet. BMC Bioinformatics. Suppl 6, S17.
Vacher, C., Piou, D. and Desprez-Loustau, M.-L. (2008). Architecture of an antagonistic tree/fungus network: The asymmetric influence of past evolutionary history. PLoS ONE. 3 (3) e1740.
Volant, S., Magniette, M.-L. M. and Robin, S. (2012). Variational Bayes approach for model aggregation in unsupervised classification with Markovian dependency. Comput. Statis. & Data Analysis. 56 (8).
Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1 (1-2).


More information

Multiple Location Profiling for Users and Relationships from Social Network and Content

Multiple Location Profiling for Users and Relationships from Social Network and Content Multiple Location Profiling for Users and Relationships from Social Network and Content Rui Li ruili1@illinois.edu Shengjie Wang wang260@illinois.edu Kevin Chen-Chuan Chang, kcchang@illinois.edu Department

More information

Data Mining and Neural Networks in Stata

Data Mining and Neural Networks in Stata Data Mining and Neural Networks in Stata 2 nd Italian Stata Users Group Meeting Milano, 10 October 2005 Mario Lucchini e Maurizo Pisati Università di Milano-Bicocca mario.lucchini@unimib.it maurizio.pisati@unimib.it

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

Monte Carlo testing with Big Data

Monte Carlo testing with Big Data Monte Carlo testing with Big Data Patrick Rubin-Delanchy University of Bristol & Heilbronn Institute for Mathematical Research Joint work with: Axel Gandy (Imperial College London) with contributions from:

More information

Item selection by latent class-based methods: an application to nursing homes evaluation

Item selection by latent class-based methods: an application to nursing homes evaluation Item selection by latent class-based methods: an application to nursing homes evaluation Francesco Bartolucci, Giorgio E. Montanari, Silvia Pandolfi 1 Department of Economics, Finance and Statistics University

More information

Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data

Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data (Oxford) in collaboration with: Minjie Xu, Jun Zhu, Bo Zhang (Tsinghua) Balaji Lakshminarayanan (Gatsby) Bayesian

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov Data Integration Lectures 16 & 17 Lectures Outline Goals for Data Integration Homogeneous data integration time series data (Filkov et al. 2002) Heterogeneous data integration microarray + sequence microarray

More information

A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences

A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences Eric P. Xing, Michael I. Jordan, Richard M. Karp and Stuart Russell Computer Science Division University of California, Berkeley

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Bayesian Clustering for Email Campaign Detection

Bayesian Clustering for Email Campaign Detection Peter Haider haider@cs.uni-potsdam.de Tobias Scheffer scheffer@cs.uni-potsdam.de University of Potsdam, Department of Computer Science, August-Bebel-Strasse 89, 14482 Potsdam, Germany Abstract We discuss

More information

Machine Learning and Statistics: What s the Connection?

Machine Learning and Statistics: What s the Connection? Machine Learning and Statistics: What s the Connection? Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh, UK August 2006 Outline The roots of machine learning

More information

Publication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore

Publication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore Publication List Chen Zehua Department of Statistics & Applied Probability National University of Singapore Publications Journal Papers 1. Y. He and Z. Chen (2014). A sequential procedure for feature selection

More information

Bayesian networks - Time-series models - Apache Spark & Scala

Bayesian networks - Time-series models - Apache Spark & Scala Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly

More information

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Introduction to Support Vector Machines. Colin Campbell, Bristol University Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.

More information

On the shape of binary trees

On the shape of binary trees On the shape of binary trees Mireille Bousquet-Mélou, CNRS, LaBRI, Bordeaux ArXiv math.co/050266 ArXiv math.pr/0500322 (with Svante Janson) http://www.labri.fr/ bousquet A complete binary tree n internal

More information

PS 271B: Quantitative Methods II. Lecture Notes

PS 271B: Quantitative Methods II. Lecture Notes PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.

More information

INFERRING GENE DEPENDENCY NETWORKS FROM GENOMIC LONGITUDINAL DATA: A FUNCTIONAL DATA APPROACH

INFERRING GENE DEPENDENCY NETWORKS FROM GENOMIC LONGITUDINAL DATA: A FUNCTIONAL DATA APPROACH REVSTAT Statistical Journal Volume 4, Number 1, March 2006, 53 65 INFERRING GENE DEPENDENCY NETWORKS FROM GENOMIC LONGITUDINAL DATA: A FUNCTIONAL DATA APPROACH Authors: Rainer Opgen-Rhein Department of

More information

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations

More information

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October 17, 2015 Outline

More information

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0

More information

Towards running complex models on big data

Towards running complex models on big data Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation

More information

Cluster detection algorithm in neural networks

Cluster detection algorithm in neural networks Cluster detection algorithm in neural networks David Meunier and Hélène Paugam-Moisy Institute for Cognitive Science, UMR CNRS 5015 67, boulevard Pinel F-69675 BRON - France E-mail: {dmeunier,hpaugam}@isc.cnrs.fr

More information

Finding the M Most Probable Configurations Using Loopy Belief Propagation

Finding the M Most Probable Configurations Using Loopy Belief Propagation Finding the M Most Probable Configurations Using Loopy Belief Propagation Chen Yanover and Yair Weiss School of Computer Science and Engineering The Hebrew University of Jerusalem 91904 Jerusalem, Israel

More information

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian

More information

STATISTICS COURSES UNDERGRADUATE CERTIFICATE FACULTY. Explanation of Course Numbers. Bachelor's program. Master's programs.

STATISTICS COURSES UNDERGRADUATE CERTIFICATE FACULTY. Explanation of Course Numbers. Bachelor's program. Master's programs. STATISTICS Statistics is one of the natural, mathematical, and biomedical sciences programs in the Columbian College of Arts and Sciences. The curriculum emphasizes the important role of statistics as

More information

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

NETZCOPE - a tool to analyze and display complex R&D collaboration networks

NETZCOPE - a tool to analyze and display complex R&D collaboration networks The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.

More information