Network analysis with the W-graph model
1 Network analysis with the W-graph model (via the Stochastic Block Model)
S. Robin, joint work with P. Latouche and S. Ouadah
INRA / AgroParisTech
IMS, June 2015, Singapore
S. Robin, joint work with P. Latouche and S. Ouadah (INRA / AgroParisTech) | Network analysis using W-graphs | IMS, Singapore | 1 / 42
2 Outline
1. Modeling heterogeneity in interaction networks
2. Statistical inference of latent space models (focus on SBM)
3. From SBM to W-graph: averaging models
4. Goodness-of-fit
3 Modeling heterogeneity in (biological) interaction networks
5 Heterogeneity in biological networks
Biological networks describe interactions between entities: genes, proteins, individuals, species...
Observed networks display heterogeneous topologies that one would like to decipher and better understand.
[Figure: dolphin social network; H. pylori PPI network [Newman and Girvan (2004)]]
6 Heterogeneity in biological networks
Heterogeneous means not homogeneous, that is, different from an Erdős-Rényi (ER) graph.
Erdős-Rényi random graph G(n, p): consider n nodes; node pairs $1 \le i < j \le n$ are independently connected with the same probability p: $(Y_{ij})$ iid, $Y_{ij} \sim \mathcal{B}(p)$.
Very intensively studied, but fits very few real-life networks.
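As an illustration of the ER definition above, here is a minimal sketch (assuming NumPy; the function name `sample_er` is hypothetical) that draws the adjacency matrix of G(n, p) by flipping one independent Bernoulli(p) coin per pair i < j and then symmetrizing:

```python
import numpy as np

def sample_er(n, p, rng=None):
    """Draw a symmetric 0/1 adjacency matrix from G(n, p), without self-loops."""
    rng = np.random.default_rng(rng)
    # One independent Bernoulli(p) draw per node pair i < j (strict upper triangle).
    upper = np.triu(rng.random((n, n)) < p, k=1)
    # Mirror the upper triangle to obtain an undirected graph.
    return (upper | upper.T).astype(int)

Y = sample_er(200, 0.05, rng=1)
n = Y.shape[0]
# Each edge is counted twice in Y.sum(), hence the n * (n - 1) ordered pairs.
density = Y.sum() / (n * (n - 1))
```

Every degree is then Binomial(n - 1, p), which is one way to see why an ER graph cannot produce hubs or communities.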
9 Latent space models
Latent variables make it possible to capture some underlying structure of a network (see the review [Matias and R. (2014)]).
General setting for binary graphs [Bollobás et al. (2007)]:
A latent (unobserved) variable $Z_i$ is associated with each node: $\{Z_i\}$ iid $\sim \pi$.
Edges $Y_{ij} = \mathbb{I}\{i \sim j\}$ are independent conditionally on the $Z_i$'s: $\{Y_{ij}\}$ independent given $\{Z_i\}$, with $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
We focus here on model-based approaches, in contrast with, e.g., graph clustering [Girvan and Newman (2002)], [Newman (2004)] or spectral clustering [von Luxburg et al. (2008)].
14 Latent space models
State-space model: principle.
Consider n nodes (i = 1..n);
$Z_i$ = unobserved position of node i, e.g. $\{Z_i\}$ iid $\sim \mathcal{N}(0, I)$;
Edges $\{Y_{ij}\}$ are independent given $\{Z_i\}$, e.g. $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
16 A variety of state-space models
Latent position models:
[Hoff et al. (2002)]: $Z_i \in \mathbb{R}^d$, $\operatorname{logit} \gamma(z, z') = a - \|z - z'\|$;
[Handcock et al. (2007)]: $Z_i \sim \sum_k p_k \mathcal{N}_d(\mu_k, \sigma_k^2 I)$;
[Daudin et al. (2010)]: $Z_i \in \mathcal{S}_K$, $\gamma(z, z') = \sum_{k,l} z_k z'_l \gamma_{kl}$.
In this talk, we focus on the Stochastic Block Model (SBM) and the W-graph model (and its associated graphon).
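To make the latent distance model of [Hoff et al. (2002)] concrete, here is a minimal simulation sketch. It assumes NumPy, Gaussian positions $Z_i \sim \mathcal{N}(0, I_d)$ as in the state-space example, and an illustrative intercept a = 1; the function name and defaults are hypothetical, and this is not the authors' code.

```python
import numpy as np

def sample_latent_distance(n, d=2, a=1.0, rng=None):
    """Latent distance model: logit gamma(z, z') = a - ||z - z'||."""
    rng = np.random.default_rng(rng)
    Z = rng.standard_normal((n, d))                       # latent positions in R^d
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    prob = 1.0 / (1.0 + np.exp(-(a - dist)))              # inverse logit
    upper = np.triu(rng.random((n, n)) < prob, k=1)       # independent edges given Z
    Y = (upper | upper.T).astype(int)
    return Z, prob, Y

Z, prob, Y = sample_latent_distance(100, rng=0)
```

Nodes that are close in latent space are more likely to be connected, so this model naturally produces transitivity.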
21 Stochastic Block Model (SBM)
A mixture model for random graphs [Nowicki and Snijders (2001)]:
Consider n nodes (i = 1..n);
$Z_i$ = unobserved label of node i: $\pi = (\pi_1, \dots, \pi_K)$, $\{Z_i\}$ iid $\sim \mathcal{M}(1; \pi)$;
Edge $Y_{ij}$ depends on the labels: $\{Y_{ij}\}$ independent given $\{Z_i\}$, $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
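The two-stage SBM definition above can be simulated directly. A minimal sketch, assuming NumPy, a binary undirected network (so γ must be symmetric) and illustrative parameter values; `sample_sbm` is a hypothetical name:

```python
import numpy as np

def sample_sbm(n, pi, gamma, rng=None):
    """Binary undirected SBM: Z_i ~ M(1; pi), P(Y_ij = 1) = gamma[Z_i, Z_j]."""
    rng = np.random.default_rng(rng)
    pi, gamma = np.asarray(pi), np.asarray(gamma)
    z = rng.choice(len(pi), size=n, p=pi)           # hidden labels
    prob = gamma[z[:, None], z[None, :]]            # n x n edge probabilities
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return z, (upper | upper.T).astype(int)

pi = [0.3, 0.7]
gamma = [[0.8, 0.05], [0.05, 0.3]]                   # illustrative, symmetric
z, Y = sample_sbm(300, pi, gamma, rng=2)
```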
22 W-graph model
Latent variables: $(Z_i)$ iid $\sim \mathcal{U}[0, 1]$.
Graphon function $\gamma: [0, 1]^2 \to [0, 1]$, $(z, z') \mapsto \gamma(z, z')$.
Edges: $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
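A W-graph is simulated exactly like the models above, with uniform latent positions. A minimal sketch, assuming NumPy and a vectorized graphon; the example graphon $\gamma(u, v) = uv$ is a hypothetical choice for illustration (its degree function $u \mapsto \int uv\,dv = u/2$ is increasing):

```python
import numpy as np

def sample_wgraph(n, graphon, rng=None):
    """W-graph: Z_i ~ U[0, 1], P(Y_ij = 1) = graphon(Z_i, Z_j)."""
    rng = np.random.default_rng(rng)
    u = rng.random(n)                                   # latent positions in [0, 1]
    prob = graphon(u[:, None], u[None, :])              # broadcast to n x n
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return u, (upper | upper.T).astype(int)

def example_graphon(u, v):
    # Hypothetical graphon: nodes with a large latent position connect more often.
    return u * v

u, Y = sample_wgraph(500, example_graphon, rng=3)
degree = Y.sum(axis=1)
```

With this graphon, the expected degree of a node is (n - 1) u / 2, so degrees grow linearly with the latent position.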
23 Interpreting the graphon function
The graphon function provides a global picture of the network's topology.
[Figure: graphon functions of scale-free, community and small-world networks]
25 A few words about the W-graph
Probabilistic point of view. W-graphs have mostly been studied in the probability literature: [Lovász and Szegedy (2006)], [Diaconis and Janson (2008)].
Motif (sub-graph) frequencies are invariant characteristics of a W-graph.
The intrinsic non-identifiability of the graphon function γ is often overcome by imposing that $u \mapsto \int \gamma(u, v)\,dv$ is monotonically increasing.
Statistical point of view. Little attention had been paid to its inference until recently: [Airoldi et al. (2013)], [Chatterjee et al. (2014)], [Olhede and Wolfe (2014)],...
SBM can be used as a proxy for the W-graph.
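Since motif frequencies are invariant characteristics of a W-graph, they can be computed from γ alone. For triangles, the probability that three random nodes are all connected is $\int\!\!\int\!\!\int \gamma(u, v)\,\gamma(v, w)\,\gamma(u, w)\,du\,dv\,dw$. A minimal Monte Carlo sketch, assuming NumPy and a vectorized graphon (the function name is hypothetical):

```python
import numpy as np

def triangle_density(graphon, n_samples=200_000, rng=None):
    """Monte Carlo estimate of P(three random nodes form a triangle)."""
    rng = np.random.default_rng(rng)
    u, v, w = rng.random((3, n_samples))                 # three uniform positions
    return np.mean(graphon(u, v) * graphon(v, w) * graphon(u, w))

est = triangle_density(lambda u, v: u * v, rng=0)
```

For $\gamma(u, v) = uv$ the integrand is $u^2 v^2 w^2$, so the exact value is $(1/3)^3 = 1/27$, which the estimate should approach.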
28 Some generalizations of latent space graph models
Latent space models can be extended in various directions.
Weighted or directed networks. Edges may have values: count, real, {0, +, -, ±},... The latent space model can be adapted as $Y_{ij} \mid Z_i, Z_j \sim \mathcal{F}(\gamma(Z_i, Z_j))$, where $\mathcal{F}$ can be any distribution: Poisson, normal, multinomial, etc.
Accounting for covariates. Latent space models can also accommodate covariates via a regression term: $Y_{ij} \mid Z_i, Z_j \sim \mathcal{F}(\gamma(Z_i, Z_j) + x_{ij}^\top \beta)$, where $x_{ij} = (x_{ij}^1, \dots, x_{ij}^d)$.
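A minimal sketch of the valued extension with covariates. It assumes that $\mathcal{F}$ is Poisson with a log link, i.e. $Y_{ij} \sim \mathcal{P}(\exp(\gamma(Z_i, Z_j) + x_{ij}^\top \beta))$, SBM-type labels, and a single hypothetical pairwise covariate (the absolute difference of a random node attribute). All names and values are illustrative.

```python
import numpy as np

def sample_poisson_lsm(z, gamma, X, beta, rng=None):
    """Y_ij ~ Poisson(exp(gamma[z_i, z_j] + x_ij' beta)), undirected, no self-loops.

    z: (n,) labels; gamma: (K, K) symmetric; X: (n, n, d) symmetric covariates.
    """
    rng = np.random.default_rng(rng)
    log_mean = np.asarray(gamma)[z[:, None], z[None, :]] + X @ beta
    counts = np.triu(rng.poisson(np.exp(log_mean)), k=1)
    return counts + counts.T, log_mean

rng = np.random.default_rng(4)
n = 50
z = rng.integers(0, 2, size=n)
x = rng.random(n)
X = np.abs(x[:, None] - x[None, :])[:, :, None]     # hypothetical covariate, d = 1
Y, log_mean = sample_poisson_lsm(z, [[1.0, 0.0], [0.0, 0.5]], X, np.array([-1.0]), rng=rng)
```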
29 Statistical inference of latent space models
31 Incomplete data models
Aim. Based on the observed network $Y = (Y_{ij})$, one typically wants to infer
the parameters $\theta = (\pi, \gamma)$;
the hidden states $Z = (Z_i)$.
State-space models belong to the class of incomplete data models, as the edges $(Y_{ij})$ are observed but the latent positions (or statuses) $(Z_i)$ are not, and neither are the parameters.
34 Frequentist or Bayesian inference
Frequentist inference. θ is fixed and Z is random. The aim is then to
provide an estimate $\hat\theta$ of θ;
provide the conditional distribution $P_{\hat\theta}(Z \mid Y)$ (for classification purposes and as a by-product of the inference).
Bayesian inference. Both θ and Z are random. The aim is then to provide the joint conditional distribution $P(\theta, Z \mid Y)$.
Whatever the approach, we have to deal with conditional distributions: $P_\theta(Z \mid Y)$ or $P(\theta, Z \mid Y)$.
40 Conditional distributions (1/2)
Graphical models describe the conditional independences between the random variables of a model [Lauritzen (1996)].
Frequentist setting: iid $Z_i$'s; $P(Y_{ij} \mid Z_i, Z_j)$; $P(Z_i, Z_j \mid Y)$: graph moralization; this holds for each pair (i, j).
Conditional distribution. The dependency graph of Z given Y is a clique. No factorization can be hoped for (unlike for HMMs): $P_\theta(Z \mid Y)$ cannot be computed efficiently.
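To see why exact computation is hopeless, here is a brute-force sketch that computes $P_\theta(Z \mid Y)$ for a binary SBM with known θ = (π, γ) by enumerating all $K^n$ label configurations. Because the dependency graph of Z given Y is a clique, the normalizing sum does not factorize over nodes, so this loop is only feasible for toy sizes. NumPy, the function name and the toy data are assumptions for illustration.

```python
import itertools
import numpy as np

def exact_posterior(Y, pi, gamma):
    """Return all K^n configurations z and their posterior P(Z = z | Y)."""
    n, K = Y.shape[0], len(pi)
    log_pi, g = np.log(pi), np.asarray(gamma)
    iu = np.triu_indices(n, k=1)                       # node pairs i < j
    y = Y[iu]
    configs = list(itertools.product(range(K), repeat=n))
    log_joint = np.empty(len(configs))
    for c, z in enumerate(configs):
        z = np.array(z)
        p = g[z[iu[0]], z[iu[1]]]                      # gamma(z_i, z_j) for each pair
        log_joint[c] = log_pi[z].sum() + np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    post = np.exp(log_joint - log_joint.max())         # stabilized normalization
    return configs, post / post.sum()

Y = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
configs, post = exact_posterior(Y, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
```

Already at n = 30 and K = 2 this would require about $10^9$ terms, which motivates sampling or approximation.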
46 Conditional distributions (2/2)
Bayesian perspective. Things get worse because θ = (π, γ) is also random.
Model: $P(\theta)$, $P(Z \mid \pi)$, $P(Y \mid \gamma, Z)$.
$P(\theta, Z \mid Y)$ is even more involved.
Both frequentist and Bayesian inference require the calculation of conditional distributions that cannot be computed exactly. Either sampling (MCMC) or approximation is required.
47 Variational (Bayes) inference
Variational approximations aim at replacing an intractable exact distribution P with a tractable approximate distribution $\tilde P$. Typically:
$P_\theta(Z \mid Y) \approx \prod_i \tilde P_{\theta, Y}(Z_i)$,
$P(\theta, Z \mid Y) \approx \tilde P_Y(\theta)\, \tilde P_Y(Z)$, or $P(\theta, Z \mid Y) \approx \tilde P_Y(\theta) \prod_i \tilde P_Y(Z_i)$.
Popular strategy: minimize the Kullback-Leibler divergence between $\tilde P$ and P:
$\min \mathrm{KL}[\tilde P(Z) \,\|\, P_\theta(Z \mid Y)]$ or $\min \mathrm{KL}[\tilde P(\theta, Z) \,\|\, P(\theta, Z \mid Y)]$.
Variational EM (VEM) algorithm [Wainwright and Jordan (2008)].
Variational Bayes EM (VBEM) algorithm [Beal and Ghahramani (2003)].
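A minimal sketch of mean-field VEM for a binary undirected SBM, in the spirit of [Daudin et al. (2008)]: $\tilde P(Z_i = k) = \tau_{ik}$, the M-step updates (π, γ) in closed form given τ, and the VE-step applies the fixed-point update $\tau_{ik} \propto \pi_k \prod_{j \ne i} \prod_l [\gamma_{kl}^{Y_{ij}} (1 - \gamma_{kl})^{1 - Y_{ij}}]^{\tau_{jl}}$. Assumptions: NumPy, a fixed number of iterations instead of a convergence test, one τ update per iteration, and no choice of K; the function name is hypothetical.

```python
import numpy as np

def vem_sbm(Y, K, n_iter=50, tau0=None, rng=None, eps=1e-10):
    """Mean-field variational EM for a binary undirected SBM (sketch)."""
    rng = np.random.default_rng(rng)
    n = Y.shape[0]
    off = 1.0 - np.eye(n)                              # ignore self-pairs (i, i)
    A, B = Y * off, (1 - Y) * off                      # edges / non-edges
    tau = rng.dirichlet(np.ones(K), size=n) if tau0 is None else np.array(tau0, float)
    for _ in range(n_iter):
        # M-step: closed-form updates of pi and gamma given tau.
        pi = tau.mean(axis=0)
        gamma = (tau.T @ A @ tau) / np.maximum(tau.T @ off @ tau, eps)
        gamma = np.clip(gamma, eps, 1 - eps)
        # VE-step: fixed-point update of tau, computed in log scale.
        log_tau = (np.log(pi + eps) + A @ tau @ np.log(gamma).T
                   + B @ tau @ np.log(1 - gamma).T)
        tau = np.exp(log_tau - log_tau.max(axis=1, keepdims=True))
        tau /= tau.sum(axis=1, keepdims=True)
    return pi, gamma, tau
```

In practice one would run several initializations (random or from a spectral or hierarchical clustering) and keep the one with the largest lower bound.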
50 VBEM inference for SBM: E. coli's operon network [Picard et al. (2009)]
[Figure: meta-graph representation; parameter estimates, K = 5]
52 Accuracy of VBEM estimates for SBM: simulation study
[Figure: posterior credibility intervals for $\pi_1$, $\gamma_{11}$, $\gamma_{12}$, $\gamma_{22}$, and width of these intervals] [Gazal et al. (2012)]
54 First half summary
Latent space graph models are useful to describe network heterogeneity.
Their statistical inference raises some specific issues.
Variational approximations help to circumvent these issues.
And also:
Theoretical justifications of these approximations exist for SBM: [Celisse et al. (2012)], [Mariadassou and Matias (2014)].
VEM and VBEM algorithms have been specifically developed for SBM: [Daudin et al. (2008)], [Latouche et al. (2012)].
Model selection (choice of K) has also been addressed: [same refs as above].
55 From SBM to W-graph: averaging models
56 SBM as a W-graph model
Latent variables: $(Z_i)$ iid $\sim \mathcal{M}(1, \pi)$.
Blockwise constant graphon: $\gamma(z, z') = \gamma_{kl}$ when z falls in block k and z' in block l.
Edges: $\Pr\{Y_{ij} = 1\} = \gamma(Z_i, Z_j)$.
[Figure: graphon function $\gamma^{SBM}_K(z, z')$, with block widths $\pi_k$ and block heights $\gamma_{kl}$]
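In the W-graph representation of an SBM, positions are uniform on [0, 1] and block k is the interval of width $\pi_k$ starting at $\pi_1 + \dots + \pi_{k-1}$. A minimal sketch, assuming NumPy (the function name and parameter values are hypothetical), that evaluates this blockwise-constant graphon:

```python
import numpy as np

def sbm_graphon(u, v, pi, gamma):
    """Blockwise-constant graphon gamma[C(u), C(v)], where C(u) is the block of u."""
    bounds = np.cumsum(pi)[:-1]                        # inner block boundaries
    cu = np.searchsorted(bounds, u, side="right")      # block k covers [s_{k-1}, s_k)
    cv = np.searchsorted(bounds, v, side="right")
    return np.asarray(gamma)[cu, cv]

pi = [0.25, 0.75]
gamma = [[0.9, 0.1], [0.1, 0.4]]
vals = sbm_graphon(np.array([0.1, 0.5, 0.1]), np.array([0.2, 0.2, 0.9]), pi, gamma)
```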
57 Variational Bayes estimation of γ(z, z')
VBEM inference provides the approximate posteriors: $(\pi \mid Y) \approx \mathrm{Dir}(\pi')$, $(\gamma_{kl} \mid Y) \approx \mathrm{Beta}(\gamma^0_{kl}, \gamma^1_{kl})$.
Estimate of γ(u, v): posterior mean $\tilde E(\gamma^{SBM}_K(z, z') \mid Y, K)$.
Due to the uncertainty on the $\pi_k$, the posterior mean of $\gamma^{SBM}_K$ is smooth (explicit integration using [Gouda and Szántai (2010)]).
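The smoothing effect can be reproduced numerically. This is a Monte Carlo sketch, not the explicit integration of [Gouda and Szántai (2010)]. It draws (π, γ) from Dirichlet and Beta approximate posteriors, evaluates the blockwise graphon at (u, v), and averages. NumPy is assumed, and `alpha`, `a`, `b` are hypothetical posterior parameters chosen for illustration.

```python
import numpy as np

def posterior_mean_graphon(u, v, alpha, a, b, n_draws=2000, rng=None):
    """Monte Carlo estimate of E[gamma_{C(u), C(v)} | Y, K] at the points (u, v)."""
    rng = np.random.default_rng(rng)
    a, b = np.asarray(a, float), np.asarray(b, float)
    total = np.zeros(np.broadcast(u, v).shape)
    for _ in range(n_draws):
        pi = rng.dirichlet(alpha)                       # random block widths
        gamma = rng.beta(a, b)                          # random block heights (K x K)
        gamma = np.triu(gamma) + np.triu(gamma, 1).T    # keep gamma symmetric
        bounds = np.cumsum(pi)[:-1]
        cu = np.searchsorted(bounds, u, side="right")
        cv = np.searchsorted(bounds, v, side="right")
        total += gamma[cu, cv]
    return total / n_draws

grid = np.linspace(0.005, 0.995, 100)
U, V = np.meshgrid(grid, grid, indexing="ij")
est = posterior_mean_graphon(U, V, alpha=[20, 20],
                             a=[[18, 2], [2, 10]], b=[[2, 18], [18, 10]], rng=0)
```

The sharp block edge at π_1 in each draw turns into a smooth transition whose width reflects the posterior spread of π.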
60 Bayesian model averaging
Bayesian model averaging (BMA). Consider a series of models 1, ..., K, ... in which a certain function of the parameter f(θ) can always be defined.
Bayesian inference within each model K provides the posterior $P(\theta \mid K, Y)$, hence $P(f(\theta) \mid K, Y)$.
BMA [Hoeting et al. (1999)] relies on the marginal posterior of f(θ):
$P(f(\theta) \mid Y) = \sum_K P(K \mid Y)\, P(f(\theta) \mid K, Y)$.
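For the posterior mean of f(θ), the BMA formula reduces to a weighted average of within-model posterior means, and the law of total variance gives the averaged variance. A minimal sketch, assuming NumPy; the numbers below are made up for illustration and are not results from the talk.

```python
import numpy as np

def bma_moments(post_K, means, variances):
    """Mean and variance of f(theta) under P(f | Y) = sum_K P(K | Y) P(f | K, Y)."""
    w = np.asarray(post_K, float)
    w = w / w.sum()                                     # ensure weights sum to 1
    m, v = np.asarray(means, float), np.asarray(variances, float)
    mean = np.sum(w * m)
    # Law of total variance: E[Var(f | K)] + Var(E[f | K]).
    var = np.sum(w * v) + np.sum(w * (m - mean) ** 2)
    return mean, var

mean, var = bma_moments([0.2, 0.5, 0.3], [0.10, 0.12, 0.15], [1e-4, 2e-4, 1e-4])
```

The between-model term shows how uncertainty about K inflates the posterior variance of f(θ).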
62 Variational Bayes model averaging
Pushing it further: consider the model K as an additional hidden variable:
$P(Z, \theta, K \mid Y) \approx \tilde P(Z, \theta, K) := \tilde P(Z \mid K)\, \tilde P(\theta \mid K)\, \tilde P(K)$.
Note that no additional independence assumption is needed.
Variational Bayes model averaging (VBMA). The optimal¹ approximation of $P(K \mid Y)$ satisfies [Volant et al. (2012)]:
$\tilde P(K) \propto P(K)\, e^{\log P(Y \mid K) - \mathrm{KL}(K)} \propto P(K \mid Y)\, e^{-\mathrm{KL}(K)}$,
where $\mathrm{KL}(K) = \mathrm{KL}[\tilde P(Z, \theta \mid K) \,\|\, P(Z, \theta \mid Y, K)]$.
¹ In terms of Kullback-Leibler divergence.
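Since $\log P(Y \mid K) - \mathrm{KL}(K)$ is exactly the variational lower bound $J_K$ maximized by VBEM within model K, the VBMA weights are a softmax of $\log P(K) + J_K$. A minimal sketch, assuming NumPy, a uniform prior on K, and lower bounds that have already been computed (the values below are made up for illustration):

```python
import numpy as np

def vbma_weights(log_prior, lower_bounds):
    """P~(K) proportional to P(K) * exp(J_K), computed stably in log scale."""
    s = np.asarray(log_prior, float) + np.asarray(lower_bounds, float)
    s -= s.max()                                        # avoid overflow in exp
    w = np.exp(s)
    return w / w.sum()

K_max = 4
log_prior = np.full(K_max, -np.log(K_max))              # uniform prior on K = 1..K_max
J = np.array([-1520.3, -1490.8, -1488.1, -1491.5])      # illustrative lower bounds
w = vbma_weights(log_prior, J)
```

Because the lower bounds live on the log scale, differences of a few units already make the weights strongly concentrated on the best K.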
64 Inferring the graphon function
Model averaging: there is no true K in the W-graph model.
Apply the VBMA recipe to γ(z, z').
For K = 1..K_max, fit an SBM via VBEM and compute $\tilde\gamma^{SBM}_K(z, z') = \tilde E[\gamma_{C(z), C(z')} \mid Y, K]$, where C(z) denotes the block containing position z.
Then perform model averaging as
$\tilde\gamma(z, z') = \tilde E[\gamma_{C(z), C(z')} \mid Y] = \sum_K \tilde P(K)\, \tilde\gamma^{SBM}_K(z, z')$ [Latouche and R. (2013)].
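Once each SBM fit provides a graphon estimate on a common grid, the averaged estimate is their weighted sum. A minimal sketch, assuming NumPy and, for simplicity, plug-in blockwise graphons built from point estimates of (π, γ) rather than the smoothed posterior means; the fits and weights are hypothetical.

```python
import numpy as np

def block_graphon(u, v, pi, gamma):
    """gamma[C(u), C(v)] with blocks of widths pi laid out on [0, 1]."""
    bounds = np.cumsum(pi)[:-1]
    cu = np.searchsorted(bounds, u, side="right")
    cv = np.searchsorted(bounds, v, side="right")
    return np.asarray(gamma)[cu, cv]

def averaged_graphon(u, v, weights, fits):
    """sum_K P~(K) * gamma_K(u, v), where fits is a list of (pi, gamma), one per K."""
    return sum(w * block_graphon(u, v, pi, g) for w, (pi, g) in zip(weights, fits))

fits = [([1.0], [[0.3]]),                                # K = 1
        ([0.4, 0.6], [[0.7, 0.1], [0.1, 0.3]])]          # K = 2
weights = [0.25, 0.75]
grid = np.linspace(0.005, 0.995, 100)
U, V = np.meshgrid(grid, grid, indexing="ij")
gamma_bar = averaged_graphon(U, V, weights, fits)
```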
65 PPI network
Like many PPI networks, E. coli's network is highly concentrated around a few nodes.
67 Ecological network between fungal species
Link between two fungi if they are observed on a common host.
69 Brain network
Links = connections between areas of the macaque's cortex.
71 Blog network (non-biological)
Links = connections between French political blogs.
73 Goodness-of-fit
74 Motif frequencies
Network motifs have a biological (or sociological) interpretation in terms of building blocks of the global network.
Triangles = friends of my friends are my friends.
Latent space graph models only describe binary interactions, conditional on the latent positions.
Goodness-of-fit criterion based on motif frequencies?
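Observed motif counts are the starting point of such a criterion. For an undirected simple graph with adjacency matrix Y, the number of triangles is $\mathrm{tr}(Y^3)/6$, since each triangle yields 6 closed walks of length 3 (3 starting nodes times 2 directions). A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def count_triangles(Y):
    """Number of triangles in an undirected simple graph (0/1 entries, zero diagonal)."""
    Y = np.asarray(Y, dtype=np.int64)
    return int(np.trace(Y @ Y @ Y)) // 6

K4 = np.ones((4, 4), dtype=int) - np.eye(4, dtype=int)  # complete graph on 4 nodes
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])       # path 0 - 1 - 2
```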
75 Moments of motif counts
Moments under SBM: the first moments $E N(m)$, $V N(m)$ of the count are known for exchangeable graph models (incl. SBM) [Picard et al. (2008)]:
$E_{SBM} N(m) \propto \mu_{SBM}(m) =: f(\theta_{SBM})$,
where $\mu_{SBM}(m)$ is the motif occurrence probability under SBM.
Moments under W-graph: the motif probability under the W-graph can be estimated as
$\tilde\mu(m) = \sum_K \tilde P(K)\, \tilde E(\mu_{SBM}(m) \mid Y, K)$.
Estimates of $E_W N(m)$ and $V_W N(m)$ can be derived accordingly [Latouche and R. (2013)].
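For the triangle motif, the occurrence probability under SBM sums over the labels of the three nodes: $\mu_{SBM}(\triangle) = \sum_{k,l,m} \pi_k \pi_l \pi_m \gamma_{kl} \gamma_{lm} \gamma_{km}$, and by exchangeability the expected count is $\binom{n}{3} \mu_{SBM}(\triangle)$. A minimal sketch assuming NumPy (the function name is hypothetical); variances, which involve overlapping node triples, are not covered here.

```python
import numpy as np
from math import comb

def triangle_mean_sbm(n, pi, gamma):
    """Triangle probability mu and expected count C(n, 3) * mu under a binary SBM."""
    pi, g = np.asarray(pi, float), np.asarray(gamma, float)
    # mu = sum_{k,l,m} pi_k pi_l pi_m g_kl g_lm g_km
    mu = np.einsum("k,l,m,kl,lm,km->", pi, pi, pi, g, g, g)
    return mu, comb(n, 3) * mu

mu, expected = triangle_mean_sbm(100, [1.0], [[0.1]])   # K = 1: ER graph, mu = p^3
```

Plugging $\tilde P(K)$-weighted estimates of μ into the same formula gives the W-graph counterpart.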
76 Network frequencies in the blog network
[Table: motif, observed count, expected mean and standard deviation (all × 10^3)]
No specific structure seems to be exceptional with respect to the model's expectations.
80 Covariates: tree interaction (valued) network
Data: $n = 51$ tree species, $Y_{ij}$ = number of parasites shared by species $i$ and $j$ [Vacher et al. (2008)].
SBM: given $Z_i = k$, $Z_j = l$, $Y_{ij} \sim \mathcal{P}(e^{\gamma_{kl}})$, where $\gamma_{kl}$ is the log-mean number of shared parasites.
[Table: estimated $e^{\gamma_{kl}}$ for groups T1 to T7, with group proportions $\pi_k$; values not recoverable from the transcription.]
Results: ICL selects $K = 7$ groups, which are partly related to phyla.
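The Poisson SBM above can be sketched as a simulator: draw group labels from the proportions, then draw each count $Y_{ij}$, $i < j$, from a Poisson distribution with mean $e^{\gamma_{kl}}$. The second function illustrates the interpretation of $\gamma_{kl}$ as a log-mean by averaging counts within each pair of groups; it assumes the labels are known, whereas in practice they are latent and estimated by variational inference. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_poisson_sbm(n, props, gamma, rng):
    """Symmetric valued network: Z_i ~ M(props), Y_ij | Z_i=k, Z_j=l ~ Poisson(exp(gamma[k, l]))."""
    props = np.asarray(props, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    z = rng.choice(len(props), size=n, p=props)
    rates = np.exp(gamma[z[:, None], z[None, :]])
    y = np.triu(rng.poisson(rates), k=1)  # keep pairs i < j only
    return z, y + y.T                     # symmetrise; the diagonal stays 0

def block_log_means(y, z, K):
    """Log of the mean count between groups k and l, with labels assumed known."""
    est = np.full((K, K), np.nan)
    off_diag = ~np.eye(len(z), dtype=bool)
    for k in range(K):
        for l in range(K):
            mask = (z[:, None] == k) & (z[None, :] == l) & off_diag
            if mask.any():
                est[k, l] = np.log(y[mask].mean())
    return est

rng = np.random.default_rng(1)
gamma = np.log([[5.0, 1.0], [1.0, 2.0]])  # hypothetical log-means
z, y = simulate_poisson_sbm(400, [0.5, 0.5], gamma, rng)
gamma_hat = block_log_means(y, z, K=2)
```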
86 Accounting for the taxonomic distance
Model: with $x_{ij}$ = taxonomic distance between species $i$ and $j$, $Y_{ij} \sim \mathcal{P}(e^{\gamma_{kl} + \beta x_{ij}})$ [Mariadassou et al. (2010)].
[Table: estimated block effects for groups T1 to T4, group proportions $\pi_k$ and the estimate of $\beta$; values not recoverable from the transcription.]
Results: for $x = 3.82$, $e^{\beta x} = 0.298$, so the mean number of shared parasites decreases with taxonomic distance. The groups are no longer associated with the phylogenetic structure: the SBM now captures the residual heterogeneity of the regression.
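The two figures reported for this model can be turned into a rough value of $\beta$ by back-calculation, $\beta = \log(0.298) / 3.82 \approx -0.317$. This is an illustrative derivation from the rounded numbers on the slide, not the published estimate. The helper below, a hypothetical name, also shows how the covariate acts multiplicatively on the Poisson mean $e^{\gamma_{kl} + \beta x_{ij}}$.

```python
import math

# Figures reported on the slide: exp(beta * x) = 0.298 at x = 3.82.
x = 3.82
multiplier = 0.298
beta = math.log(multiplier) / x  # back-calculated from rounded values, about -0.317

def mean_shared_parasites(gamma_kl, beta, x_ij):
    """Poisson mean exp(gamma_kl + beta * x_ij) for a pair in groups (k, l) at distance x_ij."""
    return math.exp(gamma_kl + beta * x_ij)
```

Whatever the block effect $\gamma_{kl}$, moving from distance 0 to distance 3.82 multiplies the expected number of shared parasites by about 0.3.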
88 Residual graphon
A simple graph model with covariates. When edge covariates $x_{ij}$ are available, one can simply fit a logistic regression [Pattison and Robins (2007)]:
$$(Y_{ij}) \text{ independent}, \qquad Y_{ij} \sim \mathcal{B}(p_{ij}), \qquad \operatorname{logit} p_{ij} = x_{ij}^\top \beta.$$
Introducing a residual term. To assess the fit of this model, add a residual graphon-like term:
$$(Z_i) \text{ iid } \mathcal{U}[0, 1], \qquad Y_{ij} \mid Z_i, Z_j \sim \mathcal{B}(p_{ij}), \qquad \operatorname{logit} p_{ij} = x_{ij}^\top \beta + \gamma(Z_i, Z_j).$$
A VBEM algorithm can be designed to get $\widetilde{P}(\beta, \theta, Z) \approx P(\beta, \theta, Z \mid Y)$ (ongoing work, building on [Jaakkola and Jordan (2000)]).
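The covariate-only model is an ordinary logistic regression over node pairs. Below is a minimal sketch, assuming NumPy, that builds the pair-level design matrix from symmetric covariate matrices and fits $\beta$ by Newton-Raphson (IRLS). It does not implement the residual term or the VBEM algorithm; the covariate used in the usage example is simulated and hypothetical.

```python
import numpy as np

def pair_design(covariates):
    """Design matrix for pairs i < j: an intercept column plus one column
    per symmetric n x n covariate matrix (upper-triangular entries)."""
    n = covariates[0].shape[0]
    iu = np.triu_indices(n, k=1)
    cols = [np.ones(iu[0].size)] + [np.asarray(c, dtype=float)[iu] for c in covariates]
    return np.column_stack(cols), iu

def fit_edge_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit p_ij = x_ij' beta by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Usage on simulated data with one hypothetical distance-like covariate.
rng = np.random.default_rng(0)
n = 200
u = rng.uniform(size=n)
dist = np.abs(u[:, None] - u[None, :])
X, iu = pair_design([dist])
beta_true = np.array([-1.0, -2.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta_true))))
beta_hat = fit_edge_logistic(X, y)
```

Adding $\gamma(Z_i, Z_j)$ breaks the independence of the $Y_{ij}$ given the covariates alone, which is why a variational scheme is needed for the full model.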
90 Tree network
Binary version: two tree species are linked if they host at least one common fungal parasite.
Regression covariates: genetic distance, taxonomic distance, geographic distance.
The residual graphon is not flat: some heterogeneity remains.
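When the residual graphon is approximated by an SBM, it is a step function on $[0,1]^2$ whose blocks have widths given by the group proportions. The sketch below evaluates such a step function and computes its variance over $[0,1]^2$, which is zero exactly when the step function is flat. This is a simple illustrative diagnostic under that step-function assumption, not the formal goodness-of-fit test (which the talk lists as future work); the function names are hypothetical.

```python
import numpy as np

def evaluate_step_graphon(props, values, u, v):
    """W(u, v) = values[k, l] when u lies in block k and v in block l (block widths props)."""
    cuts = np.cumsum(props)[:-1]
    k = np.searchsorted(cuts, u, side="right")
    l = np.searchsorted(cuts, v, side="right")
    return np.asarray(values, dtype=float)[k, l]

def step_graphon_spread(props, values):
    """Variance of a step-function graphon over [0, 1]^2 (0 iff it is flat)."""
    p = np.asarray(props, dtype=float)
    g = np.asarray(values, dtype=float)
    w = np.outer(p, p)  # area of block (k, l) in the unit square
    mean = (w * g).sum()
    return float((w * (g - mean) ** 2).sum())
```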
92 Blog network
Blog network: already shown.
Regression covariates: same political party; pair includes a journalist.
The residual graphon is still not flat.
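Both blog-network covariates are pair-level indicators derived from node attributes. A minimal sketch of how such $x_{ij}$ matrices could be built, assuming one party label and one journalist flag per blog (hypothetical inputs and function names):

```python
import numpy as np

def same_group(labels):
    """x_ij = 1 if nodes i and j share the same label (e.g. political party), else 0."""
    lab = np.asarray(labels)
    x = (lab[:, None] == lab[None, :]).astype(float)
    np.fill_diagonal(x, 0.0)
    return x

def either_flagged(flags):
    """x_ij = 1 if node i or node j is flagged (e.g. the pair includes a journalist), else 0."""
    f = np.asarray(flags, dtype=bool)
    x = (f[:, None] | f[None, :]).astype(float)
    np.fill_diagonal(x, 0.0)
    return x
```

For three blogs with parties `["A", "B", "A"]` and journalist flags `[False, False, True]`, only the pair (1, 3) shares a party, while both pairs involving the third blog include a journalist.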
94 Conclusion & future work
Some conclusions.
- The graphon provides a representation of the network topology.
- It can be estimated using variational Bayes inference (R packages mixer and blockmodels).
- It can be combined with covariates as a residual term.
Future work.
- Formal goodness-of-fit test.
- Quality of variational Bayes estimates in SBM with covariates.
Thank you for your attention.
Airoldi, E. M., Costa, T. B. and Chan, S. H. (2013). Stochastic blockmodel approximation of a graphon: Theory and consistent estimation. In Advances in Neural Information Processing Systems.
Beal, M. J. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: with application to scoring graphical model structures. Bayes. Statist.
Bollobás, B., Janson, S. and Riordan, O. (2007). The phase transition in inhomogeneous random graphs. Rand. Struct. Algo. 31 (1).
Celisse, A., Daudin, J.-J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Statist.
Chatterjee, S. et al. (2014). Matrix estimation by universal singular value thresholding. The Annals of Statistics. 43 (1).
Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Stat. Comput. 18 (2).
Daudin, J.-J., Pierre, L. and Vacher, C. (2010). Model for heterogeneous random networks using continuous latent variables and an application to a tree fungus network. Biometrics. 66 (4).
Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. 7 (28).
Gazal, S., Daudin, J.-J. and Robin, S. (2012). Accuracy of variational estimates for random graph mixture models. Journal of Statistical Computation and Simulation. 82 (6).
Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA. 99 (12).
Gouda, A. and Szántai, T. (2010). On numerical calculation of probabilities according to Dirichlet distribution. Ann. Oper. Res. DOI: /s
Handcock, M., Raftery, A. and Tantrum, J. (2007). Model-based clustering for social networks. JRSSA. 170 (2). doi: /j X x.
Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science. 14 (4).
Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 (460).
Jaakkola, T. S. and Jordan, M. I. (2000). Bayesian parameter estimation via variational methods. Statistics and Computing. 10 (1).
Latouche, P., Birmelé, E. and Ambroise, C. (2012). Variational Bayesian inference and complexity control for stochastic block models. Statist. Model. 12 (1).
Latouche, P. and Robin, S. (2013). Bayesian model averaging of stochastic block models to estimate the graphon function and motif frequencies in a W-graph model. Technical report, arXiv:
Lauritzen, S. (1996). Graphical Models. Oxford Statistical Science Series. Clarendon Press.
Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. Journal of Combinatorial Theory, Series B. 96 (6).
von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Stat. 36 (2).
Mariadassou, M., Robin, S. and Vacher, C. (2010). Uncovering structure in valued graphs: a variational approach. Ann. Appl. Statist. 4 (2).
Mariadassou, M. and Matias, C. (2014). Convergence of the groups posterior distribution in latent or stochastic block models. Bernoulli. ???? To appear.
Matias, C. and Robin, S. (2014). Modeling heterogeneity in random graphs through latent space models: a selective review. ESAIM: Proc.
Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E.
Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Phys. Rev. E (69).
Nowicki, K. and Snijders, T. (2001). Estimation and prediction for stochastic block-structures. J. Amer. Statist. Assoc.
Olhede, S. C. and Wolfe, P. J. (2014). Network histograms and universality of blockmodel approximation. Proceedings of the National Academy of Sciences. 111 (41).
Pattison, P. E. and Robins, G. L. (2007). Probabilistic network theory. Chapter in Handbook of Probability Theory with Applications. Sage Publications.
Picard, F., Daudin, J.-J., Koskas, M., Schbath, S. and Robin, S. (2008). Assessing the exceptionality of network motifs. J. Comp. Biol. 15 (1).
Picard, F., Miele, V., Daudin, J.-J., Cottret, L. and Robin, S. (2009). Deciphering the connectivity structure of biological networks using mixnet. BMC Bioinformatics. Suppl 6 S17. doi: / s6-s17.
Vacher, C., Piou, D. and Desprez-Loustau, M.-L. (2008). Architecture of an antagonistic tree/fungus network: The asymmetric influence of past evolutionary history. PLoS ONE. 3 (3) e1740. doi: /journal.pone
Volant, S., Magniette, M.-L. M. and Robin, S. (2012). Variational Bayes approach for model aggregation in unsupervised classification with Markovian dependency. Comput. Statist. & Data Analysis. 56 (8).
Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1 (1-2).
More informationA Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences Eric P. Xing, Michael I. Jordan, Richard M. Karp and Stuart Russell Computer Science Division University of California, Berkeley
More informationAuxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationBayesian Clustering for Email Campaign Detection
Peter Haider haider@cs.uni-potsdam.de Tobias Scheffer scheffer@cs.uni-potsdam.de University of Potsdam, Department of Computer Science, August-Bebel-Strasse 89, 14482 Potsdam, Germany Abstract We discuss
More informationMachine Learning and Statistics: What s the Connection?
Machine Learning and Statistics: What s the Connection? Institute for Adaptive and Neural Computation School of Informatics, University of Edinburgh, UK August 2006 Outline The roots of machine learning
More informationPublication List. Chen Zehua Department of Statistics & Applied Probability National University of Singapore
Publication List Chen Zehua Department of Statistics & Applied Probability National University of Singapore Publications Journal Papers 1. Y. He and Z. Chen (2014). A sequential procedure for feature selection
More informationBayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More informationIntroduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
More informationOn the shape of binary trees
On the shape of binary trees Mireille Bousquet-Mélou, CNRS, LaBRI, Bordeaux ArXiv math.co/050266 ArXiv math.pr/0500322 (with Svante Janson) http://www.labri.fr/ bousquet A complete binary tree n internal
More informationPS 271B: Quantitative Methods II. Lecture Notes
PS 271B: Quantitative Methods II Lecture Notes Langche Zeng zeng@ucsd.edu The Empirical Research Process; Fundamental Methodological Issues 2 Theory; Data; Models/model selection; Estimation; Inference.
More informationINFERRING GENE DEPENDENCY NETWORKS FROM GENOMIC LONGITUDINAL DATA: A FUNCTIONAL DATA APPROACH
REVSTAT Statistical Journal Volume 4, Number 1, March 2006, 53 65 INFERRING GENE DEPENDENCY NETWORKS FROM GENOMIC LONGITUDINAL DATA: A FUNCTIONAL DATA APPROACH Authors: Rainer Opgen-Rhein Department of
More informationCHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA
Examples: Mixture Modeling With Longitudinal Data CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA Mixture modeling refers to modeling with categorical latent variables that represent subpopulations
More informationSYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis
SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October 17, 2015 Outline
More informationLecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions
SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0
More informationTowards running complex models on big data
Towards running complex models on big data Working with all the genomes in the world without changing the model (too much) Daniel Lawson Heilbronn Institute, University of Bristol 2013 1 / 17 Motivation
More informationCluster detection algorithm in neural networks
Cluster detection algorithm in neural networks David Meunier and Hélène Paugam-Moisy Institute for Cognitive Science, UMR CNRS 5015 67, boulevard Pinel F-69675 BRON - France E-mail: {dmeunier,hpaugam}@isc.cnrs.fr
More informationFinding the M Most Probable Configurations Using Loopy Belief Propagation
Finding the M Most Probable Configurations Using Loopy Belief Propagation Chen Yanover and Yair Weiss School of Computer Science and Engineering The Hebrew University of Jerusalem 91904 Jerusalem, Israel
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationSTATISTICS COURSES UNDERGRADUATE CERTIFICATE FACULTY. Explanation of Course Numbers. Bachelor's program. Master's programs.
STATISTICS Statistics is one of the natural, mathematical, and biomedical sciences programs in the Columbian College of Arts and Sciences. The curriculum emphasizes the important role of statistics as
More informationUsing Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen
More informationSubgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
More informationNETZCOPE - a tool to analyze and display complex R&D collaboration networks
The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.
More information