Graphical Modeling for Genomic Data


Carel F.W. Peeters
Joint work with: Wessel N. van Wieringen and Mark A. van de Wiel
Molecular Biostatistics Unit, Dept. of Epidemiology & Biostatistics, VU University medical center, Amsterdam, the Netherlands
Summer School: Big Data in Clinical Medicine, Enschede, 03/07/2014
CFWP (VUmc), Graphs for Genomic Data, Enschede, 03/07/2014

Outline
1 Preliminaries I: Molecular Biology and Genomics Data
    Some Molecular Biology
    Omics and Genomic Data
    Approaches and Desire
2 Preliminaries II: Graphical Modeling
    Pathways and Graphs
    Undirected Graphical Modeling
    Directed Graphical Modeling
3 Undirected Graphical Modeling with the Graphical Ridge
    Sample Covariance and Precision
    The Ridge Precision Estimator
    Illustration
4 Directed Cyclic Mixed Graphs for Genomic Data Integration
    Model
    Model as Graphical Object
    Illustration
5 So What and Further Research

1 Preliminaries I: Molecular Biology and Genomics Data

Some Molecular Biology: The eukaryotic cell
Cell: The smallest independent living unit. It contains a complete copy of the genome.
Genome: The total genetic constitution of an organism: the full (haploid) set of chromosomes with all its genes.

Chromosome: A structure of coiled DNA. Chromosomal DNA encodes genetic information.

Genes

Central dogma of molecular biology: DNA is transcribed into mRNA, which is translated into protein.

Complexities: DNA copy number (CN)
Normal: Each somatic cell contains 2 copies of every chromosome.
Aberration: An abnormal number of copies of one or more sections of DNA.
Logic: CN ↑ ⇒ GE ↑; CN ↓ ⇒ GE ↓ (GE: gene expression).

Complexities: DNA methylation
Refers to the addition of a methyl group to a CpG site.
A pre-transcriptional regulator of gene expression.
Logic: If a CpG site is methylated, the gene is switched off.

Complexities: microRNA (miRNA)
Gene → (transcription) → mRNA → (translation) → protein, with miRNA acting on the mRNA.
A family of small RNAs, approx. 22 nucleotides in length.
They bind to complementary sequences in target mRNA.
Post-transcriptional regulators of mRNA.
Logic: miRNA ↑ ⇒ GE ↓; miRNA ↓ ⇒ GE ↑, through RNA degradation or limiting of RNA translation.
Implicated in cancer.

Message
It is not enough to look at gene expression alone.
Integration: The functional statistical integration of data from multiple high-throughput omics platforms.
Why go integrative? Regulatory mechanisms can only be understood at multiple genomic levels, and integration allows the detection of more robust markers (in terms of regulatory significance).

Omics and Genomic Data: Omics and omics data
-ome: A totality of some (molecular biological) sort.
-omics: Collective quantification of some pool of molecules.
Genomics: The omics of the genome (of some organism).

Array data

Design

Profiles

Challenge: Dimensionality of genomic data


Approaches and Desire: Unit of analysis
DNA can be analyzed at the level of the gene, the region, or the pathway.

Featurewise and regional analyses
Approach: Restrict the dimension of the model; test the model across the genome; employ familywise error control.
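A minimal base-R sketch of the featurewise approach on simulated data: each gene gets its own low-dimensional test, and the genome-wide p-values are then adjusted to control the familywise error rate. The simulated copy-number and expression matrices, the sizes, the use of `cor.test` as the per-gene model, and Holm's method as the FWER procedure are all assumptions for illustration.

```r
# Featurewise analysis sketch: one low-dimensional test per gene,
# followed by familywise error rate (FWER) control across the genome.
set.seed(2014)
n <- 50      # samples (hypothetical)
G <- 200     # genes (hypothetical)

# Simulated copy number (CN) and gene expression (GE); the first 10 genes
# receive a true CN -> GE dosage effect.
CN <- matrix(rnorm(n * G), n, G)
GE <- matrix(rnorm(n * G), n, G)
GE[, 1:10] <- GE[, 1:10] + 1.5 * CN[, 1:10]

# Restrict the model dimension: test each gene separately
pvals <- vapply(seq_len(G),
                function(g) cor.test(CN[, g], GE[, g])$p.value,
                numeric(1))

# Employ familywise error control (Holm's step-down procedure)
padj <- p.adjust(pvals, method = "holm")
selected <- which(padj < 0.05)
print(selected)
```

The per-gene model is deliberately tiny; the price is a large multiple-testing burden, which is what the FWER correction pays for.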

Our focus: Pathways

Motivation: Pathways
Knowledge of pathways is incomplete.
Knowledge is biased towards well-known pathways.
Pathways are loosely defined using repositories (e.g., KEGG).

Motivation: Desire
Consider data from multiple genomic platforms.
Exploratively infer the graph (reconstruct its topology).
Cope with the high-dimensional situation.
Maintain computational friendliness.

2 Preliminaries II: Graphical Modeling

Pathways and Graphs: Graphs
Representation: Pathways are represented by a graph (or network).
Vertices: A node or vertex represents a molecular feature.
Edges: An edge or arrow represents some functional relation.

Correlation networks
Example: Three variables $Y_1$, $Y_2$, and $Y_3$, with $\mathrm{cor}(Y_1, Y_2) = 0$, $\mathrm{cor}(Y_1, Y_3) = 0$, and $\mathrm{cor}(Y_2, Y_3) \neq 0$.
Marginal dependence: An undirected edge represents marginal dependence (here, a single edge between $Y_2$ and $Y_3$).

Interpretational danger

Solution: Conditioning


Conditional dependence
Partial correlation: Measures the degree of association between two random variables when controlling for the remaining (third) variables.
Conditioned correlations: $\mathrm{cor}(Y_1, Y_2 \mid Y_3)$, $\mathrm{cor}(Y_1, Y_3 \mid Y_2)$, $\mathrm{cor}(Y_2, Y_3 \mid Y_1)$.
If, e.g., $\mathrm{cor}(Y_2, Y_3 \mid Y_1) = 0$, we say $Y_2$ and $Y_3$ are independent given $Y_1$ (for jointly Gaussian variables, a zero partial correlation is equivalent to conditional independence).
Example: $\mathrm{cor}(Y_1, Y_2 \mid Y_3) \neq 0$, $\mathrm{cor}(Y_1, Y_3 \mid Y_2) \neq 0$, $\mathrm{cor}(Y_2, Y_3 \mid Y_1) = 0$.
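A small simulation can make the conditioning idea concrete. The sketch below assumes a hypothetical data-generating process in which $Y_1$ drives both $Y_2$ and $Y_3$; it computes a partial correlation as the correlation between the residuals of regressing each variable on the conditioning variable, which reproduces the zero/nonzero pattern of the example.

```r
# Partial correlation as the correlation of residuals: regress both variables
# on the conditioning variable and correlate what is left over.
set.seed(1)
n  <- 1e5
Y1 <- rnorm(n)            # common driver (hypothetical)
Y2 <- Y1 + rnorm(n)       # Y2 depends on Y1 only
Y3 <- Y1 + rnorm(n)       # Y3 depends on Y1 only

pcor_resid <- function(a, b, z) {
  cor(resid(lm(a ~ z)), resid(lm(b ~ z)))
}

marg_23 <- cor(Y2, Y3)              # marginal: about 0.5
part_23 <- pcor_resid(Y2, Y3, Y1)   # given Y1: about 0
part_12 <- pcor_resid(Y1, Y2, Y3)   # given Y3: about 0.58
round(c(marginal_23 = marg_23,
        partial_23_given_1 = part_23,
        partial_12_given_3 = part_12), 2)
```

A marginal correlation network would draw an edge between $Y_2$ and $Y_3$; conditioning on $Y_1$ removes it.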

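Undirected Graphical Modeling: Gaussian graphical modeling
Graphical modeling: A class of probabilistic models utilizing graphs to express conditional (in)dependence relations between random variables.
Gaussian setting: Vertices correspond to random variables with a normal distribution; edges correspond to the conditional dependence structure.
Say $\mathbf{y} \sim \mathcal{N}_p(\mathbf{0}, \boldsymbol{\Sigma})$, and define $\boldsymbol{\Sigma}^{-1} \equiv \boldsymbol{\Omega}$. Then, for $Y_j, Y_{j'}$ in the vertex set $V$ with $j \neq j'$, the partial correlation equals $-\omega_{jj'} / \sqrt{\omega_{jj}\,\omega_{j'j'}}$, and
$$\omega_{jj'} = \omega_{j'j} = 0 \iff Y_j \perp\!\!\!\perp Y_{j'} \mid V \setminus \{Y_j, Y_{j'}\},$$
in which case there is no edge between $Y_j$ and $Y_{j'}$. For example, with four variables:
$$\boldsymbol{\Omega} = \begin{bmatrix} \omega_{11} & \omega_{12} & \omega_{13} & \omega_{14} \\ \omega_{21} & \omega_{22} & 0 & 0 \\ \omega_{31} & 0 & \omega_{33} & \omega_{34} \\ \omega_{41} & 0 & \omega_{43} & \omega_{44} \end{bmatrix}$$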

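Gaussian graphical modeling: zeros in precision versus covariance
$$\boldsymbol{\Omega} = \begin{bmatrix} \omega_{11} & \omega_{12} & \omega_{13} \\ \omega_{21} & \omega_{22} & 0 \\ \omega_{31} & 0 & \omega_{33} \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \boldsymbol{\Omega}^{-1} = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{bmatrix}$$
A zero in $\boldsymbol{\Omega}$ (conditional independence) generally does not give a zero in $\boldsymbol{\Sigma}$ (marginal independence).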
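The link between precision entries, partial correlations and graph edges can be checked on a hypothetical three-variable covariance matrix. The sketch assumes $Y_2 = Y_1 + e_2$ and $Y_3 = Y_1 + e_3$ with independent standard normal $Y_1, e_2, e_3$, inverts the covariance, and converts the precision matrix into partial correlations via $-\omega_{jj'}/\sqrt{\omega_{jj}\omega_{j'j'}}$.

```r
# Population covariance of (Y1, Y2, Y3) when Y2 = Y1 + e2 and Y3 = Y1 + e3,
# with Y1, e2, e3 independent standard normals (hypothetical example).
Sigma <- matrix(c(1, 1, 1,
                  1, 2, 1,
                  1, 1, 2), nrow = 3, byrow = TRUE)

Omega <- solve(Sigma)   # precision matrix: [3 -1 -1; -1 1 0; -1 0 1]

# Partial correlations from the precision matrix:
#   rho_{jj'|rest} = -omega_{jj'} / sqrt(omega_{jj} * omega_{j'j'})
pcor_from_precision <- function(Omega) {
  d <- sqrt(diag(Omega))
  P <- -Omega / outer(d, d)
  diag(P) <- 1
  P
}

P <- pcor_from_precision(Omega)
round(Omega, 10)   # omega_23 = 0: no edge Y2 - Y3 in the graph
Sigma[2, 3]        # yet sigma_23 = 1: Y2 and Y3 are marginally dependent
round(P, 3)
```

The sparsity pattern of $\boldsymbol{\Omega}$, not of $\boldsymbol{\Sigma}$, is what the undirected graph encodes.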

Directed Graphical Modeling: Undirected and directed graphs

d-separation

3 Undirected Graphical Modeling with the Graphical Ridge

To start: easy code
> library(rags2ridges)
> CVres <- optPenalty.aLOOCV(y, , 0.01, step = 100)  # approximate LOOCV over a grid of penalty values
> rprec <- ridgeS(cov(y), CVres$optLambda)  # ridge precision estimate at the selected penalty
> P0 <- sparsify(symm(rprec), type = "localFDR", FDRcut = 0.95)  # edge selection via local FDR
> Ugraph(P0, type = "fancy", prune = TRUE)  # plot the resulting undirected graph

Sample Covariance and Precision: Setting
Consider: $p$ denotes the number of variables; $n$ denotes the number of observations.
The sample covariance matrix: Let $\mathbf{S}$ denote the sample covariance matrix. After standardization (and a sign change of the off-diagonal elements), its inverse $\mathbf{S}^{-1}$ gives the partial correlation matrix.
Usage: Many statistical models depend directly on $\mathbf{S}$ and its inverse $\mathbf{S}^{-1}$: multivariate regression, factor analysis, structural equation models, graphical models, ...

Problem
However: When $n$ is close to $p$, $\mathbf{S}$ is ill-behaved (ill-conditioned). When $p > n$, $\mathbf{S}$ is singular and its inverse $\mathbf{S}^{-1}$ is undefined.
Desired: A provision allowing graphical modeling when $p > n$.
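The singularity for $p > n$ is easy to see numerically. A sample covariance matrix built from $n$ centered observations has rank at most $n - 1$, so with more variables than that it has zero eigenvalues. The sizes and simulated data below are hypothetical.

```r
# When p > n the sample covariance matrix is singular: after centering,
# its rank is at most n - 1, so it has no inverse.
set.seed(3)
n <- 10; p <- 20                          # hypothetical sizes with p > n
Y <- matrix(rnorm(n * p), n, p)
S <- cov(Y)                               # p x p sample covariance

ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
rank_S  <- sum(ev > 1e-10 * max(ev))      # numerical rank
min_eig <- min(ev)                        # numerically zero
c(p = p, rank = rank_S, smallest_eigenvalue = min_eig)
```

With $p - (n - 1) = 11$ zero eigenvalues, $\mathbf{S}^{-1}$ does not exist, which is what the ridge estimator is meant to work around.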

Explaining the inverse
The scalar inverse: Let $a$ denote a nonzero number. Its inverse is the number $b$ such that $ab = 1$; clearly, $b = 1/a$.
Matrix: A matrix is a generalization of a number, an array of numbers:
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix}$$

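Explaining the inverse: the matrix inverse
Consider the $p \times p$ matrix $\mathbf{A}$. Its inverse $\mathbf{B} = \mathbf{A}^{-1}$ is defined such that $\mathbf{A}\mathbf{B} = \mathbf{I}_p$, where $\mathbf{I}_p$ is the identity matrix (ones on the diagonal, zeros elsewhere).
Solution: Partitioning $\mathbf{A}$ into blocks $\mathbf{A}_{11}$, $\mathbf{A}_{12}$, $\mathbf{A}_{21}$, $\mathbf{A}_{22}$,
$$\mathbf{A}^{-1} = \begin{bmatrix} \mathbf{A}_{11}^{-1} + \mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{Q}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & -\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{Q}^{-1} \\ -\mathbf{Q}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & \mathbf{Q}^{-1} \end{bmatrix},$$
with $\mathbf{Q} = \mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}$ the Schur complement of $\mathbf{A}_{11}$. This requires both $\mathbf{A}_{11}$ and $\mathbf{Q}$ to be invertible.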
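The block formula can be verified numerically. The sketch below builds a random symmetric positive definite matrix (a hypothetical example, so that $\mathbf{A}_{11}$ and the Schur complement are invertible), assembles the inverse block by block, and compares it with `solve()`.

```r
# Numerical check of the partitioned (block) inverse with Schur complement
# Q = A22 - A21 A11^{-1} A12, on a random positive definite matrix.
set.seed(4)
p <- 5; k <- 2                                   # block sizes: k and p - k
M <- matrix(rnorm(p * p), p, p)
A <- crossprod(M) + diag(p)                      # symmetric positive definite
i1 <- 1:k; i2 <- (k + 1):p

A11 <- A[i1, i1]; A12 <- A[i1, i2]
A21 <- A[i2, i1]; A22 <- A[i2, i2]
A11inv <- solve(A11)
Qinv   <- solve(A22 - A21 %*% A11inv %*% A12)    # inverse Schur complement

B <- rbind(
  cbind(A11inv + A11inv %*% A12 %*% Qinv %*% A21 %*% A11inv,
        -A11inv %*% A12 %*% Qinv),
  cbind(-Qinv %*% A21 %*% A11inv, Qinv)
)
max(abs(A %*% B - diag(p)))   # numerically zero: B is the inverse of A
```

When $\mathbf{A}$ is singular, as $\mathbf{S}$ is for $p > n$, one of the blocks that must be inverted fails to be invertible, so no such $\mathbf{B}$ can be assembled.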

Singularity

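The Ridge Precision Estimator: Ridge estimator of the precision matrix
Ridge regularization: Analytic penalized ML estimator:
$$\hat{\boldsymbol{\Omega}}(\lambda) = \left\{ \left[ \lambda \mathbf{I}_p + \tfrac{1}{4}(\mathbf{S} - \lambda\mathbf{T})^2 \right]^{1/2} + \tfrac{1}{2}(\mathbf{S} - \lambda\mathbf{T}) \right\}^{-1},$$
where $\mathbf{T}$ denotes a positive definite symmetric target matrix and $\lambda \in (0, \infty)$ denotes a penalty parameter.
To do: Choose the value of the penalty parameter; determine the support (which off-diagonal entries of $\hat{\boldsymbol{\Omega}}(\lambda)$ are treated as nonzero, i.e., the edges).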
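The closed-form estimator above can be evaluated with a single eigendecomposition of the symmetric matrix $\mathbf{D} = \mathbf{S} - \lambda\mathbf{T}$: if $d$ is an eigenvalue of $\mathbf{D}$, the corresponding eigenvalue of $\hat{\boldsymbol{\Omega}}(\lambda)$ is $1/(\sqrt{\lambda + d^2/4} + d/2)$, which is positive for every $\lambda > 0$. This is a minimal sketch, not the rags2ridges implementation; the function name `ridge_precision`, the identity target, and the simulated data are assumptions.

```r
# Ridge precision estimator from the closed-form expression:
#   Omega(lambda) = { [lambda I + (S - lambda T)^2 / 4]^{1/2} + (S - lambda T) / 2 }^{-1}
# computed through one eigendecomposition of D = S - lambda T.
ridge_precision <- function(S, lambda, target = diag(nrow(S))) {
  stopifnot(lambda > 0)
  ed <- eigen(S - lambda * target, symmetric = TRUE)
  d  <- ed$values
  w  <- 1 / (sqrt(lambda + d^2 / 4) + d / 2)   # eigenvalues of the estimate, all > 0
  Omega <- ed$vectors %*% (w * t(ed$vectors))  # V diag(w) V'
  (Omega + t(Omega)) / 2                       # remove round-off asymmetry
}

set.seed(5)
n <- 20; p <- 40                               # p > n: S is singular
Y <- matrix(rnorm(n * p), n, p)
S <- cov(Y)
lambda <- 0.5
Omega_hat <- ridge_precision(S, lambda)        # target T = I_p (assumption)

# positive definite even though S is singular
min_eig_hat <- min(eigen(Omega_hat, symmetric = TRUE, only.values = TRUE)$values)
# the estimate solves Omega^{-1} - S - lambda (Omega - T) = 0
stationarity <- max(abs(solve(Omega_hat) - S - lambda * (Omega_hat - diag(p))))
c(min_eigenvalue = min_eig_hat, stationarity_residual = stationarity)
```

The last check follows from the eigenvalue formula: $1/w - \lambda w = d$ for each eigenvalue, so $\hat{\boldsymbol{\Omega}}^{-1} - \lambda\hat{\boldsymbol{\Omega}} = \mathbf{S} - \lambda\mathbf{T}$.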

Visual explanation

Choosing the penalty value: K-fold cross-validation (CV)
Figure: Single iteration of K-fold CV.

Choosing the penalty value: K-fold CV score
$$\phi_K(\lambda) = \sum_{k=1}^{K} n_k \left\{ -\ln\left|\hat{\boldsymbol{\Omega}}_{\neg k}(\lambda)\right| + \mathrm{tr}\left[\hat{\boldsymbol{\Omega}}_{\neg k}(\lambda)\,\mathbf{S}_k\right] \right\},$$
where $n_k$ is the size of subset $k$, for $k = 1, \ldots, K$ disjoint subsets; $\mathbf{S}_k$ denotes the sample covariance matrix on the $k$th test set; and $\hat{\boldsymbol{\Omega}}_{\neg k}(\lambda)$ denotes the regularized precision matrix estimated on the $k$th training set (all data except subset $k$). The penalty value minimizing $\phi_K(\lambda)$ is chosen.
Highest predictive accuracy: Choose $n_k = 1$, such that $K = n$ (known as leave-one-out CV, LOOCV).
Problem: K-fold CV is computationally demanding for large $p$ and/or large $K$.
Solution: A computationally efficient approximate LOOCV score.
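A direct, unoptimized implementation of the CV score above shows why it becomes expensive: every candidate $\lambda$ needs $K$ refits. The sketch assumes zero-mean data (so covariances are computed without centering), an identity target, a hypothetical penalty grid, and the hypothetical helper `ridge_precision` implementing the closed-form ridge estimator; it does not implement the approximate LOOCV score.

```r
# K-fold cross-validation of the penalty, using the score
#   phi_K(lambda) = sum_k n_k { -ln|Omega_{-k}(lambda)| + tr[Omega_{-k}(lambda) S_k] }.
# Assumptions: zero-mean data, identity target, hypothetical lambda grid.
ridge_precision <- function(S, lambda, target = diag(nrow(S))) {
  ed <- eigen(S - lambda * target, symmetric = TRUE)
  w  <- 1 / (sqrt(lambda + ed$values^2 / 4) + ed$values / 2)
  ed$vectors %*% (w * t(ed$vectors))
}

cv_score <- function(Y, lambda, K) {
  n <- nrow(Y)
  folds <- split(seq_len(n), rep_len(seq_len(K), n))  # K disjoint subsets
  score <- 0
  for (idx in folds) {
    Ytr <- Y[-idx, , drop = FALSE]                     # training set
    Yte <- Y[idx, , drop = FALSE]                      # test set
    Omega_k <- ridge_precision(crossprod(Ytr) / nrow(Ytr), lambda)
    S_k     <- crossprod(Yte) / nrow(Yte)
    logdet  <- as.numeric(determinant(Omega_k, logarithm = TRUE)$modulus)
    score   <- score + nrow(Yte) * (-logdet + sum(diag(Omega_k %*% S_k)))
  }
  score
}

set.seed(6)
Y <- matrix(rnorm(30 * 15), 30, 15)                    # n = 30, p = 15 (hypothetical)
lambdas <- 10^seq(-2, 1, length.out = 20)
scores  <- sapply(lambdas, cv_score, Y = Y, K = nrow(Y))  # K = n: LOOCV
lambda_opt <- lambdas[which.min(scores)]
lambda_opt
```

Here LOOCV costs $n \times 20 = 600$ eigendecompositions of a $p \times p$ matrix, which is what motivates an approximate LOOCV score for large $p$.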

Edge selection
Mixture distribution: The distribution of the partial correlations is modeled by the mixture
$$\eta_0 f_0 + (1 - \eta_0) f_\varepsilon,$$
where $\eta_0 \in [0, 1]$ is the mixture weight, $f_0$ is the distribution of a null edge, and $f_\varepsilon$ is the distribution of a present edge.
Posterior probability of edge presence: The mixture allows one to determine the empirical posterior probability that an edge is present given the value $\hat{r}$ of the estimated partial correlation:
$$P(\text{edge} \mid \hat{r}) = \frac{(1 - \eta_0) f_\varepsilon(\hat{r})}{\eta_0 f_0(\hat{r}) + (1 - \eta_0) f_\varepsilon(\hat{r})}.$$
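The posterior edge probability follows from Bayes' rule once $f_0$, $f_\varepsilon$ and $\eta_0$ are fixed. In practice these are estimated from all $p(p-1)/2$ partial correlations; the sketch below simply assumes them. The null density is taken to be that of a correlation coefficient with $\kappa$ degrees of freedom and the alternative uniform on $[-1, 1]$, a choice in the spirit of Schäfer & Strimmer-style approaches, not necessarily the one used by rags2ridges. The values of `eta0` and `kappa` are hypothetical.

```r
# Posterior probability that an edge is present, given an estimated partial
# correlation r, under the mixture eta0 * f0 + (1 - eta0) * f_eps.
# Assumptions (hypothetical, not estimated from data here):
#   f0    : null density of a correlation coefficient with kappa degrees of freedom
#   f_eps : uniform density on [-1, 1]
f0 <- function(r, kappa) {
  (1 - r^2)^((kappa - 3) / 2) / beta(1 / 2, (kappa - 1) / 2)
}
f_eps <- function(r) dunif(r, min = -1, max = 1)

edge_posterior <- function(r, eta0, kappa) {
  alt <- (1 - eta0) * f_eps(r)
  alt / (eta0 * f0(r, kappa) + alt)      # = 1 - local false discovery rate
}

r <- c(0, 0.1, 0.2, 0.3, 0.5)
post <- edge_posterior(r, eta0 = 0.9, kappa = 50)
round(post, 3)
```

An edge could then be retained when its posterior probability exceeds a chosen cutoff; small partial correlations stay below any reasonable cutoff because the null density dominates near zero.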

Illustration: Example
Data: TCGA breast cancer data; MAPK pathway genes (as defined by KEGG); $p = 262$, $n = 496$.

Comparison
Data: UPP ER+ breast cancer data; Apoptosis pathway genes (as defined by KEGG); $p = 83$.


Software
rags2ridges: An R package that implements the ridge estimator and supporting functionalities for graphical modeling.
Availability: Available for free from the Comprehensive R Archive Network (CRAN).
R: A free software programming language and software environment for statistical computing.

4 Directed Cyclic Mixed Graphs for Genomic Data Integration


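Model: Model and assumptions
Model: The structural equation model (SEM) we consider can be expressed as
$$\mathbf{y}_i := \mathbf{B}\mathbf{y}_i + \boldsymbol{\Gamma}\mathbf{x}_i + \boldsymbol{\epsilon}_i, \qquad i = 1, \ldots, n.$$
Assumptions:
1 Properly preprocessed data
2 $\mathbf{y}_i \perp\!\!\!\perp \mathbf{y}_{i'}$, $\forall\, i \neq i'$
3 $\boldsymbol{\epsilon}_i \sim \mathcal{N}_p(\mathbf{0}, \boldsymbol{\Psi})$, with $\boldsymbol{\Psi} \equiv \mathrm{diag}[\psi_{11}, \ldots, \psi_{pp}]$ and $\psi_{jj} > 0$, $\forall\, j$
4 $\mathbf{x}_i \sim \mathcal{N}_q(\mathbf{0}, \boldsymbol{\Phi})$, with $\boldsymbol{\Phi} \succ 0$
5 $\mathbf{x}_i \perp\!\!\!\perp \boldsymbol{\epsilon}_{i'}$, $\forall\, i, i'$
6 $(\mathbf{I}_p - \mathbf{B})$ is nonsingular and $\beta_{jj} = 0$, $\forall\, j$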
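Under assumption 6 the model can be solved for $\mathbf{y}_i$, giving $\mathbf{y}_i = (\mathbf{I}_p - \mathbf{B})^{-1}(\boldsymbol{\Gamma}\mathbf{x}_i + \boldsymbol{\epsilon}_i)$ and, by assumptions 3 to 5, the implied covariance $\boldsymbol{\Sigma}_{yy} = (\mathbf{I}_p - \mathbf{B})^{-1}(\boldsymbol{\Gamma}\boldsymbol{\Phi}\boldsymbol{\Gamma}^\top + \boldsymbol{\Psi})(\mathbf{I}_p - \mathbf{B})^{-\top}$. The sketch simulates from such a model with a feedback loop in $\mathbf{B}$ and compares the sample covariance with this expression; all coefficient values and dimensions are hypothetical.

```r
# Simulate from y_i := B y_i + Gamma x_i + eps_i, i.e. y_i = (I - B)^{-1}(Gamma x_i + eps_i),
# and compare with the implied covariance
#   Sigma_yy = (I - B)^{-1} (Gamma Phi Gamma' + Psi) (I - B)^{-T}.
# Coefficient values are hypothetical; B contains a cycle (y1 -> y2 -> y1).
set.seed(7)
p <- 3; q <- 2; n <- 1e5
B <- matrix(0, p, p)
B[2, 1] <- 0.5      # y1 -> y2
B[1, 2] <- 0.3      # y2 -> y1 (feedback loop); diag(B) = 0 as assumed
B[3, 2] <- -0.4     # y2 -> y3
Gamma <- matrix(c(1, 0,
                  0, 0.8,
                  0, 0), p, q, byrow = TRUE)
Psi <- diag(c(1, 0.5, 0.7))                 # diagonal error covariance, psi_jj > 0
Phi <- matrix(c(1, 0.3, 0.3, 1), q, q)      # positive definite covariance of x

IB_inv <- solve(diag(p) - B)                # requires (I - B) nonsingular
X   <- matrix(rnorm(n * q), n, q) %*% chol(Phi)            # rows x_i ~ N(0, Phi)
Eps <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(diag(Psi))) # rows eps_i ~ N(0, Psi)
Y   <- (X %*% t(Gamma) + Eps) %*% t(IB_inv)                 # rows are y_i'

Sigma_yy <- IB_inv %*% (Gamma %*% Phi %*% t(Gamma) + Psi) %*% t(IB_inv)
round(cov(Y) - Sigma_yy, 2)                 # close to zero for large n
```

Because the feedback loop makes $\mathbf{B}$ non-triangular, the implied $\boldsymbol{\Sigma}_{yy}$ cannot be read as that of a DAG, which is why the graphical reading of the model needs the tools on the following slides.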

Graphical representation
Question: Can we read off conditional independencies from the graph?

Model as Graphical Object: m-separation
Stretching the idea of the collider.

Directed cyclic mixed graph

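Approach
Steps:
1 Regularize the joint sample covariance matrix of $\mathbf{y}_i$ and $\mathbf{x}_i$.
2 Test for vanishing partial correlations to obtain a sparse representation.
3 Solve for the parameters with a simple iterative algorithm.
The joint precision matrix has the block form
$$\boldsymbol{\Omega} = \begin{bmatrix} \boldsymbol{\Omega}_{yy} & \boldsymbol{\Omega}_{yx} \\ \boldsymbol{\Omega}_{xy} & \boldsymbol{\Omega}_{xx} \end{bmatrix}, \qquad \boldsymbol{\Omega}_{yx} = \boldsymbol{\Omega}_{xy}^{\top}.$$
In a sparse example with four $y$-variables and four $x$-variables, the within-platform blocks read
$$\boldsymbol{\Omega}_{yy} = \begin{bmatrix} \omega^{yy}_{11} & \omega^{yy}_{12} & \omega^{yy}_{13} & \omega^{yy}_{14} \\ \omega^{yy}_{21} & \omega^{yy}_{22} & \omega^{yy}_{23} & \omega^{yy}_{24} \\ \omega^{yy}_{31} & \omega^{yy}_{32} & \omega^{yy}_{33} & 0 \\ \omega^{yy}_{41} & \omega^{yy}_{42} & 0 & \omega^{yy}_{44} \end{bmatrix}, \qquad \boldsymbol{\Omega}_{xx} = \begin{bmatrix} \omega^{xx}_{11} & 0 & \omega^{xx}_{13} & \omega^{xx}_{14} \\ 0 & \omega^{xx}_{22} & 0 & 0 \\ \omega^{xx}_{31} & 0 & \omega^{xx}_{33} & \omega^{xx}_{34} \\ \omega^{xx}_{41} & 0 & \omega^{xx}_{43} & \omega^{xx}_{44} \end{bmatrix},$$
and the cross-platform blocks $\boldsymbol{\Omega}_{yx}$ and $\boldsymbol{\Omega}_{xy}$ contain zeros as well (e.g., $\omega^{xy}_{14} = 0$).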
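Steps 1 and 2 of the approach can be sketched with the same ingredients as the undirected case, applied to the stacked data $(\mathbf{y}_i, \mathbf{x}_i)$. In this sketch a fixed penalty, an identity target, and a plain absolute-value cutoff stand in for penalty selection and the formal test of vanishing partial correlations; the helper `ridge_precision`, the simulated two-platform data, and all numeric settings are hypothetical. Step 3 (the iterative parameter solver) is not sketched.

```r
# Hypothetical helper: closed-form ridge precision estimate with target T (default I_p).
ridge_precision <- function(S, lambda, target = diag(nrow(S))) {
  ed <- eigen(S - lambda * target, symmetric = TRUE)
  w  <- 1 / (sqrt(lambda + ed$values^2 / 4) + ed$values / 2)
  Omega <- ed$vectors %*% (w * t(ed$vectors))
  (Omega + t(Omega)) / 2
}

set.seed(8)
n <- 40; p <- 4; q <- 4                   # hypothetical: 4 y- and 4 x-variables
X <- matrix(rnorm(n * q), n, q)           # e.g., copy number
Y <- matrix(rnorm(n * p), n, p)           # e.g., gene expression
Y[, 1:2] <- Y[, 1:2] + X[, 1:2]           # y1 depends on x1, y2 on x2

# Step 1: regularize the joint sample covariance of (y, x)
Z <- cbind(Y, X)
Omega <- ridge_precision(cov(Z), lambda = 0.2)

# Partial correlations from the regularized precision matrix
d <- sqrt(diag(Omega))
P <- -Omega / outer(d, d)
diag(P) <- 1

# Step 2 (stand-in for a formal test): zero out small partial correlations
P_sparse <- P * (abs(P) > 0.3)

# Split into the yy, yx and xx blocks of the joint matrix
iy <- 1:p; ix <- p + 1:q
P_yy <- P_sparse[iy, iy]
P_yx <- P_sparse[iy, ix]
P_xx <- P_sparse[ix, ix]
round(P_yx, 2)
```

The $yx$ block carries the cross-platform edges (here, the simulated $x_1 \to y_1$ and $x_2 \to y_2$ effects), while the $yy$ and $xx$ blocks carry the within-platform structure.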

Illustration: Application: GBM


5 So What and Further Research

So what?
Why of interest:
Enables the exploration of networks in situations unsuitable for standard statistics (e.g., $p > n$).
Can aid in the identification of more robust markers.
Can point to markers of interest for perturbation experiments.
Can aid in focussing temporal experiments.

Further research
Extensions:
Consider data from more than 2 platforms.
Modeling differential networks.
Modeling temporal networks.

References
Koster, J.T.A. (1996). Markov properties of nonrecursive causal models. Annals of Statistics, 24:2148.
Pearl, J. (2009, 2nd ed.). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.
Peeters, C.F.W., & van Wieringen, W.N. (2014). rags2ridges: Ridge estimation of precision matrices from high-dimensional data. R package version 1.2.
Peeters, C.F.W., van Wieringen, W.N., & van de Wiel, M.A. (in preparation). Gaussian directed cyclic mixed graph modeling for genomic data integration.
Richardson, T. (2003). Markov properties for acyclic directed mixed graphs. Scandinavian Journal of Statistics, 30:145.
Schäfer, J., & Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4:32.
Vujačić, I., Abbruzzo, A., & Wit, E.C. (2014). A computationally fast alternative to cross-validation in penalized Gaussian graphical models. arXiv: v2 [stat.ME].
van Wieringen, W.N., & Peeters, C.F.W. (under review). Ridge estimation of inverse covariance matrices from high-dimensional data. arXiv: [stat.ME].


More information

Visualizing Networks: Cytoscape. Prat Thiru

Visualizing Networks: Cytoscape. Prat Thiru Visualizing Networks: Cytoscape Prat Thiru Outline Introduction to Networks Network Basics Visualization Inferences Cytoscape Demo 2 Why (Biological) Networks? 3 Networks: An Integrative Approach Zvelebil,

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Control of Gene Expression

Control of Gene Expression Control of Gene Expression What is Gene Expression? Gene expression is the process by which informa9on from a gene is used in the synthesis of a func9onal gene product. What is Gene Expression? Figure

More information

A role of microrna in the regulation of telomerase? Yuan Ming Yeh, Pei Rong Huang, and Tzu Chien V. Wang

A role of microrna in the regulation of telomerase? Yuan Ming Yeh, Pei Rong Huang, and Tzu Chien V. Wang A role of microrna in the regulation of telomerase? Yuan Ming Yeh, Pei Rong Huang, and Tzu Chien V. Wang Department of Molecular and Cellular Biology, Chang Gung University, Kwei San, Tao Yuan 333, Taiwan

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Human Genome Organization: An Update. Genome Organization: An Update

Human Genome Organization: An Update. Genome Organization: An Update Human Genome Organization: An Update Genome Organization: An Update Highlights of Human Genome Project Timetable Proposed in 1990 as 3 billion dollar joint venture between DOE and NIH with 15 year completion

More information

3. The Junction Tree Algorithms

3. The Junction Tree Algorithms A Short Course on Graphical Models 3. The Junction Tree Algorithms Mark Paskin mark@paskin.org 1 Review: conditional independence Two random variables X and Y are independent (written X Y ) iff p X ( )

More information

Genetomic Promototypes

Genetomic Promototypes Genetomic Promototypes Mirkó Palla and Dana Pe er Department of Mechanical Engineering Clarkson University Potsdam, New York and Department of Genetics Harvard Medical School 77 Avenue Louis Pasteur Boston,

More information

Some probability and statistics

Some probability and statistics Appendix A Some probability and statistics A Probabilities, random variables and their distribution We summarize a few of the basic concepts of random variables, usually denoted by capital letters, X,Y,

More information

Model-based Synthesis. Tony O Hagan

Model-based Synthesis. Tony O Hagan Model-based Synthesis Tony O Hagan Stochastic models Synthesising evidence through a statistical model 2 Evidence Synthesis (Session 3), Helsinki, 28/10/11 Graphical modelling The kinds of models that

More information

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data

Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA

More information

Computational localization of promoters and transcription start sites in mammalian genomes

Computational localization of promoters and transcription start sites in mammalian genomes Computational localization of promoters and transcription start sites in mammalian genomes Thomas Down This dissertation is submitted for the degree of Doctor of Philosophy Wellcome Trust Sanger Institute

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

Granger-causality graphs for multivariate time series

Granger-causality graphs for multivariate time series Granger-causality graphs for multivariate time series Michael Eichler Universität Heidelberg Abstract In this paper, we discuss the properties of mixed graphs which visualize causal relationships between

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Constrained Least Squares

Constrained Least Squares Constrained Least Squares Authors: G.H. Golub and C.F. Van Loan Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp.580-587 CICN may05/1 Background The least squares problem: min Ax b 2 x Sometimes,

More information

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis

Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Cell Phone based Activity Detection using Markov Logic Network

Cell Phone based Activity Detection using Markov Logic Network Cell Phone based Activity Detection using Markov Logic Network Somdeb Sarkhel sxs104721@utdallas.edu 1 Introduction Mobile devices are becoming increasingly sophisticated and the latest generation of smart

More information

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources

Appendix 2 Molecular Biology Core Curriculum. Websites and Other Resources Appendix 2 Molecular Biology Core Curriculum Websites and Other Resources Chapter 1 - The Molecular Basis of Cancer 1. Inside Cancer http://www.insidecancer.org/ From the Dolan DNA Learning Center Cold

More information

Basic Concepts of DNA, Proteins, Genes and Genomes

Basic Concepts of DNA, Proteins, Genes and Genomes Basic Concepts of DNA, Proteins, Genes and Genomes Kun-Mao Chao 1,2,3 1 Graduate Institute of Biomedical Electronics and Bioinformatics 2 Department of Computer Science and Information Engineering 3 Graduate

More information

Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

More information

Translation Study Guide

Translation Study Guide Translation Study Guide This study guide is a written version of the material you have seen presented in the replication unit. In translation, the cell uses the genetic information contained in mrna to

More information

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized:

The sequence of bases on the mrna is a code that determines the sequence of amino acids in the polypeptide being synthesized: Module 3F Protein Synthesis So far in this unit, we have examined: How genes are transmitted from one generation to the next Where genes are located What genes are made of How genes are replicated How

More information

Molecular Computing. david.wishart@ualberta.ca 3-41 Athabasca Hall Sept. 30, 2013

Molecular Computing. david.wishart@ualberta.ca 3-41 Athabasca Hall Sept. 30, 2013 Molecular Computing david.wishart@ualberta.ca 3-41 Athabasca Hall Sept. 30, 2013 What Was The World s First Computer? The World s First Computer? ENIAC - 1946 Antikythera Mechanism - 80 BP Babbage Analytical

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

More information

Micro RNAs: potentielle Biomarker für das. Blutspenderscreening

Micro RNAs: potentielle Biomarker für das. Blutspenderscreening Micro RNAs: potentielle Biomarker für das Blutspenderscreening micrornas - Background Types of RNA -Coding: messenger RNA (mrna) -Non-coding (examples): Ribosomal RNA (rrna) Transfer RNA (trna) Small nuclear

More information

GENE REGULATION. Teacher Packet

GENE REGULATION. Teacher Packet AP * BIOLOGY GENE REGULATION Teacher Packet AP* is a trademark of the College Entrance Examination Board. The College Entrance Examination Board was not involved in the production of this material. Pictures

More information

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals

Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Xiaohui Xie 1, Jun Lu 1, E. J. Kulbokas 1, Todd R. Golub 1, Vamsi Mootha 1, Kerstin Lindblad-Toh

More information

Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE

Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE CRUK Stratified Medicine Initiative Somatic mutation testing for prediction of treatment response in patients with solid tumours:

More information

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland

More information

TOWARD BIG DATA ANALYSIS WORKSHOP

TOWARD BIG DATA ANALYSIS WORKSHOP TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)

More information

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Multivariate Normal Distribution

Multivariate Normal Distribution Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

More information

Portfolio Distribution Modelling and Computation. Harry Zheng Department of Mathematics Imperial College h.zheng@imperial.ac.uk

Portfolio Distribution Modelling and Computation. Harry Zheng Department of Mathematics Imperial College h.zheng@imperial.ac.uk Portfolio Distribution Modelling and Computation Harry Zheng Department of Mathematics Imperial College h.zheng@imperial.ac.uk Workshop on Fast Financial Algorithms Tanaka Business School Imperial College

More information

EPIGENETICS DNA and Histone Model

EPIGENETICS DNA and Histone Model EPIGENETICS ABSTRACT A 3-D cut-and-paste model depicting how histone, acetyl and methyl molecules control access to DNA and affect gene expression. LOGISTICS TIME REQUIRED LEARNING OBJECTIVES DNA is coiled

More information

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS

USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu ABSTRACT This

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University

Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University Software and Methods for the Analysis of Affymetrix GeneChip Data Rafael A Irizarry Department of Biostatistics Johns Hopkins University Outline Overview Bioconductor Project Examples 1: Gene Annotation

More information

Inferring the role of transcription factors in regulatory networks

Inferring the role of transcription factors in regulatory networks Inferring the role of transcription factors in regulatory networks Philippe Veber 1, Carito Guziolowski 1, Michel Le Borgne 2, Ovidiu Radulescu 1,3 and Anne Siegel 4 1 Centre INRIA Rennes Bretagne Atlantique,

More information

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.

More information

Vector and Matrix Norms

Vector and Matrix Norms Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty

More information

1 Mutation and Genetic Change

1 Mutation and Genetic Change CHAPTER 14 1 Mutation and Genetic Change SECTION Genes in Action KEY IDEAS As you read this section, keep these questions in mind: What is the origin of genetic differences among organisms? What kinds

More information

Penalized Logistic Regression and Classification of Microarray Data

Penalized Logistic Regression and Classification of Microarray Data Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Personalized Predictive Medicine and Genomic Clinical Trials

Personalized Predictive Medicine and Genomic Clinical Trials Personalized Predictive Medicine and Genomic Clinical Trials Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov brb.nci.nih.gov Powerpoint presentations

More information

A mixture model for random graphs

A mixture model for random graphs A mixture model for random graphs J-J Daudin, F. Picard, S. Robin robin@inapg.inra.fr UMR INA-PG / ENGREF / INRA, Paris Mathématique et Informatique Appliquées Examples of networks. Social: Biological:

More information

Molecular Genetics. RNA, Transcription, & Protein Synthesis

Molecular Genetics. RNA, Transcription, & Protein Synthesis Molecular Genetics RNA, Transcription, & Protein Synthesis Section 1 RNA AND TRANSCRIPTION Objectives Describe the primary functions of RNA Identify how RNA differs from DNA Describe the structure and

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

More information

Expression Quantification (I)

Expression Quantification (I) Expression Quantification (I) Mario Fasold, LIFE, IZBI Sequencing Technology One Illumina HiSeq 2000 run produces 2 times (paired-end) ca. 1,2 Billion reads ca. 120 GB FASTQ file RNA-seq protocol Task

More information

Center for Causal Discovery (CCD) of Biomedical Knowledge from Big Data University of Pittsburgh Carnegie Mellon University Pittsburgh Supercomputing

Center for Causal Discovery (CCD) of Biomedical Knowledge from Big Data University of Pittsburgh Carnegie Mellon University Pittsburgh Supercomputing Center for Causal Discovery (CCD) of Biomedical Knowledge from Big Data University of Pittsburgh Carnegie Mellon University Pittsburgh Supercomputing Center Yale University PIs: Ivet Bahar, Jeremy Berg,

More information

Gene Models & Bed format: What they represent.

Gene Models & Bed format: What they represent. GeneModels&Bedformat:Whattheyrepresent. Gene models are hypotheses about the structure of transcripts produced by a gene. Like all models, they may be correct, partly correct, or entirely wrong. Typically,

More information

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

More information

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step

More information

Gene Expression Analysis

Gene Expression Analysis Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

In this section, we will consider techniques for solving problems of this type.

In this section, we will consider techniques for solving problems of this type. Constrained optimisation roblems in economics typically involve maximising some quantity, such as utility or profit, subject to a constraint for example income. We shall therefore need techniques for solving

More information

Systems Biology through Data Analysis and Simulation

Systems Biology through Data Analysis and Simulation Biomolecular Networks Initiative Systems Biology through Data Analysis and Simulation William Cannon Computational Biosciences 5/30/03 Cellular Dynamics Microbial Cell Dynamics Data Mining Nitrate NARX

More information

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of

More information

Markov random fields and Gibbs measures

Markov random fields and Gibbs measures Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).

More information

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation

Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information