Probabilistic methods for post-genomic data integration

1 Probabilistic methods for post-genomic data integration Dirk Husmeier Biomathematics & Statistics Scotland (BioSS) JCMB, The King's Buildings, Edinburgh EH9 3JZ United Kingdom dirk

3 Integrated analysis of regulatory networks
Expression data alone are not sufficient
Combining multiple sources of information yields complementary constraints

5 Combining promoter sequences and gene expression data
Conventional approach:
Find clusters of co-expressed genes
Identify regulatory elements by searching for common over-represented motifs in the promoter regions of these genes
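
The second step of the conventional approach can be illustrated with a small k-mer enrichment count. This is only a sketch under stated assumptions: the clustering step is taken as already done, the promoter strings are made up, and the function names, the add-one smoothing and the `min_ratio` threshold are hypothetical choices rather than anything prescribed by the slides.

```python
from collections import Counter

def kmer_presence(promoters, k):
    """Count, for each k-mer, how many promoters contain it at least once."""
    counts = Counter()
    for seq in promoters:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def enriched_kmers(cluster, background, k, min_ratio=2.0):
    """Return k-mers whose promoter frequency in the cluster is at least
    min_ratio times their (smoothed) frequency in the background set."""
    in_cluster = kmer_presence(cluster, k)
    in_background = kmer_presence(background, k)
    hits = {}
    for kmer, n in in_cluster.items():
        f_cluster = n / len(cluster)
        # Add-one smoothing so unseen background k-mers do not divide by zero.
        f_background = (in_background[kmer] + 1) / (len(background) + 1)
        if f_cluster / f_background >= min_ratio:
            hits[kmer] = f_cluster / f_background
    return hits

# Toy promoters (made up): the co-expressed cluster shares the word TATAG.
cluster = ["GGTATAGCC", "ACTATAGTT", "TATAGGGAC"]
background = ["GGCCGGCCA", "ACGTTGCAA", "TTGACCGGT", "CAGTCAGTC"]
print(enriched_kmers(cluster, background, k=5))
```

Because clustering and motif search run as two separate stages here, an error in the clustering cannot be corrected by the sequence evidence, which is the kind of limitation a joint model is meant to address.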

6 Shortcomings of the conventional algorithm

7 Microarray data Model Promoter sequences

10 Segal's unifying probabilistic model

11 Microarray data Model Promoter sequences

15 Segal, Yelensky, Koller (2003) Bioinformatics 19
Revision: Motif finding

16 Motif: TAT A G T G A A T T T A T A G A

25 Position Specific Scoring Matrix (PSSM)
Search for a motif of length $W$ in binding sequences
$W \times 4$ matrix $\psi_k(l)$: probability that the nucleotide in the $k$th position, $k \in [1, \dots, W]$, is an $l \in \{A, C, G, T\}$
Background model for non-binding sequences
4-dim vector $\theta_0(l)$: probability of nucleotide $l$; this distribution is position-independent
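
As a rough illustration of the two ingredients named on this slide, the sketch below stores a PSSM as one distribution per motif position and the background as a single distribution. All numerical values, the motif length of 3 and the helper names are hypothetical.

```python
import math

# Hypothetical values, for illustration only.
theta0 = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}   # background theta_0(l)
psi = [                                            # PSSM psi_k(l), one row per position
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]

def is_distribution(dist):
    """True if the values are non-negative and sum to one."""
    return all(p >= 0 for p in dist.values()) and math.isclose(sum(dist.values()), 1.0)

def consensus(psi):
    """Most probable nucleotide at each motif position."""
    return "".join(max(col, key=col.get) for col in psi)

# Each row of the W x 4 matrix, and the background vector, must be a distribution.
assert is_distribution(theta0) and all(is_distribution(col) for col in psi)
print(consensus(psi))
```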

28 Sequence $S_1, S_2, \dots, S_N$
Non-binding sequence: $R = 0$
\[ P(S_1, S_2, \dots, S_N \mid R = 0) = \prod_{t=1}^{N} \theta_0(S_t) \]
Binding sequence: $R = 1$, motif starting at position $m + 1$
\[ P(S_1, S_2, \dots, S_N \mid R = 1, \mathrm{start} = m + 1) = \prod_{t=1}^{m} \theta_0(S_t) \prod_{k=1}^{W} \psi_k(S_{m+k}) \prod_{t=m+W+1}^{N} \theta_0(S_t) = \prod_{t=1}^{N} \theta_0(S_t) \prod_{k=1}^{W} \frac{\psi_k(S_{m+k})}{\theta_0(S_{m+k})} \]
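
The equality between the three-segment product and the background-times-ratio form can be checked numerically. The following is a minimal sketch with made-up parameter values and a made-up sequence; indices are 0-based in the code, so offset `m` corresponds to a motif starting at position m + 1 on the slide.

```python
import math

# Hypothetical background and PSSM (W = 3), for illustration only.
theta0 = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
psi = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]

def log_p_background(seq):
    """log P(S | R = 0): every position is drawn from the background."""
    return sum(math.log(theta0[s]) for s in seq)

def log_p_motif_at(seq, m):
    """log P(S | R = 1, start = m + 1), written as three explicit segments."""
    W = len(psi)
    before = sum(math.log(theta0[s]) for s in seq[:m])
    motif = sum(math.log(psi[k][seq[m + k]]) for k in range(W))
    after = sum(math.log(theta0[s]) for s in seq[m + W:])
    return before + motif + after

def log_p_motif_at_ratio(seq, m):
    """The same quantity as background likelihood plus the log odds of the motif window."""
    W = len(psi)
    log_ratio = sum(math.log(psi[k][seq[m + k]] / theta0[seq[m + k]]) for k in range(W))
    return log_p_background(seq) + log_ratio

seq = "GGTATCA"   # made-up sequence containing TAT at offset 2
for m in range(len(seq) - len(psi) + 1):
    assert math.isclose(log_p_motif_at(seq, m), log_p_motif_at_ratio(seq, m))
```

The ratio form is convenient because the background product is shared by all start positions and by the non-binding case, so it cancels later in Bayes' rule.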

31 Binding sequence: $R = 1$, motif starting at position $m + 1$
\[ P(S_1, S_2, \dots, S_N \mid R = 1, \mathrm{start} = m + 1) = \prod_{t=1}^{N} \theta_0(S_t) \prod_{k=1}^{W} \frac{\psi_k(S_{m+k})}{\theta_0(S_{m+k})} \]
Binding sequence: $R = 1$, motif starting anywhere
\[ P(S_1, S_2, \dots, S_N \mid R = 1) = \sum_{m=0}^{N-W} P(\mathrm{start} = m + 1)\, P(S_1, S_2, \dots, S_N \mid R = 1, \mathrm{start} = m + 1) = \prod_{t=1}^{N} \theta_0(S_t)\, \frac{1}{N - W + 1} \sum_{m=0}^{N-W} \prod_{k=1}^{W} \frac{\psi_k(S_{m+k})}{\theta_0(S_{m+k})} \]
Objective: Prediction of binding activity from sequence: $P(R = 1 \mid S_1, S_2, \dots, S_N)$
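
Marginalising over the start position amounts to averaging the motif odds over all N - W + 1 windows, since the prior over start positions is uniform. A minimal sketch, again with hypothetical parameters and made-up sequences:

```python
import math

# Hypothetical background and PSSM (W = 3), for illustration only.
theta0 = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
psi = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]

def log_p_background(seq):
    """log P(S | R = 0)."""
    return sum(math.log(theta0[s]) for s in seq)

def motif_odds(seq, m):
    """prod_k psi_k(S_{m+k}) / theta_0(S_{m+k}) for a motif starting at offset m."""
    return math.prod(psi[k][seq[m + k]] / theta0[seq[m + k]] for k in range(len(psi)))

def log_p_binding(seq):
    """log P(S | R = 1) with a uniform prior 1/(N - W + 1) over start positions."""
    n_starts = len(seq) - len(psi) + 1
    mean_odds = sum(motif_odds(seq, m) for m in range(n_starts)) / n_starts
    return log_p_background(seq) + math.log(mean_odds)

with_motif = "GGTATCA"      # contains TAT
without_motif = "GGCCGGC"   # does not
print(log_p_binding(with_motif) - log_p_background(with_motif))
print(log_p_binding(without_motif) - log_p_background(without_motif))
```

The printed differences are log likelihood ratios between the binding and non-binding hypotheses; a positive value favours binding.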

33 Apply Bayes' rule:
\[ P(R = 1 \mid S_1, S_2, \dots, S_N) = \frac{P(S_1, S_2, \dots, S_N \mid R = 1)\, P(R = 1)}{P(S_1, S_2, \dots, S_N \mid R = 0)\, P(R = 0) + P(S_1, S_2, \dots, S_N \mid R = 1)\, P(R = 1)} \]
\[ = \left( 1 + \frac{P(R = 0)\, P(S_1, S_2, \dots, S_N \mid R = 0)}{P(R = 1)\, P(S_1, S_2, \dots, S_N \mid R = 1)} \right)^{-1} \]
\[ = \left( 1 + \left[ \frac{P(R = 1)}{P(R = 0)}\, \frac{1}{N - W + 1} \sum_{m=0}^{N-W} \prod_{k=1}^{W} \frac{\psi_k(S_{m+k})}{\theta_0(S_{m+k})} \right]^{-1} \right)^{-1} \]
Define:
\[ w_k(l) = \log \frac{\psi_k(l)}{\theta_0(l)}, \qquad w_0 = \log \frac{P(R = 1)}{P(R = 0)}, \qquad \operatorname{logit}(z) = \frac{1}{1 + \exp(-z)} \]

34
\[ P(R = 1 \mid S_1, S_2, \dots, S_N) = \operatorname{logit}\left( w_0 + \log\left[ \frac{1}{N - W + 1} \sum_{m=0}^{N-W} \exp\left( \sum_{k=1}^{W} w_k(S_{m+k}) \right) \right] \right) \]
$4W + 1$ parameters: $w_k(l)$, $w_0$
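
The compact form can be checked against a direct application of Bayes' rule. The sketch below assumes w_0 enters additively, as follows from its definition as the log prior odds; the prior of 0.1, the parameter values and the function names are hypothetical. What the slides call logit(z) = 1/(1 + exp(-z)) is named `logistic` here.

```python
import math

# Hypothetical background and PSSM (W = 3), for illustration only.
theta0 = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
psi = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]

def logistic(z):
    """The slides' logit(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def posterior_binding(seq, prior_binding=0.1):
    """P(R = 1 | S) from window scores, a log-mean-exp, and the transfer function."""
    W = len(psi)
    w = [{l: math.log(col[l] / theta0[l]) for l in theta0} for col in psi]   # w_k(l)
    w0 = math.log(prior_binding / (1.0 - prior_binding))                     # log prior odds
    scores = [sum(w[k][seq[m + k]] for k in range(W)) for m in range(len(seq) - W + 1)]
    top = max(scores)   # factor out the maximum for numerical stability
    log_mean = top + math.log(sum(math.exp(s - top) for s in scores)) - math.log(len(scores))
    return logistic(w0 + log_mean)

def posterior_bayes(seq, prior_binding=0.1):
    """Reference computation: Bayes' rule applied directly to the two likelihoods."""
    W = len(psi)
    p0 = math.prod(theta0[s] for s in seq)
    odds = [math.prod(psi[k][seq[m + k]] / theta0[seq[m + k]] for k in range(W))
            for m in range(len(seq) - W + 1)]
    p1 = p0 * sum(odds) / len(odds)
    return p1 * prior_binding / (p1 * prior_binding + p0 * (1.0 - prior_binding))

for seq in ("GGTATCA", "GGCCGGC"):
    assert math.isclose(posterior_binding(seq), posterior_bayes(seq))
    print(seq, round(posterior_binding(seq), 4))
```

Working with the weights w_k(l) rather than with psi and theta_0 separately keeps the model at 4W + 1 free numbers and makes it look like a small neural network: per-window linear scores, a sum of exponentials, and a sigmoid output.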

42 Motif: TAT A G T G A A T T T A T A G A
[Figure: the motif is scored against the sequence at each position (Score 1, Score 2, ..., Score t, ..., Score N); the scores are combined (+) and passed through a nonlinear transfer function to give P(R=1 | sequence)]

44 Wolfgang Lehrach Biomathematics & Statistics Scotland Ab initio prediction of protein interaction

45 SH3 yeast two-hybrid interaction network
Tong et al. (2002), Science 295
Interactions between 28 SH3 proteins and 143 binding peptides
9 binding partners per SH3 domain on average

47 Final Test Set Performance
[Figure: ROC curves, true positive rate (sensitivity) against false positive rate (1 - specificity); legend as extracted: Reiss 0.62 None 0.64 Naive 0.69 Gaussian 0.71 Laplacian with pruning 0.73 Laplacian]

48 The model of Segal, Yelensky and Koller Bioinformatics 19, 2003

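49 [Figure: graphical model with sequence nodes g.S_1, g.S_2, ..., g.S_N and motif nodes g.R_1, g.R_2; P(g.R_2 | g.S) is given by the motif TAT A G]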

50 Transcriptional Regulation Basics Evaluation MotifScanne Cases Conclusions

54 [Figure: graphical model built up over slides 51 to 54. Sequence nodes g.S_1, g.S_2, ..., g.S_N feed the motif nodes g.R_1, g.R_2 through P(g.R_2 | g.S) (motif TAT A G); the motif nodes feed the module node g.M; the module node feeds the expression nodes g.E_1, g.E_2, g.E_3 through P(g.E_3 | g.M)]

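55
\[ P(g.R_i = 1 \mid g.S_1, g.S_2, \dots, g.S_N) = \operatorname{logit}\left( w_0^i + \log\left[ \frac{1}{N - W + 1} \sum_{m=0}^{N-W} \exp\left( \sum_{k=1}^{W} w_k^i(g.S_{m+k}) \right) \right] \right) \]
$4W + 1$ parameters per binding motif $R_i$: $w_k^i(l)$, $w_0^i$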

56 [Figure: graphical model with nodes g.S_1, g.S_2, ..., g.S_N, g.R_1, g.R_2, g.M, g.E_1, g.E_2, g.E_3]

57 Softmax function
\[ P(g.M = m \mid g.R_1 = r_1, g.R_2 = r_2, \dots, g.R_L = r_L) = \frac{\exp\left( \sum_{i=1}^{L} u_{mi} r_i \right)}{\sum_{m'} \exp\left( \sum_{i=1}^{L} u_{m'i} r_i \right)} \]
Parameter matrix $u$: (number of motifs/regulators) $\times$ (number of modules)
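
A minimal sketch of the softmax assignment follows. The weight values are made up, and the matrix is stored with one row per module (the transpose of the layout named on the slide) purely for convenience; `module_posterior` is a hypothetical name.

```python
import math

def module_posterior(u, r):
    """P(M = m | R_1 = r_1, ..., R_L = r_L) for every module m.

    u[m][i] is the weight of motif i for module m; r[i] is 0 or 1.
    """
    activations = [sum(u_mi * r_i for u_mi, r_i in zip(row, r)) for row in u]
    top = max(activations)                       # subtract the max before exponentiating
    exps = [math.exp(a - top) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: 3 modules (rows) x 2 motifs (columns).
u = [[2.0, 0.0],
     [0.0, 2.0],
     [0.0, 0.0]]
print(module_posterior(u, r=[1, 0]))   # motif 1 present: module 0 is favoured
print(module_posterior(u, r=[0, 0]))   # no motif present: all modules equally likely
```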

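58 [Figure: graphical model with nodes g.S_1, g.S_2, ..., g.S_N, g.R_1, g.R_2, g.M, g.E_1, g.E_2, g.E_3; the expression nodes depend on the module node through P(g.E_3 | g.M)]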

59 Independent Gaussian distributions
\[ P(g.E_1, g.E_2, \dots, g.E_L \mid g.M = m) = \prod_{j} P(g.E_j \mid g.M = m) \]
\[ P(g.E_j \mid g.M = m) = N(\mu_{j,m}, \sigma_{j,m}) \]
For each module $m$ and each condition $j$:
Mean: $\mu_{j,m}$
Standard deviation: $\sigma_{j,m}$
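
Under the independence assumption the log-likelihood of an expression profile is a sum of per-condition Gaussian log-densities. A small sketch with made-up means, standard deviations and profile:

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log-density of a univariate Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def log_p_expression(e, mu_m, sigma_m):
    """log P(E_1, ..., E_L | M = m) = sum_j log N(e_j; mu_{j,m}, sigma_{j,m})."""
    return sum(log_normal_pdf(x, mu, s) for x, mu, s in zip(e, mu_m, sigma_m))

# Hypothetical parameters: 2 modules, 2 conditions.
mu = [[2.0, -2.0], [-2.0, 2.0]]
sigma = [[0.5, 0.5], [0.5, 0.5]]
profile = [1.8, -2.1]   # made-up expression profile of one gene

best = max(range(len(mu)), key=lambda m: log_p_expression(profile, mu[m], sigma[m]))
print(best)   # index of the module under which the profile is most likely
```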

60 Parameter estimation

62 [Figure: full graphical model. Sequence nodes g.S_1, g.S_2, ..., g.S_N; motif nodes g.R_1, g.R_2 with P(g.R_2 | g.S) (motif TAT A G); module node g.M; expression nodes g.E_1, g.E_2, g.E_3 with P(g.E_3 | g.M)]

65 Bayesian approach
\[ P(\text{parameters} \mid \text{data}) = \sum_{\text{latent variables}} P(\text{parameters}, \text{latent variables} \mid \text{data}) \]
Intractable!
Gibbs sampling:
parameters ~ P(parameters | latent variables, data)
latent variables ~ P(latent variables | parameters, data)

70 [Figure: Gibbs sampling from a joint distribution P(x,y) over axes x and y, alternating between draws from the conditional distributions P(x | y) and P(y | x)]
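
The alternating scheme in the x-y illustration can be sketched on a toy target. The example below assumes a standard bivariate Gaussian with correlation rho, for which both conditionals are known in closed form; it is not the sampler for the regulatory model, and the function name, burn-in length and seed are arbitrary choices.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Sample (x, y) from a standard bivariate Gaussian with correlation rho
    by alternately drawing x ~ P(x | y) and y ~ P(y | x)."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5      # standard deviation of each conditional
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)    # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, cond_sd)    # y | x ~ N(rho * x, 1 - rho^2)
        if step >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
mean_xy = sum(x * y for x, y in samples) / len(samples)
print(round(mean_xy, 2))   # empirical E[xy], which should lie close to rho
```

In the model of the slides, x and y correspond to the parameters and the latent variables, and each conditional draw is itself costly, which motivates the cheaper alternative discussed next.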

71 Still too expensive
Find one good set of parameters rather than a whole sample from the posterior distribution
Hard-assignment EM algorithm
Various heuristic simplifications
See Bioinformatics 19, 2003 for details
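
To give a feel for hard-assignment EM, the sketch below applies it to the expression part of the model only: genes are assigned to their single most likely module, then the Gaussian parameters are refitted. It deliberately omits the sequence and motif components and the heuristics of the published algorithm; the data, the initial means and the `min_sigma` floor are made up.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def hard_em(profiles, init_means, n_iter=10, min_sigma=0.1):
    """Alternate hard module assignment (E-step) and Gaussian refitting (M-step)."""
    n_cond = len(profiles[0])
    means = [list(m) for m in init_means]
    sigmas = [[1.0] * n_cond for _ in means]
    assign = []
    for _ in range(n_iter):
        # E-step (hard): each gene goes to its single most likely module.
        assign = [
            max(range(len(means)),
                key=lambda m: sum(log_normal_pdf(x, mu, s)
                                  for x, mu, s in zip(p, means[m], sigmas[m])))
            for p in profiles
        ]
        # M-step: refit mean and standard deviation per module and condition.
        for m in range(len(means)):
            members = [p for p, a in zip(profiles, assign) if a == m]
            if not members:
                continue                   # keep old parameters for empty modules
            for j in range(n_cond):
                col = [p[j] for p in members]
                mu = sum(col) / len(col)
                var = sum((x - mu) ** 2 for x in col) / len(col)
                means[m][j] = mu
                sigmas[m][j] = max(math.sqrt(var), min_sigma)
    return assign, means

# Made-up expression profiles (2 conditions) forming two obvious groups.
profiles = [[2.0, -1.9], [2.2, -2.1], [1.8, -2.0],
            [-2.0, 2.1], [-1.9, 1.8], [-2.2, 2.0]]
assign, means = hard_em(profiles, init_means=[[1.0, -1.0], [-1.0, 1.0]])
print(assign)
```

Unlike Gibbs sampling, this returns a single point estimate, so it gives no measure of uncertainty and can depend on the initial parameters.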

76 [Figure: graphical model with nodes g.S_1, g.S_2, ..., g.S_N, g.R_1, g.R_2, g.M, g.E_1, g.E_2, g.E_3, shown over slides 72 to 76; slide 73 is labelled E-step and slide 74 is labelled M-step]

77 Segal, Yelensky, Koller (2003) Bioinformatics 19 Saccharomyces cerevisiae

78 From Segal et al, Bioinformatics 2003

79 Experiment
Microarrays measuring responses to various stress conditions (Gasch et al. 2000)
Conventional algorithms: 20% of the predicted motifs are known
Unified probabilistic model: 45% of the predicted motifs are known

80 Experiment 2
77 microarrays, expression during the cell cycle (Spellman et al. 1998)
Conventional algorithms: 30% of the predicted motifs are known
Unified probabilistic model: 56% of the predicted motifs are known

81 From Segal et al, Bioinformatics 2003


More information

Decision Support System For A Customer Relationship Management Case Study

Decision Support System For A Customer Relationship Management Case Study 61 Decision Support System For A Customer Relationship Management Case Study Ozge Kart 1, Alp Kut 1, and Vladimir Radevski 2 1 Dokuz Eylul University, Izmir, Turkey {ozge, alp}@cs.deu.edu.tr 2 SEE University,

More information

MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

More information

Data Visualization with Simultaneous Feature Selection

Data Visualization with Simultaneous Feature Selection 1 Data Visualization with Simultaneous Feature Selection Dharmesh M. Maniyar and Ian T. Nabney Neural Computing Research Group Aston University, Birmingham. B4 7ET, United Kingdom Email: {maniyard,nabneyit}@aston.ac.uk

More information

Bayesian Networks. Read R&N Ch. 14.1-14.2. Next lecture: Read R&N 18.1-18.4

Bayesian Networks. Read R&N Ch. 14.1-14.2. Next lecture: Read R&N 18.1-18.4 Bayesian Networks Read R&N Ch. 14.1-14.2 Next lecture: Read R&N 18.1-18.4 You will be expected to know Basic concepts and vocabulary of Bayesian networks. Nodes represent random variables. Directed arcs

More information

Hidden Markov Models

Hidden Markov Models 8.47 Introduction to omputational Molecular Biology Lecture 7: November 4, 2004 Scribe: Han-Pang hiu Lecturer: Ross Lippert Editor: Russ ox Hidden Markov Models The G island phenomenon The nucleotide frequencies

More information

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath [email protected] National Institute of Industrial Engineering (NITIE) Vihar

More information

RNA & Protein Synthesis

RNA & Protein Synthesis RNA & Protein Synthesis Genes send messages to cellular machinery RNA Plays a major role in process Process has three phases (Genetic) Transcription (Genetic) Translation Protein Synthesis RNA Synthesis

More information