Probabilistic methods for post-genomic data integration
1 Probabilistic methods for post-genomic data integration. Dirk Husmeier, Biomathematics & Statistics Scotland (BioSS), JCMB, The King's Buildings, Edinburgh EH9 3JZ, United Kingdom. dirk
2 Integrated analysis of regulatory networks. Expression data alone are not sufficient. Combining multiple sources of information yields complementary constraints.
4 Combining promoter sequences and gene expression data. Conventional approach: (1) find clusters of co-expressed genes; (2) identify regulatory elements by searching for common over-represented motifs in the promoter regions of these genes.
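The second step of the conventional approach can be sketched as follows. This is only an illustration, not the algorithm used in any of the cited papers: it assumes the cluster of co-expressed genes has already been found, treats every k-mer as a candidate motif, and ranks k-mers by how much more often they occur in the cluster's promoters than in a background set. The function names, the pseudocount and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_presence(promoters, k):
    """Count in how many promoters each k-mer occurs at least once."""
    counts = Counter()
    for seq in promoters:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def overrepresented_kmers(cluster, background, k, pseudocount=1.0):
    """Rank k-mers by (smoothed) cluster frequency / background frequency."""
    fg = kmer_presence(cluster, k)
    bg = kmer_presence(background, k)
    scores = {}
    for kmer, count in fg.items():
        fg_frac = (count + pseudocount) / (len(cluster) + 2 * pseudocount)
        bg_frac = (bg[kmer] + pseudocount) / (len(background) + 2 * pseudocount)
        scores[kmer] = fg_frac / bg_frac
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy promoters: the cluster shares the 4-mer TATA, the background does not.
cluster = ["GGTATAGG", "CCTATACC", "AATATAAA"]
background = ["GGGGCCCC", "CCCCGGGG", "AAAACCCC"]
ranking = overrepresented_kmers(cluster, background, k=4)
print(ranking[0])
```

Because clustering and motif search run as two separate steps here, errors in the clustering cannot be corrected by the sequence evidence, which is the kind of limitation the unified model is meant to address.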
6 Shortcomings of the conventional algorithm
7 Microarray data Model Promoter sequences
10 Segal's unifying probabilistic model
11 Microarray data Model Promoter sequences
14 Segal, Yelensky, Koller (2003) Bioinformatics 19. Revision: Motif finding
16 Motif: TAT A G T G A A T T T A T A G A
23 Position Specific Scoring Matrix (PSSM)
Search for a motif of length $W$ in binding sequences.
$W \times 4$ matrix $\psi_k(l)$: probability that the nucleotide in the $k$th position, $k \in [1, \ldots, W]$, is an $l \in \{A, C, G, T\}$.
Background model for non-binding sequences: 4-dim vector $\theta_0(l)$: probability of nucleotide $l$; this distribution is position-independent.
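As a concrete picture of these two objects, the sketch below builds a PSSM and a background vector from toy data. Estimating them by counting with pseudocounts from already-aligned sites is an assumption made for illustration only; the slides learn the parameters inside the full model. The names `estimate_pssm` and `estimate_background` are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def estimate_pssm(sites, pseudocount=1.0):
    """Return psi as a list of W dicts; psi[k][l] is P(nucleotide l at position k+1)."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length W")
    total = len(sites) + pseudocount * len(ALPHABET)
    psi = []
    for k in range(width):
        counts = Counter(site[k] for site in sites)
        psi.append({l: (counts[l] + pseudocount) / total for l in ALPHABET})
    return psi


def estimate_background(sequences, pseudocount=1.0):
    """Return theta0 as a dict; theta0[l] is the position-independent P(l)."""
    counts = Counter(ch for seq in sequences for ch in seq)
    total = sum(counts[l] for l in ALPHABET) + pseudocount * len(ALPHABET)
    return {l: (counts[l] + pseudocount) / total for l in ALPHABET}


psi = estimate_pssm(["TATA", "TATA", "TACA"])
theta0 = estimate_background(["ACGTACGT", "GGCCAATT"])
print(psi[0]["T"], theta0["A"])
```

Each row of `psi` and the vector `theta0` sum to one, matching the $W \times 4$ matrix and 4-dim vector described on the slide.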
26 Sequence $S_1, S_2, \ldots, S_N$
Non-binding sequence: $R = 0$
$$P(S_1, S_2, \ldots, S_N \mid R = 0) = \prod_{t=1}^{N} \theta_0(S_t)$$
Binding sequence: $R = 1$, motif starting at position $m + 1$
$$P(S_1, S_2, \ldots, S_N \mid R = 1, \text{start} = m + 1) = \prod_{t=1}^{m} \theta_0(S_t) \prod_{k=1}^{W} \psi_k(S_{m+k}) \prod_{t=m+W+1}^{N} \theta_0(S_t) = \prod_{t=1}^{N} \theta_0(S_t) \prod_{k=1}^{W} \frac{\psi_k(S_{m+k})}{\theta_0(S_{m+k})}$$
Binding sequence: $R = 1$, motif starting anywhere
$$P(S_1, S_2, \ldots, S_N \mid R = 1) = \sum_{m=0}^{N-W} P(\text{start} = m + 1)\, P(S_1, S_2, \ldots, S_N \mid R = 1, \text{start} = m + 1) = \frac{1}{N - W + 1} \prod_{t=1}^{N} \theta_0(S_t) \sum_{m=0}^{N-W} \prod_{k=1}^{W} \frac{\psi_k(S_{m+k})}{\theta_0(S_{m+k})}$$
Objective: prediction of binding activity from sequence: $P(R = 1 \mid S_1, S_2, \ldots, S_N)$
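The two likelihoods can be evaluated directly in log space. The sketch below is a minimal illustration under stated assumptions: `psi` is a list of $W$ dicts and `theta0` a dict over A, C, G, T (a representation chosen here, not taken from the slides), and the start position is uniform over the $N - W + 1$ possible placements, as in the formula for a motif starting anywhere.

```python
import math


def log_lik_background(seq, theta0):
    """log P(S | R = 0): every position is drawn from the background."""
    return sum(math.log(theta0[s]) for s in seq)


def log_lik_binding(seq, psi, theta0):
    """log P(S | R = 1): one motif occurrence, uniform over start positions."""
    width, n = len(psi), len(seq)
    if n < width:
        raise ValueError("sequence is shorter than the motif")
    # For each start m, the log of prod_k psi_k(S_{m+k}) / theta0(S_{m+k}).
    log_ratios = [
        sum(math.log(psi[k][seq[m + k]] / theta0[seq[m + k]]) for k in range(width))
        for m in range(n - width + 1)
    ]
    # Log-sum-exp keeps the sum over start positions numerically stable.
    top = max(log_ratios)
    log_sum = top + math.log(sum(math.exp(r - top) for r in log_ratios))
    return log_lik_background(seq, theta0) - math.log(n - width + 1) + log_sum


uniform = {l: 0.25 for l in "ACGT"}
tat = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
print(log_lik_binding("GGTATGG", tat, uniform) - log_lik_background("GGTATGG", uniform))
```

A positive difference means the sequence is better explained by the binding model than by the background alone.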
32 Apply Bayes' rule:
$$P(R = 1 \mid S_1, \ldots, S_N) = \frac{P(S_1, \ldots, S_N \mid R = 1)\, P(R = 1)}{P(S_1, \ldots, S_N \mid R = 0)\, P(R = 0) + P(S_1, \ldots, S_N \mid R = 1)\, P(R = 1)}$$
$$= \left( 1 + \frac{P(R = 0)\, P(S_1, \ldots, S_N \mid R = 0)}{P(R = 1)\, P(S_1, \ldots, S_N \mid R = 1)} \right)^{-1}$$
$$= \left( 1 + \left[ \frac{P(R = 1)}{P(R = 0)} \cdot \frac{1}{N - W + 1} \sum_{m=0}^{N-W} \prod_{k=1}^{W} \frac{\psi_k(S_{m+k})}{\theta_0(S_{m+k})} \right]^{-1} \right)^{-1}$$
Define: $w_k(l) = \log \frac{\psi_k(l)}{\theta_0(l)}$, $w_0 = \log \frac{P(R = 1)}{P(R = 0)}$, $\mathrm{logit}(z) = \frac{1}{1 + \exp(-z)}$ (the logistic function). Then
$$P(R = 1 \mid S_1, \ldots, S_N) = \mathrm{logit}\left( w_0 + \log\left[ \frac{1}{N - W + 1} \sum_{m=0}^{N-W} \exp\left( \sum_{k=1}^{W} w_k(S_{m+k}) \right) \right] \right)$$
$4W + 1$ parameters: $w_k(l)$, $w_0$
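The closed form can be computed as a sliding-window score followed by a logistic transfer function. The sketch below follows the definitions $w_k(l) = \log \psi_k(l)/\theta_0(l)$ and $w_0 = \log P(R=1)/P(R=0)$, with $w_0$ entering as an additive log prior odds; the data layout (a list of $W$ dicts of weights) and the function names are assumptions made for illustration.

```python
import math


def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def binding_posterior(seq, w, w0):
    """P(R = 1 | S) for position weights w[k][l] and prior log odds w0."""
    width, n = len(w), len(seq)
    if n < width:
        raise ValueError("sequence is shorter than the motif")
    # Score m is sum_k w_k(S_{m+k}) for the window starting at position m + 1.
    scores = [sum(w[k][seq[m + k]] for k in range(width)) for m in range(n - width + 1)]
    top = max(scores)
    # log of the average of exp(score) over all N - W + 1 windows.
    log_mean = top + math.log(sum(math.exp(s - top) for s in scores)) - math.log(len(scores))
    return logistic(w0 + log_mean)


zero = [{l: 0.0 for l in "ACGT"} for _ in range(2)]
print(binding_posterior("ACGTAC", zero, 0.0))
```

With all weights at zero the sequence carries no evidence, so the posterior reduces to the prior $P(R = 1)$.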
35 [Figure: the motif is scanned along the sequence, giving Score 1, Score 2, ..., Score t, ..., Score N; the scores are summed (+) and passed through a nonlinear transfer function to give P(R=1 | sequence).]
44 Wolfgang Lehrach Biomathematics & Statistics Scotland Ab initio prediction of protein interaction
45 SH3 yeast two-hybrid interaction network. Tong et al. (2002), Science 295: interactions between 28 SH3 proteins and 143 binding peptides; 9 binding partners per SH3 domain on average.
47 Final Test Set Performance [Figure: ROC curves; y-axis: true positive rate (sensitivity); x-axis: false positive rate (1 - specificity); legend entries: Reiss, None, Naive, Gaussian, Laplacian with pruning, Laplacian; legend values as extracted: 062, 064, 069, 071, 073.]
48 The model of Segal, Yelensky and Koller Bioinformatics 19, 2003
49 [Figure: graphical model, built up step by step. The promoter sequence variables g.S_1, g.S_2, ..., g.S_N are parents of the motif (regulator) variables g.R_1, g.R_2 through P(g.R_2 | g.S); the regulators are parents of the module variable g.M; g.M is the parent of the expression variables g.E_1, g.E_2, g.E_3 through P(g.E_3 | g.M).]
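55 $$P(g.R_i = 1 \mid g.S_1, g.S_2, \ldots, g.S_N) = \mathrm{logit}\left( w_0^i + \log\left[ \frac{1}{N - W + 1} \sum_{m=0}^{N-W} \exp\left( \sum_{k=1}^{W} w_k^i(g.S_{m+k}) \right) \right] \right)$$
$4W + 1$ parameters per binding motif $R_i$: $w_k^i(l)$, $w_0^i$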
57 Softmax function
$$P(g.M = m \mid g.R_1 = r_1, g.R_2 = r_2, \ldots, g.R_L = r_L) = \frac{\exp\left( \sum_{i=1}^{L} u_{mi} r_i \right)}{\sum_{m'} \exp\left( \sum_{i=1}^{L} u_{m'i} r_i \right)}$$
Parameter matrix: number of motifs/regulators $\times$ number of modules
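A minimal sketch of the softmax module assignment, assuming the parameter matrix is stored as one row of weights per module (`U[m][i]` for $u_{mi}$) and the regulator states `r` are 0/1 values. The storage layout and the name `module_posterior` are illustrative choices, not taken from the slides.

```python
import math


def module_posterior(r, U):
    """Return P(g.M = m | regulators r) for every module m as a list."""
    activations = [sum(u * ri for u, ri in zip(row, r)) for row in U]
    top = max(activations)  # subtract the maximum for numerical stability
    exps = [math.exp(a - top) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


# Two modules, two regulators; regulator 1 favours module 2 with odds 3:1.
U = [[0.0, 0.0], [math.log(3.0), 0.0]]
print(module_posterior([1, 0], U))
print(module_posterior([0, 1], U))
```

When no regulator with a non-zero weight is active, all modules are equally likely, which is the second call above.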
59 Independent Gaussian distributions
$$P(g.E_1, g.E_2, \ldots, g.E_L \mid g.M = m) = \prod_{j} P(g.E_j \mid g.M = m)$$
$$P(g.E_j \mid g.M = m) = N(\mu_{j,m}, \sigma_{j,m})$$
For each module $m$ and each condition $j$: mean $\mu_{j,m}$, standard deviation $\sigma_{j,m}$.
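Under the independence assumption the log-likelihood of an expression profile is a sum of per-condition Gaussian log densities. The sketch below assumes each module is described by a list of means and a list of standard deviations, one entry per condition; the function names and toy numbers are hypothetical.

```python
import math


def log_gaussian(x, mu, sigma):
    """Log density of N(mu, sigma) at x, with sigma the standard deviation."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def expression_log_lik(profile, mus, sigmas):
    """log P(g.E_1, ..., g.E_L | g.M = m) as a sum over conditions j."""
    return sum(log_gaussian(x, mu, s) for x, mu, s in zip(profile, mus, sigmas))


# Two hypothetical modules over three conditions.
modules = [
    ([2.0, 2.0, -1.0], [0.5, 0.5, 0.5]),   # module 1: means, standard deviations
    ([-1.0, 0.0, 1.5], [0.5, 0.5, 0.5]),   # module 2
]
profile = [1.8, 2.1, -0.7]
log_liks = [expression_log_lik(profile, mus, sigmas) for mus, sigmas in modules]
print(log_liks.index(max(log_liks)))
```

The printed index is the module whose mean profile best explains the gene's expression measurements.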
60 Parameter estimation
63 Bayesian approach
$$P(\text{parameters} \mid \text{data}) = \sum_{\text{latent variables}} P(\text{parameters}, \text{latent variables} \mid \text{data})$$
Intractable!
Gibbs sampling: alternately draw
parameters $\sim P(\text{parameters} \mid \text{latent variables}, \text{data})$
latent variables $\sim P(\text{latent variables} \mid \text{parameters}, \text{data})$
66 [Figure: Gibbs sampling from a joint distribution P(x,y) plotted over axes x and y, alternating between draws from the conditionals P(x | y) and P(y | x).]
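The alternating scheme can be tried on a toy joint distribution. The sketch below is not the sampler for the regulatory model; it assumes a standard bivariate Gaussian with correlation `rho` as P(x,y), for which both conditionals are known exactly: x | y is Gaussian with mean rho*y and standard deviation sqrt(1 - rho^2), and symmetrically for y | x.

```python
import math
import random


def gibbs_bivariate_gaussian(rho, n_samples, burn_in=500, seed=0):
    """Sample (x, y) by alternating draws from P(x | y) and P(y | x)."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x from P(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from P(y | x)
        if step >= burn_in:
            samples.append((x, y))
    return samples


def correlation(samples):
    """Empirical Pearson correlation of the sampled pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
    var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
    return cov / math.sqrt(var_x * var_y)


samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20000)
print(round(correlation(samples), 2))
```

In the regulatory model the role of x is played by the parameters and the role of y by the latent variables; each sweep needs a pass over all genes.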
71 Still too expensive. Find one good set of parameters rather than a whole sample from the posterior distribution: hard-assignment EM algorithm, with various heuristic simplifications. See Bioinformatics 19, 2003 for details.
72 [Figure: the graphical model (g.S_1, g.S_2, ..., g.S_N; g.R_1, g.R_2; g.M; g.E_1, g.E_2, g.E_3), shown in turn for the E-step and the M-step.]
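The alternation between E-step and M-step can be sketched on a stripped-down version of the model. The code below is an assumption-laden illustration, not the procedure of Segal et al.: it ignores the sequence and motif layers entirely and applies hard-assignment EM only to the expression part, with each module an independent Gaussian per condition. The initialisation from evenly spaced genes and the floor `min_sigma` are hypothetical choices.

```python
import math


def log_gaussian(x, mu, sigma):
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def profile_log_lik(profile, mus, sigmas):
    return sum(log_gaussian(x, mu, s) for x, mu, s in zip(profile, mus, sigmas))


def hard_em(expr, n_modules, n_iter=50, min_sigma=1e-3):
    """Hard-assignment EM over modules with per-condition Gaussians."""
    n_genes, n_cond = len(expr), len(expr[0])
    # Initialise each module's mean from an evenly spaced gene, unit std dev.
    params = [(list(expr[m * n_genes // n_modules]), [1.0] * n_cond)
              for m in range(n_modules)]
    assign = None
    for _ in range(n_iter):
        # E-step (hard): give every gene to its single most likely module.
        new_assign = [
            max(range(n_modules), key=lambda m: profile_log_lik(e, *params[m]))
            for e in expr
        ]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        # M-step: re-estimate mean and std dev of each non-empty module.
        for m in range(n_modules):
            members = [e for e, a in zip(expr, assign) if a == m]
            if not members:
                continue  # keep the previous parameters for an empty module
            cols = list(zip(*members))
            mus = [sum(c) / len(c) for c in cols]
            sigmas = [max(min_sigma, math.sqrt(sum((x - mu) ** 2 for x in c) / len(c)))
                      for c, mu in zip(cols, mus)]
            params[m] = (mus, sigmas)
    return assign, params


expr = [[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9]]
assign, params = hard_em(expr, n_modules=2)
print(assign)
```

Hard assignment yields a single parameter set rather than a posterior sample, and like any EM variant it can stop in a local optimum that depends on the initialisation.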
77 Segal, Yelensky, Koller (2003) Bioinformatics 19 Saccharomyces cerevisiae
78 From Segal et al, Bioinformatics 2003
79 Experiment 1: microarrays measuring responses to various stress conditions (Gasch et al. 2000). Conventional algorithms: 20% of the predicted motifs are known. Unified probabilistic model: 45% of the predicted motifs are known.
80 Experiment 2: 77 microarrays, expression during the cell cycle (Spellman et al. 1998). Conventional algorithms: 30% of the predicted motifs are known. Unified probabilistic model: 56% of the predicted motifs are known.
81 From Segal et al, Bioinformatics 2003
More informationLabGenius. Technical design notes. The world s most advanced synthetic DNA libraries. hi@labgeni.us V1.5 NOV 15
LabGenius The world s most advanced synthetic DNA libraries Technical design notes hi@labgeni.us V1.5 NOV 15 Introduction OUR APPROACH LabGenius is a gene synthesis company focussed on the design and manufacture
More informationDecision Support System For A Customer Relationship Management Case Study
61 Decision Support System For A Customer Relationship Management Case Study Ozge Kart 1, Alp Kut 1, and Vladimir Radevski 2 1 Dokuz Eylul University, Izmir, Turkey {ozge, alp}@cs.deu.edu.tr 2 SEE University,
More informationMACHINE LEARNING IN HIGH ENERGY PHYSICS
MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!
More informationData Visualization with Simultaneous Feature Selection
1 Data Visualization with Simultaneous Feature Selection Dharmesh M. Maniyar and Ian T. Nabney Neural Computing Research Group Aston University, Birmingham. B4 7ET, United Kingdom Email: {maniyard,nabneyit}@aston.ac.uk
More informationBayesian Networks. Read R&N Ch. 14.1-14.2. Next lecture: Read R&N 18.1-18.4
Bayesian Networks Read R&N Ch. 14.1-14.2 Next lecture: Read R&N 18.1-18.4 You will be expected to know Basic concepts and vocabulary of Bayesian networks. Nodes represent random variables. Directed arcs
More informationHidden Markov Models
8.47 Introduction to omputational Molecular Biology Lecture 7: November 4, 2004 Scribe: Han-Pang hiu Lecturer: Ross Lippert Editor: Russ ox Hidden Markov Models The G island phenomenon The nucleotide frequencies
More informationUsing Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen
More informationPredictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar
More informationRNA & Protein Synthesis
RNA & Protein Synthesis Genes send messages to cellular machinery RNA Plays a major role in process Process has three phases (Genetic) Transcription (Genetic) Translation Protein Synthesis RNA Synthesis
More informationHierarchical Bayesian Modeling of the HIV Response to Therapy
Hierarchical Bayesian Modeling of the HIV Response to Therapy Shane T. Jensen Department of Statistics, The Wharton School, University of Pennsylvania March 23, 2010 Joint Work with Alex Braunstein and
More information