Applied Machine Learning - Intro. Annalisa Marsico OWL RNA Bioinformatics, MPI Molgen Berlin Freie Universität Berlin

Transcription:

Applied Machine Learning - Intro Annalisa Marsico OWL RNA Bioinformatics, MPI Molgen Berlin Freie Universität Berlin 20.04.16

Who am I?
Laurea in Physics (5 years). Thesis: From conventional tests to adaptive tests: application of artificial neural networks to ability evaluation
Master in Bioinformatics (1.5 years). PPNEMA: database for sorting and classification of rRNA genes in nematodes
PhD in Bioinformatics (4 years), Prof. Michael Schroeder (TU Dresden) & Prof. Marino Zerial (MPI MGC): clustering algorithm for de novo motif discovery; MeMotif: database of sequence / structure motifs; pattern classification of unfolding pathways from biophysical data
Post-doc at the Max Planck Institute for Molecular Genetics, Berlin (3 years), Prof. Martin Vingron: PROmiRNA: semi-supervised learning of miRNA promoters; linear model for miRNA processing from sequence signal; RNA-seq and ChIP-seq data for infection processes (SFB TR48)

The RNA Bioinformatics group
Since July 2014: Junior Professor for High-Throughput Genomics (FU) & group leader of the RNA Bioinformatics group at MPI Molgen
http://www.molgen.mpg.de/2733742/rna-bioinformatics

Additional co-lecturers
Stefan Budach, PhD student
Bachelor & Master in Bioinformatics, Freie Universität Berlin
Bachelor thesis in Martin Vingron's lab on statistical modeling of batch effects in RNA-seq data
Student assistant in my lab for a year (2014): predictive model of miRNA-eQTLs
Master thesis in Knut Reinert's lab on parallelization in SeqAn
Since November 2015, IMPRS PhD student in the RNA Bioinformatics Lab
Wolfgang Kopp, PhD student
Bachelor & Master in Biomedical Engineering, Technical University of Graz (Austria)
Student assistant for the Computational Intelligence course
Since 2012, PhD student of the IMPRS in the lab of Martin Vingron (MPI Molgen)
Projects: statistical models for pattern occurrences in DNA sequences, ChIP-seq time-series analysis

Work in our Lab RNA Binding Proteins (RBPs) prediction RBPs target identification Target specificity (de novo motif discovery)

RBPs prediction TripSVM - A tool for species-specific RBP prediction Annkatrin Bressin, Benedikt Beckmann

RBPs and their targets Capturing target-specific, clustered protein-RNA interaction footprints from iCLIP-seq RNA Sabrina Krakau, Hugues Richard (PMC Paris)

RBPs and their targets A tool for de novo sequence-structure motif discovery PUM2 (translational repressor) model SFRS1 (splicing factor) model David Heller, Ralf Krestel (Uni Potsdam), Uwe Ohler (MDC)

Work in our Lab Regulation and function of lncRNAs microRNA-eQTL prediction

Predicting X-chromosome silencing kinetics Random Forest prediction Lisa Barros, Edda Schulz

Deep Learning Predicting DNase I Hypersensitivity Sites from DNA Sequences Using Deep Learning Techniques - Convolutional Boltzmann Machines Roman Schulte-Sasse, Wolfgang Kopp

The Plan
Lecture | Application | Method
8 June | Cancer classification from gene expression | Partial Least Square Regression (PLSR)
15 June | Cancer classification from clinical data / DNA methylation | SVMs
22 June | RBP binding site prediction from RNA sequences | String kernel SVMs
29 June | Promoter prediction from epigenetic features | Semi-supervised learning: cotraining algorithm
06 July | Cancer location prediction from microRNAs | Neural Networks (multi-class classification)
13 July | Applications of DP in Bioinformatics | Concepts of deep learning
20 July | Discuss applications of DP in Bioinformatics | Discuss applications of DP in Bioinformatics
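As a rough illustration of one method named in the plan, string kernels for RNA sequences, the sketch below computes a k-mer spectrum kernel in plain Python. The spectrum kernel is only one of several string kernels and the slides do not say which variant the course uses; the function names, the choice k=3, and the toy sequences are all hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Spectrum kernel: dot product of the two k-mer count vectors."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers absent from b, so only shared k-mers contribute.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())

# Hypothetical toy RNA sequences (not taken from the lecture).
seqs = ["UGUAAAUA", "UGUAUAUA", "GGCCGGCC"]

# Gram matrix of pairwise kernel values; sequences sharing more k-mers score higher.
gram = [[spectrum_kernel(s, t) for t in seqs] for s in seqs]
for row in gram:
    print(row)
```

A Gram matrix of this kind is what an SVM implementation that accepts a precomputed kernel would take as input in place of explicit feature vectors.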

Deep Learning Papers
1. Deep Architectures for Protein Contact Map Prediction http://www.ncbi.nlm.nih.gov/pubmed/22847931
2. Toxicity Prediction Using Deep Learning http://arxiv.org/pdf/1503.01445.pdf
3. Deep Learning of tissue-regulated splicing code http://www.ncbi.nlm.nih.gov/pubmed/24931975
4. Predicting effects of non-coding variants with deep-learning-based sequence model http://www.ncbi.nlm.nih.gov/pubmed/26301843
5. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning http://www.ncbi.nlm.nih.gov/pubmed/26213851

Link to course material
http://www.molgen.mpg.de/3434810/applied-machine-learning
Contacts: marsico@molgen.mpg.de, budach@molgen.mpg.de, kopp@molgen.mpg.de