Core Bioinformatics. Degree Type Year Semester



Similar documents
Core Bioinformatics. Degree Type Year Semester Bioinformàtica/Bioinformatics OB 0 1

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

Learning outcomes. Knowledge and understanding. Competence and skills

Statistics Graduate Courses

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

Bio-Informatics Lectures. A Short Introduction

Data Integration. Lectures 16 & 17. ECS289A, WQ03, Filkov

Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])

University of Glasgow - Programme Structure Summary C1G MSc Bioinformatics, Polyomics and Systems Biology

Genome Explorer For Comparative Genome Analysis

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE

Statistics in Applications III. Distribution Theory and Inference

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Pairwise Sequence Alignment

Guide for Bioinformatics Project Module 3

A Primer of Genome Science THIRD

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

Bioinformatics Grid - Enabled Tools For Biologists.

BIO 3352: BIOINFORMATICS II HYBRID COURSE SYLLABUS

Module 1. Sequence Formats and Retrieval. Charles Steward

Sequence Analysis 15: lecture 5. Substitution matrices Multiple sequence alignment

Bioinformatics Resources at a Glance

BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS

Protein Protein Interaction Networks

Implementation Regulation for the MSc Programme Nanobiology

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Final Project Report

BLAST. Anders Gorm Pedersen & Rasmus Wernersson

Protein Sequence Analysis - Overview -

Software Development Training Camp 1 (0-3) Prerequisite : Program development skill enhancement camp, at least 48 person-hours.

Current Motif Discovery Tools and their Limitations

UF EDGE brings the classroom to you with online, worldwide course delivery!

STATISTICS COURSES UNDERGRADUATE CERTIFICATE FACULTY. Explanation of Course Numbers. Bachelor's program. Master's programs.

DnaSP, DNA polymorphism analyses by the coalescent and other methods.

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences Academic Year Qualification.

Introduction to Bioinformatics AS Laboratory Assignment 6

Network Protocol Analysis using Bioinformatics Algorithms

Data, Measurements, Features

The Master s Degree with Thesis Course Descriptions in Industrial Engineering

Bioinformatics: course introduction

Bayesian Phylogeny and Measures of Branch Support

Protein & DNA Sequence Analysis. Bobbie-Jo Webb-Robertson May 3, 2004

STA 4273H: Statistical Machine Learning

Phylogenetic Trees Made Easy

Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives

Doctor of Philosophy in Computer Science

TOWARD BIG DATA ANALYSIS WORKSHOP

A Tutorial in Genetic Sequence Classification Tools and Techniques

Principles of Data Mining by Hand&Mannila&Smyth

Azure Machine Learning, SQL Data Mining and R

Dr Alexander Henzing

Introduction to bioinformatics

Applied mathematics and mathematical statistics

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

T cell Epitope Prediction

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

Biological Databases and Protein Sequence Analysis

Lab 2/Phylogenetics/September 16, PHYLOGENETICS

Linear Sequence Analysis. 3-D Structure Analysis

Master's projects at ITMO University. Daniil Chivilikhin PhD ITMO University

Introduction to Bioinformatics 3. DNA editing and contig assembly

Genetomic Promototypes

How To Understand The Science Of Genomics

Machine Learning with MATLAB David Willingham Application Engineer

Contents. Page 1 of 11

PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

How To Understand The Theory Of Probability

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Bachelor Curriculum in cooperation with

Modulhandbuch / Program Catalog. Master s degree Evolution, Ecology and Systematics. (Master of Science, M.Sc.)

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

FBIO - Fundations of Bioinformatics

MSCA Introduction to Statistical Concepts

Ph.D. in Bioinformatics and Computational Biology Degree Requirements

MASTER'S DEGREE PROGRAMME IN BIOINFORMATICS

COURSE DESCRIPTOR Signal Processing II Signalbehandling II 7.5 ECTS credit points (7,5 högskolepoäng)

itesla Project Innovative Tools for Electrical System Security within Large Areas

Predictive Modeling Techniques in Insurance

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS INTRODUCTION TO STATISTICS MATH 2050

Statistics Graduate Programs

Cloud-Based Big Data Analytics in Bioinformatics

AP Biology Essential Knowledge Student Diagnostic

Alabama Department of Postsecondary Education

6 ELIXIR Domain Specific Services

Bachelor Curriculum in cooperation with

Transcription:

Core Bioinformatics 2015/2016 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat Teachers Use of languages Principal working language: english (eng) Antoni Barbadilla Prados Leonardo Pardo Carrasco Pere Puig Casado Alfredo Ruíz Panadero Miquel Àngel Senar Rosell Jean Didier Pie Marechal Daniel Yero Corona Raquel Egea Sánchez Xavier Daura Ribera External teachers Cedric Notredame Emanuel Raineri Josep Abril Sebastián Ramos Prerequisites Level B2 of English or equivalent is recommended. Objectives and Contextualisation This module focuses on the development of diverse bioinformatic tools and resources commonly used in Omics research. Our intention is that it covers several aspects of bioinformatics in a series of brief topics, in the form of "tastings". Therefore, it is not an accummulative module, but a transversal one, which should provide with a wide range of ideas and approaches that bioinformatics offers, through the hands of experts. The main objective is to provide students with the necessary foundation to apply bioinformatics to different areas of scientific research. Over time, each student will be able to gain all the depth they propose on any of these topics, the one which finally represents their research framework. 1

Skills Analyse and interpret data deriving from omic technology using biocomputing methods. Design and apply scientific methodology in resolving problems. Possess and understand knowledge that provides a basis or opportunity for originality in the development and/or application of ideas, often in a research context. Propose biocomputing solutions for problems deriving from omic research. Propose innovative and creative solutions in the field of study Student should possess the learning skills that enable them to continue studying in a way that is largely student led or independent. Understand the molecular bases and most common standard experimental techniques in omic research (genomics, transcriptomics, proteomics, metabolomics, interactomics, etc.) Use and manage bibliographical information and computer resources in the area of study Use operating systems, programs and tools in common use in biocomputing and be able to manage high performance computing platforms, programming languages and biocomputing analysis. Learning outcomes 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. Create and promote algorithms, calculation and statistical techniques and theories to resolve formal and practical problems deriving from the handling and analysis of biological data. Design and apply scientific methodology in resolving problems. Identify and apply algorithms in which the programs are based bioinformatic analysis. Identify and classify the principle types of biomolecular data obtained from omic technology. Possess and understand knowledge that provides a basis or opportunity for originality in the development and/or application of ideas, often in a research context. Propose innovative and creative solutions in the field of study Search for specific bioinformatics tools and bioinformatics resources in the network. Student should possess the learning skills that enable them to continue studying in a way that is largely student led or independent. Synthesise and interpret in a logical and reasoned manner the information from the molecular data bases and analyse it using biocomputing tools. Understand the theoretical, statistical and biological bases, in which the programs are based bioinformatic analysis: sequence alignment, similarity search and multiple alignment, structure prediction, genome annotation, phylogenetic and evolutionary analysis. Use and manage bibliographical information and computer resources in the area of study Use the main molecular databases, the main standard formats of molecular data and integrate data from different data sources Content Statistical Inference Professor Antonio Barbadilla - Statistics: bridge between data and models - Data Types - Population and sample - Experimental design - Data Quality - Exploration of Data - Sample distribution and law of large numbers - Statistical inference - Central Limit Theorem - Point estimation - Estimation of confidence interval - Hypothesis - Elements of a test: H0, H1, statistical test, p value, significance level, type I and II errors, power - Z test, t test, chi-square test, correlation test, regression, analysis of variance 2

- Interpretation of statistical significance - Parametric versus nonparametric tests - Selecting the appropriate statistical test (decision tree) - Multivariate Testing - Resampling Statistics and Stochastic Processes for Sequence Analysis Professor Pere Puig a. Probability basics Sets and events. Properties. Conditional probability. Independence. Alphabet and sequences. Probabilistic models. b. The multinomial model Simulating a multinomial sequence. Estimating probabilities. c. The seqinr package d. Markov chain models Concept and examples. Classification of states. R code. Simulating a Markov chain sequence. Estimating the probabilities of transition. The probability of a sequence. Using Markov chain for discrimination. e. Higher order Markov chain models Concept and examples. Estimating the probabilities of transition. Comparison of higher order Markov chains. f. Hidden Markov chain models Concept and examples. Parameter estimation. Hidden states estimation. g. An introduction to Generalized Linear Models GLM basics. The Logistic model. The Poisson model. Bayesian Inference Professor Emmanuele Raineri 1. Curve fitting. - Estimation of parameters of probability distributions: binomial, poisson and gaussian. - Example: fitting a noisy dataset. - Cross validation, overfitting and regularization. 2. Dimensional reduction. - Principal component analysis, multidimensional scaling. - Example: distinguishing cell types using methylation profiles. 3. Lasso regression. - Variable selection in linear models. - Penalized regression: Lasso and Elastic Net. - Example: lasso regression in R. Bioinformatics Formats and Databases Professor Daniel Yero a. Sequence formats Nomenclature. Text editors. FASTA format and its variants. Raw/Plain format. Genbank sequence format. EMBL sequence format. GCG, NBRF/PIR, MSA, PHYLIP, NEXUS. Format conversion. b. Databases Concept. Boolean searches. Wildcards and regular expressions. Identifiers and accession numbers. Classification. NAR databases compilation. GenBank and other NCBI databases. EMBL. DDBJ. Integrated Meta-Databases. Main nucleotide, protein, structure, taxonomy, etc. databases. 3

Workflows with Galaxy Professor Raquel Egea a. Introduction to workflow managers Concept, origin and design of workflow managers. Workflow patterns. Existing workflow managers. APIs and Web Servers. b. Galaxy: basics, interface and practical uses. Web Development Professor Raquel Egea a. Web development Why developing webs as a bioinformatician? Web developing basics: OS, Web servers, databases, programming languages. Apache. HTML, CSS, JavaScript, PHP. b. Self-learning: bibliography and useful online resources Online bioinformatics communities. Software Engineering Professor Miquel Àngel Senar a. Parallelization strategies and HPC b. Cloud computing with Amazon Web Services c. Version control system with Git and GitHub Sequence Alignment Professor Cedric Notredame a. Evolution and comparison Models Molecular clock. Protein structure and evolution. Substitution Matrices. b. Dynamic Programming based sequence comparisons Needlman and Wunsch algorithm. Smith and Waterman algorithm. Afine gap penalties computation. Linear space computation of pairwise algorithms. c. Blast and Database searches The Blast algorithm. E-values and estimates of statistical significance. Database search strategies. PSI-Blast and other evolutionary approaches. d. Multiple Sequence Alignments: algorithms and strategies Main applications of multiple sequence alignments. Most common algorithms. Multiple sequence alignment strategies. Gene and Control Region Finding Professor Josep Abril a. Gene prediction Annotation: concept, databases, problems. Gene finding: search by signal, search by content, approaches (ab-initio, homology search, comparative genomics, NGS), evaluation of software accuracy. b. Finding regulatory motifs DNA motifs: exact matching, regular expressions, position weight matrices, search trees, profiles, randomized algorithms, logos and pictograms, software for motif finding. Regulatory domains. Histones. CRMs architecture & networking. Regulatory networks. Meta-alignment. Conservation, phylogenetic footprinting and phylogenetic shadowing. NGS. Population Genomics Professor Alfredo Ruiz 4

a. Population genomics under neutrality in a finite population Introduction. Genetic drift. Effective population size. Probability of fixation of neutral mutants. b. Population genomics under selection Natural selection. Probability of fixation of selected mutants. Fitness distribution of new mutants. Rate of evolution. c. Adaptive evolution and population size Phylogeny and Molecular Evolution Professor Sebastián Ramos a. Models of sequence evolution DNA sequence. Jukes and Cantor model. More realistic models. Model selection. b. Phylogeny Concept. Species trees versus gene trees. Tree-reconstruction methods: distance methods, maximum parsimony, maximum likelihood, Bayesian inference. Support. Phylogenomics. Building trees with R. Introduction to Genomes Professor Alfredo Ruiz a. Introduction to genomes Sequenced genomes. Organization and size of eukaryotic genomes. Building a genome: NGS methods for genomics and transcriptomics. b. The human genome: where are we now? Current assembly of the human genome. The ENCODE project: functional elements in the human genome. Repetitive content of the human genome. Structural Bioinformatics Professors Xavier Daura, Leonardo Pardo and Jean-Didier Maréchal a. Structural biology and interactions Biomolecules. Xenobiotics. Proteins, aminoacids and peptide bonds. Four levels of protein structure. Protein folding and stability. DNA structure. Experimental methods for structure determination. b. Structure databases PDB. PyMOL. c. Molecular modeling Models. Potential energy. Quantum and molecular mechanics techniques. Energy minimization. Molecular Dynamics. Homology modeling. d. Docking Concept and applications. Exploring the conformational space. Defining molecular interactions. Scoring functions. Workflows and software. Online resources. Systems Biology Professor TBD a. Classical and Genomic age Systems Biology The systems biology paradigm in light of technological developments over the last 100 years. Data integration bottlenecks. b. Mathematical modeling of molecular circuits. Conceptual models. From conceptual models to mathematical models. Mathematical formalisms. Data driven models. 5

c. Design and organization principles in molecular circuits. Concept of design principle. Mathematically controlled comparisons. Feasibility analysis. Design Spaces. Synthetic Biology. Methodology The methodology will combine master classes, solving practical problems and real cases, working in the computing lab, performing individual and team work, reading articles related to the thematic blocks, and independent self-study. The virtual platform will be used. Activities Title Hours ECTS Learning outcomes Type: Directed Solving problems in class and work in the biocomputing lab 39 1.56 1, 2, 4, 5, 6, 8, 9, 10, 11, 12 Theoretical classes 39 1.56 1, 2, 3, 4, 5, 6, 8, 9, 10, 11 Type: Supervised Performing individual and team works 40 1.6 2, 4, 5, 6, 7, 8, 9, 11 Type: Autonomous Regular study 178 7.12 2, 4, 5, 6, 7, 8, 9, 11 Evaluation Work done and presented by the student (student's portfolio) (60%). Individual theoretical and practical test (40%): A final exam will take place at the end of this module. It will consist of one or two multiple-choice or short questions by each professor teaching in this module. Evaluation activities Title Weighting Hours ECTS Learning outcomes Individual theoretical and practical test 40% 4 0.16 2, 4, 5, 6, 8, 9, 10, 11, 12 Student's portfolio 60% 0 0 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12 Bibliography Updated bibliography will be recommended in each session of this module by the professor, and links will be made available on the Student's Area of the MSc Bioinformatics official website (http://mscbioinformatics.uab.cat/base/base3.asp?sitio=bioinformaticsintranet&anar=module_2 &item=&subitem=). 6