An introduction to bioinformatic tools for population genomic and metagenetic data analysis, 2.5 higher education credits Third Cycle



Similar documents
University of Glasgow - Programme Structure Summary C1G MSc Bioinformatics, Polyomics and Systems Biology

NORTH PACIFIC RESEARCH BOARD SEMIANNUAL PROGRESS REPORT

A Primer of Genome Science THIRD

Modulhandbuch / Program Catalog. Master s degree Evolution, Ecology and Systematics. (Master of Science, M.Sc.)

G E N OM I C S S E RV I C ES

Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers

BIO 3352: BIOINFORMATICS II HYBRID COURSE SYLLABUS

PROGRAMMING FOR BIOLOGISTS. BIOL 6297 Monday, Wednesday 10 am -12 pm

Scientific diving and documentation techniques, 4 hp

Ettema Lab Information Management System Documentation

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

BSc (Hons) Biology (Minor: Forensic Science or Marine & Coastal Environmental Science)/MSc Biology SC516 (Subject to Approval) SC516

A data management framework for the Fungal Tree of Life

BIOINFORMATICS METHODS AND APPLICATIONS

Molecular and Cell Biology Laboratory (BIOL-UA 223) Instructor: Ignatius Tan Phone: Office: 764 Brown

Fast. Integrated Genome Browser & DAS. Easy. Flexible. Free. bioviz.org/igb

Cloud-Based Big Data Analytics in Bioinformatics

Microbial Oceanomics using High-Throughput DNA Sequencing

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

university of copenhagen Bioinformatics Master Program

BIOLOGICAL SCIENCES REQUIREMENTS [63 75 UNITS]

Intro to Bioinformatics

Guidelines for Establishment of Contract Areas Computer Science Department

The University is comprised of seven colleges and offers 19. including more than 5000 graduate students.

A Correlation of Miller & Levine Biology 2014

Bioinformatics Grid - Enabled Tools For Biologists.

NECC History. Karl V. Steiner 2011 Annual NECC Meeting, Orono, Maine March 15, 2011

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

The College of Science Graduate Programs integrate the highest level of scholarship across disciplinary boundaries with significant state-of-the-art

TENTATIVE COURSE SYLLABUS

Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center

Contents. Page 1 of 11

An example of bioinformatics application on plant breeding projects in Rijk Zwaan

Overview of Next Generation Sequencing platform technologies

Introductory Biotechnology for High School Teachers - UNC-Wilmington

MARY DOHERTY 1111 Holland Avenue Cambridge, MD Telephone:

Hadoop Hands-On Exercises

Learning outcomes. Knowledge and understanding. Competence and skills

New challenges for research in biodiversity and ecology of micro- and macro-algae

AP Biology Essential Knowledge Student Diagnostic

Summer School: New challenges for research in biodiversity and ecology of micro- and macro-algae. Date: from 27/11 to 06/12/2015,

Bio-Informatics Lectures. A Short Introduction

COMPUTATIONAL LIFE SCIENCE (MSc) GRADUATE PROGRAM

Biology Department Competitive Admission Requirements

Diablo Valley College Catalog

Genome Explorer For Comparative Genome Analysis

Computer Science. 232 Computer Science. Degrees and Certificates Awarded. A.S. Degree Requirements. Program Student Outcomes. Department Offices

UGENE Quick Start Guide

Factors for success in big data science

LifeScope Genomic Analysis Software 2.5

Core Bioinformatics. Degree Type Year Semester Bioinformàtica/Bioinformatics OB 0 1

General Syllabus for Third Cycle Studies in Music Education Leading to a Doctor of Philosophy

Translation of Subject Curriculum (Study Plan) for Third-cycle (PhD) Education. Chemistry with specialization in Chemical Physics.

Practical Solutions for Big Data Analytics

Genotyping by sequencing and data analysis. Ross Whetten North Carolina State University

THE DOCTORAL COMPREHENSIVE PORTFOLIO

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, Abstract. Haruna Cofer*, PhD

Ph.D. in Bioinformatics and Computational Biology Degree Requirements

Molecular Biology And Biotechnology

Instructor Information. Instructor: Table of Contents. Course Description (Catalog) Table of Contents. Course Scope

A. Master of Science Programme (120 credits) in Global Studies (Masterprogram i globala studier)

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Genetics. Biology Spring 2014

Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics

Biology Department Admission Requirements

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

Bioinformatics Resources at a Glance

DNA Barcoding: A New Tool for Identifying Biological Specimens and Managing Species Diversity

TENTATIVE COURSE SYLLABUS

Master's projects at ITMO University. Daniil Chivilikhin PhD ITMO University

2. Prior knowledge and other entry requirements

An Introduction to Genomics and SAS Scientific Discovery Solutions

Linux command line. An introduction to the Linux command line for genomics. Susan Fairley

ITT Advanced Medical Technologies - A Programmer's Overview

All LJMU programmes are delivered and assessed in English

Deliverable First report on sample storage, DNA extraction and sample analysis processes

Dr. Michael D. Meyer, Chair Forbes Hall 1021B (757)

Basic processing of next-generation sequencing (NGS) data

MASTER OF SCIENCE IN BIOLOGY

Genomic Applications on Cray supercomputers: Next Generation Sequencing Workflow. Barry Bolding. Cray Inc Seattle, WA

The growing challenges of big data in the agricultural and ecological sciences

Biology AP Edition - Campbell & Reece (8th Edition)

Applied mathematics and mathematical statistics

Hadoop Basics with InfoSphere BigInsights

Student Text and E-Book ISBN:

Transcription:

An introduction to bioinformatic tools for population genomic and metagenetic data analysis, 2.5 higher education credits Third Cycle Faculty of Science; Department of Marine Sciences The Swedish Royal Academy of Sciences 1 Confirmation The syllabus was confirmed by the Steering Committee of the Department of Marine Sciences 200X-XX-XX, 200X-XX-XX. Discipline: Natural Science Responsible department: Department of Marine Sciences Main fields of study: Bioinformatics 2. Position in the educational system Elective course; third-cycle education. 3. Entry requirements Admitted to third cycle education. 4. Course content Kursplanemall fastställd av rektor 2009-01-19 Dnr G25 5247/06

This course aims at detailed understanding and hands-on experience of using state of the art bioinformatics pipelines for one s own biological research questions. An important aspect of the course is to show how genomic data can be applied to address and answer research questions in the fields of genetics, ecology, population biology, biodiversity monitoring and conservation. The students will be trained in the latest bioinformatic methods to analyze high throughput sequencing data, which is present in many research projects. The course will cover basic computing tools required to run command line applications, processing high throughput sequencing data of the CO1 gene from environmental samples to reveal biodiversity and analysis of sequencing data from whole genome scans for population genomic studies. The first part of the course introduces general computing tools for beginners such as the UNIX command line environment, bash commands, data formatting using regular expressions and basic scripting in the unix shell with a series of examples and exercises. The course introduces bioinformatics software for analysis of sequence data from metagenetics (The high-throughput sequencing of a molecular marker from an ecosystem or a community of organisms, used for large-scale analyses of biodiversity), through a series of live demonstrations (AmpliconNoise, TaxAssign, QIIME). The course also introduces basic and advanced concepts of population genomics data analysis such as genome/transcriptome assembly, annotation (BLAST), alignment/mapping, differential Gene expression, functional enrichment tests, SNP genotyping, PCA, outlier tests. The course corresponds to 1 week of full time studies and and is composed of lectures, demonstrations and computer labs. 5. Outcomes 1. Knowledge and understanding 1a. Demonstrate advanced knowledge of experimental strategies, applications and tools of DNA barcoding/metabarcoding and population genomics. 1b. Demonstrate advanced knowledge of the potential of genomics approaches to answer ecosystem-wide questions, in particular for biodiversity monitoring. 2. Skills and abilities 2a. Ability to use basic commands in the Unix command line environment (reformatting data with regular expressions, basic scripting, running python scripts from the unix shell) 2b. Ability to use metagenetics software tools to analyse sequence data from environmental samples (data cleaning steps, clustering of reads into operational taxonomic units (OTUs) and taxonomic assignment through hidden markov models and database matching (BLAST, barcode of life database). 2c. Ability to use population genomics software tools to assemble and annotate a genome/transcriptome, and perform gene alignment/mapping, differential gene expression, functional enrichment tests, SNP genotyping, PCA, outlier tests. 3. Judgement and approach 3a. Formulate one's own research questions, identify data and tools needed to answer these questions and critically evaluate and analyse the results. 6. Required reading

Part 1: General computing tools. This will be the main textbook for the introduction to general computing tools: - Haddock and Dunn (2010). Practical computing for Biologists. Sinauer Associates. Part 2: Population genomics - dewit et al. (2012). The simple fool's guide to population genomics via RNA-seq: an introduction to high-throughput sequencing data analysis. Molecular Ecology Resources 12, 1058-1067. Part 3: Metagenetics - Bik, H. M., D. L. Porazinska, et al. (2012). "Sequencing our way towards understanding global eukaryotic biodiversity." TREE 1485. - Leray, M. and Knowlton, N. (2015), 'DNA barcoding and metabarcoding of standardized samples reveal patterns of marine benthic diversity', Proceedings of the National Academy of Sciences of the United States of America, 112 (7), 2076-81. Online course material - The simple fool's guide to population genomics via RNA-Seq: an introduction to highthroughput sequencing data analysis. Details of the pipeline can be found at (http://sfg.stanford.edu) Practical computing for Biologists (http://practicalcomputing.org) 7. Assessment Attendence is mandatory for a pass grade. 8. Grading scale The grading scale comprises Fail (U), and Pass (G). 9. Course evaluation The course evaluation will be carried out through an online questionnaire. Additional information Language of instruction will be English, as international guest lecturers will participate. 11. Preliminary course schedule Course format: 2.5 hp course, fulltime. Lecturers: Sarah Bourlat (SB), Pierre de Wit (PDW), Mats Töpel (MT) and Katja Lehmann (KL) Day 1: Introduction to general computing tools (PDW, MT) Format: 3 hours lecture and demo sessions, 3 hours computer labs, assigned exercises Day 1 of the course will be an introduction to general computing tools, such as the unix command line environment. We will go through bash commands (less, nano, ls, ll, wc,, tail, head, mkdir, cat, grep, for loop), regular expressions, basic scripting, and running python scripts from the unix shell with a series of examples. Exercises and assignments will be based

on Haddock and Dunn Practical Computing for biologists and can be carried out independently. There will also be a presentation of useful bioinformatics software. Part of day 1 will also be concerned with working on a remote server, using the University of Gothenburg s Albiorix cluster as a training tool. Sudents will be provided with guest accounts. This portion of the course is to refresh the students knowledge of the command line environment and the shell, a tool for interacting with the computer through typed instructions at the command line. Exercise sessions will be carried out in pairs to encourage collaborative problem solving. All lectures will be made dynamic through live demonstrations of the command line. Detailed course material including commands and scripts will be available through the course web page. This part of the course corresponds to learning outcome 2a: Ability to use basic commands in the Unix command line environment (reformatting data with regular expressions, basic scripting, running python scripts from the unix shell) Days 2, 3 and 4: Population Genomics pipeline (PDW) Format: Lectures and hands on sessions (computer labs) This part of the course will cover an introductory lecture and practical session run through of the simple fool's guide to population genomics via RNA-Seq: an introduction to highthroughput sequencing data analysis. Details of the pipeline can be found at (http://sfg.stanford.edu). Lecture: Population genomics via RNA-seq: an introduction to high-throughput sequencing data analysis (PDW) Pratical session: A hands on practical session based on the which will cover: Genome/Transcriptome assembly, annotation (BLAST), alignment/mapping, differential Gene expression, functional enrichment tests, SNP genotyping, PCA, outlier tests. The practical session will encourage students to collaborate and work in pairs to enhance communication and understanding. Before the course: Students will need to bring their own computers, with some software preinstalled. Guidelines and download locations for all software used in the course will be available on the course web page. This part of the course corresponds to learning outcome 2c: Ability to use population genomics software tools to assemble and annotate a genome/transcriptome, and perform gene alignment/mapping, differential gene expression, functional enrichment tests, SNP genotyping, PCA, outlier tests. Day 5: Metagenetic data analysis pipelines (SB, KL) Format: Lectures and live demonstrations of software Day 5 will focus on analysis of a high-throughput sequencing dataset from an environmental sample. This approach allows us to gather novel data on species composition of a sample for

biodiversity analyses. We will proceed through data cleaning steps, clustering of reads into operational taxonomic units (OTUs) and taxonomic assignment through database matching using the QIIME pipeline. Lecture 1: The potential of metagenomics approaches to answer ecosystem-wide questions, in particular for biodiversity monitoring (SB). Demo session: Introduction to taxonomic diversity analysis of 16S amplicon data using QIIME and R (KL, SB) This part of the course corresponds to learning outcome 2b: Ability to use metagenetics software tools to analyse sequence data from environmental samples (data cleaning steps, clustering of reads into operational taxonomic units (OTUs) and taxonomic assignment through hidden markov models and database matching (BLAST, barcode of life database).