Bioinformatics Approaches for Analysis of High-throughput Biological Data WORKSHOP. September Istanbul, Turkey


WELCOME TO ISTANBUL!

Course Co-Organizers

Hasan H. Otu, Department of Bioengineering, Istanbul Bilgi University, Istanbul, Turkey
Luiz F. Zerbini, Cancer Genomics, ICGEB, Cape Town, South Africa

Letter from the Organizer

Dear Participants and Faculty,

It is my great pleasure to welcome you to the workshop "Bioinformatics Approaches for Analysis of High-throughput Biological Data", held in Istanbul, Turkey, in September and sponsored by the International Centre for Genetic Engineering and Biotechnology (ICGEB), the International Union of Biochemistry and Molecular Biology (IUBMB), and Istanbul Bilgi University. My special thanks go to the co-organizer, Dr. Luiz Zerbini, and to the eight speakers, without whom this event would not be possible.

High Throughput Biological Data (HTBD) production has been increasing at an unprecedented pace with the advancement of microarray and next-gen sequencing technologies, and it requires detailed and comprehensive analysis methods. Bioinformatics has emerged as an interdisciplinary field at the intersection of life sciences, engineering, and computational and basic sciences, and acts as an information management and analysis system for HTBD. There is, however, an increased need for awareness, knowledge, and skills in Bioinformatics, which is one of the main motivations behind this workshop.

Bioinformatics mainly deals with four facets of analysis: DNA sequence analysis, protein structure prediction, functional genomics and proteomics, and systems biology. In this workshop, participants will be introduced to current state-of-the-art Bioinformatics methods and applications in these four facets. The workshop is designed to give researchers a thorough basis for understanding the new trends in sequence, protein and gene expression analysis. One of the outcomes will be to equip the participants with available databases and analysis tools that address issues faced in their research, and to suggest possible room for improvement of the discussed methods from an algorithmic point of view.

I am very excited to welcome you to this stimulating event in the historic and beautiful city of Istanbul!

Hasan H. Otu, Ph.D.
Chair, Department of Bioengineering
Istanbul Bilgi University
Istanbul, Turkey

Faculty Speakers

Mehmet Serkan Apaydin, PhD
Assistant Professor of Electrical and Electronic Engineering, Istanbul Sehir University, Istanbul, Turkey

Rita Casadio, PhD
Professor of Biophysics, University of Bologna
Group leader of the Bologna Biocomputing Unit, Bologna, Italy

Esra Erdem, PhD
Assistant Professor of Computer Science and Engineering, Sabanci University, Istanbul, Turkey

Towia Libermann, PhD
Associate Professor of Medicine, Harvard Medical School, Boston, MA, USA
Director, BIDMC Genomics and Proteomics Center and DF/HCC Cancer Proteomics Core
Div. of Interdisciplinary Medicine and Biotechnology

Michael P. Myers, PhD
Group Leader, Protein Networks
International Centre for Genetic Engineering and Biotechnology (ICGEB), Trieste, Italy

Hasan H. Otu, PhD
Assistant Professor of Bioengineering, Istanbul Bilgi University, Istanbul, Turkey

Cenk Sahinalp, PhD
Professor of Computing Science, Simon Fraser University, Vancouver, Canada
Director, SFU Lab for Computational Biology; Canada Research Chair in Computational Genomics

Khalid Sayood, PhD
Professor of Electrical Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA

Ugur Sezerman, PhD
Associate Professor of Biological Sciences and Bioengineering, Sabanci University, Istanbul, Turkey

Luiz Zerbini, PhD
Group Leader, Cancer Genomics
International Centre for Genetic Engineering and Biotechnology (ICGEB), Cape Town, South Africa

Scientific Program

Monday, September 3rd
:15-09:00    Arrival and Registration
09:00-09:20  Welcome / Hasan H. Otu
09:20-09:50  About ICGEB / Luiz F. Zerbini
10:00-10:50  Population Scale Detection of Common and Rare Genomic Rearrangements and Transcriptomic Aberrations / Cenk Sahinalp
11:00-11:30  Tea/Coffee Break
11:30-12:20  Efficient Communication and Storage vs. Accurate Variant Calls in Massively Parallel Sequencing: Two Sides of the Same Coin / Cenk Sahinalp
12:30-14:00  Lunch
14:00-14:50  Large scale annotation of proteins with labeling methods (Part I) / Rita Casadio
15:00-15:50  Large scale annotation of proteins with labeling methods (Part II) / Rita Casadio
16:00-17:00  Meet the Expert Session*

Tuesday, September 4th
09:00-09:50  Protein Structure Prediction Methods / Ugur Sezerman
10:00-10:50  SNP Analysis in a pathway related context / Ugur Sezerman
11:00-11:30  Tea/Coffee Break
11:30-12:20  The connectivity map database and its use in cancer research / Luiz F. Zerbini
12:30-14:00  Lunch
14:00-14:50  Genome Rearrangement with AI Planning / Esra Erdem
15:00-15:50  Querying Biomedical Databases and Ontologies in Natural Language Using Automated Reasoners / Esra Erdem
16:00-17:00  Meet the Expert Session*

Wednesday, September 5th 2012
Free morning and afternoon
19:00 -      Social Dinner**

Thursday, September 6th
:20-09:50    Computational Genomic Signatures and their Applications (Part I) / Khalid Sayood
10:00-10:50  Computational Genomic Signatures and their Applications (Part II) / Khalid Sayood
11:00-11:30  Tea/Coffee Break
11:30-12:20  Pathway Analysis of High Throughput Biological Data within the context of Bayesian Networks / Hasan H. Otu
12:30-14:00  Lunch
14:00-14:50  Functional genomics driving individualized medicine and the challenges ahead / Towia Libermann
15:00-15:50  Proteomic approaches to personalized medicine / Towia Libermann
16:00-17:00  Meet the Expert Session*

Friday, September 7th
09:00-09:50  Proteomics: From proteins to networks (Part I) / Michael Myers
10:00-10:50  Proteomics: From proteins to networks (Part II) / Michael Myers
11:00-11:30  Tea/Coffee Break
11:30-12:20  Introduction to Nuclear Magnetic Resonance Spectroscopy / Serkan Apaydin
12:30-14:00  Lunch
14:00-14:50  Algorithms for NMR Structure Based Assignment / Serkan Apaydin
:30          Closing Remarks and Evaluation / Hasan H. Otu
15:30-16:30  Meet the Expert Session*
16:00 -      Farewell Cocktail**

* Meet the Expert sessions give participants an opportunity to interact with the experts and discuss topics related to their experiences. Sessions will begin promptly.
** The Social Dinner and Farewell Cocktail are free of charge for all participants. RSVP at the registration desk is required for these events.

Mehmet Serkan Apaydin, PhD
Assistant Professor of Electrical and Electronic Engineering, Istanbul Sehir University

MSA is an assistant professor in the College of Engineering and Natural Sciences at Istanbul Sehir University, Turkey. He received his B.Sc. in Electrical Engineering from Bilkent University and his Ph.D. in Electrical Engineering from Stanford University. His research interests are in bioinformatics, in particular developing computational tools to study protein structure, motion, and interactions with ligands.

Talk Schedule: Friday, September 7 / 11:30-12:20 & 14:00-14:50

Introduction to Nuclear Magnetic Resonance Spectroscopy

The 3-D structure of a protein plays a critical role in defining the protein's function. High-throughput protein structure determination methods are very important for obtaining structural information quickly and accurately. The two main experimental techniques for structure determination are X-ray crystallography and Nuclear Magnetic Resonance (NMR) spectroscopy. Not all proteins can be crystallized and studied by X-ray crystallography; furthermore, NMR allows protein structure to be solved in solution. In NMR, various experiments are performed on the protein. A general introduction to NMR spectroscopy will be given, with emphasis on the information content of the various NMR experiments, such as HSQC, TOCSY and RDC.

Algorithms for NMR Structure Based Assignment

NMR spectroscopy allows protein structure to be determined in solution. An important problem in protein structure determination using NMR spectroscopy is the mapping of peaks to their corresponding nuclei. Structure Based Assignment (SBA) is an approach to solving this problem using a template structure that is homologous to the target. We formulate SBA as a linear assignment problem with additional Nuclear Overhauser Effect (NOE) constraints, which can be solved within the Nuclear Vector Replacement (NVR) framework. This approach (NVR-BIP) uses NVR's scoring function and data types. Our results are comparable to NVR's assignment accuracy on NVR's test set, but higher on four additional small proteins. We prove that this problem is NP-hard and propose a tabu search algorithm (NVR-TS), equipped with a dynamic tabu list structure and a guided perturbation mechanism, to solve it efficiently. NVR-TS uses a quadratic penalty relaxation of NVR-BIP in which violations of the NOE constraints are penalized in the objective function. We also implement a memory structure that reports the k best solutions. Experimental results indicate that our algorithm finds the optimal solution on NVR-BIP's data set (7 proteins with 24 templates, 31 to 126 residues). Furthermore, it achieves high assignment accuracies on two additional large proteins, MBP and EIN (348 and 243 residues, respectively), which NVR-BIP failed to solve. We then propose an ant colony optimization based approach to this problem. Our method finds optimal solutions for small proteins and achieves higher accuracies on larger proteins compared to NVR-TS.

Joint work with Jeyhun Aslanov, Bülent Çatay, Gizem Çavuşlar, Bruce Donald and Nick Patrick.
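
To make the "linear assignment" part of the abstract above concrete, here is a minimal Python sketch. It is not NVR-BIP or NVR-TS: it ignores the NOE constraints, it does not use NVR's scoring function, and the cost matrix is invented for illustration. It enumerates every one-to-one mapping of peaks to residues and keeps the cheapest, which is feasible only for toy sizes and is the reason heuristics such as tabu search are of interest on real proteins.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, assignment) minimizing sum of cost[peak][residue].

    assignment[peak] is the residue index given to that peak. Brute force:
    tries all n! one-to-one mappings, so only usable for very small n.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[peak][residue] for peak, residue in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical costs: rows are peaks, columns are residues (lower is better).
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = best_assignment(cost)
print(total, assignment)  # 5 [1, 0, 2]
```

Adding pairwise constraints between assigned residues, as the NOE constraints do, is what moves the problem beyond a plain linear assignment.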

Rita Casadio, PhD
Professor of Biophysics, University of Bologna
Group leader of the Bologna Biocomputing Unit

Rita Casadio, after her degree in Physics at the University of Bologna, Italy, attended several courses both in Italy and abroad and acquired experience and theoretical background in different fields, such as Computer Science, Membrane and Protein Biophysics, Bioenergetics and Irreversible Thermodynamics. After working in laboratories of Biophysics in both the United States and Germany, RC became Assistant Professor of Biophysics at the University of Bologna, Italy, in 1987. Since 1/10/2003 she has been full professor of Biochemistry/Bioinformatics/Biophysics at UNIBO. RC worked in membrane and protein Biophysics (particularly with bacteriorhodopsin from Halobacterium halobium and F1F0 ATPases from mesophilic organisms), both experimentally and theoretically. Presently she is interested in computer modelling of relevant biological processes, such as protein folding and modelling, protein-protein interaction, genome annotation, protein interaction networks, and SNP search and annotation and their effect on protein stability. RC teaches courses to undergraduate and graduate students in Physics, Biology and Biotechnology on Membrane and Molecular Biophysics, Computational Biology and Bioinformatics. Presently she is the president of the Bologna International Master in Bioinformatics (Laurea Magistrale). RC is a member of the American Biophysical Society, the Protein Society, ISCB, and the Italian Societies of Biochemistry, Biophysics and Bioinformatics. She is also a member of the Accademia delle Scienze dell'Istituto di Bologna. She is a member of the board of directors of I.N.B.B., an Italian Interuniversity Consortium for Research in Biostructures and Biosystems, acting as a representative of the Italian Minister of MIUR; she has been a member of the board of directors of ISCB, the International Society of Computational Biology ( ). Presently she is a member of the Editorial Boards of BMC Bioinformatics, Advances in Bioinformatics, BioData Mining and BMC Research Notes.

Talk Schedule: Monday, September 3 / 14:00-14:50 & 15:00-15:50

Large scale annotation of proteins with labeling methods

As a result of large sequencing projects, data banks of protein sequences and structures are growing rapidly. The number of sequences is, however, orders of magnitude larger than the number of structures known at atomic level, and this is so in spite of the efforts to accelerate the resolution of protein structures. Tools have been developed to bridge the gap between sequence and protein 3D structure, based on the notion that information is to be retrieved from the databases and that knowledge-based methods can help in approaching a solution of the protein folding problem. In this way, several features can be predicted starting from a protein sequence, such as structural and functional motifs and domains, including the topological organisation of a protein inside the membrane phase and the formation of disulfide bonds in a folded protein structure. Our group has been contributing to the field with different computational methods, mainly based on machine learning (neural networks (NNs), hidden Markov models (HMMs), support vector machines (SVMs), hidden neural networks (HNNs) and extreme learning machines (ELMs)) and capable of computing the likelihood of a given feature starting from the protein sequence ( ). Our methods can add to the process of large scale proteome annotation (endowing sequences with functional and structural features).

Recently, Conditional Random Fields (CRFs) have been introduced as a promising new framework for solving sequence labelling problems, in that they offer several advantages over Hidden Markov Models (HMMs), including the ability to relax the strong independence assumptions made in HMMs. However, several problems of sequence analysis can be successfully addressed only by designing a grammar in order to provide meaningful results. We therefore introduced Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) as an extension of Hidden Conditional Random Fields (HCRFs). GRHCRFs, while preserving the discriminative character of HCRFs, can assign labels in agreement with the production rules of a defined grammar. The main novelty of GRHCRFs is the possibility of including prior knowledge of the problem in HCRFs by means of a defined grammar. Our current implementation allows regular grammar rules. We tested our GRHCRF on two typical biosequence labelling problems: the prediction of the topology of prokaryotic outer-membrane proteins and the prediction of the bonding states of cysteine residues in proteins. The tests show that the separation of state names and labels makes it possible to model a huge number of concurring paths compatible with the grammar and with the experimental labels without increasing the time and space computational complexity.
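
The idea of assigning labels "in agreement with the production rules of a defined grammar" can be illustrated with a much simpler device than a GRHCRF. The Python sketch below is ordinary Viterbi decoding over per-position label scores in which only grammar-allowed transitions may be followed. It involves no training and no hidden states, and the labels ("in", "mem", "out"), the scores and the transition set are all hypothetical.

```python
import math


def constrained_viterbi(scores, allowed, start_labels):
    """Best-scoring label path that only uses transitions listed in `allowed`.

    scores: list of dicts mapping label -> log-score, one dict per position.
    allowed: set of (previous_label, next_label) pairs (a regular grammar).
    start_labels: labels permitted at the first position.
    """
    labels = list(scores[0])
    best = {l: scores[0][l] if l in start_labels else -math.inf for l in labels}
    back = []
    for pos in range(1, len(scores)):
        new_best, pointers = {}, {}
        for cur in labels:
            # Only predecessors permitted by the grammar are considered.
            cands = [(best[p], p) for p in labels if (p, cur) in allowed]
            prev_score, prev_label = max(cands, default=(-math.inf, labels[0]))
            new_best[cur] = prev_score + scores[pos][cur]
            pointers[cur] = prev_label
        best = new_best
        back.append(pointers)
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical grammar: a chain may go in -> mem -> out, never in -> out directly.
allowed = {("in", "in"), ("in", "mem"), ("mem", "mem"), ("mem", "out"), ("out", "out")}
scores = [
    {"in": 0.0, "mem": -2.0, "out": -3.0},
    {"in": -1.0, "mem": -2.0, "out": -0.5},
    {"in": -3.0, "mem": -0.2, "out": -1.0},
    {"in": -3.0, "mem": -2.0, "out": -0.1},
]
print(constrained_viterbi(scores, allowed, {"in"}))  # ['in', 'in', 'mem', 'out']
```

Picking the best label independently at each position would give in, out, mem, out, which breaks the grammar; the constrained decoder returns a path the grammar accepts.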

Esra Erdem, PhD
Assistant Professor of Computer Science and Engineering, Sabanci University

Esra Erdem is a faculty member at Sabanci University. She received her Ph.D. in computer sciences at the University of Texas at Austin (2002), and visited the University of Toronto and Vienna University of Technology for postdoctoral research ( ). Her research is in the area of knowledge representation and reasoning.

Talk Schedule: Tuesday, September 4 / 14:00-14:50 & 15:00-15:50

Genome Rearrangement with AI Planning

The genome rearrangement problem is to find the most economical explanation for observed differences between the gene orders of two genomes. Such an explanation is provided in terms of events (such as inversions and transpositions) that change the order of genes in a genome. A similar problem studied in AI is the planning problem, where the goal is to plan the actions of a robotic agent to achieve the given goals from a given initial state. In the first part of the talk, we will study these two problems, emphasizing their similarities and the methods for solving them. In the second part of the talk, we will explain how the genome rearrangement problem can be modeled as an AI planning problem and solved using a general-purpose AI planner.

Querying Biomedical Databases and Ontologies in Natural Language Using Automated Reasoners

Storing biomedical data in various structured forms, such as biomedical databases and ontologies, and at different locations has brought about many challenges for answering queries about the knowledge represented in these resources, such as representing queries, extracting relevant knowledge from the biomedical resources, integrating it, answering queries efficiently, and generating further related explanations that take the provenance information into account.
In the first part of the talk, we will go over these challenges, and present a related knowledge representation and reasoning paradigm from AI, called Answer Set Programming (ASP), that provides a high-level expressive formalism to represent knowledge and efficient solvers to answer queries about this knowledge. In the second part of the talk, we will discuss how ASP in connection with Semantic Web technologies can be used to handle the challenges of biomedical query answering.
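
The view of genome rearrangement as a planning problem in the first abstract above can be sketched in a few lines of Python: the gene order is the state, each inversion is an action, and the shortest plan is the most economical explanation. This is a plain breadth-first search over unsigned gene orders with inversions as the only event. It is not the general-purpose AI planner the talk refers to, and the function name and example genomes are hypothetical.

```python
from collections import deque


def inversion_distance(start, target):
    """Minimum number of inversions (segment reversals) turning start into target.

    Breadth-first search over gene orders; the state space grows factorially,
    so this is only practical for a handful of genes. Returns None if target
    cannot be reached (for example, when the gene sets differ).
    """
    start, target = tuple(start), tuple(target)
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        order, steps = queue.popleft()
        n = len(order)
        for i in range(n):
            for j in range(i + 2, n + 1):  # reverse segments of length >= 2
                nxt = order[:i] + order[i:j][::-1] + order[j:]
                if nxt == target:
                    return steps + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return None


print(inversion_distance([3, 2, 1, 4], [1, 2, 3, 4]))  # 1
print(inversion_distance([2, 3, 1], [1, 2, 3]))        # 2
```

A planner replaces this blind enumeration with a declarative description of states and actions plus a general search engine, which makes it easier to add further event types such as transpositions.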

Towia Libermann, PhD
Associate Professor of Medicine, Harvard Medical School
Director, BIDMC Genomics and Proteomics Center and DF/HCC Cancer Proteomics Core
Div. of Interdisciplinary Medicine and Biotechnology

Towia A. Libermann, Ph.D. is an Associate Professor of Medicine in the Department of Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School. Dr. Libermann is also the Director of the Beth Israel Deaconess Medical Center Genomics, Proteomics and Bioinformatics Center and Director of the Dana Farber/Harvard Cancer Center Cancer Proteomics Core. His laboratory is located in the Division of Interdisciplinary Medicine and Biotechnology at Beth Israel Deaconess Medical Center. Dr. Libermann is an experienced molecular biologist with a strong track record in oncology, immunological and inflammatory diseases, signal transduction, molecular biology, gene regulation, bioinformatics, proteomics and genomics. He applies systematic and comprehensive functional genomics and proteomics strategies for transcriptional profiling, high throughput genotyping, proteomics, and drug screening to define disease mechanisms at a molecular level and to identify novel prognostic and predictive biomarkers as well as new drug targets in human diseases such as cancer and diabetes. Dr. Libermann is an expert in translational research and personalized medicine, having worked on a variety of approaches to identify proteins that may be exploited as biomarkers and/or targets for therapeutic intervention, starting with his seminal discovery of EGF receptor gene amplifications in glioblastomas while he was a graduate student. Dr. Libermann routinely participates as a reviewer for various NIH study sections, NCI Cancer Centers, the Arthritis Foundation, AAAS, AIBS, Genome Canada, Science Foundation of Ireland, the Israel Science Foundation, the Ontario Research Fund, the Ontario Institute for Cancer Research and the Korea Science and Engineering Foundation. Dr. Libermann is a Founder of Karyon Therapeutics, BananaLogix, and Tolerance Pharmaceuticals, which merged into Cardion. Prior to joining Beth Israel Deaconess Medical Center and Harvard Medical School in 1990, Dr. Libermann did his post-doctoral training at the Whitehead Institute for Biomedical Research with Dr. David Baltimore, after receiving his Ph.D. degree in Immunology under the supervision of Dr. Joseph Schlessinger from the Weizmann Institute of Science and Technology, Rehovot, Israel. Dr. Libermann is Editor-in-Chief of the Open Proteomics Journal. He has published more than 170 scientific papers and has 4 issued and 5 pending patents.

Talk Schedule: Thursday, September 6 / 14:00-14:50 & 15:00-15:50

Functional genomics driving individualized medicine and the challenges ahead

Human beings are 99.99% the same. However, subtle genetic differences between individuals, combined with environmental effects, result in the predisposition to or development of diseases, as well as divergent responses to therapies. Patients with the same disease respond differently to drugs because of individual differences in the particular disease-causing mechanisms and individual variations in drug metabolism. The aim of personalized medicine is to optimally tailor therapy to each patient based on advanced molecular characterization of the individual's disease process and drug response. Functional genomics promises to provide solutions for personalized medicine and will enhance our overall knowledge and treatment of various diseases.

Functional genomics approaches have rapidly evolved over recent years and have provided the basis for groundbreaking discoveries in basic and clinical research. As technologies such as next generation sequencing become more mature and validated, bench-to-bedside clinical applications rapidly emerge. Genomics strategies in conjunction with bioinformatics and systems biology approaches are rapidly changing the landscape of medicine and patient management, enabling stratification of patients based on their individual disease mechanism, development of targeted and more specific therapies, and identification of novel diagnostic and prognostic biomarkers. This course will introduce some of the genomic technologies, applications to personalized medicine and challenges in applying genomic discoveries to patient management.

Proteomic approaches to personalized medicine

As a result of the human genome sequencing effort, delineation of the proteome has become the new frontier in basic, translational and clinical research. In-depth understanding of the proteome promises to solve many biological and clinical questions and is considered an enabling and critical approach in research and clinical investigation, expanding knowledge of etiology, development and progression of diseases. It is becoming more apparent that only if we understand the proteome within cells and in the extracellular compartments will we be able to model the complex biological pathways in human disease. With the aid of knowledge-based analytical fields such as bioinformatics, proteomic technologies (e.g., mass spectrometry, protein fractionation) provide the means to compare profiles in normal and pathological tissues and correlate them with biological function, identify temporal patterns of protein expression and post-translational modifications, and define the function and interactions of proteins.
When considered along with clinical data, the proteomic profile of bodily fluids or tumor tissue can guide the rational and personalized management of patients with regard to therapeutic modalities and with the ultimate hope of discovering new molecular targets for drug development. Mass spectrometry has the unique ability to identify in parallel protein modifications and mutations while at the same time providing quantitative measurements. While mass spectrometry until recently had major limitations with regard to quantitation, sensitivity and throughput when applied to larger sets of clinical samples, recent technological developments have resulted in significant enhancements that make clinical proteomics feasible. Technological advances in mass spectrometry are rapidly pushing the limits of sensitivity and resolution. Indeed some targeted quantitative proteomic approaches combining antibody immunoprecipitation with mass spectrometry using Multiple Reaction Monitoring (MRM) are pushing the limit of protein detection to the level of ELISAs. As a result of this proteomic revolution, we are now able to identify and validate new disease biomarkers and potential novel molecular targets for drug development as well as to dissect aberrant biological pathways in diseases. This course will introduce fundamental language and concepts including basic concepts of mass spectrometry, sample preparation, quantitative proteomics, protein-protein interaction networks, posttranslational modifications, proteomics data analysis, proteomic biomarker discovery and validation, translational introduction of diagnostic and prognostic biomarkers into the clinic, proteomics approaches to understand basic disease mechanisms and to identify potential novel drug targets.

Michael P. Myers, PhD
Group Leader, Protein Networks
ICGEB Trieste

Dr. Myers received his BA degree from the Departments of Biology and Physics, DePauw University, Greencastle, USA (1990) and his PhD degree from the Department of Neuroscience, Case Western Reserve University, Cleveland, USA (1996). Since 2007, Dr. Myers has been a group leader at the ICGEB Trieste Component. Prior to that, he was a principal investigator and the director of proteomics at the Cold Spring Harbor Laboratory. His laboratory is interested in understanding how protein complexes regulate cellular behavior. Inside the cell, the majority of proteins can be found in highly interactive networks. The architecture of these networks and how they change in response to the cellular environment is critical for the normal physiological functioning of the cell. In fact, these networks are responsible for the robustness and adaptability of living cells, and perturbations to these networks result in pathological conditions such as cancer and neurological disorders. The laboratory uses high throughput mass spectrometry to gain insights into the protein interaction networks from a variety of normal and pathological conditions.

Talk Schedule: Friday, September 7 / 09:00-09:50 & 10:00-10:50

Proteomics: From proteins to networks (in two parts)

Proteomics attempts to identify or characterize the protein component from a biologically interesting source. The source can be extremely complex, such as the whole cell lysate from a tumor, or extremely simple, such as a single purified protein. Clearly the goals of these approaches, as well as the underlying workflows, are extremely different, and these will be highlighted in the first presentation. The focus of the second presentation will be on how these approaches can be used to build networks of proteins and how these networks can be exploited to gain new insights. The goal of my laboratory is to understand how protein complexes regulate cellular behavior. My laboratory implements new methods for the high throughput analysis of protein interactions (networks) using mass spectrometry. In particular, we focus on those networks that are (or are likely to be) perturbed in a variety of pathological conditions, including tumorigenesis and viral infection.

Hasan H. Otu, PhD
Assistant Professor of Bioengineering, Istanbul Bilgi University

Hasan H. Otu obtained his BS degree in 1996 and MS degree in 1997, both from Bogazici University, Department of Electrical and Electronics Engineering. In 2002, he graduated from the University of Nebraska-Lincoln with a PhD in Electrical Engineering. He is a faculty member at Harvard Medical School ( ), where he was a research fellow. Dr. Otu is the founding director of the Bioinformatics Core at Beth Israel Deaconess Medical Center, Harvard Medical School, and Associate Director of the Proteomics Core at Dana Farber Harvard Cancer Center. Since 2010, Dr. Otu has been the founding chair of the Department of Bioengineering at Istanbul Bilgi University. Dr. Otu's research interests are in the area of Bioinformatics, focusing on macromolecular sequence analysis, microarrays, biomarker discovery and systems biology, analyzing high throughput biological data within the context of networks.

Talk Schedule: Monday, September 3 / 09:00-09:20; Thursday, September 6 / 11:30-12:20

Pathway Analysis of High Throughput Biological Data within the context of Bayesian Networks

High Throughput Biological Data (HTBD) production has been increasing at an unprecedented pace with the advancement of microarray and next-gen sequencing technologies. From a life science perspective, HTBD analysis results make most sense when interpreted within the context of biological networks and pathways. Bayesian Networks (BNs) represent the dependency structure of a set of random variables using directed acyclic graphs and have been used with increasing popularity in mathematics and computational sciences over the past 20 years. BNs model both linear and non-linear interactions, handle stochastic events in a probabilistic framework that accounts for noise, and emphasize only strong relations in noisy data. These properties make BNs excellent candidates for HTBD analysis. In applications of BNs to HTBD analysis, nodes generally represent genes and edges represent interaction relations. In this talk I will describe a method we have devised, Bayesian Pathway Analysis (BPA), with applications to synthetic and real data. In the BPA approach, known biological pathways are modeled as BNs, and the pathways that best explain the given HTBD are found. Gene Set Enrichment (GSE) or Gene Ontology (GO) based approaches that analyze microarray data within the context of pathways or functional groups treat the genes in a pathway or group as a list and calculate a score for each list representing the pathway's or group's significance, without including in their model the topology through which the genes in a given pathway or group interact with each other. The proposed method, for the first time, integrates pathway topology (the graph representing gene interactions) when analyzing HTBD within the context of pathways. BPA tests the fitness of the HTBD to the pathways (which are modeled as BNs) through the Bayesian Dirichlet Equivalent (BDe) scoring scheme. The significance of the scores is assessed using randomization via bootstrapping, and False Discovery Rate (FDR) corrected p-values are calculated for each pathway to account for multiple hypothesis testing.
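
The last step of the abstract above, correcting per-pathway p-values for multiple hypothesis testing, can be sketched in Python. The abstract does not say which FDR procedure BPA uses, so the Benjamini-Hochberg step-up procedure below is an assumption chosen because it is the most common one. The p-values are invented, and the BDe scoring and bootstrapping steps are not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is scaled by m / rank (rank 1 = smallest), and a running
    minimum from the largest rank downward keeps the result monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, one per pathway.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(q, 4) for q in adjusted])  # [0.04, 0.0533, 0.0533, 0.2]
```

Pathways whose adjusted value falls below the chosen FDR level (0.05 is a common choice) would be reported as significant; here only the first pathway would pass.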

Cenk Sahinalp, PhD
Professor of Computing Science, Simon Fraser University
Director, SFU Lab for Computational Biology; Canada Research Chair in Computational Genomics

S. Cenk Sahinalp is a Professor of Computing Science at Simon Fraser University, Canada. He received his B.Sc. in Electrical Engineering from Bilkent University and his Ph.D. in Computer Science from the University of Maryland at College Park. Sahinalp is a University of Maryland Distinguished CS Alumnus, an NSF Career Awardee, a Canada Research Chair, a Michael Smith Foundation for Health Research Scholar and an NSERC Discovery Accelerator Awardee. His papers on genomics and bioinformatics have been highlighted by scientific journals and magazines (e.g. Genome Research, Nature Biotech, Genome Technology) and have won several awards (e.g. ISMB'12 Best Student Paper, ISMB'11 HitSeq Best Paper). He was the conference general chair of RECOMB'11, is PC chair of APBC'13, and has served as the sequence analysis area chair for the ISMB and CSHL Genome Informatics Conferences. He also co-founded the RECOMB-Seq conference series on Massively Parallel Sequencing. He is an area co-editor for BMC Bioinformatics and is on the editorial board of Bioinformatics and several other journals. He co-directs the SFU undergraduate program in Bioinformatics and the SFU Bioinformatics for Combating Infectious Diseases Research Program. His research interests include computational genomics, in particular algorithms for high throughput sequence data, network biology, RNA structure and interaction prediction, and chemoinformatics algorithms.
Talk Schedule: Monday, September 3 / 10:00-10:50 & 11:30-12:20

Population Scale Detection of Common and Rare Genomic Rearrangements and Transcriptomic Aberrations

Massively parallel (MP) sequencing technologies are on their way to reducing the cost of whole shotgun sequencing of an individual donor genome to USD . Coupled with algorithms to accurately detect structural (in particular expressed) differences among many individual genomes, MP sequencing technologies are soon to change the way diseases of genomic origin are diagnosed and treated. In this talk we will briefly go through some of the algorithm development efforts at the Lab for Computational Biology at SFU for simultaneously analyzing large collections of MP sequenced genomes and transcriptomes, and in particular for identifying and differentiating common and rare, expressed and unexpressed large scale variants with high accuracy. Our algorithms, which we collectively call CommonLAW (Common Loci structural Alteration detection Widgets), move away from the current model of detecting genomic variants in single MP sequenced donors independently and then checking whether two or more donor genomes agree or disagree on the variations. Instead, we propose a new model in which structural variants are detected among multiple genomes and transcriptomes simultaneously. One of our methods, Comrad, for example, enables integrated analysis of transcriptome (i.e. RNA) and genome (i.e. DNA) sequence data for discovering expressed rearrangements in multiple, possibly related, individuals.

Efficient Communication and Storage vs. Accurate Variant Calls in Massively Parallel Sequencing: Two Sides of the Same Coin

Given two strings A and B from the DNA alphabet, the Levenshtein edit distance between A and B, LED(A,B), is defined to be the minimum number of single-character insertions, deletions and replacements needed to transform A into B (equivalently B into A). If, in addition to the single-character edits, one is permitted to perform segmental (block) edits in the form of (i) moving a block from any location to another, (ii) copying a block to any location, and (iii) uncopying (i.e. deleting one of the two occurrences of) a block, the resulting block edit distance, BED(A,B), captures much of our current understanding of the relation between individual genome sequences. If, of two communicating parties, Alice (holding genome sequence A) and Bob (holding genome sequence B), Alice wants to compute B, then, theoretically, the total number of bits Bob needs to send to Alice is O(BED(A,B) polylog BED(A,B)) [Cormode et al., SODA 2000]. Considering that between a typical donor genome B and a reference genome A the number of single-character differences is on the order of a few million and the number of structural (i.e. blockwise) differences is on the order of tens of thousands, it should be possible to communicate genomes by exchanging only a few million bytes! Yet, today, the most effective way of communicating genome sequence data involves physically exchanging hard disks. In this talk we will try to explain the wide gap between theoretical expectations and the current reality in genome communication, as well as storage, and pose some theoretical and practical problems on the way to the Google-ization of genome search and analysis. We will also try to explore the extent to which our theoretical predictions for genome sequences hold for RNA-Seq data.
Finally we will briefly go through some of the recent developments in transcriptome sequence analysis, especially in the context of disease studies.
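As a small illustration of the Levenshtein edit distance LED(A,B) defined in the abstract above, the following Python sketch computes it with the standard dynamic-programming recurrence. The function name `levenshtein` and the example strings are illustrative assumptions and are not taken from the talk. The block edit distance BED(A,B) is deliberately not covered, since block moves, copies and uncopies call for different techniques.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    replacements needed to transform a into b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # replace ca by cb (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical toy sequences: one base deleted from the first string.
    print(levenshtein("ACGT", "AGT"))
```

The sketch keeps only two rows of the table, so memory grows with the length of B while time grows with the product of the two lengths, which is one reason whole-genome comparison needs more specialized methods.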

Khalid Sayood, PhD
Professor of Electrical Engineering, University of Nebraska-Lincoln

Khalid Sayood received his undergraduate education at the Middle East Technical University, Ankara, Turkey, and the University of Rochester, Rochester, NY. He received the B.S. and M.S. degrees from the University of Rochester, and the Ph.D. degree from Texas A&M University, College Station, TX, in 1977, 1979, and 1982 respectively, all in electrical engineering. He joined the Department of Electrical Engineering at the University of Nebraska-Lincoln in 1982, where he is currently serving as Henson Professor of Engineering. He spent the academic years at the TUBITAK Marmara Research Center and Bogazici University in Turkey. He is the author of Introduction to Data Compression, now in its second edition, and the editor of The Lossless Compression Handbook. Khalid Sayood's principal interest is in the search for patterns in data. He indulges this interest by looking at problems in data compression, joint source-channel coding, and various aspects of bioinformatics.

Talk Schedule: Thursday, September 6 / 09:00-09:50 & 10:00-10:50

Computational Genomic Signatures and their Applications

Recent advances in the development of sequencing technology have resulted in a deluge of genomic data. In order to make sense of this data there is an urgent need for algorithms for data processing and quantitative reasoning. An emerging in silico approach, called computational genomic signatures, addresses this need by representing global species-specific features of genomes using simple mathematical models. The first talk introduces the general concept of computational genomic signatures, and reviews some of the DNA sequence models which can be used as computational genomic signatures. We begin with well-known composition models such as GC content and dinucleotide odds ratio. We continue with signatures based on correlation statistics.
These include autoregressive models as well as information theoretic models such as the average mutual information profile. We conclude with signatures based on composition vectors. Practical computational genomic signatures consist of both a model and a measure for computing the distance or similarity between models. Therefore, a discussion of sequence similarity/distance measurement in the context of computational genomic signatures is presented. The second talk deals with various applications of computational genomic signatures. In particular we will examine the areas of phylogeny construction and metagenomics with a brief excursion into the problem of detecting horizontal gene transfer.
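To make the two composition models named in the abstract concrete, here is a minimal Python sketch of GC content and the dinucleotide odds ratio, together with one possible distance between two signatures. Everything in it is an assumption made for illustration: the function names, the use of the plain single-strand odds ratio f(xy) / (f(x) f(y)), and the mean absolute difference as the distance. The talks may well use other variants.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of seq (must be non-empty)."""
    counted = [b for b in seq.upper() if b in BASES]
    return sum(b in "GC" for b in counted) / len(counted)


def dinucleotide_odds_ratios(seq: str) -> dict:
    """Map each dinucleotide xy to f(xy) / (f(x) * f(y)) on the given strand.

    Assumes seq contains at least two adjacent unambiguous bases.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    ratios = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        # A base that never occurs gives an undefined ratio; report 0.0 instead.
        ratios[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return ratios


def signature_distance(seq_a: str, seq_b: str) -> float:
    """Mean absolute difference between the 16 odds ratios of two sequences."""
    ra = dinucleotide_odds_ratios(seq_a)
    rb = dinucleotide_odds_ratios(seq_b)
    return sum(abs(ra[k] - rb[k]) for k in ra) / len(ra)


if __name__ == "__main__":
    # Hypothetical toy sequences, far too short for a meaningful signature.
    print(gc_content("ATGCGC"))
    print(signature_distance("ACGTACGTAC", "AATTGGCCAA"))
```

The pairing of a model (the 16 odds ratios) with a measure (the distance function) mirrors the abstract's point that a practical signature needs both.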

Ugur Sezerman, PhD
Associate Professor of Biological Sciences and Bioengineering, Sabanci University

O. Uğur Sezerman graduated from Bogazici University, Istanbul (B.Sc. Elect. Eng. 1985, M.Sc. Biomedical Eng. 1987) and received a Ph.D. in Biomedical Engineering (1993) from Boston University, MA, USA. Previously he worked at Boston University and Bogazici University as a researcher and an instructor. He has been at Sabanci University Biological Sciences and Bioengineering Program since He has established the Computational Biology Laboratory at Sabanci University. His current research interests are molecular modeling, synthetic vaccine and drug design, protein engineering, DNA chips, and developing algorithms for applications in functional genomics, systems biology and bioinformatics.

Talk Schedule: Tuesday, September 4 / 09:00-09:50 & 10:00-10:50

Protein Structure Prediction Methods

3D structure information on proteins is instrumental in understanding the mechanism of their function. Even though there are several experimental methods for structure determination, they are usually labor-intensive, time-consuming and costly. In this talk I will go over computational methods developed for protein structure prediction. The talk will focus especially on ab initio, threading and homology modeling.

SNP Analysis in a Pathway-Related Context

For complex diseases there are no strong associations between SNPs and disease etiology. Analysis of SNPs in a pathway-related context reveals pathways that are affected by these SNPs and show higher conservation than SNPs across populations. In this talk I will summarize the method we developed to find disease-related pathways. The method involves SNP-targeted gene identification, functional impact scoring, mapping to interaction networks, identification of targeted connected sub-networks and finally identification of KEGG pathways found in these sub-networks.

Luiz Zerbini, PhD
Group Leader, Cancer Genomics, ICGEB Cape Town

Dr. Luiz Zerbini received a Ph.D. from University of Sao Paulo, Brazil in His Ph.D. work focused on developing new gene therapy strategies in order to generate adenoviral vectors that could be specifically targeted to certain cell types. He joined Dr. Libermann's laboratory as a post-doctoral scientist at Beth Israel Deaconess Medical Centre (BIDMC) and Harvard Medical School in September 1999, where he worked on defining the mechanisms involved in the deregulated function of malignant cells. He was promoted to an Instructor Faculty position at Harvard Medical School in At the same year, Dr. Zerbini became the Associate Director of Research Proteomics, Dana Farber Harvard Proteomics Core. Furthermore, as an independent investigator he was awarded two long-term Department of Defense (DoD) Cancer Program projects, and he was a co-investigator on an NIH R01 research grant. He was also awarded two research grants from the Special Program for Research Excellence (SPORE) through the Dana Farber Harvard Cancer Center, funded by the National Cancer Institute (NCI). Dr. Zerbini joined the International Centre for Genetic Engineering and Biotechnology (ICGEB), located in Cape Town, South Africa in March ICGEB provides a scientific and educational environment of the highest standard and conducts innovative research in life sciences for the benefit of developing countries. ICGEB is part of the United Nations System. Dr. Zerbini currently serves as the head of the Cancer Genomics Group. The overall goals of the Group are to utilize genomics and proteomics tools and signal transduction resources to accelerate comparative analysis of aberrant gene expression in carcinogenesis and to study alterations in signal transduction pathways during the development of cancers.
Talk Schedule: Monday, September 3 / 09:20-09:50; Tuesday, September 4 / 11:30-12:20

The Connectivity Map Database and Its Use in Cancer Research

Global molecular profiling has shown broad utility in delineating pathways and processes underlying disease, and in predicting prognosis and response to therapy. The connectivity map (cmap) database was established in 2008 at The Broad Institute of MIT and Harvard in Cambridge, Massachusetts. It consists of a collection of genome-wide transcriptional expression data from cultured human cells treated with bioactive small molecules and simple pattern-matching algorithms that together enable the discovery of functional connections between drugs, genes and diseases through the transitory feature of common gene-expression changes. This approach can be used to identify gene signatures in patients that may predict the response to a particular drug. The cmap analysis involves ranking drugs by the highest inverse similarity with the disease-specific gene signatures, providing a score for each drug. Drugs whose gene signatures are opposite to the disease-specific gene signatures may potentially reverse the disease phenotype towards the healthy state. This talk will introduce the rationale behind cmap and describe some applications that can be used in cancer research.
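The idea of ranking drugs by inverse similarity to a disease signature can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: signatures are represented as hypothetical dictionaries mapping gene names to expression changes, and a plain Pearson correlation stands in for cmap's own pattern-matching score, which the abstract does not specify. The names `pearson` and `rank_drugs` and all example values are invented for the sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_drugs(disease_sig, drug_sigs):
    """Return (drug, score) pairs, most inversely correlated drug first.

    disease_sig: dict gene -> expression change in disease vs. healthy.
    drug_sigs:   dict drug -> dict gene -> expression change after treatment.
    """
    scores = {}
    for drug, sig in drug_sigs.items():
        shared = sorted(set(disease_sig) & set(sig))
        if len(shared) < 2:
            continue  # too few shared genes to compare
        scores[drug] = pearson(
            [disease_sig[g] for g in shared],
            [sig[g] for g in shared],
        )
    # The most negative score indicates the signature most opposite to the disease.
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy data; real signatures span thousands of genes.
    disease = {"G1": 2.0, "G2": -1.0, "G3": 0.5}
    drugs = {
        "drug_a": {"G1": -2.0, "G2": 1.0, "G3": -0.5},
        "drug_b": {"G1": 2.0, "G2": -1.0, "G3": 0.5},
        "drug_c": {"G1": 0.1, "G2": 0.3, "G3": -0.2},
    }
    for name, score in rank_drugs(disease, drugs):
        print(name, round(score, 3))
```

In this toy data, `drug_a` exactly mirrors the disease signature and so ranks first, which corresponds to the abstract's notion of a drug that might push the phenotype back towards the healthy state.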

Shuttle Bus Schedule

From TAKSİM:
Taksim - Dolapdere - Santral: 08:00, 08:30, 09:00, 09:30, 10:00, 10:30, 11:00, 11:30, 12:00, 12:30, 13:00, 13:30, 14:00, 14:30, 15:00, 15:30, 16:00, 16:30, 17:00, 17:30, 18:00, 18:30, 19:00
Taksim - Dolapdere: 08:15, 09:15, 10:15, 11:15, 12:15, 13:15, 14:15, 15:15, 16:15
There are hourly shuttle buses between on every Saturdays.

From SANTRAL CAMPUS:
Santral - Dolapdere - Taksim: 08:00, 08:30, 09:00, 09:30, 10:00, 10:30, 11:00, 11:30, 12:00, 12:30, 13:00, 13:30, 14:00, 14:30, 15:00, 15:30, 16:00, 16:30, 17:00, 17:30, 18:00, 18:30, 19:00, 19:30, 20:00
There are hourly shuttle buses between on every Saturdays.

Shuttle services are not available on Sundays or on official holidays. Please check the Shuttle Bus Schedule on our website.