BIOTECHNOLOGY Vol.XI Bioinformatics on Post genomic Era: From Genomes to Systems Biology - Vihinen, Mauno

BIOINFORMATICS ON POST GENOMIC ERA: FROM GENOMES TO SYSTEMS BIOLOGY

Vihinen, Mauno
Institute of Medical Technology, University of Tampere, Finland and Research Unit, Tampere University Hospital, Tampere, Finland

Keywords: Genome, transcriptome, proteome, data mining, sequence analysis, bioinformatics, functional genomics, systems biology.

Contents

1. Introduction
1.1. Implications of Genomes
2. Bioinformatics and the Internet
3. Genomics
4. Databases
4.1. Annotation
4.2. Gene Ontologies
4.3. Data Mining
5. Sequence Alignment
6. Common Sequence Analysis Methods
6.1. Gene Finding
6.2. Mapping
6.3. DNA/RNA Secondary Structure
6.4. Primer Selection
6.5. Translation
6.6. Protein Analysis
6.7. Pattern Searches
7. Functional Genomics
7.1. Gene and Protein Expression
8. Protein Structure Prediction
9. Systems Biology
10. Trends and Prospects
Acknowledgements
Glossary
Bibliography
Biographical Sketch

Summary

Genome projects are changing the emphasis of the life sciences. A major part of the human genome, which contains about 3 billion base pairs in 24 chromosomes, was recently published. Genome studies and other laboratory methods are producing mountains of data. Bioinformatics is a relatively new multidisciplinary field that combines methods from biology and biomedicine with computer science, statistics and mathematics. Bioinformatics is the discipline that organizes, analyzes, stores, retrieves, shares and distributes biological, genomic, proteomic, clinical and biomedical information. In bioinformatics, scientists aim at revealing the mechanisms behind biological phenomena. This review provides an introduction to the methods and problems tackled by bioinformatics. In addition to identifying all the genes in genomes, it is crucial to store and distribute the information in databases. Annotation and identification of genes from genomes are crucial for generating useful genome databases. Ethical, legal, and social implications of genome and gene data are briefly discussed. Currently, data mining is one of the most common bioinformatics tasks. Nucleotide and amino acid sequences contain all the information for genes to function and for proteins to fold and deliver their functions. Sequence analysis methods can derive plenty of information when combined with powerful database search methods.

1. Introduction

The publication of numerous genome sequences, including that of human, has propelled the biosciences into the post genomic era. The human genome contains about 3 billion base pairs organized into 24 chromosomes. The actual number of genes will remain an open question for a long time, but current estimates are around 25 000. In addition to identifying all the genes in genomes, it is crucial to store and distribute the information in databases. Genome projects also aim at identifying the genes and proteins critical for life functions, regulation and networks.

To make sense of the mountains of biological data, often generated with the high-throughput methods of omics approaches, advanced computer-based methods are required. Bioinformatics is the field that handles the data. This review provides an introduction to the methods and problems that can be solved with current state-of-the-art bioinformatics, in order to unravel biological information, especially in relation to genome data.

1.1. Implications of Genomes

Genomic data, for the first time, allows systematic analysis and modification of several cellular processes. The availability of microbial genomes facilitates, e.g., the identification of new drug targets, the search for new energy sources, monitoring to detect pollutants in the environment, and possibly efficient toxic waste cleanup.

Medicine will benefit in many ways. Improved and early diagnosis, as well as detection of genetic predispositions to diseases, is already available for many disease conditions. Rational drug design based on target information has revolutionized the pharmaceutical industry. In the future, gene therapy will be a real therapeutic option for certain disorders.

Bioarchaeology and evolutionary studies of human origin and migration will reveal the history of different population groups. DNA-based forensics will be extensively utilized. Insect, drought and disease resistant crops can be developed, as well as farm animals that are healthier and more productive.

Genome studies also have broad ethical, legal, and social implications (ELSI). These issues are of concern in policy making. Important issues include (fetal) genetic testing, the effects of genetic counseling and medical practices, the use of genetic information and privacy implications, as well as commercialization and intellectual property protection, including patenting. Safety and environmental issues are of concern for genetically modified foods, feed and microbes.

2. Bioinformatics and the Internet

Bioinformatics (see also Bioinformatics) is a relatively new multidisciplinary field that combines methods from biology and biomedicine with computer science, statistics and mathematics. Bioinformatics stores, analyses, processes and manipulates diverse information relevant to biological systems. With bioinformatics, scientists aim at revealing the mechanisms behind biological phenomena by modeling the dependencies and subtle trends in massive amounts of gene and protein information. These studies generally involve such large data sets that only computer-based methods can be used. Bioinformatics is closely connected to experimental research, usually being an integral part of it.

Many bioinformatics services, programs and databases are available on the Internet. The majority of basic bioinformatics analyses can generally be performed over the Internet as fast as, or even faster than, on local computers. The Internet is the primary means of distributing biological information. All the most important registries are freely available. There are already hundreds of biological and biomedical databases available on the Web. Internet databases provide up-to-date information, which gives them many benefits over locally maintained services. Recently, commercial databases and services with restricted access and/or high prices have also emerged. In addition to the most important databases, the majority of the most frequently used software tools can either be run on the Internet or downloaded for free.

3. Genomes

Genomes contain all the information necessary for an organism to live. A draft sequence is largely useless without further bioinformatic analyses. Functional genomics aims at the identification of the locations of genes and their functions.
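To give a rough feel for what locating genes computationally can involve, the following minimal sketch scans a single DNA strand for open reading frames (a start codon followed in the same frame by a stop codon). It is a deliberately naive illustration and not a method taken from this chapter: the function name find_orfs, the min_codons threshold and the example sequence are all assumptions made for the example, and it uses the standard genetic code's start and stop codons only.

```python
# Naive single-strand ORF scan: a teaching sketch, not a real gene finder.
# Assumptions: standard genetic code, forward strand only, no introns.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of ORFs on the given strand.

    start is the 0-based index of the A in ATG; end is the index just past
    the stop codon. min_codons counts codons from ATG up to, but not
    including, the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen in this frame
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    example = "CCATGAAATTTGGGTAACC"  # hypothetical toy sequence
    print(find_orfs(example))  # [(2, 17)]
```

A scan like this ignores the reverse strand, split genes and sequencing errors, which is part of why practical gene identification depends on much more elaborate tools and on comparison with known genes.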
Some 326 genomes have been completely sequenced, and approximately 1000 prokaryotic and about 610 eukaryotic genome projects were ongoing as of July 2006. A list of the central genome databases is given in Table 1. Genome sequences make it possible, for the first time, to gain a complete picture of the processes occurring within an organism.

In genome research, computers are needed at all stages. Genomes are sequenced by using the so-called shotgun sequencing approach, which generates a large number of randomly selected sequence stretches that are computationally combined into contigs, and eventually into complete genomes. In fact, the computations last much longer than the actual data collection in the big sequencing centers. By analyzing the genomes, it is possible to address several questions. Instead of comparing organisms based on single sequences, comparisons can now be made between complete reaction pathways and metabolic networks.

GOLD Genomes OnLine Database    http://www.genomesonline.org/
KEGG                            http://www.genome.ad.jp/kegg/
Sanger Center                   http://www.sanger.ac.uk/
MIPS                            http://mips.gsf.de/
GOBASE                          http://www.bch.umontreal.ca/gobase/
Ensembl                         http://www.ensembl.org/
Entrez Gene                     http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene

Table 1. Gene and Genome Information on the Internet.

The genome sequences have to be further analyzed, in the first place to identify the locations of genes. For example, human genes comprise less than 5 percent of the genome. Further, eukaryotic genes are so-called mosaic genes composed of exons and intervening introns. Some genes can be fragmented into hundreds of exons and introns, which makes the identification of coding genes far from trivial, especially because the raw sequencing data can contain errors.

In the analysis and annotation of genomes, functional reconstruction is used for the conceptual assembly of metabolic pathways, transport units and signal transduction pathways. Special tools are available for the characterization of gene structures and functions. These methods rely on extensive database analysis and comparison to other genes and genomes. Naturally, the functions have to be verified by experimentalists. However, functional annotation of individual genes is still largely incomplete.

Biographical Sketch

Mauno Vihinen received his PhD in Biochemistry from the University of Turku, Finland (1990), followed by appointments at Turku Centre for Biotechnology (1990-1993) and Karolinska Institute, Stockholm (1993-1995). From 1995 to 1997, Vihinen was at the University of Helsinki, Finland (Associate Professor). Since 1998, he has been Professor of Bioinformatics at the Institute of Medical Technology, University of Tampere, Finland. The scientific interests of his research group are the study of protein structure-function relationships, especially in relation to human disease, and the analysis and development of software tools for both gene and protein expression studies, microarrays and proteomics, respectively. The group has applied bioinformatics methods especially in the research of primary immunodeficiencies and certain cancers. In addition to applied bioinformatics, he has developed programs in diverse areas of bioinformatics, and has contributed to the development and maintenance of numerous databases (mutation databases and knowledge bases for diseases). Gene and expression data analysis, and method development for data analysis in these fields, are among the major interests of his research group, as is systems biology, especially the analysis of immunological processes and immunodeficiencies.