BIOINFORMATICS: A NEW FIELD IN ENGINEERING EDUCATION



Similar documents
Russ College of Engineering and Technology. Revised 9/ Undergraduate GPA of 3.0/4.0 or equivalent.

BIO 3350: ELEMENTS OF BIOINFORMATICS PARTIALLY ONLINE SYLLABUS

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Ph.D. in Bioinformatics and Computational Biology Degree Requirements

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

A Primer of Genome Science THIRD

Biochemistry. Entrance Requirements. Requirements for Honours Programs. 148 Bishop s University 2015/2016

Accelerated Undergraduate/Graduate (BS/MS) Dual Degree Program in Computer Science

Electrical and Computer Engineering Undergraduate Advising Manual

Department of Mathematical Sciences

Syllabus of B.Sc. (Bioinformatics) Subject- Bioinformatics (as one subject) B.Sc. I Year Semester I Paper I: Basic of Bioinformatics 85 marks

For additional information on the program, see the current university catalog.

Core Bioinformatics. Degree Type Year Semester Bioinformàtica/Bioinformatics OB 0 1

CALIFORNIA STATE UNIVERSITY CHANNEL ISLANDS

The College of Science Graduate Programs integrate the highest level of scholarship across disciplinary boundaries with significant state-of-the-art

Proposal for New Program: Minor in Data Science: Computational Analytics

AC : INTEGRATION OF BIOINFORMATICS IN SCIENCE CURRICULUM AT FORT VALLEY STATE UNIVERSITY

COMPUTER SCIENCE AND ENGINEERING

BERKELEY CITY COLLEGE COLLEGE OF ALAMEDA LANEY COLLEGE MERRITT COLLEGE

Reasons for need for Computer Engineering program From Computer Engineering Program proposal

The University of Connecticut. School of Engineering COMPUTER SCIENCE GUIDE TO COURSE SELECTION AY Revised July 27, 2015.

DEPARTMENT OF PHYSICAL SCIENCES

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

Study Program Handbook Biochemistry and Cell Biology

CHEMISTRY. Faculty. Programs Offered. Bachelor of Science in Chemistry (certified by the American Chemical Society) Careers in Chemistry

REGULATIONS FOR THE DEGREE OF BACHELOR OF SCIENCE IN BIOINFORMATICS (BSc[BioInf])

An Interdepartmental Ph.D. Program in Computational Biology and Bioinformatics:

BIOLOGICAL SCIENCES REQUIREMENTS [63 75 UNITS]

Teaching Bioinformatics to Undergraduates

Course Curriculum for Master Degree in Medical Laboratory Sciences/Clinical Biochemistry

BIOMEDICAL ENGINEERING UNDERGRADUATE CURRICULUM

Genome Science Education for Engineering Majors

University of Glasgow - Programme Structure Summary C1G MSc Bioinformatics, Polyomics and Systems Biology

UPBM CURRICULAR BROCHURE

CALIFORNIA STATE UNIVERSITY, EAST BAY. Committee on Academic Planning & Review (CAPR)

1. Rationale. The Society for Industrial and Applied Mathematics (SIAM) has just published its. A. Background

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

M.S. DEGREE REQUIREMENTS

UF EDGE brings the classroom to you with online, worldwide course delivery!

SGI. High Throughput Computing (HTC) Wrapper Program for Bioinformatics on SGI ICE and SGI UV Systems. January, Abstract. Haruna Cofer*, PhD

Contents. Page 1 of 21

Electrical and Computer Engineering

Chemistry

Biochemistry Major Talk Welcome!!!!!!!!!!!!!!

Applied Mathematics and Mathematical Modeling

Department of Mathematics

October 2, 2003 Don Genson, director Sloan Master s Programs, Eberly College of Science, Penn State dwg9@psu.edu

The Johns Hopkins University. Department of Biomedical Engineering. Overview of Undergraduate Curriculum

Bio-Informatics Lectures. A Short Introduction

COURSE TITLE COURSE DESCRIPTION

Program Approval Form

Course Curriculum for Master Degree in Medical Laboratory Sciences/Hematology and Blood Banking

Molecular Biology And Biotechnology

Biology meets Engineering

Jack Baskin School of Engineering The University of California, Santa Cruz. Steve Kang, Dean and Prof. of Electrical Engineering October 15, 2003

CHEMISTRY. Real. Amazing. Program Goals and Learning Outcomes. Preparation for Graduate School. Requirements for the Chemistry Major (71-72 credits)

Sample schedule for ChE/BME Dual Majors in the BMTE Track

Faculty of Applied Sciences. Bachelor s degree programme. Nanobiology. Integrating Physics with Biomedicine

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

Accelerated Bachelor of Science/Master of Science in Computer Science. Dual Degree Program

University of Wisconsin-Madison Department of Chemical and Biological Engineering. Curriculum Guide for Chemical Engineering Undergraduates

DEPARTMENT of CHEMISTRY AND BIOCHEMISTRY

DEPARTMENT OF PHYSICS KISSENA BOULEVARD FLUSHING, NEW YORK TEL.: (718) /FAX: (718)

Department of Chemistry and Biochemistry

A Multiple DNA Sequence Translation Tool Incorporating Web Robot and Intelligent Recommendation Techniques

The University of Connecticut. School of Engineering COMPUTER SCIENCE GUIDE TO COURSE SELECTION AY Revised May 23, 2013.

Learning outcomes. Knowledge and understanding. Competence and skills

Biochemistry, B.S. Bachelor of Science. Summary of Requirements for the Degree. Biochemistry, B.S. 1

Introduction to bioinformatics

Contents. Page 1 of 11

UC SANTA CRUZ FALL 2016 TRANSFER ADMISSION GUARANTEE (TAG) PROGRAM ELIGIBILITY REVIEW FORM

Biology Majors Information Session. Biology Advising Center NHB 2.606

Hidden Markov models in biological sequence analysis

Department of Computer Science and Engineering

university of copenhagen Bioinformatics Master Program

GRADUATE CATALOG LISTING

The College of Liberal Arts and Sciences

Chemistry and Biochemistry

CHEMISTRY, BACHELOR OF SCIENCE (B.S.) WITH A CONCENTRATION IN BIOCHEMISTRY

COMPUTER & INFORMATION TECHNOLOGY DEPARTMENT

Course Curriculum for Master Degree in Medical Laboratory Sciences/Clinical Microbiology, Immunology and Serology

The Applied and Computational Mathematics (ACM) Program at The Johns Hopkins University (JHU) is

Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives

CHEMICAL SCIENCES REQUIREMENTS [61-71 UNITS]

Department of Electrical and Computer Engineering

Department of Integrative Physiology Bachelor of Arts Mission of the Department. Prospective Majors:

Proposal for New Program: BS in Data Science: Computational Analytics

Implementation Regulation for the MSc Programme Nanobiology

Bioinformatics Grid - Enabled Tools For Biologists.

Chemistry/Biochemistry

Physics. William D. Thacker, Ph.D., Chair

Bachelor of Science Advanced Research (Honours)

BS in Pharmaceutical Science (BSPS) degree program University of Rhode Island

CHEMISTRY, BACHELOR OF SCIENCE (B.S.) WITH A CONCENTRATION IN CHEMICAL SCIENCE

Student Handbook. Masters of Science degree Program. Biochemistry & Biotechnology. Last updated Spring 2013

Please consult the Department of Engineering about the Computer Engineering Emphasis.

DEPARTMENT OF MATHEMATICS & COMPUTING

Mellon College of Science First Year Spring Scheduling Booklet

Health Informatics Student Handbook

Department of Biomedical Engineering Curriculum Summary for MS Degree

Transcription:

BIOINFORMATICS: A NEW FIELD IN ENGINEERING EDUCATION Richard Hughey and Kevin Karplus Abstract Bioinformatics is a new engineering field poorly served by traditional engineering curricula. Bioinformatics can be defined in several ways, but the emphasis is always on the use of computer and statistical methods to understand biological data, such as the voluminous data produced by high-throughput biological experimentation including gene sequencing and gene chips. As the demand has outpaced the supply of bioinformaticians, the UCSC School of Engineering is establishing undergraduate and graduate degrees in bioinformatics. Although many schools have or are proposing graduate programs in bioinformatics, few are creating undergraduate programs. In this paper, we explore the blend of mathematics, engineering, science, and bioinformatics topics and courses needed for an undergraduate degree in this new field. Index Terms Bioinformatics, computational biology, undergraduate degree program Introduction Bioinformatics is a new engineering field poorly served by traditional engineering curricula. Bioinformatics can be defined in several ways, but the emphasis is always on the use of computer and statistical methods to understand biological data, such as the voluminous data produced by high-throughput biological experimentation including gene sequencing and gene chips. The field s rapid growth is spurred by the vast potential for new understanding that can lead to new treatments, new drugs, new crops, and the general expansion of knowledge. The public and private efforts to determine and annotate the 3.1 billion nucleotides of the human genome is only the most visible creation of computerized data. Bioinformatics is broadly defined as the application of computer techniques to biological data. This field has arisen in parallel with the development of automated highthroughput methods of biological and biochemical discovery that yield a variety of forms of experimental data, such as DNA sequences, gene expression patterns, and chemical structures. Bioinformatics encompasses everything from data storage and retrieval to the identification and presentation of features within data, such as finding genes within DNA sequence, finding similarities between sequences, structural predictions and correlation between sequence variation and Department of Computer Engineering, Jack Baskin School of Engineering, University of California, Santa Cruz, CA 95064, rph,karplus @soe.ucsc.edu clinical data. As the demand has outpaced the supply of bioinformaticians, the UCSC School of Engineering is establishing undergraduate and graduate degrees in bioinformatics. Although many schools have or are proposing graduate programs in bioinformatics, few are creating undergraduate programs. In forming our graduate program, we realized a fundamental problem: where would we find qualified graduate students? This quickly lead to the development of an undergraduate curriculum that we believe will serve students heading both to graduate school and to industry. We are, of course, joining a wave of activity in the development of academic programs in bioinformatics [1], [2]. Although most programs are at the graduate level, there are a few undergraduate programs as well. More details can be found at the web site of the International Society for Computational Biology (ISCB), the area s 4-year-old professional society (http://www.iscb.org) [3]. In the following sections, we discuss the history of bioinformatics at UCSC and the new undergraduate bioinformatics curriculum. Included are both background courses and bioinformatics course, including descriptions of two new courses, Introduction to Bioinformatics and Bioethics. Bioinformatics at UCSC UC Santa Cruz is well known in the field of bioinformatics for pioneering work on applications of hidden Markov models (HMMs) and stochastic context-free grammars to biological sequence data and for the development of software tools for sequence analysis [4], [5], [6], [7]. Our efforts include participation in the international Human Genome Project [8], development of the gene-finding software chosen to annotate the human and Drosophila (fly) genomes [9], success in international protein structure prediction contests [10], [11], and creation of a SIMD programmable accelerator for sequenceanalysis [12]. Since the introduction of our first bioinformatics course in 1996, we have trained many students with majors in Computer Science, Computer Engineering, Mathematics, Molecular, Cell and Developmental (MCD) Biology, Chemistry, and Biochemistry. These students are in very high demand in industry and academia. Based on our experience, we are now attempting to create distinctive BS, MS, and PhD programs of study in bioinformatics. These curricula emphasize the foundations of basic molecular biology and biochemistry coupled with substantial training in mathematics and computer engineering and science. F2B-15

Foundation Material Bioinformatics, as with the classical engineering disciplines, is the application of mathematics (e.g., probability and statistics), science (e.g., biochemistry), and a core set of problem solving methods (e.g., computer programming and database design) to find solutions to critical problems. Bioinformatics systems include multi-layered software, hardware, and experimental solutions that bring together a variety of tools and methods to analyze immense quantities of data. In describing the curriculum, it is important to know that UCSC is on a quarter system with 5-credit courses (3.5 hours of lecture per week for 10 weeks). A total of 180 credits, including major and general education requirements, are needed to graduate. The B.S. in Bioinformatics major requirements include 129 credits in 23 courses, slightly less than our computer engineering curriculum. Core Math The mathematics requirement is similar to that of most majors within the UCSC Baskin School of Engineering (presently Computer Engineering, Computer Science, Electrical Engineering, and Information Systems Management). Students complete 4 quarters of calculus, including a 2-quarter multivariable sequence, a combined linear algebra and differential equations course, discrete mathematics, and statistics. Students receive additional statistics training in the bioinformatics courses and, possibly, in electives. Core Science The core science requirement is expected to be the most difficult hurdle in the degree program for computer-oriented students. The tools that the bioinformatician use are primarily drawn from mathematics and engineering. However, a thorough understanding of the underlying biochemistry is critical to employing the tools effectively and interacting with laboratory scientists. Therefore, the science requirement includes two quarters of college chemistry, one year (or an accelerated two quarters) of organic chemistry, one quarter of cell biology, and, most important, the first of the 3-course sequence in biochemistry. This first biochemistry course studies the basics of DNA, RNA, and proteins. Unlike the chemistry courses leading up to it, there is no associated laboratory. Students may complete additional courses in the biochemistry sequence (discussing enzymes, regulation, metabolism, biochemistry laboratory, and other topics) as electives in the bachelor s degree program. Core Engineering The engineering requirements focus on developing programming and algorithm skills. They include two quarters of introductory programming, a programming and proof course in abstract data types, database systems, technical writing, and an advanced programming elective. Because of the broad Session F2B scope and extensive requirements of this major, we did not include the computer organization course required of other engineering majors. The database systems class is presently offered by the computer science department for computer science and information systems management majors. It is likely that soon after the Bioinformatics program has started, we will construct a bioinformatics-oriented database class. This potential new class will focus on the structure, contents, and use of biological sequence databases. The technical writing course, long a staple in our computer engineering major, serves the dual purpose of formal discipline-specific writing instruction and fulfillment of the campus writing requirements. The goal of the advanced programming elective is to provide additional programming experience. Any programmingintensive course will be accepted; the recommended electives include object-oriented programming, compilers, and applied graph theory. The compiler class may be the most appropriate because of its dual emphasis on programming and formal languages, a core area in bioinformatics. Bioinformatics Study The program includes three bioinformatics courses Introduction to Bioinformatics, Bioethics, and the graduate Bioinformatics I course and two advanced electives. The exceptionally strong math, science, and engineering training of bioinformatics majors leaves little doubt that they will be able to excel in the graduate bioinformatics course. Initially, in fact, we expect the undergraduates to frequently outperform the graduate students, who may not have as broad understanding of the three foundation areas. Introduction to Bioinformatics The Introduction to Bioinformatics course is the only course being introduced specifically for this major; the other two bioinformatics courses were developed for our graduate program. Introduction to Bioinformatics (Table I) has prerequisites of probability and statistics, the second quarter of introductory programming, and the first quarter of organic chemistry. While one could make a strong case for requiring the abstract data types and the biochemistry courses, including them in the prerequisite chain would prevent most undergraduates from taking this course until their senior year (or later). Instead, we are hoping that many students will take this class in their junior year, making them ideal candidates for summer research assistantships. We are also including a laboratory section with this course. One of the fundamental tasks of the bioinformatician is connecting of the different tools into a data analysis pipeline. Thus, the scripting language Perl is perhaps the most used programming language in bioinformatics. Approximately half of F2B-16

TABLE I CORE AND OPTIONAL MATERIAL FOR INTRODUCTION TO BIOINFORMATICS, IN 70-MINUTE LECTURES Core topics 1 The fundamental dogma of biology (DNA RNA protein), DNA bases, RNA bases, Amino acids. 2 DNA sequence databases on the web: genomic, unfinished genomic, EST, clustered EST. Specific databases include Genbank, NCBI s NR, dbest, Unigene, STACK, EGAD, and others. Error rates in different DNA databases due to sequencing error. 2 Protein sequence and structure databases on the web: Swiss-Prot, TrEMBL, NCBI s NR, PDB, FSSP, SCOP, CATH. Visualization tools: RASMOL or CHIME. 3 Pairwise alignment: substitution matrices and dynamic programming, Needleman-Wunsch and Smith-Waterman algorithms. 3 Probabilistic models: Bayesian statistics, definition of E-value, null models. 2 Fast search: BLAST algorithm and interpreting BLAST results. BLAST options. Visualization tool: NCBI s pictorial BLAST results. 4 Multiple sequence alignment: clustal, HMMs, and regularizers. Information and relative entropy. Visualization tools: belvu (or equivalent) and sequence logos. Core topics with potential for expansion 1 Protein structure prediction: HMMs, threading, secondary structure prediction, ab initio prediction. 1 DNA chips: how they work, design of oligos, extracting data from images. 1 mrna expression analysis: clustering of profiles. 1 Genefinding: HMM structure, signals and content sensors, using EST and homology data, using cross-genome similarity. Example optional topics 4 RNA structure prediction and alignment: energy minimization (Zucker), correlated mutation, stochastic context-free grammars. 6 Phylogeny: parsimony, max likelihood, evolutionary models. 2 DNA assembly. 2 X-ray crystallography how models are created. 3 NMR spectroscopy how models are created. 2 2D gel proteomics. 2 Mass spec proteomics. 3 Finding binding sites in DNA regulatory regions. References for these topics may be found on the web or in bioinformatics texts the laboratory time will be spent using Perl, and the other half using WWW and command-line tools. There are presently no books that match our course design. Many appropriate sections are in Durbin, Eddy, Krogh, and Mitchison s Biological Sequence Analysis [13], used in the graduate Bioinformatics I class. This book is quite advanced mathematically and includes too much detail for an advanced undergraduate class. We expect to develop most of the course material ourselves, possibly leading to a new undergraduate text on bioinformatics. Interested readers can find information on many bioinformatics texts at the ISCB web site mentioned above. Bioethics This is a joint undergraduate and graduate course, with additional work at the graduate level. The course has weekly topics with visiting distinguished lecturers from our campus, other UC campuses, other universities, national labs, and local (Santa Cruz and Santa Clara county) industry. Topics include general scientific ethics, biological research ethics, the ethical implications of the Human Genome Project, and governmental and research policy issues. Course work is primarily reading and writing. F2B-17

week 1 week 2 week 3 week 4 week 5 week 6 week 7 week 8 week 9 week 10 TABLE II WEEKLY TOPICS IN THE GRADUATE COURSE BIOINFORMATICS I Introduction to molecular biology and biosequence databases. Probability/statistical significance, Bayesian statistical methods. Pairwise alignment, dynamic programming, multiple alignment. Pairwise alignment, dynamic programming, multiple alignment. Profiles, hidden Markov models. Forward-backward algorithm and expectation maximization Dirichlet mixtures and sequence weighting Neural nets, nearest-neighbors, secondary-structure prediction. Profile-based protein structure prediction and protein threading Fragment packing Bioinformatics I Undergraduate Bioinformatics majors have the choice of completing our graduate Bioinformatics I course or a senior thesis. Bioinformatics I has a focus on core statistical methods, including the mathematics behind hidden Markov models and Dirichlet Mixture distributions, that form the foundation of the UCSC approach to bioinformatics. Fewer topics are covered in far more detail than Introduction to Bioinformatics (Table II). The primary evaluation criteria for Bioinformatics I is a quarter-long project in bioinformatics. Several undergraduates have already completed this course (without the benefit of Introduction to Bioinformatics), many also taking part in Research Experiences for Undergraduates, funded by the National Science Foundation. Additional information on this course can be found at links from the UCSC Computational Biology web page (http://www.soe.ucsc.edu/research/compbio/) Science, Math, and Engineering Electives The degree program includes two relatively free electives. These may be filled with any pair of courses that make sense, such as additional biochemistry courses, graduate bioinformatics courses, probability and statistics courses, biology and chemistry courses, and other electives in computer engineering and science. Graduate Bioinformatics Study With the assistance of the Alfred P. Sloan Foundation, we have also created a curriculum for the professional masters degree in bioinformatics (http://www.sciencemasters.org) [14]. There are six required graduate courses: Advanced Molecular Biology, Protein Structure, Stochastic Models, Bioinformatics I, Bioinformatics II, and Bioethics. Students must also take three graduate electives and three 2-unit seminar courses. The seminars typically alternate between invited guest speakers and student research presentations. We expect the graduate program to be approved for the 2002 academic year, one year after the undergraduate program, because of the more extensive review process. Once both programs are approved, we shall propose a combined BS/MS option in bioinformatics that will allow undergraduate bioinformatics students to apply Bioinformatics I and Bioethics toward both the graduate and undergraduate degrees. Conclusions The undergraduate bioinformatics major is a rigorous major combining courses from many departments (Table III; the second and third quarters of biochemistry are electives). We expect the program to be far more effective than double majors or major and minor combinations at integrating the fundamental knowledge and techniques required by the bioinformatician. As we gain experience with the program, we expect to refine several of the requirements. As mentioned above, a bioinformatics-centered database course is a particularly attractive possibility. We may also replace the fourth quarter of calculus with a course in Bayesian statistical methods. One of our colleagues in Molecular, Cellular, and Developmental Biology is developing a course in biotechnology for the bioinformatician: a course that would expose students to the full chain of laboratory work required to, for example, sequence DNA. We will continue working with our colleagues in all associated departments to introduce and revise courses for the discipline. The development of multi-disciplinary majors can be a difficult and time-consuming task. Coordination and discussion is required between many departments and faculty committees. However, as bioinformatics and other multi-disciplinary programs are developed, this path will become easier here and elsewhere. We hope that others will consider introducing bioinformatics degrees at all levels to better train our scientists for the new biology. Acknowledgments The authors thank David Haussler, Director of the UCSC Center for Biomolecular Science and Engineering, for starting bioinformatics at UCSC, Ann Pace, Assistant Direc- F2B-18

TABLE III PLAN OF STUDY WITH COURSES IN APPLIED MATH AND STATISTICS (AMS), BIOCHEMISTRY (BCH), BIOLOGY (BIO), BIOMOLECULAR ENGINEERING (BME), CHEMISTRY (CH), COMPUTER ENGINEERING (CE), COMPUTER SCIENCE (CS), AND GENERAL EDUCATION. Year Fall Winter Spring 1 General education General education General education CS Programming 1 CS Programming 2 CE Discrete Math M Calculus 1 M Calculus 2 M Calculus 3 2 CH Chemistry 1/L CH Chemistry 2/L BIO Cell Biology General education General education CS Abstract Data Types M Calculus 4 AMS ODE and Linear Algebra General education 3 CH Organic Chem 1/L CH Organic Chem 2/L General education CE Technical writing General education CE Stochastic Methods CS Databases CS Compilers General education 4 BCH Biochemistry 1 BCH Biochemistry 2 BCH Biochemistry 3 BME Intro to Bioinformatics BME Bioinformatics I General education General education CH Bioethics tor of the Center for contributing some of the introductory text, David Deamer, Professor of Chemistry, for developing the bioethics course, and Eric Rice for valuable comments. Our work in bioinformatics is supported in part by NSF grants EIA9905322 and DBI9808007, DOE grant DE-FG03-99ER62849, and a grant from the Alfred P. Sloan Foundation. References [1] Russ B. Altman, A curriculum for bioinformatics: The time is ripe, Bioinformatics, vol. 14, no. 7, pp. 549 550, 1998. [2] Andy Brass, Bioinformatics education a UK perspective, Bioinformatics, vol. 16, no. 2, pp. 77 78, 2000. [3] Chris Rawlings and Larry Hunter, Creating a professional society for bioinformatics the International Society for Computational Biology (ISCB), Bioinformatics, vol. 14, no. 6, pp. 471, 1998. [4] A. Krogh, M. Brown, I. S. Mian, K. Sjölander, and D. Haussler, Hidden Markov models in computational biology: Applications to protein modeling, JMB, vol. 235, pp. 1501 1531, Feb. 1994. [5] Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian, Kimmen Sjölander, Rebecca C. Underwood, and David Haussler, Stochastic context-free grammars for trna modeling, NAR, vol. 22, pp. 5112 5120, 1994. [6] Richard Hughey and Anders Krogh, Hidden Markov models for sequence analysis: Extension and analysis of the basic method, CABIOS, vol. 12, no. 2, pp. 95 107, 1996, Information on obtaining SAM is available at http://www.cse.ucsc.edu/research/ compbio/sam.html. [7] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, and C. Chothia, Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods, JMB, vol. 284, no. 4, pp. 1201 1210, 1998, Paper available at http://www.mrc-lmb.cam.ac.uk/genomes/jong/ assess paper/assess papernov.html. [8] The Human Genome, Nature, 15 Feb. 2001. [9] D. Kulp, D. Haussler, M.G. Reese, and F. Eeckman, A generalized hidden Markov model for the recognition of human genes in DNA, in ISMB-96, St. Louis, June 1996, pp. 134 142, AAAI Press, http://www.cse.ucsc.edu/ dkulp/cgi-bin/genie. [10] Kevin Karplus, Kimmen Sjölander, Christian Barrett, Melissa Cline, David Haussler, Richard Hughey, Liisa Holm, and Chris Sander, Predicting protein structure using hidden Markov models, Proteins: Structure, Function, and Genetics, vol. Suppl. 1, pp. 134 139, 1997. [11] Kevin Karplus, Christian Barrett, Melissa Cline, Mark Diekhans, Leslie Grate, and Richard Hughey, Predicting protein structure using only sequence information, Proteins: Structure, Function, and Genetics, vol. Supplement 3, no. 1, pp. 121 125, 1999. [12] J. D. Hirschberg, D. Dahle, K. Karplus, D. Speck, and R. Hughey, Kestrel: A programmable array for sequence analysis, Journal of VLSI Signal Processing, vol. 19, pp. 115 126, 1998. [13] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998. [14] A. J. S. Rayl, Designer degrees or academic alchemy?, The Scientist, vol. 14, no. 7, pp. 41, 3 Apr. 2000. F2B-19