Lecture 1 Introduction to Bioinformatics. Burr Settles IBS Summer Research Program 2008


Lecture 1: Introduction to Bioinformatics
Burr Settles
IBS Summer Research Program 2008
bsettles@cs.wisc.edu
www.cs.wisc.edu/~bsettles/ibs08/

About Me
instructor: Burr Settles
7th-year graduate student in Computer Sciences
thesis topic: Active Learning
advisor: Dr. Mark Craven
office: 6775 Medical Sciences Center (upstairs)
email: bsettles@cs.wisc.edu
course webpage: http://www.cs.wisc.edu/~bsettles/ibs08/

What About You?
school, major, year?
what is your background in biology and/or computer science & statistics?
what attracted you to the IBS program and the CBB track in particular?
what do you think bioinformatics is?

Bioinformatics
general definition: computational techniques for solving biological problems
data problems: representation (graphics), storage and retrieval (databases), analysis (statistics, artificial intelligence, optimization, etc.)
biology problems: sequence analysis, structure or function prediction, data mining, etc.
also called computational biology

Course Overview
basic molecular biology
sequence alignment
probabilistic sequence models
gene expression analysis
protein structure prediction (by Ameet Soni)

The Central Dogma

DNA
can be thought of as the recipe for an organism
composed of small molecules called nucleotides
  four different nucleotides distinguished by the four bases: adenine (A), cytosine (C), guanine (G) and thymine (T)
is a polymer: a large molecule consisting of similar units (nucleotides in this case)
a single strand of DNA can be thought of as a string composed of the four letters: A, C, G, T
  ctgctggaccgggtgctaggaccctgactgcccggg gccgggggtgcggggcccgctgag
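The "DNA as a string" view maps directly onto ordinary string handling. The following is a minimal sketch, not part of the original slides; the function names `is_dna` and `base_counts` are made up for illustration.

```python
from collections import Counter

# The four-letter DNA alphabet named on the slide.
DNA_ALPHABET = set("ACGT")


def is_dna(seq: str) -> bool:
    """Return True if seq is non-empty and uses only A, C, G, T (any case)."""
    return bool(seq) and set(seq.upper()) <= DNA_ALPHABET


def base_counts(seq: str) -> dict:
    """Count how often each of the four bases occurs in seq."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


# First fragment of the example strand shown on the slide.
example = "ctgctggaccgggtgctaggaccctgactgcccggg"
print(is_dna(example))
print(base_counts(example))
```

Treating a strand as a plain string is what later makes alignment and probabilistic sequence models possible: both operate on letters, not on chemistry.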

The Double Helix
DNA molecules usually consist of two strands arranged in the famous double helix

Watson-Crick Base Pairs
in double-stranded DNA
  A always bonds to T
  C always bonds to G

The Double Helix
each strand of DNA has a direction
  at one end, the terminal carbon atom in the backbone is the 5′ carbon atom of the terminal sugar
  at the other end, the terminal carbon atom is the 3′ carbon atom of the terminal sugar
therefore we can talk about the 5′ and the 3′ ends of a DNA strand
in a double helix, the strands are antiparallel (arrows drawn from the 5′ end to the 3′ end go in opposite directions)
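Base pairing and antiparallel orientation together mean that one strand determines the other: the partner strand, read 5′ to 3′, is the reverse complement. A minimal sketch, assuming both strands are written 5′ to 3′ (the name `reverse_complement` is chosen here for illustration):

```python
# Watson-Crick pairing: A<->T and C<->G, for upper- and lower-case letters.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the partner strand of `strand`, also written 5' to 3'."""
    # Complement every base, then reverse because the strands are antiparallel.
    return strand.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on any implementation.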

image from the DOE Human Genome Program http://www.ornl.gov/hgmis

Chromosomes
DNA is packaged into individual chromosomes
prokaryotes (single-celled organisms lacking nuclei) typically have a single circular chromosome
  examples: bacteria, archaea
eukaryotes (organisms with nuclei) have a species-specific number of linear chromosomes
  examples: animals, plants, fungi

Human Chromosomes

Genomes
the term genome refers to the complete complement of DNA for a given species
the human genome consists of 23 pairs of chromosomes
  mosquitos have 3 pairs
  camels have 35 pairs!
every cell (except sex cells and mature red blood cells) contains the complete genome of an organism

Genes
genes are the basic units of heredity
a gene is a sequence of bases that carries the information required for constructing a particular protein (more accurately, polypeptide)
such a gene is said to encode a protein
the human genome comprises ~25,000 protein-coding genes

Gene Density
not all of the DNA in a genome encodes protein:
bacteria    ~90% coding     gene/kb
human       ~1.5% coding    gene/35kb

The Central Dogma

RNA
RNA is like DNA except:
  backbone is a little different
  often single stranded
  the base uracil (U) is used in place of thymine (T)
a strand of RNA can be thought of as a string composed of the four letters: A, C, G, U

Transcription

Transcription
RNA polymerase is the enzyme that builds an RNA strand from a gene
RNA that is transcribed from a gene is called messenger RNA (mRNA)
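As strings, transcription amounts to swapping alphabets. The sketch below is an illustration rather than something from the slides, and it rests on one assumption: every input strand is written 5′ to 3′. Given the coding (sense) strand, the mRNA is the same sequence with U in place of T; given the template strand, the mRNA is its reverse complement in the RNA alphabet. Both function names are hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """mRNA for a coding (sense) DNA strand written 5' to 3'."""
    return coding_strand.upper().replace("T", "U")


# DNA base on the template strand -> RNA base that pairs with it.
TEMPLATE_TO_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def transcribe_template(template_strand: str) -> str:
    """mRNA for a template (antisense) DNA strand written 5' to 3'."""
    # Read the template backwards because the new RNA strand is antiparallel.
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_strand.upper()))


print(transcribe("ATGGCT"))           # AUGGCU
print(transcribe_template("AGCCAT"))  # AUGGCU
```

The two calls give the same mRNA because AGCCAT is the reverse complement of ATGGCT.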

Transcription Movie

The Central Dogma

Proteins
proteins are molecules composed of one or more polypeptides
a polypeptide is a polymer composed of amino acids
cells build their proteins from 20 different amino acids
a polypeptide can be thought of as a string composed from a 20-character alphabet

Protein Functions
structural support
storage of amino acids
transport of other substances
coordination of an organism's activities
response of cell to chemical stimuli
movement
protection against disease
selective acceleration of chemical reactions

Amino Acids
Alanine          Ala    A
Arginine         Arg    R
Aspartic Acid    Asp    D
Asparagine       Asn    N
Cysteine         Cys    C
Glutamic Acid    Glu    E
Glutamine        Gln    Q
Glycine          Gly    G
Histidine        His    H
Isoleucine       Ile    I
Leucine          Leu    L
Lysine           Lys    K
Methionine       Met    M
Phenylalanine    Phe    F
Proline          Pro    P
Serine           Ser    S
Threonine        Thr    T
Tryptophan       Trp    W
Tyrosine         Tyr    Y
Valine           Val    V
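The three-letter and one-letter codes in this table are what turn a polypeptide into a string over a 20-character alphabet. A small sketch of that conversion follows; the dictionary is copied from the table, while the helper name `to_one_letter` is invented for this example.

```python
# Three-letter code -> one-letter code, as listed in the table.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asp": "D", "Asn": "N", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues) -> str:
    """Convert a sequence of three-letter residue names to a one-letter string."""
    # capitalize() normalizes inputs such as "MET" or "met" to "Met".
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)


print(to_one_letter(["Met", "Ala", "Trp"]))  # MAW
```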

Amino Acid Sequence: Hexokinase
    5 10 15 20 25 30
1   A A S X D X S L V E V H X X V F I V P P X I L Q A V V S I A
31  T T R X D D X D S A A A S I P M V P G W V L K Q V X G S Q A
61  G S F L A I V M G G G D L E V I L I X L A G Y Q E S S I X A
91  S R S L A A S M X T T A I P S D L W G N X A X S N A A F S S
121 X E F S S X A G S V P L G F T F X E A G A K E X V I K G Q I
151 T X Q A X A F S L A X L X K L I S A M X N A X F P A G D X X
181 X X V A D I X D S H G I L X X V N Y T D A X I K M G I I F G
211 S G V N A A Y W C D S T X I A D A A D A G X X G G A G X M X
241 V C C X Q D S F R K A F P S L P Q I X Y X X T L N X X S P X
271 A X K T F E K N S X A K N X G Q S L R D V L M X Y K X X G Q
301 X H X X X A X D F X A A N V E N S S Y P A K I Q K L P H F D
331 L R X X X D L F X G D Q G I A X K T X M K X V V R R X L F L
361 I A A Y A F R L V V C X I X A I C Q K K G Y S S G H I A A X
391 G S X R D Y S G F S X N S A T X N X N I Y G W P Q S A X X S
421 K P I X I T P A I D G E G A A X X V I X S I A S S Q X X X A
451 X X S A X X A
enzyme involved in glycolysis
in every organism known from bacteria to humans

Space-Filling Model of Hexokinase

Hemoglobin
protein built from 4 polypeptides
responsible for carrying oxygen in red blood cells

Translation
ribosomes are the machines that synthesize proteins from mRNA
the grouping of codons is called the reading frame
translation begins with the start codon
translation ends with the stop codon

Codons and Reading Frames

The Genetic Code
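The pieces named on the translation slides (codons, reading frame, start codon, stop codon, genetic code) fit together in a few lines of code. This is a simplified sketch, not the course's own implementation. It assumes the standard genetic code, takes the first AUG as the start codon, and requires the input to contain only A, C, G and U. The compact table string lists the amino acid for each codon in UCAG order, with `*` marking stop codons.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in the order U, C, A, G
# (third position varying fastest); '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string into a one-letter amino acid string."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # the start codon fixes the reading frame
    if start == -1:
        return ""
    protein = []
    # Step through the reading frame three bases (one codon) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUUGGUAAGG"))  # MAW
```

In the example, the leading GG is skipped, AUG (M) sets the frame, GCU (A) and UGG (W) follow, and UAA stops translation. Shifting the start by one base would group the same letters into entirely different codons, which is why the reading frame matters.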

Translation Movie

RNA Processing in Eukaryotes
eukaryotes (animals, plants, fungi, etc.) are organisms that have enclosed nuclei in their cells
in many eukaryotes, genes/mRNAs consist of alternating exon/intron segments
exons are the coding parts
introns are spliced out before translation
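On the string level, splicing keeps the exon segments and drops everything between them. The sketch below assumes the exon boundaries are already known and given as half-open `(start, end)` index pairs. The function name, the toy sequences and the coordinates are all invented for illustration; locating exon boundaries in real genes is a hard problem in its own right.

```python
def splice(pre_mrna: str, exons) -> str:
    """Join the exon segments of pre_mrna, discarding the introns between them.

    `exons` is an iterable of (start, end) index pairs, end exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Toy transcript: two exons separated by one intron (made-up sequences).
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUAG", "UGGUAA"
pre_mrna = exon1 + intron + exon2
exon_coords = [(0, 6), (16, 22)]

print(splice(pre_mrna, exon_coords))  # AUGGCCUGGUAA
```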

RNA Splicing

Protein Synthesis in Eukaryotes vs. Prokaryotes

image from the DOE Human Genome Program http://www.ornl.gov/hgmis

RNA Genes
not all genes encode proteins
for some genes the end product is RNA
  ribosomal RNA (rRNA), which includes major constituents of ribosomes
  transfer RNAs (tRNAs), which carry amino acids to ribosomes
  micro RNAs (miRNAs), which play an important regulatory role in various plants and animals
  etc.

The Dynamics of Cells
all cells in an organism have the same genomic data, but the genes expressed in each vary according to cell type, time, and environmental factors
there are networks of interactions among various biochemical entities in a cell (DNA, RNA, protein, small molecules) that carry out processes such as
  metabolism
  intra-cellular and inter-cellular signaling
  regulation of gene expression

Overview of the E. coli Metabolic Pathway Map
image from the KEGG database

The Metabolic Pathway for Synthesizing the Amino Acid Alanine
figure labels: reactions, metabolites, enzymes (proteins that catalyze reactions), genes encoding the enzymes
image from the EcoCyc database www.biocyc.org

Gene Regulation Example: the lac Operon
this protein regulates the transcription of LacZ, LacY, LacA
these proteins metabolize lactose

Gene Regulation Example: the lac Operon
lactose is absent
the protein encoded by lacI represses transcription of the lac operon

Gene Regulation Example: the lac Operon
lactose is present
it binds to the protein encoded by lacI, changing its shape; in this state, the protein doesn't bind upstream from the lac operon; therefore the lac operon can be transcribed

Gene Regulation Example: the lac Operon
this example provides a simple illustration of how a cell can regulate (turn on/off) certain genes in response to the state of its environment
an operon is a sequence of genes transcribed as a unit
the lac operon is involved in metabolizing lactose
  it is turned on when lactose is present in the cell
the lac operon is regulated at the transcription level
the depiction here is incomplete; for example, the level of glucose (not just lactose) in the cell influences transcription of the lac operon as well
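The on/off logic of these lac operon slides can be written as a two-step Boolean rule. This is a deliberately crude sketch of the simplified picture only: it ignores glucose, which the slide itself flags as missing, and treats binding as all-or-nothing. The function name is hypothetical.

```python
def lac_operon_transcribed(lactose_present: bool) -> bool:
    """Simplified switch: is the lac operon transcribed?"""
    # Lactose binds the lacI-encoded repressor and changes its shape,
    # so the repressor sits on the DNA only when lactose is absent.
    repressor_bound_to_dna = not lactose_present
    # A bound repressor blocks transcription of the operon.
    return not repressor_bound_to_dna


for lactose in (False, True):
    print(f"lactose present: {lactose} -> transcribed: {lac_operon_transcribed(lactose)}")
```

Writing the intermediate variable out makes the double negation visible: lactose does not activate the operon directly, it inactivates the repressor.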

Completed Genomes
Type                                              Approx # Completed
Archaea                                           46
Bacteria                                          524
Eukaryota                                         65
metagenomes                                       108
Organelles, Phages, Plasmids, Viroids, Viruses    too many to keep track
* Genomes OnLine Database (9/07)

Completed Base Pairs

Some Greatest Hits
Genome                                Where                       Year
H. influenzae                         TIGR                        1995
E. coli K-12                          Wisconsin                   1997
S. cerevisiae (yeast)                 internat. collab.           1997
C. elegans (worm)                     Washington U./Sanger        1998
Drosophila melanogaster (fruit fly)   multiple groups             2000
E. coli O157:H7 (pathogen)            Wisconsin                   2000
H. sapiens (that's us)                internat. collab./Celera    2001
Mus musculus (mouse)                  internat. collaboration     2002
Rattus norvegicus (rat)               internat. collaboration     2004

Some Genome Sizes
HIV: 9.8k
E. coli: 4.6m
S. cerevisiae: 12m
D. melanogaster: 137m
H. sapiens, R. norvegicus: 3.1b
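To compare these sizes numerically, the k/m/b shorthand can be expanded into base counts. A minimal sketch follows; the sizes are the ones on the slide, while the helper name `parse_size` and the suffix table are assumptions made for this example.

```python
# Multipliers for the shorthand used on the slide (thousand, million, billion).
SUFFIX = {"k": 10**3, "m": 10**6, "b": 10**9}


def parse_size(text: str) -> int:
    """Convert a size such as '4.6m' into a number of bases."""
    number, suffix = text[:-1], text[-1].lower()
    # round() removes floating-point noise such as 9800.000000000002.
    return round(float(number) * SUFFIX[suffix])


GENOME_SIZES = {
    "HIV": "9.8k",
    "E. coli": "4.6m",
    "S. cerevisiae": "12m",
    "D. melanogaster": "137m",
    "H. sapiens": "3.1b",
}

human = parse_size(GENOME_SIZES["H. sapiens"])
for organism, size in GENOME_SIZES.items():
    bases = parse_size(size)
    print(f"{organism}: {bases:,} bases ({human / bases:,.0f}x smaller than or equal to human)")
```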

More Than Just Genomes
> 300 other publicly available databases pertaining to molecular biology (see pointer to Nucleic Acids Research directory on course home page)
GenBank
  > 61 million sequence entries
  > 65 billion bases
UniProtKB / Swiss-Prot
  > 277 thousand protein sequence entries
  > 100 million amino acids
Protein Data Bank
  45,632 protein (and related) structures
* all numbers current about 9/07

Even More Data: High-Throughput Experiments
this figure depicts one yeast gene-expression data set
each row represents a gene
each column represents a measurement of gene expression (mRNA levels) at some time point
red indicates that a gene is being expressed more than usual; green means less
Figure from Spellman et al., Molecular Biology of the Cell, 9:3273-3297, 1998
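The red/green convention in such a figure can be reproduced with a log ratio against a reference level. The sketch below rests on several assumptions that are not in the slide: expression is compared to a single positive baseline value, a log2 ratio is used, and unchanged genes are shown as black. The matrix values are made up.

```python
import math


def expression_color(level: float, baseline: float) -> str:
    """Map an expression level to a heat-map color relative to a baseline."""
    log_ratio = math.log2(level / baseline)
    if log_ratio > 0:
        return "red"    # expressed more than usual
    if log_ratio < 0:
        return "green"  # expressed less than usual
    return "black"      # assumption: unchanged genes are drawn black


# Rows are genes, columns are time points (made-up numbers, baseline = 1.0).
expression = [
    [1.0, 2.0, 4.0],
    [1.0, 0.5, 0.25],
]
for gene_row in expression:
    print([expression_color(value, 1.0) for value in gene_row])
```

Real analyses usually keep the numeric log ratio and let the plotting library choose a continuous color scale; the three-color version here only shows the direction of change.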

Significance of the Genomics Revolution
data driven biology
  functional genomics
  comparative genomics
  systems biology
molecular medicine
  identification of genetic components of various maladies
  diagnosis/prognosis from sequence/expression
  gene therapy
pharmacogenomics
  developing highly targeted drugs
toxicogenomics
  elucidating which genes are affected by various chemicals

Bioinformatics Revisited
representation/storage/retrieval/analysis of biological data concerning:
  sequences (DNA, protein, RNA)
  structures (protein, RNA)
  functions (protein, sequence signals)
  activity levels (mRNA, protein, metabolites)
  networks of interactions (metabolic pathways, regulatory pathways, signaling pathways) of/among biomolecules
even textual information from biomedical literature!

Next Time
basic molecular biology
sequence alignment
probabilistic sequence models
gene expression analysis
protein structure prediction (by Ameet Soni)