Figure 1: Genome sizes of different organisms.




How big are genomes?

Genomes are now being sequenced at such a rapid rate that sequencing has become routine. As a result, there is growing interest in understanding the meaning of the information encoded and stored in these genomes, how they differ, and what those differences say about the evolution of life on Earth. It is even becoming possible to compare the genomes of different individuals of the same species, which serves as a starting point for understanding the genetic contributions to variability.

Naively, the first question one might ask in taking stock of the information content of genomes is how large they are. Early thinking held that genome size should be directly related to the meaningful information content of the genome. This was strikingly refuted by the similarity in the number of protein coding genes in genomes of very different sizes, one of the unexpected results of sequencing many different genomes from organisms far and wide. For example, Arabidopsis thaliana (a mustard plant), Caenorhabditis elegans (a nematode) and Drosophila melanogaster (a fruit fly) all have a number of protein coding genes similar to that of human or mouse (~20,000), even though their genomes vary in size by over 20 fold.

As shown in Figure 1, genome sizes range from roughly 0.5 Mbp for Mycoplasma genitalium to 670 Gbp for the enormous genome of Polychaos dubium (formerly Amoeba dubia), a more than million-fold difference in genome size even among microscopic organisms. Viral genomes are another matter: they are considerably smaller, with many of the most feared RNA viruses having genomes less than 10 kb in length.

The length in base pairs can be converted to the physical length of linearly stretched DNA by noting that the distance between adjacent bases along the DNA strand is about 0.3 nm (BNID 100667). For the human genome of 3 Gbp, this gives about one meter. So each of our body's cells has to compress this one meter into a nucleus only a few micrometers across. Achieving this requires packing proteins such as histones, and much dexterity in reading the stored information during transcription.
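The conversion from base pairs to stretched length, and the fold range quoted above, can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers given in the text (0.3 nm per base pair, a 3 Gbp human genome, 0.5 Mbp and 670 Gbp as the extremes); the function and variable names are illustrative choices, not part of the original.

```python
# Rise per base pair quoted in the text (BNID 100667), in nanometers.
NM_PER_BP = 0.3


def stretched_length_m(genome_bp: float, nm_per_bp: float = NM_PER_BP) -> float:
    """Length in meters of a genome laid out as one linear DNA molecule."""
    return genome_bp * nm_per_bp * 1e-9  # 1 nm = 1e-9 m


# Human genome, taken as 3 Gbp as in the text.
human_length_m = stretched_length_m(3e9)

# Extremes of the genome size range quoted in the text.
smallest_bp = 0.5e6   # Mycoplasma genitalium, roughly 0.5 Mbp
largest_bp = 670e9    # Polychaos dubium, 670 Gbp
fold_range = largest_bp / smallest_bp

print(f"human genome, stretched: {human_length_m:.2f} m")
print(f"largest / smallest genome: {fold_range:.2e}")
```

With these inputs the human genome comes out at a little under one meter, and the ratio between the two extremes is somewhat above one million.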

Figure 1 gives examples of genome sizes, chosen to illustrate some of the useful and well known model organisms, some key outliers whose genomes are extraordinarily small or large, and some particularly exotic cases. One often finds conflicting values for genome sizes, even for model organisms that have already been sequenced. The human genome can be quoted as 2.9 Gbp or 3.2 Gbp depending on the resource consulted. This uncertainty stems from the methods of measurement: sequencing usually captures only the euchromatic regions, whereas the repetitive heterochromatic regions are often still not resolved by sequencing methods. Older methods, which measure DNA in bulk, report genome size through the C-value, the amount of DNA (and thus the genome length) without regard to its specific sequence. This difference in what is being measured leads to conflicting values even for the most highly studied genomes. Given the large range of genome sizes revealed in Figure 1, the next vignette takes up the question of how many genes are present in these various genomes and whether there are useful rules of thumb for predicting gene number from genome size.
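Because a C-value is a mass of DNA rather than a count of bases, comparing it with a sequenced genome size requires a mass-to-length conversion. The sketch below assumes the commonly used factor of about 0.978 x 10^9 bp per picogram of double-stranded DNA; that factor is an assumption brought in for illustration and does not come from the text, and the function names are hypothetical. The genome sizes used as inputs are the ones quoted in this vignette.

```python
# Assumed conversion factor (not from the text): base pairs per picogram
# of double-stranded DNA, a commonly used approximation.
BP_PER_PG = 0.978e9


def c_value_pg_to_bp(c_value_pg: float) -> float:
    """Convert a C-value in picograms to a genome size in base pairs."""
    return c_value_pg * BP_PER_PG


def bp_to_c_value_pg(genome_bp: float) -> float:
    """Convert a genome size in base pairs to a C-value in picograms."""
    return genome_bp / BP_PER_PG


# A 130 Gbp genome, the size the vignette's table reports as calculated from a C-value.
large_genome_pg = bp_to_c_value_pg(130e9)

# Relative spread between the two human genome sizes quoted in the text.
low_bp, high_bp = 2.9e9, 3.2e9
relative_spread = (high_bp - low_bp) / low_bp

print(f"130 Gbp corresponds to about {large_genome_pg:.1f} pg")
print(f"2.9 vs 3.2 Gbp differ by about {relative_spread:.0%}")
```

The roughly 10% spread between the two quoted human values is a reminder that a genome size is only meaningful together with a statement of what was measured.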

Table 1: Genomic census for a variety of selected organisms. The table gives the genome size, the current best estimate for the number of protein coding genes, and the number of chromosomes. Notice that the number of genes in bacteria and archaea is similar to the genome size in kbp, reflecting the simple estimate that each gene is coded by roughly 1000 bp and that the majority of the genome in these cases is devoted to protein coding. Genomes often also include extra-chromosomal elements such as plasmids that might not be reflected in the genome size and number of chromosomes. The number of genes is constantly under revision. The numbers given here are for protein coding genes; tRNAs and non-coding RNAs, many of them still to be discovered, are not accounted for. Bacterial species often show significant variation in genome size and number of genes among strains. Numbers in square brackets are BNID references; a dash marks a value that is not given.

Organism | Genome size | Number of genes: protein coding (total) | Number of chromosomes

Model organisms
Escherichia coli | 4.6 Mbp [100269] | 4,225 [105443] (4,606 [100272]) | 1 [100269]
Budding yeast Saccharomyces cerevisiae | 12.1 Mbp [100459] | 5,616 [105444] (6,606 [100237]) | 16 [100459]
Fission yeast Schizosaccharomyces pombe | ~13 Mbp [105369] | 4,824 [105369] | 3 [105369]
Amoeba Dictyostelium discoideum | ~34 Mbp [105513] | ~12,500 [105514] | 6 [105513]
Diatom Thalassiosira pseudonana | 34.5 Mbp [105369] | 11,242 + 144 chloroplast + 40 mitochondrial genes [103246] | 24 [105369]
Bread mold Neurospora crassa | 39 Mbp [103246] | 10,082 + 498 RNA genes [103246] | -
Nematode Caenorhabditis elegans | 100 Mbp [101363] | 19,735 [101364] (22,901 [100313]) | 6 (n) [100294]; autosomal 5 (n) [101369]
Fruit fly Drosophila melanogaster | euchromatic 120 Mbp [100199] | 13,601 [100200] | 8 (2n) [100201]
Thale cress Arabidopsis thaliana | original estimate ~125 Mbp [105472]; measured by flow cytometry ~157 Mbp [104000] | 20,568 [105446] (27,547 [100473]) | 10 (2n) [100474]
Moss Physcomitrella patens | 511 Mbp [104729] | - | 27 [105322]
Zebrafish Danio rerio | 1.2 Gbp [103246] | 15,761 [103246] | 48 [100597]
Mouse Mus musculus | euchromatic ~2.5 Gbp [100305]; total 2.64 Gbp [100308] | 20,210 [100310] | 40 (2n) [100335]
Human Homo sapiens | euchromatic 2.88 Gbp [100396]; overall 3.08 Gbp [101484] | 21,701 [100399]; 19,042 [105447] (mapped genes 22,585 [101640]) | 46 (2n) [100426]

Viruses
Viroids (nonencapsulated plant RNA parasites) | 0.25-0.4 kb [105571] | - | -
Hepatitis D virus (smallest known RNA virus) | 1.7 kb [105570] | - | -
HIV-1 | 9.7 kbp [105769] | 9 [105769] | 2 ssRNA (2n) [105769]
Influenza A | 13.5 kbp [105768] | 10-11 [105767] | 8 ssRNA [105767]
Bacteriophage λ | 48.5 kbp [105770] | 66 ORFs [105770] | 1 dsDNA [105770]
Epstein-Barr virus | ~172 kbp [103246] | 80 [103246] | -
Acanthamoeba polyphaga Mimivirus (largest known viral genome) | 1.18 Mbp [105142] | - | -

Organelles
Mitochondria, human | 16.7 kbp [105470] | 13 [105470] (37 [105470]) | 1 [105470]
Mitochondria, yeast | 85.8 kbp [105471] | 8 [105471] | 1 [105471]
Chloroplast, Arabidopsis | 154.5 kbp [105918] | 104 [105918] | 1 [105918]

Bacteria
Carsonella ruddii (smallest genome of an endosymbiotic bacterium) | 159.7 kbp [100622] | - | -
Mycoplasma genitalium (smallest genome of a free-living bacterium) | ~580 kbp [105492] | 470 [105493] | 1 [105492]
Buchnera sp. | 641 kbp [105757] | 610 [105757] | -
Helicobacter pylori | 1.67 Mbp [105494] | 1,590 [105494] | 1 [105494]
Haemophilus influenzae (first free-living organism sequenced) | 1.83 Mbp [105491] | - | -
Cyanobacteria Synechococcus elongatus | 2.8 Mbp [100527] | 2,991 [100527] (3,041 [100527]) | 1 and 2 plasmids [100527]
Methicillin-resistant Staphylococcus aureus (MRSA) | 2.87 Mbp [105499] | 2,699 + 12 on plasmids [105500] | 1 and 3 plasmids [105499]
Deinococcus radiodurans | ~3.3 Mbp [103246] | 3,187 [103246] | 2 chromosomes and 2 plasmids [103246]
Caulobacter crescentus | 4.02 Mbp [105497] | 3,767 [105498] | 1 [105497]
Bacillus subtilis | ~4.2 Mbp [103246] | 4,779 [103246] | -
Sorangium cellulosum (largest known bacterial genome) | 13 Mbp [104469] | - | 1 [104469]

Archaea
Nanoarchaeum equitans (smallest parasitic archaeal genome) | ~490 kbp [105503] | 552 [105502] | 1 [105503]
Thermoplasma acidophilum (flourishes at pH<1) | 1.56 Mbp [105915] | 1,478 [105915] (1,478 [105915]) | 1 [105915]
Methanocaldococcus (Methanococcus) jannaschii (from ocean bottom hydrothermal vents; pressure >200 atm) | 1.66 Mbp [105501] | 1,682 (+ 56 on plasmids) [105501] | 1 and 2 plasmids [105501]
Pyrococcus horikoshii | ~1.7 Mbp [103246] | 1,994 [103246] | -
Pyrococcus furiosus (optimum temperature 100 °C) | 1.91 Mbp [105916] | 2,065 [105916] | 1 [105916]

Eukaryotes, unicellular
Microsporidian Encephalitozoon cuniculi (smallest eukaryotic nuclear genome) | 2.9 Mbp [105426] | 1,997 [103246] | 11 [105426]
Ostreococcus tauri (smallest free-living eukaryote) | 12.56 Mbp [101523] | 8,166 [105490] | 20 [105489]
Plasmodium falciparum (parasite that is the chief cause of malaria) | 23 Mbp [103246] | 5,268 [103246] | 14 [102196]
Polychaos dubium (largest known genome size) | 670 Gbp [104470] | - | 1 (with many copies) [104470]

Eukaryotes, multicellular
Placozoan Trichoplax adhaerens | 98 Mbp [105516] | 11,514 [105515] | 12 (2n) [105516]
Pufferfish Fugu rubripes (smallest known vertebrate genome) | 400 Mbp [100278] | - | 44 [100278]
Populus trichocarpa (first tree to have its genome sequenced) | 485 Mbp [105322] | - | 19 [105322]
Sea urchin Strongylocentrotus purpuratus | 814 Mbp [105517] | ~23,300 [105518] | -
Corn Zea mays | 2.4 Gbp [105520] | estimated 42,000-56,000 [105521] | 20 (2n) [105522]
Dog Canis familiaris | 2.4 Gbp [103246] | 19,300 [103246] | 78 (2n) [100597]
Chimpanzee Pan troglodytes | 3.7 Gbp [100597] | - | 48 (2n) [100597]
Wheat Triticum aestivum | 16.8 Gbp [102713] | estimates range 107,000-334,000 [105448] | 42 (2n=6x) [105917]
Fritillaria assyrica (largest known plant genome) | 124.6 Gbp (calculated from C value) [102726] | - | -
Marbled lungfish Protopterus aethiopicus (largest known animal genome) | 130 Gbp (calculated from C value) [100597] | - | -
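The rule of thumb in the caption of Table 1, that a bacterial or archaeal genome holds roughly one protein coding gene per 1000 bp, can be tested against a few of the table's rows. The sketch below is illustrative only: the genome sizes and gene counts are the values read from Table 1 for the listed organisms, the names `PROKARYOTES` and `predicted_genes` are hypothetical, and the human row is included purely as a contrast.

```python
# (genome size in bp, protein coding genes), as read from Table 1.
PROKARYOTES = {
    "Escherichia coli": (4.6e6, 4225),
    "Mycoplasma genitalium": (580e3, 470),
    "Helicobacter pylori": (1.67e6, 1590),
    "Caulobacter crescentus": (4.02e6, 3767),
    "Nanoarchaeum equitans": (490e3, 552),
    "Thermoplasma acidophilum": (1.56e6, 1478),
}

# Rule of thumb from the table caption: about 1000 bp per gene.
BP_PER_GENE = 1000


def predicted_genes(genome_bp: float, bp_per_gene: float = BP_PER_GENE) -> float:
    """Gene count expected if the whole genome were tiled with genes."""
    return genome_bp / bp_per_gene


# Ratio of observed to predicted gene number for each prokaryote.
ratios = {
    name: genes / predicted_genes(size)
    for name, (size, genes) in PROKARYOTES.items()
}

# Contrast: human, using the overall size (3.08 Gbp) and 21,701 genes.
human_ratio = 21701 / predicted_genes(3.08e9)

for name, ratio in ratios.items():
    print(f"{name:26s} observed/predicted = {ratio:.2f}")
print(f"{'Homo sapiens':26s} observed/predicted = {human_ratio:.4f}")
```

For these rows the prokaryotic ratios stay within about 20% of one, while the human ratio is below one percent, which is the sense in which the estimate works for bacteria and archaea but not for large eukaryotic genomes.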