Codon usage bias is correlated with gene expression levels in the fission yeast Schizosaccharomyces pombe




Yasushi Hiraoka1,2,3,*, Kenichi Kawamata1, Tokuko Haraguchi1,2 and Yuji Chikashige1,2

1Department of Biology, Graduate School of Science, Osaka University, 1-1 Machikaneyama, Toyonaka, 560-0043, Japan
2Kobe Advanced ICT Research Center, National Institute of Information and Communications Technology, 588-2 Iwaoka, Iwaoka-cho, Nishi-ku, Kobe 651-2492, Japan
3Graduate School of Frontier Biosciences, Osaka University, 1-3 Yamadaoka, Suita, 565-0871, Japan

Communicated by: Masayuki Yamamoto (The University of Tokyo)
*Correspondence: hiraoka@fbs.osaka-u.ac.jp

Genes to Cells (2009) 14, 499–509. © 2009 The Authors. Blackwell Publishing Inc. DOI: 10.1111/j.1365-2443.2009.01284.x

Usage of synonymous codons follows a characteristic pattern of preference in each organism. It has been inferred that this codon usage bias evolved as an adaptation for efficient protein synthesis. Here we examined synonymous codon usage in genes of the fission yeast Schizosaccharomyces pombe and compared codon usage bias with the expression levels of the genes. In this organism, synonymous codon usage bias was correlated with gene expression level; the bias was most obvious in two-codon amino acids. A similar pattern of codon usage bias was also observed in Saccharomyces cerevisiae, Arabidopsis thaliana and Caenorhabditis elegans, but was not obvious in Oryza sativa, Drosophila melanogaster, Takifugu rubripes and Homo sapiens. Because codons of highly expressed genes have a greater influence on translational efficiency than codons of genes expressed at lower levels, it is likely that codon usage in the S. pombe genome has been optimized by translational selection during evolution.

Introduction
In protein synthesis, each triplet codon of an mRNA is translated into a specific amino acid. For 18 of the 20 amino acids (the exceptions are Met and Trp), more than one codon is assigned. Such synonymous codons are not used equally; their usage is biased.

One model explaining codon usage bias invokes selection for translational efficiency. In some species, codon usage preference has been shown to correlate with the abundance of the cognate tRNA (Ikemura 1985; Andersson & Kurland 1990; Sharp et al. 1993; Akashi 2001; Kanaya et al. 2001), which is expected to affect the translational efficiency of the gene product (Akashi 2003). Although codon usage bias has been studied extensively in many organisms, including humans and viruses that infect humans (e.g. Plotkin & Dushoff 2003; Meintjes & Rodrigo 2005; Ren et al. 2007; van Hemert et al. 2007), unicellular organisms provide a simpler model system for examining whether codon usage bias is related to cellular metabolism. In Escherichia coli, clear evidence has been obtained for a correlation between codon usage preference and tRNA abundance (Ikemura 1981a,b). In the budding yeast Saccharomyces cerevisiae, codon usage bias has been shown to correlate with the mRNA copy numbers of genes (Sharp & Cowe 1991; Akashi 2003). Because translational efficiency in such organisms can be directly related to growth rate, codon usage is expected to have been optimized for growth during evolution. Here we report codon usage in another unicellular eukaryote, the fission yeast Schizosaccharomyces pombe, whose entire genome sequence has been determined (Wood et al. 2002). Synonymous codon usage has previously been reported for three highly expressed genes and for 153 other genes expressed at lower levels (Forsburg 1994). We examined codon usage for genes in the S. pombe genome and found a significant difference in codon usage bias with respect to gene expression level. We therefore propose that the codon usage of a gene can predict its expression level in S. pombe.

Results
Codon usage profiles in Schizosaccharomyces pombe
Expression levels of 4932 open reading frames (ORFs) in the S. pombe genome were determined by DNA microarray analysis (Fig. 1A). Among these ORFs, genes for ribosomal proteins showed high levels of expression (Fig. 1B). We first examined synonymous codon usage in the ribosomal protein genes and found a significant difference from the genome average (Table 1). The difference is clearest in the two-codon amino acids (Lys, Asn, Phe, His, Glu, Cys, Tyr, Asp and Gln), where the bias between the two codons in ribosomal protein genes was opposite to that in total genes (Fig. 2). In contrast, codon usage in meiotic genes was similar to that in total genes (Table 1; Fig. 2). The frequency of codons with G or C at the third position approximately corresponds to the GC content of the S. pombe genome (38%) in total and meiotic genes, but is significantly higher in ribosomal protein genes, with the exception of Gln (Fig. 2).

Figure 1 Histogram of mRNA copy numbers. The majority of genes showed low levels of expression (A), whereas ribosomal protein genes were highly expressed (B). Thus, the genome-average codon usage is representative of genes with low expression levels.

Figure 2 Codon usage in two-codon amino acids. In total genes, the occurrence of G or C at the third codon position was approximately equal to the average GC content of the S. pombe genome (38%; broken line). This was also the case in meiotic genes. In contrast, the G/C frequency was significantly higher (except for Gln) in ribosomal protein genes.

Table 1 Codon usage (relative frequency among synonymous codons)
Codon      Total  Meiotic  Ribosomal
GGG (Gly)  0.09   0.15     0.00
GGA (Gly)  0.32   0.37     0.16
GGT (Gly)  0.43   0.32     0.66
GGC (Gly)  0.17   0.17     0.18
GAG (Glu)  0.32   0.28     0.57
GAA (Glu)  0.68   0.72     0.43
GAT (Asp)  0.71   0.71     0.56
GAC (Asp)  0.29   0.29     0.44
GTG (Val)  0.14   0.15     0.04
GTA (Val)  0.20   0.21     0.04
GTT (Val)  0.48   0.47     0.57
GTC (Val)  0.18   0.17     0.35
GCG (Ala)  0.09   0.11     0.01
GCA (Ala)  0.25   0.32     0.07
GCT (Ala)  0.48   0.40     0.63
GCC (Ala)  0.18   0.17     0.30
AGG (Arg)  0.12   0.17     0.01
AGA (Arg)  0.27   0.27     0.09
CGG (Arg)  0.07   0.07     0.00
CGA (Arg)  0.02   0.18     0.01
CGT (Arg)  0.37   0.18     0.69
CGC (Arg)  0.14   0.13     0.19
AAG (Lys)  0.38   0.34     0.82
AAA (Lys)  0.62   0.66     0.18
AAT (Asn)  0.66   0.68     0.28
AAC (Asn)  0.34   0.32     0.72
ATG (Met)  1.00   1.00     1.00
ATA (Ile)  0.22   0.27     0.01
ATT (Ile)  0.57   0.53     0.55
ATC (Ile)  0.21   0.20     0.43
ACG (Thr)  0.12   0.12     0.02
ACA (Thr)  0.26   0.30     0.04
ACT (Thr)  0.42   0.43     0.54
ACC (Thr)  0.20   0.15     0.39
TGG (Trp)  1.00   1.00     1.00
TGA (End)  0.20   0.22     0.01
TAG (End)  0.21   0.22     0.22
TAA (End)  0.59   0.56     0.77
TGT (Cys)  0.62   0.57     0.32
TGC (Cys)  0.38   0.43     0.68
TAT (Tyr)  0.65   0.67     0.42
TAC (Tyr)  0.35   0.33     0.58
TTG (Leu)  0.25   0.22     0.37
TTA (Leu)  0.27   0.30     0.07
CTG (Leu)  0.07   0.09     0.02
CTA (Leu)  0.09   0.10     0.01
CTT (Leu)  0.26   0.23     0.37
CTC (Leu)  0.07   0.07     0.17
TTT (Phe)  0.71   0.72     0.35
TTC (Phe)  0.29   0.28     0.65
TCG (Ser)  0.09   0.08     0.03
TCA (Ser)  0.20   0.24     0.06
TCT (Ser)  0.33   0.30     0.51
TCC (Ser)  0.13   0.11     0.23
AGT (Ser)  0.16   0.16     0.07
AGC (Ser)  0.10   0.10     0.10
CAG (Gln)  0.28   0.27     0.16
CAA (Gln)  0.72   0.73     0.84
CAT (His)  0.72   0.75     0.44
CAC (His)  0.28   0.25     0.56
CCG (Pro)  0.10   0.14     0.01
CCA (Pro)  0.27   0.26     0.05
CCT (Pro)  0.46   0.41     0.60
CCC (Pro)  0.17   0.18     0.34

Codon usage and gene expression in Schizosaccharomyces pombe
To examine whether codon usage bias is related to the level of gene expression, we compared codon usage in selected genes at seven levels of expression. We found that codon usage varied with the level of gene expression (Table 2; Figs 3–5) and that codon usage in ribosomal protein genes was characteristic of highly expressed genes.
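As an illustration of how per-amino-acid fractions like those in Table 1 and the third-position G/C shares in Fig. 2 can be computed, here is a minimal Python sketch that pools codons over a set of coding sequences. It assumes in-frame CDS strings and the standard genetic code; the function names are hypothetical and are not taken from the authors' analysis.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
# Amino acids encoded by exactly two codons: C, D, E, F, H, K, N, Q, Y.
TWO_CODON = sorted(aa for aa in set(AMINO) if AMINO.count(aa) == 2)


def codon_counts(*cds_list: str) -> Counter:
    """Pool in-frame codon counts over one or more coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODE:  # skip codons containing ambiguous bases
                counts[codon] += 1
    return counts


def synonymous_usage(counts: Counter) -> dict[str, float]:
    """Fraction of each codon among all codons for the same amino acid (or stop)."""
    per_aa = Counter()
    for codon, n in counts.items():
        per_aa[CODE[codon]] += n
    return {codon: n / per_aa[CODE[codon]] for codon, n in counts.items()}


def gc3_two_codon(counts: Counter) -> dict[str, float]:
    """Share of G/C-ending codons for each two-codon amino acid that occurs."""
    shares = {}
    for aa in TWO_CODON:
        synonyms = [c for c, a in CODE.items() if a == aa]
        total = sum(counts[c] for c in synonyms)
        if total:
            shares[aa] = sum(counts[c] for c in synonyms if c[2] in "GC") / total
    return shares
```

For Table 1-style columns, one would pool all ORFs of a category (total, meiotic or ribosomal protein genes) into a single Counter before calling `synonymous_usage`; stop codons form their own group, as in the "End" rows.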
The correlation of codon usage with gene expression was particularly high in the two-codon amino acids (Fig. 3). Similar tendencies were also observed in a three-codon amino acid (Ile) and in the four-codon amino acids (Gly, Val, Ala, Thr and Pro) (Fig. 4). In Ile, usage of the ATC codon increased as a function of expression level, whereas the ATT codon was used most frequently at all levels of gene expression. In Val, Ala, Thr and Pro, one of the four codons was used most frequently at all levels of gene expression, with a tendency toward a slight increase as a function of expression, and one of the other three codons showed a significant increase with the level of gene expression. In contrast, in Gly, the most frequently used codon, GGT, itself showed a significant increase as a function of expression. In the six-codon amino acids, usage increasing with gene expression was seen for codons CGT (Arg), TCT and TCC (Ser), and TTG, CTT and CTC (Leu) (Table 2). In these cases, the six codons can be divided into a two-codon and a four-codon component according to the first nucleotide of the codon. The usage of the two-codon components showed a pattern similar to that of the two-codon amino acids (Fig. 5). Among the four-codon components (Fig. 5), Ser and Leu showed a pattern similar to that of Val, Ala, Thr and Pro, in which one of the four codons was used most frequently at all levels of gene expression and another codon increased as a function of expression. In contrast, the four-codon component of Arg showed a pattern similar to that of Gly, in which only one of the four codons increased as a function of expression. Taken together, these results show that in S. pombe synonymous codon usage is correlated with gene expression level for every amino acid that has synonymous codons.

Table 2 Codon usage and gene expression levels (column headings: mRNA copy number per cell)
Codon      4     8     16    32    64    128   256
GGG (Gly)  0.09  0.12  0.06  0.05  0.03  0.00  0.00
GGA (Gly)  0.35  0.33  0.26  0.29  0.20  0.13  0.12
GGT (Gly)  0.34  0.36  0.48  0.48  0.60  0.73  0.73
GGC (Gly)  0.22  0.18  0.20  0.18  0.17  0.13  0.15
GAG (Glu)  0.28  0.30  0.34  0.37  0.47  0.51  0.65
GAA (Glu)  0.72  0.70  0.66  0.63  0.53  0.49  0.35
GAT (Asp)  0.70  0.77  0.68  0.73  0.65  0.58  0.48
GAC (Asp)  0.30  0.23  0.32  0.27  0.35  0.42  0.52
GTG (Val)  0.16  0.12  0.15  0.14  0.07  0.04  0.02
GTA (Val)  0.18  0.21  0.21  0.20  0.10  0.04  0.02
GTT (Val)  0.48  0.53  0.45  0.46  0.57  0.60  0.53
GTC (Val)  0.18  0.14  0.19  0.20  0.26  0.32  0.42
GCG (Ala)  0.10  0.10  0.08  0.08  0.04  0.01  0.00
GCA (Ala)  0.30  0.25  0.23  0.23  0.13  0.06  0.05
GCT (Ala)  0.43  0.48  0.50  0.50  0.58  0.63  0.61
GCC (Ala)  0.17  0.17  0.20  0.19  0.25  0.30  0.34
AGG (Arg)  0.13  0.14  0.08  0.07  0.04  0.01  0.00
AGA (Arg)  0.26  0.22  0.22  0.19  0.10  0.08  0.08
CGG (Arg)  0.05  0.09  0.06  0.05  0.01  0.01  0.00
CGA (Arg)  0.16  0.16  0.13  0.11  0.07  0.02  0.00
CGT (Arg)  0.27  0.27  0.36  0.42  0.58  0.72  0.75
CGC (Arg)  0.12  0.12  0.14  0.15  0.21  0.17  0.16
AAG (Lys)  0.35  0.35  0.39  0.42  0.58  0.78  0.90
AAA (Lys)  0.65  0.65  0.61  0.58  0.42  0.22  0.10
AAT (Asn)  0.69  0.68  0.68  0.63  0.49  0.31  0.22
AAC (Asn)  0.31  0.32  0.32  0.37  0.51  0.69  0.78
ATG (Met)  1.00  1.00  1.00  1.00  1.00  1.00  1.00
ATA (Ile)  0.24  0.24  0.18  0.14  0.09  0.01  0.00
ATT (Ile)  0.56  0.57  0.60  0.62  0.64  0.59  0.59
ATC (Ile)  0.20  0.19  0.23  0.24  0.27  0.39  0.40
ACG (Thr)  0.13  0.13  0.11  0.12  0.05  0.02  0.01
ACA (Thr)  0.23  0.25  0.25  0.20  0.14  0.04  0.03
ACT (Thr)  0.47  0.43  0.45  0.45  0.51  0.55  0.51
ACC (Thr)  0.17  0.19  0.19  0.23  0.30  0.38  0.45
TGG (Trp)  1.00  1.00  1.00  1.00  1.00  1.00  1.00
TGA (End)  0.25  0.20  0.20  0.15  0.00  0.00  0.03
TAG (End)  0.10  0.05  0.20  0.30  0.20  0.31  0.13
TAA (End)  0.65  0.75  0.60  0.55  0.80  0.69  0.83
TGT (Cys)  0.61  0.68  0.61  0.68  0.51  0.38  0.39
TGC (Cys)  0.39  0.32  0.39  0.32  0.49  0.62  0.61
TAT (Tyr)  0.64  0.68  0.67  0.64  0.52  0.48  0.30
TAC (Tyr)  0.36  0.32  0.33  0.36  0.48  0.52  0.70
TTG (Leu)  0.23  0.27  0.24  0.26  0.31  0.37  0.38
TTA (Leu)  0.29  0.27  0.26  0.27  0.17  0.08  0.06
CTG (Leu)  0.07  0.06  0.06  0.04  0.04  0.02  0.00
CTA (Leu)  0.11  0.09  0.09  0.06  0.04  0.01  0.01
CTT (Leu)  0.24  0.25  0.28  0.29  0.36  0.36  0.36
CTC (Leu)  0.07  0.06  0.07  0.08  0.08  0.16  0.19
TTT (Phe)  0.72  0.76  0.69  0.66  0.58  0.42  0.26
TTC (Phe)  0.28  0.24  0.31  0.34  0.42  0.58  0.74
TCG (Ser)  0.10  0.09  0.10  0.07  0.04  0.02  0.01
TCA (Ser)  0.23  0.18  0.18  0.17  0.09  0.04  0.02
TCT (Ser)  0.32  0.32  0.32  0.36  0.39  0.52  0.49
TCC (Ser)  0.14  0.12  0.15  0.14  0.20  0.24  0.31
AGT (Ser)  0.14  0.18  0.16  0.16  0.14  0.07  0.04
AGC (Ser)  0.08  0.10  0.09  0.10  0.14  0.10  0.13
CAG (Gln)  0.30  0.30  0.30  0.33  0.22  0.14  0.10
CAA (Gln)  0.70  0.70  0.70  0.67  0.78  0.86  0.90
CAT (His)  0.69  0.75  0.69  0.70  0.69  0.47  0.29
CAC (His)  0.31  0.25  0.31  0.30  0.31  0.53  0.71
CCG (Pro)  0.11  0.14  0.10  0.07  0.03  0.02  0.00
CCA (Pro)  0.28  0.29  0.26  0.21  0.12  0.07  0.02
CCT (Pro)  0.48  0.40  0.48  0.48  0.54  0.59  0.56
CCC (Pro)  0.13  0.16  0.16  0.24  0.31  0.33  0.42

Codon usage profiles in other organisms
The codon usage of ribosomal protein genes was compared with the genomic average in Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, Oryza sativa, Drosophila melanogaster, Takifugu rubripes and Homo sapiens (Supporting Table S1). Codon usage of the ribosomal protein genes in Saccharomyces cerevisiae, A. thaliana and C. elegans was significantly different from the genomic average, especially in the two-codon amino acids.
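The comparison of ribosomal protein genes with the genome average, plotted in Fig. 6, can also be summarized numerically. The sketch below uses the mean absolute difference between the two usage profiles, which is 0 when every point lies on the diagonal; this metric is an assumption for illustration (the comparison in the text is made visually), and the example data are the S. pombe values of Table 1 for the G/C-ending codon of each two-codon amino acid.

```python
def diagonal_deviation(total: dict[str, float], ribosomal: dict[str, float]) -> float:
    """Mean |ribosomal - total| over shared codons; 0 when all points lie on y = x."""
    shared = sorted(set(total) & set(ribosomal))
    if not shared:
        raise ValueError("no codons in common")
    return sum(abs(ribosomal[c] - total[c]) for c in shared) / len(shared)


# S. pombe values from Table 1: G/C-ending codon of each two-codon amino acid.
TOTAL = {"AAG": 0.38, "AAC": 0.34, "TTC": 0.29, "CAC": 0.28, "GAG": 0.32,
         "TGC": 0.38, "TAC": 0.35, "GAC": 0.29, "CAG": 0.28}
RIBOSOMAL = {"AAG": 0.82, "AAC": 0.72, "TTC": 0.65, "CAC": 0.56, "GAG": 0.57,
             "TGC": 0.68, "TAC": 0.58, "GAC": 0.44, "CAG": 0.16}

# Codons used more often in ribosomal protein genes than in total genes
# (points above the diagonal in a Fig. 6-style plot).
above = [c for c in TOTAL if RIBOSOMAL[c] > TOTAL[c]]
```

Because the two codons of a two-codon amino acid sum to 1, one codon per amino acid carries all the information. For these S. pombe values the mean deviation is about 0.28, and only CAG (Gln) falls below the diagonal, in line with the Gln exception noted for Fig. 2.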
In contrast, no significant difference in codon usage was observed between the ribosomal protein genes and the genomic average in O. sativa, D. melanogaster, T. rubripes and H. sapiens (Fig. 6). Codon usage of two-codon amino acids in ribosomal protein genes was plotted against that in total genes. For O. sativa, D. melanogaster, T. rubripes and H. sapiens, the points lay along the diagonal line (Fig. 6A), indicating that these two groups of genes have similar codon usage. In contrast, in Saccharomyces cerevisiae, A. thaliana and C. elegans, the points were scattered away from the diagonal line (Fig. 6B), indicating that codon usage in ribosomal protein genes differs from that in total genes, as in S. pombe. A recent report, however, shows that synonymous codon usage bias is only moderately correlated with gene expression level in A. thaliana (dos Reis & Wernisch 2009). This apparent discrepancy may arise because our results are based only on ribosomal protein genes. We examined ribosomal protein genes as representatives of highly expressed genes to avoid the possible complexity associated with tissue-specific expression levels in multicellular organisms.

Figure 3 Codon usage as a function of expression levels in two-codon amino acids.

Figure 4 Codon usage as a function of expression levels in three- and four-codon amino acids.

Figure 5 Codon usage as a function of expression levels in six-codon amino acids. Usage patterns of the two-codon and four-codon components are shown separately in the upper and lower panels.

Figure 6 Codon usage profiles in ribosomal protein genes vs. total genes. Codon usage of two-codon amino acids in ribosomal protein genes (vertical axis) is plotted against that in total genes (horizontal axis). If the two codon usage biases are close to each other, the points lie along the diagonal line.

Discussion
We have demonstrated that codon usage is correlated with gene expression level in S. pombe. In S. pombe, the majority of genes express fewer than 32 copies of mRNA; only a small fraction of genes (12%) express more than 32 copies, but the mRNA copies expressed from these genes account for 50% of the mRNA in the cell (Fig. 1A). Thus, S. pombe genes can be divided into two groups that each produce one half of the mRNA copies in the cell: the 12% of genes with high expression levels (above 32 copies of mRNA) and the remaining genes with low expression levels (below 32 copies of mRNA). Interestingly, codon usage preference switches between expression levels above and below 32 copies of mRNA. An implication of this result is that major codons are allocated to the group of highly expressed genes to allow efficient translation of their mRNAs, and that genes with low expression levels use minor codons to avoid competing with highly expressed genes. This allocation of codons may optimize overall translational efficiency to give the best growth rate in S. pombe, and such codon usage bias may therefore have been selected during evolution.

From these results, we formulated a set of coefficients for each codon to determine the type of codon usage of a gene of interest (Table 3). This formula calculates a codon usage score, which provides a criterion for assigning one of two types of codon usage: the high- or low-expression type (above or below 32 copies of mRNA, respectively) (see Experimental procedures). Note that the formula does not predict the expression level of a gene; instead, it predicts the type of codon usage. Examples for two S. pombe genes, cdc2 and act1, are shown in Table 3. Positive and negative scores represent the high- and low-expression types of codon usage, respectively. The formula may also be used to judge the type of codon usage of a foreign gene introduced into S. pombe cells, which raises caveats for the analysis of data from host cells transfected with a foreign gene. Examples for two versions of GFP are shown in Table 3: Aequorea victoria GFP (GFP-S65T) has codon usage similar to that of low-expression genes in S. pombe, whereas humanized EGFP has codon usage of the high-expression type. Introducing a gene with extremely biased codon usage may affect growth rate through translational efficiency.
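To make the codon usage score concrete, here is a minimal Python sketch of S = (1/N) Σ Ec(r) as defined in Experimental procedures. The coefficients are those of Table 3, with signs set by the definition (usage at 256 copies minus usage at 32 copies; compare Table 2), so codons favoured in highly expressed genes are positive. The dictionary layout and function names are illustrative rather than the authors' software, and leaving stop codons out of N is an assumption, one that reproduces the published EGFP score.

```python
# Codon usage score S = (1/N) * sum_r Ec(r).
# Ec values from Table 3; sign = sign of usage(256 copies) - usage(32 copies).
EC = {
    "GGG": -1.758, "GGA": -6.227, "GGT": 9.158, "GGC": -1.099,
    "GAG": 10.256, "GAA": -10.256, "GAT": -9.158, "GAC": 9.158,
    "GTG": -4.396, "GTA": -6.593, "GTT": 4.029, "GTC": 5.495,
    "GCG": -2.857, "GCA": -6.593, "GCT": 4.029, "GCC": 5.495,
    "AGG": -2.381, "AGA": -4.029, "AGT": -4.396, "AGC": 1.099,
    "AAG": 17.582, "AAA": -17.582, "AAT": -15.018, "AAC": 15.018,
    "ATG": 0.0, "ATA": -5.092, "ATT": -1.099, "ATC": 5.861,
    "ACG": -4.029, "ACA": -6.227, "ACT": 2.198, "ACC": 8.059,
    "TGG": 0.0, "TGA": 0.0, "TGT": -10.623, "TGC": 10.623,
    "TAG": 0.0, "TAA": 0.0, "TAT": -12.454, "TAC": 12.454,
    "TTG": 4.396, "TTA": -7.692, "TTT": -14.652, "TTC": 14.652,
    "TCG": -2.198, "TCA": -5.495, "TCT": 4.762, "TCC": 6.227,
    "CGG": -1.648, "CGA": -3.846, "CGT": 12.088, "CGC": 0.366,
    "CAG": -8.425, "CAA": 8.425, "CAT": -15.018, "CAC": 15.018,
    "CTG": -1.282, "CTA": -1.832, "CTT": 2.564, "CTC": 4.029,
    "CCG": -2.527, "CCA": -6.960, "CCT": 2.930, "CCC": 6.593,
}
STOPS = {"TAA", "TAG", "TGA"}


def codon_usage_score(counts: dict[str, int]) -> float:
    """Average Ec over sense codons; stop codons are left out of N (assumption)."""
    sense = {c: n for c, n in counts.items() if c in EC and c not in STOPS}
    n_codons = sum(sense.values())
    return sum(EC[c] * n for c, n in sense.items()) / n_codons


def score_cds(cds: str) -> float:
    """Count the in-frame codons of a coding sequence, then score them."""
    cds = cds.upper()
    counts: dict[str, int] = {}
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        counts[codon] = counts.get(codon, 0) + 1
    return codon_usage_score(counts)


def usage_type(score: float) -> str:
    """Positive: high-expression type (>32 copies); negative: low-expression type."""
    return "high" if score > 0 else "low"


# Codon counts for EGFP, copied from the EGFP "Usage" column of Table 3.
EGFP = {
    "GGG": 3, "GGC": 19, "GAG": 15, "GAA": 1, "GAT": 2, "GAC": 16,
    "GTG": 13, "GTA": 1, "GTC": 4, "GCC": 8, "AGC": 7, "AAG": 19,
    "AAA": 1, "AAC": 13, "ATG": 6, "ATC": 12, "ACT": 1, "ACC": 15,
    "TGG": 1, "TGC": 2, "TAT": 1, "TAC": 10, "TTC": 12, "TCC": 3,
    "CGC": 6, "CAG": 8, "CAC": 9, "CTG": 18, "CTC": 3, "CCC": 10,
}
```

With these counts, `codon_usage_score(EGFP)` gives about 5.914, the high-expression type, matching Table 3; `score_cds` applies the same calculation to a raw in-frame coding sequence.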
High translational efficiency of a foreign gene may not always be beneficial to the host cell, because its translation may compete with the synthesis of ribosomal proteins. When choosing a version of GFP, translational efficiency might be optimized by matching its codon usage type to that of the target gene of the fusion.

Mutations causing amino acid substitutions can directly affect protein function and have thus been effectively selected during evolution. In contrast, mutations that change only codon usage may be selected less effectively. However, in unicellular organisms such as yeasts, translational efficiency can be directly related to growth rate, so optimization of codon usage is expected to occur as a consequence of evolutionary pressure.

Table 3 Prediction of the types of codon usage
Ec, expression coefficient (positive for codons favoured in highly expressed genes); Usage, number of occurrences of the codon in the gene; Σ Ec, Usage × Ec.
                    Cdc2              Act1              GFP-S65T          EGFP
Codon      Ec       Usage  Σ Ec       Usage  Σ Ec       Usage  Σ Ec       Usage  Σ Ec
GGG (Gly)  -1.758   4      -7.032     0      0.000      3      -5.274     3      -5.274
GGA (Gly)  -6.227   6      -37.362    5      -31.135    12     -74.724    0      0.000
GGT (Gly)  9.158    5      45.790     24     219.792    9      82.422     0      0.000
GGC (Gly)  -1.099   2      -2.198     1      -1.099     4      -4.396     19     -20.881
GAG (Glu)  10.256   7      71.792     18     184.608    3      30.768     15     153.840
GAA (Glu)  -10.256  16     -164.096   11     -112.816   13     -133.328   1      -10.256
GAT (Asp)  -9.158   11     -100.738   15     -137.370   12     -109.896   2      -18.316
GAC (Asp)  9.158    7      64.106     5      45.790     7      64.106     16     146.528
GTG (Val)  -4.396   3      -13.188    0      0.000      0      0.000      13     -57.148
GTA (Val)  -6.593   3      -19.779    2      -13.186    3      -19.779    1      -6.593
GTT (Val)  4.029    12     48.348     14     56.406     8      32.232     0      0.000
GTC (Val)  5.495    4      21.980     6      32.970     7      38.465     4      21.980
GCG (Ala)  -2.857   1      -2.857     1      -2.857     1      -2.857     0      0.000
GCA (Ala)  -6.593   3      -19.779    2      -13.186    4      -26.372    0      0.000
GCT (Ala)  4.029    7      28.203     18     72.522     5      20.145     0      0.000
GCC (Ala)  5.495    3      16.485     10     54.950     2      10.990     8      43.960
AGG (Arg)  -2.381   1      -2.381     0      0.000      0      0.000      0      0.000
AGA (Arg)  -4.029   5      -20.145    2      -8.058     7      -28.203    0      0.000
AGT (Ser)  -4.396   3      -13.188    0      0.000      2      -8.792     0      0.000
AGC (Ser)  1.099    2      2.198      3      3.297      1      1.099      7      7.693
AAG (Lys)  17.582   3      52.746     12     210.984    6      105.492    19     334.058
AAA (Lys)  -17.582  16     -281.312   7      -123.074   14     -246.148   1      -17.582
AAT (Asn)  -15.018  10     -150.180   2      -30.036    6      -90.108    0      0.000
AAC (Asn)  15.018   3      45.054     9      135.162    8      120.144    13     195.234
ATG (Met)  0.000    7      0.000      15     0.000      6      0.000      6      0.000
ATA (Ile)  -5.092   1      -5.092     0      0.000      1      -5.092     0      0.000
ATT (Ile)  -1.099   16     -17.584    20     -21.980    8      -8.792     0      0.000
ATC (Ile)  5.861    2      11.722     9      52.749     3      17.583     12     70.332
ACG (Thr)  -4.029   2      -8.058     0      0.000      1      -4.029     0      0.000
ACA (Thr)  -6.227   2      -12.454    0      0.000      6      -37.362    0      0.000
ACT (Thr)  2.198    4      8.792      13     28.574     7      15.386     1      2.198
ACC (Thr)  8.059    3      24.177     6      48.354     4      32.236     15     120.885
TGG (Trp)  0.000    4      0.000      4      0.000      1      0.000      1      0.000
TGA (End)  0.000    -      0.000      -      0.000      -      0.000      -      0.000
TGT (Cys)  -10.623  2      -21.246    1      -10.623    0      0.000      0      0.000
TGC (Cys)  10.623   1      10.623     4      42.492     2      21.246     2      21.246
TAG (End)  0.000    -      0.000      -      0.000      -      0.000      -      0.000
TAA (End)  0.000    -      0.000      -      0.000      -      0.000      -      0.000
TAT (Tyr)  -12.454  9      -112.086   8      -99.632    4      -49.816    1      -12.454
TAC (Tyr)  12.454   3      37.362     9      112.086    7      87.178     10     124.540
TTG (Leu)  4.396    13     57.148     13     57.148     1      4.396      0      0.000
TTA (Leu)  -7.692   8      -61.536    0      0.000      3      -23.076    0      0.000
TTT (Phe)  -14.652  12     -175.824   3      -43.956    9      -131.868   0      0.000
TTC (Phe)  14.652   0      0.000      9      131.868    4      58.608     12     175.824
TCG (Ser)  -2.198   1      -2.198     1      -2.198     2      -4.396     0      0.000
TCA (Ser)  -5.495   6      -32.970    1      -5.495     2      -10.990    0      0.000
TCT (Ser)  4.762    4      19.048     12     57.144     5      23.810     0      0.000
TCC (Ser)  6.227    2      12.454     7      43.589     3      18.681     3      18.681
CGG (Arg)  -1.648   1      -1.648     0      0.000      1      -1.648     0      0.000
CGA (Arg)  -3.846   6      -23.076    0      0.000      1      -3.846     0      0.000
CGT (Arg)  12.088   4      48.352     16     193.408    1      12.088     0      0.000
CGC (Arg)  0.366    2      0.732      0      0.000      0      0.000      6      2.196
CAG (Gln)  -8.425   1      -8.425     1      -8.425     1      -8.425     8      -67.400
CAA (Gln)  8.425    7      58.975     13     109.525    7      58.975     0      0.000
CAT (His)  -15.018  9      -135.162   4      -60.072    5      -75.090    0      0.000
CAC (His)  15.018   0      0.000      5      75.090     5      75.090     9      135.162
CTG (Leu)  -1.282   0      0.000      0      0.000      1      -1.282     18     -23.076
CTA (Leu)  -1.832   2      -3.664     0      0.000      3      -5.496     0      0.000
CTT (Leu)  2.564    10     25.640     13     33.332     11     28.204     0      0.000
CTC (Leu)  4.029    1      4.029      2      8.058      1      4.029      3      12.087
CCG (Pro)  -2.527   1      -2.527     0      0.000      1      -2.527     0      0.000
CCA (Pro)  -6.960   3      -20.880    0      0.000      6      -41.760    0      0.000
CCT (Pro)  2.930    6      17.580     12     35.160     2      5.860      0      0.000
CCC (Pro)  6.593    2      13.186     7      46.151     2      13.186     10     65.930
Codon usage score: Cdc2 -6.479; Act1 10.044; GFP-S65T -0.693; EGFP 5.914
Codon usage type prediction: Cdc2 Low (100%); Act1 High (100%); GFP-S65T Low (75%); EGFP High (100%)

Experimental procedures
Databases for codon usage and ribosomal protein genes
Codon usage in the genomes of the species examined was obtained from the Codon Usage Database, http://www.kazusa.or.jp/codon/ (Nakamura et al. 2000). Genes for ribosomal proteins were obtained from the Ribosomal Protein Gene Database, http://ribosome.med.miyazaki-u.ac.jp/ (Nakao et al. 2004).

Analysis of codon usage and gene expression in Schizosaccharomyces pombe
Levels of gene expression in S. pombe were determined using a DNA microarray containing 4932 ORFs (Chikashige et al. 2007). Relative amounts of mRNA for each ORF were measured by DNA microarray using genomic DNA as a reference, and the mRNA copy number was calculated assuming a total of 100 000 mRNA copies per cell. The entire DNA microarray data set has been deposited in Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/index.cgi; accession number GSE13554). Expression levels of the ORFs are available through the web site http://www2.nict.go.jp/w/w103/w131103/CellMagic/index.html. The thirty ORFs with the highest expression levels express an average of 256 copies of mRNA and contain a total of 7326 codons. Other groups of ORFs expressing an average of 128, 64, 32, 16, 8 and 4 copies of mRNA were selected such that each group contained at least 7000 codons or 20 ORFs (7263 codons in 34 ORFs, 7291 codons in 20 ORFs, 7234 codons in 20 ORFs, 12 182 codons in 20 ORFs, 10 964 codons in 20 ORFs and 10 075 codons in 20 ORFs, respectively). For S. pombe ribosomal protein genes, 141 ORFs containing 233 320 codons were selected. For S. pombe meiotic genes, 18 ORFs containing 9039 codons were selected. The selected genes are listed in Supporting Table S2.

Analysis of codon usage in other organisms
Genes for ribosomal proteins were obtained from the Ribosomal Protein Gene Database (http://ribosome.med.miyazaki-u.ac.jp/), and their codon usage was examined with the Countcodon program of the Codon Usage Database (http://www.kazusa.or.jp/codon/).
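The copy-number estimate described above (relative microarray signals scaled to 100 000 mRNA copies per cell) and the 32-copy split used in the Discussion can be sketched in Python as follows. This assumes the relative amounts are proportional to copy number; the nearest-power-of-two binning and the ORF names in the check are hypothetical, and the authors' actual group selection (at least 7000 codons or 20 ORFs per level) is not reproduced.

```python
import math


def copies_per_cell(relative: dict[str, float], total: float = 100_000) -> dict[str, float]:
    """Scale relative mRNA amounts so that they sum to `total` copies per cell."""
    scale = total / sum(relative.values())
    return {orf: amount * scale for orf, amount in relative.items()}


def expression_level(copies: float) -> int:
    """Nearest power of two (e.g. 4, 8, ..., 256); a hypothetical binning rule."""
    if copies <= 0:
        raise ValueError("copy number must be positive")
    return 2 ** round(math.log2(copies))


def split_at(copies: dict[str, float], threshold: float = 32) -> tuple[float, float]:
    """Return (fraction of genes above threshold, their share of all mRNA copies)."""
    high = [c for c in copies.values() if c > threshold]
    return len(high) / len(copies), sum(high) / sum(copies.values())
```

Applied to the genome-wide copy numbers, `split_at` is where the Discussion's observation (about 12% of genes carrying about 50% of the mRNA) would be checked; the toy values in a quick check only exercise the arithmetic.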
Formulation for codon usage score
The codon usage score (S) was defined as the average of the expression coefficients over all codons contained in the gene of interest:

$$S = \frac{1}{N}\sum_{r=1}^{N} Ec(r)$$

where N is the number of codons in the gene, Ec is the expression coefficient of each triplet codon as listed in Table 3, and r indexes the codons of the gene from 1 to N. The expression coefficients were obtained by subtracting the codon usage at 32 copies of mRNA from that at 256 copies of mRNA for each triplet codon (Table 2), and were normalized so that $\mathrm{SD}_{32}$ equals unity, where $\mathrm{SD}_{32}$ is the standard deviation of S for the group of genes expressing 32 copies of mRNA. A positive or negative score indicates the high-expression type (above 32 copies) or the low-expression type (below 32 copies) of codon usage, respectively; its absolute value, expressed as a multiple of $\mathrm{SD}_{32}$, provides a measure of the likelihood of the prediction.

Acknowledgements
We thank Chihiro Tsutsumi for her technical support in DNA microarray analysis. This work was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) to YH, TH and CY.

References
Akashi, H. (2001) Gene expression and molecular evolution. Curr. Opin. Genet. Dev. 11, 660–666.
Akashi, H. (2003) Translational selection and yeast proteome evolution. Genetics 164, 1291–1303.
Andersson, S.G. & Kurland, C.G. (1990) Codon preferences in free-living microorganisms. Microbiol. Rev. 54, 198–210.
Chikashige, Y., Tsutsumi, C., Okamasa, K., Yamane, M., Nakayama, J., Niwa, O., Haraguchi, T. & Hiraoka, Y. (2007) Gene expression and distribution of Swi6 in partial aneuploids of the fission yeast Schizosaccharomyces pombe. Cell Struct. Funct. 32, 149–161.
Forsburg, S.L. (1994) Codon usage table for Schizosaccharomyces pombe. Yeast 10, 1045–1047.
van Hemert, F.J., Berkhout, B. & Lukashov, V.V. (2007) Host-related nucleotide composition and codon usage as driving forces in the recent evolution of the Astroviridae. Virology 361, 447–454.
Ikemura, T. (1981a) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146, 1–21.
Ikemura, T. (1981b) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151, 389–409.
Ikemura, T. (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.
Kanaya, S., Yamada, Y., Kinouchi, M., Kudo, Y. & Ikemura, T. (2001) Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J. Mol. Evol. 53, 290–298.
Meintjes, P.L. & Rodrigo, A.G. (2005) Evolution of relative synonymous codon usage in Human Immunodeficiency Virus type-1. J. Bioinform. Comput. Biol. 3, 157–168.
Nakamura, Y., Gojobori, T. & Ikemura, T. (2000) Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 28, 292.
Nakao, A., Yoshihama, M. & Kenmochi, N. (2004) RPG: the Ribosomal Protein Gene database. Nucleic Acids Res. 32, D168–D170.

Codon usage bias in fission yeast

Plotkin, J.B. & Dushoff, J. (2003) Codon bias and frequency-dependent selection on the hemagglutinin epitopes of influenza A virus. Proc. Natl. Acad. Sci. USA 100, 7152-7157.
dos Reis, M. & Wernisch, L. (2009) Estimating translational selection in eukaryotic genomes. Mol. Biol. Evol. 26, 451-461.
Ren, L., Gao, G., Zhao, D., Ding, M., Luo, J. & Deng, H. (2007) Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprints with models of stem cell differentiation. Genome Biol. 8, R35.
Sharp, P.M. & Cowe, E. (1991) Synonymous codon usage in Saccharomyces cerevisiae. Yeast 7, 657-658.
Sharp, P.M., Stenico, M., Peden, J.F. & Lloyd, A.T. (1993) Codon usage: mutational bias, translational selection, or both? Biochem. Soc. Trans. 21, 835-841.
Wood, V., Gwilliam, R., Rajandream, M.A. et al. (2002) The genome sequence of Schizosaccharomyces pombe. Nature 415, 871-880.

Supporting Information/Supplementary materials

The following Supporting Information can be found in the online version of the article:
Table S1 Codon usage in ribosomal protein genes
Table S2 List of genes selected

Additional Supporting Information may be found in the online version of the article.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Received: 1 November 2008
Accepted: 15 January 2009