
[Figure: information flow in the cell. Replication (DNA pol) copies DNA, transcription (RNA pol) produces mRNA, and translation on the ribosome, with tRNAs and ARS, produces protein.]

[Figure: tRNA structure (panels A and B), showing the acceptor stem, D-loop, TψC loop, anticodon loop and variable loop.]

[Figure: relative tRNA gene copy number versus relative codon frequency, grouped into 2-box, 3-box, 4-box and 6-box codons.]

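Notation: $c$ is a codon and $a$ an amino acid. $C$ is the set of all codons, $C_a$ the set of codons for amino acid $a$, $k_a = |C_a|$ its degeneracy, and $A$ the set of amino acids. $o_{ac}$ is the observed count of codon $c$ for amino acid $a$, $\mathbf{o} = [o_1, \dots, o_{64}]$ the vector of codon counts, $o_a = \sum_{c \in C_a} o_{ac}$, $L$ the sequence length in codons, and $F_a$ a weight for amino acid $a$.

$$g_c = \frac{o_c}{\sum_{c' \in C} o_{c'}}, \qquad g_{ac} = \frac{o_{ac}}{\sum_{a' \in A}\sum_{c' \in C_{a'}} o_{a'c'}}$$

$$f_{ac} = \frac{o_{ac}}{\sum_{c' \in C_a} o_{ac'}}, \qquad r_{ac} = \frac{o_{ac}}{\frac{1}{k_a}\sum_{c' \in C_a} o_{ac'}} = \frac{k_a\,o_{ac}}{o_a}$$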
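The frequency definitions translate directly into code. The Python sketch below computes $f_{ac}$ and $r_{ac}$ for a single amino acid. It assumes the counts are passed as a list in a fixed codon order, and the function name `codon_frequencies` is hypothetical.

```python
def codon_frequencies(counts):
    """Return (f, r) for one amino acid.

    counts: observed counts o_ac of the k_a synonymous codons of amino acid a.
    f[i] = o_ac / o_a and r[i] = k_a * o_ac / o_a (relative synonymous codon usage).
    """
    k = len(counts)          # degeneracy k_a
    total = sum(counts)      # o_a
    if total == 0:
        raise ValueError("amino acid does not occur in the sequence")
    f = [o / total for o in counts]
    r = [k * o / total for o in counts]
    return f, r


# Toy alanine counts in the order GCT, GCC, GCA, GCG.
f, r = codon_frequencies([5, 4, 0, 1])  # f = [0.5, 0.4, 0.0, 0.1], r = [2.0, 1.6, 0.0, 0.4]
```

The RSCU values of an amino acid always sum to $k_a$, which makes a quick sanity check.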

$$w_{ac} = \frac{o_{ac}}{\max_{c' \in C_a} o_{ac'}}$$


$$\mathrm{CAI} = \left(\prod_{i=1}^{L} w_{c(i)}\right)^{1/L} = \left(\prod_{c \in C} w_c^{\,o_c}\right)^{1/L}$$

where $c(i)$ is the codon at position $i$ of the sequence.
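The Python sketch below computes the geometric-mean index in log space to avoid underflow on long genes. It assumes the weights come from codon counts in a reference gene set. The names `relative_adaptiveness` and `cai` are hypothetical, and so is the rule of skipping codons that have no weight (such as stop codons); neither comes from the source.

```python
import math


def relative_adaptiveness(reference):
    """w = count / max count among synonymous codons.

    reference: {amino_acid: {codon: count}} from a reference gene set.
    """
    w = {}
    for codon_counts in reference.values():
        top = max(codon_counts.values())
        if top == 0:
            continue  # amino acid absent from the reference set
        for codon, count in codon_counts.items():
            w[codon] = count / top
    return w


def cai(codons, w):
    """Geometric mean of w over the codons of a sequence, computed in log space.

    Codons missing from w are skipped (assumption). A codon with w == 0 makes the index 0.
    """
    logs = []
    for codon in codons:
        if codon not in w:
            continue
        if w[codon] == 0:
            return 0.0
        logs.append(math.log(w[codon]))
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))
```

For example, with reference counts GCT 10 and GCC 5, the weights are 1.0 and 0.5, and a gene reading GCT, GCC scores $\sqrt{0.5}$.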

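$$W_c = \sum_{t} (1 - s_{ct})\,T_{ct}, \qquad w_{ac} = \frac{W_{ac}}{\max_{c' \in C_a} W_{ac'}}$$

Here $T_{ct}$ is the gene copy number of tRNA $t$ that reads codon $c$, and $s_{ct}$ is the constraint on that codon-anticodon pairing. The index is the geometric mean $\left(\prod_{c \in C} w_c^{\,o_c}\right)^{1/\sum_{c \in C} o_c}$.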
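The tRNA-based weights can be sketched the same way. This Python example assumes the caller supplies the codon-to-tRNA assignment and the constraints $s_{ct}$; all names are hypothetical. The usage line reuses the alanine gene copy numbers given in this document (11 for tRNA AGC, 5 for tRNA UGC) and sets every $s_{ct}$ to 0 for illustration.

```python
def trna_weights(pairing, copy_numbers, s):
    """W_c = sum_t (1 - s_ct) * T_t, then w_c = W_c / max W over synonymous codons.

    pairing: {codon: [tRNA, ...]} tRNAs assumed to read each codon (hypothetical input).
    copy_numbers: {tRNA: gene copy number}.
    s: {(codon, tRNA): constraint in [0, 1]}; missing pairs default to 0.
    All codons passed in are assumed to belong to one amino acid.
    """
    W = {
        codon: sum((1 - s.get((codon, t), 0.0)) * copy_numbers[t] for t in trnas)
        for codon, trnas in pairing.items()
    }
    top = max(W.values())
    return W, {codon: value / top for codon, value in W.items()}


alanine = {"GCT": ["AGC"], "GCC": ["AGC"], "GCA": ["UGC"], "GCG": ["UGC"]}
W, w = trna_weights(alanine, {"AGC": 11, "UGC": 5}, {})
```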

$$M(\mathbf{o}) = \frac{o!}{\prod_{c \in C} o_c!}\prod_{c \in C} f_c^{\,o_c}, \qquad o = \sum_{c \in C} o_c$$
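Evaluating the multinomial probability directly overflows for realistic codon counts, so this hypothetical helper works with its logarithm through `math.lgamma`. It assumes `probs` holds the codon frequencies $f_c$ and that they sum to 1.

```python
import math


def log_multinomial(counts, probs):
    """log M(o) = log(o! / prod(o_c!) * prod(f_c ** o_c)), using lgamma to avoid overflow."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    log_p = math.lgamma(n + 1) - sum(math.lgamma(o + 1) for o in counts)
    for o, f in zip(counts, probs):
        if o > 0:
            if f == 0:
                return float("-inf")  # an observed codon with zero probability
            log_p += o * math.log(f)
    return log_p
```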

$$B_a = \sum_{c \in C_a} \frac{(o_c - e_c)^2}{e_c}$$

where $e_c$ is the expected count of codon $c$.
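A short Python sketch of $B_a$. When no expected counts are given, it assumes the uniform expectation $e_c = o_a/k_a$. Under that same assumption, dividing the sum of $B_a$ over amino acids by the total codon count gives the scaled $\chi^2$. Function names are hypothetical.

```python
def chi_square(counts, expected=None):
    """B_a = sum_c (o_c - e_c)^2 / e_c for the codons of one amino acid.

    If expected is None, the uniform expectation e_c = o_a / k_a is assumed.
    """
    total = sum(counts)
    if expected is None:
        expected = [total / len(counts)] * len(counts)
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)


def scaled_chi_square(per_amino_acid_counts):
    """(1/o) * sum_a B_a with uniform expectation; o is the total number of codons."""
    total = sum(sum(c) for c in per_amino_acid_counts)
    return sum(chi_square(c) for c in per_amino_acid_counts if sum(c) > 0) / total
```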

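With the uniform expectation $o_a/k_a$ and $o = \sum_{a} o_a$, the scaled $\chi^2$ is

$$\chi^2 = \frac{1}{o}\sum_{a \in A}\sum_{c \in C_a}\frac{\left(o_{ac} - \frac{1}{k_a}o_a\right)^2}{\frac{1}{k_a}o_a}.$$

The homozygosity of amino acid $a$ and its effective number of codons are

$$Z_a = \frac{o_a\sum_{c \in C_a} f_{ac}^2 - 1}{o_a - 1}, \qquad N_a = \frac{1}{Z_a}.$$

Let $K$ be the set of degeneracy classes and $n_k$ the number of amino acids with $k_a = k$. Then

$$N_c = \sum_{k \in K} n_k \bar N_k, \qquad \bar N_k = \frac{1}{n_k}\sum_{a:\,k_a = k} N_a.$$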
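The following Python sketch computes $Z_a$, $N_a = 1/Z_a$ and the class-averaged sum. It follows the form given here, which averages $N_a$ within each degeneracy class. Wright's original $N_c$ averages the homozygosities before inverting, so the two can differ. Skipping amino acids with fewer than two codons or with $Z_a = 0$ is an assumption, and the function names are hypothetical.

```python
def homozygosity(counts):
    """Z_a = (o_a * sum_c f_ac^2 - 1) / (o_a - 1); returns None if o_a < 2."""
    n = sum(counts)
    if n < 2:
        return None
    sum_f2 = sum((o / n) ** 2 for o in counts)
    return (n * sum_f2 - 1) / (n - 1)


def effective_number_of_codons(per_amino_acid_counts):
    """N_c = sum_k n_k * mean_{a: k_a = k} N_a with N_a = 1 / Z_a.

    per_amino_acid_counts: {amino_acid: [counts of its synonymous codons]}.
    The degeneracy k_a is len(counts). Amino acids with Z_a None or 0 are skipped;
    a class left without usable amino acids raises ValueError.
    """
    by_class = {}  # k -> list of N_a
    n_k = {}       # k -> number of amino acids with degeneracy k
    for counts in per_amino_acid_counts.values():
        k = len(counts)
        n_k[k] = n_k.get(k, 0) + 1
        z = homozygosity(counts)
        if z:
            by_class.setdefault(k, []).append(1 / z)
    total = 0.0
    for k, n in n_k.items():
        values = by_class.get(k)
        if not values:
            raise ValueError(f"no usable amino acid with degeneracy {k}")
        total += n * sum(values) / len(values)
    return total
```

When every amino acid uses a single codon, each $N_a$ is 1 and the result is 20, the lower bound.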

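The number of distinct codons used for amino acid $a$ is

$$X_a = \sum_{c \in C_a} I(o_c), \qquad I(o_c) = \begin{cases} 1 & \text{if } o_c \ge 1,\\ 0 & \text{otherwise,}\end{cases}$$

so that, for example, $\mathbf{o} = [5, 4, 0, 1]$ gives $X_a = 3$.

Suppose $n$ codons are drawn with probabilities $p_1, \dots, p_k$. The state vector $\mathbf{s}$ ranges over the $2^k - 1$ non-empty subsets of codons, and a subset is kept with probability equal to the sum of its codon probabilities, $[p_1, p_2, p_1 + p_2, p_3, \dots, p_1 + p_2 + \dots + p_k]$. With transition matrix $T$,

$$\mathbf{s}_{i+1} = \mathbf{s}_i T, \qquad \mathbf{s}_n = \mathbf{s}_1 T^{\,n-1}, \qquad T = Q\Lambda Q^{-1} \;\Rightarrow\; T^{\,n-1} = Q\Lambda^{n-1}Q^{-1},$$

and the distribution of $X_a$ is $\mathbf{s}_n D$, where the $(2^k - 1) \times k$ matrix $D$ sums the states by subset size.

[Figure: compute the new state vector, repeat until the end of the sequence, then sum the state vector.]

Distributions are combined over amino acids by convolution. With 1-based indices, $p_z$ has $|p_x| + |p_y| - 1$ entries and

$$p_z(k) = \sum_{i} p_x(i)\,p_y(k - i + 1),$$

which gives the distribution of $x = \sum_{a \in A} x_a$.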
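A Python sketch of the state-vector computation follows. It assumes codons are drawn independently with fixed probabilities and iterates the transition step directly instead of using the eigendecomposition; this yields the same $\mathbf{s}_n$ up to rounding. Subsets are stored as bit masks, so the cost grows as $2^k$, which is at most 63 states for $k_a \le 6$. Function names are hypothetical.

```python
def distinct_codon_distribution(p, n):
    """P(X = x) for x = 1..k, the number of distinct codons among n >= 1 draws.

    p: codon probabilities p_1..p_k for one amino acid (summing to 1).
    States are the 2^k - 1 non-empty codon subsets, stored as bit masks.
    """
    k = len(p)
    state = {1 << i: p[i] for i in range(k)}   # s_1: exactly one codon seen
    for _ in range(n - 1):                     # s_{i+1} = s_i T
        nxt = {}
        for mask, prob in state.items():
            for i in range(k):
                new = mask | (1 << i)          # equals mask if codon i was already used
                nxt[new] = nxt.get(new, 0.0) + prob * p[i]
        state = nxt
    dist = [0.0] * k                           # s_n D: sum states by subset size
    for mask, prob in state.items():
        dist[bin(mask).count("1") - 1] += prob
    return dist


def convolve(px, py):
    """Distribution of a sum of two independent counts.

    If px[0] and py[0] hold P(X = 1) and P(Y = 1), then pz[m] holds P(X + Y = m + 2).
    The result has len(px) + len(py) - 1 entries.
    """
    pz = [0.0] * (len(px) + len(py) - 1)
    for i, a in enumerate(px):
        for j, b in enumerate(py):
            pz[i + j] += a * b
    return pz
```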

$$H_a = -\sum_{c \in C_a} f_{ac}\log_2 f_{ac}, \qquad \max(H_a) = \log_2 k_a$$

$$E_a = \frac{\max(H_a) - H_a}{\max(H_a)} = \frac{\log_2 k_a - H_a}{\log_2 k_a}, \qquad E = \sum_{a \in A} F_a E_a$$
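Here is the entropy-based measure as a hypothetical Python sketch. $E_a$ is 0 for uniform usage and 1 when only one codon is used. The caller passes in the weights $F_a$.

```python
import math


def entropy_bias(counts):
    """E_a = (log2 k_a - H_a) / log2 k_a with H_a = -sum_c f_ac log2 f_ac (needs k_a >= 2)."""
    k = len(counts)
    total = sum(counts)
    if k < 2 or total == 0:
        raise ValueError("need a degenerate amino acid that occurs in the sequence")
    h = -sum((o / total) * math.log2(o / total) for o in counts if o > 0)
    h_max = math.log2(k)
    return (h_max - h) / h_max


def weighted_bias(per_amino_acid, weights):
    """E = sum_a F_a E_a; weights must contain every amino acid in per_amino_acid."""
    return sum(weights[a] * entropy_bias(c) for a, c in per_amino_acid.items())
```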


$$M_a = 2\sum_{c \in C_a} o_{ac}\ln\frac{o_{ac}}{e_{ac}}$$

where $e_{ac}$ is the expected count of codon $c$.
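A sketch of $M_a$ in Python. The uniform default $e_{ac} = o_a/k_a$ is an assumption, since the formula only names $e_{ac}$. Zero counts contribute nothing, following $\lim_{o \to 0} o \ln o = 0$. The function name is hypothetical.

```python
import math


def g_statistic(counts, expected=None):
    """M_a = 2 * sum_c o_ac * ln(o_ac / e_ac); terms with o_ac = 0 contribute 0.

    If expected is None, the uniform expectation e_ac = o_a / k_a is assumed.
    """
    total = sum(counts)
    if expected is None:
        expected = [total / len(counts)] * len(counts)
    return 2 * sum(o * math.log(o / e) for o, e in zip(counts, expected) if o > 0)
```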

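$$S_a = \frac{1}{k_a(k_a - 1)}\sum_{c \in C_a}(r_{ac} - 1)^2, \qquad S = \sum_{a \in A} F_a S_a,$$

for example with equal weights $F_a = 1/18$ over the 18 amino acids with $k_a > 1$.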
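$S_a$ can be checked against its bounds. With $r_{ac} = k_a o_{ac}/o_a$ it equals 0 for uniform usage. It equals 1 when a single codon is used, because then $\sum_c (r_{ac} - 1)^2 = (k_a - 1)^2 + (k_a - 1) = k_a(k_a - 1)$. The Python sketch below uses equal weights, which matches $F_a = 1/18$ when all 18 degenerate amino acids are passed. Names are hypothetical.

```python
def rscu_bias(counts):
    """S_a = sum_c (r_ac - 1)^2 / (k_a (k_a - 1)) with r_ac = k_a o_ac / o_a."""
    k = len(counts)
    total = sum(counts)
    if k < 2 or total == 0:
        raise ValueError("need a degenerate amino acid that occurs in the sequence")
    r = [k * o / total for o in counts]
    return sum((x - 1) ** 2 for x in r) / (k * (k - 1))


def mean_rscu_bias(per_amino_acid):
    """S = sum_a F_a S_a with equal weights F_a = 1 / (number of amino acids given)."""
    values = [rscu_bias(c) for c in per_amino_acid.values()]
    return sum(values) / len(values)
```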

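$$v(c) = \sum_{i=1}^{9} d\big(A(c_i), A(c)\big)$$

Here $c_1, \dots, c_9$ are the nine codons that differ from $c$ by a single nucleotide, $A(c)$ is the amino acid encoded by $c$, and $d$ is a distance between amino acids.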
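To make $v(c)$ concrete, this Python sketch enumerates the nine single-nucleotide neighbours of a codon under the standard genetic code. The default distance $d$ (1 if the encoded amino acid or stop changes, 0 otherwise) is a placeholder assumption, because the source does not state which distance is used. Function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def neighbours(codon):
    """The nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def mutational_cost(codon, d=lambda a, b: 0 if a == b else 1):
    """v(c) = sum over the nine neighbours c_i of d(A(c_i), A(c)).

    The default d is a placeholder (1 for any amino acid or stop change, else 0).
    """
    return sum(d(CODE[n], CODE[codon]) for n in neighbours(codon))
```

Under this placeholder distance, every neighbour of TGG (Trp) changes the amino acid, while CTG (Leu) has four synonymous neighbours.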


[Figure: normalized mean, $(x - \bar x)/s_x$, and coefficient of variation, $s_x/\bar x$ (log scale in the later panels), of CAI, Fop, CBI and Nc. The x-axes are GC content (0 to 100), length (0 to 500), fraction of 4- and 6-fold degenerate codons, degree of codon discrepancy, and degree of amino acid discrepancy.]

With $d = 1/2$, the weights $\{(1/2)^0, (1/2)^1, (1/2)^2, (1/2)^3\}$ normalize to $\{8/15, 4/15, 2/15, 1/15\}$. Per-amino-acid measures are combined as $Y = \sum_{a \in A} F_a Y_a$.
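Below, the normalizations used in the figure and the discrepancy weights appear as a Python sketch. Using the sample standard deviation for $s_x$ is an assumption. `Fraction` keeps the weights exact. Names are hypothetical.

```python
from fractions import Fraction
from statistics import mean, stdev


def normalized(values):
    """(x - mean) / s for each value, with s the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


def coefficient_of_variation(values):
    """s / mean, with s the sample standard deviation."""
    return stdev(values) / mean(values)


def geometric_weights(d, n):
    """Weights d^0, ..., d^(n-1) rescaled to sum to 1, as exact fractions."""
    raw = [Fraction(d) ** i for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]
```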

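[Figure: transcription produces mRNA and translation produces protein; mRNA decay and protein turnover remove them. Synthesis and degradation rates are $k_s$ and $k_d$.]

$$\frac{d[\text{Protein}]}{dt} = k_s[\text{mRNA}] - k_d[\text{Protein}]$$

At steady state $k_s[\text{mRNA}] = k_d[\text{Protein}]$, so

$$k_s = k_d\,\frac{[\text{Protein}]}{[\text{mRNA}]}, \qquad k_d = \frac{\ln 2}{t_{1/2}}.$$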
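The steady-state relation can be checked numerically with this hypothetical Python sketch. `synthesis_rate` solves for $k_s$ from measured levels and a protein half-life. `simulate` integrates the rate equation with a constant mRNA level, showing that the protein level settles at $k_s[\text{mRNA}]/k_d$.

```python
import math


def degradation_rate(half_life):
    """k_d = ln 2 / t_1/2, in the inverse of the half-life's time unit."""
    return math.log(2) / half_life


def synthesis_rate(protein, mrna, half_life):
    """Steady state k_s [mRNA] = k_d [Protein], hence k_s = k_d [Protein] / [mRNA]."""
    return degradation_rate(half_life) * protein / mrna


def simulate(ks, kd, mrna, protein0=0.0, dt=0.01, steps=20000):
    """Euler integration of d[P]/dt = ks [mRNA] - kd [P] with constant [mRNA]."""
    p = protein0
    for _ in range(steps):
        p += dt * (ks * mrna - kd * p)
    return p
```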

[Figure: (A) bias towards codon reuse, in standard deviations, versus distance between codons (number of intervening amino acids); (B to D) frequency distributions of the bias, in standard deviations.]

[Figure: slow translation (GFP1) versus rapid translation (GFP2).]

[Figure: percent deviation from expected versus distance between codons (number of intervening amino acids, 0 to 50) for alanine, arginine, glycine, isoleucine, leucine, proline, serine, threonine, valine and all amino acids. Each panel compares the normal autocorrelation with sequences shuffled within gene and within genome.]

[Figure: percent deviation from expected versus distance between codons (number of intervening amino acids, 0 to 30) in S. cerevisiae, C. glabrata, A. gossypii, S. pombe, D. melanogaster, C. elegans, A. thaliana and H. sapiens.]

[Figure: the genetic code as two mappings. The 61 codons in mRNA are read by 23 to 45 tRNAs (anticodon-codon mapping), which are charged with 20 amino acids (AA-tRNA charging).]

[Figure: codon-anticodon pairing (panels A and B), including pairing with inosine, A(I), in the anticodon.]

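[Figure: two-state hidden Markov model for alanine. The hidden states are tRNA AGC and tRNA UGC, with transition probabilities $t_{11}, t_{12}, t_{21}, t_{22}$ and emission probabilities $e_{1j}, e_{2j}$ for the codons GCU, GCC, GCA, GCG.]

The model has emission matrix $E = (e_{ij})$ with $\sum_j e_{ij} = 1$, transition matrix $T = (t_{ij})$ with $\sum_j t_{ij} = 1$, and initial distribution $\pi = [\pi_1, \pi_2]$, so $\lambda = \{E, T, \pi\}$. For observation sequences $O_i$,

$$P = \prod_i P(O_i \mid \lambda), \qquad P(\lambda \mid O) = \frac{P(O \mid \lambda)\,P(\lambda)}{P(O)}.$$

A codon read by several tRNAs is split between them as $n_c^{(i)} = n_c\,r_i / \sum_t r_t$, with the sum over the tRNAs $t$ that read $c$; for example, $(1/4)/(1/4 + 1/3) = 3/7$. The reading of tRNA $i$ is $x_i = \sum_c n_c^{(i)}$, the tRNA gene copy number is modeled as $\gamma x_i^2 + \epsilon_i$, and scores are standardized as $Z = (X - E[X])/\sigma_X$.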
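Computing $P(O \mid \lambda)$ for such a model uses the forward algorithm. In this Python sketch the two states could stand for tRNA AGC and tRNA UGC, and the symbols 0 to 3 for GCU, GCC, GCA, GCG. All parameter values and names are illustrative. For long sequences, the forward variables should be rescaled or kept in log space to avoid underflow.

```python
def forward(observations, pi, T, E):
    """P(O | lambda) for a discrete HMM with lambda = {E, T, pi}.

    observations: emitted symbol indices.
    pi[i]: initial probability of state i; T[i][j]: transition i -> j (rows sum to 1);
    E[i][o]: probability that state i emits symbol o (rows sum to 1).
    """
    n = len(pi)
    alpha = [pi[i] * E[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * T[i][j] for i in range(n)) * E[j][o] for j in range(n)]
    return sum(alpha)


def likelihood(sequences, pi, T, E):
    """P = prod_i P(O_i | lambda) over independent observation sequences."""
    total = 1.0
    for seq in sequences:
        total *= forward(seq, pi, T, E)
    return total
```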

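Alanine example (tRNA AGC: 11 gene copies; tRNA UGC: 5 gene copies):
GCU: 58952 codons, read by tRNA AGC
GCC: 35580 codons, read by tRNA AGC
GCA: 47988 codons, read by tRNA UGC
GCG: 18336 codons, read by tRNA UGC

Reading of tRNA AGC: 58952 + 35580 = 94532. Reading of tRNA UGC: 47988 + 18336 = 66324. Fit $\gamma x^2 + e$: $R^2 = 0.9995$, $p = 0.0102$. Score: $s = \sum_{i,j} C_{ij}X_{ij}$ with $C_{ij} = \pm 1$.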
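The tRNA reading in the alanine example is a weighted sum of codon counts. This Python sketch reproduces the totals 94532 and 66324 from the alanine codon counts. The 0/1 assignment could also hold fractional weights for codons shared between tRNAs. Names are hypothetical.

```python
CODON_COUNTS = {"GCU": 58952, "GCC": 35580, "GCA": 47988, "GCG": 18336}
# Codon-to-tRNA assignment from the alanine example (1 = codon read by that tRNA).
READS = {
    "AGC": {"GCU": 1, "GCC": 1, "GCA": 0, "GCG": 0},
    "UGC": {"GCU": 0, "GCC": 0, "GCA": 1, "GCG": 1},
}


def reading(codon_counts, reads):
    """Total codon count read by each tRNA: x_t = sum_c R_tc * o_c."""
    return {t: sum(w * codon_counts[c] for c, w in row.items()) for t, row in reads.items()}


print(reading(CODON_COUNTS, READS))  # {'AGC': 94532, 'UGC': 66324}
```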

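Consecutive codon pairs for alanine (rows: leading codon; columns: consecutive codon GCU, GCC, GCA, GCG):
GCU: 11.0, 1.3, -8.7, -6.9
GCC: 0.8, 6.8, -6.4, -1.4
GCA: -8.2, -6.2, 11.5, 5.4
GCG: -7.1, -0.8, 4.8, 5.7

Scores are normalized as $s_n = (s - s_{\min})/(s_{\max} - s_{\min})$.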

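$$\hat p = \frac{r + 1}{n + 1}$$

where $n$ is the number of random samples and $r$ is the number of those that are at least as extreme as the observed value.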
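The estimator $\hat p = (r+1)/(n+1)$ is the usual empirical p-value from randomization. This Python sketch assumes a one-sided test in which larger statistics are more extreme; `statistic` and the other names are hypothetical.

```python
import random


def empirical_p_value(observed, statistic, data, n=9999, seed=0):
    """p = (r + 1) / (n + 1), where r of n shuffles score at least as high as observed."""
    rng = random.Random(seed)
    shuffled = list(data)
    r = 0
    for _ in range(n):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            r += 1
    return (r + 1) / (n + 1)
```

Adding one to both counts keeps the estimate above zero, because the observed arrangement is itself one of the possible orderings.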

[Figure: normalized score (0 to 1) of the CC, REG and HMM methods.]


[Table: number of predictions and difference to random (+/-) for the HMM, REG and CC methods.]


[Figure: genetic code table annotated with the anticodons of the decoding tRNAs and their modified nucleosides (e.g. I, ψ, Cm, Gm, m5C, ncm5U, mcm5U, mcm5s2U). Legend: anticodon A, G, U, C; Pyr, Pur; Ile; 4-box; Gly.]

[Figure: ortholog inference pipeline. All-against-all comparison → candidate pairs → formation of stable pairs → stable pairs → verification of stable pairs → verified pairs → clustering of orthologs → orthologous groups. Group pairs and broken pairs are also shown.]


[Figure: effect of the length tolerance $l$ (0.5 to 1) on the number of orthologous relations ($10^6$), the fraction of genes that pass the triangle test (99.8% to 100%), and the fraction of genes with the same number of domains (domain test, 90% to 100%).]

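[Table: ortholog inference criteria by type (score: BBH, RBH; distance: RSD, SP) and by use of tolerance (no tolerance, tolerance).]

With tolerance, a distance $d_j$ counts as larger than the smallest distance $d^*$ only if

$$d_j - d^* > k\sqrt{\sigma^2(d_j - d^*)}, \qquad \sigma^2(d_j - d^*) = \sigma^2(d_j) + \sigma^2(d^*) - 2\,\mathrm{cov}(d_j, d^*).$$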
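Here is the tolerance test as a Python sketch. The helper `within_tolerance` keeps every candidate whose distance cannot be distinguished from the smallest one. It illustrates how such a test can be applied and is not the source's exact procedure. Using a single shared covariance for all comparisons is a simplifying assumption, and all names are hypothetical.

```python
import math


def significantly_larger(d_j, d_star, var_j, var_star, cov, k):
    """True if d_j - d* > k * sqrt(var(d_j) + var(d*) - 2 cov(d_j, d*))."""
    var_diff = var_j + var_star - 2 * cov
    return d_j - d_star > k * math.sqrt(max(var_diff, 0.0))


def within_tolerance(candidates, k, cov=0.0):
    """Names of candidates whose distance is not significantly larger than the minimum d*.

    candidates: list of (name, distance, variance).
    """
    _, d_star, var_star = min(candidates, key=lambda c: c[1])
    return [name for name, d, var in candidates
            if not significantly_larger(d, d_star, var, var_star, cov, k)]
```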

[Figure: panels A to D, tree configurations relating $x$, $y_1$, $y_2$ and $z$, including the case $d > 0$.]

[Figure: fraction of stable pairs (SP) passing the test (88% to 90%) versus SP tolerance (1.4 to 2.4), for $l = 0.50, 0.55, 0.60, 0.65, 0.70$. Panels A to C show configurations with distances $d_{x_1 y_2}$, $d_{x_1 z_2}$, $d_{y_2 z_1}$, $d_{x_1 z_1}$ and $d_{y_2 z_2}$.]

[Figure: fraction of verified pairs (VP) passing the test (95.8% to 97.2%) versus VP tolerance (0.5 to 2.5), for $(l, k_{SP}) = (0.61, 1.81)$, $(0.72, 1.67)$ and $(0.58, 1.96)$.]

[Figure: panels A and B, example with sequences $w_1$, $x_1$, $y_2$, $z_1$, $z_2$ and pairwise distances between 200 and 1000.]

$$\binom{n}{2}$$

[Figure: relative amount (%) of paralogs and orthologs by type of connection: AP (all pairs), CP (candidate pairs), SP (stable pairs), VP (verified pairs), GP (group pairs) and BP (broken pairs, SP minus VP).]

[Figure: number of members (log scale) versus group size.]

Class: genomes; orthologs; average group size
All: 550; 302596; 5.52
Bacteria: 444; 145255; 7.20
Firmicutes: 116; 28109; 7.67
Eukaryota: 72; 157302; 4.11
Archaea: 51; 15622; 4.32
Vertebrates: 32; 80123; 5.46
Mammalia: 25; 58982; 5.75

[Figure: standard genetic code by codon position (1st, 2nd, 3rd), with amino acids grouped by degeneracy into 6-box, 4-box, 3-box, 2-box and 1-box classes.]
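The degeneracy classes can be derived from the standard genetic code. This Python sketch counts the number of codons $k_a$ for each amino acid and the number of amino acids $n_k$ in each class.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# k_a: number of codons per amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODE.values() if aa != "*")
# n_k: number of amino acids in each degeneracy class.
class_sizes = Counter(degeneracy.values())
print(sorted(class_sizes.items()))  # [(1, 2), (2, 9), (3, 1), (4, 5), (6, 3)]
```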

[Figure: circular representation of the genetic code.]

[Table: amino acid assignments by first and second codon position under genetic codes 1, 2, 3, 4, 5, 6, 9, 10, 12, 13, 14, 15, 16, 21, 22 and 23.]

[Figure: structures of the nucleobases and the IUPAC nucleotide codes.]

A adenine; C cytosine; G guanine; T thymine
R purine (A, G); Y pyrimidine (C, T)
S strong (C, G); W weak (A, T)
M amino (A, C); K keto (G, T)
H not G; V not T; D not C; B not A
N any
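A small Python sketch for working with the IUPAC codes, for example to expand a degenerate codon such as GCN into the four alanine codons. The function names are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "M": "AC", "K": "GT",
    "H": "ACT", "V": "ACG", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def matches(pattern, sequence):
    """True if every base of `sequence` is allowed by the IUPAC code at the same position."""
    return len(pattern) == len(sequence) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), sequence.upper())
    )


def expand(pattern):
    """All concrete sequences described by an IUPAC pattern."""
    results = [""]
    for code in pattern.upper():
        results = [prefix + base for prefix in results for base in IUPAC[code]]
    return results
```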

Amino acids by side chain:
Electrically charged side chains. Positive: arginine, histidine, lysine. Negative: aspartic acid, glutamic acid.
Polar uncharged side chains: serine, threonine, asparagine, glutamine.
Special cases: cysteine, selenocysteine, glycine, proline, pyrrolysine.
Hydrophobic side chains: alanine, valine, leucine, isoleucine, methionine, phenylalanine, tyrosine, tryptophan.


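[Figure: computing the TPI for an example amino acid sequence A A A R M R R A V C V V C V A R (valine, arginine and alanine shown). Count the number of changes, then calculate the distribution of changes; $\mathrm{TPI} = L - R$ with $L + R = 1$, where $L$ and $R$ are probabilities taken from that distribution.]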
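Here is a Monte Carlo sketch of the index in Python. It counts changes in a sequence of tRNA labels and compares the count with random reorderings of the same labels. Three choices are assumptions: the mid-p convention for $L$ and $R$, the use of shuffling instead of an exact distribution, and all names.

```python
import random


def count_changes(labels):
    """Number of positions where the label differs from the previous one."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def pairing_index(labels, n=10000, seed=0):
    """TPI = L - R with L + R = 1, estimated from n shuffles of the same labels.

    Assumed convention (mid-p): L = P(X > x) + P(X = x) / 2 and R = 1 - L, where x is
    the observed number of changes and X the number in a random order.
    Positive values mean fewer changes (more reuse) than expected by chance.
    """
    rng = random.Random(seed)
    observed = count_changes(labels)
    shuffled = list(labels)
    greater = equal = 0
    for _ in range(n):
        rng.shuffle(shuffled)
        c = count_changes(shuffled)
        greater += c > observed
        equal += c == observed
    left = (greater + equal / 2) / n
    return left - (1 - left)
```

For example, `pairing_index(list("AAAABBBB"))` is close to +1 (one change, strong reuse), and `pairing_index(list("ABABABAB"))` is close to -1.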

[Figure: panels A to C. TPI constructs GFP1GFP1, GFP1GFP2, GFP2GFP1 and GFP2GFP2 (with GFP and 2GFP markers); GFP intensity (arbitrary units) versus position on gel; velocity ratio of correlated versus anti-correlated constructs.]

[Figure: construction of the observable sequence. From the amino acid sequence MGCANLVSRLENNSRLLNRDLIAVTIGAIVYKDPHAGALRS..., the subsequence of consecutive synonymous codons (GCA GCA GCT GCG GCC...) becomes the observable output sequence 1, 1, 4, 3, 2,..., which is summarized in a count matrix of consecutive codons.]
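The encoding in the figure maps GCA, GCC, GCG, GCT to 1, 2, 3, 4, which is alphabetical order, and can be reproduced in Python. This sketch assumes an in-frame coding sequence written with T and uses hypothetical function names.

```python
from collections import Counter

ALANINE = sorted(["GCT", "GCC", "GCA", "GCG"])  # GCA, GCC, GCG, GCT -> 1, 2, 3, 4


def observable_sequence(cds, codons=ALANINE):
    """1-based indices of the given synonymous codons, in reading-frame order."""
    index = {c: i + 1 for i, c in enumerate(codons)}
    triplets = (cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3))
    return [index[t] for t in triplets if t in index]


def consecutive_counts(symbols):
    """Count matrix of consecutive pairs, keyed as (previous, next)."""
    return Counter(zip(symbols, symbols[1:]))
```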

[Figure: tRNA gene copy number versus codon frequency, one panel per amino acid; labeled codons in parentheses.]

Alanine (GCA, GCT): R² = 0.9993, p = 0.0118
Arginine (CGT, CGG, AGG, AGA): R² = 0.9303, p = 0.0052
Glutamine (CAG, CAA): R² = 0.9755, p = 0.0706
Glutamic acid (GAG, GAA): R² = 0.9962, p = 0.0277
Glycine (GGA, GGG, GGC): R² = 0.9238, p = 0.0257
Isoleucine (ATA, ATT): R² = 1, p = 0.0016
Leucine (CTA, CTC, TTG, TTA): R² = 0.9179, p = 0.0066
Lysine (AAG, AAA): R² = 0.7774, p = 0.2165
Proline (CCT, CCA): R² = 0.9997, p = 0.0073
Serine (TCA, AGC, TCG, TCT): R² = 0.9581, p = 0.0024
Threonine (ACA, ACG, ACT): R² = 0.9963, p = 0.0013
Valine (GTG, GTA, GTT): R² = 0.9978, p = 7e-04

[Table: scores for pairs of consecutive synonymous codons, per amino acid. Rows and columns follow the codon order given for each amino acid.]

Alanine (GCT, GCC, GCA, GCG):
GCT: 11, 0.8, -8.2, -7
GCC: 1.3, 6.8, -6.4, -1.4
GCA: -8.7, -6.3, 11.6, 5.4
GCG: -6.9, -0.9, 4.8, 5.8

Arginine (CGT, CGC, CGA, CGG, AGA, AGG):
CGT: 13.4, 2.5, -2.5, -3.7, -1.7, -6.8
CGC: 3.4, 8.5, 2.1, 9, -8.4, -0.1
CGA: -3.3, 4.6, 9.3, 7.5, -7.1, 2
CGG: -0.7, 5.3, 4.9, 8.7, -7.1, 1.5
AGA: -3.5, -8.8, -7, -8, 12, -0.9
AGG: -5.5, 1.4, 3.6, 1.8, -3, 5.1

Glycine (GGT, GGC, GGA, GGG):
GGT: 26.8, -11.8, -17.6, -9.2
GGC: -11.1, 7.8, 5.2, 3.9
GGA: -17.2, 6.2, 14, 5.2
GGG: -10.5, 3.8, 7.3, 5.3

Proline (CCT, CCC, CCA, CCG):
CCT: 3.7, 0.2, -2.8, -1
CCC: 1.3, 6.7, -7, 2.8
CCA: -4.5, -7.1, 11, -3.9
CCG: 0.5, 4.7, -6.4, 5.3

Leucine (CTT, CTC, CTA, CTG, TTA, TTG):
CTT: 7.9, 4.6, -1.6, -4.1, 0.6, -4.5
CTC: 4.3, 10.6, -1.4, 4.4, -5.1, -4.6
CTA: -0.2, -0.3, 4, 0.8, 0.9, -4
CTG: 1, 3.1, 1.8, 9.7, -6, -3.6
TTA: -0.3, -3.1, 1.9, -3.5, 7.4, -4.8
TTG: -7.7, -6.9, -4.3, -2.5, -2.1, 15.5

Serine (TCT, TCC, TCA, TCG, AGT, AGC):
TCT: 12.6, 6, -0.5, -2.8, -9.7, -11
TCC: 5.4, 7.3, -4, -1.7, -3.7, -5.2
TCA: -4.3, -3.5, 9.4, 2.3, -0.3, -4
TCG: -4.8, 0.4, 1.2, 6.9, -2.9, 2.3
AGT: -7.1, -5.8, -1.8, -2.4, 10.5, 9.7
AGC: -6.3, -6.3, -6.4, -0.5, 9.7, 14.5

Threonine (ACT, ACC, ACA, ACG):
ACT: 6.1, 4.9, -6.7, -5.6
ACC: 3.6, 7.1, -6.5, -4.6
ACA: -5.5, -8, 8.6, 5.6
ACG: -5.8, -4.4, 5.6, 6.1

Valine (GTT, GTC, GTA, GTG):
GTT: 9, 2.1, -6.5, -7.5
GTC: 2, 7.5, -6.8, -3.2
GTA: -6.5, -6.6, 10.9, 4.1
GTG: -7.3, -3.6, 4.3, 9.2

[Figure: normalized scores (0 to 1) of the HMM, REG and CC methods for alanine, arginine, glutamine, glutamic acid, glycine, isoleucine, leucine, lysine, proline, serine, threonine and valine.]
