Sequence comparison, Part I: Substitution and Scores
- Mervyn Hensley
- 7 years ago
David H. Ardell, Docent of Bioinformatics
Outline of the lecture
- Convergence and Divergence
- Similarity and Homology
- Percent Difference as Evolutionary Distance
- Mutations and Substitutions
- Hidden change in sequences
- Poisson Correction
- Substitution Matrices
- Odds, Likelihood Ratios, Log-Likelihoods, Scores
- Sequence similarity scores
- Score Matrices: PAM and BLOSUM
- DNA Matrices
Section: Convergence and Divergence; Similarity and Homology
Homology: common descent (Darwin, 1859). The original definition was "the same organ in different animals under every variety of form and function" (Owen, 1843). But homology need not imply similarity of form or function, because of divergence, and similarity need not imply homology, because of convergence.
[Figure: divergence and convergence among lineages, with a most recent common ancestor and an earlier common ancestor.]
Morphology vs. sequences
[Figure: example sequences GCCACTTTCGCGATCA, GAAACGTTCGTGATCG and GGCAGTTTCGCGATTT; in a second panel, two lineages both reach GGCAGATTCAGGATTT.]
Morphology: convergence is more common. DNA sequences: convergence is very rare!
Why sequence convergence is rare: many genotypes code for the same phenotype.
[Figure: genotypes GCCACTTTCGCGATCA, GAAACGTTCGTGATCG and GGCAGATTCAGGATTT; through development, divergent genotypes can produce a convergent phenotype.]
The enormity of sequence space: DNA (alphabet size $a = 4$)
For sequences of length $L$ there are $N = a^L$ possible sequences and $K = NL(a-1)$ single-substitution steps between them (each sequence has $L(a-1)$ one-step neighbors).
L = 1: A, G, T, C. $N = 4^1 = 4$, $K = 12$.
L = 2: AT, AA, AG, AC, GA, GT, GG, GC, TA, TT, TG, TC, CA, CT, CG, CC. $N = 4^2 = 16$, $K = 96$.
L = 3: AAA, ATA, AGA, GAA, GTA, ACA, GGA, GCA, AAG, ATG, AGG, GAG, GTG, ACG, GGG, GCG, TAA, TGA, CAA, CGA, TAG, TGG, CAG, CGG, TTA, TCA, CTA, CCA, TTG, TCG, CTG, CCG, AAT, AGT, GAT, GGT, AAC, AGC, GAC, GGC, ATT, GTT, ACT, GCT, ATC, GTC, ACC, GCC, TAT, TGT, CAT, CGT, TAC, TGC, CAC, CGC, TTT, TCT, CTT, CCT, TTC, TCC, CTC, CCC. $N = 4^3 = 64$, $K = 576$.
L = 300: $N = 4^{300} \approx 4.15 \times 10^{180}$, $K \approx 3.73 \times 10^{183}$.
The probability of two independent, randomly evolving sequences converging over anything but very short lengths is infinitesimally small.
Similarity implies homology: sequences more similar than expected by chance are therefore inferred to have evolved from a common ancestor.
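The counts above follow directly from $N = a^L$ and $K = NL(a-1)$. The following is a minimal Python sketch; the function names are illustrative and do not come from any library. It checks $K$ by brute force for short DNA sequences and reproduces the $L = 300$ magnitudes:

```python
from itertools import product

def sequence_space(a: int, length: int) -> tuple[int, int]:
    """Return (N, K): N = a**L sequences, K = N * L * (a - 1) one-step moves."""
    n = a ** length
    return n, n * length * (a - 1)

def enumerate_dna(length: int) -> list[str]:
    """Every DNA sequence of the given length (only sensible for small L)."""
    return ["".join(p) for p in product("AGTC", repeat=length)]

def one_step_neighbors(seq: str, alphabet: str = "AGTC") -> list[str]:
    """All sequences differing from seq at exactly one site."""
    out = []
    for i, base in enumerate(seq):
        for other in alphabet:
            if other != base:
                out.append(seq[:i] + other + seq[i + 1:])
    return out

for L in (1, 2, 3):
    N, K = sequence_space(4, L)
    seqs = enumerate_dna(L)
    # Brute-force check: K is the total number of one-step neighbors.
    assert len(seqs) == N
    assert sum(len(one_step_neighbors(s)) for s in seqs) == K
    print(L, N, K)

N300, K300 = sequence_space(4, 300)   # exact integers; Python ints do not overflow
print(f"L = 300: N ~ {float(N300):.2e}, K ~ {float(K300):.2e}")
```

The same function with `a = 20` gives the protein numbers used later in the lecture.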
Similarity implies homology for sequences
Similar morphologies need not imply homology, because of convergence. Similar sequences do imply homology, because convergence is improbable.
GCCACGTTCGCGATCG
GGCAGTCTCGCGATTT
Section: Percent Difference as Evolutionary Distance
Homologous DNA sequences
GCCACGTTCGCGATCG
GGCAGTCTCGCGATTT
Significantly similar sequences (such as hits from a BLAST search) are inferred to have come from a common ancestor. All the differences we see between homologs must have evolved since they diverged.
[Figure: genealogy from the ancestral sequence GCCAGTTTCGCGATCT (T0) through intermediate sequences (GCCAGTTTCGCGATCG, GCCAGTTTCGCGATTA, GCCAGGTTCGTGATCG, GCCAGTCTCGCGATTA; times T1 to T6) to the two present-day homologs GCCACGTTCGCGATCG and GGCAGTCTCGCGATTT (Tnow).]
Bases aligned in the same column are homologous bases at a site.
Rate of evolution: changes per unit time (or per generation), per sequence and per site. Two homologs, GCCACGTTCGCGATCG and GGCAGTCTCGCGATTT, descend from an ancestral sequence (T0) over time t to the present (Tnow): 6 differences per 16 sites per 2 sequences = (6/16)/2 = 18.75% per time t.
Why divide by two? To estimate how one sequence changes over time. Along the lineage from the ancestor GCCAGTTTCGCGATCT (T0) through GCCAGTTTCGCGATTA and GCCAGTCTCGCGATTA to GGCAGTCTCGCGATTT (Tnow), there are 3 differences per 16 sites = 3/16 = 18.75% per time t.
We usually don't know ancestral sequences, so we compare present-day sequences to infer evolutionary changes. For GCCACGTTCGCGATCG vs. GGCAGTCTCGCGATTT: 6 differences per 16 sites per 2 sequences = (6/16)/2 = 18.75% per time t.
We usually don't know how much time has passed either, so we express evolutionary distance as rate × time: (6/16)/2 = 18.75% divergence.
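The per-site divergence arithmetic can be written as a short Python sketch. The sequences are the ones from the slides, and the function names are illustrative:

```python
def count_differences(s1: str, s2: str) -> int:
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(s1, s2) if x != y)

def divergence_from_ancestor(ancestor: str, descendant: str) -> float:
    """Differences per site along a single lineage."""
    return count_differences(ancestor, descendant) / len(ancestor)

def divergence_between_homologs(s1: str, s2: str) -> float:
    """(differences / sites) / 2, because two lineages produced the differences."""
    return count_differences(s1, s2) / len(s1) / 2

ancestor = "GCCAGTTTCGCGATCT"
homolog_1 = "GCCACGTTCGCGATCG"
homolog_2 = "GGCAGTCTCGCGATTT"

print(count_differences(homolog_1, homolog_2))            # 6
print(divergence_between_homologs(homolog_1, homolog_2))  # 0.1875
print(divergence_from_ancestor(ancestor, homolog_2))      # 0.1875
```

Both routes give 18.75% here because each lineage happens to carry 3 of the 6 differences. In general, halving the pairwise count gives the average change per lineage.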
There may thus exist a molecular evolutionary clock (Zuckerkandl & Pauling, 1965).
[Figure: % amino acid differences vs. approximate duplication dates (mya) from vertebrate fossil records, for divergence between α and β or γ and for divergence between β and γ.]
Different protein clocks tick at different rates.
A given large divergence can be reached with a fast rate over a short time or with a slow rate over a long time.
Section: Mutations and Substitutions
Q: What is a substitution?
A: A substitution is the fixation of a mutation in a population; it has been accepted by natural selection.
[Figure: a population of 5 individuals over generations t = 1 to t = 4; 2 mutations appear at t = 2, and by t = 4 there is 1 substitution.]
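The difference between a mutation and a substitution can be made concrete with a minimal Python sketch. It follows a single new mutant copy in a haploid Wright-Fisher population of 5 individuals, the size used on the slides. The sketch assumes the mutation is selectively neutral (pure drift, no selection), which simplifies the "accepted by natural selection" wording above. Under neutrality, the expected fraction of new mutations that become substitutions is $1/5$:

```python
import random

def new_mutation_fixes(pop_size: int, rng: random.Random) -> bool:
    """Follow one new neutral mutant copy until it is lost (False) or fixed (True).

    Haploid Wright-Fisher model: every individual in the next generation
    copies a parent chosen uniformly at random from the current generation.
    """
    mutants = 1
    while 0 < mutants < pop_size:
        freq = mutants / pop_size
        mutants = sum(1 for _ in range(pop_size) if rng.random() < freq)
    return mutants == pop_size

def fixation_fraction(pop_size: int, replicates: int, seed: int = 1) -> float:
    """Fraction of new mutations that become substitutions (reach fixation)."""
    rng = random.Random(seed)
    fixed = sum(new_mutation_fixes(pop_size, rng) for _ in range(replicates))
    return fixed / replicates

# Expected value under neutrality: 1 / pop_size = 0.2
print(fixation_fraction(pop_size=5, replicates=20_000))
```

Most new mutations are lost within a few generations. Only the minority that fix would show up as differences between species.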
Sequence differences between species are often assumed to be substitutions (fixed differences).
[Figure: an ancestor giving rise to species 1 and species 2.]
Section: Hidden change in sequences
Percent identity (100 − % differences) underestimates evolutionary divergence!
[Figure: % amino acid differences vs. approximate duplication dates (mya) from vertebrate fossil records.]
Why percent identity (%ID) underestimates divergence
The more sequences evolve, the more changes we miss (changes are counted relative to the ancestor):
- Multiple changes can hit the same site: 3 changes, 2 differences.
- Back changes can undo earlier changes: 4 changes, 1 difference.
- Parallel changes hide evolution: 6 changes, 1 difference.
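Hidden change can be demonstrated with a small simulation. This Python sketch makes a simplifying assumption: every substitution hits a uniformly random site and changes it to one of the other three bases with equal probability. It then compares the number of substitutions applied with the number of differences still visible against the ancestor:

```python
import random

BASES = "ACGT"

def evolve(seq: str, n_substitutions: int, rng: random.Random) -> str:
    """Apply substitutions at uniformly random sites; each one changes the base
    at its site to one of the three other bases, chosen uniformly."""
    s = list(seq)
    for _ in range(n_substitutions):
        i = rng.randrange(len(s))
        s[i] = rng.choice([b for b in BASES if b != s[i]])
    return "".join(s)

def differences(a: str, b: str) -> int:
    """Visible differences between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(42)
ancestor = "".join(rng.choice(BASES) for _ in range(1000))
for k in (100, 500, 1000, 3000):
    descendant = evolve(ancestor, k, rng)
    print(f"{k:5d} substitutions -> {differences(ancestor, descendant):4d} visible differences")
```

As the number of substitutions grows, the visible differences level off well below it. Multiple hits, back changes and (between two lineages) parallel changes are all invisible in a simple count.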
Section: Poisson Correction
The Poisson correction
Imagine substitutions raining down on sequences:
1. We want to estimate the average evolutionary distance $\lambda$ (number of substitutions per site) from %ID $= 100 \times (p/N)$, where $p$ of the $N$ aligned sites are identical.
2. Assume substitutions occur independently across sites and over time.
3. At distance $\lambda$, a total of $\lambda N$ substitutions have fallen on the $N$ sites, and each one hits a given site with probability $1/N$. The expected fraction of sites not hit is then $p/N = (1 - 1/N)^{\lambda N} \approx e^{-\lambda}$ for large $N$.
4. Therefore, if we see $p$ out of $N$ sites unchanged and assume no back or parallel substitutions, we can estimate $\lambda = -\ln(p/N)$.
5. Example: a %ID of 38% implies $\lambda = -\ln(0.38) \approx 0.97 \approx 1$. About as many substitutions have occurred as there are sites in the sequence.
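The correction $\lambda = -\ln(p/N)$ and the "rain" argument behind it can be checked numerically. In this Python sketch, `poisson_distance` applies the formula, and `fraction_never_hit` (an illustrative helper) drops $\lambda N$ substitutions onto random sites. The fraction of sites left untouched should be close to $e^{-\lambda}$:

```python
import math
import random

def poisson_distance(fraction_identical: float) -> float:
    """Poisson-corrected distance lambda = -ln(p / N), in substitutions per site."""
    if not 0 < fraction_identical <= 1:
        raise ValueError("fraction of identical sites must be in (0, 1]")
    return -math.log(fraction_identical)

def fraction_never_hit(n_sites: int, lam: float, rng: random.Random) -> float:
    """Rain round(lam * n_sites) substitutions onto uniformly random sites and
    return the fraction of sites that none of them hit."""
    hit = [False] * n_sites
    for _ in range(round(lam * n_sites)):
        hit[rng.randrange(n_sites)] = True
    return hit.count(False) / n_sites

print(round(poisson_distance(0.38), 3))        # 0.968
observed = fraction_never_hit(100_000, 1.0, random.Random(0))
print(observed, math.exp(-1.0))                 # close to each other
print(poisson_distance(observed))               # close to the true lambda = 1.0
```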
[Figure: Poisson-corrected evolutionary distance (substitutions per site) vs. %ID; 38% ID corresponds to about 1 substitution per site.]
The effect of alphabet size: for $L = 1$, DNA ($a = 4$) has $N = 4^1 = 4$ and $K = NL(a-1) = 12$, while protein ($a = 20$) has $N = 20^1 = 20$ and $K = 380$. At a given position, randomly evolving proteins are less likely than DNA to mutate back ("revert") to an earlier state.
When should you use the Poisson correction? It assumes no back or parallel substitutions, so it is most appropriate for proteins at short evolutionary distances.
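The reversion argument can be quantified under one simple assumption: each substitution picks one of the $a - 1$ other states uniformly at random. A second substitution at a site then restores the original state with probability $1/(a-1)$. That is $1/3$ for DNA but only $1/19$ for protein. The following is a small Python check (the names are illustrative):

```python
import random
from fractions import Fraction

def reversion_probability(a: int) -> Fraction:
    """Chance that a second substitution at a site restores the original state,
    assuming each substitution picks one of the a - 1 other states uniformly."""
    return Fraction(1, a - 1)

def simulate_reversions(a: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the same probability."""
    reverted = 0
    for _ in range(trials):
        original = 0
        first = rng.choice([s for s in range(a) if s != original])
        second = rng.choice([s for s in range(a) if s != first])
        reverted += second == original
    return reverted / trials

for a in (4, 20):   # DNA, protein
    print(a, reversion_probability(a), simulate_reversions(a, 50_000, random.Random(a)))
```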
Section: Substitution Matrices
Improving the Poisson correction: PAM amino acid substitution matrices (Margaret Dayhoff)
Basic idea:
1. Collect a big dataset of alignments of closely related proteins.
2. Count amino acid changes and the total composition of amino acids in the dataset.
3. From these, calculate the transition probabilities for any amino acid to be substituted by any other amino acid after 1% sequence divergence.
4. This defines the PAM1 matrix ("Point Accepted Mutation", where "accepted" means accepted by natural selection).
5. Assume that the transition probabilities after N such 1% steps are given by the N-th power of the PAM1 matrix. Example: $\mathrm{PAM250} = (\mathrm{PAM1})^{250}$.
Example: part of the PAM15 matrix of Jones, Taylor and Thornton (1998).
[Table: transition probabilities among the amino acids A, R, N, D, C, Q, E, G, H, I, L, K, ...]
Assumptions of PAM substitution matrices
1. Site independence: the probability of substitution at a site is independent of the amino acids at all other sites in the sequence.
2. Markov property: the probability of substitution at a site depends only on the site's present state, not on its history. The probability of A becoming B at 2% divergence is $\mathrm{PAM2}(B \mid A) = \sum_x \mathrm{PAM1}(B \mid x)\,\mathrm{PAM1}(x \mid A)$, so $\mathrm{PAM2} = \mathrm{PAM1} \cdot \mathrm{PAM1} = (\mathrm{PAM1})^2$, $\mathrm{PAM3} = \mathrm{PAM2} \cdot \mathrm{PAM1} = (\mathrm{PAM1})^3$, and in general $\mathrm{PAM}n = \mathrm{PAM}(n-1) \cdot \mathrm{PAM1} = (\mathrm{PAM1})^n$.
3. Sufficient sample size: sequence composition is the same as in the alignments used to make the matrix.
4. Stationarity: the probabilities of substitutions do not change with time.
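The Markov property is what makes $\mathrm{PAM}n = (\mathrm{PAM1})^n$ valid. The real PAM1 matrix is estimated from Dayhoff's alignment counts. This sketch instead uses a hypothetical 4-letter "equal-input" matrix, an assumption chosen because its long-run behavior is easy to verify. Rows are the original state and columns the new state. The code checks the Chapman-Kolmogorov sum for PAM2, raises PAM1 to higher powers by repeated squaring, and shows each row of a very high power approaching the composition:

```python
Matrix = list[list[float]]

def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Matrix product; with rows = 'from' and columns = 'to' it chains two steps."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matpow(m: Matrix, power: int) -> Matrix:
    """m raised to a positive integer power, by repeated squaring."""
    if power < 1:
        raise ValueError("power must be >= 1")
    result = None
    while power:
        if power & 1:
            result = m if result is None else matmul(result, m)
        m = matmul(m, m)
        power >>= 1
    return result

def toy_pam1(composition: list[float]) -> Matrix:
    """Hypothetical PAM1-like matrix P[a][b] = P(b | a) with 1% expected change.

    Equal-input model (an assumption, not Dayhoff's counts): with probability mu
    a site is redrawn from `composition`; mu is chosen so 1% of sites change.
    """
    mu = 0.01 / (1 - sum(p * p for p in composition))
    n = len(composition)
    return [[(1 - mu) * (1.0 if a == b else 0.0) + mu * composition[b] for b in range(n)]
            for a in range(n)]

composition = [0.1, 0.2, 0.3, 0.4]      # hypothetical 4-letter database composition
pam1 = toy_pam1(composition)

# Markov property: PAM2(B|A) = sum over x of PAM1(x|A) * PAM1(B|x)
pam2 = matmul(pam1, pam1)
a, b = 0, 3
print(pam2[a][b], sum(pam1[a][x] * pam1[x][b] for x in range(4)))

pam250 = matpow(pam1, 250)
pam1000 = matpow(pam1, 1000)
identity_250 = sum(composition[i] * pam250[i][i] for i in range(4))
print(f"expected identity at PAM250: {identity_250:.2f}")
print("PAM1000 row for state 0:", [round(p, 3) for p in pam1000[0]])  # approaches composition
```

With this toy matrix, identity at high PAM values cannot drop below the chance level set by the composition ($\sum_a \pi_a^2 = 0.3$ here). This is the same convergence the PAM1000 example below describes.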
Q: What does PAM250 (250% change to a protein) mean?
A: A little less than 82% divergence, i.e., just over 18% identity.
Part of the PAM250 matrix of Jones, Taylor and Thornton (1998).
[Table: transition probabilities among the amino acids A, R, N, D, C, Q, E, G, H, I, L, K, ...]
Part of the PAM1000 matrix of Jones, Taylor and Thornton (1998).
[Table: transition probabilities among the amino acids A, R, N, D, C, Q, E, G, H, I, L, K, ...]
PAM matrix transition probabilities converge to the composition of the database used to make them.
Section: Odds, Likelihood Ratios, Log-Likelihoods, Scores
Odds are ratios of probabilities
Example 1: odds of rolling 7 or 11 versus rolling doubles with two dice:
$\frac{2\,[p(1,6) + p(2,5) + p(3,4) + p(5,6)]}{p(1,1) + p(2,2) + p(3,3) + p(4,4) + p(5,5) + p(6,6)} = \frac{8/36}{6/36} = 4:3$ odds
Example 2: odds of rolling doubles versus being dealt a five-card poker flush:
$\frac{p(1,1) + \dots + p(6,6)}{p(\text{five }\spadesuit) + p(\text{five }\heartsuit) + p(\text{five }\diamondsuit) + p(\text{five }\clubsuit)} = \frac{6/36}{4 \times (13/52 \times 12/51 \times 11/50 \times 10/49 \times 9/48)} \approx 84:1$ odds
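Both odds can be checked exactly with Python's `fractions.Fraction`. The flush probability counts any five cards of one suit (straight flushes included), which is how the expression above is built:

```python
from fractions import Fraction
from itertools import product

dice = list(product(range(1, 7), repeat=2))   # 36 equally likely ordered rolls
p = Fraction(1, 36)

p_7_or_11 = sum(p for d1, d2 in dice if d1 + d2 in (7, 11))
p_doubles = sum(p for d1, d2 in dice if d1 == d2)
print(p_7_or_11, p_doubles, p_7_or_11 / p_doubles)   # 2/9 1/6 4/3

# Five cards all of one suit, dealt from a 52-card deck, for any of 4 suits.
p_one_suit = (Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)
              * Fraction(10, 49) * Fraction(9, 48))
p_flush = 4 * p_one_suit
odds = p_doubles / p_flush
print(p_flush, float(odds))   # odds of roughly 84 : 1
```

Using 6 rather than 6/36 for the probability of doubles would inflate these odds by a factor of 36.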
Odds versus likelihood ratios
Odds can be made of any probabilities, even over different event spaces, such as rolling doubles versus being dealt a poker flush.
Likelihood ratios must be made over the same events. Example: the likelihood ratio of the word HELLO in a random sequence of letters with English letter frequencies, versus uniform frequencies:
$\frac{p(H \mid \text{Eng.})\,p(E \mid \text{Eng.})\,p(L \mid \text{Eng.})\,p(L \mid \text{Eng.})\,p(O \mid \text{Eng.})}{p(H \mid \text{Unif.})\,p(E \mid \text{Unif.})\,p(L \mid \text{Unif.})\,p(L \mid \text{Unif.})\,p(O \mid \text{Unif.})}$
Likelihood ratios compare the likelihoods of the same event under two different models
Model 1: English. [Figure: probabilities of the letters E T I O A N S H R L D U C Y G W M B F P V K X Q J Z in English.]
Model 2: Uniform. Every letter has probability 1/26.
$\frac{p(\mathrm{HELLO} \mid \text{Eng.})}{p(\mathrm{HELLO} \mid \text{Unif.})} = \frac{p(H \mid \text{Eng.})\,p(E \mid \text{Eng.})\,p(L \mid \text{Eng.})^2\,p(O \mid \text{Eng.})}{(1/26)^5} \approx \frac{1.08 \times 10^{-6}}{8.42 \times 10^{-8}} \approx 12.8$
HELLO is about 13 times more likely in a sequence with English letter frequencies than in a random one.
Independence of elementary events makes calculating compound-event likelihoods easy:
$\frac{p(\mathrm{HELLO} \mid \text{Eng.})}{p(\mathrm{HELLO} \mid \text{Unif.})} = \frac{p(H \mid \text{Eng.})}{p(H \mid \text{Unif.})} \cdot \frac{p(E \mid \text{Eng.})}{p(E \mid \text{Unif.})} \cdot \frac{p(L \mid \text{Eng.})}{p(L \mid \text{Unif.})} \cdot \frac{p(L \mid \text{Eng.})}{p(L \mid \text{Unif.})} \cdot \frac{p(O \mid \text{Eng.})}{p(O \mid \text{Unif.})} \approx 1.4 \times 3.2 \times 1.2 \times 1.2 \times 2.0 \approx 13$
Log-likelihood ratios let you add instead of multiply (avoiding numerical overflow and underflow):
$\log_2 12.8 \approx \log_2 1.4 + \log_2 3.2 + 2\log_2 1.2 + \log_2 2.0$
Log-likelihood ratios of symbols are called LOD scores (LOD stands for log-odds):
$\log_2 12.8 \approx S(H) + S(E) + 2\,S(L) + S(O)$
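The LOD arithmetic for HELLO takes a few lines of Python. The per-letter ratios below are the rounded values from the slide, and the sketch assumes they are close enough to the true English frequencies. Their product is therefore about 12.9 rather than 12.8:

```python
import math

# Per-letter likelihood ratios p(letter | English) / p(letter | uniform),
# rounded as on the slide; letters not in HELLO are omitted.
ratio = {"H": 1.4, "E": 3.2, "L": 1.2, "O": 2.0}
lod = {letter: math.log2(r) for letter, r in ratio.items()}   # S(letter), in bits

def word_score(word: str) -> float:
    """Sum of per-letter LOD scores; valid because letters are treated as independent."""
    return sum(lod[c] for c in word)

s = word_score("HELLO")
print(round(s, 2), round(2 ** s, 1))   # score in bits, and the likelihood ratio it encodes
```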
Scores from likelihoods: $S(x) = \log_2 \frac{p(x \mid \text{model 1})}{p(x \mid \text{model 2})}$
A positive score means the symbol is more likely in model 1; a negative score means it is more likely in model 2. We use $\log_2$ for scores, so a score of +1 means an event is twice as likely in model 1 as in model 2, and −1 means it is half as likely.
[Figure: scores of the letters E T I O A N S H R L D U C Y G W M B F P V K X Q J Z, English vs. random.]
For example, O is twice as likely in English as at random (S(O) = +1), and M is half as likely (S(M) = −1).
Section: Sequence similarity scores
To score pairwise alignments, the elementary events are the pairs of amino acids or nucleotides in each column, and the alignment score is the sum of the column scores: $S(\text{alignment}) = \sum_i S(a_i, b_i)$, where $a_i$ and $b_i$ are the residues in column $i$.
$S(a,b) = \log_2 \frac{p(a,b \mid \text{evolution})}{p(a,b \mid \text{chance})} = \log_2 \frac{p(a,b)}{p(a)\,p(b)}$
Model 1: evolution. The pair probabilities $p(a,b)$ come from substitution matrices.
Model 2: chance. Two picks from a random urn with the database composition give $p(a)\,p(b)$, the probability of a pair when sliding unrelated sequences past each other.
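Column-by-column scoring can be sketched in Python for DNA. The evolution model here is hypothetical: a uniform background composition, and for a homologous pair of sites, a 0.7 chance of carrying the same base (0.1 for each specific mismatch). Real matrices such as PAM or BLOSUM estimate these pair probabilities from data. The example scores the two homologs from earlier, without gaps:

```python
import math
from itertools import product

BASES = "ACGT"

def score_matrix(joint: dict, background: dict) -> dict:
    """S(a, b) = log2( p(a, b | evolution) / (p(a) * p(b)) ), in bits."""
    return {(a, b): math.log2(joint[a, b] / (background[a] * background[b]))
            for a, b in joint}

def alignment_score(s1: str, s2: str, scores: dict) -> float:
    """Ungapped alignment score: the sum of the column scores."""
    return sum(scores[a, b] for a, b in zip(s1, s2))

# Hypothetical model: uniform background; homologous sites match with prob. 0.7.
background = {b: 0.25 for b in BASES}
joint = {(a, b): 0.25 * (0.7 if a == b else 0.1) for a, b in product(BASES, repeat=2)}
S = score_matrix(joint, background)

print(round(S["A", "A"], 2), round(S["A", "G"], 2))
print(round(alignment_score("GCCACGTTCGCGATCG", "GGCAGTCTCGCGATTT", S), 2))
```

Under these made-up probabilities, a match scores about +1.49 bits and a mismatch about −1.32 bits. The 10 matches and 6 mismatches give a positive total, which favors the evolution model.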
Section: Score Matrices: PAM and BLOSUM
Unlike substitution matrices, score matrices are symmetric. Model 1 is evolution and model 2 is chance, and because the pair probabilities are symmetric, $p(a,b) = p(b,a)$:
$S(a,b) = \log_2 \frac{p(a,b)}{p(a)\,p(b)} = \log_2 \frac{p(b,a)}{p(b)\,p(a)} = S(b,a)$
BLOSUM62 score matrix
[Figure: Bedell et al., Figure 4-3, amino acid chemical relationships; labels include isoleucine, leucine and phenylalanine.]
PAM vs. BLOSUM
PAM:
- Starts from alignments of closely related proteins.
- Builds trees to avoid overcounting related sequences; inferred ancestral states are used to estimate transition probabilities.
- Transition probabilities at larger evolutionary distances are extrapolated from those at short distances.
- Larger PAM numbers model bigger distances (e.g., PAM250 models a larger distance than PAM100).
BLOSUM:
- Starts from alignments of both closely and distantly related proteins.
- Clusters sequences by single linkage to avoid overcounting; all pairs in a clustered alignment are used to calculate pair probabilities.
- Transition probabilities at different evolutionary distances are estimated empirically from clusters made at different minimal percent identities.
- Larger BLOSUM numbers model shorter distances, because they come from higher-%ID clusters (e.g., BLOSUM62 models a larger distance than BLOSUM80).
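The single-linkage clustering step can be sketched with a union-find structure. This is not the BLOSUM program itself. It assumes an ungapped block of equal-length sequences and a %ID threshold (62 for BLOSUM62, for instance), and it joins any two sequences whose identity meets the threshold, so chains of similar sequences merge into one cluster. The sequences below are made up:

```python
def percent_identity(s1: str, s2: str) -> float:
    """Percent of aligned columns with identical residues (ungapped, equal length)."""
    return 100.0 * sum(x == y for x, y in zip(s1, s2)) / len(s1)

def single_linkage_clusters(seqs: list[str], threshold: float) -> list[list[int]]:
    """Indices of sequences grouped by single linkage at a %ID threshold."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if percent_identity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)   # merge the two clusters

    clusters: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

block = ["GCCACGTTCG", "GCCACGTTCA", "GCCACGTACA", "TTGATGCAGT"]   # made-up block
print(single_linkage_clusters(block, 85))
print(single_linkage_clusters(block, 95))
```

At an 85% threshold, the first three sequences form one cluster even though sequences 0 and 2 are only 80% identical. Single linkage connects them through sequence 1.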
Other amino acid substitution/score matrices
- Some matrices are updates of the original Dayhoff method with more data or some technical refinements, e.g., JTT (Jones, Taylor and Thornton) and Gonnet, Benner and Cohen.
- Some matrices are for specialized kinds or parts of proteins, e.g., the JTT transmembrane protein matrix and the Goldstein secondary structure matrices.
- Some matrices have different assumptions. For example, BLOSUM does not assume the Markov property: its matrices are computed independently from alignments at different %IDs. BLOSUM matrices are labeled by the %ID threshold used for clustering, so BLOSUM30 models a larger distance than BLOSUM62, whereas PAM100 models a smaller distance than PAM250!
Section: DNA Matrices
Matrix models of DNA evolution
The Jukes-Cantor model: every base changes into each of the other three bases with the same probability α (* marks the no-change entries on the diagonal).
     A   G   C   T
A    *   α   α   α
G    α   *   α   α
C    α   α   *   α
T    α   α   α   *
[Figure: the four bases A, G, C, T as pools, with flows out of and into each pool.]
Because of symmetry, sequences evolve to the uniform base composition (25% A, 25% G, 25% C, 25% T).
The Kimura model: β for A↔G and C↔T changes (transitions), α for all other changes (transversions).
     A   G   C   T
A    *   β   α   α
G    β   *   α   α
C    α   α   *   β
T    α   α   β   *
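The claim that these models evolve to 25% of each base can be checked numerically. This Python sketch assumes the matrices hold per-step probabilities, with each diagonal * entry equal to 1 minus the rest of its row (1 − 3α for Jukes-Cantor, 1 − β − 2α for Kimura). The values of α and β and the starting composition are arbitrary:

```python
BASES = "AGCT"   # row/column order used on the slides

def jukes_cantor(alpha: float) -> list[list[float]]:
    """Per-step matrix P[i][j] = P(j | i): every change has probability alpha."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def kimura(alpha: float, beta: float) -> list[list[float]]:
    """beta for A<->G and C<->T (transitions), alpha for all other changes."""
    transition_partner = {0: 1, 1: 0, 2: 3, 3: 2}   # A-G and C-T in AGCT order
    m = []
    for i in range(4):
        row = []
        for j in range(4):
            if i == j:
                row.append(1 - beta - 2 * alpha)
            elif transition_partner[i] == j:
                row.append(beta)
            else:
                row.append(alpha)
        m.append(row)
    return m

def step(composition: list[float], m: list[list[float]]) -> list[float]:
    """Base composition after one step: total flow into each pool j."""
    return [sum(composition[i] * m[i][j] for i in range(4)) for j in range(4)]

def evolve_composition(composition: list[float], m: list[list[float]], steps: int) -> list[float]:
    for _ in range(steps):
        composition = step(composition, m)
    return composition

start = [0.7, 0.1, 0.1, 0.1]     # arbitrary A-rich starting composition
for name, m in [("Jukes-Cantor", jukes_cantor(0.01)), ("Kimura", kimura(0.005, 0.02))]:
    final = evolve_composition(start, m, 2000)
    print(name, dict(zip(BASES, (round(x, 4) for x in final))))
```

Both matrices are symmetric, so their columns also sum to 1. This is why the uniform composition is their fixed point.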
More informationMolecular analyses of EGFR: mutation and amplification detection
Molecular analyses of EGFR: mutation and amplification detection Petra Nederlof, Moleculaire Pathologie NKI Amsterdam Henrique Ruijter, Ivon Tielen, Lucie Boerrigter, Aafke Ariaens Outline presentation
More informationTitle : Parallel DNA Synthesis : Two PCR product from one DNA template
Title : Parallel DNA Synthesis : Two PCR product from one DNA template Bhardwaj Vikash 1 and Sharma Kulbhushan 2 1 Email: vikashbhardwaj@ gmail.com 1 Current address: Government College Sector 14 Gurgaon,
More informationCoding sequence the sequence of nucleotide bases on the DNA that are transcribed into RNA which are in turn translated into protein
Assignment 3 Michele Owens Vocabulary Gene: A sequence of DNA that instructs a cell to produce a particular protein Promoter a control sequence near the start of a gene Coding sequence the sequence of
More informationModule 6: Digital DNA
Module 6: Digital DNA Representation and processing of digital information in the form of DNA is essential to life in all organisms, no matter how large or tiny. Computing tools and computational thinking
More informationChapter 9. Applications of probability. 9.1 The genetic code
Chapter 9 Applications of probability In this chapter we use the tools of elementary probability to investigate problems of several kinds. First, we study the language of life by focusing on the universal
More informationAmino Acids and Their Properties
Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that
More informationInverse PCR and Sequencing of P-element, piggybac and Minos Insertion Sites in the Drosophila Gene Disruption Project
Inverse PCR and Sequencing of P-element, piggybac and Minos Insertion Sites in the Drosophila Gene Disruption Project Protocol for recovery of sequences flanking insertions in the Drosophila Gene Disruption
More informationANALYSIS OF A CIRCULAR CODE MODEL
ANALYSIS OF A CIRCULAR CODE MODEL Jérôme Lacan and Chrstan J. Mchel * Laboratore d Informatque de Franche-Comté UNIVERSITE DE FRANCHE-COMTE IUT de Belfort-Montbélard 4 Place Tharradn - BP 747 5 Montbélard
More informationMORPHEUS. http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix.
MORPHEUS http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix. Reference: MORPHEUS, a Webtool for Transcripton Factor Binding Analysis Using
More informationhttp://www.life.umd.edu/grad/mlfsc/ DNA Bracelets
http://www.life.umd.edu/grad/mlfsc/ DNA Bracelets by Louise Brown Jasko John Anthony Campbell Jack Dennis Cassidy Michael Nickelsburg Stephen Prentis Rohm Objectives: 1) Using plastic beads, construct
More informationSupplemental Data. Short Article. PPARγ Activation Primes Human Monocytes. into Alternative M2 Macrophages. with Anti-inflammatory Properties
Cell Metabolism, Volume 6 Supplemental Data Short Article PPARγ Activation Primes Human Monocytes into Alternative M2 Macrophages with Anti-inflammatory Properties M. Amine Bouhlel, Bruno Derudas, Elena
More informationY-chromosome haplotype distribution in Han Chinese populations and modern human origin in East Asians
Vol. 44 No. 3 SCIENCE IN CHINA (Series C) June 2001 Y-chromosome haplotype distribution in Han Chinese populations and modern human origin in East Asians KE Yuehai ( `º) 1, SU Bing (3 Á) 1 3, XIAO Junhua
More informationProvincial Exam Questions. 9. Give one role of each of the following nucleic acids in the production of an enzyme.
Provincial Exam Questions Unit: Cell Biology: Protein Synthesis (B7 & B8) 2010 Jan 3. Describe the process of translation. (4 marks) 2009 Sample 8. What is the role of ribosomes in protein synthesis? A.
More informationMutation. Mutation provides raw material to evolution. Different kinds of mutations have different effects
Mutation Mutation provides raw material to evolution Different kinds of mutations have different effects Mutational Processes Point mutation single nucleotide changes coding changes (missense mutations)
More informationPairwise Sequence Alignment
Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What
More informationpcmv6-neo Vector Application Guide Contents
pcmv6-neo Vector Application Guide Contents Package Contents and Storage Conditions... 2 Product Description... 2 Introduction... 2 Production and Quality Assurance... 2 Methods... 3 Other required reagents...
More informationMarine Biology DEC 2004; 146(1) : 53-64 http://dx.doi.org/10.1007/s00227-004-1423-6 Copyright 2004 Springer
Marine Biology DEC 2004; 146(1) : 53-64 http://dx.doi.org/10.1007/s00227-004-1423-6 Copyright 2004 Springer Archimer http://www.ifremer.fr/docelec/ Archive Institutionnelle de l Ifremer The original publication
More informationProtein Sequence Analysis - Overview -
Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein
More informationBio-Informatics Lectures. A Short Introduction
Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively
More informationThe making of The Genoma Music
242 Summary Key words Resumen Palabras clave The making of The Genoma Music Aurora Sánchez Sousa 1, Fernando Baquero 1 and Cesar Nombela 2 1 Department of Microbiology, Ramón y Cajal Hospital, and 2 Department
More informationISTEP+: Biology I End-of-Course Assessment Released Items and Scoring Notes
ISTEP+: Biology I End-of-Course Assessment Released Items and Scoring Notes Page 1 of 22 Introduction Indiana students enrolled in Biology I participated in the ISTEP+: Biology I Graduation Examination
More informationCloning, sequencing, and expression of H.a. YNRI and H.a. YNII, encoding nitrate and nitrite reductases in the yeast Hansenula anomala
Cloning, sequencing, and expression of H.a. YNRI and H.a. YNII, encoding nitrate and nitrite reductases in the yeast Hansenula anomala -'Pablo García-Lugo 1t, Celedonio González l, Germán Perdomo l, Nélida
More informationOn Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques
On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques MAGDY SAEB 1, EMAN EL-ABD 2, MOHAMED E. EL-ZANATY 1 1. School of Engineering, Computer Department,
More informationGene and Chromosome Mutation Worksheet (reference pgs. 239-240 in Modern Biology textbook)
Name Date Per Look at the diagrams, then answer the questions. Gene Mutations affect a single gene by changing its base sequence, resulting in an incorrect, or nonfunctional, protein being made. (a) A
More informationIntroduction to Bioinformatics 3. DNA editing and contig assembly
Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov
More informationANALYSIS OF GROWTH HORMONE IN TENCH (TINCA TINCA) ANALÝZA RŮSTOVÉHO HORMONU LÍNA OBECNÉHO (TINCA TINCA)
ANALYSIS OF GROWTH HORMONE IN TENCH (TINCA TINCA) ANALÝZA RŮSTOVÉHO HORMONU LÍNA OBECNÉHO (TINCA TINCA) Zrůstová J., Bílek K., Baránek V., Knoll A. Ústav morfologie, fyziologie a genetiky zvířat, Agronomická
More informationTITRATION OF raav (VG) USING QUANTITATIVE REAL TIME PCR
Page 1 of 5 Materials DNase digestion buffer [13 mm Tris-Cl, ph7,5 / 5 mm MgCl2 / 0,12 mm CaCl2] RSS plasmid ptr-uf11 SV40pA Forward primer (10µM) AGC AAT AGC ATC ACA AAT TTC ACA A SV40pA Reverse Primer
More informationMolecular chaperones involved in preprotein. targeting to plant organelles
Molecular chaperones involved in preprotein targeting to plant organelles Dissertation der Fakultät für Biologie der Ludwig-Maximilians-Universität München vorgelegt von Christine Fellerer München 29.
More informationThe DNA-"Wave Biocomputer"
The DNA-"Wave Biocomputer" Peter P. Gariaev (Pjotr Garjajev)*, Boris I. Birshtein*, Alexander M. Iarochenko*, Peter J. Marcer**, George G. Tertishny*, Katherine A. Leonova*, Uwe Kaempf ***. * Institute
More informationHeraeus Sepatech, Kendro Laboratory Products GmbH, Berlin. Becton Dickinson,Heidelberg. Biozym, Hessisch Oldendorf. Eppendorf, Hamburg
13 4. MATERIALS 4.1 Laboratory apparatus Biofuge A Centrifuge 5804R FACScan Gel electrophoresis chamber GPR Centrifuge Heraeus CO-AUTO-ZERO Light Cycler Microscope Motopipet Neubauer Cell Chamber PCR cycler
More informationDrosophila NK-homeobox genes
Proc. Natl. Acad. Sci. USA Vol. 86, pp. 7716-7720, October 1989 Biochemistry Drosophila NK-homeobox genes (NK-1, NK-2,, and DNA clones/chromosome locations of genes) YONGSOK KIM AND MARSHALL NIRENBERG
More informationBiopython Tutorial and Cookbook
Biopython Tutorial and Cookbook Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck, Michiel de Hoon, Peter Cock Last Update September 2008 Contents 1 Introduction 5 1.1 What is Biopython?.........................................
More informationNimbleGen SeqCap EZ Library SR User s Guide Version 3.0
NimbleGen SeqCap EZ Library SR User s Guide Version 3.0 For life science research only. Not for use in diagnostic procedures. Copyright 2011 Roche NimbleGen, Inc. All Rights Reserved. Editions Version
More informationTransmembrane Signaling in Chimeras of the E. coli Chemotaxis Receptors and Bacterial Class III Adenylyl Cyclases
Transmembrane Signaling in Chimeras of the E. coli Chemotaxis Receptors and Bacterial Class III Adenylyl Cyclases Dissertation der Mathematisch-Naturwissenschaftlichen Fakultät der Eberhard Karls Universität
More informationRapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST
Rapid alignment methods: FASTA and BLAST p The biological problem p Search strategies p FASTA p BLAST 257 BLAST: Basic Local Alignment Search Tool p BLAST (Altschul et al., 1990) and its variants are some
More informationDISSERTATIONES MEDICINAE UNIVERSITATIS TARTUENSIS 108
DISSERTATIONES MEDICINAE UNIVERSITATIS TARTUENSIS 108 DISSERTATIONES MEDICINAE UNIVERSITATIS TARTUENSIS 108 THE INTERLEUKIN-10 FAMILY CYTOKINES GENE POLYMORPHISMS IN PLAQUE PSORIASIS KÜLLI KINGO TARTU
More informationCharacterization of cdna clones of the family of trypsin/a-amylase inhibitors (CM-proteins) in barley {Hordeum vulgare L.)
Characterization of cdna clones of the family of trypsin/a-amylase inhibitors (CM-proteins) in barley {Hordeum vulgare L.) J. Paz-Ares, F. Ponz, P. Rodríguez-Palenzuela, A. Lázaro, C. Hernández-Lucas,
More information2006 7.012 Problem Set 3 KEY
2006 7.012 Problem Set 3 KEY Due before 5 PM on FRIDAY, October 13, 2006. Turn answers in to the box outside of 68-120. PLEASE WRITE YOUR ANSWERS ON THIS PRINTOUT. 1. Which reaction is catalyzed by each
More informationIntroduction to Bioinformatics (Master ChemoInformatique)
Introduction to Bioinformatics (Master ChemoInformatique) Roland Stote Institut de Génétique et de Biologie Moléculaire et Cellulaire Biocomputing Group 03.90.244.730 rstote@igbmc.fr Biological Function
More informationAssociation of IGF1 and IGFBP3 polymorphisms with colorectal polyps and colorectal cancer risk
DOI 10.1007/s10552-009-9438-4 ORIGINAL PAPER Association of IGF1 and IGFBP3 polymorphisms with colorectal polyps and colorectal cancer risk Elisabeth Feik Æ Andreas Baierl Æ Barbara Hieger Æ Gerhard Führlinger
More informationEvent-specific Method for the Quantification of Maize MIR162 Using Real-time PCR. Protocol
Event-specific Method for the Quantification of Maize MIR162 Using Real-time PCR Protocol 31 January 2011 Joint Research Centre Institute for Health and Consumer Protection Molecular Biology and Genomics
More informationArchimer http://archimer.ifremer.fr
Please note that this is an author-produced PDF of an article accepted for publication following peer review. The definitive publisher-authenticated version is available on the publisher Web site Fish
More informationwere demonstrated to be, respectively, the catalytic and regulatory subunits of protein phosphatase 2A (PP2A) (29).
JOURNAL OF VIROLOGY, Feb. 1992, p. 886-893 0022-538X/92/020886-08$02.00/0 Copyright C) 1992, American Society for Microbiology Vol. 66, No. 2 The Third Subunit of Protein Phosphatase 2A (PP2A), a 55- Kilodalton
More informationInsulin Receptor Gene Mutations in Iranian Patients with Type II Diabetes Mellitus
Iranian Biomedical Journal 13 (3): 161-168 (July 2009) Insulin Receptor Gene Mutations in Iranian Patients with Type II Diabetes Mellitus Bahram Kazemi 1*, Negar Seyed 1, Elham Moslemi 2, Mojgan Bandehpour
More informationhttp://hdl.handle.net/10197/2727
Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available. Title Performance of DNA data embedding algorithms
More informationMolecular Facts and Figures
Nucleic Acids Molecular Facts and Figures DNA/RNA bases: DNA and RNA are composed of four bases each. In DNA the four are Adenine (A), Thymidine (T), Cytosine (C), and Guanine (G). In RNA the four are
More informationCore Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1
Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat
More informationN-terminal Regulatory Domains of Phosphodiesterases 1, 4, 5 and 10 examined with an Adenylyl Cyclase as a Reporter
N-terminal Regulatory Domains of Phosphodiesterases 1, 4, 5 and 10 examined with an Adenylyl Cyclase as a Reporter Dissertation der Mathematisch-Naturwissenschaftlichen Fakultät der Eberhard Karls Universität
More informationFive-minute cloning of Taq polymerase-amplified PCR products
TOPO TA Cloning Version R 8 April 2004 25-0184 TOPO TA Cloning Five-minute cloning of Taq polymerase-amplified PCR products Catalog nos. K4500-01, K4500-40, K4510-20, K4520-01, K4520-40, K4550-01, K4550-40,
More informationAll commonly-used expression vectors used in the Jia Lab contain the following multiple cloning site: BamHI EcoRI SmaI SalI XhoI_ NotI
2. Primer Design 2.1 Multiple Cloning Sites All commonly-used expression vectors used in the Jia Lab contain the following multiple cloning site: BamHI EcoRI SmaI SalI XhoI NotI XXX XXX GGA TCC CCG AAT
More informationSix Homeoproteins and a Iinc-RNA at the Fast MYH Locus Lock Fast Myofiber Terminal Phenotype
Six Homeoproteins and a Iinc-RNA at the Fast MYH Locus Lock Fast Myofiber Terminal Phenotype Iori Sakakibara 1,2,3, Marc Santolini 4, Arnaud Ferry 2,5, Vincent Hakim 4, Pascal Maire 1,2,3 * 1 INSERM U1016,
More informationRETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
More informationInterleukin-4 Receptor Signal Transduction: Involvement of P62
Interleukin-4 Receptor Signal Transduction: Involvement of P62 Den Naturwissenschaftlichen Fakultäten der Friedrich Alexander Universität Erlangen Nürnberg zur Erlangung des Doktorgrades vorgelegt von
More informationTHREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
More informationHidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006
Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm
More informationPhylogenetic Trees Made Easy
Phylogenetic Trees Made Easy A How-To Manual Fourth Edition Barry G. Hall University of Rochester, Emeritus and Bellingham Research Institute Sinauer Associates, Inc. Publishers Sunderland, Massachusetts
More informationPrinciples of Evolution - Origin of Species
Theories of Organic Evolution X Multiple Centers of Creation (de Buffon) developed the concept of "centers of creation throughout the world organisms had arisen, which other species had evolved from X
More informationIntroduction to Genome Annotation
Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT
More informationThe Arabinosyltransferase EmbC Is Inhibited by Ethambutol in Mycobacterium tuberculosis
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY, Oct. 2009, p. 4138 4146 Vol. 53, No. 10 0066-4804/09/$08.00 0 doi:10.1128/aac.00162-09 Copyright 2009, American Society for Microbiology. All Rights Reserved. The
More informationDNA Insertions and Deletions in the Human Genome. Philipp W. Messer
DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.
More informationDNA Sequencing of the eta Gene Coding for Staphylococcal Exfoliative Toxin Serotype A
Journal of General Microbiology (1988), 134, 71 1-71 7. Printed in Great Britain 71 1 DNA Sequencing of the eta Gene Coding for Staphylococcal Exfoliative Toxin Serotype A By SUSUMU SAKURA, HTOSH SUZUK
More informationChlamydomonas adapted Green Fluorescent Protein (CrGFP)
Chlamydomonas adapted Green Fluorescent Protein (CrGFP) Plasmid pfcrgfp for fusion proteins Sequence of the CrGFP In the sequence below, all amino acids which have been altered from the wildtype GFP from
More informationProtein Synthesis Simulation
Protein Synthesis Simulation Name(s) Date Period Benchmark: SC.912.L.16.5 as AA: Explain the basic processes of transcription and translation, and how they result in the expression of genes. (Assessed
More informationA Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML
9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze
More informationMolecular detection of Babesia rossi and Hepatozoon sp. in African wild dogs (Lycaon pictus) in South Africa
Available online at www.sciencedirect.com Veterinary Parasitology 157 (2008) 123 127 Short communication Molecular detection of Babesia rossi and Hepatozoon sp. in African wild dogs (Lycaon pictus) in
More informationIntroduction to Phylogenetic Analysis
Subjects of this lecture Introduction to Phylogenetic nalysis Irit Orr 1 Introducing some of the terminology of phylogenetics. 2 Introducing some of the most commonly used methods for phylogenetic analysis.
More informationMetabolic Engineering of Escherichia coli for Enhanced Production of Succinic Acid, Based on Genome Comparison and In Silico Gene Knockout Simulation
APPLIED AND ENVIRONMENTAL MICROBIOLOGY, Dec. 2005, p. 7880 7887 Vol. 71, No. 12 0099-2240/05/$08.00 0 doi:10.1128/aem.71.12.7880 7887.2005 Copyright 2005, American Society for Microbiology. All Rights
More informationIntro to Map/Reduce a.k.a. Hadoop
Intro to Map/Reduce a.k.a. Hadoop Based on: Mining of Massive Datasets by Ra jaraman and Ullman, Cambridge University Press, 2011 Data Mining for the masses by North, Global Text Project, 2012 Slides by
More informationBioinformatics Grid - Enabled Tools For Biologists.
Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis
More informationPROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org
BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,
More informationinhibition of mitosis
The EMBO Journal vol.13 no.2 pp.425-434, 1994 cdt 1 is an essential target of the Cdc 1 O/Sct 1 transcription factor: requirement for DNA replication and inhibition of mitosis Johannes F.X.Hofmann and
More informationa. Ribosomal RNA rrna a type ofrna that combines with proteins to form Ribosomes on which polypeptide chains of proteins are assembled
Biology 101 Chapter 14 Name: Fill-in-the-Blanks Which base follows the next in a strand of DNA is referred to. as the base (1) Sequence. The region of DNA that calls for the assembly of specific amino
More informationImpaired insulin and insulin-like growth factor expression and signaling mechanisms in Alzheimer s disease is this type 3 diabetes?
Journal of Alzheimer s Disease 7 (2005) 63 80 63 IOS Press Impaired insulin and insulin-like growth factor expression and signaling mechanisms in Alzheimer s disease is this type 3 diabetes? Eric Steen,
More informationIII III 0 IIOI DID IIO 1101 010 II0 1101 I IIII
(19) United States III III 0 IIOI DID IIO 1101 010 II0 1101 I IIII US 20020090376A1 III 1010 II 0I II (12) Patent Application Publication (lo) Pub. No.: US 2002/0090376 Al KANIGA et at. (43) Pub. Date:
More informationSimilarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003
Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:
More informationEvolution (18%) 11 Items Sample Test Prep Questions
Evolution (18%) 11 Items Sample Test Prep Questions Grade 7 (Evolution) 3.a Students know both genetic variation and environmental factors are causes of evolution and diversity of organisms. (pg. 109 Science
More informationChapter 5. Stripping Bacillus: ComK auto-stimulation is responsible for the bistable response in competence development
Stripping Bacillus: ComK auto-stimulation is responsible for the bistable response in competence development This chapter has been adapted from: W.K. Smits*, C.C. Eschevins*, K.A. Susanna, S. Bron, O.P.
More informationModule 10: Bioinformatics
Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior
More information