Sequence comparison, Part I: Substitution and Scores


1 Sequence comparison, Part I: Substitution and Scores David H. Ardell Docent of Bioinformatics

2 Outline of the lecture
- Convergence and Divergence
- Similarity and Homology
- Percent Difference as Evolutionary Distance
- Mutations and Substitutions
- Hidden change in sequences
- Poisson Correction
- Substitution Matrices
- Odds, Likelihood Ratios, Log-Likelihoods, Scores
- Sequence similarity scores
- Score Matrices: PAM and BLOSUM
- DNA Matrices

4 HOMOLOGY: common descent (Darwin, 1859). Original definition (Richard Owen, 1843): "the same organ in different animals under every variety of form and function." But homology need not imply similarity of form or function, because of divergence; and similarity need not imply homology, because of convergence.

5 [Figure: lineages descending from a most recent common ancestor, illustrating divergence and convergence]

6 [Figure: the same tree with an earlier common ancestor added above the most recent common ancestor, illustrating divergence and convergence]

7 Morphology vs. Sequences GCCACTTT CGCGATCA GAAACGTT CGTGATCG GGCAGTTT CGCGATTT

8 Morphology vs. DNA sequences: convergence is more common in morphology, but very rare in DNA sequences! [Figure: lineages with the sequences GCCACTTT CGCGATCA, GGCAGATT CAGGATTT and GGCAGATT CAGGATTT]

9 Why sequence convergence is rare: many genotypes code for the same phenotype. [Figure: genotypes GCCACTTT CGCGATCA, GAAACGTT CGTGATCG and GGCAGATT CAGGATTT; through evolution and development, divergent genotypes yield a convergent phenotype]

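12 The enormity of sequence space: DNA (a = 4). L = 1: the sequences are A, G, T, C, so $N = a^L = 4^1 = 4$ and $K = NL(a-1) = 12$, where K counts the possible single-point mutations (each sequence has $L(a-1)$ neighbors differing at exactly one site). L = 2: AA, AT, AG, AC, GA, GT, GG, GC, TA, TT, TG, TC, CA, CT, CG, CC, so $N = a^L = 4^2 = 16$ and $K = NL(a-1) = 96$.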

13 The enormity of sequence space: DNA (a = 4). L = 3: the 64 trinucleotides AAA, AAG, AAC, AAT, ..., CCC, so $N = a^L = 4^3 = 64$ and $K = NL(a-1) = 576$.

15 The enormity of sequence space. DNA (a = 4), L = 300: $N = a^L = 4^{300} \approx 4.15 \times 10^{180}$ and $K = NL(a-1) \approx 3.73 \times 10^{183}$. The probability that two independent, randomly evolving sequences converge over anything but very short lengths is infinitesimally small.
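The counts on these slides follow directly from $N = a^L$ and $K = NL(a-1)$. A minimal Python sketch (the function name `sequence_space` is illustrative, not from the lecture) reproduces them with exact integer arithmetic, including the protein case (a = 20) used later in the lecture:

```python
def sequence_space(length, alphabet_size=4):
    """Return (N, K) for sequences of a given length over an alphabet.

    N = a**L is the number of distinct sequences; K = N * L * (a - 1) is the
    number of single-point mutations, since each sequence has L * (a - 1)
    neighbours that differ from it at exactly one site.
    """
    n = alphabet_size ** length
    k = n * length * (alphabet_size - 1)
    return n, k


for length in (1, 2, 3):
    print(length, sequence_space(length))          # DNA, a = 4

print(sequence_space(1, alphabet_size=20))         # protein, L = 1

n300, k300 = sequence_space(300)
# Python ints are exact; the 'e' format converts them to float for display.
print(f"N = {n300:.2e}, K = {k300:.2e}")           # N = 4.15e+180, K = 3.73e+183
```

Exact integers avoid any rounding in the intermediate steps; only the final display is rounded.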

16 Similarity implies homology: since chance convergence is so improbable, sequences more similar than expected at random are inferred to have evolved from a common ancestor.

17 Similarity implies homology for sequences. Similar morphologies need not imply homology, because of convergence. Similar sequences do imply homology, because convergence is improbable. Example: GCCACGTTCGCGATCG vs. GGCAGTCTCGCGATTT

20 Homologous DNA sequences. Significantly similar sequences (such as those found by a BLAST search) are inferred to have come from a common ancestor. [Figure: ancestral sequence GCCACTTTCGCGATCA above the homologous sequences GCCACGTTCGCGATCG, GCCACGTTCGCGATCG, GGCAGTCTCGCGATTT and GGCAGTTTCGCGATTT]

21 Homologous DNA sequences. All the differences we see between homologs must have evolved since they diverged. [Figure: ancestral sequence GCCACTTTCGCGATCA at time T0; homologous sequences GCCACGTTCGCGATCG, GCCACGTTCGCGATCG, GGCAGTCTCGCGATTT and GGCAGTTTCGCGATTT at Tnow]

22 Homologous DNA sequences [Figure: the ancestral sequence GCCACTTTCGCGATCA (T0) gives rise to GCCACTTTCGCGATCG and GCCACTTTCGCGATCA (T1) on the way to the homologs GCCACGTTCGCGATCG and GGCAGTCTCGCGATTT]

23 Homologous DNA sequences [Figure: the same tree one step later, with GCCACTTTCGCGATCG, GCCACTTTCGCGATCG and GCCACTTTCGCGATTA at T1 and T2]

25 Homologous DNA sequences [Figure: full tree from the ancestral sequence GCCAGTTTCGCGATCT (T0) through GCCAGTTTCGCGATCG (T1), GCCAGTTTCGCGATTA (T2), GCCAGGTTCGTGATCG (T3), and GCCACGTTCGCGATCG, GCCAGTCTCGCGATTA, GGCAGTCTCGCGATTT (T4 to T6), to the present-day homologous sequences GCCACGTTCGCGATCG, GCCACGTTCGCGATCG, GGCAGTCTCGCGATTT and GGCAGTCTCGCGATTT (Tnow). Label: homologous bases at a site.]

26 Rate of evolution: changes per unit time (or per generation), per sequence and per site. [Figure: ancestral sequence GCCACTTTCGCGATCA at T0 evolves over time t into the homologous sequences GCCACGTTCGCGATCG, GCCACGTTCGCGATCG, GGCAGTCTCGCGATTT and GGCAGTTTCGCGATTT at Tnow] 6 differences per 16 sites per 2 sequences = (6 / 16) / 2 = 18.75% per time t

27 Why divide by two? To estimate how one sequence changes over time. [Figure: the lineage from the ancestral sequence GCCAGTTTCGCGATCT at T0, through GCCAGTTTCGCGATTA and GCCAGTCTCGCGATTA, to GGCAGTCTCGCGATTT at Tnow, over time t] 3 differences per 16 sites = 3 / 16 = 18.75% per time t

28 We usually don't know ancestral sequences, so we compare present-day sequences to infer evolutionary changes. [Figure: unknown ancestor (?) at T0; homologous sequences GCCACGTTCGCGATCG, GCCACGTTCGCGATCG, GGCAGTCTCGCGATTT and GGCAGTTTCGCGATTT at Tnow, after time t] 6 differences per 16 sites per 2 sequences = (6 / 16) / 2 = 18.75% per time t

29 We usually don't know how much time has passed, so we express evolutionary distance as rate × time. [Figure: unknown ancestor (?) and unknown elapsed time; the same homologous sequences at Tnow] 6 differences per 16 sites per 2 sequences = (6 / 16) / 2 = 18.75% divergence
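The arithmetic on slides 26 to 29 can be checked with a short Python sketch. The helper names are hypothetical; the sequences are the ones on the slides, and the sketch assumes the sequences are already aligned with no gaps:

```python
def count_differences(seq1, seq2):
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def per_lineage_divergence(seq1, seq2):
    """(differences / sites) / 2: both lineages changed since their common ancestor."""
    return count_differences(seq1, seq2) / len(seq1) / 2


homolog_1 = "GCCACGTTCGCGATCG"
homolog_2 = "GGCAGTCTCGCGATTT"
ancestor = "GCCAGTTTCGCGATCT"   # the ancestral sequence shown on slide 27

print(count_differences(homolog_1, homolog_2))        # 6
print(per_lineage_divergence(homolog_1, homolog_2))   # 0.1875
print(count_differences(ancestor, homolog_2) / 16)    # 0.1875, one lineage only
```

Comparing two present-day sequences and halving gives the same per-lineage figure as comparing one lineage with its ancestor, which is why the division by two is needed when the ancestor is unknown.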

30 There may thus exist a molecular evolutionary clock (Zuckerkandl & Pauling, 1965). [Figure: % amino acid differences vs. approximate duplication dates (mya) from vertebrate fossil records, for the divergence between α and β or γ, and between β and γ]

31 Different protein clocks tick at different rates:

33 A given large divergence can be reached by a fast rate over a short time or by a slow rate over a long time.

40 Q: What is a substitution? A: A substitution is the fixation of a mutation in a population: the mutation has been accepted by natural selection. [Figure: a population of 5 individuals over generations t = 1 to t = 4; 2 mutations arise at t = 2, and by t = 4 there is 1 substitution]

41 Sequence differences between species are often assumed to be substitutions (fixed differences). [Figure: an ancestor giving rise to Species 1 and Species 2]

43 % identity (100 − % differences) underestimates evolutionary divergence! [Figure: % amino acid differences vs. approximate duplication dates (mya) from vertebrate fossil records]

49 Why percent identity (%ID) underestimates divergence: the more sequences evolve, the more changes we miss. [Figure: lineages descending from an ANCESTOR]
- Multiple changes can hit the same site: 3 changes, 2 differences.
- Back changes can undo earlier changes: 4 changes, 1 difference.
- Parallel changes hide evolution: 6 changes, 1 difference.
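A small simulation makes the hidden-change effect concrete. This is a minimal Python sketch under simplifying assumptions of my own (one lineage only, every site and every replacement symbol equally likely), so it shows multiple hits and back changes but not parallel changes, which need two lineages:

```python
import random

DNA = "ACGT"
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids


def evolve(ancestor, changes, alphabet, rng):
    """Apply `changes` substitutions to one lineage.

    Each substitution picks a site uniformly at random (so sites can be hit
    more than once) and replaces its symbol with a different symbol chosen
    uniformly from the alphabet (so a later hit can restore the original).
    """
    seq = list(ancestor)
    for _ in range(changes):
        site = rng.randrange(len(seq))
        seq[site] = rng.choice([s for s in alphabet if s != seq[site]])
    return "".join(seq)


def differences(seq1, seq2):
    """Observed differences between two aligned, equal-length sequences."""
    return sum(a != b for a, b in zip(seq1, seq2))


rng = random.Random(1)          # fixed seed so runs are repeatable
length = 1000
for alphabet in (DNA, PROTEIN):
    ancestor = "".join(rng.choice(alphabet) for _ in range(length))
    for changes in (100, 500, 1000, 2000):
        descendant = evolve(ancestor, changes, alphabet, rng)
        print(f"a={len(alphabet):2d} changes={changes:4d} "
              f"differences={differences(ancestor, descendant)}")
```

As the number of changes grows, the observed differences fall further behind, and they saturate lower for DNA (a = 4) than for proteins (a = 20), because back changes are more likely with a smaller alphabet.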

59 The Poisson correction. Imagine substitutions raining down on sequences:
1. We want to estimate the average evolutionary distance λ (number of substitutions per site) from %ID = 100 × (p/N), where p of the N sites are unchanged.
2. Assume substitutions occur independently across sites and through time.
3. At distance λ, about λN substitutions have fallen on the N sites, and each one lands on a given site with probability 1/N. The expected fraction of sites never hit is then $p/N = (1 - 1/N)^{\lambda N} \approx e^{-\lambda}$ for large N.
4. Therefore, if we see p out of N sites unchanged and assume no back or parallel substitutions, we can estimate $\lambda = -\ln(p/N)$.
5. Example: 38% ID implies $\lambda = -\ln(0.38) \approx 0.97 \approx 1$: about as many substitutions have occurred as there are sites in the sequence.
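The estimate in step 4 and the "rain" picture in step 3 can both be sketched in a few lines of Python. The function names are hypothetical, and the simulation assumes exactly round(λN) substitutions landing uniformly at random, which is one simple way to model the rain:

```python
import math
import random


def poisson_distance(identical_sites, total_sites):
    """Poisson-corrected distance: lambda = -ln(p / N) substitutions per site."""
    if total_sites <= 0 or not 0 < identical_sites <= total_sites:
        raise ValueError("need 0 < p <= N")
    return -math.log(identical_sites / total_sites)


def fraction_untouched(lam, n_sites, rng):
    """Rain round(lam * n_sites) substitutions uniformly onto n_sites sites
    and return the fraction of sites that were never hit."""
    hit = [False] * n_sites
    for _ in range(round(lam * n_sites)):
        hit[rng.randrange(n_sites)] = True
    return hit.count(False) / n_sites


print(round(poisson_distance(38, 100), 3))                       # 0.968
rng = random.Random(42)
print(fraction_untouched(1.0, 100_000, rng), math.exp(-1.0))     # both close to 0.368
```

The simulated fraction of untouched sites tracks $e^{-\lambda}$, which is the relationship the correction inverts.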

60 Poisson-corrected evolutionary distance vs. %ID [Figure: substitutions per site plotted against %ID, with 38% ID marked]

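61 The effect of alphabet size. For L = 1, DNA (a = 4: A, G, T, C) has $N = a^L = 4^1 = 4$ and $K = NL(a-1) = 12$; protein (a = 20) has $N = 20^1 = 20$ and $K = NL(a-1) = 380$. At a given position, randomly evolving proteins are less likely than DNA to mutate back ("revert") to an earlier state: if every substitution is equally likely, a second substitution returns a site to its previous state with probability $1/(a-1)$, that is 1/3 for DNA but only 1/19 for proteins.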
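The reversion probability can be checked by enumerating two-step paths exactly. This Python sketch assumes, as above, that each substitution moves uniformly to one of the other a − 1 states; `revert_probability` is an illustrative name:

```python
from fractions import Fraction


def revert_probability(alphabet_size):
    """Probability that two successive substitutions at one site bring it back
    to its starting state, when each substitution moves uniformly to one of
    the other a - 1 states (every two-step path is then equally likely)."""
    start = 0
    favourable = total = 0
    for first in range(alphabet_size):
        if first == start:
            continue                      # a substitution must change the state
        for second in range(alphabet_size):
            if second == first:
                continue
            total += 1
            favourable += (second == start)
    return Fraction(favourable, total)


print(revert_probability(4), revert_probability(20))   # 1/3 1/19
```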

62 When should you use the Poisson correction? The Poisson correction assumes no back or parallel substitutions, so it is most appropriate for proteins (a = 20) at short evolutionary distances.

69 Improving the Poisson correction: PAM amino acid substitution matrices (Margaret Dayhoff). Basic idea:
1. Collect a big dataset of alignments of closely related proteins.
2. Count amino acid changes and the total composition of amino acids in the dataset.
3. From these, calculate the transition probabilities for any amino acid to substitute into any other amino acid after 1% sequence divergence.
4. This defines the PAM1 matrix (Point Accepted Mutation, where "accepted" means accepted by natural selection).
5. Assume that the transition probabilities after N% sequence divergence are given by the N-th power of the PAM1 matrix. Ex: $\mathrm{PAM250} = (\mathrm{PAM1})^{250}$.

70 Example: part of the PAM15 matrix of Jones, Taylor and Thornton (1998) [Table: transition probabilities among the amino acids A, R, N, D, C, Q, E, G, H, I, L, K, ...]

75 Assumptions of PAM substitution matrices
1. Site independence: the probability of substitution at a site is independent of the amino acids at all other sites in the sequence.
2. Markov property: the probability of substitution at a site depends only on the site's present state, not on its history. For example, the probability of A becoming B at 2% divergence is $\mathrm{PAM2}(B \mid A) = \sum_x \mathrm{PAM1}(B \mid x)\,\mathrm{PAM1}(x \mid A)$, so $\mathrm{PAM2} = \mathrm{PAM1} \cdot \mathrm{PAM1} = (\mathrm{PAM1})^2$, $\mathrm{PAM3} = \mathrm{PAM2} \cdot \mathrm{PAM1} = (\mathrm{PAM1})^3$, and in general $\mathrm{PAM}n = \mathrm{PAM}(n-1) \cdot \mathrm{PAM1} = (\mathrm{PAM1})^n$.
3. Sufficient sample size: sequence composition is the same as in the alignments used to make the matrix.
4. Stationarity: the probabilities of substitution do not change with time.
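The Markov property is what lets PAMn be computed as a matrix power. The following Python sketch uses a hypothetical 3-letter alphabet and an invented composition (these numbers are not Dayhoff's or JTT's data); PAM1 is built so that, at equilibrium, exactly 1% of sites change. Entry `pam1[b][a]` is P(b | a), matching the slide's $\mathrm{PAM1}(B \mid A)$ notation:

```python
def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def pam_n(pam1, n):
    """PAMn = PAM(n-1) * PAM1 = (PAM1)^n, built by repeated multiplication."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, pam1)
    return result


# Hypothetical 3-letter alphabet and composition; NOT real PAM data.
composition = [0.5, 0.3, 0.2]
mu = 0.01 / (1 - sum(f * f for f in composition))   # scaled so 1% of sites change
# pam1[b][a] = P(b | a): column a is the distribution of what a becomes.
pam1 = [[(1 - mu) * (a == b) + mu * composition[b] for a in range(3)]
        for b in range(3)]

pam250 = pam_n(pam1, 250)
pam1000 = pam_n(pam1, 1000)
for row in pam1000:
    print([round(p, 3) for p in row])   # [0.5, 0.5, 0.5] / [0.3, 0.3, 0.3] / [0.2, 0.2, 0.2]
```

Every column of PAM1000 is essentially the composition vector: at very large distances the transition probabilities forget the starting amino acid, which is the convergence shown on the PAM1000 slide.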

77 Q: What does PAM % change to a protein mean? A: a little less than 82% divergence, i.e. just over 18% ID

78 Part of the PAM250 matrix of Jones, Taylor and Thornton (1998) [Table: transition probabilities among the amino acids A, R, N, D, C, Q, E, G, H, I, L, K, ...]

79 Part of the PAM1000 matrix of Jones, Taylor and Thornton (1998) [Table: transition probabilities among the amino acids A, R, N, D, C, Q, E, G, H, I, L, K, ...] PAM matrix transition probabilities converge to the composition of the database used to make them.

85 Odds are ratios of probabilities.
Example 1: odds of rolling 7 or 11 versus rolling doubles with two dice: $\frac{2\,[p(1,6) + p(2,5) + p(3,4) + p(5,6)]}{p(1,1) + p(2,2) + p(3,3) + p(4,4) + p(5,5) + p(6,6)} = \frac{8/36}{6/36} = 4 : 3$ odds.
Example 2: odds of rolling doubles versus being dealt a five-card poker flush: $\frac{6/36}{4 \times (13/52 \times 12/51 \times 11/50 \times 10/49 \times 9/48)} \approx 84 : 1$ odds.
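Both examples can be verified with exact fractions. In this Python sketch the dice probabilities come from enumerating all 36 equally likely rolls, and the flush probability uses the product on the slide (straight flushes included):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))     # 36 equally likely rolls of two dice
p_7_or_11 = Fraction(sum(a + b in (7, 11) for a, b in rolls), len(rolls))
p_doubles = Fraction(sum(a == b for a, b in rolls), len(rolls))
odds_7_11_vs_doubles = p_7_or_11 / p_doubles

# Five cards all of one suit: 4 suits x (13/52 x 12/51 x 11/50 x 10/49 x 9/48)
p_flush = (4 * Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)
           * Fraction(10, 49) * Fraction(9, 48))
odds_doubles_vs_flush = p_doubles / p_flush

print(odds_7_11_vs_doubles)             # 4/3
print(float(odds_doubles_vs_flush))     # about 84.1
```

Note that the doubles probability must be 6/36, not 6, in the numerator of Example 2; dividing 6 by the flush probability would overstate the odds by a factor of 36.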

87 Odds versus likelihood ratios: odds can be made of any probabilities, even over different event spaces (e.g., doubles versus a poker flush, about 84 : 1 odds). Likelihood ratios must be made over the same events. Example: the likelihood ratio of the word HELLO in a random sequence of letters with English frequencies, versus uniform frequencies: $\frac{p(H \mid \mathrm{Eng.})\,p(E \mid \mathrm{Eng.})\,p(L \mid \mathrm{Eng.})\,p(L \mid \mathrm{Eng.})\,p(O \mid \mathrm{Eng.})}{p(H \mid \mathrm{Unif.})\,p(E \mid \mathrm{Unif.})\,p(L \mid \mathrm{Unif.})\,p(L \mid \mathrm{Unif.})\,p(O \mid \mathrm{Unif.})}$

89 Likelihood ratios compare the likelihoods of the same event in two different models. Model 1 (English): letter probabilities as in English [Figure: bar chart, letters in the order E T I O A N S H R L D U C Y G W M B F P V K X Q J Z]. Model 2 (Uniform): every letter has probability 1/26. The likelihood ratio of HELLO is $\frac{p(H \mid \mathrm{Eng.})\,p(E \mid \mathrm{Eng.})\,p(L \mid \mathrm{Eng.})\,p(L \mid \mathrm{Eng.})\,p(O \mid \mathrm{Eng.})}{p(H \mid \mathrm{Unif.})\,p(E \mid \mathrm{Unif.})\,p(L \mid \mathrm{Unif.})\,p(L \mid \mathrm{Unif.})\,p(O \mid \mathrm{Unif.})}$

92 Likelihood ratios compare the likelihoods of the same event in two different models: $\frac{p(\mathrm{HELLO} \mid \mathrm{Eng.})}{p(\mathrm{HELLO} \mid \mathrm{Unif.})} = \frac{p(H \mid \mathrm{Eng.})\,p(E \mid \mathrm{Eng.})\,p(L \mid \mathrm{Eng.})^2\,p(O \mid \mathrm{Eng.})}{(1/26)^5} \approx 12.8$, where $(1/26)^5 \approx 8.4 \times 10^{-8}$. HELLO is about 13 times more likely in a sequence with English letter frequencies than in a uniformly random one.

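94 Independence of elementary events makes calculating compound-event likelihoods easy: the ratio for the word is the product of the per-letter ratios, $\frac{p(\mathrm{HELLO} \mid \mathrm{Eng.})}{p(\mathrm{HELLO} \mid \mathrm{Unif.})} = \frac{p(H \mid \mathrm{Eng.})}{p(H \mid \mathrm{Unif.})} \cdot \frac{p(E \mid \mathrm{Eng.})}{p(E \mid \mathrm{Unif.})} \cdot \left(\frac{p(L \mid \mathrm{Eng.})}{p(L \mid \mathrm{Unif.})}\right)^2 \cdot \frac{p(O \mid \mathrm{Eng.})}{p(O \mid \mathrm{Unif.})}$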

95 Log-likelihood ratios let you add instead of multiply (avoiding numerical underflow, etc.): $\log_2 \frac{p(\mathrm{HELLO} \mid \mathrm{Eng.})}{p(\mathrm{HELLO} \mid \mathrm{Unif.})} = \log_2(1.4) + \log_2(3.2) + 2\log_2(1.2) + \log_2(2.0) \approx \log_2(12.8)$

96 Log-likelihood ratios of symbols are called LOD scores (LOD stands for log-odds): $\log_2(1.4) + \log_2(3.2) + 2\log_2(1.2) + \log_2(2.0) = S(H) + S(E) + 2S(L) + S(O) \approx \log_2(12.8)$
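A minimal Python sketch of LOD scoring, assuming the rounded per-letter ratios from slide 95 (1.4, 3.2, 1.2, 2.0) rather than a full English frequency table, which is not reproduced here. With these rounded ratios the product comes out near 12.9 rather than 12.8. The second half shows why logs are used: the product of many probabilities underflows to zero, while the sum of their logs stays representable:

```python
import math

# Per-letter likelihood ratios p(x | English) / p(x | uniform), rounded as on
# slide 95; an assumption standing in for a real letter-frequency table.
RATIO = {"H": 1.4, "E": 3.2, "L": 1.2, "O": 2.0}


def lod(letter):
    """LOD score in bits: S(x) = log2 of the likelihood ratio."""
    return math.log2(RATIO[letter])


def word_score(word):
    """Sum of per-letter scores; equals log2 of the product of ratios."""
    return sum(lod(ch) for ch in word)


score = word_score("HELLO")
print(round(score, 2), round(2 ** score, 1))   # 3.69 12.9

# Multiplying many probabilities underflows; adding their logs does not.
p = 1.0
for _ in range(1000):
    p *= 1 / 26                  # a specific 1000-letter string, uniform model
log2_p = 1000 * math.log2(1 / 26)
print(p, round(log2_p))          # 0.0 -4700
```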

97 Scores are log-likelihood ratios: $S(x) = \log_2 \frac{p(x \mid \text{model 1})}{p(x \mid \text{model 2})}$. A positive score means the symbol is more likely in model 1; a negative score means it is more likely in model 2. [Figure: scores of the letters E T I O A N S H R L D U C Y G W M B F P V K X Q J Z, English vs. uniform]

98 We use log2 for scores, so a score of +1 means an event is twice as likely in model 1 as in model 2, and −1 means it is half as likely. For example, O is twice as likely in English as at random, and M is half as likely in English as at random. [Figure: letter scores, English vs. uniform]

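100 To score pairwise alignments, the elementary events are the pairs of amino acids or nucleotides in each column, and the alignment score is the sum of the column scores: $S(\text{alignment}) = S(a_1, b_1) + S(a_2, b_2) + S(a_3, b_3) + S(a_4, b_4) + S(a_5, b_5)$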

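106 The score of an aligned pair compares two models: $S(a, b) = \log_2 \frac{p(a, b \mid \text{evolution})}{p(a, b \mid \text{chance})} = \log_2 \frac{p(a, b)}{p(a)\,p(b)}$.
Model 1 (evolution): pair probabilities derived from substitution matrices.
Model 2 (chance): two picks from a random urn with the database composition, i.e., the probability of pairs when sliding unrelated sequences past each other.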
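The two-model score can be sketched for DNA in Python. The pair probabilities below are invented for illustration (symmetric, summing to 1, with matches favored); the chance model uses the marginal composition of those pairs, and the alignment is assumed to be ungapped:

```python
import math

BASES = "ACGT"

# Hypothetical "evolution" model: joint probabilities of aligned pairs in
# related sequences. Illustrative numbers only, not from any real matrix.
MATCH, MISMATCH = 0.2, 0.2 / 12
joint = {(a, b): MATCH if a == b else MISMATCH for a in BASES for b in BASES}
# "Chance" model: two independent picks from the composition (marginals of joint).
background = {a: sum(joint[(a, b)] for b in BASES) for a in BASES}


def score_matrix(joint, background):
    """S(a, b) = log2[p(a, b | evolution) / (p(a) p(b))]."""
    return {(a, b): math.log2(joint[(a, b)] / (background[a] * background[b]))
            for (a, b) in joint}


def alignment_score(seq1, seq2, scores):
    """Sum of column scores for an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal lengths")
    return sum(scores[(a, b)] for a, b in zip(seq1, seq2))


S = score_matrix(joint, background)
print(round(S[("A", "A")], 2), round(S[("A", "G")], 2))
print(round(alignment_score("GCCACGTTCGCGATCG", "GGCAGTCTCGCGATTT", S), 2))
```

Matches score positive and mismatches negative here only because of the invented numbers; a real score matrix takes its pair probabilities from a substitution model or from counted alignments.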

108 Unlike substitution matrices, score matrices are symmetric: an aligned pair has no direction, so $p(a, b) = p(b, a)$ under both Model 1 (evolution) and Model 2 (chance), and $S(a, b) = \log_2 \frac{p(a, b)}{p(a)\,p(b)} = S(b, a)$.

109 BLOSUM62 score matrix

110 BLOSUM62 score matrix [Figure: BLOSUM62 alongside Bedell et al., Figure 4-3, "Amino acid chemical relationships", with isoleucine, leucine and phenylalanine labeled]

111 PAM vs BLOSUM
PAM:
- Starts from alignments of closely related proteins.
- Builds trees to avoid overcounting related sequences; inferred ancestral states are used to estimate transition probabilities.
- Transition probabilities at larger evolutionary distances are extrapolated from those at short distances.
- Larger PAMs model bigger distances (Ex: PAM250 models larger distances than PAM100).
BLOSUM:
- Starts from alignments of both closely and distantly related proteins.
- Clusters sequences by single linkage to avoid overcounting; all pairs in a clustered alignment are used to calculate pair probabilities.
- Transition probabilities at different evolutionary distances are estimated empirically from clusters made at different minimum percent identities.
- Larger BLOSUMs model shorter distances, because they come from higher-%ID clusters (Ex: BLOSUM80 models shorter distances than BLOSUM62).
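The single-linkage clustering step can be sketched in Python. This is a simplified illustration only, assuming ungapped, equal-length sequences; it is not the full BLOSUM procedure, which works on blocks of aligned protein segments and also weights pair counts between clusters. The toy sequences and function names are hypothetical:

```python
def identity_at_least(seq1, seq2, threshold_percent):
    """True if two ungapped, equal-length sequences are >= threshold_percent identical."""
    identical = sum(a == b for a, b in zip(seq1, seq2))
    return identical * 100 >= threshold_percent * len(seq1)   # integer-safe comparison


def single_linkage_clusters(seqs, threshold_percent):
    """Link every pair at >= threshold %ID; clusters are the connected
    components of those links (single linkage), found with union-find."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity_at_least(seqs[i], seqs[j], threshold_percent):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


seqs = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAATTT", "CCCCCCCCCC"]
print(single_linkage_clusters(seqs, 80))   # [[0, 1, 2], [3]]
print(single_linkage_clusters(seqs, 90))   # [[0, 1], [2], [3]]
```

At the 80% threshold, sequences 0 and 2 are only 70% identical but still end up in one cluster because each is linked to sequence 1: single linkage chains through intermediates. Raising the threshold splits the clusters, which is how higher-numbered BLOSUMs come to reflect closer relationships.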

112 Other amino acid substitution/score matrices
- Some matrices update the original Dayhoff method with more data or technical refinements. Ex: JTT (Jones, Taylor, Thornton); Gonnet, Benner and Cohen.
- Some matrices are for specialized kinds or parts of proteins. Ex: the JTT transmembrane protein matrix; Goldstein's secondary structure matrices.
- Some matrices make different assumptions. Ex: BLOSUM does not assume the Markov property; its matrices are computed independently from alignments clustered at different %IDs. BLOSUM matrices are labeled by the %ID threshold used for clustering, so BLOSUM30 models larger distances than BLOSUM62, whereas PAM100 models smaller distances than PAM250!

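114 Matrix models of DNA evolution: the Jukes-Cantor model, in which every base changes to every other base at the same rate α (* marks the diagonal entries):
$$\begin{array}{c|cccc} & A & G & C & T \\ \hline A & * & \alpha & \alpha & \alpha \\ G & \alpha & * & \alpha & \alpha \\ C & \alpha & \alpha & * & \alpha \\ T & \alpha & \alpha & \alpha & * \end{array}$$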

115 Matrix models of DNA evolution: the Jukes-Cantor model [Figure: the bases A, G, C, T drawn as pools]

116 Matrix models of DNA evolution: the Jukes-Cantor model [Figure: flows out of the pools A, C, G, T, following the same matrix]

117 Matrix models of DNA evolution: the Jukes-Cantor model [Figure: flows into the pools A, C, G, T, following the same matrix]

118 Matrix models of DNA evolution: the Jukes-Cantor model. Because of symmetry, sequences evolve to the uniform base composition (25% A, 25% G, 25% C, 25% T). [Figure: pools A, C, G, T]
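The approach to uniform composition can be sketched numerically. This Python sketch treats α as a per-step probability rather than a rate (an assumption made for simplicity), so the diagonal "*" becomes 1 − 3α; the function names are illustrative:

```python
def jc_step_matrix(alpha):
    """Discrete-step Jukes-Cantor matrix: P[i][j] = alpha for i != j,
    and 1 - 3*alpha on the diagonal so each row sums to 1."""
    if not 0 <= alpha <= 1 / 3:
        raise ValueError("alpha must be in [0, 1/3]")
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def evolve_composition(freqs, matrix, steps):
    """Advance base composition: new[j] = sum_i freqs[i] * P[i][j]
    (what flows out of each pool i and into pool j)."""
    for _ in range(steps):
        freqs = [sum(freqs[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    return freqs


start = [1.0, 0.0, 0.0, 0.0]     # an all-A sequence (order A, G, C, T)
for steps in (0, 10, 100, 1000):
    comp = evolve_composition(start, jc_step_matrix(0.01), steps)
    print(steps, [round(f, 3) for f in comp])
```

Whatever the starting composition, each step shrinks its deviation from 25% per base by a factor of 1 − 4α, so the composition converges to uniform.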

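119 Matrix models of DNA evolution: the Kimura model, with rate β for transitions (A↔G, C↔T) and rate α for transversions:
$$\begin{array}{c|cccc} & A & G & C & T \\ \hline A & * & \beta & \alpha & \alpha \\ G & \beta & * & \alpha & \alpha \\ C & \alpha & \alpha & * & \beta \\ T & \alpha & \alpha & \beta & * \end{array}$$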
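The Kimura layout can be built programmatically. As with the Jukes-Cantor sketch, this Python version treats α and β as per-step probabilities (an assumption), giving a diagonal of 1 − β − 2α; it follows the slide's convention of β for transitions and α for transversions:

```python
PURINES = {"A", "G"}
BASES = "AGCT"


def kimura_step_matrix(alpha, beta):
    """Discrete-step matrix laid out like the slide: beta for transitions
    (A<->G, C<->T), alpha for transversions, 1 - beta - 2*alpha on the diagonal."""
    if alpha < 0 or beta < 0 or beta + 2 * alpha > 1:
        raise ValueError("need alpha, beta >= 0 and beta + 2*alpha <= 1")
    m = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                m[(x, y)] = 1 - beta - 2 * alpha
            elif (x in PURINES) == (y in PURINES):   # purine<->purine or pyrimidine<->pyrimidine
                m[(x, y)] = beta
            else:
                m[(x, y)] = alpha
    return m


k = kimura_step_matrix(alpha=0.002, beta=0.01)
for x in BASES:
    print(x, [round(k[(x, y)], 3) for y in BASES])
```

Setting β = α recovers the Jukes-Cantor matrix, so Jukes-Cantor is the special case of the Kimura model with no transition/transversion distinction.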

120 Matrix models of DNA evolution: the Kimura model [Figure: pools A, G, C, T]


SERVICES CATALOGUE WITH SUBMISSION GUIDELINES

SERVICES CATALOGUE WITH SUBMISSION GUIDELINES SERVICES CATALOGUE WITH SUBMISSION GUIDELINES 3921 Montgomery Road Cincinnati, Ohio 45212 513-841-2428 www.agctsequencing.com CONTENTS Welcome Dye Terminator Sequencing DNA Sequencing Services - Full Service

More information

Gene Finding CMSC 423

Gene Finding CMSC 423 Gene Finding CMSC 423 Finding Signals in DNA We just have a long string of A, C, G, Ts. How can we find the signals encoded in it? Suppose you encountered a language you didn t know. How would you decipher

More information

Supplementary Information. Binding region and interaction properties of sulfoquinovosylacylglycerol (SQAG) with human

Supplementary Information. Binding region and interaction properties of sulfoquinovosylacylglycerol (SQAG) with human Supplementary Information Binding region and interaction properties of sulfoquinovosylacylglycerol (SQAG) with human vascular endothelial growth factor 165 revealed by biosensor based assays Yoichi Takakusagi

More information

pcas-guide System Validation in Genome Editing

pcas-guide System Validation in Genome Editing pcas-guide System Validation in Genome Editing Tagging HSP60 with HA tag genome editing The latest tool in genome editing CRISPR/Cas9 allows for specific genome disruption and replacement in a flexible

More information

Next Generation Sequencing

Next Generation Sequencing Next Generation Sequencing 38. Informationsgespräch der Blutspendezentralefür Wien, Niederösterreich und Burgenland Österreichisches Rotes Kreuz 22. November 2014, Parkhotel Schönbrunn Die Zukunft hat

More information

Molecular analyses of EGFR: mutation and amplification detection

Molecular analyses of EGFR: mutation and amplification detection Molecular analyses of EGFR: mutation and amplification detection Petra Nederlof, Moleculaire Pathologie NKI Amsterdam Henrique Ruijter, Ivon Tielen, Lucie Boerrigter, Aafke Ariaens Outline presentation

More information

Title : Parallel DNA Synthesis : Two PCR product from one DNA template

Title : Parallel DNA Synthesis : Two PCR product from one DNA template Title : Parallel DNA Synthesis : Two PCR product from one DNA template Bhardwaj Vikash 1 and Sharma Kulbhushan 2 1 Email: vikashbhardwaj@ gmail.com 1 Current address: Government College Sector 14 Gurgaon,

More information

Coding sequence the sequence of nucleotide bases on the DNA that are transcribed into RNA which are in turn translated into protein

Coding sequence the sequence of nucleotide bases on the DNA that are transcribed into RNA which are in turn translated into protein Assignment 3 Michele Owens Vocabulary Gene: A sequence of DNA that instructs a cell to produce a particular protein Promoter a control sequence near the start of a gene Coding sequence the sequence of

More information

Module 6: Digital DNA

Module 6: Digital DNA Module 6: Digital DNA Representation and processing of digital information in the form of DNA is essential to life in all organisms, no matter how large or tiny. Computing tools and computational thinking

More information

Chapter 9. Applications of probability. 9.1 The genetic code

Chapter 9. Applications of probability. 9.1 The genetic code Chapter 9 Applications of probability In this chapter we use the tools of elementary probability to investigate problems of several kinds. First, we study the language of life by focusing on the universal

More information

Amino Acids and Their Properties

Amino Acids and Their Properties Amino Acids and Their Properties Recap: ss-rrna and mutations Ribosomal RNA (rrna) evolves very slowly Much slower than proteins ss-rrna is typically used So by aligning ss-rrna of one organism with that

More information

Inverse PCR and Sequencing of P-element, piggybac and Minos Insertion Sites in the Drosophila Gene Disruption Project

Inverse PCR and Sequencing of P-element, piggybac and Minos Insertion Sites in the Drosophila Gene Disruption Project Inverse PCR and Sequencing of P-element, piggybac and Minos Insertion Sites in the Drosophila Gene Disruption Project Protocol for recovery of sequences flanking insertions in the Drosophila Gene Disruption

More information

ANALYSIS OF A CIRCULAR CODE MODEL

ANALYSIS OF A CIRCULAR CODE MODEL ANALYSIS OF A CIRCULAR CODE MODEL Jérôme Lacan and Chrstan J. Mchel * Laboratore d Informatque de Franche-Comté UNIVERSITE DE FRANCHE-COMTE IUT de Belfort-Montbélard 4 Place Tharradn - BP 747 5 Montbélard

More information

MORPHEUS. http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix.

MORPHEUS. http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix. MORPHEUS http://biodev.cea.fr/morpheus/ Prediction of Transcription Factors Binding Sites based on Position Weight Matrix. Reference: MORPHEUS, a Webtool for Transcripton Factor Binding Analysis Using

More information

http://www.life.umd.edu/grad/mlfsc/ DNA Bracelets

http://www.life.umd.edu/grad/mlfsc/ DNA Bracelets http://www.life.umd.edu/grad/mlfsc/ DNA Bracelets by Louise Brown Jasko John Anthony Campbell Jack Dennis Cassidy Michael Nickelsburg Stephen Prentis Rohm Objectives: 1) Using plastic beads, construct

More information

Supplemental Data. Short Article. PPARγ Activation Primes Human Monocytes. into Alternative M2 Macrophages. with Anti-inflammatory Properties

Supplemental Data. Short Article. PPARγ Activation Primes Human Monocytes. into Alternative M2 Macrophages. with Anti-inflammatory Properties Cell Metabolism, Volume 6 Supplemental Data Short Article PPARγ Activation Primes Human Monocytes into Alternative M2 Macrophages with Anti-inflammatory Properties M. Amine Bouhlel, Bruno Derudas, Elena

More information

Y-chromosome haplotype distribution in Han Chinese populations and modern human origin in East Asians

Y-chromosome haplotype distribution in Han Chinese populations and modern human origin in East Asians Vol. 44 No. 3 SCIENCE IN CHINA (Series C) June 2001 Y-chromosome haplotype distribution in Han Chinese populations and modern human origin in East Asians KE Yuehai ( `º) 1, SU Bing (3 Á) 1 3, XIAO Junhua

More information

Provincial Exam Questions. 9. Give one role of each of the following nucleic acids in the production of an enzyme.

Provincial Exam Questions. 9. Give one role of each of the following nucleic acids in the production of an enzyme. Provincial Exam Questions Unit: Cell Biology: Protein Synthesis (B7 & B8) 2010 Jan 3. Describe the process of translation. (4 marks) 2009 Sample 8. What is the role of ribosomes in protein synthesis? A.

More information

Mutation. Mutation provides raw material to evolution. Different kinds of mutations have different effects

Mutation. Mutation provides raw material to evolution. Different kinds of mutations have different effects Mutation Mutation provides raw material to evolution Different kinds of mutations have different effects Mutational Processes Point mutation single nucleotide changes coding changes (missense mutations)

More information

Pairwise Sequence Alignment

Pairwise Sequence Alignment Pairwise Sequence Alignment carolin.kosiol@vetmeduni.ac.at SS 2013 Outline Pairwise sequence alignment global - Needleman Wunsch Gotoh algorithm local - Smith Waterman algorithm BLAST - heuristics What

More information

pcmv6-neo Vector Application Guide Contents

pcmv6-neo Vector Application Guide Contents pcmv6-neo Vector Application Guide Contents Package Contents and Storage Conditions... 2 Product Description... 2 Introduction... 2 Production and Quality Assurance... 2 Methods... 3 Other required reagents...

More information

Marine Biology DEC 2004; 146(1) : 53-64 http://dx.doi.org/10.1007/s00227-004-1423-6 Copyright 2004 Springer

Marine Biology DEC 2004; 146(1) : 53-64 http://dx.doi.org/10.1007/s00227-004-1423-6 Copyright 2004 Springer Marine Biology DEC 2004; 146(1) : 53-64 http://dx.doi.org/10.1007/s00227-004-1423-6 Copyright 2004 Springer Archimer http://www.ifremer.fr/docelec/ Archive Institutionnelle de l Ifremer The original publication

More information

Protein Sequence Analysis - Overview -

Protein Sequence Analysis - Overview - Protein Sequence Analysis - Overview - UDEL Workshop Raja Mazumder Research Associate Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center Topics Why do protein

More information

Bio-Informatics Lectures. A Short Introduction

Bio-Informatics Lectures. A Short Introduction Bio-Informatics Lectures A Short Introduction The History of Bioinformatics Sanger Sequencing PCR in presence of fluorescent, chain-terminating dideoxynucleotides Massively Parallel Sequencing Massively

More information

The making of The Genoma Music

The making of The Genoma Music 242 Summary Key words Resumen Palabras clave The making of The Genoma Music Aurora Sánchez Sousa 1, Fernando Baquero 1 and Cesar Nombela 2 1 Department of Microbiology, Ramón y Cajal Hospital, and 2 Department

More information

ISTEP+: Biology I End-of-Course Assessment Released Items and Scoring Notes

ISTEP+: Biology I End-of-Course Assessment Released Items and Scoring Notes ISTEP+: Biology I End-of-Course Assessment Released Items and Scoring Notes Page 1 of 22 Introduction Indiana students enrolled in Biology I participated in the ISTEP+: Biology I Graduation Examination

More information

Cloning, sequencing, and expression of H.a. YNRI and H.a. YNII, encoding nitrate and nitrite reductases in the yeast Hansenula anomala

Cloning, sequencing, and expression of H.a. YNRI and H.a. YNII, encoding nitrate and nitrite reductases in the yeast Hansenula anomala Cloning, sequencing, and expression of H.a. YNRI and H.a. YNII, encoding nitrate and nitrite reductases in the yeast Hansenula anomala -'Pablo García-Lugo 1t, Celedonio González l, Germán Perdomo l, Nélida

More information

On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques

On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques On Covert Data Communication Channels Employing DNA Recombinant and Mutagenesis-based Steganographic Techniques MAGDY SAEB 1, EMAN EL-ABD 2, MOHAMED E. EL-ZANATY 1 1. School of Engineering, Computer Department,

More information

Gene and Chromosome Mutation Worksheet (reference pgs. 239-240 in Modern Biology textbook)

Gene and Chromosome Mutation Worksheet (reference pgs. 239-240 in Modern Biology textbook) Name Date Per Look at the diagrams, then answer the questions. Gene Mutations affect a single gene by changing its base sequence, resulting in an incorrect, or nonfunctional, protein being made. (a) A

More information

Introduction to Bioinformatics 3. DNA editing and contig assembly

Introduction to Bioinformatics 3. DNA editing and contig assembly Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 matthewb@ba.ars.usda.gov

More information

ANALYSIS OF GROWTH HORMONE IN TENCH (TINCA TINCA) ANALÝZA RŮSTOVÉHO HORMONU LÍNA OBECNÉHO (TINCA TINCA)

ANALYSIS OF GROWTH HORMONE IN TENCH (TINCA TINCA) ANALÝZA RŮSTOVÉHO HORMONU LÍNA OBECNÉHO (TINCA TINCA) ANALYSIS OF GROWTH HORMONE IN TENCH (TINCA TINCA) ANALÝZA RŮSTOVÉHO HORMONU LÍNA OBECNÉHO (TINCA TINCA) Zrůstová J., Bílek K., Baránek V., Knoll A. Ústav morfologie, fyziologie a genetiky zvířat, Agronomická

More information

TITRATION OF raav (VG) USING QUANTITATIVE REAL TIME PCR

TITRATION OF raav (VG) USING QUANTITATIVE REAL TIME PCR Page 1 of 5 Materials DNase digestion buffer [13 mm Tris-Cl, ph7,5 / 5 mm MgCl2 / 0,12 mm CaCl2] RSS plasmid ptr-uf11 SV40pA Forward primer (10µM) AGC AAT AGC ATC ACA AAT TTC ACA A SV40pA Reverse Primer

More information

Molecular chaperones involved in preprotein. targeting to plant organelles

Molecular chaperones involved in preprotein. targeting to plant organelles Molecular chaperones involved in preprotein targeting to plant organelles Dissertation der Fakultät für Biologie der Ludwig-Maximilians-Universität München vorgelegt von Christine Fellerer München 29.

More information

The DNA-"Wave Biocomputer"

The DNA-Wave Biocomputer The DNA-"Wave Biocomputer" Peter P. Gariaev (Pjotr Garjajev)*, Boris I. Birshtein*, Alexander M. Iarochenko*, Peter J. Marcer**, George G. Tertishny*, Katherine A. Leonova*, Uwe Kaempf ***. * Institute

More information

Heraeus Sepatech, Kendro Laboratory Products GmbH, Berlin. Becton Dickinson,Heidelberg. Biozym, Hessisch Oldendorf. Eppendorf, Hamburg

Heraeus Sepatech, Kendro Laboratory Products GmbH, Berlin. Becton Dickinson,Heidelberg. Biozym, Hessisch Oldendorf. Eppendorf, Hamburg 13 4. MATERIALS 4.1 Laboratory apparatus Biofuge A Centrifuge 5804R FACScan Gel electrophoresis chamber GPR Centrifuge Heraeus CO-AUTO-ZERO Light Cycler Microscope Motopipet Neubauer Cell Chamber PCR cycler

More information

Drosophila NK-homeobox genes

Drosophila NK-homeobox genes Proc. Natl. Acad. Sci. USA Vol. 86, pp. 7716-7720, October 1989 Biochemistry Drosophila NK-homeobox genes (NK-1, NK-2,, and DNA clones/chromosome locations of genes) YONGSOK KIM AND MARSHALL NIRENBERG

More information

Biopython Tutorial and Cookbook

Biopython Tutorial and Cookbook Biopython Tutorial and Cookbook Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck, Michiel de Hoon, Peter Cock Last Update September 2008 Contents 1 Introduction 5 1.1 What is Biopython?.........................................

More information

NimbleGen SeqCap EZ Library SR User s Guide Version 3.0

NimbleGen SeqCap EZ Library SR User s Guide Version 3.0 NimbleGen SeqCap EZ Library SR User s Guide Version 3.0 For life science research only. Not for use in diagnostic procedures. Copyright 2011 Roche NimbleGen, Inc. All Rights Reserved. Editions Version

More information

Transmembrane Signaling in Chimeras of the E. coli Chemotaxis Receptors and Bacterial Class III Adenylyl Cyclases

Transmembrane Signaling in Chimeras of the E. coli Chemotaxis Receptors and Bacterial Class III Adenylyl Cyclases Transmembrane Signaling in Chimeras of the E. coli Chemotaxis Receptors and Bacterial Class III Adenylyl Cyclases Dissertation der Mathematisch-Naturwissenschaftlichen Fakultät der Eberhard Karls Universität

More information

Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST

Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST Rapid alignment methods: FASTA and BLAST p The biological problem p Search strategies p FASTA p BLAST 257 BLAST: Basic Local Alignment Search Tool p BLAST (Altschul et al., 1990) and its variants are some

More information

DISSERTATIONES MEDICINAE UNIVERSITATIS TARTUENSIS 108

DISSERTATIONES MEDICINAE UNIVERSITATIS TARTUENSIS 108 DISSERTATIONES MEDICINAE UNIVERSITATIS TARTUENSIS 108 DISSERTATIONES MEDICINAE UNIVERSITATIS TARTUENSIS 108 THE INTERLEUKIN-10 FAMILY CYTOKINES GENE POLYMORPHISMS IN PLAQUE PSORIASIS KÜLLI KINGO TARTU

More information

Characterization of cdna clones of the family of trypsin/a-amylase inhibitors (CM-proteins) in barley {Hordeum vulgare L.)

Characterization of cdna clones of the family of trypsin/a-amylase inhibitors (CM-proteins) in barley {Hordeum vulgare L.) Characterization of cdna clones of the family of trypsin/a-amylase inhibitors (CM-proteins) in barley {Hordeum vulgare L.) J. Paz-Ares, F. Ponz, P. Rodríguez-Palenzuela, A. Lázaro, C. Hernández-Lucas,

More information

2006 7.012 Problem Set 3 KEY

2006 7.012 Problem Set 3 KEY 2006 7.012 Problem Set 3 KEY Due before 5 PM on FRIDAY, October 13, 2006. Turn answers in to the box outside of 68-120. PLEASE WRITE YOUR ANSWERS ON THIS PRINTOUT. 1. Which reaction is catalyzed by each

More information

Introduction to Bioinformatics (Master ChemoInformatique)

Introduction to Bioinformatics (Master ChemoInformatique) Introduction to Bioinformatics (Master ChemoInformatique) Roland Stote Institut de Génétique et de Biologie Moléculaire et Cellulaire Biocomputing Group 03.90.244.730 rstote@igbmc.fr Biological Function

More information

Association of IGF1 and IGFBP3 polymorphisms with colorectal polyps and colorectal cancer risk

Association of IGF1 and IGFBP3 polymorphisms with colorectal polyps and colorectal cancer risk DOI 10.1007/s10552-009-9438-4 ORIGINAL PAPER Association of IGF1 and IGFBP3 polymorphisms with colorectal polyps and colorectal cancer risk Elisabeth Feik Æ Andreas Baierl Æ Barbara Hieger Æ Gerhard Führlinger

More information

Event-specific Method for the Quantification of Maize MIR162 Using Real-time PCR. Protocol

Event-specific Method for the Quantification of Maize MIR162 Using Real-time PCR. Protocol Event-specific Method for the Quantification of Maize MIR162 Using Real-time PCR Protocol 31 January 2011 Joint Research Centre Institute for Health and Consumer Protection Molecular Biology and Genomics

More information

Archimer http://archimer.ifremer.fr

Archimer http://archimer.ifremer.fr Please note that this is an author-produced PDF of an article accepted for publication following peer review. The definitive publisher-authenticated version is available on the publisher Web site Fish

More information

were demonstrated to be, respectively, the catalytic and regulatory subunits of protein phosphatase 2A (PP2A) (29).

were demonstrated to be, respectively, the catalytic and regulatory subunits of protein phosphatase 2A (PP2A) (29). JOURNAL OF VIROLOGY, Feb. 1992, p. 886-893 0022-538X/92/020886-08$02.00/0 Copyright C) 1992, American Society for Microbiology Vol. 66, No. 2 The Third Subunit of Protein Phosphatase 2A (PP2A), a 55- Kilodalton

More information

Insulin Receptor Gene Mutations in Iranian Patients with Type II Diabetes Mellitus

Insulin Receptor Gene Mutations in Iranian Patients with Type II Diabetes Mellitus Iranian Biomedical Journal 13 (3): 161-168 (July 2009) Insulin Receptor Gene Mutations in Iranian Patients with Type II Diabetes Mellitus Bahram Kazemi 1*, Negar Seyed 1, Elham Moslemi 2, Mojgan Bandehpour

More information

http://hdl.handle.net/10197/2727

http://hdl.handle.net/10197/2727 Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available. Title Performance of DNA data embedding algorithms

More information

Molecular Facts and Figures

Molecular Facts and Figures Nucleic Acids Molecular Facts and Figures DNA/RNA bases: DNA and RNA are composed of four bases each. In DNA the four are Adenine (A), Thymidine (T), Cytosine (C), and Guanine (G). In RNA the four are

More information

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1

Core Bioinformatics. Degree Type Year Semester. 4313473 Bioinformàtica/Bioinformatics OB 0 1 Core Bioinformatics 2014/2015 Code: 42397 ECTS Credits: 12 Degree Type Year Semester 4313473 Bioinformàtica/Bioinformatics OB 0 1 Contact Name: Sònia Casillas Viladerrams Email: Sonia.Casillas@uab.cat

More information

N-terminal Regulatory Domains of Phosphodiesterases 1, 4, 5 and 10 examined with an Adenylyl Cyclase as a Reporter

N-terminal Regulatory Domains of Phosphodiesterases 1, 4, 5 and 10 examined with an Adenylyl Cyclase as a Reporter N-terminal Regulatory Domains of Phosphodiesterases 1, 4, 5 and 10 examined with an Adenylyl Cyclase as a Reporter Dissertation der Mathematisch-Naturwissenschaftlichen Fakultät der Eberhard Karls Universität

More information

Five-minute cloning of Taq polymerase-amplified PCR products

Five-minute cloning of Taq polymerase-amplified PCR products TOPO TA Cloning Version R 8 April 2004 25-0184 TOPO TA Cloning Five-minute cloning of Taq polymerase-amplified PCR products Catalog nos. K4500-01, K4500-40, K4510-20, K4520-01, K4520-40, K4550-01, K4550-40,

More information

All commonly-used expression vectors used in the Jia Lab contain the following multiple cloning site: BamHI EcoRI SmaI SalI XhoI_ NotI

All commonly-used expression vectors used in the Jia Lab contain the following multiple cloning site: BamHI EcoRI SmaI SalI XhoI_ NotI 2. Primer Design 2.1 Multiple Cloning Sites All commonly-used expression vectors used in the Jia Lab contain the following multiple cloning site: BamHI EcoRI SmaI SalI XhoI NotI XXX XXX GGA TCC CCG AAT

More information

Six Homeoproteins and a Iinc-RNA at the Fast MYH Locus Lock Fast Myofiber Terminal Phenotype

Six Homeoproteins and a Iinc-RNA at the Fast MYH Locus Lock Fast Myofiber Terminal Phenotype Six Homeoproteins and a Iinc-RNA at the Fast MYH Locus Lock Fast Myofiber Terminal Phenotype Iori Sakakibara 1,2,3, Marc Santolini 4, Arnaud Ferry 2,5, Vincent Hakim 4, Pascal Maire 1,2,3 * 1 INSERM U1016,

More information

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison

RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the

More information

Interleukin-4 Receptor Signal Transduction: Involvement of P62

Interleukin-4 Receptor Signal Transduction: Involvement of P62 Interleukin-4 Receptor Signal Transduction: Involvement of P62 Den Naturwissenschaftlichen Fakultäten der Friedrich Alexander Universität Erlangen Nürnberg zur Erlangung des Doktorgrades vorgelegt von

More information

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS

THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering

More information

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006

Hidden Markov Models in Bioinformatics. By Máthé Zoltán Kőrösi Zoltán 2006 Hidden Markov Models in Bioinformatics By Máthé Zoltán Kőrösi Zoltán 2006 Outline Markov Chain HMM (Hidden Markov Model) Hidden Markov Models in Bioinformatics Gene Finding Gene Finding Model Viterbi algorithm

More information

Phylogenetic Trees Made Easy

Phylogenetic Trees Made Easy Phylogenetic Trees Made Easy A How-To Manual Fourth Edition Barry G. Hall University of Rochester, Emeritus and Bellingham Research Institute Sinauer Associates, Inc. Publishers Sunderland, Massachusetts

More information

Principles of Evolution - Origin of Species

Principles of Evolution - Origin of Species Theories of Organic Evolution X Multiple Centers of Creation (de Buffon) developed the concept of "centers of creation throughout the world organisms had arisen, which other species had evolved from X

More information

Introduction to Genome Annotation

Introduction to Genome Annotation Introduction to Genome Annotation AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT

More information

The Arabinosyltransferase EmbC Is Inhibited by Ethambutol in Mycobacterium tuberculosis

The Arabinosyltransferase EmbC Is Inhibited by Ethambutol in Mycobacterium tuberculosis ANTIMICROBIAL AGENTS AND CHEMOTHERAPY, Oct. 2009, p. 4138 4146 Vol. 53, No. 10 0066-4804/09/$08.00 0 doi:10.1128/aac.00162-09 Copyright 2009, American Society for Microbiology. All Rights Reserved. The

More information

DNA Insertions and Deletions in the Human Genome. Philipp W. Messer

DNA Insertions and Deletions in the Human Genome. Philipp W. Messer DNA Insertions and Deletions in the Human Genome Philipp W. Messer Genetic Variation CGACAATAGCGCTCTTACTACGTGTATCG : : CGACAATGGCGCT---ACTACGTGCATCG 1. Nucleotide mutations 2. Genomic rearrangements 3.

More information

DNA Sequencing of the eta Gene Coding for Staphylococcal Exfoliative Toxin Serotype A

DNA Sequencing of the eta Gene Coding for Staphylococcal Exfoliative Toxin Serotype A Journal of General Microbiology (1988), 134, 71 1-71 7. Printed in Great Britain 71 1 DNA Sequencing of the eta Gene Coding for Staphylococcal Exfoliative Toxin Serotype A By SUSUMU SAKURA, HTOSH SUZUK

More information

Chlamydomonas adapted Green Fluorescent Protein (CrGFP)

Chlamydomonas adapted Green Fluorescent Protein (CrGFP) Chlamydomonas adapted Green Fluorescent Protein (CrGFP) Plasmid pfcrgfp for fusion proteins Sequence of the CrGFP In the sequence below, all amino acids which have been altered from the wildtype GFP from

More information

Protein Synthesis Simulation

Protein Synthesis Simulation Protein Synthesis Simulation Name(s) Date Period Benchmark: SC.912.L.16.5 as AA: Explain the basic processes of transcription and translation, and how they result in the expression of genes. (Assessed

More information

A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML

A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML 9 June 2011 A Step-by-Step Tutorial: Divergence Time Estimation with Approximate Likelihood Calculation Using MCMCTREE in PAML by Jun Inoue, Mario dos Reis, and Ziheng Yang In this tutorial we will analyze

More information

Molecular detection of Babesia rossi and Hepatozoon sp. in African wild dogs (Lycaon pictus) in South Africa

Molecular detection of Babesia rossi and Hepatozoon sp. in African wild dogs (Lycaon pictus) in South Africa Available online at www.sciencedirect.com Veterinary Parasitology 157 (2008) 123 127 Short communication Molecular detection of Babesia rossi and Hepatozoon sp. in African wild dogs (Lycaon pictus) in

More information

Introduction to Phylogenetic Analysis

Introduction to Phylogenetic Analysis Subjects of this lecture Introduction to Phylogenetic nalysis Irit Orr 1 Introducing some of the terminology of phylogenetics. 2 Introducing some of the most commonly used methods for phylogenetic analysis.

More information

Metabolic Engineering of Escherichia coli for Enhanced Production of Succinic Acid, Based on Genome Comparison and In Silico Gene Knockout Simulation

Metabolic Engineering of Escherichia coli for Enhanced Production of Succinic Acid, Based on Genome Comparison and In Silico Gene Knockout Simulation APPLIED AND ENVIRONMENTAL MICROBIOLOGY, Dec. 2005, p. 7880 7887 Vol. 71, No. 12 0099-2240/05/$08.00 0 doi:10.1128/aem.71.12.7880 7887.2005 Copyright 2005, American Society for Microbiology. All Rights

More information

Intro to Map/Reduce a.k.a. Hadoop

Intro to Map/Reduce a.k.a. Hadoop Intro to Map/Reduce a.k.a. Hadoop Based on: Mining of Massive Datasets by Ra jaraman and Ullman, Cambridge University Press, 2011 Data Mining for the masses by North, Global Text Project, 2012 Slides by

More information

Bioinformatics Grid - Enabled Tools For Biologists.

Bioinformatics Grid - Enabled Tools For Biologists. Bioinformatics Grid - Enabled Tools For Biologists. What is Grid-Enabled Tools (GET)? As number of data from the genomics and proteomics experiment increases. Problems arise for the current sequence analysis

More information

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org

PROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,

More information

inhibition of mitosis

inhibition of mitosis The EMBO Journal vol.13 no.2 pp.425-434, 1994 cdt 1 is an essential target of the Cdc 1 O/Sct 1 transcription factor: requirement for DNA replication and inhibition of mitosis Johannes F.X.Hofmann and

More information

a. Ribosomal RNA rrna a type ofrna that combines with proteins to form Ribosomes on which polypeptide chains of proteins are assembled

a. Ribosomal RNA rrna a type ofrna that combines with proteins to form Ribosomes on which polypeptide chains of proteins are assembled Biology 101 Chapter 14 Name: Fill-in-the-Blanks Which base follows the next in a strand of DNA is referred to. as the base (1) Sequence. The region of DNA that calls for the assembly of specific amino

More information

Impaired insulin and insulin-like growth factor expression and signaling mechanisms in Alzheimer s disease is this type 3 diabetes?

Impaired insulin and insulin-like growth factor expression and signaling mechanisms in Alzheimer s disease is this type 3 diabetes? Journal of Alzheimer s Disease 7 (2005) 63 80 63 IOS Press Impaired insulin and insulin-like growth factor expression and signaling mechanisms in Alzheimer s disease is this type 3 diabetes? Eric Steen,

More information

III III 0 IIOI DID IIO 1101 010 II0 1101 I IIII

III III 0 IIOI DID IIO 1101 010 II0 1101 I IIII (19) United States III III 0 IIOI DID IIO 1101 010 II0 1101 I IIII US 20020090376A1 III 1010 II 0I II (12) Patent Application Publication (lo) Pub. No.: US 2002/0090376 Al KANIGA et at. (43) Pub. Date:

More information

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003

Similarity Searches on Sequence Databases: BLAST, FASTA. Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Similarity Searches on Sequence Databases: BLAST, FASTA Lorenza Bordoli Swiss Institute of Bioinformatics EMBnet Course, Basel, October 2003 Outline Importance of Similarity Heuristic Sequence Alignment:

More information

Evolution (18%) 11 Items Sample Test Prep Questions

Evolution (18%) 11 Items Sample Test Prep Questions Evolution (18%) 11 Items Sample Test Prep Questions Grade 7 (Evolution) 3.a Students know both genetic variation and environmental factors are causes of evolution and diversity of organisms. (pg. 109 Science

More information

Chapter 5. Stripping Bacillus: ComK auto-stimulation is responsible for the bistable response in competence development

Chapter 5. Stripping Bacillus: ComK auto-stimulation is responsible for the bistable response in competence development Stripping Bacillus: ComK auto-stimulation is responsible for the bistable response in competence development This chapter has been adapted from: W.K. Smits*, C.C. Eschevins*, K.A. Susanna, S. Bron, O.P.

More information

Module 10: Bioinformatics

Module 10: Bioinformatics Module 10: Bioinformatics 1.) Goal: To understand the general approaches for basic in silico (computer) analysis of DNA- and protein sequences. We are going to discuss sequence formatting required prior

More information