Institute for Immunology and Informatics
T cell Epitope Prediction: EpiMatrix
Eric Gustafson
January 6, 2011
Overview
- Gathering raw data
  - Popular sources
  - Data management
- Conservation analysis
  - Multiple alignments / consensus sequences
  - Conservatrix
- Screening for putative T cell epitopes (EpiMatrix)
  - MHC Class I and II
- Searching for homology
  - Human genome / other pathogens (GenBank)
- Vaccine design concepts
  - Immunogenic consensus
  - String of beads
MHC/Ligand Interactions
Binding is mediated by interactions between the candidate peptide's side chains and specific regions in the floor of the MHC binding groove.
The binding grooves of both Class I and Class II MHC can be divided into 9 such regions. In Class I MHC, the binding groove is closed-ended and accommodates peptides from 8 to about 11 amino acids long, although 9-mers and 10-mers are preferred. In Class II MHC, the binding groove is open-ended, so Class II molecules can accommodate longer peptides, typically 12-20 amino acids long.
How does in silico mapping work?
Th cell epitopes are linear and restricted by MHC (HLA).
[Figure: a mature APC presenting a peptide epitope bound to MHC]
Because the HLA binding pockets are well characterized, their interactions with peptides can be modeled. The EpiMatrix algorithm scores every overlapping 9-mer in a given sequence for binding affinity to a range of common HLA alleles and reports both detailed and aggregated results.
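A minimal sketch of the frame enumeration described above, assuming frames advance one residue at a time (as the report tables later in this deck do). The function names are hypothetical, not part of EpiMatrix. It also suggests where the Class I report's 2952 assessments come from: assuming SAMPLE_01 is 500 residues long (its last frame starts at position 492), there are 492 frames, each scored against 6 allele matrices.

```python
def nine_mer_frames(sequence: str, k: int = 9) -> list[tuple[int, str]]:
    """Return (1-based start position, peptide) for every overlapping k-mer."""
    return [(i + 1, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def total_assessments(sequence_length: int, n_alleles: int, k: int = 9) -> int:
    """Each frame is scored once against each allele matrix."""
    n_frames = max(sequence_length - k + 1, 0)
    return n_frames * n_alleles


frames = nine_mer_frames("MGARASVLTGSKLDAW")  # first 16 residues of SAMPLE_01
print(frames[:3])  # [(1, 'MGARASVLT'), (2, 'GARASVLTG'), (3, 'ARASVLTGS')]
print(total_assessments(500, 6))  # 2952 (assumed 500-residue protein, 6 matrices)
```

The same arithmetic fits the Class II report: 321 frames (a 329-residue sequence) times 8 DRB1 alleles gives its 2568 assessments.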
Constructing the matrix
Early researchers eluted, sequenced, and aligned peptides bound to MHC. Falk et al. discovered that certain amino acids appeared at certain positions more often than others, and this information was used to develop rudimentary binding motifs.
Falk, K., Rötzschke, O., Stevanovic, S., Jung, G. and Rammensee, H. G. Allele-specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature 1991, 351, 290-296.
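The motif-building idea can be sketched as position-by-position residue counting over aligned 9-mers. The peptides below are toy inputs chosen for illustration, not Falk et al.'s data, and `min_fraction` is a hypothetical cut-off for calling a residue dominant at a position.

```python
from collections import Counter


def position_counts(peptides: list[str]) -> list[Counter]:
    """Count residues at each position of equal-length, aligned peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    return [Counter(p[pos] for p in peptides) for pos in range(length)]


def dominant_residues(counts: list[Counter], min_fraction: float = 0.5) -> dict[int, str]:
    """Map 1-based position -> most common residue, if it reaches min_fraction."""
    motif = {}
    for pos, counter in enumerate(counts, start=1):
        residue, freq = counter.most_common(1)[0]
        if freq / sum(counter.values()) >= min_fraction:
            motif[pos] = residue
    return motif


# Toy aligned 9-mers (hypothetical input, not eluted data).
toy_peptides = ["SLYNVATYL", "KLVALGINV", "ALWGFFPVL", "YLLEMLWRL"]
print(dominant_residues(position_counts(toy_peptides)))  # {2: 'L', 9: 'L'}
```

Positions where one residue dominates (here 2 and 9) are the kind of anchor preferences that a rudimentary binding motif records.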
Other methods
As additional training data became available, many statistically based techniques were used to model the data:
- Frequency analysis
- Hidden Markov models (HMM)
- Support vector machines (SVM)
- Stabilized matrix method (SMM)
- Random forests
- Naive Bayesian analysis
How EpiMatrix Scores
[Figure: graphical representation of the A*0201 coefficient matrix, based on a list of actual peptides from Chicz; coefficients (scale -3.00 to 2.00) plotted by amino acid and peptide position]
A peptide's score is the sum of the matrix coefficients of its residues at each position. For SLYNVATYL, c1(S) + c2(L) + c3(Y) + c4(N) + c5(V) + c6(A) + c7(T) + c8(Y) + c9(L), where cp(X) is the coefficient for amino acid X at position p, gives an indication of binding likelihood.
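A sketch of the additive scoring above, assuming the matrix is stored as position -> residue -> coefficient. `TOY_MATRIX` is a hypothetical, mostly empty matrix (unlisted residues contribute 0.0), not the real A*0201 coefficients.

```python
# Hypothetical coefficient matrix: position -> {residue: coefficient}.
# Residues not listed at a position contribute 0.0 (a simplification).
TOY_MATRIX: dict[int, dict[str, float]] = {
    1: {"S": 0.25},
    2: {"L": 1.75, "M": 1.25},
    3: {"Y": 0.5},
    9: {"L": 1.5, "V": 1.25},
}


def matrix_score(peptide: str, matrix: dict[int, dict[str, float]]) -> float:
    """Sum the coefficient of each residue at its position (additive model)."""
    if len(peptide) != 9:
        raise ValueError("expected a 9-mer")
    return sum(
        matrix.get(pos, {}).get(residue, 0.0)
        for pos, residue in enumerate(peptide, start=1)
    )


print(matrix_score("SLYNVATYL", TOY_MATRIX))  # 4.0
```

The additive form assumes each position contributes independently, which is what makes a single coefficient table per allele sufficient.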
Comparing scores on a standard scale
Predictive matrices are benchmarked against a standard curve generated from random peptides.
[Figure: score distributions of random peptides and true epitopes on the standardized scale (marked at 0 and 1.64) for DRB1*0101, DRB1*0102, DRB1*0301, DRB1*0401, DRB1*0402, DRB1*0404, DRB1*0405, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1104, DRB1*1301, DRB1*1302, DRB1*1501, DRB1*1502, DRB5*0101, random, and total]
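A sketch of putting raw matrix scores on a standard scale using a random-peptide background. Assumptions: background peptides are drawn uniformly from the 20 amino acids, `toy_score` is a hypothetical two-anchor scorer standing in for a real matrix, and the 1.64 mark on the curve (about the top 5% of a standard normal, one-tailed) is used as the hit threshold.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HIT_THRESHOLD = 1.64  # roughly the top 5% of a standard normal (one-tailed)


def toy_score(peptide: str) -> float:
    """Hypothetical two-anchor scorer standing in for a real coefficient matrix."""
    score = 1.75 if peptide[1] == "L" else 0.0
    score += 1.5 if peptide[8] in "LV" else 0.0
    return score


def random_peptides(n: int, length: int = 9, seed: int = 0) -> list[str]:
    """Uniformly random peptides (an assumed background model)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]


def make_standardizer(background_scores: list[float]):
    """Return a function mapping a raw score to a Z score on the random-peptide curve."""
    mu = statistics.mean(background_scores)
    sigma = statistics.stdev(background_scores)
    return lambda raw: (raw - mu) / sigma


to_z = make_standardizer([toy_score(p) for p in random_peptides(10_000)])
z = to_z(toy_score("SLYNVATYL"))
print(f"z = {z:.2f}, hit = {z >= HIT_THRESHOLD}")
```

Standardizing each allele's matrix against its own random background is what lets Z scores from different alleles be compared and summed on one scale.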
HLA Coverage
EpiMatrix tests binding to the most common supertype HLA molecules. Together, these represent more than 90% of human populations worldwide, so no individual haplotype testing is necessary (Southwood et al., 1998).
iTEM: individualized T cell epitope measure
A method for predicting the immunogenicity of protein vaccines and biologic therapeutics based on an individual's HLA type (Cohen et al., 2008).
iTEM = sum of the significant scores - expected score for a protein of that length
For example: number of frames = 13, expected frequency of hits = 0.05, expected value of a hit = 2.06, giving an expected score of 13 × 0.05 × 2.06 ≈ 1.34.
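The iTEM arithmetic can be written out as below, a sketch that uses the slide's parameters as defaults and assumes the expected score is frames × hit frequency × expected hit value. The significant scores in the usage line are hypothetical.

```python
def expected_score(n_frames: int, hit_frequency: float = 0.05,
                   expected_hit_value: float = 2.06) -> float:
    """Expected sum of significant scores for a protein with n_frames 9-mer frames."""
    return n_frames * hit_frequency * expected_hit_value


def item_score(significant_scores: list[float], n_frames: int,
               hit_frequency: float = 0.05, expected_hit_value: float = 2.06) -> float:
    """iTEM = sum of significant scores - expected score for a protein of that length."""
    return sum(significant_scores) - expected_score(n_frames, hit_frequency, expected_hit_value)


print(round(expected_score(13), 3))  # 1.339
print(round(item_score([2.1, 1.8], 13), 3))  # 2.561 (hypothetical significant scores)
```

A positive value means the protein carries more predicted epitope content for that individual's HLA alleles than a random sequence of the same length would.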
EpiMatrix: Class I Analysis
EpiMatrix: Class I Standard Analysis
EpiMatrix Results
Current File: SAMPLE    Current Sequence: SAMPLE_01
Legend: top 10%, top 5%, and top 1% of Z scores. All Z scores in the top 5% are considered "hits".
Z-score columns (matrices): KB_A0101_09, KB_A0201_09, KB_A0301_09, KB_A2402_09, KB_B0702_09, KB_B4403_09

AA Sequence  Start  GRAVY  A0101  A0201  A0301  A2402  B0702  B4403  Hits  Avg Z
MGARASVLT        1   0.79   0.34   0.24   0.48   1.15   0.39   0.14     0   0.38
GARASVLTG        2   0.53   0.85   0.36   0.17   1.28   0.16   0.32     0   0.31
ARASVLTGS        3   0.49   0.01   0.19   0.52   0.91    0.3   0.05     0   0.31
RASVLTGSK        4   0.14   0.75   0.24   2.22   0.73   0.54   0.49     1   0.34
ASVLTGSKL        5   0.78   0.42   0.29   0.33   0.93   1.26   1.15     0   0.73
SVLTGSKLD        6   0.19   0.01    0.5   0.73   0.64    0.6   0.51     0   0.09
VLTGSKLDA        7   0.48   0.03    1.3   0.28   0.59   0.22   0.16     0   0.11
LTGSKLDAW        8   0.09   0.68   0.09   0.18   0.04    0.5   0.44     0   0.22
TGSKLDAWE        9    0.9    0.2   0.84   1.15   0.18   1.05   1.49     0   0.82
GSKLDAWEQ       10   1.21   0.34   1.13   0.51   1.49   1.55    0.9     0   0.87
SKLDAWEQI       11   0.67   1.24   0.73   1.07   0.94   0.68   0.57     0    0.1
KLDAWEQIR       12   1.08   0.33   0.97   1.57   0.91   1.52   0.92     0   0.08
LDAWEQIRL       13   0.22   0.01   0.91   1.04   0.39   1.28    0.1     0   0.24
DAWEQIRLK       14   1.08   0.45   0.45   1.06   0.76   1.07   0.21     0   0.16
AWEQIRLKP       15   0.87      0   1.56   1.08    0.3   0.46   0.08     0   0.48
WEQIRLKPG       16   1.11   2.36   0.86   1.43   0.72    0.5   2.66     1   0.54
EQIRLKPGC       17   0.73   1.32   0.54   0.77   0.91   2.12   0.06     0   0.93
QIRLKPGCK       18   0.78   0.04   0.33   2.47   0.95   0.29   1.72     1   0.13
IRLKPGCKK       19   0.82   0.16   0.83   1.72    0.5   0.47   1.07     1   0.22
RLKPGCKKK       20   1.76   0.15   0.47   2.76   0.22   0.13   0.77     1   0.38
LKPGCKKKY       21    1.4   2.06   1.37   0.36   0.15   0.31   0.01     1    0.1
KPGCKKKYR       22   2.32    1.1   1.24   1.04   1.41   1.34   0.24     0   0.27
PGCKKKYRL       23   1.47   1.13   0.42   1.14   0.61   0.02   0.13     0   0.37
GCKKKYRLK       24   1.72   0.42   1.63   1.28   1.17    1.4   1.17     0   0.75
...
PFASLKSLF      488   0.88   0.16   1.64   0.31   2.02   0.34   0.59     1   0.19
FASLKSLFG      489   1.01   0.88   0.52   0.49   0.79   0.09   0.22     0   0.13
ASLKSLFGT      490   0.62   0.39   0.76   0.59   0.31   0.67   0.81     0   0.26
SLKSLFGTD      491   0.03   0.79   0.85   0.83   0.22   0.24   0.64     0   0.04
LKSLFGTDQ      492   0.27   0.22   0.79    0.9   0.63   0.88   1.44     0   0.74
Maximum                     3.67    3.1   2.94   2.84   4.11   3.31     2
Sum of significant Z       25.98  49.57  50.35  25.54  55.06  34.08
Hit count                     11     23     22     11     33     25   125

Total assessments: 2952    Total significant Z: 240.58    Expected Z: 326.38    Deviation: 75.8    Deviation per 1000: 14.36
EpiMatrix: Class I Extended Analysis
EpiMatrix: Class II Standard Analysis
Class II EpiMatrix Report    File: FLU-HA    Sequence: PROVIDENCE-2012    September 29, 2010 (Epx Ver. 1.2)
Z-score columns: DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, DRB1*1501

Start  AA Sequence  Stop   0101   0301   0401   0701   0801   1101   1301   1501  Hits
    1  QKLPGNRNS       9   0.58   0.25   1.06   -0.2   0.23   0.95   0.45   0.31     0
    2  KLPGNRNST      10   0.41   0.48  -0.19   0.57   0.34    0.8  -0.75   0.84     0
    3  LPGNRNSTA      11   0.94   1.03   1.26    0.5   0.49   0.96   0.92   0.84     0
    4  PGNRNSTAT      12   0.35  -1.66  -0.43   0.23  -0.21  -1.07  -0.57   0.18     0
    5  GNRNSTATL      13    1.1   0.74   1.16    0.6   0.62   0.95   0.95    0.2     0
    6  NRNSTATLC      14   1.14   0.29   0.51   1.04   0.07  -0.39  -0.05   0.58     0
  ...
  307  RYVKQNTLK     315  -0.34   0.32   -0.1  -0.51   0.58  -0.57   0.77  -0.23     0
  308  YVKQNTLKL     316   3.06   2.28   3.18   2.81   2.43   2.81   3.11   2.55     8
  309  VKQNTLKLA     317   0.97   1.51   0.95   1.06   1.62   2.01    1.7   1.41     2
  310  KQNTLKLAT     318   0.49   -0.1   0.22   0.54      1   0.89   0.86   1.34     0
  311  QNTLKLATG     319   0.15  -0.22   0.23  -1.29   1.19   1.26   0.23   0.01     0
  312  NTLKLATGM     320   0.24   0.63  -0.41   0.33  -0.02  -0.77   1.07  -0.44     0
  313  TLKLATGMR     321    0.9   0.78   0.81   0.55   1.24    0.7  -0.09   0.46     0
  314  LKLATGMRN     322   1.93   1.17   1.92   1.23   1.86    1.4    0.2   2.35     4
  315  KLATGMRNV     323  -0.23  -1.07  -0.73   0.29  -0.96   -0.6  -0.69  -0.55     0
  316  LATGMRNVP     324  -0.57   0.68   -0.6  -0.24  -0.07   0.37  -0.55   0.01     0
  317  ATGMRNVPK     325   1.18   0.31   1.32   0.09    0.6   0.74   0.15    0.6     0
  318  TGMRNVPKK     326  -0.24  -1.37  -0.28  -0.98   0.52   0.23  -0.18     -1     0
  319  GMRNVPKKQ     327   0.86   0.13   0.98   1.06   1.04   1.26   0.34   0.04     0
  320  MRNVPKKQT     328   0.81   0.36  -0.03   0.85   1.13   1.58   0.77   1.41     0
  321  RNVPKKQTR     329  -1.22   1.25  -1.36  -1.75   0.16  -0.74    0.7   0.34     0

Summarized Results
DRB1* allele                     0101   0301   0401   0701   0801   1101   1301   1501   Total
Maximum single Z score           3.06   2.69   3.18   3.02   2.77   3.29   3.11   2.55      --
Sum of significant Z scores     65.04  58.45  52.95  64.72  51.65  81.61  62.54  72.96  509.92
Count of significant Z scores      30     29     25     30     25     38     31     36     244

Total assessments performed: 2568
Deviation from expectation: 224.26    Deviation per 1000 AA: 89.24
Adjusted for regulatory epitopes: deviation from expectation: 224.26    Deviation per 1000 AA: 89.24
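The Hits column in the report above can be recomputed by counting Z scores at or above 1.64, which is consistent with every row shown (for example, VKQNTLKLA has two scores, 2.01 and 1.7, above the cut-off, while 1.62 falls below it). Treating a score of exactly 1.64 as a hit is an assumption. The sketch also derives per-allele maximum, sum, and count of significant Z scores, as the Summarized Results rows do, but only over the three frames copied here, so its totals are not the report's.

```python
HIT_THRESHOLD = 1.64  # assumed cut-off for "top 5%" Z scores
ALLELES = ["DRB1*0101", "DRB1*0301", "DRB1*0401", "DRB1*0701",
           "DRB1*0801", "DRB1*1101", "DRB1*1301", "DRB1*1501"]

# Three frames copied from the PROVIDENCE-2012 report.
FRAMES = {
    "YVKQNTLKL": [3.06, 2.28, 3.18, 2.81, 2.43, 2.81, 3.11, 2.55],
    "VKQNTLKLA": [0.97, 1.51, 0.95, 1.06, 1.62, 2.01, 1.7, 1.41],
    "LKLATGMRN": [1.93, 1.17, 1.92, 1.23, 1.86, 1.4, 0.2, 2.35],
}


def count_hits(z_scores: list[float], threshold: float = HIT_THRESHOLD) -> int:
    """Number of alleles for which this frame scores at or above the threshold."""
    return sum(1 for z in z_scores if z >= threshold)


def allele_summary(frames: dict[str, list[float]], threshold: float = HIT_THRESHOLD):
    """Per allele: (maximum Z, sum of significant Z, count of significant Z)."""
    summary = {}
    for i, allele in enumerate(ALLELES):
        column = [scores[i] for scores in frames.values()]
        significant = [z for z in column if z >= threshold]
        summary[allele] = (max(column), round(sum(significant), 2), len(significant))
    return summary


print({seq: count_hits(z) for seq, z in FRAMES.items()})
# {'YVKQNTLKL': 8, 'VKQNTLKLA': 2, 'LKLATGMRN': 4}
print(allele_summary(FRAMES)["DRB1*0101"])  # (3.06, 4.99, 2)
```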
EpiMatrix: Class II Standard Analysis
PROVIDENCE-2012 (deviation per 1000 AA: 89.24)
EpiMatrix: Class II Extended Analysis
ClustiMer identifies T cell epitopes
Epitopes tend to cluster: T cell epitopes cluster within protein sequences, and one or more dominant T cell epitope clusters can enable significant immune responses even to otherwise low-scoring proteins.
[Figure: Class II epitope map across DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, DRB1*1501]
T cell epitope clusters make excellent vaccine candidates:
- Compact
- Easy to deliver as peptides
- Highly reactive in vivo
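ClustiMer's actual rules are not described in these slides, so the following is only a toy illustration of "epitopes tend to cluster", not ClustiMer's method: it slides a window over per-frame hit counts and merges windows whose total reaches a threshold. `find_dense_regions`, `window`, and `min_hits` are hypothetical. Run on the per-frame hit counts from the cluster 76 detail report (frames 76-91), it flags frames 79-91, which overlaps but does not match the reported cluster boundaries.

```python
def find_dense_regions(hits_by_frame: dict[int, int], window: int = 9,
                       min_hits: int = 10) -> list[tuple[int, int]]:
    """Toy cluster finder: merge every window of `window` consecutive frames
    whose summed hit count reaches `min_hits`. Returns (first, last) frame ranges."""
    if not hits_by_frame:
        return []
    last_frame = max(hits_by_frame)
    regions: list[tuple[int, int]] = []
    for start in sorted(hits_by_frame):
        total = sum(hits_by_frame.get(f, 0) for f in range(start, start + window))
        if total < min_hits:
            continue
        end = min(start + window - 1, last_frame)
        if regions and start <= regions[-1][1] + 1:
            # Overlapping or adjacent dense window: extend the current region.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Per-frame hit counts for frames 76-91, from the cluster 76 detail report.
cluster_76_hits = dict(zip(range(76, 92), [0, 1, 0, 3, 0, 0, 0, 0, 2, 2, 1, 3, 7, 0, 1, 0]))
print(find_dense_regions(cluster_76_hits))  # [(79, 91)]
```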
Finding Class II Clusters
Interactive Class II Cluster Report
Class II Cluster Detail Report
EpiMatrix Cluster Detail Report    File: FLU-HA    Sequence: PROVIDENCE-2012    Cluster: 76    September 29, 2010 (Epx Ver. 1.2)
Z-score columns: DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, DRB1*1501 (only the Z scores printed in the report are listed for each frame)

Start  AA Sequence  Stop  Hydrophobicity  Hits  Z scores
   76  CRSFQNKKW      84           -0.37     0  1.49
   77  RSFQNKKWR      85           -0.54     1  1.79
   78  SFQNKKWRL      86           -0.34     0  1.4
   79  FQNKKWRLF      87            -1.2     3  1.48  2.39  1.89  1.84  1.44
   80  QNKKWRLFV      88           -1.04     0
   81  NKKWRLFVK      89           -1.09     0
   82  KKWRLFVKR      90            -1.2     0
   83  KWRLFVKRS      91           -0.86     0  1.39  1.39  1.44
   84  WRLFVKRSK      92           -0.86     2  1.41  1.57  2.77  3.29  1.42  1.44
   85  RLFVKRSKA      93           -0.56     2  1.69  1.4  1.77  1.52
   86  LFVKRSKAY      94            -0.2     1  1.28  1.92
   87  FVKRSKAYS      95           -0.71     3  1.46  1.45  2.73  2.72  2.7  1.4
   88  VKRSKAYSN      96           -1.41     7  2.17  1.8  2.02  2.31  1.86  2.02  1.3  2.3
   89  KRSKAYSNC      97           -0.34     0
   90  RSKAYSNCY      98           -0.28     1  1.84  1.31
   91  SKAYSNCYP      99           -0.21     0

Summarized Results
DRB1* allele                     0101   0301   0401   0701   0801   1101   1301   1501   Total
Maximum single Z score           2.17   1.84   2.02   2.39   2.77   3.29    2.7    2.3      --
Sum of significant Z scores      2.17   5.34   2.02    4.7   9.25   9.88   8.18    2.3   43.84
Count of significant Z scores       1      3      1      2      4      4      4      1      20

Total assessments performed: 128    Hydrophobicity: -1.11    EpiMatrix score: 30.66    EpiMatrix score (w/o flanks): 31.98
Class II Cluster Scale
Class II Logo Report