T cell Epitope Prediction




Institute for Immunology and Informatics
T cell Epitope Prediction
EpiMatrix
Eric Gustafson
January 6, 2011

Overview
Gathering raw data
  Popular sources
  Data management
Conservation analysis
  Multiple alignments / consensus sequences
  Conservatrix
Screening for putative T cell epitopes (EpiMatrix)
  MHC Class I and II
Searching for homology
  Human genome / other pathogens (GenBank)
Vaccine design concepts
  Immunogenic consensus
  String of beads

MHC/Ligand Interactions
Binding is mediated by the interaction of the candidate peptide's side chains with specific regions in the floor of the MHC binding groove.

MHC/Ligand Interactions
The binding groove of both Class I and Class II MHC can be divided into 9 such regions.
In the case of Class I MHC, the binding groove is closed-ended and can accommodate peptides between 8 and about 11 amino acids in length, although 9-mer and 10-mer peptides are preferred.
In the case of Class II MHC, the binding groove is open-ended. Class II MHC can accommodate longer peptides, typically 12-20 amino acids in length.

How does in silico mapping work?
Th cell epitopes are linear and restricted by MHC (HLA).
[Figure: mature APC presenting a peptide epitope on MHC]
Because the pockets of the HLA are well known, interactions with peptides can be modeled.
The EpiMatrix algorithm scores all the 9-mers in a given sequence for binding affinity across a range of common HLA alleles and reports both detailed and aggregated results.
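The "all the 9-mers in a given sequence" step can be sketched as a sliding window. This is only an illustration of the framing, not EpiMatrix code; the function name `nine_mer_frames` is hypothetical, and the sample string is the start of the SAMPLE_01 sequence as it can be read off the overlapping frames in the Class I report later in this deck.

```python
def nine_mer_frames(sequence, width=9):
    """Yield (start, peptide) for every overlapping window.

    `start` is 1-based, matching the "AA Start" column of the reports.
    """
    for i in range(len(sequence) - width + 1):
        yield i + 1, sequence[i:i + width]


# First 24 residues of the sample sequence (read from the report frames).
sample = "MGARASVLTGSKLDAWEQIRLKPG"
frames = list(nine_mer_frames(sample))

for start, peptide in frames[:3]:
    print(start, peptide)
```

A sequence of length N yields N - 8 frames, each of which is then scored once per HLA matrix.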

Constructing the matrix
Early researchers eluted, sequenced, and aligned peptides bound to MHC.
They (Falk et al.) discovered that certain amino acids appeared in certain positions more often than others.
This information was used to develop rudimentary binding motifs.
Falk, K., Rötzschke, O., Stevanovic, S., Jung, G. and Rammensee, H. G. Allele-specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature 1991, 351, 290-296.
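A minimal sketch of how a rudimentary motif falls out of aligned peptides: count which amino acid appears at each position and convert the counts to frequencies. The peptides below are made up for illustration and are not real elution data; `position_frequencies` is a hypothetical helper name.

```python
from collections import Counter


def position_frequencies(peptides):
    """Return one {amino_acid: frequency} dict per aligned position."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs


# Hypothetical aligned 9-mers (illustration only).
eluted = ["ALAKAAAAV", "GLFGGGGGV", "KLSEKTVVL", "SMAGSSAMV"]
motif = position_frequencies(eluted)

print(motif[1])  # position 2: dominated by L in this toy set
print(motif[8])  # position 9: dominated by V in this toy set
```

Positions where one or two residues dominate are the anchor positions of the motif; the remaining positions show little preference.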

Other methods
As additional training data became available, many statistically based techniques were used to model the data:
  Frequency Analysis
  Hidden Markov Models (HMM)
  Support Vector Machines (SVM)
  Stabilization Matrix Methods (SMM)
  Random Forests
  Naive Bayesian Analysis

How EpiMatrix Scores
[Figure: graphical representation of the A*0201 coefficient matrix (based on list of actual peptides from Chicz). Axes: amino acid versus position (1-10). Color scale: coefficients from -3.00 to 2.00 in steps of 0.25.]
S+L+Y+N+V+A+T+Y+L = indication of binding likelihood
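The "S+L+Y+N+V+A+T+Y+L" line describes an additive model: look up one coefficient per position and sum them. The sketch below assumes exactly that and nothing more. The coefficients in `toy_matrix` are invented for illustration and are not the A*0201 values from the figure; `raw_score` is a hypothetical name.

```python
def raw_score(peptide, matrix):
    """Sum the position-specific coefficient of each residue.

    `matrix` is a list with one {amino_acid: coefficient} dict per position.
    Residues missing from a position's dict contribute 0.0.
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the matrix length")
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(peptide))


# Invented coefficients (illustration only): strong anchors at positions 2 and 9.
toy_matrix = [{} for _ in range(9)]
toy_matrix[0] = {"S": 0.25}
toy_matrix[1] = {"L": 1.5, "M": 1.0}
toy_matrix[8] = {"V": 1.5, "L": 1.25}

print(raw_score("SLYNVATYL", toy_matrix))  # 0.25 + 1.5 + 1.25
print(raw_score("SAYNVATYD", toy_matrix))  # only the position-1 term remains
```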

Comparing scores on a standard scale
Predictive matrices are benchmarked against a randomly generated standard curve.
[Figure: score distributions of random peptides and true epitopes on the standard scale, with marks at 0 and 1.64. Labels: DRB1*0101, DRB1*0102, DRB1*0301, DRB1*0401, DRB1*0402, DRB1*0404, DRB1*0405, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1104, DRB1*1301, DRB1*1302, DRB1*1501, DRB1*1502, DRB5*0101, RANDOM TOTAL.]
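One way to read "benchmarked against a randomly generated standard curve" is as a Z-score: a raw score is rescaled by the mean and standard deviation of scores obtained for random peptides. The sketch below assumes that reading and assumes an approximately normal distribution, under which the 1.64 mark corresponds to the top 5% of scores. `z_score` is a hypothetical helper, and the random-score list is a toy input.

```python
import statistics


def z_score(raw, random_scores):
    """Rescale a raw matrix score against scores of random peptides."""
    mean = statistics.fmean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        raise ValueError("random scores have zero spread")
    return (raw - mean) / sd


# Upper 5% point of a standard normal distribution (about 1.645).
cutoff = statistics.NormalDist().inv_cdf(0.95)
print(round(cutoff, 3))

# Toy example: random scores with mean 2.0 and standard deviation 1.0.
print(z_score(4.0, [1.0, 3.0]))
```

Putting every allele on the same scale is what allows scores from different matrices to be compared and summed.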

HLA Coverage
EpiMatrix tests binding to the most common supertype HLA molecules.
Our results represent >90% of the human population worldwide.
No individual haplotype testing is necessary.
Southwood et al., 1998

iTEM: individualized T cell epitope measure
A method for predicting immunogenicity of protein vaccines and biologic therapeutics based on an individual's HLA type.
iTEM = Sum of the Significant Scores - the Expected score for a protein of that length
Number of frames = 13
Expected frequency of the hits = 0.05
Expected value for a hit = 2.06
Cohen et al., 2008
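The constants on this slide are consistent with a normal-distribution reading: if a hit is a Z-score in the top 5%, the mean of a standard normal variable above that cutoff is about 2.06. The sketch below checks that and then forms the score as observed minus expected. Treating the expectation as frames x alleles x 0.05 x 2.06 is an assumption made here for illustration, not a formula confirmed by the slide; `expected_hit_value` and `item_score` are hypothetical names, and the observed scores in the example are made up.

```python
from statistics import NormalDist


def expected_hit_value(p=0.05):
    """Mean of a standard normal variable given it lies in the top fraction p."""
    nd = NormalDist()
    cutoff = nd.inv_cdf(1 - p)
    return nd.pdf(cutoff) / p


def item_score(significant_scores, n_frames, n_alleles, p=0.05, hit_value=2.06):
    """Sum of significant Z-scores minus the score expected by chance.

    Assumption: expected score = n_frames * n_alleles * p * hit_value.
    """
    expected = n_frames * n_alleles * p * hit_value
    return sum(significant_scores) - expected


print(round(expected_hit_value(), 2))

# Made-up example: two significant scores over 13 frames for one allele.
print(round(item_score([2.17, 1.84], n_frames=13, n_alleles=1), 3))
```

A positive result means more (or stronger) predicted binders than a random sequence of the same length would give.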

EpiMatrix: Class I Analysis

EpiMatrix: Class I Standard Analysis

EpiMatrix Results
Current File: SAMPLE
Current Sequence: SAMPLE_01
Legend: Top 10% of Z Scores; Top 5% of Z Scores; Top 1% of Z Scores. All Z Scores in the Top 5% are considered "Hits".

AA Sequence  Start  GRAVY  KB_A0101_09  KB_A0201_09  KB_A0301_09  KB_A2402_09  KB_B0702_09  KB_B4403_09  Hits  Avg Z
MGARASVLT        1   0.79         0.34         0.24         0.48         1.15         0.39         0.14     0   0.38
GARASVLTG        2   0.53         0.85         0.36         0.17         1.28         0.16         0.32     0   0.31
ARASVLTGS        3   0.49         0.01         0.19         0.52         0.91          0.3         0.05     0   0.31
RASVLTGSK        4   0.14         0.75         0.24         2.22         0.73         0.54         0.49     1   0.34
ASVLTGSKL        5   0.78         0.42         0.29         0.33         0.93         1.26         1.15     0   0.73
SVLTGSKLD        6   0.19         0.01          0.5         0.73         0.64          0.6         0.51     0   0.09
VLTGSKLDA        7   0.48         0.03          1.3         0.28         0.59         0.22         0.16     0   0.11
LTGSKLDAW        8   0.09         0.68         0.09         0.18         0.04          0.5         0.44     0   0.22
TGSKLDAWE        9    0.9          0.2         0.84         1.15         0.18         1.05         1.49     0   0.82
GSKLDAWEQ       10   1.21         0.34         1.13         0.51         1.49         1.55          0.9     0   0.87
SKLDAWEQI       11   0.67         1.24         0.73         1.07         0.94         0.68         0.57     0    0.1
KLDAWEQIR       12   1.08         0.33         0.97         1.57         0.91         1.52         0.92     0   0.08
LDAWEQIRL       13   0.22         0.01         0.91         1.04         0.39         1.28          0.1     0   0.24
DAWEQIRLK       14   1.08         0.45         0.45         1.06         0.76         1.07         0.21     0   0.16
AWEQIRLKP       15   0.87            0         1.56         1.08          0.3         0.46         0.08     0   0.48
WEQIRLKPG       16   1.11         2.36         0.86         1.43         0.72          0.5         2.66     1   0.54
EQIRLKPGC       17   0.73         1.32         0.54         0.77         0.91         2.12         0.06     0   0.93
QIRLKPGCK       18   0.78         0.04         0.33         2.47         0.95         0.29         1.72     1   0.13
IRLKPGCKK       19   0.82         0.16         0.83         1.72          0.5         0.47         1.07     1   0.22
RLKPGCKKK       20   1.76         0.15         0.47         2.76         0.22         0.13         0.77     1   0.38
LKPGCKKKY       21    1.4         2.06         1.37         0.36         0.15         0.31         0.01     1    0.1
KPGCKKKYR       22   2.32          1.1         1.24         1.04         1.41         1.34         0.24     0   0.27
PGCKKKYRL       23   1.47         1.13         0.42         1.14         0.61         0.02         0.13     0   0.37
GCKKKYRLK       24   1.72         0.42         1.63         1.28         1.17          1.4         1.17     0   0.75
...
PFASLKSLF      488   0.88         0.16         1.64         0.31         2.02         0.34         0.59     1   0.19
FASLKSLFG      489   1.01         0.88         0.52         0.49         0.79         0.09         0.22     0   0.13
ASLKSLFGT      490   0.62         0.39         0.76         0.59         0.31         0.67         0.81     0   0.26
SLKSLFGTD      491   0.03         0.79         0.85         0.83         0.22         0.24         0.64     0   0.04
LKSLFGTDQ      492   0.27         0.22         0.79          0.9         0.63         0.88         1.44     0   0.74

Maximum                            3.67          3.1         2.94         2.84         4.11         3.31     2
Sum of Significant Z              25.98        49.57        50.35        25.54        55.06        34.08
Hit Count                            11           23           22           11           33           25   125

Total Assessments: 2952
Total Significant Z: 240.58
Expected Z: 326.38
Deviation: 75.8
Deviation per 1000: 14.36
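The GRAVY column in the Class I report matches the mean Kyte-Doolittle hydropathy of each 9-mer, at least for the rows checked here (MGARASVLT gives 0.79 and GARASVLTG gives 0.53). The sketch below computes it with the standard Kyte-Doolittle scale; the function name `gravy` is a hypothetical choice. Note that RASVLTGSK evaluates to -0.14 under this scale while the report text shows 0.14, which suggests minus signs are missing from that listing.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


for peptide in ("MGARASVLT", "GARASVLTG", "RASVLTGSK"):
    print(peptide, round(gravy(peptide), 2))
```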

EpiMatrix: Class I Extended Analysis

EpiMatrix: Class II Standard Analysis


Class II EpiMatrix Report
File: FLU-HA - Sequence: PROVIDENCE-2012
September 29, 2010 (Epx Ver. 1.2)

Frame Start  AA Sequence  Frame Stop  DRB1*0101  DRB1*0301  DRB1*0401  DRB1*0701  DRB1*0801  DRB1*1101  DRB1*1301  DRB1*1501  Hits
          1  QKLPGNRNS             9       0.58       0.25       1.06       -0.2       0.23       0.95       0.45       0.31     0
          2  KLPGNRNST            10       0.41       0.48      -0.19       0.57       0.34        0.8      -0.75       0.84     0
          3  LPGNRNSTA            11       0.94       1.03       1.26        0.5       0.49       0.96       0.92       0.84     0
          4  PGNRNSTAT            12       0.35      -1.66      -0.43       0.23      -0.21      -1.07      -0.57       0.18     0
          5  GNRNSTATL            13        1.1       0.74       1.16        0.6       0.62       0.95       0.95        0.2     0
          6  NRNSTATLC            14       1.14       0.29       0.51       1.04       0.07      -0.39      -0.05       0.58     0
...
        307  RYVKQNTLK           315      -0.34       0.32       -0.1      -0.51       0.58      -0.57       0.77      -0.23     0
        308  YVKQNTLKL           316       3.06       2.28       3.18       2.81       2.43       2.81       3.11       2.55     8
        309  VKQNTLKLA           317       0.97       1.51       0.95       1.06       1.62       2.01        1.7       1.41     2
        310  KQNTLKLAT           318       0.49       -0.1       0.22       0.54          1       0.89       0.86       1.34     0
        311  QNTLKLATG           319       0.15      -0.22       0.23      -1.29       1.19       1.26       0.23       0.01     0
        312  NTLKLATGM           320       0.24       0.63      -0.41       0.33      -0.02      -0.77       1.07      -0.44     0
        313  TLKLATGMR           321        0.9       0.78       0.81       0.55       1.24        0.7      -0.09       0.46     0
        314  LKLATGMRN           322       1.93       1.17       1.92       1.23       1.86        1.4        0.2       2.35     4
        315  KLATGMRNV           323      -0.23      -1.07      -0.73       0.29      -0.96       -0.6      -0.69      -0.55     0
        316  LATGMRNVP           324      -0.57       0.68       -0.6      -0.24      -0.07       0.37      -0.55       0.01     0
        317  ATGMRNVPK           325       1.18       0.31       1.32       0.09        0.6       0.74       0.15        0.6     0
        318  TGMRNVPKK           326      -0.24      -1.37      -0.28      -0.98       0.52       0.23      -0.18         -1     0
        319  GMRNVPKKQ           327       0.86       0.13       0.98       1.06       1.04       1.26       0.34       0.04     0
        320  MRNVPKKQT           328       0.81       0.36      -0.03       0.85       1.13       1.58       0.77       1.41     0
        321  RNVPKKQTR           329      -1.22       1.25      -1.36      -1.75       0.16      -0.74        0.7       0.34     0

Summarized Results             DRB1*0101  DRB1*0301  DRB1*0401  DRB1*0701  DRB1*0801  DRB1*1101  DRB1*1301  DRB1*1501   Total
Maximum Single Z score              3.06       2.69       3.18       3.02       2.77       3.29       3.11       2.55      --
Sum of Significant Z scores        65.04      58.45      52.95      64.72      51.65      81.61      62.54      72.96  509.92
Count of Significant Z Scores         30         29         25         30         25         38         31         36     244

Total Assessments Performed: 2568
Deviation from Expectation: 224.26
Deviation per 1000 AA: 89.24
Adjusted for Regulatory Epitopes
  Deviation from Expectation: 224.26
  Deviation per 1000 AA: 89.24
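The Hits column of the Class II report can be reproduced from the Z-scores of a frame by counting how many alleles reach the top-5% cutoff. The sketch below assumes a cutoff of 1.64 applied with "greater than or equal"; the report itself does not state how ties are handled. `summarize_frame` is a hypothetical helper, and the Z-scores are copied from frames 308, 309, 314 and 315 of the report.

```python
HIT_CUTOFF = 1.64  # assumed top-5% threshold on the Z scale


def summarize_frame(z_scores, cutoff=HIT_CUTOFF):
    """Return (number of hits, sum of significant Z-scores) for one frame."""
    hits = [z for z in z_scores if z >= cutoff]
    return len(hits), round(sum(hits), 2)


# Z-scores per frame, in allele order DRB1*0101 ... DRB1*1501.
rows = {
    "YVKQNTLKL": [3.06, 2.28, 3.18, 2.81, 2.43, 2.81, 3.11, 2.55],
    "VKQNTLKLA": [0.97, 1.51, 0.95, 1.06, 1.62, 2.01, 1.7, 1.41],
    "LKLATGMRN": [1.93, 1.17, 1.92, 1.23, 1.86, 1.4, 0.2, 2.35],
    "KLATGMRNV": [-0.23, -1.07, -0.73, 0.29, -0.96, -0.6, -0.69, -0.55],
}

for peptide, scores in rows.items():
    n_hits, total = summarize_frame(scores)
    print(peptide, n_hits, total)
```

A frame such as YVKQNTLKL, which is a hit for all eight alleles, is the kind of promiscuous binder the per-allele summary rows are built from.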

EpiMatrix: Class II Standard Analysis

PROVIDENCE 2012 (89.24)

EpiMatrix: Class II Extended Analysis

EpiMatrix: Class II Standard Analysis

ClustiMer identifies T cell Epitopes
Epitopes Tend to Cluster
T cell epitopes cluster within protein sequences.
One or more dominant T cell epitope clusters can enable significant immune responses to even otherwise low-scoring proteins.
[Figure: epitope cluster map with columns DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, DRB1*1501]
T cell epitope clusters make excellent vaccine candidates:
  Compact
  Easy to deliver as peptides
  Highly reactive in vivo

Finding Class II Clusters


Interactive Class II Cluster Report

Class II Cluster Detail Report
EpiMatrix Cluster Detail Report
File: FLU-HA Sequence: PROVIDENCE-2012 Cluster: 76
September 29, 2010 (Epx Ver. 1.2)

Allele columns in the report: DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*0801, DRB1*1101, DRB1*1301, DRB1*1501.
Only some cells of each row carry a Z-score; the scores are listed in reading order, and the allele of each score cannot be recovered except where all eight are present (frame 88).

Frame Start  AA Sequence  Frame Stop  Hydrophobicity  Hits  Z-Scores listed
         76  CRSFQNKKW            84           -0.37     0  1.49
         77  RSFQNKKWR            85           -0.54     1  1.79
         78  SFQNKKWRL            86           -0.34     0  1.4
         79  FQNKKWRLF            87            -1.2     3  1.48  2.39  1.89  1.84  1.44
         80  QNKKWRLFV            88           -1.04     0
         81  NKKWRLFVK            89           -1.09     0
         82  KKWRLFVKR            90            -1.2     0
         83  KWRLFVKRS            91           -0.86     0  1.39  1.39  1.44
         84  WRLFVKRSK            92           -0.86     2  1.41  1.57  2.77  3.29  1.42  1.44
         85  RLFVKRSKA            93           -0.56     2  1.69  1.4  1.77  1.52
         86  LFVKRSKAY            94            -0.2     1  1.28  1.92
         87  FVKRSKAYS            95           -0.71     3  1.46  1.45  2.73  2.72  2.7  1.4
         88  VKRSKAYSN            96           -1.41     7  2.17  1.8  2.02  2.31  1.86  2.02  1.3  2.3
         89  KRSKAYSNC            97           -0.34     0
         90  RSKAYSNCY            98           -0.28     1  1.84  1.31
         91  SKAYSNCYP            99           -0.21     0

Summarized Results             DRB1*0101  DRB1*0301  DRB1*0401  DRB1*0701  DRB1*0801  DRB1*1101  DRB1*1301  DRB1*1501   Total
Maximum Single Z score              2.17       1.84       2.02       2.39       2.77       3.29        2.7        2.3      --
Sum of Significant Z scores         2.17       5.34       2.02        4.7       9.25       9.88       8.18        2.3   43.84
Count of Significant Z Scores          1          3          1          2          4          4          4          1      20

Total Assessments Performed: 128
Hydrophobicity: -1.11
EpiMatrix Score: 30.66
EpiMatrix Score (w/o flanks): 31.98

Interactive Class II Cluster Report

Class II Cluster Scale

Interactive Class II Cluster Report

Class II Logo Report