MLECNIK Bernhard & BINDEA Gabriela Analysis of the colorectal tumor microenvironment using integrative bioinformatic tools INSERM U872, Jérôme Galon Team15: Integrative Cancer Immunology Cordeliers Research Center, Paris, France
Introduction Colorectal cancer, TNM staging: Tumor (T), Lymph node (N), Metastasis (M) Surgery Patient classification: Stage I / II: no therapy; Stage III / IV: chemotherapy Relapse: 0% to 100% Kaplan-Meier survival curves for the postoperative follow-up of ~400 patients
Immune infiltration predicts patient outcome in colorectal cancer Mechanisms? Galon et al. Science 2006
Immune infiltration predicts patient outcome in colorectal cancer High immune infiltration CD3 Tumor Low immune infiltration Galon et al. Science 2006 Mlecnik et al. JCO 2011
Objective - What are the molecular mechanisms leading to high intra-tumoral memory T cell densities and good prognosis? -> Apply biomolecular network enrichment, using in silico prediction and bioinformatic tools, to generate hypotheses
Overview 1. Integrative analysis of the colorectal tumor microenvironment 2. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks
Large amount of data Affymetrix LDA 105 pts 125 pts TCR Repertoire FACS 20 pts 30 pts TMA aCGH MSI 216 pts 280 pts SNP Integrative analysis at gene & protein level 648 pts http://www.ici.upmc.fr 50 pts Mlecnik et al. BMC Genomics. 2010
TME analysis modules, R based - KM curves - ROC curves - Parametric, Non-parametric tests - correlation Analysis of pathways: PathwayExplorer From genes to networks of functions: ClueGO http://www.ici.upmc.fr Mlecnik et al. Nucleic Acids Res 2005 Bindea, Mlecnik et al. Bioinformatics 2009 Mlecnik et al. BMC Genomics. 2010
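The KM curves listed among the analysis modules rest on the Kaplan-Meier product-limit estimator. The following is a minimal Python sketch of that estimator, not the team's R code; the function name `kaplan_meier` and the input layout (follow-up times plus 0/1 event flags) are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : follow-up duration per patient
    events : 1 if the event (e.g. relapse) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # events + censored patients leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve
```

Censored patients shrink the risk set without producing a step in the curve, which is why the estimate only changes at observed event times.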
Search tool for predicting functional interactions (STRING) Conserved Genomic Neighborhood (630 fully sequenced organisms) Gene Fusion Co-occurrence Co-expression Analysis Experiments Functional Genomic Databases predicted genes Textmining (16 million PubMed references) -> Weighted Score for links between known and predicted genes -> in silico prediction of functional interactions between genes Jensen et al. Nucleic Acids Res. 2009
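One way evidence channels can be folded into a single weighted link score: a minimal sketch, assuming each channel score is an independent probability that the link is real, so the combined score is the probability that at least one channel is right. This is a simplification, and STRING itself may apply further corrections; `combined_score` and the example channel values are hypothetical.

```python
def combined_score(channel_scores):
    """Combine per-channel confidence scores (each in [0, 1]).

    Assumes independent channels: 1 - product(1 - s_i).
    """
    p_no_support = 1.0
    for s in channel_scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("channel scores must lie in [0, 1]")
        p_no_support *= 1.0 - s
    return 1.0 - p_no_support


# Hypothetical link supported by co-expression, experiments and textmining.
link_score = combined_score([0.3, 0.6, 0.5])
```

Under this assumption the combined score is never lower than the strongest single channel, so several weak lines of evidence can add up to a confident link.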
Biomolecular Network Enrichment qPCR for 48 selected immune and tumor genes, analysed in 104 colorectal cancer (CRC) patients -> 12 genes significant by log-rank test for disease-free survival (DFS), at median cut-off for high vs. low gene expression, p<0.05 -> In silico enrichment using the 12 significant genes (STRING) -> Top predicted genes -> Network of experimental + enriched genes (Cytoscape, GOlorize)
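The gene selection step (log-rank test at the median cut-off) can be sketched as follows. This is an illustrative Python version, not the original analysis code: the function names, the handling of ties (expression equal to the median goes to the low group) and the toy data are assumptions.

```python
import math
from statistics import median


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the chi-square (1 df) p-value."""
    group_a = list(zip(times_a, events_a))
    group_b = list(zip(times_b, events_b))
    event_times = sorted({t for t, e in group_a + group_b if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x, _ in group_a if x >= t)   # at risk in A
        n_b = sum(1 for x, _ in group_b if x >= t)   # at risk in B
        d_a = sum(1 for x, e in group_a if x == t and e)
        d_b = sum(1 for x, e in group_b if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def logrank_at_median(expression, times, events):
    """Split patients into Hi/Lo at the median expression, then test."""
    cut = median(expression)
    rows = list(zip(expression, times, events))
    hi = [(t, e) for x, t, e in rows if x > cut]
    lo = [(t, e) for x, t, e in rows if x <= cut]
    if not hi or not lo:
        return 1.0
    return logrank_p([t for t, _ in hi], [e for _, e in hi],
                     [t for t, _ in lo], [e for _, e in lo])
```

A gene would be kept when `logrank_at_median` returns a value below 0.05, matching the threshold stated on the slide.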
In silico network: 48 initial genes, shown with top enriched genes Edge thickness: r < 0.65; 0.65 ≤ r < 0.8; r ≥ 0.8 Edge colors: Pearson correlation; STRING combined score; combined correlation (STRING and Pearson) Nodes colored by GO categories: innate and adaptive immune genes; pro-inflammatory; cytotoxic immune response; Th1; T cell activation; negative regulation of immune response; angiogenesis; immunosuppressive; Th2; metastasis spreading; tumor invasion Node size (log-rank P-values): P-value > 0.05; 0.01 < P-value ≤ 0.05; P-value ≤ 0.01 STRING predicted top 40 Mlecnik et al. Gastroenterology. 2010
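The edge encoding in this network (Pearson correlation binned at 0.65 and 0.8) can be sketched in a few lines of Python. The function names and the bin labels "thin", "medium" and "thick" are hypothetical, and applying the thresholds to r itself rather than to its absolute value is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def edge_class(r):
    """Map a correlation to one of three edge-thickness bins."""
    if r >= 0.8:
        return "thick"
    if r >= 0.65:
        return "medium"
    return "thin"


# Hypothetical expression values of two genes across five patients.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.2, 1.9, 3.4, 3.9, 5.1]
thickness = edge_class(pearson_r(gene_a, gene_b))
```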
Hypothesis generation based on ClueGO analysis List1: 48 genes (qPCR) List2: 65 in silico predicted genes GO: 991 Terms, 1110 Terms ClueGO: 9, 14 14 functional annotations specific for List2, including chemotaxis and cell adhesion ClueGO: Bindea, Mlecnik et al. Bioinformatics 2009
Analysis of immune cell densities within the tumor Tissue microarrays, CK8 (blue) / CD8 (brown) staining [Figure: CD8 cell density in the tumor center (CD8-CT, cells/mm2) for patients with high (Hi) vs. low (Lo) expression of CX3CL1, CXCL9 and CXCL10; panel P values as shown on the slide: P<0.05, P=0.05, P<0.05]
Analysis of immune cell densities within the tumor Flow cytometry [Figure: CD3+CD8+ and CD3+TCRαβ+ cells (% of total cells) for patients with high (Hi) vs. low (Lo) expression of CX3CL1, CXCL9 and CXCL10; panel P values as shown on the slide: P=0.07, P<0.05, P<0.05, P=0.06, P<0.05, P<0.05]
Tissue microarray (TMA) analysis in the center (CT) and in the invasive margin (IM) of the tumor [Figure: P values for immune cell markers, grouped as Th1 and activated T cells; cytotoxic T cells and NK cells; memory T cells; macrophages] Mlecnik et al. Gastroenterology. 2010 Bindea, Mlecnik et al. Curr Opin Immunol. 2010
Analysis of immune cell densities within the tumor TCR repertoire: highly polyclonal (V-D-J); all Vβ are present; most CDR3 lengths are represented [Figure: Vβ vs. CDR3 length; overall survival (%) vs. months for TCR Vβ2L03 Hi vs. Lo] Specific T cells are associated with high/low expression of CX3CL1 and with the overall survival of the patients Mlecnik et al. Gastroenterology. 2010 Bindea, Mlecnik et al. Curr Opin Immunol. 2010
T cell receptor β gene (TCRB) rearrangement. Hodges E et al. J Clin Pathol 2003;56:1-11. © 2003 by BMJ Publishing Group Ltd and Association of Clinical Pathologists
Conclusions - Biomolecular network creation using in silico prediction reveals potential genes associated with colorectal cancer outcome - The predicted chemokines and adhesion molecules are correlated with the in situ immune cell density - Specific chemokines may attract distinct populations of immune cells, including subpopulations of T cells with a particular T cell receptor (TCR) repertoire - These specific T cells are associated with patient survival
ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks
The challenge: biological interpretation of large gene lists Biological role Gene Ontology (GO) KEGG BioCarta PathwayExplorer Cytoscape GO: controlled vocabulary (terms) that can be used to describe any organism. It has a hierarchical structure (parent-child relations between the terms) KEGG, BioCarta, PathwayExplorer: pathway analysis Cytoscape: visualization of biological networks
The challenge: biological interpretation of large gene lists Gene Ontology: T cell receptor related terms (#22) GOID GOTerm # of Associated Genes Type of Ontology
The challenge: biological interpretation of large gene lists BioCarta: T cell receptor signaling pathway
The challenge: biological interpretation of large gene lists KEGG: T cell receptor signaling pathway
The challenge: biological interpretation of large gene lists PathwayExplorer: gene expression mapped to pathways Mlecnik et al. Nucleic Acids Res 2005
Different data sources and tools for biological interpretation of large lists of genes List of genes Experiment KEGG BIOCARTA PathwayExplorer Pathways Cytoscape Gene network GO database Terms Difficulties - Different annotation sources - Redundant information - Many functional annotations for each gene - Identification of related genes
Example 238 genes mapped in GO => 1504 terms expansion of knowledge... instead of going to the essence
Objectives To develop ClueGO, a tool for extracting the global, non-redundant functional information for lists of genes To provide intuitive, comprehensive representation of the results To integrate ClueGO with existing tools for analysis of high-throughput experiments
Integration of different data sources and tools for biological interpretation of large lists of genes with ClueGO List of genes Experiment KEGG BIOCARTA Pathways Cytoscape Gene network ClueGO Term network Charts Statistics GO database Terms
ClueGO Programming language: Java System requirements: Windows, Linux, Unix or MacOS; 1024 MB RAM; Java 1.5+; Cytoscape 2.6+ To date, 14 organisms available
ClueGO, a Cytoscape plugin ClueGO Network ClueGO Results ClueGO ControlPanel
ClueGO, a Cytoscape plugin Chart (Terms)
ClueGO, a Cytoscape plugin OverviewChart (Groups)
ClueGO, a Cytoscape plugin ClueGO settings summary Advanced settings
Redundancy reduction 142 significantly differentially expressed LDA3 genes (CRC/CN, Wilcoxon, p<0.05) and 238 nonsignificant LDA3 genes GO mapping: 1504 GO Terms, 1074 GO Terms ClueGO: 76, 11, 11
Next challenge: comparison of patterns based on functional analysis List A List B Common GO terms GO terms specific List A GO terms specific List B creation of functional networks
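The three-way split named on this slide (common terms, terms specific to List A, terms specific to List B) is plain set arithmetic. A minimal Python sketch follows; the function name, the dictionary keys and the example terms are hypothetical.

```python
def compare_term_sets(terms_a, terms_b):
    """Partition GO terms into common, A-specific and B-specific sets."""
    a, b = set(terms_a), set(terms_b)
    return {
        "common": a & b,
        "specific_a": a - b,
        "specific_b": b - a,
    }


# Hypothetical term lists for two gene lists.
result = compare_term_sets(
    ["T cell activation", "cell adhesion"],
    ["cell adhesion", "chemotaxis"],
)
```

Each of the three sets can then seed its own part of the functional network.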
Functional analysis for two lists of genes Groups network
Functional analysis for two lists of genes Genes distribution network %GenesList A 1:1 GenesListA-B %GenesListB
ClueGO: fast extraction of biological information from a large data volume 1. Reduces the redundancy (fusion of similar terms) 2. Visualizes the terms in a functionally grouped network 3. Compares the ontology of two lists of genes 4. Automatically calculates statistics (Fisher's exact test) 5. Creates a result folder (network, charts, .cys file, matrix) 6. Provides predefined settings 7. Offers user-friendly update and organism extension
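The statistics step can be illustrated with a one-sided Fisher's exact test for term enrichment, which reduces to the upper tail of the hypergeometric distribution. This is a sketch under assumptions, not ClueGO's implementation: the one-sided variant, the function name `enrichment_p` and the example counts are chosen for illustration.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation of a term.

    k : genes in the list annotated with the term
    n : size of the gene list
    K : genes in the background annotated with the term
    N : size of the background
    Returns P(X >= k) under the hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 8 of 50 list genes carry a term that annotates
# 100 of 10000 background genes.
p_value = enrichment_p(8, 50, 100, 10000)
```

Because many terms are tested at once, the resulting p-values would normally go through a multiple-testing correction before any term is called enriched.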
Open access: http://www.ici.upmc.fr/cluego/ Bindea, Mlecnik et al. Bioinformatics 2009
Summary - ClueGO strongly improves biological interpretation of large lists of genes by analyzing interrelations of corresponding GO/pathway terms and functional groups in biological networks - It can be integrated with existing tools used for the analysis of high-throughput data
Acknowledgments INSERM U872, Integrative Cancer Immunology Team Jérôme Galon Marie Tosolini Amos Kirilovsky Franck Pagès Melanie Gillard Tessa Fredriksen Stephanie Mauger Cordeliers Research Center Wolf-Herman Fridman
MLECNIK Bernhard & BINDEA Gabriela Analysis of the tumor microenvironment using integrative networks and bioinformatic tools INSERM U872, Jérôme Galon Team15: Integrative Cancer Immunology Cordeliers Research Center, Paris, France