Identification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation


Bering Limited
Report generated on September 19, 2014

Contents

1 Dataset summary
  1.1 Project description
2 Array processing and normalization
3 Quality control
  3.1 Outlier detection
    3.1.1 Principal Component Analysis
    3.1.2 Signal density and box plots
    3.1.3 Array similarity heatmap and hierarchical clustering
  3.2 Batch correction
  3.3 Quality control summary
4 Differential expression analysis
5 Gene Ontology enrichment
  5.1 Biological Process
  5.2 Cellular Component
  5.3 Molecular Function
6 Reactome pathway enrichment

1 Dataset summary

Number of samples: 30
Number of chip identifiers: 506944
Comparison: ra vs.

1.1 Project description

ArrayExpress accession number: E-GEOD-55235.

Discrimination of rheumatoid arthritis (RA) patients from patients with other inflammatory/degenerative joint diseases or healthy individuals purely on the basis of genes differentially expressed in high-throughput data has proven very difficult. Thus, the present study sought to achieve such discrimination by employing a novel unbiased approach using rule-based classifiers. The study used three multi-center genome-wide transcriptomic data sets (Affymetrix HG-U133 A/B) from a total of 79 individuals: 20 healthy controls (control group, CG), 26 osteoarthritis (OA) patients, and 33 RA patients.

Reference: Woetzel D., et al. Identification of rheumatoid arthritis and osteoarthritis patients by transcriptome-based rule set generation. Arthritis Research & Therapy 2014, 16:R84

file.name                 phenotype
GSM1332201 ND 1 S...
GSM1332202 ND 2 S...
GSM1332203 ND 3 S...
GSM1332204 ND 4 S...
GSM1332205 ND 5 S...
GSM1332206 ND 6 S...
GSM1332207 ND 7 S...
GSM1332208 ND 8 S...
GSM1332209 ND 9 S...
GSM1332210 ND 10...
GSM1332211 OA 1 S...
GSM1332212 OA 2 S...
GSM1332213 OA 3 S...
GSM1332214 OA 4 S...
GSM1332215 OA 5 S...
GSM1332216 OA 6 S...
GSM1332217 OA 7 S...
GSM1332218 OA 8 S...
GSM1332219 OA 9 S...
GSM1332220 OA 10...
GSM1332221 RA 1 S...      ra
GSM1332222 RA 2 S...      ra
GSM1332223 RA 3 S...      ra
GSM1332224 RA 4 S...      ra
GSM1332225 RA 5 S...      ra
GSM1332226 RA 6 S...      ra
GSM1332227 RA 7 S...      ra
GSM1332228 RA 8 S...      ra
GSM1332229 RA 9 S...      ra
GSM1332230 RA 10...       ra

Table 1: Sample-data relationships

2 Array processing and normalization

After array normalization and detection of present probes, 14595 probes were retained.

3 Quality control

The GeneProfiler pipeline aims to identify outliers, batch effects, and overly noisy experiments. Automated quality control is carried out using the arrayQualityMetrics Bioconductor package.

3.1 Outlier detection

Outlier detection is carried out using three distinct approaches:

Box plot: Each box corresponds to one array. Typically, one expects the boxes to have similar positions and widths. If the distribution of an array is very different from the others, this may indicate an experimental problem. Outlier detection is performed by computing the Kolmogorov-Smirnov statistic Ka between each array's distribution and the distribution of the pooled data.

Signal density plot: Typically, the distributions of the arrays should have similar shapes and ranges. Arrays whose distributions are very different from the others should be checked for possible problems.

Inter-sample correlation heatmap: Patterns in this plot can indicate clustering of the arrays, either because of intended biological factors or because of unintended experimental factors (batch effects). The distance between two arrays is computed as the mean absolute difference between the data of the arrays (using the data from all probes without filtering). Outlier detection is performed by looking for arrays for which the sum of the distances to all other arrays is exceptionally large.

Results of outlier detection are shown in Table 2. Each of the first three columns contains the result of one outlier detection test. FALSE indicates that an array is not an outlier, while TRUE indicates that it is. An array is considered an outlier, and labeled as such in the column Vote, if it is called an outlier by at least two methods.

Array                     Boxplot   Density   Heatmap   Vote
GSM1332201 ND 1 S...      FALSE     FALSE     FALSE     FALSE
GSM1332202 ND 2 S...      TRUE      FALSE     FALSE     FALSE
GSM1332203 ND 3 S...      FALSE     FALSE     FALSE     FALSE
GSM1332204 ND 4 S...      FALSE     FALSE     FALSE     FALSE
GSM1332205 ND 5 S...      FALSE     FALSE     FALSE     FALSE
GSM1332206 ND 6 S...      FALSE     FALSE     FALSE     FALSE
GSM1332207 ND 7 S...      TRUE      FALSE     FALSE     FALSE
GSM1332208 ND 8 S...      TRUE      FALSE     FALSE     FALSE
GSM1332209 ND 9 S...      FALSE     FALSE     FALSE     FALSE
GSM1332210 ND 10...       FALSE     FALSE     FALSE     FALSE
GSM1332211 OA 1 S...      FALSE     FALSE     FALSE     FALSE
GSM1332212 OA 2 S...      FALSE     FALSE     FALSE     FALSE
GSM1332213 OA 3 S...      FALSE     FALSE     FALSE     FALSE
GSM1332214 OA 4 S...      FALSE     FALSE     FALSE     FALSE
GSM1332215 OA 5 S...      FALSE     FALSE     FALSE     FALSE
GSM1332216 OA 6 S...      FALSE     FALSE     FALSE     FALSE
GSM1332217 OA 7 S...      FALSE     FALSE     FALSE     FALSE
GSM1332218 OA 8 S...      FALSE     FALSE     FALSE     FALSE
GSM1332219 OA 9 S...      FALSE     FALSE     FALSE     FALSE
GSM1332220 OA 10...       TRUE      FALSE     FALSE     FALSE
GSM1332221 RA 1 S...      FALSE     FALSE     FALSE     FALSE
GSM1332222 RA 2 S...      FALSE     FALSE     FALSE     FALSE
GSM1332223 RA 3 S...      FALSE     FALSE     FALSE     FALSE
GSM1332224 RA 4 S...      FALSE     FALSE     FALSE     FALSE
GSM1332225 RA 5 S...      FALSE     FALSE     FALSE     FALSE
GSM1332226 RA 6 S...      FALSE     FALSE     FALSE     FALSE
GSM1332227 RA 7 S...      FALSE     FALSE     FALSE     FALSE
GSM1332228 RA 8 S...      FALSE     FALSE     FALSE     FALSE
GSM1332229 RA 9 S...      FALSE     FALSE     FALSE     FALSE
GSM1332230 RA 10...       FALSE     FALSE     FALSE     FALSE

Table 2: Outlying arrays.
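The outlier statistics described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code of the pipeline or of the Bioconductor package: the function names, the toy intensities, and the four hypothetical arrays a1 to a4 are invented for the example, and the cutoffs that turn each statistic into a TRUE/FALSE flag are left out because the report does not state them.

```python
from bisect import bisect_right

def ks_statistic(sample, pooled):
    """Largest gap between the empirical CDFs of one array and the pooled data."""
    s, p = sorted(sample), sorted(pooled)
    return max(
        abs(bisect_right(s, x) / len(s) - bisect_right(p, x) / len(p))
        for x in set(s) | set(p)
    )

def mean_abs_distance(a, b):
    """Mean absolute difference between two arrays over all probes."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def vote(boxplot, density, heatmap):
    """An array is called an outlier only if at least two tests flag it."""
    return (boxplot + density + heatmap) >= 2

# Toy log-intensities for four hypothetical arrays (five probes each).
arrays = {
    "a1": [7.0, 8.1, 9.2, 10.0, 11.1],
    "a2": [7.1, 8.0, 9.0, 10.2, 11.0],
    "a3": [6.9, 8.2, 9.1, 10.1, 10.9],
    "a4": [9.0, 10.1, 11.2, 12.0, 13.1],  # shifted upwards on purpose
}

# Box plot test: KS statistic of each array against the pooled data.
pooled = [v for values in arrays.values() for v in values]
ks = {name: ks_statistic(values, pooled) for name, values in arrays.items()}

# Heatmap test: sum of distances from each array to all other arrays.
dist_sum = {
    name: sum(mean_abs_distance(values, other)
              for other_name, other in arrays.items() if other_name != name)
    for name, values in arrays.items()
}
```

In this toy data the shifted array a4 has both the largest KS statistic and the largest distance sum. The voting rule explains Table 2: an array flagged by the box plot test alone, such as GSM1332202, corresponds to `vote(True, False, False)` and is therefore not called an outlier.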

3.1.1 Principal Component Analysis

Figure 1: Scatterplot visualising Principal Component Analysis for 30 arrays. Outliers (if any) are shown in red.

Principal Component Analysis (PCA) plots were used to visualize the overall quality of the microarray dataset. Each point in a PCA plot corresponds to an array. Dissimilar arrays lie further apart.
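To illustrate why dissimilar arrays end up far apart in a PCA plot, the following minimal sketch projects arrays onto their first principal component. The report does not say how its PCA was computed; power iteration is used here only because it needs no external libraries, and the function name and toy data are assumptions made for the example.

```python
import math

def first_pc_scores(arrays, iterations=200):
    """Project arrays (rows = arrays, columns = probes) onto the first principal component."""
    n, p = len(arrays), len(arrays[0])
    # Center each probe across arrays.
    means = [sum(row[j] for row in arrays) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in arrays]
    # Power iteration on X^T X without forming it: v <- X^T (X v), then normalize.
    # Assumes the start vector is not orthogonal to the leading component.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]

# Toy data: three similar hypothetical arrays and one shifted array (last row).
data = [
    [7.0, 8.1, 9.2, 10.0],
    [7.1, 8.0, 9.0, 10.2],
    [6.9, 8.2, 9.1, 10.1],
    [9.0, 10.1, 11.2, 12.0],
]
scores = first_pc_scores(data)
```

The three similar arrays receive nearly identical scores, while the shifted array sits far from them along the first component, which is the pattern a reader looks for in Figure 1.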

3.1.2 Signal density and box plots

Figure 2: Boxplots and signal intensity densities for 30 arrays. Outliers (if any) are shown in red.

3.1.3 Array similarity heatmap and hierarchical clustering

Hierarchical clustering was used to determine whether sample clusters correspond to the experimental sample groups rather than to technical sources of variation.

Figure 3: Array similarity heatmap for 30 arrays. The color scale is chosen to cover the range of distances encountered in the dataset.

There were 0 outlying arrays.

3.2 Batch correction

If batches are specified, batch effects are corrected.

3.3 Quality control summary

Of 22283 probes, 12917 passed quality control protocols. All 30 samples passed the outlier detection criteria.

4 Differential expression analysis

Differential expression analysis was carried out comparing ra vs.. There were 692 up-regulated and 801 down-regulated genes (p-value threshold 0.05, FDR correction: No). The top 10 differentially expressed genes are shown in Table 3.

Symbol     Name                                                                                  logFC      P.Value
CXCL13     Chemokine (C-X-C motif) ligand 13                                                     1.1E+01    2.4E-10
SLAMF8     SLAM family member 8                                                                  7.4E+00    4.6E-12
TPD52L1    Tumor protein D52-like 1                                                              -4.8E+00   1.4E-11
ADAMDEC1   ADAM-like, decysin 1                                                                  4.7E+00    6.4E-10
SERPINA1   Serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1   4.6E+00    2.0E-09
NOVA1      Neuro-oncological ventral antigen 1                                                   -3.0E+00   4.3E-10
CCL13      Chemokine (C-C motif) ligand 13                                                       4.5E+00    5.1E-08
ISG20      Interferon stimulated exonuclease gene 20kDa                                          2.3E+00    2.2E-10
CD27       CD27 molecule                                                                         2.9E+00    1.6E-09
CRLF1      Cytokine receptor-like factor 1                                                       -4.7E+00   3.2E-07

Table 3: Top 10 differentially expressed genes.
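The split into up-regulated and down-regulated genes can be sketched as follows, using the ten rows of Table 3 as input. This is an illustration only: the report does not describe its statistical model, and the strict "p < 0.05" cutoff and the function name `classify` are assumptions.

```python
from collections import Counter

def classify(logfc, p_value, alpha=0.05):
    """Label a gene by direction of change, using the raw p-value (no FDR correction)."""
    if p_value >= alpha:  # assumption: a strict "p < alpha" cutoff
        return "not significant"
    return "up" if logfc > 0 else "down"

# (logFC, P.Value) pairs copied from Table 3.
table3 = {
    "CXCL13": (11.0, 2.4e-10),
    "SLAMF8": (7.4, 4.6e-12),
    "TPD52L1": (-4.8, 1.4e-11),
    "ADAMDEC1": (4.7, 6.4e-10),
    "SERPINA1": (4.6, 2.0e-09),
    "NOVA1": (-3.0, 4.3e-10),
    "CCL13": (4.5, 5.1e-08),
    "ISG20": (2.3, 2.2e-10),
    "CD27": (2.9, 1.6e-09),
    "CRLF1": (-4.7, 3.2e-07),
}

counts = Counter(classify(fc, p) for fc, p in table3.values())
```

Among the ten genes in Table 3, seven are labeled up-regulated and three down-regulated. Applied to all tested genes, the same rule would yield the kind of totals reported above (692 up, 801 down).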

Figure 4: Volcano plot of all differentially expressed genes in ra vs.. The top 5 differentially expressed genes are labeled.

5 Gene Ontology enrichment

The 1493 differentially expressed genes were tested for enrichment of Gene Ontology (GO) Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) terms. All microarray genes (n=8934) were used as background. In Tables 4, 5, and 6, the headers Significant and P.Value refer to the number of significant genes annotated by a term and the corresponding significance p-value, respectively.

5.1 Biological Process

Term                                            Significant   P.Value
Immune Response                                 277           1.20E-30
Immune System Process                           372           8.10E-25
Defense Response                                254           2.90E-20
Regulation Of Immune System Process             219           1.00E-19
Regulation Of Immune Response                   165           5.30E-19
Positive Regulation Of Immune System Process    155           2.10E-18
Positive Regulation Of Response To Stimulus     258           1.20E-17
Regulation Of Response To Stimulus              418           2.20E-17
Signal Transduction                             600           3.80E-17
Signaling                                       638           1.10E-16

Table 4: Top enriched Gene Ontology Biological Process terms.

5.2 Cellular Component

Term                                            Significant   P.Value
Cell Periphery                                  491           1.00E-18
Plasma Membrane                                 480           1.40E-18
Membrane                                        774           1.50E-17
Extracellular Region                            421           1.40E-13
Membrane Part                                   566           2.10E-11
Intrinsic Component Of Membrane                 473           3.90E-11
Integral Component Of Membrane                  465           1.00E-10
Extracellular Region Part                       355           1.70E-10
Extracellular Space                             154           6.10E-10
Side Of Membrane                                63            8.60E-10

Table 5: Top enriched Gene Ontology Cellular Component terms.

5.3 Molecular Function

Term                                            Significant   P.Value
Receptor Activity                               158           8.00E-12
Signal Transducer Activity                      167           5.50E-09
Molecular Transducer Activity                   167           5.50E-09
Receptor Binding                                178           8.30E-09
Transmembrane Signaling Receptor Activity       106           1.70E-08
Signaling Receptor Activity                     116           7.60E-08
Antigen Binding                                 24            9.40E-08
Sulfur Compound Binding                         41            8.70E-07
Heparin Binding                                 32            3.30E-06
Chemokine Activity                              16            3.60E-06

Table 6: Top enriched Gene Ontology Molecular Function terms.
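The report does not state which enrichment test produced the p-values in Tables 4 to 6. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below as an assumption rather than as the pipeline's actual method. The function name is invented, and the number of background genes annotated to a term (`n_term`) is not given in the report, so the value used in the second call is purely hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n_de, n_term, n_background):
    """P(X >= k): chance of seeing at least k annotated genes among n_de
    genes drawn without replacement from n_background genes, of which
    n_term carry the annotation."""
    total = comb(n_background, n_de)
    upper = min(n_de, n_term)
    favourable = sum(
        comb(n_term, i) * comb(n_background - n_term, n_de - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

# Small worked case: 20 background genes, 5 differentially expressed,
# 4 annotated to the term, 3 of those among the differentially expressed.
p_small = hypergeom_pvalue(k=3, n_de=5, n_term=4, n_background=20)

# Report-sized case: 277 significant genes for "Immune Response" among
# 1493 differentially expressed genes and 8934 background genes.
# n_term=900 is a hypothetical annotation count, not a value from the report.
p_report_sized = hypergeom_pvalue(k=277, n_de=1493, n_term=900, n_background=8934)
```

The small case can be checked by hand: (4 x 120 + 1 x 16) / 15504 = 496 / 15504. Python's exact integer arithmetic keeps the large binomial coefficients in the report-sized case from overflowing before the final division.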

6 Reactome pathway enrichment

The 1493 differentially expressed genes were tested for enrichment of Reactome pathways. The top 10 enriched pathways are shown in Table 7.

Description                                                                        P.Value    Count   Activity.Score
Immune System                                                                      1.50E-03   85      0.6
Adaptive Immune System                                                             5.40E-03   42      0.6
Phosphorylation of CD3 and TCR zeta chains                                         1.90E-02   11      1.8
Lipid and lipoprotein metabolism                                                   1.90E-02   7       -1.9
Hemostasis                                                                         2.00E-02   31      -0.2
TCR signaling                                                                      2.00E-02   12      1.7
Cytokine Signaling in Immune system                                                2.60E-02   31      1.0
Platelet activation, signaling and aggregation                                     2.70E-02   29      -0.2
Antigen Activates B Cell Receptor Leading to Generation of Second Messengers       3.00E-02   24      -0.0
Alternative complement activation                                                  3.10E-02   26      -0.5

Table 7: Top enriched Reactome Pathways. Column P.Value gives the raw enrichment p-values. Column Count gives the total number of differentially expressed genes assigned to a specific pathway. Column Activity.Score gives the average pathway fold change in the ra vs. comparison.
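The Count and Activity.Score columns can be illustrated with a short sketch. It assumes that the "average pathway fold change" is the arithmetic mean of the logFC values of the differentially expressed genes assigned to a pathway; the report does not say whether log or linear fold changes are averaged. The pathway membership list and the function name are hypothetical, and the logFC values are taken from Table 3.

```python
from statistics import mean

def activity_score(pathway_genes, logfc):
    """Return (Count, Activity.Score) for one pathway.

    Count is the number of differentially expressed genes in the pathway;
    the score is their mean logFC (assumed definition)."""
    hits = [logfc[gene] for gene in pathway_genes if gene in logfc]
    return (len(hits), mean(hits)) if hits else (0, None)

# logFC of some differentially expressed genes, copied from Table 3.
logfc = {"CXCL13": 11.0, "CCL13": 4.5, "CD27": 2.9, "NOVA1": -3.0}

# Hypothetical pathway membership; "GENE_X" stands for a pathway gene
# that is not differentially expressed and is therefore ignored.
count, score = activity_score(["CXCL13", "CCL13", "GENE_X"], logfc)
```

Under this definition a positive score means the pathway's differentially expressed genes are on average higher in ra, and a negative score, as for "Lipid and lipoprotein metabolism" in Table 7, means they are on average lower.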