Systematic assessment of cancer missense mutation clustering in protein structures
Atanas Kamburov, Michael Lawrence, Paz Polak, Ignaty Leshchiner, Kasper Lage, Todd R. Golub, Eric S. Lander, Gad Getz

SI Appendix
Supplemental Methods

Collapsing consecutive mutated residues

To examine the effect of consecutive mutated residues on CLUMPS results, we implemented a variant of CLUMPS in which two or more mutated residues that were consecutive in the protein sequence were combined into a single "meta-residue". The 3-D location of the centroid of the new meta-residue [used for Euclidean distance measurements to other mutated (meta-) residues] was calculated from the 3-D locations of the individual member residues, weighted linearly by their mutational recurrence. For example, if both residues P[k] and P[k+1] of protein P are mutated and P[k] is mutated much more frequently than P[k+1], then the centroid of the new meta-residue P[k:k+1] will be closer to the centroid of P[k] than to that of P[k+1]. Unlike in the original CLUMPS implementation, (meta-) residues were not allowed to be immediately adjacent to each other in the protein sequence during the permutations.

Comparison of methods for cancer gene identification

Per-gene p-values calculated with MutSig and its components MutSig-CL, MutSig-FN and MutSig-CV were obtained from the original PanCancer study [1]. Because CLUMPS p-values are calculated per structure, we compared them with these per-gene p-values by taking, for each protein, the smallest CLUMPS p-value among its representative structures.

Protein interaction interfaces

Information about human protein residues forming interaction interfaces with other human proteins, small-molecule/ion ligands, DNA or RNA (based on co-complex structures from PDB) was obtained from the PDBsum database [2]. All residues of a protein predicted by PDBsum to be involved in any type of contact with the interaction partner (e.g., hydrogen bonds, disulphide bonds or non-bonded contacts) were considered interface residues. Only interfaces with at least one mutation were analyzed.
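The interface definition above can be sketched as follows, assuming PDBsum contacts have already been parsed into hypothetical (protein residue number, partner, contact type) tuples. This tuple layout and the function names are illustrative assumptions, not PDBsum's file format.

```python
from collections import Counter

# Hypothetical parsed contact: (protein_residue_number, partner_id, contact_type),
# where contact_type might be "hbond", "disulphide" or "nonbonded".
Contact = tuple[int, str, str]


def interface_residues(contacts: list[Contact]) -> set[int]:
    """Any protein residue in any type of contact with the partner is an interface residue."""
    return {residue for residue, _partner, _kind in contacts}


def mutations_at_interface(interface: set[int], mutation_counts: Counter) -> int:
    """Total number of mutations falling on interface residues (missing keys count as 0)."""
    return sum(mutation_counts[r] for r in interface)


def keep_interface(contacts: list[Contact], mutation_counts: Counter) -> bool:
    """Only interfaces carrying at least one mutation are analyzed."""
    return mutations_at_interface(interface_residues(contacts), mutation_counts) >= 1


# Toy example: residue 12 appears in two contacts but is counted once.
contacts = [(12, "GTP", "hbond"), (13, "GTP", "nonbonded"), (61, "GTP", "hbond"), (12, "GTP", "nonbonded")]
mutation_counts = Counter({12: 5, 200: 1})
print(sorted(interface_residues(contacts)))  # [12, 13, 61]
print(keep_interface(contacts, mutation_counts))  # True
```

Using a set makes each residue count once even when PDBsum reports several contacts for it.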
In cases where multiple co-complex structures were available for a given pair of interactors, we selected the structure maximizing interface size, sequence coverage of the protein interactor(s) and the number of mutations at the interface. As expected, factoring the number of interface mutations into the selection process, and especially restricting the analysis to interfaces with at least one observed mutation, led to some inflation in the Q-Q plot (SI Appendix, Fig. S15); however, we aimed to avoid missing biologically interesting interactions due to false-negative contact residue predictions in PDBsum. Protein-ligand interfaces that were mutually similar (in terms of interface residues) were grouped together, and from each group only one representative interface (the one comprising the most residues) was analyzed. This avoided separately testing interfaces such as KRAS-GTP, KRAS-GDP and KRAS-inhibitor. For protein-protein interactions, we focused only on heteromers, since for many homomeric co-complex structures it is unclear whether the corresponding protein forms oligomers in solution or whether the observed residue contacts are attributable only to the way the protein was crystallized ("crystal-packing interactions") [3]. Moreover, in many instances one of the interactors was not annotated with a UniProt identifier in PDB/SIFTS despite having a non-standard protein name annotation. To recover missing UniProt annotations, we aligned all non-annotated sequences found in protein complexes with human proteins against UniProt/SwissProt-human using WU-BLAST. A given query sequence was annotated with the UniProt identifier of the reference with the smallest BLASTP alignment p-value, but only if at least 90% of the query was aligned to that reference with at least 90% sequence identity.

Protein/RNA expression and copy number data

Matched TCGA RPPA, RNAseq and copy number data from endometrial [4] and colorectal [5] tumor samples (used for quantifying the expression of SPOP substrates and CCNE1, respectively) were downloaded from the Broad GDAC portal. The samples were divided into several groups according to SPOP/FBXW7 mutation status and substrate copy number status (SI Appendix, Fig. S6B and Main Text Fig. 5). Before plotting, the protein and RNA expression levels in each sample were normalized by subtracting the median and dividing by the standard deviation of the corresponding expression-level distribution in samples with no SPOP/FBXW7 somatic mutations and no substrate copy number changes. A gene was considered amplified (deleted) if it lay in a genomic segment, supported by at least 3 SNP probes, with a segment mean above 0.3 (below -0.3) in the copy number data.

References

1. Lawrence MS et al. (2014) Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505:
2. de Beer TAP, Berka K, Thornton JM, Laskowski RA (2014) PDBsum additions. Nucleic Acids Res. 42:D
3. Janin J (1997) Specific versus non-specific contacts in protein crystals. Nat. Struct. Biol. 4:
4. The Cancer Genome Atlas Network (2013) Integrated genomic characterization of endometrial carcinoma. Nature 497:
5. The Cancer Genome Atlas Network (2012) Comprehensive molecular characterization of human colon and rectal cancer. Nature 487:
Figure S1: Overview of our CLUMPS approach for identifying significant mutation clustering in protein structures. WAP: weighted average proximity score; $d_{q,r}$: spatial (Euclidean) distance between the centroids of residues $q$ and $r$; $n_q$ and $n_r$: normalized numbers of samples with missense mutations affecting residues $q$ and $r$, respectively; $t$: soft distance threshold (see Materials and Methods in the Main Text for details).
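A minimal sketch of the WAP score, written under stated assumptions rather than taken from the CLUMPS code: it treats WAP as the sum, over pairs of mutated residues q and r, of n_q * n_r * exp(-d_{q,r}^2 / (2 t^2)), with n = N^m / (Theta^m + N^m) for N mutated samples, and uses t = 6 Å, Theta = 2 and m = 3 as defaults (the parameter values plotted in Figure S13). Summing over unordered pairs of distinct residues is an assumption, and all function names are hypothetical.

```python
import math
from itertools import combinations


def normalized_count(n_samples: float, theta: float = 2.0, m: float = 3.0) -> float:
    """Hill-type damping of the number of samples mutated at a residue (assumed form)."""
    return n_samples**m / (theta**m + n_samples**m)


def proximity(distance: float, t: float = 6.0) -> float:
    """Gaussian soft threshold on the centroid-centroid distance in Å (assumed form)."""
    return math.exp(-distance**2 / (2 * t**2))


def wap(centroids: dict[int, tuple[float, float, float]], counts: dict[int, int],
        t: float = 6.0, theta: float = 2.0, m: float = 3.0) -> float:
    """Weighted average proximity over unordered pairs of distinct mutated residues."""
    mutated = [r for r, c in counts.items() if c > 0]
    total = 0.0
    for q, r in combinations(mutated, 2):
        d = math.dist(centroids[q], centroids[r])  # Euclidean distance between centroids
        total += normalized_count(counts[q], theta, m) * normalized_count(counts[r], theta, m) * proximity(d, t)
    return total


# Toy structure: residues 1 and 2 are 6 Å apart; residue 3 is unmutated and ignored.
centroids = {1: (0.0, 0.0, 0.0), 2: (6.0, 0.0, 0.0), 3: (40.0, 0.0, 0.0)}
print(round(wap(centroids, {1: 2, 2: 2, 3: 0}), 4))  # 0.1516
```

Under a permutation scheme that moves each residue's mutation count to a new position, counting ordered pairs would only double the score and including q = r would only add a constant, so neither choice should change a permutation-based empirical p-value.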
Figure S2: Quantile-quantile plot of empirical p-values calculated with CLUMPS for all tested (representative) protein structures (Dataset 1). Significant and near-significant protein structures are labeled; purple labels indicate tumor suppressors and green labels indicate oncoproteins.
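A Q-Q plot such as Figure S2 pairs the sorted observed p-values with the quantiles expected under a uniform null. The sketch below shows one common way to compute those point pairs on a -log10 scale; the (i - 0.5)/n plotting positions are a convention assumed here, not necessarily the one used for the figure.

```python
import math


def qq_points(pvalues: list[float]) -> list[tuple[float, float]]:
    """Return (expected, observed) -log10 p-value pairs for a uniform-null Q-Q plot."""
    if any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    observed = sorted(pvalues)  # most significant first
    n = len(observed)
    points = []
    for i, p in enumerate(observed, start=1):
        expected = (i - 0.5) / n  # i-th plotting position under Uniform(0, 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Hypothetical empirical p-values for four structures
for expected, observed in qq_points([0.5, 0.01, 0.2, 0.9]):
    print(f"{expected:.3f}\t{observed:.3f}")
```

Points above the diagonal indicate p-values smaller than expected by chance; a systematic lift of the whole curve indicates inflation.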
Figure S3: TumorPortal screenshot showing the positions of mutations in NUF2, with the missense hotspot p.S340L and a splice-site hotspot annotated. Missense mutations are shown as green circles, with color intensity scaling with evolutionary conservation. The portion of the NUF2 protein sequence covered by the structure shown in Fig. 3 (Main Text) is highlighted in black.
Figure S4: Several non-recurrent mutations in STK11 affect residues at the active site, forming a spatial (3-D) cluster. A) TumorPortal screenshot showing the positions of mutations in the linear STK11 protein sequence. Missense mutations are shown as green circles, with color intensity scaling with evolutionary conservation. B) Structure of STK11 (PDB: 2WTK) with mutated residues shown as red lines. Mutations that cluster together at the active site are labeled; p.N181 and p.D194 were mutated in two samples each, and each of the other labeled residues in one sample. Shown in blue is phosphoaminophosphonic acid-adenylate ester, an analog of the substrate ATP.
Figure S5: Comparison of CLUMPS p-values (denoted "Spatial clustering") against p-values calculated for the corresponding genes with the MutSig suite of tools for detecting cancer genes. MutSig provides three p-values corresponding to three different statistical tests (MutSig-CL: linear clustering of mutations; MutSig-CV: overall mutation burden, taking into account covariates such as replication timing and expression level; and MutSig-FN: the relative frequency of mutations at evolutionarily conserved and likely functional DNA bases), as well as a combined p-value (MutSig-integrated). Each plot compares one of these four MutSig p-values against the CLUMPS p-value for the corresponding gene (if a gene has multiple representative protein structures, its most significant CLUMPS p-value is used). Spearman's correlation coefficient ρ is given in each plot. Dashed red lines mark the nominal significance threshold (p = 0.01). Genes detected as significant or near-significant with CLUMPS, but not with MutSig or any of its separate components, are labeled.
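The per-gene comparison behind Figure S5 can be sketched in two steps: collapse per-structure CLUMPS p-values to one value per gene by taking the minimum, then compute Spearman's ρ as the Pearson correlation of average ranks. The row layout, gene and structure labels are hypothetical; `scipy.stats.spearmanr` computes the same coefficient, but this standard-library version keeps the example dependency-free.

```python
import math


def min_pvalue_per_gene(rows: list[tuple[str, str, float]]) -> dict[str, float]:
    """rows: (gene, structure_id, clumps_p). Keep the most significant structure per gene."""
    best: dict[str, float] = {}
    for gene, _structure, p in rows:
        best[gene] = min(p, best.get(gene, math.inf))
    return best


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation of average ranks; NaN if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


# Hypothetical inputs: GENE_A has two representative structures.
rows = [("GENE_A", "struct_1", 1e-5), ("GENE_A", "struct_2", 0.02),
        ("GENE_B", "struct_3", 1e-4), ("GENE_C", "struct_4", 0.003)]
mutsig = {"GENE_A": 1e-8, "GENE_B": 0.2, "GENE_C": 0.01}
clumps = min_pvalue_per_gene(rows)
genes = sorted(clumps)
print(spearman_rho([clumps[g] for g in genes], [mutsig[g] for g in genes]))
```

Because ranks are unchanged by any monotone transform applied to both axes, ρ is the same whether the plots use raw or -log10 p-values.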
Figure S6: Clusters of endometrial and prostate cancer mutations in SPOP. A) TumorPortal screenshot showing the positions of mutations in SPOP, with two annotated clusters: Cluster E (endometrial only; newly identified) and Cluster S (substrate-binding pocket). Missense mutations are shown as green circles, whose color intensity scales with evolutionary conservation. The portion of the SPOP protein sequence covered by the structure shown in Fig. 4 (Main Text) is highlighted in black. B) Protein and RNA levels of the SPOP substrates MAPK8 and PTEN in endometrial tumors with mutations from both Clusters E and S compared with SPOP wild-type endometrial tumors (protein and RNA expression levels correspond to TCGA RPPA and RNAseq measurements, respectively).
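The normalization applied before plotting panel B, and the copy-number rule used to define the sample groups, can be sketched as follows. The thresholds (at least 3 SNP probes; segment mean above 0.3 or below -0.3) come from the Supplemental Methods. The `Segment` record, the function names, containment of the whole gene in one segment, and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical copy-number segment record."""
    chrom: str
    start: int
    end: int
    num_probes: int
    mean: float


def copy_number_call(segments: list[Segment], chrom: str, gene_start: int, gene_end: int) -> str:
    """'amp', 'del' or 'neutral': the gene must lie inside a segment with >= 3 probes."""
    for seg in segments:
        contains_gene = seg.chrom == chrom and seg.start <= gene_start and gene_end <= seg.end
        if contains_gene and seg.num_probes >= 3:
            if seg.mean > 0.3:
                return "amp"
            if seg.mean < -0.3:
                return "del"
    return "neutral"


def normalize_to_reference(values: dict[str, float], reference: set[str]) -> dict[str, float]:
    """(x - median_ref) / sd_ref, where the reference samples carry no SPOP/FBXW7
    mutation and no substrate copy number change (needs >= 2 reference samples)."""
    ref = [values[s] for s in reference]
    center = statistics.median(ref)
    scale = statistics.stdev(ref)
    return {sample: (x - center) / scale for sample, x in values.items()}


# Toy data: s1-s3 are reference samples, s4 is a mutant sample.
levels = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 6.0}
print(normalize_to_reference(levels, {"s1", "s2", "s3"})["s4"])  # 4.0
```

Anchoring the scale to wild-type, copy-neutral samples expresses every other sample's level in units of that baseline group's spread.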
Figure S7: PPP2R1A (grey) bound to PPP2R5C (green) (PDB: 2NYL). Mutated residues in both proteins are highlighted in red, with color intensity scaling with the number of samples harboring missense mutations affecting the corresponding residue. Recurrent mutations (≥3 samples) are shown as sticks and non-recurrent mutations as thin lines. PPP2R1A mutations at the interface are labeled.
Figure S8: HRAS (grey) bound to RASA1 (green) (PDB: 1WQ1). Mutations in both proteins are colored in red, with color intensity scaling with recurrence. Recurrent mutations (≥3 samples) are shown as sticks and non-recurrent mutations as thin lines. Mutated interface residues in both proteins are labeled (black labels: HRAS residues; green labels: RASA1 residues).
Figure S9: OGT (grey) bound to an HCFC1 fragment (orange) (PDB: 4N3B). Residues in both proteins that are affected by missense mutations are highlighted in red; those at the shared interaction interface are labeled (black labels: OGT residues; brown labels: HCFC1 residues).
Figure S10: Distribution of the relative reference (UniProt) protein sequence coverage of all 3-D structures of proteins used in the full CLUMPS analysis (prior to selecting the representative structures per protein). SI Appendix, Fig. S12 shows the corresponding distribution after the selection of representative structures.
Figure S11: Protein sequence coverage by individual PDB structures for the top 20 proteins that showed significant or near-significant 3-D mutation clustering. The proteins are ordered along the x-axis, and the length of each protein sequence is normalized to unity. The y-axis shows log10(CLUMPS p-value). Each blue line corresponds to a PDB structure/chain; its horizontal extent shows the relative coverage of the protein sequence and its vertical position shows the mutation clustering p-value for that structure/chain. Overlapping lines appear as a single thicker line. Red lines correspond to the structures selected by our greedy search algorithm (see Materials and Methods in the Main Text).
Figure S12: Distribution of the overall relative reference (UniProt) protein sequence coverage (= total number of residues covered by all selected 3-D structures of a protein divided by the number of residues in the protein) for all proteins used in the full CLUMPS analysis.
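The overall coverage defined in Figure S12 can be computed as the size of the union of residues covered by the selected structures, divided by the protein length. This sketch assumes each structure is summarized by hypothetical 1-based, inclusive (start, end) UniProt residue ranges; a structure with unresolved internal gaps can be passed as several ranges.

```python
def overall_coverage(protein_length: int, ranges: list[tuple[int, int]]) -> float:
    """Fraction of UniProt residues covered by the union of the given residue ranges."""
    covered: set[int] = set()
    for start, end in ranges:
        # clip each inclusive range to the protein before marking residues as covered
        covered.update(range(max(1, start), min(protein_length, end) + 1))
    return len(covered) / protein_length


def relative_coverage(protein_length: int, start: int, end: int) -> float:
    """Coverage of a single structure, as in the per-structure distribution of Figure S10."""
    return overall_coverage(protein_length, [(start, end)])


# Two overlapping structures covering residues 1-40 and 31-60 of a 100-residue protein
print(overall_coverage(100, [(1, 40), (31, 60)]))  # 0.6
```

Taking the union rather than summing range lengths prevents overlapping structures from inflating the coverage.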
Figure S13: Plots of the functions used for calculating the weighted average proximity (WAP) score: A) $f(d_{q,r}; t = 6) = e^{-d_{q,r}^2/(2t^2)}$; B) $h(N; \Theta = 2, m = 3) = N^m/(\Theta^m + N^m)$.
Figure S14: Comparison of p-values obtained with the original implementation of CLUMPS (black dots), which weights mutated residues according to their recurrence (see Materials and Methods in the Main Text), against the corresponding p-values obtained with a version of CLUMPS that weights all mutated residues equally (red stars). The 300 top-scoring structures from Dataset 1 are shown.
Figure S15: Quantile-quantile plot of empirical p-values for mutation enrichment in interaction interfaces. Red dots represent significant interfaces (q ≤ 0.1; see Table 2 in the Main Text and Datasets 8, 10, 11 and 12). The apparent slight inflation arises because interfaces were pre-filtered to retain only those with at least one mutation, and because, among different PDB instances of similar interfaces, the selection strategy favors those with more mutations in order to increase sensitivity (see Materials and Methods in the Main Text).
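The two interface-selection heuristics referred to in this caption (one co-complex per interactor pair; one representative per group of similar ligand interfaces) can be sketched as follows. The source does not say how the selection criteria were combined, so the lexicographic order used here (mutations, then interface size, then coverage) is an assumption, as are the Jaccard similarity measure and its 0.5 cutoff for "mutually similar". All names and toy values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interface:
    """Hypothetical summary of one interface observed in one PDB co-complex."""
    structure_id: str
    partner: str
    residues: frozenset[int]  # interface residues on the mutated protein
    coverage: float  # relative sequence coverage of the protein interactor
    mutations: int  # mutations observed at the interface residues


def best_structure(candidates: list[Interface]) -> Interface:
    """Choose one co-complex per interactor pair (criterion order is an assumption)."""
    return max(candidates, key=lambda i: (i.mutations, len(i.residues), i.coverage))


def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
    """Overlap of two residue sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def representative_ligand_interfaces(interfaces: list[Interface],
                                     min_similarity: float = 0.5) -> list[Interface]:
    """Greedily group similar interfaces, keeping the largest interface of each group."""
    representatives: list[Interface] = []
    for iface in sorted(interfaces, key=lambda i: len(i.residues), reverse=True):
        # start a new group only if iface is dissimilar to every kept representative
        if all(jaccard(iface.residues, rep.residues) < min_similarity for rep in representatives):
            representatives.append(iface)
    return representatives


# Toy values: the GTP and GDP interfaces overlap heavily, so only the larger one is kept.
gtp = Interface("s1", "GTP", frozenset({12, 13, 61}), 0.9, 3)
gdp = Interface("s2", "GDP", frozenset({12, 13, 61, 117}), 0.8, 3)
other = Interface("s3", "ion", frozenset({200, 201}), 0.5, 1)
print([i.partner for i in representative_ligand_interfaces([gtp, gdp, other])])  # ['GDP', 'ion']
```

Processing interfaces from largest to smallest guarantees that the kept member of each group is the one comprising the most residues.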
More informationAnalysis of ChIP-seq data in Galaxy
Analysis of ChIP-seq data in Galaxy November, 2012 Local copy: https://galaxy.wi.mit.edu/ Joint project between BaRC and IT Main site: http://main.g2.bx.psu.edu/ 1 Font Conventions Bold and blue refers
More informationExercise with Gene Ontology - Cytoscape - BiNGO
Exercise with Gene Ontology - Cytoscape - BiNGO This practical has material extracted from http://www.cbs.dtu.dk/chipcourse/exercises/ex_go/goexercise11.php In this exercise we will analyze microarray
More informationMedical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu
Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
More informationFrom Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data
From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNA-seq data experimental design data collection modeling statistical testing biological heterogeneity
More informationProteinPilot Report for ProteinPilot Software
ProteinPilot Report for ProteinPilot Software Detailed Analysis of Protein Identification / Quantitation Results Automatically Sean L Seymour, Christie Hunter SCIEX, USA Pow erful mass spectrometers like
More informationIntroduction to Exploratory Data Analysis
Introduction to Exploratory Data Analysis A SpaceStat Software Tutorial Copyright 2013, BioMedware, Inc. (www.biomedware.com). All rights reserved. SpaceStat and BioMedware are trademarks of BioMedware,
More informationRNAseq / ChipSeq / Methylseq and personalized genomics
RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School
More informationTechnical Note. Roche Applied Science. No. LC 18/2004. Assay Formats for Use in Real-Time PCR
Roche Applied Science Technical Note No. LC 18/2004 Purpose of this Note Assay Formats for Use in Real-Time PCR The LightCycler Instrument uses several detection channels to monitor the amplification of
More informationCurrent Motif Discovery Tools and their Limitations
Current Motif Discovery Tools and their Limitations Philipp Bucher SIB / CIG Workshop 3 October 2006 Trendy Concepts and Hypotheses Transcription regulatory elements act in a context-dependent manner.
More informationFormalin fixation at low temperature better preserves nucleic acid integrity. Gianni Bussolati. University of Turin
Formalin fixation at low temperature better preserves nucleic acid integrity Gianni Bussolati University of Turin Disclosure of interests: G.B. was originally responsible for the invention of the Cold
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More information3D structure visualization and high quality imaging. Chimera
3D structure visualization and high quality imaging. Chimera Vincent Zoete 2008 Contact : vincent.zoete@isb sib.ch 1/27 Table of Contents Presentation of Chimera...3 Exercise 1...4 Loading a structure
More informationInterpreting Data in Normal Distributions
Interpreting Data in Normal Distributions This curve is kind of a big deal. It shows the distribution of a set of test scores, the results of rolling a die a million times, the heights of people on Earth,
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 4, Issue 1 2005 Article 17 A General Framework for Weighted Gene Co-Expression Network Analysis Bin Zhang Steve Horvath Departments of
More informationIntegrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon
Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland
More informationDiscovery & Modeling of Genomic Regulatory Networks with Big Data
Discovery & Modeling of Genomic Regulatory Networks with Big Data Hamid Bolouri Division of Human Biology Fred Hutchinson Cancer Research Center labs.fhcrc.org/bolouri I have no financial relationships
More informationDiscovering Bioinformatics
Discovering Bioinformatics Sami Khuri Natascha Khuri Alexander Picker Aidan Budd Sophie Chabanis-Davidson Julia Willingale-Theune English version ELLS European Learning Laboratory for the Life Sciences
More informationVisual Structure Analysis of Flow Charts in Patent Images
Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information
More informationInteraktionen von RNAs und Proteinen
Sonja Prohaska Computational EvoDevo Universitaet Leipzig June 9, 2015 Studying RNA-protein interactions Given: target protein known to bind to RNA problem: find binding partners and binding sites experimental
More informationPROC. CAIRO INTERNATIONAL BIOMEDICAL ENGINEERING CONFERENCE 2006 1. E-mail: msm_eng@k-space.org
BIOINFTool: Bioinformatics and sequence data analysis in molecular biology using Matlab Mai S. Mabrouk 1, Marwa Hamdy 2, Marwa Mamdouh 2, Marwa Aboelfotoh 2,Yasser M. Kadah 2 1 Biomedical Engineering Department,
More informationBig Data Visualization for Genomics. Luca Vezzadini Kairos3D
Big Data Visualization for Genomics Luca Vezzadini Kairos3D Why GenomeCruzer? The amount of data for DNA sequencing is growing Modern hardware produces billions of values per sample Scientists need to
More informationIGV Hands-on Exercise: UI basics and data integration
IGV Hands-on Exercise: UI basics and data integration Verhaak, R.G. et al. Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA,
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationDESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
More informationCNV Univariate Analysis Tutorial
CNV Univariate Analysis Tutorial Release 8.1 Golden Helix, Inc. March 18, 2014 Contents 1. Overview 2 2. CNAM Optimal Segmenting 4 A. Performing CNAM Optimal Segmenting..................................
More information