Automating Biostatistics Workflows for Bench Scientists Using R based



Similar documents
Cloud Computing for Scientific Research

Tutorial for proteome data analysis using the Perseus software platform

Team Members: Christopher Copper Philip Eittreim Jeremiah Jekich Andrew Reisdorph. Client: Brian Krzys

Putting the pieces together: Integrated Research Data Management Using the LabKey Server

MRMPilot Software: Accelerating MRM Assay Development for Targeted Quantitative Proteomics

MultiQuant Software 2.0 for Targeted Protein / Peptide Quantification

Spyglass Portal Manual v

Multiple Optimization Using the JMP Statistical Software Kodak Research Conference May 9, 2005

DeCyder Extended Data Analysis module Version 1.0

ICE Trade Vault. Public User & Technology Guide June 6, 2014

Web Dashboard User Guide

Visualization of Semantic Windows with SciDB Integration

Oracle Database 10g Express

Quick and Easy Web Maps with Google Fusion Tables. SCO Technical Paper

Magento module Documentation

Web Development I & II*

Load testing with. WAPT Cloud. Quick Start Guide

Order Manager Toolkit

TDAQ Analytics Dashboard

Starting User Guide 11/29/2011

PyRy3D: a software tool for modeling of large macromolecular complexes MODELING OF STRUCTURES FOR LARGE MACROMOLECULAR COMPLEXES

SelectSurvey.NET Developers Manual

Table of Contents. Table of Contents 3

Analyzing Dose-Response Data 1

An Introduction to KeyLines and Network Visualization

Quick Start Guide. Installation and Setup

What you can do:...3 Data Entry:...3 Drillhole Sample Data:...5 Cross Sections and Level Plans...8 3D Visualization...11

imc FAMOS 6.3 visualization signal analysis data processing test reporting Comprehensive data analysis and documentation imc productive testing

IBM SPSS Direct Marketing 23

ONCONTACT MARKETING AND CAMPAIGN USER GUIDE V8.1

Apparo Fast Edit. Excel data import via 1 / 19

IBM SPSS Direct Marketing 22

HTML5 Data Visualization and Manipulation Tool Colorado School of Mines Field Session Summer 2013

imc FAMOS 6.3 visualization signal analysis data processing test reporting Comprehensive data analysis and documentation imc productive testing

NaviCell Data Visualization Python API

ONCONTACT MARKETING AND CAMPAIGN USER GUIDE V8.1

WIDA Assessment Management System (WIDA AMS) User Guide, Part 2

LabKey Server: An open source platform for scientific data integration, analysis, and collaboration

Automation Engine AE Server management

Personal Portfolios on Blackboard

Mascot Integra: Data management for Proteomics ASMS 2004

Chapter 4: Website Basics

Flexible Virtuemart 2 Template CleanMart (for VM2.0.x only) TUTORIAL. INSTALLATION CleanMart VM 2 Template (in 3 steps):

Biomarker Discovery and Data Visualization Tool for Ovarian Cancer Screening

InfiniteInsight 6.5 sp4

Client Instructions - ID Tech Configuration Instructions

EventTracker: Configuring DLA Extension for AWStats Report AWStats Reports

Microsoft 311 Service Center accelerator Demo Script

Beckman Coulter DTX 880 Multimode Detector Bergen County Technical Schools Stem Cell Lab

Octet System Data Analysis User Guide

LEGENDplex Data Analysis Software

MicroStrategy Desktop

POINT OF SALES SYSTEM (POSS) USER MANUAL

Guide for Bioinformatics Project Module 3

EBOX Digital Content Management System (CMS) User Guide For Site Owners & Administrators

Scatter Plots with Error Bars

Logging In From your Web browser, enter the GLOBE URL:

1 FTire/editor s Main Menu 1

Google Apps for Sharing Folders and Collecting Assignments

Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial

Microsoft Office System Tip Sheet

MarkerView Software for Metabolomic and Biomarker Profiling Analysis

WHAT S NEW IN OBIEE

understand how image maps can enhance a design and make a site more interactive know how to create an image map easily with Dreamweaver

QualysGuard WAS. Getting Started Guide Version 4.1. April 24, 2015

ultimo theme Update Guide Copyright Infortis All rights reserved

MassMatrix Web Server User Manual

WebSphere Business Monitor V6.2 Business space dashboards

Laboratory and Data Management for Scientists. David L. Blum, Ph.D. Research Assistant Professor Department of Biochemistry

AJ Shopping Cart. Administration Manual

WebSphere Business Monitor V7.0 Script adapter lab

JustClust User Manual

VALUE LINE INVESTMENT SURVEY ONLINE USER S GUIDE VALUE LINE INVESTMENT SURVEY ONLINE. User s Guide

DataPA OpenAnalytics End User Training

UGENE Quick Start Guide

EventTracker: Configuring DLA Extension for AWStats report AWStats Reports

Creating and Managing Online Surveys LEVEL 2

Using BlueHornet Statistics Sent Message Reporting Message Summary Section Advanced Reporting Basics Delivery Tab

Adobe Dreamweaver CC 14 Tutorial

Big Data Operations Guide for Cloudera Manager v5.x Hadoop

Prism 6 Step-by-Step Example Linear Standard Curves Interpolating from a standard curve is a common way of quantifying the concentration of a sample.

Visualizing a Neo4j Graph Database with KeyLines

Hierarchical Clustering Analysis

ForensicDB: A Web-accessible Spectral Database

Bitrix Site Manager ASP.NET. Installation Guide

WebSphere Business Monitor V7.0 Business space dashboards

How To Test Your Web Site On Wapt On A Pc Or Mac Or Mac (Or Mac) On A Mac Or Ipad Or Ipa (Or Ipa) On Pc Or Ipam (Or Pc Or Pc) On An Ip

Service Desk R11.2 Upgrade Procedure - How to export data from USD into MS Excel

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

Implementing a Web-based Transportation Data Management System

Creating Online Surveys with Qualtrics Survey Tool

Pipeline Pilot Enterprise Server. Flexible Integration of Disparate Data and Applications. Capture and Deployment of Best Practices

Reference Guide for WebCDM Application 2013 CEICData. All rights reserved.

The Turning of JMP Software into a Semiconductor Analysis Software Product:

Ulyxes automatic deformation monitoring system

Transcription:

Automating Biostatistics Workflows for Bench Scientists Using R based Web tools Jeff Skinner, Vivek Gopalan, Jason Barnett and Yentram Huyen user! 2010 Conference July 21 23, 2010 Gaithersburg, MD

NIAID Mission The National Institute of Allergy and Infectious Diseases (NIAID) is one of the 27 Institutes of the NIH and conducts and supports basic and applied research to better understand, treat, and ultimately prevent infectious, immunologic, and allergic diseases.

OCICB Organization Cyber Security Program (CSP) Ken Grossman, Information System Security Officer (ISSO) Office of the Chief Information Officer (OCIO) Mike Tartakovsky, NIAID CIO & OCICB Director Alex Rosenthal, M.S., M.B.A. NIAID Deputy CIO & OCICB Deputy Director Global Biomedical Research Support Program (GBRSP) Christopher Whalen, International Team Lead Kristi Schmidt, RML Team Lead Customer Services Branch (CSB) Bioinformatics and Computational Biosciences Branch (BCBB) Software Engineering Branch (SEB) Program Management Branch (PMB) Operations and Engineering Branch (OEB) Joel Minto, D.B.S. Acting Chief Tram Huyen, Ph.D. Chief Joe Croghan, Chief Brian Conelley, Chief Kim Kassing, Chief

Commonly Encountered Problems Large complicated data files from biological instruments Microarrays, Next Generation Sequencing, 96 well plate readers, NMR and Mass Spectrometry Arcane file extensions, ugly headers and footers, multiple tables per file Tedious data manipulation in MS Excel or Notepad Simple formulas or cut and paste can add up to hours at the computer Critical analyses performed by legacy software Many relevant software tools are no longer maintained, because they were created with outdated technology or the original developers have moved on to new careers

Why Create a Webtool Using R? Advantages of using R R scripting language is easy to use and will be long lived Includes all necessary tools for data import, data manipulation, statistical analyses, graphing, generation of custom reports, etc. Advantages of building a webtool Provides users an accessible graphical user interface (GUI) Simplifies the distribution and maintenance of software Agencies can link software to infrastructure resources like high performance computing clusters and databases, which may not otherwise be available to many end users

HDX NAME Compute estimates of flexibility (i.e. protection factors) for multipleprotein protein regions from hydrogen deuterium exchange (HDX) data using the Maximum Entropy Method (MEM) Compare protection factors among two different groups Map protection factors on the protein surface

Hydrogen Deuterium Exchange Use changes in ph to force a protein to exchange hydrogen for deuterium Use nuclear magnetic resonance (NMR) spectrometry or mass spectrometry to detect the H/D exchange rates H/D exchange rates among different protein fragments reveal information about tertiary structure, folding, etc. Source: www.dac.neu.edu/barnett/mem/engen.htm

Maximum Entropy topymethods Maximize function Q = S + λc using LaGrange multipliers S represents the Skilling entropy of HDX process k S 2 f k ln f k A k k 1 1 C represents the constraints on HDX imposed by the structure of the protein s tertiary structure 2 D calc exp i D 2 k 2 i i k kk 1 D i calc N f k exp kt 2 i

HDX NAME Workflow Workflow inputs: Protein structures (.pdb file): GP120 or CD4 Hydrogen exchange data (.txt file): fragment IDs and exchange rates Additional data (.txt file): Temperature, ph, time series, replicates numbers, protein state (liganded or unliganded) Processing steps: Compute number of deuterium exchanged per amide from the exchange rates, using differential equations for any liganded protein complexes Normalize deuterium exchanged data for constant temperate and ph Estimate average exchange rates using MEM (Laplace software) Compute protection factors by normalization of average rates with intrinsic rates Compute free energy from protection factors Compare fragments from liganded and unliganded states with Student s T tests Map results to protein surface to explore conformational changes

Development of Webtool Backend (Server) Dt Data import, processing and tests t computed tdin R HDXNAME : package library created for webtool Bio3d : Extract sequences and structural properties from PDB files Odesolve : Solving reaction kinetics for differential equations of liganded proteins Rsolnp : Non linear optimization tools for MEM computations Perl used to visualize protein structures and run R from web Bio::Perl : processfragment features from FASTA or PDB files Bio::Structure : parse 3D coordinates of the protein structure Bio::Graphics : generate 2D result images from the 3D structure Frontend (Client/Browser) jquery : Javascript library for AJAX implementation Jmol : Browser plug in to visualize results on protein structure

HDX NAME Webtool Input Options Structure data (FASTA or PDB) HDX data (.txt from instrument) Configuration file (.txt) stores user analysis and workflow settings Uploaded Files List ofalluploadedfiles files Buttons to run analyses Results Displays jmol structure image Displays protein sequence Links to statistical result tables

Configuration File Configuration file stores constants and parameters for all analyses Users can modify default configuration file to customize analyses y g y and store custom settings for future use

Results tables are accessible using web links in table jmol plug in provides interactive 3D image of protein structure Imagecan berotated by point and click Links allow users to zoom, change colors or animate figure Fragment lengths, sec structure and errors mapped on protein sequence

Dose Response Analysis Pipeline (DRAP) Need to fit logistic dose response curves to data from dozens or hundreds of 96 well plates Plates can be organized in countless ways One factor per plate or multiple factors Dilutions on columns or rows Want to view the curve fits and export summarystatisticsstatistics Want to compare EC50s with statistical tests Want to export EC90s for use in QTL analyses

Logistic Dose Response Curves Captures the S shape of many types of biological assays Drug dose response experiments ELISA experiments Unknown model parameters are estimated using iterative Levenberg Marquart methods Top and Bottom parameters estimate maximum and minimum response LogEC50 parameter estimates the location of the curve on X axis Hillslope parameter estimates rate of increase or decrease per unit X Slopes or EC50 estimates can be used to compare effectiveness of different vaccines, drugs, etc. Image created using GraphPad Prism v. 5.03

DRAP Workflow Data from 96 well plates (.dat files) processed in MS Excel Remove headers andfooters footers, record positive andnegativenegative controls Identify data from multiple groups, noting that some groups may occur within a single plate while other groups occur between plates Logistic curve fits computed in commercial Prism software Data from each plate must be imported into Prism separately Data need to be reorganized in Prism to create appropriate graphs and statistical tests, which may require data from multiple plates Summary statistics from Prism pasted into MS Excel or PowerPoint to summarize, reorganize and present results from multiple l tests t in a single report

Development of Webtool Backend (Server) All data processing and curve fitting performed in R drc : Core library to process dose reponse analysis R2HTML : Generate HTML output Perl CGI used to run R from the web CGI::Application library for handling CGI requests Methods to handle Workflow functionalities Frontend (Client) Google Web Toolkit Interactively ti build plate through h web interface Create all the widgets and controls in the web interface Process JSON data from server and updates the widgets User interface for CRUD operation on plate data.

Select zip file with input data Buttons to start or stop analysis Symbols display status of the workflow steps Log info links provide R info and diagnostics User manual and sample data Browse and edit input data files Rainbow icon for dosage designer Long lists of assay response files are loaded interactively like Google Maps Files can be edited in browser, then saved to computer Info panel shows diagnostics and provides link to final results

Editing Files Users can click on notepad icons to edit and save dosage or response data in an interactive text file environment Dosage data can also be edited in the interactive Dosage Designer

Interactive Results Report Interactive report includes tables to organize results according to groups within plates (e.g. drugs) or between plates (e.g. condition) Click on logistic curve graphs for higher resolution images

Acknowledgements Vivek Gopalan: R programming and web development JasonBarnett: Interface design ondrap tool Tram Huyen and Mike Tartakovsky: Funding and oversight Leo Kong and Peter Kwong: HDX NAME experiments Juliana Sa, Olivia Twu, Hongying Jiang, Thomas Wellems and Xin zhuan Su: DRAP experiments Websites: http://exon.niaid.nih.gov http://exon.niaid.nih.gov/drap/ http://exon.niaid.nih.gov/hdx_name/

Literature e Cited Cted Kong et al. 2010. Hydrogen deuterium exchange mass spectrometry of HIV 1 gp120 in unliganded and CD4 bound states. J. Virology. in press. Sa et al. 2009. Geographical patterns of P. falciparum drug resistance distinguished by differential responses to amodiaquine and chloroquine. PNAS. 106(45): 18883 18889 Yuan et al. 2009. Genetic mapping of targets mediating differential chemical phenotypesin P. falciparum. Nature Chemical Biology. 5:765 771