GWASrap User Manual v1.1




Table of contents

Introduction
System Requirements
Welcome
Features
Create New Run
GWAS Representation
GWAS Annotation
GWAS Prioritization
Download Result
Web Services
Quick Annotating
GWAS Gallery
Retrieve Jobs

Introduction

Genome-wide association studies (GWAS), which came on the scene in March 2005, opened a new realm for investigating the association between a huge number of genetic loci and different traits and diseases. To date, more than 1200 published genome-wide associations with P-value < 5E-8, covering over 250 traits, have been reported in the community. With the advent of next-generation sequencing (NGS), exome and whole-genome sequencing accelerate the discovery of the genes underlying Mendelian diseases and increase the power to detect rare variants that may explain the missing heritability of common diseases and specific traits.

Exploring the trait/disease-associated SNPs (TASs) that show relatively high signals in GWAS or whole-genome sequencing association studies requires further downstream statistical inference and bioinformatics prediction. As indispensable steps, variant visualization, functional annotation and selection of risk-associated loci greatly facilitate the discovery of true associations between genetic markers and diseases or traits. Because a GWAS inevitably produces millions of variants with statistics, efficient variant representation directly helps researchers distinguish significant TASs from noise. A growing list of requirements, such as clarity, diversity and interactivity, makes data visualization difficult. In addition, the functional impact of these variants needs more meticulous analysis based on genome mapping annotation and biological effect prediction, especially for markers that have a moderate effect in GWAS and fall in special regions (such as non-coding regulatory regions or evolutionarily conserved regions). Comprehensive variant annotation accelerates this process. Importantly, correctly selecting the true associations from many GWAS signals, particularly the hidden moderate TASs, requires an annotation-based prioritization process.

Representing, annotating and prioritizing such data in a smooth way is therefore a daunting challenge. GWASrap (http://jjwanglab.org/gwasrap) is a comprehensive web-based bioinformatics tool that systematically supports variant representation, annotation and prioritization after GWAS.

System Requirements

GWASrap is best accessed using the Google Chrome web browser. It has been tested to work with Mozilla Firefox, Safari and Internet Explorer 9. Not all functions are available with Internet Explorer 8, because IE 8 lacks HTML5 support. Versions of IE older than 8 are not supported. Since GWASrap uses many JavaScript features and libraries and displays large batches of data in one web page, it places some requirements on the hardware configuration. Recommended configuration: a dual-core CPU and 2 GB of memory.

Welcome

This document introduces the usage and functions of GWASrap. To access the public site, please visit http://jjwanglab.org/gwasrap. Please check the site for the most up-to-date version of the user manual.

Features

1. Circos-style GWAS result visualization with interactive operation;
2. Dynamic Manhattan panel;
3. Dynamic linkage disequilibrium panel;
4. Comprehensive variant annotation with genomic mapping attributes and effect prediction;
5. GWAS statistical summary;
6. Variant-based prioritization;
7. Interactive prioritization tree viewer;
8. Support for multiple association formats;
9. Quick annotating system;
10. GWAS gallery.

Create New Run

To perform a new run for your GWAS association result, please follow these steps:

1. Enter the name of the investigated study.
2. Specify your e-mail address to retrieve your job. A notification will be sent to the mailbox you provide.
3. Select an input format for the GWAS result. GWASrap currently supports three formats: Plink-like format, genomic coordinates and single SNP Id. Before supplying an association file, please note that the system is based on the human genome assembly hg19/GRCh37 and dbSNP 132. Input variant coordinates, if present, should be consistent with hg19. There is no restriction on the version of the SNP identifiers; SNPs are converted to dbSNP 132 automatically.
4. Choose between entering input text and uploading an input file.
5. Select the P-value cutoff and population. The P-value cutoff is the maximal P-value: variants with a P-value larger than the cutoff are discarded. The population is the investigated HapMap population (HapMap I+II+III) used for computing the synthetic association.

6. Set the Circos-style plotting options for the annotation plot and the HTML map. The annotation plot option indicates whether the surrounding features or glyphs are plotted. The image map option sets the percentage of variants with less significant P-values that are omitted from the HTML map.
7. Set the prioritization options for specific genes or regions. Specify an extra gene list or region list and a pre-defined score for prioritization.

After preparing the parameters, please make sure all required information is filled in. Then click the "submit" button to submit the job to the web server.
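The effect of the P-value cutoff in step 5 can be sketched as follows. This is a minimal illustration, not GWASrap's own code: it assumes a whitespace-delimited table with a header row containing a column named P (as in PLINK association output), and the function name and sample SNP identifiers are made up for the example.

```python
def filter_by_pvalue(text, cutoff):
    """Return the rows whose P column is a number <= cutoff.

    Assumes (hypothetically) a whitespace-delimited table with a header
    row that contains a column named "P"; the exact layout GWASrap
    expects is not specified in this manual.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    p_idx = header.index("P")
    kept = []
    for ln in lines[1:]:
        fields = ln.split()
        try:
            p = float(fields[p_idx])
        except (ValueError, IndexError):
            continue  # skip rows with a missing or non-numeric P (e.g. "NA")
        if p <= cutoff:
            kept.append(dict(zip(header, fields)))
    return kept


# Made-up sample data; the rs numbers are placeholders, not real SNPs.
SAMPLE = """CHR SNP BP P
1 rs0000001 1000 0.5
1 rs0000002 2000 3e-9
2 rs0000003 3000 NA
2 rs0000004 4000 1e-4
"""

hits = filter_by_pvalue(SAMPLE, 1e-3)
print([row["SNP"] for row in hits])
```

Pre-filtering a large association file this way before upload keeps the submitted text small; the server applies the cutoff chosen in the form regardless.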

GWAS Representation

1. Circos-style GWAS visualization.

When you enter your workspace by clicking a finished job, the system displays a Circos-style GWAS graph with several interactive attributes.

1.1 Circos-style plotting gives a broad view of the variants at either the genome or the chromosome level. It combines several kinds of genomic features (such as SNP/CNV density and disease susceptibility loci) with diverse glyphs to support the researcher's intuitive validation of the GWAS result.

1.2 View the GWAS result at the single-chromosome level by clicking the glyph of a chromosome's cytoband. You can return to the genome view by clicking the "Back to All Chromosomes" button.

1.3 Check the summary information by hovering over the target variant.

1.4 The surrounding features and glyphs.

2. Dynamic Manhattan panel.

2.1 Switch to the Manhattan panel by clicking a SNP in the Circos-style plot, or by selecting "GWAS ANNO" in the hover bar at the top left.

2.2 View the GWAS SNPs on the panel with zooming and searching. Switch chromosomes using the select box on the left. Move the viewpoint by clicking the left or right arrow; the panel can contain as many as 500 SNPs. Search for a SNP in the current workspace by entering a dbSNP Id (rs111) or genomic coordinates (1:2343254) in the input box on the right. Zoom into a region of interest by dragging the mouse across the panel. Click a variant on the panel to interact with the annotation tabs.

2.3 HapMap LD panel. It reports and displays all SNPs with r² > 0.5 for the target SNP. Check the detailed LD information by hovering over a SNP.

3. GWAS overview and statistical information.

3.1 The distribution of SNP types.

3.2 Region counts.

3.3 Classical Manhattan plot.

3.4 Q-Q plot.
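For readers who want to reproduce the Q-Q plot in 3.4 offline, the points can be computed as in the sketch below. It uses the common convention of comparing observed -log10(P) values with the quantiles expected under a uniform null, (i - 0.5) / n; the manual does not state which convention GWASrap itself uses, so treat this as an assumption. The function name is hypothetical.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(P) pairs, both in ascending order.

    Expected quantiles use (i - 0.5) / n for i = 1..n, a common choice
    that is assumed here rather than taken from GWASrap.
    """
    observed = sorted(-math.log10(p) for p in pvalues if 0 < p <= 1)
    n = len(observed)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


pts = qq_points([0.5, 0.05, 0.005])
for exp, obs in pts:
    print(f"expected={exp:.3f} observed={obs:.3f}")
```

Points that rise well above the diagonal at the upper end indicate P-values smaller than expected by chance. The observed values alone (-log10(P) against genomic position) are what a classical Manhattan plot displays.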

GWAS Annotation

GWASrap offers a comprehensive knowledgebase that reports many important annotations for each variant.

1. Ways to check the annotation information of a variant.

1.1 Click a significant variant in the Circos-style plot.
1.2 Click a top variant in the ranking tab.
1.3 Click a SNP of interest in the Manhattan plot panel.
1.4 Search for a SNP of the workspace in the search box.
1.5 Click a SNP of interest in the LD panel.
1.6 Click a SNP of interest in the quick annotating system.

2. SNP summary.

2.1 General information: basic information about the target SNP, such as allele frequency and SNP class.
2.2 1000 Genome SNP: the 1000 Genomes information for this SNP (if available).
2.3 Reference: the reference or publication, if this SNP has been reported with a significant effect in a current GWAS.
2.4 LD plot: HapMap LD information of this variant for the investigated population.

3. Genomic mapping annotation.

3.1 Reference Gene: gene annotation from NCBI RefSeq.
3.2 Ensembl Gene: gene annotation from Ensembl.
3.3 Known Gene: gene annotation from UCSC.
3.4 Small RNA: snoRNA and miRNA annotations from UCSC.
3.5 MicroRNA Target: miRNA target site predictions generated by TargetScan.
3.6 Transcriptional Factor Binding Site: transcription factor binding sites conserved in the human/mouse/rat alignment, based on the TRANSFAC Matrix Database.
3.7 Enhancer: human enhancers verified by experiment.
3.8 Insulator: CTCF binding sites for the characterization of human genomic insulators.
3.9 Long Non-coding RNA: human long non-coding RNAs from re-annotated microarray studies.

4. Effect prediction annotation.

4.1 Transcriptional Factor Binding Site Affinity: variant effect on TFBS prediction, based on fold energy change with PWM scanning.
4.2 MicroRNA Target Site Affinity: variant effect on miRNA target prediction, based on fold and hybrid energy change.
4.3 Non-synonymous SNP functional prediction: deleteriousness prediction for non-synonymous GVs.
4.4 Protein Phosphorylation Affinity: variant potential to change protein phosphorylation status.
4.5 Splicing Site Affinity: variant effect on splicing site prediction, based on junction strength change, amino acid change, exon skipping, and 5'- or 3'-exon extension.
4.6 HapMap eQTL: consensus eQTL mapping for HapMap results.
4.7 Three Way SNP Expression Association: gene co-expression relationships with variant effect.

5. Variant evolution annotation.

5.1 SNP Positive Selection: estimates of FST and heterozygosity of the variant for positive selection.
5.2 Gene Positive Selection: estimates of FST and heterozygosity of the gene for positive selection.
5.3 Conserved Functional RNA: conserved functional RNAs, from RNA secondary structure predictions made with the EvoFold program.
5.4 Conserved Elements and Regions: conserved elements produced by the PhastCons program, based on a whole-genome alignment of vertebrates.

6. Disease association annotation.

6.1 OMIM: Online Mendelian Inheritance in Man entries for this variant.
6.2 DGV: curated catalogue of structural variation in the human genome.
6.3 GAD: archive of human genetic association studies of complex diseases and disorders.

GWAS Prioritization

GWASrap adopts an independent variant prioritization method based on the additive effect principle, combining the original statistical value with a variant prioritization score.

1. View the prioritization result for the GWAS. GWASrap prioritizes the significant SNPs and provides the top 100 significant results with related information.

2. Select the variants with high prioritization significance. A variant with an improved rank has stronger deleterious attributes. PR and FR refer to the previous rank and the final rank respectively.

3. Check the related attributes in the prioritization tree. Variant prioritization information can be checked by clicking a node on the tree. A square node reports the prioritization score and the deleteriousness attributes of the variant.

4. Prioritize a variant within its LD proxy. Prioritization can also be performed with synthetic associations in the LD proxy. This step takes some time, depending on the number of variants in LD; a ranking list is then shown with related information.
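The comparison of PR and FR in step 2 can be sketched as below. The sketch only computes the rank shift PR - FR from ranks already reported by GWASrap; it does not reproduce the prioritization scoring itself, which this manual does not specify. The function name and the SNP labels are hypothetical.

```python
def rank_changes(rows):
    """Sort variants by rank improvement.

    rows: iterable of (snp_id, previous_rank, final_rank) tuples.
    Returns (snp_id, previous_rank, final_rank, shift) tuples where
    shift = previous_rank - final_rank, so a positive shift means the
    variant moved up after prioritization.
    """
    changes = [(snp, pr, fr, pr - fr) for snp, pr, fr in rows]
    changes.sort(key=lambda r: r[3], reverse=True)
    return changes


# Placeholder identifiers and ranks, for illustration only.
ranked = rank_changes([("rsA", 5, 2), ("rsB", 1, 3), ("rsC", 4, 4)])
for snp, pr, fr, shift in ranked:
    print(snp, pr, fr, shift)
```

Variants at the top of this list gained the most from the annotation-based score and are natural candidates to inspect in the prioritization tree.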


Download Result

GWASrap provides a download tab for fetching related information. You can download the prioritization result, the Circos-style graphs and the statistical information from this tab.

1. GWASrap outcome for significant variants. The GWASrap outcome contains the top 100 significant variants after prioritization, with the following fields:

dbSNP Id
Chr
Pos
Original Pvalue
Plotting Scale
SNP Type
Genomic Mapping Score
Effect Prediction Score
User Defined Score
Average Prioritization Score
Prioritization Score
Final Weighting
-logarithm(Final Weighting)
Original Rank
Final Rank

The circle plot for all chromosomes contains the Circos-style graphs. The statistics information contains the classic Manhattan plot and the Q-Q plot.

Web Services

GWASrap provides a range of web services, through a SOAP interface, for retrieving the annotation information and effect prediction of each variant in dbSNP. The WSDL for each service is available in the API tab. Each service returns a JSON string containing all related information as key/value pairs. Please refer to http://jjwanglab.org/gwasrap/gwasrank/gwasrank/webservice
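Since each service returns a JSON string of key/value pairs, a client can decode the reply with any standard JSON library. The sketch below shows only that decoding step; the sample reply and its keys are invented for illustration and are not taken from the GWASrap WSDL, and the SOAP call itself is omitted because the operation names are documented only in the API tab.

```python
import json


def parse_service_reply(reply):
    """Decode a JSON string reply into a dict of key/value pairs."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of key/value pairs")
    return data


# Hypothetical reply; the real key names come from the service itself.
sample_reply = '{"snp": "rs0000001", "chr": "1", "pos": 1000}'
record = parse_service_reply(sample_reply)
for key, value in record.items():
    print(key, value)
```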


Quick Annotating

Quick RAP accepts either a dbSNP Id or a chromosomal location as a query and instantly returns the annotation information together with an interactive LD panel. At the same time, the system prioritizes the variant based on the corresponding annotation information and evaluates the variant effect in a prioritization tree. Quick RAP can also handle sequencing data by accepting genomic coordinates, offering as much annotation as is available. Please refer to http://jjwanglab.org/gwasrap/gwasrank/gwasrank/quickrap
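The two query forms mentioned in this manual, a dbSNP Id such as rs111 and a chromosomal location such as 1:2343254, can be told apart before submission with a small check like the one below. This is a client-side sketch under assumptions: the accepted chromosome names and the optional "chr" prefix are guesses, not documented GWASrap behaviour, and the function name is hypothetical.

```python
import re

# dbSNP identifier: "rs" followed by digits.
RS_ID = re.compile(r"^rs(\d+)$", re.IGNORECASE)
# chromosome:position; the optional "chr" prefix and the X/Y/M/MT names
# are assumptions made for this sketch.
COORD = re.compile(r"^(?:chr)?([0-9]{1,2}|X|Y|MT?)\s*:\s*(\d+)$", re.IGNORECASE)


def parse_query(query):
    """Classify a query as ("rsid", id) or ("coord", chromosome, position)."""
    query = query.strip()
    match = RS_ID.match(query)
    if match:
        return ("rsid", "rs" + match.group(1))
    match = COORD.match(query)
    if match:
        return ("coord", match.group(1).upper(), int(match.group(2)))
    raise ValueError(f"unrecognized query: {query!r}")


print(parse_query("rs111"))
print(parse_query("1:2343254"))
```

Coordinates passed this way should be on hg19, as required for the other GWASrap inputs.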

GWAS Gallery

The system also provides a local repository that stores the significant results of prominent GWAS cases. Most of the data are taken from our published database GWASdb and rebuilt within the current framework. By querying this repository, you can directly investigate and use the latest results of the GWAS community without tedious manual collection. For each specific case, similar studies are combined to offer a single web portal for GWAS representation, annotation and prioritization. Please refer to http://jjwanglab.org/gwasrap/gwasrank/gwasrank/gallery

Retrieve Jobs

There are three ways to retrieve a submitted job in GWASrap.

1. By e-mail. Please enter the correct e-mail address for the notification on the input page.
2. From a fixed link. GWASrap provides an encrypted link for retrieving your job.
3. From workspace cookies in the client browser. GWASrap stores cookies in the web browser you used, which helps you manage all of your submitted jobs.