GWASrap User Manual v1.1
Table of contents

Introduction
System Requirements
Welcome
Features
Create New Run
GWAS Representation
GWAS Annotation
GWAS Prioritization
Download Result
Web Services
Quick Annotating
GWAS Gallery
Retrieve Jobs
Introduction

Genome-wide association studies (GWAS), which came on the scene in March 2005, opened a new realm for investigating the association between large numbers of genetic loci and different traits/diseases. To date, more than 1200 published genome-wide associations with P-value < 5E-8 on over 250 traits have been reported in the community. With the advent of next-generation sequencing (NGS), exome and whole-genome sequencing accelerate the discovery of the genes underlying Mendelian diseases and enhance the power to detect rare variants that may explain the missing heritability of common diseases and specific traits.

Exploring the trait/disease-associated SNPs (TASs) that show relatively high signals in GWAS or whole-genome sequencing association studies requires further downstream statistical inference and bioinformatics prediction. As indispensable steps, variant visualization, functional annotation and selection of risk-associated loci greatly facilitate the discovery of true associations between genetic markers and diseases/traits. Since a GWAS inevitably produces millions of variants with statistics, efficient variant representation directly helps researchers distinguish significant TASs from noise. A growing list of requirements, such as clarity, diversity and interactivity, makes data visualization challenging. On the other hand, the functional impact of these variants needs more meticulous analysis based on genome mapping annotation and biological effect prediction, especially for markers that have a moderate effect in GWAS and fall in special regions (such as non-coding regulatory regions and evolutionarily conserved regions). Comprehensive variant annotation accelerates this process. Importantly, correctly selecting the true associations from the many GWAS signals, particularly the hidden moderate TASs, requires an annotation-based prioritization process.
Representing, annotating and prioritizing such data in a smooth way is therefore a daunting challenge. GWASrap (http://jjwanglab.org/gwasrap) is a comprehensive web-based bioinformatics tool that systematically supports variant representation, annotation and prioritization after GWAS.

System Requirements

GWASrap is best accessed using the Google Chrome web browser. It has also been tested with Mozilla Firefox, Safari and Internet Explorer 9. Not all functions are available in Internet Explorer 8 because of its lack of HTML5 support. Versions of Internet Explorer older than 8 are not supported. Because GWASrap uses many JavaScript features and libraries and displays large datasets in a single web page, it places some demands on the hardware. Recommended configuration: a dual-core CPU and 2 GB of memory.
Welcome

This document introduces the usage and functions of GWASrap. To access the public site, please visit http://jjwanglab.org/gwasrap. Please check the site for the most up-to-date version of the user manual.

Features

1. Circos-style GWAS result visualization with interactive operation;
2. Dynamic Manhattan panel;
3. Dynamic linkage disequilibrium panel;
4. Comprehensive variant annotation with genomic mapping attributes and effect prediction;
5. GWAS statistical summary;
6. Variant-based prioritization;
7. Interactive prioritization tree viewer;
8. Support for multiple association formats;
9. Quick annotating system;
10. GWAS gallery.
Create New Run

To perform a new run for your GWAS association result, follow these steps:

1. Enter the name of the investigated study.
2. Specify your e-mail address. A notification will be sent to this mailbox so that you can retrieve your job.
3. Select an input format for the GWAS result. GWASrap currently supports three formats: Plink-like format, genomic coordinates and single SNP Id. Before submitting an association file, please note that the system is based on the latest human genome assembly (hg19/GRCh37) and dbSNP 132. Input variant coordinates, if provided, should be consistent with hg19. SNP identifiers are not restricted to a particular dbSNP version; they are converted to dbSNP 132 automatically.
4. Enter the input as text or upload an input file.
5. Select the P-value cutoff and the population. The P-value cutoff is the maximal P-value; variants with a P-value larger than the cutoff are discarded. The investigated population (HapMap I+II+III) is used for computing the synthetic association.
6. Set the Circos-style plotting options for the annotation plot and the HTML map. The annotation plot option indicates whether to plot the surrounding features or glyphs. The image map option indicates what percentage of variants with less significant P-values are omitted when plotting the HTML map.
7. Set the prioritization options for specific genes or regions. Specify an extra gene list or region list and a pre-defined score for prioritization.

After preparing the parameters, please make sure all required information is filled in. Then click the "submit" button to submit the job to the web server.
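The effect of the P-value cutoff in step 5 can be illustrated with a minimal Python sketch. It is not GWASrap's own code: the whitespace-separated layout, the column names (CHR, SNP, BP, P) and the identifiers rs222 and rs333 are assumptions made for illustration only, and the exact Plink-like columns accepted by the server should be checked on the input page.

```python
def filter_by_pvalue(text, cutoff):
    """Keep rows whose P column is <= cutoff.

    Assumes a whitespace-separated table whose header contains a column
    named "P" (an assumption, not the documented GWASrap format).
    Rows with a missing or non-numeric P-value are dropped.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0]
    p_idx = header.index("P")
    kept = []
    for fields in lines[1:]:
        try:
            p = float(fields[p_idx])
        except (ValueError, IndexError):
            continue  # e.g. "NA" or a truncated row
        if p <= cutoff:
            kept.append(dict(zip(header, fields)))
    return kept


# Made-up example rows; only rs111 passes a cutoff of 1e-5.
example = """CHR SNP BP P
1 rs111 2343254 3.2e-9
1 rs222 2343999 0.04
2 rs333 1000 NA
"""
kept = filter_by_pvalue(example, 1e-5)
print([row["SNP"] for row in kept])  # ['rs111']
```

Pre-filtering a large association file in this way before uploading may also reduce upload time, since variants above the cutoff are discarded in any case.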
GWAS Representation

1. Circos-style GWAS visualization. When you enter your workspace by clicking a finished job, the system displays a Circos-style GWAS graph with interactive attributes.
1.1 Circos-style plotting visualizes variants across a broad horizontal area at either the genome or the chromosome level. It combines various genomic features (such as SNP/CNV density and disease susceptibility loci) with diversified glyphs to support the researcher's intuitive validation of GWAS results.
1.2 View the GWAS result at the single-chromosome level by clicking the glyph of a chromosome's cytoband.
You can return to the genome view by clicking the "Back to All Chromosomes" button.
1.3 Check the summary information by hovering over the target variant.
1.4 The surrounding features and glyphs.
2. Dynamic Manhattan panel.
2.1 Switch to the Manhattan panel by clicking a SNP in the Circos-style plot or by using the top-left hover bar to go to "GWAS ANNO".
2.2 View the GWAS SNPs on the panel with zooming and searching. Switch the chromosome using the select box on the left. Move the viewpoint by clicking the left or right arrow; the panel can contain as many as 500 SNPs. Search for a SNP in the current workspace by entering a dbSNP Id (rs111) or genomic coordinates (1:2343254) in the input box on the right. Zoom into a region of interest by dragging the mouse across the panel. Click a variant on the panel to interact with the annotation tabs.
2.3 HapMap LD panel. Reports and displays all SNPs with r-square > 0.5 relative to the target SNP. Check detailed LD information by hovering over a SNP.
3. GWAS overview and statistical information.
3.1 The distribution of SNP types.
3.2 Region counts.
3.3 Classical Manhattan plot.
3.4 Q-Q plot.
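For readers who want to reproduce the Q-Q plot from item 3.4 on their own data, the following Python sketch computes the plotted points. It is an illustration under stated assumptions rather than GWASrap's implementation: the expected quantile (i - 0.5) / n is one common plotting convention, and the manual does not say which convention the server uses.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(P) pairs, most significant first.

    Under the null hypothesis P-values are uniform on (0, 1], so the i-th
    smallest of n P-values is expected near (i - 0.5) / n (assumed convention).
    Values outside (0, 1] are ignored.
    """
    observed = sorted(p for p in pvalues if 0 < p <= 1)
    n = len(observed)
    points = []
    for i, p in enumerate(observed, start=1):
        expected = (i - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Each pair is one dot on the Q-Q plot: x = expected, y = observed.
for x, y in qq_points([0.5, 0.05, 0.005]):
    print(round(x, 3), round(y, 3))
```

Points that rise above the diagonal y = x at the right-hand end of the plot correspond to variants that are more significant than expected by chance.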
GWAS Annotation

GWASrap offers a comprehensive knowledgebase that reports many important annotations for each variant.

1. Ways to check the annotation information of a variant.
1.1 Click a significant variant in the Circos-style plot.
1.2 Click a top variant in the ranking tab.
1.3 Click a SNP of interest in the Manhattan plot panel.
1.4 Search for a SNP of the workspace in the search box.
1.5 Click a SNP of interest in the LD panel.
1.6 Click a SNP of interest in the quick annotating system.
2. SNP summary.
2.1 General information: basic information for the target SNP, such as allele frequency and SNP class.
2.2 1000 Genome SNP: 1000 Genomes information for this SNP (if available).
2.3 Reference: the reference or publication, if this SNP has been reported as having a significant effect in current GWAS.
2.4 LD plot: HapMap LD information of this variant for the investigated population.
3. Genomic mapping annotation.
3.1 Reference Gene: gene annotation from NCBI RefSeq.
3.2 Ensemble Gene: gene annotation from Ensembl.
3.3 Known Gene: gene annotation from UCSC.
3.4 Small RNA: snoRNA and miRNA annotations from UCSC.
3.5 MicroRNA Target: miRNA target site predictions generated by TargetScan.
3.6 Transcriptional Factor Binding Site: transcription factor binding sites conserved in the human/mouse/rat alignment, based on the TRANSFAC Matrix Database.
3.7 Enhancer: human enhancers verified by experiment.
3.8 Insulator: CTCF binding sites for the characterization of human genomic insulators.
3.9 Long Non-coding RNA: human long non-coding RNAs from re-annotated microarray studies.
4. Effect prediction annotation.
4.1 Transcriptional Factor Binding Site Affinity: predicted change in TFBS affinity caused by the variant, based on the fold energy change from PWM scanning.
4.2 MicroRNA Target Site Affinity: predicted change in miRNA target affinity caused by the variant, based on fold and hybrid energy change.
4.3 Non-synonymous SNP functional prediction: non-synonymous GV deleteriousness prediction.
4.4 Protein Phosphorylation Affinity: predicted effect of the variant on protein phosphorylation status.
4.5 Splicing Site Affinity: predicted effect of the variant on splicing sites, based on junction strength change, amino acid change, exon skipping, and 5'- or 3'-exon extension.
4.6 HapMap eQTL: consensus eQTL mapping for HapMap results.
4.7 Three Way SNP Expression Association: gene co-expression relationships with variant effect.
5. Variant evolution annotation.
5.1 SNP Positive Selection: estimates of FST and heterozygosity of the variant, used to assess positive selection.
5.2 Gene Positive Selection: estimates of FST and heterozygosity of the gene, used to assess positive selection.
5.3 Conserved Functional RNA: conserved functional RNAs identified through RNA secondary structure predictions made with the EvoFold program.
5.4 Conserved Elements and Regions: conserved elements produced by the PhastCons program based on a whole-genome alignment of vertebrates.
6. Disease association annotation.
6.1 OMIM: Online Mendelian Inheritance in Man entries for this variant.
6.2 DGV: curated catalogue of structural variation in the human genome.
6.3 GAD: archive of human genetic association studies of complex diseases and disorders.
GWAS Prioritization

GWASrap adopts an independent variant prioritization method based on the additive effect principle, combining the original statistical value with a variant prioritization score.

1. View the prioritization result for GWAS. GWASrap prioritizes the significant SNPs and provides the top 100 significant results with related information.
2. Select variants with high prioritization significance. A variant with an improved rank has stronger deleterious attributes. PR and FR refer to the previous rank and final rank, respectively.
3. Check the related attributes in the prioritization tree. Variant prioritization information can be checked by clicking a node on the tree. A square node reports the prioritization score and deleteriousness attribute of the variant.
4. Prioritize variants in LD proxy. Prioritization can also be performed with synthetic associations in LD proxy. This step takes some time, depending on the number of variants in LD; a ranking list is then shown with related information.
Download Result

GWASrap provides a download tab for fetching related information. You can download the prioritization result, the Circos-style graphs and the statistical information in this tab.

1. GWASrap outcome for significant variants: contains the top 100 significant variants after prioritization, with the following fields: dbSNP Id / Chr / Pos / Original Pvalue / Plotting Scale / SNP Type / Genomic Mapping Score / Effect Prediction Score / User Defined Score / Average Prioritization Score / Prioritization Score / Final Weighting / -logarithm(Final Weighting) / Original Rank / Final Rank.
2. Circle plot for all chromosomes: contains the Circos-style graphs.
3. Statistics information: contains the classic Manhattan plot and Q-Q plot.
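Once the outcome file has been downloaded, variants whose rank improved after prioritization can be listed with a short script. The Python sketch below assumes that the file is tab-delimited, has a header row, uses "Original Rank" and "Final Rank" as column names, and puts the SNP identifier in the first column; none of these layout details is confirmed by this manual, so adjust them to the actual file. The rows in the example are made up.

```python
import csv
import io


def improved_variants(tsv_text):
    """List (id, original_rank, final_rank) for variants whose rank improved.

    A smaller rank number means a higher position, so "improved" means
    Final Rank < Original Rank. Results are sorted by the size of the gain.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]  # assumed: first column holds the SNP Id
    improved = []
    for row in reader:
        original = int(row["Original Rank"])
        final = int(row["Final Rank"])
        if final < original:
            improved.append((row[id_column], original, final))
    return sorted(improved, key=lambda item: item[1] - item[2], reverse=True)


# Made-up rows using only three of the downloaded columns.
sample = (
    "dbSNP Id\tOriginal Rank\tFinal Rank\n"
    "rs111\t5\t2\n"
    "rs222\t1\t1\n"
    "rs333\t9\t3\n"
)
print(improved_variants(sample))  # [('rs333', 9, 3), ('rs111', 5, 2)]
```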
Web Services

GWASrap provides a range of web services, exposed through a SOAP interface, for retrieving the annotation information and effect prediction of each variant in dbSNP. The WSDL for each service is available in the API tab. Each service returns a JSON string containing all related information as key/value pairs. Please refer to http://jjwanglab.org/gwasrap/gwasrank/gwasrank/webservice
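Because each service returns a JSON string of key/value pairs, the response can be decoded with any standard JSON library. The Python sketch below shows only this decoding step. The payload and its keys ("snp", "chr", "pos", "enhancer") are hypothetical placeholders, not the documented response schema; the real operation names and fields are defined by the WSDL in the API tab.

```python
import json


def annotation_fields(response_text):
    """Decode a JSON response and keep only the keys that carry a value."""
    record = json.loads(response_text)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object of key/value pairs")
    empty = (None, "", [], {})
    return {key: value for key, value in record.items() if value not in empty}


# Hypothetical payload, for illustration only.
payload = '{"snp": "rs111", "chr": "1", "pos": 2343254, "enhancer": []}'
print(annotation_fields(payload))
```

Dropping empty values is a design choice of this sketch, made so that only annotation categories with actual hits remain; remove the filter to keep every key.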
Quick Annotating

Quick RAP accepts either a dbSNP Id or a chromosomal location as a query and instantly returns the annotation information together with an interactive LD panel. At the same time, the system prioritizes the variant based on the corresponding annotation information and evaluates the variant effect in a prioritization tree. Quick RAP can also handle sequencing data by accepting genomic coordinates, for which it offers the maximal available annotation. Please refer to http://jjwanglab.org/gwasrap/gwasrank/gwasrank/quickrap
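The two query forms can be told apart with a simple pattern check before a batch of queries is submitted. The Python sketch below is an assumption-laden illustration: it follows the examples given for the search box (rs111 and 1:2343254), while the optional "chr" prefix and the accepted chromosome names (1 to 22, X, Y, M/MT) are guesses, not documented Quick RAP behaviour.

```python
import re

# "rs" followed by digits, e.g. rs111
RSID = re.compile(r"^rs(\d+)$", re.IGNORECASE)
# chromosome:position, e.g. 1:2343254 (optional "chr" prefix is an assumption)
COORD = re.compile(
    r"^(?:chr)?([1-9]|1[0-9]|2[0-2]|X|Y|MT?):(\d+)$", re.IGNORECASE
)


def parse_query(query):
    """Classify a query as ("rsid", id) or ("coord", chromosome, position)."""
    text = query.strip()
    match = RSID.match(text)
    if match:
        return ("rsid", "rs" + match.group(1))
    match = COORD.match(text)
    if match:
        return ("coord", match.group(1).upper(), int(match.group(2)))
    raise ValueError(f"not a dbSNP Id or chromosomal location: {query!r}")


print(parse_query("rs111"))      # ('rsid', 'rs111')
print(parse_query("1:2343254"))  # ('coord', '1', 2343254)
```

Coordinates are interpreted against hg19, in line with the assembly stated for input files.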
GWAS Gallery

The system also provides a local repository that stores the significant results for widely studied cases in GWAS. Most of the data are taken from our published database GWASdb and reconstructed using the current framework. By querying this repository, you can directly investigate and make use of the latest results from the GWAS community without tedious manual collection. For each specific case, similar studies are combined to offer a unified web portal for GWAS representation, annotation and prioritization. Please refer to http://jjwanglab.org/gwasrap/gwasrank/gwasrank/gallery
Retrieve Jobs

There are three ways to retrieve your submitted jobs in GWASrap.

1. By e-mail. Please enter the correct e-mail address for the notification on the input page.
2. From a fixed link. GWASrap provides an encrypted link for retrieving your job.
3. From workspace cookies in the client browser. GWASrap stores cookies in the web browser you used, which helps you manage all of your submitted jobs.