Journal of Statistical Software



Similar documents
SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis

SNPbrowser Software v3.5

Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the

Graphics in R. Biostatistics 615/815

Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller

Each function call carries out a single task associated with drawing the graph.

Step-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER

Plotting: Customizing the Graph

Combining Data from Different Genotyping Platforms. Gonçalo Abecasis Center for Statistical Genetics University of Michigan

CREATING EXCEL PIVOT TABLES AND PIVOT CHARTS FOR LIBRARY QUESTIONNAIRE RESULTS

Cluster Analysis using R

Tutorial on gplink. PLINK tutorial, December 2006; Shaun Purcell,

Package snp.plotter. February 20, 2015

Beginner s Matlab Tutorial

STATGRAPHICS Online. Statistical Analysis and Data Visualization System. Revised 6/21/2012. Copyright 2012 by StatPoint Technologies, Inc.

Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.2 Graphical User Interface (GUI) Manual

Visualization of Phylogenetic Trees and Metadata

Microsoft Access 2010 Overview of Basics

KaleidaGraph Quick Start Guide

Hierarchical Clustering Analysis

Package TRADER. February 10, 2016

Scatter Plots with Error Bars

R Graphics Cookbook. Chang O'REILLY. Winston. Tokyo. Beijing Cambridge. Farnham Koln Sebastopol

Tutorial 2: Using Excel in Data Analysis

Create a Poster Using Publisher

Figure 1. An embedded chart on a worksheet.

Fireworks CS4 Tutorial Part 1: Intro

PaRFR : Parallel Random Forest Regression on Hadoop for Multivariate Quantitative Trait Loci Mapping. Version 1.0, Oct 2012

So you say you want something printed...

Petrel TIPS&TRICKS from SCM

GAW 15 Problem 3: Simulated Rheumatoid Arthritis Data Full Model and Simulation Parameters

Polynomial Neural Network Discovery Client User Guide

Spreadsheet - Introduction

Gestation Period as a function of Lifespan

Excel 2007: Basics Learning Guide

A Primer of Genome Science THIRD

Create Charts in Excel

Elfring Fonts, Inc. PCL MICR Fonts

Microsoft Excel 2010 Part 3: Advanced Excel

ACE: Illustrator CC Exam Guide

Designing forms for auto field detection in Adobe Acrobat

TIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:

Petrel TIPS&TRICKS from SCM

Package tagcloud. R topics documented: July 3, 2015

Tutorials. If you have any questions, comments, or suggestions about these lessons, don't hesitate to contact us at

Package MBA. February 19, Index 7. Canopy LIDAR data

Look Out The Window Functions

Optimization of sampling strata with the SamplingStrata package

Genomes and SNPs in Malaria and Sickle Cell Anemia

Periodontology. Digital Art Guidelines JOURNAL OF. Monochrome Combination Halftones (grayscale or color images with text and/or line art)

Basic Excel Handbook

PERFORMING REGRESSION ANALYSIS USING MICROSOFT EXCEL

Microsoft Excel 2010 Pivot Tables

Scientific Graphing in Excel 2010

Formatting Text in Microsoft Word

Online Supplement to Polygenic Influence on Educational Attainment. Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using

Microsoft Access Introduction

What is a piper plot?

Logistic Regression (1/24/13)

Single Nucleotide Polymorphisms (SNPs)

SNP Data Integration and Analysis for Drug- Response Biomarker Discovery

14.3 Studying the Human Genome

SNP Essentials The same SNP story

Visualization with Excel Tools and Microsoft Azure

Formulas, Functions and Charts

The key to successful web design is planning. Creating a wireframe can be part of this process.

DataPA OpenAnalytics End User Training

Drawing a histogram using Excel

Beas Inventory location management. Version

ECDL / ICDL Presentation Syllabus Version 5.0

Introduction to Microsoft Excel 2007/2010

Package survpresmooth

KB COPY CENTRE. RM 2300 JCMB The King s Buildings West Mains Road Edinburgh EH9 3JZ. Telephone:

How to build text and objects in the Titler

Visualization: Combo Chart - Google Chart Tools - Google Code

Adobe Illustrator CS5 Part 1: Introduction to Illustrator

Replacing TaqMan SNP Genotyping Assays that Fail Applied Biosystems Manufacturing Quality Control. Begin

Intro to Excel spreadsheets

Tasks in the Lesson. Mathematical Goals Common Core State Standards. Emphasized Standards for Mathematical Practice. Prior Knowledge Needed

Three daily lessons. Year 5

Step by Step Guide to Importing Genetic Data into JMP Genomics

Introduction to R and Exploratory data analysis

Building a Personal Website (Adapted from the Building a Town Website Student Guide 2003 Macromedia, Inc.)

ASSIsT: An Automatic SNP ScorIng Tool for in and out-breeding species Reference Manual

This activity will show you how to draw graphs of algebraic functions in Excel.

Build Your First Web-based Report Using the SAS 9.2 Business Intelligence Clients

New Perspectives on Creating Web Pages with HTML. Considerations for Text and Graphical Tables. A Graphical Table. Using Fixed-Width Fonts

Molecular typing of VTEC: from PFGE to NGS-based phylogeny

GENOMIC SELECTION: THE FUTURE OF MARKER ASSISTED SELECTION AND ANIMAL BREEDING

Dealing with Data in Excel 2010

GUIDELINES FOR PREPARING POSTERS USING POWERPOINT PRESENTATION SOFTWARE

Classroom Tips and Techniques: The Student Precalculus Package - Commands and Tutors. Content of the Precalculus Subpackage

How To Run Statistical Tests in Excel

Manual for simulation of EB processing. Software ModeRTL

MicroStrategy Analytics Express User Guide

Adobe Dreamweaver CC 14 Tutorial

Transcription:

JSS Journal of Statistical Software October 2006, Volume 16, Code Snippet 3. http://www.jstatsoft.org/ LDheatmap: An R Function for Graphical Display of Pairwise Linkage Disequilibria between Single Nucleotide Polymorphisms Ji-Hyung Shin Simon Fraser University Sigal Blay Simon Fraser University Jinko Graham Simon Fraser University Brad McNeney Simon Fraser University Abstract We describe the R function LDheatmap() which produces a graphical display, as a heat map, of pairwise linkage disequilibrium measurements between single nucleotide polymorphisms within a genomic region. LDheatmap() uses the grid graphics system, an alternative to the traditional R graphics system. The features of the LDheatmap() function and the use of tools from the grid package to modify heat maps are illustrated by examples. Keywords: single nucleotide polymorphisms, linkage disequilibrium, grid graphics. 1. Introduction Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in the human genome. Due to their abundance and ease of genotyping, SNPs have become popular markers for genetic association studies. Although identifying as many SNPs as possible within a candidate gene is important in finding disease susceptibility alleles, typing them all can be expensive, and testing them for association with traits can lead to multiple-comparison issues. Moreover, due to genetic linkage, nearby SNPs within candidate genes are often highly correlated. Hence, it has become common practice to instead genotype only a subset of SNPs within a candidate gene. Understanding the patterns of association or linkage disequilibrium (LD) between SNPs can aid in selecting SNP subsets. However, for a dense map of SNPs, it can be difficult to interpret results from tabular summaries of pairwise LD measurements since the number of measurements increases rapidly with the number of SNPs within a genomic region. As a

2 LDheatmap: Pairwise Linkage Disequilibria Heat Maps in R tool for interpretation of LD patterns, we developed an R (R Development Core Team 2006) function LDheatmap() which provides a graphical summary of pairwise LD measurements as a heat map. 2. LDheatmap(): Overview The function LDheatmap() takes an input data set that provides information on pairwise LD between SNPs in a genomic region, plots color-coded values of the pairwise LD measurements, and returns an object of class "LDheatmap" containing a number of components. The input data set can be a data frame containing SNP genotypes, a matrix of pairwise LD measurements, or an LDheatmap object returned by the function LDheatmap(). SNP genotypes must be genotype objects created by the genotype() function from the genetics package (Warnes and Leisch 2005). When genotypes are provided, LD measurements are computed using the function LD() from the genetics package. The user can specify either the squared allelic correlation r 2 (Pritchard and Przeworski 2001) or Lewontin s D (Lewontin 1964) as the measure of LD to be plotted. Users who have pre-computed pairwise r 2 or D measures, or who wish to plot some other measure of pairwise LD can provide a matrix of LD measurements instead of genotypes. In fact, any square matrix with values between 0 and 1 inclusive above the diagonal can be displayed by LDheatmap(). When LDheatmap() is passed an object of class "LDheatmap", the function uses the object s LDmatrix component. LDmatrix is the matrix of LD measurements produced by or passed to the previous function call used to create the LDheatmap object. An optional diagonal line, drawn from the bottom left to the top right of the display, can be added to indicate physical or genetic map positions of the SNPs, along with text reporting the total length of the genetic region in either kilobases (kb; for physical distance) or centi-morgans (cm; for genetic distance). In the display of LD, SNPs appear along this diagonal line in the order specified by the user, as the horizontal and vertical coordinates increase. The ordering is achieved by adopting the conventions of the image() function, in which horizontal coordinates of the display correspond to rows of the matrix and vertical coordinates correspond to columns, and vertical coordinates are indexed in increasing order from bottom to top (rather than top to bottom as in a matrix). LDheatmap() depends on the grid package; the graphics functions of LDheatmap() are written in the grid graphics system (Murrell 2005) which provides more flexibility when manipulating a plot than the traditional graphics system. Consequently, traditional graphics functions such as par() do not have any effect on LDheatmap(). The next section provides examples of the use of LDheatmap() and shows how heat maps can be modified using functions from the grid package. 3. Illustration The hapmapceu data set from the LDheatmap package will be used in the next examples. > library(ldheatmap) > data("ceudata") This will load the genetic data CEUSNP and associated distance vector CEUDist included in the package. The CEUSNP data-set is a data frame of genotypes for 15 SNPs on chromosome

Journal of Statistical Software Code Snippets 3 7, obtained from 60 Utah residents with northern and western European ancestry. These data are from release 7 of the International HapMap project (The International HapMap Consortium 2005); see the hapmapceu help file for a more complete description. The CEUDist vector contains the physical map locations (base-pair positions) of the 15 SNPs. The following example shows a typical call to the LDheatmap() function. generated by this call is shown in Figure 1. > MyHeatmap <- LDheatmap(CEUSNP, CEUDist, LDmeasure = "r", + title = "Pairwise LD in r^2", add.map = TRUE, + SNP.name = c("rs2283092", "rs6979287"), color = grey.colors(20), + name = "myldgrob", add.key = TRUE) The heat map Pairwise LD in r^2 Physical Length:8.9kb rs6979287 rs2283092 Figure 1: Heat map of pairwise LD measurements for the 15 SNPs in CEUSNP produced by LDheatmap(). Each colored rectangle represents the squared correlation r 2 between a pair of SNPs (specified by LD.measure="r"). The vector CEUDist of physical map locations of the 15 SNPs provides the information on their relative positions, which are indicated on the diagonal line by line segments, and the total length of the genetic region indicated by the text "Physical Length:8.9kb" (all added by add.map=true). Two of the SNPs are labeled by SNP.name = c("rs2283092", "rs6979287"). It is also possible to label selected SNPs without showing the other genetic information by specifying add.map=false. The default grey-scale colorscheme is specified by color=grey.colors(20) and is indicated by the Color Key on the bottom right corner of the plot (add.key=true). When the function is called, a grid graphical object (grob) named LDheatmapGrob, representing the heat map, is created and the graphical output is produced from this grob. The grob is also one of the components of the LDheatmap object returned by the LDheatmap() function. In this example, the returned LDheatmap object is stored as MyHeatmap, and its LDheatmapGrob component has name "myldgrob". LDheatmapGrob is a gtree object (Murrell 2006a) and has a hierarchical structure as shown in Figure 2. The children of LDheatmapGrob represent the heat map ("heatmap"), optional

4 LDheatmap: Pairwise Linkage Disequilibria Heat Maps in R line parallel to the diagonal of the image indicating the physical or genetic map positions of the SNPs ("genemap"), and color-scale ("Key"). The children of "heatmap" represent the region of colored rectangles ("heatmap") and main title ("title") of the heat map. When add.map=true, "genemap" is created with children representing the diagonal line ("diagonal"), line segments ("segments") and text reporting the total length of the candidate region ("title"). When the parameter SNP.name is specified to label one or more SNPs, as in our example, two additional children are created, representing the labels ("SNPnames") and the symbols plotted at the tips of the corresponding line segments ("symbols"). When add.map=false and SNP.name is specified, only "SNPnames" is created. When add.key=true, "Key" is created with children which represent the colored rectangles ("colorkey"), title ("title"), numeric labels ("labels"), ticks ("ticks") and box frame ("box") of the color legend. These grobs can be used to modify a heat map produced by the LDheatmapGrob (e.g., "myldgrob") add.map = TRUE SNP.name!= NULL add.map = FALSE SNP.name!= NULL add.key = TRUE "heatmap" "genemap" "Key" "heatmap" "title" "diagonal" "segments" "title" "SNPnames" "symbols" "SNPnames" "colorkey" "title" "labels" "ticks" "box" Figure 2: Hierarchical structure of LDheatmapGrob, the grob created by LDheatmap() to produce an LD heat map. LDheatmap(). In the next section, we will show how to do this by examples. 4. Modifying a heat map Modifying an LD heat map produced by LDheatmap() can be done interactively or statically using the functions grid.edit() or editgrob(), respectively (Murrell 2006a). Interactive editing requires that a grob be drawn on the current device, such as the grob named "myldgrob" on the current display in our example. Static editing requires that a grob be saved in the user s workspace, such as the LDheatmapGrob component of the MyHeatmap object saved

Journal of Statistical Software Code Snippets 5 in our example. Suppose we wish to modify the font sizes and colors of the main title, text indicating the genetic region length and title of the color key. We can do so interactively with > grid.edit(gpath("myldgrob", "heatmap", "title"), + gp = gpar(cex = 1.25, col = "blue")) > grid.edit(gpath("myldgrob", "genemap", "title"), + gp = gpar(cex = 0.8, col = "orange")) > grid.edit(gpath("myldgrob", "Key", "title"), gp = gpar(cex = 1.25, + col = "red")) or we can do so statically with > LD.grob1 <- editgrob(myheatmap$ldheatmapgrob, + gpath("heatmap", "title"), gp = gpar(cex = 1.25, + col = "blue")) > LD.grob2 <- editgrob(ld.grob1, gpath("genemap", + "title"), gp = gpar(cex = 0.8, col = "orange")) > LD.grob3 <- editgrob(ld.grob2, gpath("key", "title"), + gp = gpar(cex = 1.25, col = "red")) The final grob, LD.grob3, can be drawn with > grid.newpage() > grid.draw(ld.grob3) For more information on the functions grid.edit(), editgrob(), grid.newpage() and grid.draw() from the grid package, see their respective help files or Murrell (2006a). Figure 3 shows the resulting modified heat map. Pairwise LD in r^2 Physical Length:8.9kb rs6979287 rs2283092 Figure 3: Heat map with modified colors and font sizes for "Pairwise LD in r^2" (main title), "Physical Length:8.9kb" (genetic length text) and "Color Key" (color-scale title).

6 LDheatmap: Pairwise Linkage Disequilibria Heat Maps in R 5. Multiple heat maps on a single device Displaying multiple heat maps on one graphical device may be useful for making comparisons. However, as mentioned earlier, changing par() settings does not affect grid graphics; hence modifying settings of mfrow() or mfcol() for arranging multiple plots in the traditional graphics system are not compatible with LDheatmap(). In the grid graphics system, multiple regions on a device can be defined and plotted by controlling grid viewports (Murrell 2006b). One possible way to display and arrange multiple heat maps produced by LDheatmap() is with a layout (Murrell 1999). Alternately, users may manually position heat maps. Either approach involves controlling grid viewports and navigating the viewport tree managed by grid (Murrell 2006b). In the following example two heat maps of different color scales (grey and white-to-red) are displayed side-by-side by manual positioning. > VP1 <- viewport(x = 0, y = 0, width = 0.5, height = 1, + just = c("left", "bottom"), name = "vp1") > pushviewport(vp1) > LD1 <- LDheatmap(MyHeatmap, color = grey.colors(20), + title = "Pairwise LD in grey.colors(20)", + SNP.name = "rs6979572", name = "ld1", newpage = FALSE) > upviewport() > VP2 <- viewport(x = 0.5, y = 0, width = 0.5, height = 1, + just = c("left", "bottom"), name = "vp2") > pushviewport(vp2) > LD2 <- LDheatmap(MyHeatmap, color = heat.colors(20), + title = "Pairwise LD in heat.colors(20)", + SNP.name = "rs6979572", name = "ld2", newpage = FALSE) > upviewport() The argument newpage=false tells LDheatmap() not to erase previously defined viewports: in the first call to LDheatmap() do not erase viewport VP1, and in the second function call do not erase VP1 or VP2. Our next example shows how to use the grid.edit() function to modify the LDheatmapGrobs "ld1" and "ld2", created in the previous example, so that white lines separate each pixel in the heat map displayed on the left and so that the color of the "genemap" title along the diagonal is changed from black (the default) to blue in the heat map displayed on the right. > grid.edit(gpath("ld1", "heatmap", "heatmap"), + gp = gpar(col = "white", lwd = 2)) > grid.edit(gpath("ld2", "genemap", "title"), gp = gpar(col = "blue")) Note that the gpaths in the two calls to grid.edit() name the top-level grobs "ld1" and "ld2", respectively, to specify which heat map is going be modified. Figure 4 shows the two modified heat maps displayed together. > data("chbjptdata") > pop <- factor(c(rep("chinese", 45), rep("japanese", + 45)))

Journal of Statistical Software Code Snippets 7 Our final example shows how to produce a lattice-like plot with LDheatmaps in the panels, which is useful for viewing patterns of LD in different populations. The command > data("chbjptdata") loads the genotypes CHBJPTSNP and associated distance vector CHBJPTDist. The data frame CHBJPTSNP contains genotypes for 13 SNPs on chromosome 7, from 45 Chinese and 45 Japanese individuals. The Chinese individuals were unrelated residents of the community at Beijing Normal University with at least 3 Han Chinese grandparents. The Japanese individuals were unrelated residents of the Tokyo metropolitan area with all grandparents from Japan. The data are from release 21 of the International HapMap project (The International HapMap Consortium 2005). We first create a factor variable describing the population: > pop <- factor(c(rep("chinese", 45), rep("japanese", + 45))) The population variable may then be used to stratify the heat maps as follows. > library(lattice) > xyplot(1:nrow(chbjptsnp) ~ 1:nrow(CHBJPTSNP) + pop, type = "n", scales = list(draw = FALSE), + xlab = "", ylab = "", panel = function(x, + y, subscripts,...) { + LDheatmap(CHBJPTSNP[subscripts, ], CHBJPTDist, + newpage = FALSE) + }) The resulting heat maps are shown in Figure 5. 6. Further Notes The package LDheatmap contains two other functions written in the grid graphics system. LDheatmap.highlight() and LDheatmap.marks() can be used to highlight or mark with a symbol, respectively, pairwise LD measures on an LD heat map. For more details, see the documentation for the LDheatmap package. 7. Acknowledgments We would like to thank Nicholas Lewin-Koh for his suggestion to modify our original implementation of LDheatmap() to use the grid graphics system, and for his work to develop the LDheatmap.highlight() and LDheatmap.marks() functions in the LDheatmap package. We also would like to thank the anonymous reviewers for their helpful comments regarding the package and manuscript. This research was supported by Natural Sciences and Engineering Research Council of Canada grants 227972-00 and 213124-03, by Juvenile Diabetes Research Foundation International grant 1-2001-873, by Canadian Institutes of Health Research grants NPG-64869 and ATF-66667, and in part by the Mathematics of Information Technology and

8 LDheatmap: Pairwise Linkage Disequilibria Heat Maps in R Complex Systems, Canadian National Centres of Excellence. Michael Smith Foundation for Health Research. JG is a Scholar of the BC References Lewontin R (1964). The Interaction of Selection and Linkage. I. General Considerations; Heterotic Models. Genetics, 49, 49 67. Murrell P (1999). Layouts: A Mechanism for Arranging Plots on a Page. Journal of Computational and Graphical Statistics, 8, 121 134. Murrell P (2005). R Graphics. Chapman & Hall/CRC, Boca Raton, Florida. Murrell P (2006a). Modifying grid grobs. R grid package vignette. Murrell P (2006b). Working with grid Viewports. R grid package vignette. Pritchard J, Przeworski M (2001). Linkage Disequilibrium in Humans: Models and Data. The American Journal of Human Genetics, 69, 1 14. R Development Core Team (2006). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http: //www.r-project.org/. The International HapMap Consortium (2005). A Haplotype Map of the Human Genome. Nature, 437, 1299 1320. Warnes G, Leisch F (2005). genetics: Population Genetics. R package version 1.2.0. Affiliation: Ji-Hyung Shin Department of Statistics & Actuarial Science Simon Fraser University 8888 University Drive, Burnaby, British Columbia, V5A 1S6 E-mail: shin@sfu.ca Journal of Statistical Software published by the American Statistical Association http://www.jstatsoft.org/ http://www.amstat.org/ Volume 16, Code Snippet 3 Submitted: 2006-05-15 October 2006 Accepted: 2006-09-20

Journal of Statistical Software Code Snippets 9 Pairwise LD in grey.colors(20) Pairwise LD in heat.colors(20) Physical Length:8.9kb Physical Length:8.9kb rs6979572 rs6979572 Figure 4: Modified heat maps displayed together. The heat map on the left uses a grey (grey.colors(20)) color-scale. The heat map on the right uses a white-to-red (heat.colors(20)) color-scale. White grid-lines were added to the heat map on the left and the color of the text "Physical Length:8.9kb" was changed from black to blue in the heat map on the right.

10 LDheatmap: Pairwise Linkage Disequilibria Heat Maps in R chinese japanese Pairwise LD Pairwise LD Physical Length:7.7kb Physical Length:7.7kb Figure 5: Lattice-like plot with LD heat maps in the panels.