standarich_v1.00: an R package to estimate population allelic richness using standardized sample size




Author: Filipe Alberto, CCMAR, University of the Algarve, Portugal
Date: 3-2006

What it does

The purpose of the package is to standardize population sample size before comparing allelic richness (Â) estimates among populations. Unequal sample size is typical in clonal species, where G, the number of different genotypes or genetic individuals, is the relevant statistic to standardize. Whatever sampling design is used for clonal organisms, G cannot be predicted at the sampling stage, even when N, the number of sample units, is kept constant across populations. For clonal species the data file should therefore contain a single copy of each multilocus genotype (MLG) present in each population. G usually varies widely among populations in clonal species data sets, so G must be standardized before Â estimates can be compared meaningfully. For non-clonal species all MLGs should be used, because the standardization is for N; it is normally necessary when sample size varied among populations.

Graphical tools are also available to:
1) plot allele distribution and frequency across populations for each locus (Fig 1A and B), and
2) plot the relationship between allelic richness and genet/individual number in a population (Fig 2A and B).

Using an R package

R is free software for statistical analysis and complex computations, with numerous graphical applications and its own programming language (R Development Core Team 2004). It is available for several platforms (Unix, Windows and Macintosh). Learning R is typically a slow process, but it pays off by offering a large body of functionality, graphics and data handling procedures. To use this package you need to know little about R; just follow the instructions in this document.
Start by installing R on your system; it is available from http://www.r-project.org/, where installation instructions are given. Once you have installed the program and set a shortcut of Rgui to a specific working directory (see the PDF help manual "An Introduction to R", section "Introduction and preliminaries"), open the R console, choose the option to install packages from local zip files in the Packages menu on the top bar, and select the standarich_1.0.zip file. You then only need to load the package into the R environment by selecting it from the package window under "Load package".

Data file

The tab-delimited text file used as input to standarich is very simple. Each line corresponds to an individual genotype. The 1st column holds the population code and the 2nd the individual code. The following columns hold the allele codes (two columns per locus) as integers, using two or three digits (but be consistent!). Missing data are represented by the value 999 (don't use 99, even if you use 2 digits to code alleles!). There is no header in this file.
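As a sketch of the format just described, the following base R code writes a small tab-delimited file without a header and reads it back. The genotypes, the column names of example_rows and the temporary file name are made up for illustration; only the layout (population code, individual code, two allele columns per locus, 999 for missing data) comes from this manual.

```r
# Hypothetical genotypes in the documented layout:
# population code, individual code, then two allele columns per locus.
example_rows <- data.frame(
  pop = c("a", "a", "b", "b"),
  ind = c(1, 2, 1, 2),
  L1a = c(181, 181, 181, 181),
  L1b = c(181, 181, 203, 999),   # 999 marks a missing allele
  L2a = c(222, 222, 224, 230),
  L2b = c(222, 224, 230, 230)
)

# Write a tab-delimited file with no header and no row names.
infile <- tempfile(fileext = ".txt")
write.table(example_rows, infile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back as a headerless table.
test <- read.table(infile, header = FALSE)

# Basic sanity checks on the layout.
n_allele_cols <- ncol(test) - 2
stopifnot(n_allele_cols %% 2 == 0)     # two columns per locus
n_loci <- n_allele_cols / 2
alleles <- as.matrix(test[, -(1:2)])
n_missing <- sum(alleles == 999)       # number of missing allele codes
```

Checks of this kind are optional, but an odd number of allele columns is a quick sign that the file does not follow the two-columns-per-locus rule.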

Example (a data file with 2 populations and 3 loci):

  a  1  181  181  222  222  175  177
  a  2  181  181  222  222  177  177
  a  3  181  181  222  222  177  177
     1  181  181  224  230  175  175
     2  181  181  230  230  175  175
     3  181  203  224  224  175  175
     4  181  203  224  230  175  175
     5  181  203  224  230  175  175

To load such a file into the R environment you can use the read.table() function. In the R console type:

  test <- read.table("myfile.txt", header = FALSE)

The object test now holds your data (note that R is case sensitive: Test is different from test!).

standarich functions

standarich contains two functions for data manipulation and two functions for plotting.

  rgenotypes.arich(test, n_of_replicates)

This function should be used first. It performs a multiple random reduction (Leberg 2002) of the number of individuals/genets in each population by generating random subsamples of each population, of size g = 1 to g = G, where G is the total number of different genotypes observed in that population (G = N in non-clonal species). Each subsample of a given size g is randomly replicated n_of_replicates times. The function takes two arguments: the first is the R object with your data (e.g. test) and the second is the number of times each subsample of size g should be replicated. The function returns a table (actually an R object of type data.frame) with the multiple random reduction results:

  # send the results of the function to the object redtest
  redtest <- rgenotypes.arich(test, 5)
  redtest
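The multiple random reduction described above can be sketched in base R as follows. This is an illustration only, not the package's own code: the functions alleles_per_locus() and random_reduction(), the output column names, and the population code "b" used for the second population are hypothetical. The sketch also assumes that Â for a subsample is the mean number of distinct alleles across loci, and that subsamples are drawn without replacement.

```r
# Example genotypes: 2 populations, 3 loci (population code "b" is assumed).
test <- data.frame(
  pop = c(rep("a", 3), rep("b", 5)),
  ind = c(1:3, 1:5),
  a1 = rep(181, 8),
  a2 = c(181, 181, 181, 181, 181, 203, 203, 203),
  b1 = c(222, 222, 222, 224, 230, 224, 224, 224),
  b2 = c(222, 222, 222, 230, 230, 224, 230, 230),
  c1 = c(175, 177, 177, 175, 175, 175, 175, 175),
  c2 = c(177, 177, 177, 175, 175, 175, 175, 175)
)

# Count distinct alleles per locus; 999 is treated as missing.
alleles_per_locus <- function(geno, missing = 999) {
  n_loci <- (ncol(geno) - 2) / 2
  sapply(seq_len(n_loci), function(l) {
    a <- unlist(geno[, 2 + c(2 * l - 1, 2 * l)], use.names = FALSE)
    length(unique(a[a != missing]))
  })
}

# For each population, draw n_rep random subsamples of every size g = 1..G.
random_reduction <- function(dat, n_rep) {
  out <- list()
  for (p in as.character(unique(dat[[1]]))) {
    pop <- dat[dat[[1]] == p, , drop = FALSE]
    G <- nrow(pop)
    for (g in seq_len(G)) {
      for (r in seq_len(n_rep)) {
        rows <- sample.int(G, g)               # g genotypes, no replacement
        k <- alleles_per_locus(pop[rows, , drop = FALSE])
        out[[length(out) + 1]] <- data.frame(
          Pop = p, Ind = g, Arich = mean(k), SD = sd(k))
      }
    }
  }
  do.call(rbind, out)
}

set.seed(42)
redtest_sketch <- random_reduction(test, n_rep = 5)
```

With 3 and 5 genotypes and 5 replicates, the sketch returns 5 × (3 + 5) = 40 rows. At g = G every replicate contains the whole population, so all replicates of that size give the same value.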

In the table returned by rgenotypes.arich(), the 1st column refers to the population, the 2nd to the subsample size in that randomization, the 3rd to the Â value for that subsample, and the 4th to the standard deviation across loci; the remaining columns give the number of alleles found at each locus. Note how each subsample of size Ind = g is repeated n_of_replicates times.

  stand.arich(redtest, g)

The second function takes as its first argument the data.frame produced above (e.g. redtest) and returns a summarized data.frame with the Â results after standardization of all populations to a given g, the second argument to this function. You can run the function with different g values on the same redtest data.frame to perform the standardization for different values of g. The results from this method are similar to those of the rarefaction method (Petit et al. 1998).

  stand.arich(redtest, 15)

  stand.arich(redtest, 25)

In the table returned by stand.arich(), the 1st column refers to the population, the 2nd to Â and the last to the standard deviation across replicated random subsamples of size g. Note how in the second example (g = 25) some populations have no values; this is because those populations have fewer individuals than the value used to perform the standardization (G < g). Choose g as a compromise: not so high that many populations are lost, but not so low that differences in Â become undetectable.

Both functions described above write their result tables to files in the working directory, named results random reduction.txt and standardized allelic richness.txt respectively. Note that if you need to use stand.arich() with different g arguments, you need to rename the standardized allelic richness.txt file each time.

Plotting functions

  allele.freq.plot(test, tpop=1, tpoint=2, tall=1)

The function opens as many plot windows as there are loci in your data file. Each plot represents a table of allele frequencies for a given locus, where the values are shown as dots of varying diameter. Allele codes are indicated on the x axis and population names on the y axis. The first argument of this function is the object with your data file (e.g. test). The remaining arguments control the allele (tall) and population (tpop) text size and the relative size of the allele frequency dots (tpoint). All have default values, although you may have to change them according to the number of populations in your data file.
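The table of allele frequencies behind such a plot can be sketched in base R as follows. The function allele_freq_table() and the example data are hypothetical and are not part of standarich; the sketch assumes that the frequency of an allele is its share of the non-missing allele copies at that locus within a population.

```r
# Hypothetical data in the documented layout: one locus, two populations.
dat <- data.frame(
  pop = c("a", "a", "b", "b"),
  ind = c(1, 2, 1, 2),
  l1a = c(181, 181, 181, 181),
  l1b = c(181, 203, 203, 999)    # 999 marks a missing allele
)

# Allele frequencies per population for one locus.
allele_freq_table <- function(dat, locus, missing = 999) {
  cols <- 2 + c(2 * locus - 1, 2 * locus)          # the locus' two columns
  pop <- rep(as.character(dat[[1]]), times = 2)    # one entry per allele copy
  allele <- c(dat[[cols[1]]], dat[[cols[2]]])
  keep <- allele != missing
  counts <- table(pop[keep], allele[keep])
  prop.table(counts, margin = 1)                   # each row sums to 1
}

freq <- allele_freq_table(dat, locus = 1)
```

Rows of freq are populations and columns are allele codes, which matches the layout of the plot (populations on one axis, alleles on the other).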

  allele.freq.plot(test, tpoint=4)

Fig 1A. The plot for locus 5 (L 5) for a data set with 3 populations; x axis: Alleles.

  allele.freq.plot(rteste, tpop=0.65, tall=0.8)

Fig 1B. The plot for locus 5 (L 5) for a data set with 37 populations; x axis: Alleles.

  allele.genotype.plot(redtest, g=0, xmin=0, xmax=50, xmark=10, ymax=5)

This function uses the data.frame produced by rgenotypes.arich() (e.g. redtest) to plot the relationship between allelic richness and sample size. The 2nd argument is a value of g to draw as a vertical line at that position on the x axis, e.g. to mark the value used in the standardization; the line cuts each population curve at its standardized Â value (read on the y axis). The arguments xmin and xmax define the limits of the x axis, xmark sets the spacing between consecutive tick marks on the x axis, and ymax sets the upper limit of the y axis. All arguments except the data.frame have default values.

  # let teste37pop be an input table with data from 37 populations
  red37pop_5rep <- rgenotypes.arich(teste37pop, 5)
  allele.genotype.plot(red37pop_5rep, g=10, xmax=40)

Fig 2A. Relationship between allelic richness (Â) and nº of genotypes in a sample. Each line represents a different population. Each point in a line is the mean of all replicates for that subsample size g; here 5 replicates were used. (x axis: Nº of genotypes; y axis: Allelic richness.)

  # now using 100 replicates
  red37pop_100rep <- rgenotypes.arich(teste37pop, 100)
  allele.genotype.plot(red37pop_100rep, g=10, xmax=40)

Fig 2B. Relationship between allelic richness (Â) and nº of genotypes in a sample. Each line represents a different population. Each point in a line is the mean of all replicates for that subsample size g; here 100 replicates were used to smooth the lines. (x axis: Nº of genotypes; y axis: Allelic richness.)
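Both the curves in Fig 2 and the standardized values are summaries of the replicate table, and the idea can be sketched in base R. Everything here is hypothetical: the replicate table red, its column names (Pop, Ind, Arich) and the function standardize_at() are made up for illustration and do not reproduce the package's code or output format.

```r
# Hypothetical replicate table: Pop, Ind (subsample size g), Arich (richness).
# Population "a" has G = 3 and "b" has G = 2, with 2 replicates per size.
red <- data.frame(
  Pop   = rep(c("a", "b"), times = c(6, 4)),
  Ind   = c(1, 1, 2, 2, 3, 3,  1, 1, 2, 2),
  Arich = c(1.0, 1.5, 1.5, 2.0, 2.0, 2.0,  1.0, 1.0, 1.5, 2.5)
)

# Mean richness per population and subsample size: the points of each curve.
curve <- aggregate(Arich ~ Pop + Ind, data = red, FUN = mean)

# Standardized richness at a common g: mean and SD across replicates.
standardize_at <- function(red, g) {
  pops <- unique(as.character(red$Pop))
  res <- data.frame(Pop = pops, Arich = NA_real_, SD = NA_real_)
  for (i in seq_along(pops)) {
    x <- red$Arich[red$Pop == pops[i] & red$Ind == g]
    if (length(x) > 0) {            # populations with G < g stay NA
      res$Arich[i] <- mean(x)
      res$SD[i] <- sd(x)
    }
  }
  res
}

std2 <- standardize_at(red, 2)   # both populations have values
std3 <- standardize_at(red, 3)   # "b" has G = 2 < 3, so it has no value
```

In this sketch a population with fewer genotypes than g gets a missing value, which mirrors the behaviour described for stand.arich() when G < g.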

In Fig 2, notice how the lines are smoother with a higher number of replicates (Fig 2B versus Fig 2A), although the processing time increases considerably.

Additional help

In the Rgui console you may find help documentation additional to this manual. After you load the standarich package you can use the function help to open a window with the documentation for each R function:

  help(rgenotypes.arich)
  # or simply
  ?rgenotypes.arich

You can access the same information in HTML format: from the Help menu select HTML help, then select Packages and look for the standarich link.

Example data files

Two data sets are available for testing the functions in package standarich. The data set Exdata is available to test rgenotypes.arich():

  data(Exdata)
  a <- rgenotypes.arich(Exdata, 50)

To test stand.arich(), a data.frame Exresults with the results from rgenotypes.arich() is available:

  data(Exresults)
  b <- stand.arich(Exresults, 10)

References

Alberto F, Arnaud-Haond S, Duarte CM, Serrao EA (2006) Genetic diversity of a clonal angiosperm near its range limit: the case of Cymodocea nodosa in the Canary Islands. Marine Ecology Progress Series 309: 117-29.

Leberg PL (2002) Estimating allelic richness: Effects of sample size and bottlenecks. Molecular Ecology 11: 2445-2449.

Petit RJ, El Mousadik A, Pons O (1998) Identifying populations for conservation on the basis of genetic markers. Conservation Biology 12: 844-855.

R Development Core Team (2004) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-00-3, URL http://www.r-project.org.