G563 Quantitative Paleontology
Department of Geological Sciences
P. David Polly

Week 7 - Geographic Analysis of Living Species

The goal of this week's assignment is to learn to process geographic range information, to link it to climatic data, and to do a simple analysis of suitable habitat based on them. Specifically, you will:

1. Install the Quantum Geographic Information System (QGIS) on your computer. QGIS is an open-source, general-purpose GIS for creating, displaying, and analysing geographic data. We will use it to prepare data for analysis in Mathematica.
2. Find geographic range data for one or more living species in your group (or a closely related group if yours has no living members). Ideally, you will locate a geographic range map and individual occurrence points based on museum voucher specimens.
3. Download grid points and climatic data from the Polly lab website and from the WorldClim site.
4. Process the range data in QGIS and prepare it for analysis next week.
5. Read the following for next week:
   Svenning, J.-C., C. Fløjgaard, K.A. Marske, D. Nogués-Bravo, and S. Normand. 2011. Applications of species distribution modeling to paleobiology. Quaternary Science Reviews, 30: 2930-2947.
   Maguire, K. and A. Stigall. 2009. Using ecological niche modeling for quantitative biogeographic analysis: a case study of Miocene and Pliocene Equinae in the Great Plains. Paleobiology, 35: 587.

Installing QGIS

Quantum GIS is available for download at http://www.qgis.org/. The current version is 1.8.0, the release known as Lisboa (all QGIS releases are named after cities in their native languages).

For Windows: The program can be installed by downloading and executing the Standalone Installer setup file that is linked from the top of the download page.

For Mac OS X: The links to the installation are at the very bottom of the download page (follow the link to the KyngChaos installation page). To perform the install on a Mac you must install the following four packages in order:

1. GDAL 1.9 Complete
2. NumPy (part of the GDAL download)
3. GSL Framework
4. QGIS 1.8.0-2

These are ordinary Mac dmg installations, but only the QGIS one shows up in your Applications folder.

After Installing: Install the Point Sampling Tool plug-in. You can find the installation menu under:
Plugins > Fetch Python Plugins > Point Sampling Tool

Once installed, this feature will be available in the Plugins > Analyses menu.

Finding Geographic Range and Occurrence Data

I would like you to try to find two kinds of geographic data for at least one species in your group.

Point Occurrences

The first are so-called point occurrences, which are sets of latitude and longitude points for places where your species has been verifiably recorded (usually represented by a specimen in a research museum collection somewhere). One source for such data is the Global Biodiversity Information Facility (www.gbif.org). This database has amalgamated sets of collection records from research museums around the world. To find data:

1. Choose Access data portal.
2. Search by taxonomic name (preferably genus and species).
3. Choose the most general name from the list of hits (usually the top hit).
4. Choose Explore -> Occurrences.
5. Plot the matching records on a map to see whether you have reasonable-looking data and whether they seem to be geographically biased (e.g., only from Canada or only from part of the known range).
6. Download a spreadsheet of the results (tab delimited is probably the best choice).

Geographic Range Shape Files

The second type of data is geographic range polygons in GIS-format shape files. These are data that draw the geographic range of a species on a map, usually as an outline or polygon. For example, the following map shows a range polygon for the fox species Vulpes velox in North America:

[Map: range polygon for Vulpes velox in North America]

These data come in a GIS format called shape files, which contain data defining the boundaries of the range in latitude and longitude coordinates.
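To make concrete what "boundaries in latitude and longitude coordinates" means, the following is a minimal Python sketch of testing whether a point falls inside a range polygon. It is an illustration only, not how QGIS works internally: the polygon and the points are hypothetical (they are not real Vulpes velox data), the function name point_in_polygon is made up for this example, and longitude and latitude are treated as flat x and y coordinates, which is only a rough approximation over large areas.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon.

    Assumes a simple (non-self-intersecting) polygon and treats
    longitude/latitude as planar x/y coordinates.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running east from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular "range" polygon (illustrative values only).
example_range = [(-105.0, 35.0), (-95.0, 35.0), (-95.0, 45.0), (-105.0, 45.0)]

# Hypothetical grid points to test against the range.
grid_points = [(-100.0, 40.0), (-90.0, 40.0), (-104.9, 44.9)]

points_in_range = [p for p in grid_points if point_in_polygon(p, example_range)]
print(points_in_range)
```

Real shape files can contain multi-part polygons and holes, which this sketch does not handle; a GIS takes care of those cases for you.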
There are a number of sources that provide such range data, but the location varies by taxon. The following are a few sources:

IUCN Spatial Data Download: http://www.iucnredlist.org/technical-documents/spatial-data
The IUCN has collected range data for many vertebrate taxa, as well as a limited number of invertebrate and plant species.

Mammals of the Western Hemisphere: http://www.natureserve.org/getdata/mammalmaps.jsp
NatureServe provides an especially high-quality set of range shape files for mammals in North and South America.

Pollinators, Birds, Amphibians, and US freshwater fishes: http://www.natureserve.org/getdata/animaldata.jsp
More data provided by NatureServe.

Download Grid Point and Climate Data

Gridded Climate Data from Polly Lab Website

Download and install gridded climate data from http://mypage.iu.edu/~pdpolly/data.html from the section Equidistant Geographic Grid points. These are data that have been resampled using points spaced 50 km apart in order (1) to create an even coverage of the Earth's surface (data gridded by latitude and longitude are strongly biased in their sampling because point density increases toward the Earth's poles), and (2) to make the modern data comparable to fossil assemblages. Fossil samples are usually composed of the remains of animals concentrated in stream deposits, lakes, or sinkholes from source populations drawn from a broader local area. Sampling modern data using 50 km grid cells artificially replicates a situation in which all nearby species were concentrated in a single site rather than partitioned into local microhabitats, giving the fauna-to-climate comparison similar resolution in the modern and paleontological records.

* Download the grid points as GIS shape files from the third link on the page, continent shape files. You will use these with QGIS.
* Download the grid points in SQL format from the second link on the page, zipped MySQL insert file. You should be able to import this using phpMyAdmin.
It will create a new database called equidistant points with several tables containing the following data sets:

1. Elevation from "TerrainBase Global Land Elevation and Ocean Depth" from the National Geophysical Data Center and World Data Centers-A for Solid Earth Geophysics and for Marine Geology and Geophysics (NGDC/WDC-A).
2. Precipitation and Temperature from "Global Air Temperature and Precipitation: Regridded Monthly and Annual Climatologies" from Willmott, Matsuura and Legates, Center for Climatic Research, University of Delaware.
3. Macrovegetation cover from Matthews, 1984. "Prescription of Land-surface Boundary Conditions in GISS GCM II: A Simple Method Based on High-resolution Vegetation Data Sets". NASA TM-86096.
4. Ecoregions from R.G. Bailey, "Ecoregions of North America", Rocky Mountain Research Station, US Forest Service, Fort Collins, Colorado.
5. Bioclimatic variables on temperature, seasonality, precipitation, etc. from the WorldClim Global Climate Data set compiled by Hijmans et al., 2005.

The grid points themselves and the data compilation may be cited as "Polly, P.D. 2010. Tiptoeing through the trophics: geographic variation in carnivoran locomotor ecomorphology in relation to environment. Pp. 374-410 in A. Goswami and A. Friscia (eds.), Carnivoran Evolution: New Views on Phylogeny, Form, and Function."

Raster Climate Data from WorldClim

You can download conveniently packaged modern climate data for the Earth's continents from the WorldClim site (www.worldclim.org). Click Download > Current Conditions. Choose a data set from the Generic Grids set. We want the Bioclim data set. You can choose which resolution you want: 2.5 arc minutes, 5 arc minutes, or 10 arc minutes. An arc minute is 1/60th of a degree of latitude or longitude. The resolution is a measure of how detailed the climate data are. At the equator, one arc minute of longitude is about 1.15 miles, so ten arc minutes is about 11.5 miles at the equator and less than that toward the poles. The climate does not change radically over that distance, so 10 arc minutes is probably sufficient. The higher the resolution, the larger the data file.

The Bioclim data set contains nineteen climate variables that are relevant to the lives of plants and animals. These variables originated from early quantitative studies about the distribution of organisms by climate at Australian National University (Nix, 1986).
The variables are as follows:

BIO1 = Annual Mean Temperature
BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))
BIO3 = Isothermality (BIO2/BIO7) (* 100)
BIO4 = Temperature Seasonality (standard deviation * 100)
BIO5 = Max Temperature of Warmest Month
BIO6 = Min Temperature of Coldest Month
BIO7 = Temperature Annual Range (BIO5-BIO6)
BIO8 = Mean Temperature of Wettest Quarter
BIO9 = Mean Temperature of Driest Quarter
BIO10 = Mean Temperature of Warmest Quarter
BIO11 = Mean Temperature of Coldest Quarter
BIO12 = Annual Precipitation
BIO13 = Precipitation of Wettest Month
BIO14 = Precipitation of Driest Month
BIO15 = Precipitation Seasonality (Coefficient of Variation)
BIO16 = Precipitation of Wettest Quarter
BIO17 = Precipitation of Driest Quarter
BIO18 = Precipitation of Warmest Quarter
BIO19 = Precipitation of Coldest Quarter
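Two of the variables in this list are defined in terms of others: BIO7 = BIO5 - BIO6 and BIO3 = (BIO2 / BIO7) * 100. The Python sketch below simply works through those two formulas. The function names and the input values are hypothetical and chosen for easy arithmetic; they are not taken from the WorldClim files.

```python
def temperature_annual_range(bio5: float, bio6: float) -> float:
    """BIO7: max temperature of warmest month minus min of coldest month."""
    return bio5 - bio6


def isothermality(bio2: float, bio7: float) -> float:
    """BIO3: mean diurnal range as a percentage of the annual range."""
    if bio7 == 0:
        raise ValueError("BIO7 is zero, so isothermality is undefined")
    return bio2 / bio7 * 100.0


# Hypothetical values in degrees Celsius (illustrative only).
bio2 = 10.0   # mean diurnal range
bio5 = 30.0   # max temperature of warmest month
bio6 = -10.0  # min temperature of coldest month

bio7 = temperature_annual_range(bio5, bio6)
bio3 = isothermality(bio2, bio7)
print(bio7, bio3)
```

A low BIO3 means the day-to-night temperature swing is small compared with the summer-to-winter swing, as in strongly seasonal continental climates.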
Process Range and Occurrence Data

Convert your Geographic Range Shape File to 50 km Grid Points

1. Load the shape files for the geographic range of your species and the 50 km grid shape file for the appropriate continent(s) into QGIS. Select both layers in the left column.
2. Choose Vector > Geoprocessing Tools > Clip, which is a tool that clips one GIS layer using the boundaries of another. This is a useful tool if you have a large data set that you want to limit to a local area. We will use it to find the 50 km grid points that belong to a particular species.
3. Using the Clip tool, select the layer with the 50 km points as the Input Vector Layer and the layer with your species range polygon as the Clip Layer. Provide a name and location for the output file (you may want to place the data in a folder of their own because several files will be generated).
4. Once the process is finished, close the layers with the 50 km grid and range. You should see the 50 km points for the range of your species.
5. Export the points as a CSV file: Layer > Save As > Comma Separated Value.
6. Open the CSV file in Excel, add a new first column called ScientificName, and fill in each row with the genus and species name of your taxon. NOTE: the GlobalID column of this table will be used in MySQL to link your species to climate data for each of its grid points.
7. You can add the exported file to your MySQL database as a new table using the same procedure as we did with the Paleobiology Database / NOW Database data, or you can import it into Mathematica as a comma-delimited file (where filename is the path to your file as a string):

   myspeciesclimatedata = Import[filename, "CSV"];

   If you choose to import the file into MySQL, do the following:

   a) Open the CSV file in Excel and remove lines that have no latitude and longitude or climate data. Make sure you like the column names, and check whether the genus and species name of your taxon is written correctly (sometimes the name is truncated by QGIS). If you think you'll add lots of species to your geographic occurrence data, you might consider adding other information that will be useful later, such as higher taxonomy, data source, or continent. These extra data are not required for this course.

   b) Import the spreadsheet into MySQL. Remember to tick the option that your table has column headers and to change the delimiter from semicolon to comma (in Mac versions the default delimiter is a semicolon). Rename the imported table to something useful.

      NOTE: you'll have two different tables of geographic occurrences for this course, one based on point occurrences where the associated climate information is in the table, and one based on evenly sampled points from a geographic range where the climate data are in another table.

      NOTE ALSO: Climate data in raster format from WorldClim and elsewhere (such as the bioclim rasters you used to create this data set) often have their temperature variables stored in tenths of a degree Celsius instead of degrees. If you want to convert them by dividing each temperature variable by 10, now is a convenient time to do this. You can tell that your data are stored in tenths of a degree if you have mean annual temperatures (BIO1) that are larger than 35 degrees. The variables that need conversion are BIO1, BIO2, and BIO4-BIO11.

   c) Add an ID column that autoincrements and make it a primary index.
   d) If you import additional species, add them to this same table (each species can be selected individually using SQL). For subsequent imports you will have to

Link Climate Data to Point Occurrences

8. Load all 19 bioclim layers from the WorldClim download into QGIS using the Add raster layer tool. Import the point occurrence file using the text import tool (which has an icon that looks like a blue piece of paper with three commas on it). Select all the layers in the left column (hold down the shift or control key to select multiple layers).
9. Next choose Plugins > Analyses > Point Sampling Tool.
10. Under the General tab select Scientific name, Latitude, Longitude, and each of the 19 bioclimatic variable layers. Enter a name for the shape file (which will actually consist of a bunch of files, so you might want to create a folder for it). Click OK. After a few seconds or minutes, an error message will appear asking you to enter the name of the shape file. There is probably no real error: check the layer list in the left margin to see whether a new layer has appeared with your points in it.
11. Test whether your procedure worked by double-clicking on the new layer to view its properties. Click on the Style panel and choose graduated from the pull-down menu. Choose one of the bioclim variables in the column tab and increase the number of classes to 10 or 15. Click the classify button at the bottom, and then click OK. The points should take on a graded color according to temperature, precipitation, or whatever variable you chose.
12. Finally, export the data as a comma-delimited file by first selecting your layer in the left-hand column, then choosing Layer > Save As > Comma Separated Value.
13. You can add the exported file to your MySQL database as a new table using the same procedure as we did with the Paleobiology Database / NOW Database data, or you can import it into Mathematica as a comma-delimited file (where filename is the path to your file as a string):

   myspeciesclimatedata = Import[filename, "CSV"];

   If you choose to import the file into MySQL, do the following:

   e) Open the CSV file in Excel and remove lines that have no latitude and longitude or climate data. Make sure you like the column names, and check whether the genus and species name of your taxon is written correctly (sometimes the name is truncated by QGIS). If you think you'll add lots of species to your geographic occurrence data, you might consider adding other information that will be useful later, such as higher taxonomy, data source, or continent. These extra data are not required for this course.

   f) Import the spreadsheet into MySQL. Remember to tick the option that your table has column headers and to change the delimiter from semicolon to comma (in Mac versions the default delimiter is a semicolon). Rename the imported table to something useful.

      NOTE: you'll have two different tables of geographic occurrences for this course, one based on point occurrences where the associated climate information is in the table, and one based on evenly sampled points from a geographic range where the climate data are in another table.

   g) Add an ID column that autoincrements and make it a primary index.

   h) If you import additional species, add them to this same table (each species can be selected individually using SQL). For subsequent imports you will have to
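The CSV clean-up described above (dropping rows without coordinates and converting temperatures stored in tenths of a degree) can also be scripted instead of done by hand in Excel. The following is a minimal Python sketch, not part of the assignment. It assumes hypothetical column names (ScientificName, Latitude, Longitude, BIO1, and so on) and made-up sample values; your exported file may name its columns differently. The list of columns to convert (BIO1, BIO2, BIO4-BIO11) and the "BIO1 larger than 35" check are taken from the note in this handout.

```python
import csv
import io

# Temperature variables the handout lists as needing conversion.
TEMPERATURE_COLUMNS = ["BIO1", "BIO2"] + [f"BIO{i}" for i in range(4, 12)]

# Hypothetical sample standing in for a CSV exported from QGIS.
SAMPLE_CSV = """ScientificName,Latitude,Longitude,BIO1,BIO2,BIO12
Vulpes velox,40.5,-102.3,95,152,410
Vulpes velox,,,101,149,395
Vulpes velox,38.1,-100.7,118,147,520
"""


def load_rows(text):
    """Read CSV text and drop rows lacking latitude or longitude."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row["Latitude"] and row["Longitude"]]


def looks_like_tenths(rows):
    """Handout's rule of thumb: BIO1 above 35 suggests tenths of a degree."""
    return any(float(row["BIO1"]) > 35 for row in rows if row.get("BIO1"))


def convert_tenths(rows):
    """Divide each temperature column that is present by 10, in place."""
    for row in rows:
        for column in TEMPERATURE_COLUMNS:
            if row.get(column):  # skip columns that are absent or empty
                row[column] = float(row[column]) / 10.0
    return rows


rows = load_rows(SAMPLE_CSV)
if looks_like_tenths(rows):
    rows = convert_tenths(rows)

for row in rows:
    print(row["ScientificName"], row["Latitude"], row["Longitude"], row["BIO1"])
```

To use it on a real file, read the file's contents into a string and pass that to load_rows in place of SAMPLE_CSV. Precipitation columns such as BIO12 are deliberately left untouched.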
References

Nix, H. A. 1986. A biogeographic analysis of Australian elapid snakes. Pp. 4-15 in R. Longmore (ed.), Atlas of Elapid Snakes of Australia. Australian Flora and Fauna Series Number 7. Australian Government Publishing Service: Canberra.