ModEco Tutorial

In this tutorial you will learn how to use the basic features of the ModEco software.

Contents:

Getting Started
Section 1: File and Data Management
  o 1.1: Loading Single Environmental Layers
  o 1.2: Adding Environmental Layer Folders
  o 1.3: Adding Species Data
  o 1.4: Opening Pre-existing Projects
  o 1.5: Saving a Project
Section 2: Running a model
  o 2.1: Selecting Model Type
  o 2.2: Data Selection
  o 2.3: Model Algorithm Selection
  o 2.4: Accuracy Assessment
  o 2.5: Result Map
Section 3: Data
  o 3.1: Understanding Species Data
  o 3.2: Extracting presence data
  o 3.3: Creating Pseudo-Absence data
Section 4: Accuracy Assessments
  o 4.1: TPR vs. predicted area plot based on classification model
  o 4.2: ROC Curve based on classification model
Section 5: Utilities
  o 5.1: Factor Histogram
  o 5.2: Scatter plot
  o 5.3: Factor Importance Analysis

Getting Started

Installing ModEco:

1. Download ModEco from http://gis.ucmerced.edu/modeco/download.html.
2. Unzip the downloaded file.
3. The unzipped folder contains the file SetupModEco.msi. Double-click SetupModEco.msi and follow the steps in the setup wizard to install the program.

Starting ModEco:

To start ModEco, click [Start->All Programs->ModEco->ModEco] in the Windows Start menu.

Section 1: File and Data Management

ModEco uses two main types of data: environmental data and species data. Data can either be loaded manually or be loaded by opening a project that already has data associated with it. To learn how to load data into ModEco manually, continue with sections 1.1-1.3. Otherwise, skip to section 1.4 (Opening Pre-existing Projects) to open a project that already has all of its data loaded.

1.1: Loading Single Environmental Layers

This section describes how to add a single environmental layer.
1. Click [File->Add Environmental layer].
2. In the Add environmental layer window (Figure 1.1), add each environmental layer as follows:
   a. Click the ( ) button to browse for an environmental layer. NOTE: environmental layers are generally located in the data folder that resides within the ModEco folder created during installation. The path is commonly C:\Program Files (x86)\modeco\data. ModEco provides sample layers in this directory.
   b. Add the environmental layer to a group. Either select an existing group from the Add to group dropdown box, or type a new group name in the Add to group dropdown box to add the layer to a new group.
   c. The other fields (Keywords and Data units) are optional and are only used to help the user organize and analyze their data.
   d. Once all of the parameters are filled in, click OK. The environmental layer will be added to the project in the specified data group.

Figure 1.1: Adding an environmental layer

1.2: Adding Environmental Layer Folders

This section describes how to add multiple environmental layers to the project at once by adding the folder that contains them.

1. Click [File->Add environmental layer folder]. This will display the environmental layer folder browsing window.
2. Navigate to the data folder located within the ModEco folder that was created upon installation, or to any other folder that contains environmental layers. The path for the sample data is commonly C:\Program Files (x86)\modeco\data.
3. Highlight the folder containing the environmental layers and click OK. All of the environmental layers within that folder will be added to the project.

If you are using the sample data, the following 11 environmental layers should be loaded at this point:

  o DEM (dem300_res)
  o Annual Precipitation (pt_pa)
  o Precipitation of January (pt_pm1)
  o Precipitation of April (pt_pm4)
  o Precipitation of July (pt_pm7)
  o Precipitation of October (pt_pm10)
  o Annual Temperature (ta_pa)
  o Temperature of January (ta_pm1)
  o Temperature of April (ta_pm4)
  o Temperature of July (ta_pm7)
  o Temperature of October (ta_pm10)

1.3: Adding Species Data

1. Click [File->Add Species data].
2. Select a species file in the data folder and click OK. Species data files use the .smp format. The sample species data is located in C:\Program Files (x86)\modeco\data.
3. Repeat steps 1 and 2 for additional species data.

If you are using the sample data, the following 11 species data files should be loaded at this point:

  o Blue oak
  o Black oak
  o Coast oak
  o Calibay
  o Fremont CW
  o Interior oak
  o Canyon oak
  o Valley oak
  o Tan oak
  o Madrone
  o Oregon oak

1.4: Opening Pre-existing Projects

In ModEco, the user can load a pre-existing project together with all of the environmental and species data that were used in it. A pre-existing project also contains any result maps that were created in previously saved sessions. ModEco comes with a sample project called Species.sml; .sml is the file extension ModEco uses for project files. This section explains how to open a project file.

1. Click [File->Open project]. The Open window will be displayed.
2. Navigate to the desired project file, highlight it and click the Open button. Note: ModEco's sample project file is located in the installation folder at C:\Program Files (x86)\modeco\Data\Species.sml.
3. Once the Open button has been clicked, all of the project data will be loaded into ModEco.

1.5: Saving a Project

After working on a project, the user can save all of the data and result maps to a .sml file so that they can be quickly loaded later. This section explains how to save a project.

1. Navigate to the menu bar and click [File->Save project].
2. If the project is a new project, the Save As window will be displayed. In this case, enter a name for the project file and click Save.

Section 2: Running a model
There are four steps to running a model in ModEco: selecting the model type, data selection, model algorithm selection and accuracy assessment. After you complete these steps, ModEco generates a result map.

2.1: Selecting Model Type

The type of species data that you have imported into your project determines the types of models that you will be able to run. The sample data is presence and absence data, so Presence and absence models should be used with it. For more information on species data types, go to section 3.1.

1. Click the Models button on the menu bar. A menu containing several different model types will be displayed.
2. Click the Presence and absence models button if you are using the sample data.
3. There will now be two options: Classification models and Probabilistic models. For this tutorial, choose Classification models (the choice is arbitrary).
4. You should now be presented with the Select data window.

2.2: Data Selection

Data selection allows you to choose between the different environmental data groups and species data that you have loaded into your project.

Figure 2.2: Data Selection

1. In the Select data window (Figure 2.2), use the Environmental data group dropdown box to select the environmental layer data group that you have just created.
2. The Factors box lists the environmental factors contained within your data group. To exclude a factor from the model, uncheck the associated checkbox.
3. Select a species point dataset in the Species data points for training dropdown box.
4. For now, leave all of the other fields at their default values and click Next.
5. The Presence and absence models selection window will now be displayed.
2.3: Model Algorithm Selection

For each group of model types there are specific model algorithms to choose from. For more information about model types, please refer to the user's manual.

Figure 2.3: Model Algorithm Selection

1. In the Select model dropdown box, select Two-class SVM.
2. Leave all of the parameters at their default values and click Next.
3. You will now be brought to the Accuracy assessment window.

2.4: Accuracy Assessment

The Accuracy assessment window provides various analytical indexes and values (Kappa, F-Score, etc.). These values do not affect the model; they are provided to the user for analysis only.
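To make indexes such as Kappa and F-Score more concrete, here is a minimal Python sketch that derives them from a 2x2 confusion matrix using the standard textbook formulas. It is not ModEco's code: the function name `accuracy_indexes` and the example counts are hypothetical, and ModEco may report additional or differently defined values.

```python
def accuracy_indexes(tp, fp, fn, tn):
    """Compute common accuracy indexes from a 2x2 confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives (illustrative inputs only).
    """
    total = tp + fp + fn + tn
    if total == 0 or tp + fp == 0 or tp + fn == 0:
        raise ValueError("confusion matrix is empty or has no positives")

    # Observed agreement: fraction of points classified correctly.
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called the true positive rate
    f_score = 2 * tp / (2 * tp + fp + fn)

    return {
        "accuracy": observed,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts, for illustration only.
result = accuracy_indexes(tp=40, fp=10, fn=5, tn=45)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Kappa corrects the raw accuracy for the agreement that would be expected by chance, which is why it can be much lower than accuracy when one class dominates.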
Figure 2.4: Accuracy Assessment

1. When you have finished examining the accuracy assessment values, click the Finish button to create the result map.

2.5: Result Map

The result map is a prediction of the areas where the species may occur according to the selected model. The map also shows predicted absence, that is, the areas where the species is predicted not to occur.

Figure 2.5: Result Map

Section 3: Data

Depending on the type of species data you have and the type of model you want to run, you may need to extract data from, or add data to, your species data points. This section describes how to tell what type of species data is being used and what types of models can be run with it, how to create presence-only data by extracting presence data from presence and absence data, and how to use the Data tab to add pseudo-absence data to one-class data to create presence and absence data.

3.1: Understanding Species Data

This section explains how to discover what type of data a species file contains and what types of models it can be used in.

1. Go to the project window on the left-hand side of the screen, right-click on one of the species data files and then click Properties to display the Properties window.
2. The most important field in this window for determining the type of a species data file is the Number of species field. See Figures 1.2 a) and 1.2 b).
3. Note also that, in the project window, regression data is shown as a green box, one-class data as a circle with dots in it, and multi-class and presence and absence data as different-sized dots.

Figure 1.2 a): One-class data
Figure 1.2 b): Presence and absence data
Figure 1.2 c): Multi-class data

4. Notice in Figure 1.2 a) that there is only one species type and the One class species data check box is checked. This means that the file is one-class data that contains presence data only. Figure 1.2 b) also shows only one species type, but the One class species data check box is not checked. This means that the file is a presence and absence dataset with presence and absence points for one species only. Figure 1.2 c) shows multiple species, which means that the file is multi-class data.
5. The type of data you have limits the types of models that can be run. The models compatible with each type of data are:

   Presence only data:
     o Presence-only models
     o Pseudo-absence models
     o Presence vs. background models
   Presence and absence data:
     o Presence and absence models
   Multi-class data:
     o Multi-class models
   Regression data:
     o Abundance regression models

3.2: Extracting presence data

First, let's extract presence-only data from presence and absence data, because ModEco's sample data is presence and absence data.

1. In the project window, double-click the species data file that you would like to extract the presence data from, so that it is visualized.
2. Click the Data tab on the menu bar.

Figure 3.1: Data tab

3. Figure 3.1 shows many options, but section 3.2 is only concerned with the Resample species data points option. Click Resample species data points and you will be presented with the Resample species data points window (Figure 3.2).

Figure 3.2: Resample species data points

4. Select one of the presence and absence species files from the Species data points drop box, select the Extract presence data from presence and absence data option, and then click OK.
5. This will create a new species data file. The new file has the same name as the file it was created from, with (Resampled) appended. For example, if the blueoak_r file is used, the new presence-only file will be called blueoak_r(resampled).
6. Now that presence-only data is available, the models that require it can be run: Presence-only models, Pseudo-absence models and Presence vs. background models.

3.3: Creating Pseudo-Absence data

Some of the models in ModEco (Pseudo-absence models and Presence vs. background models, for example) create pseudo-absence data for the user automatically. However, the user may want to run one of the other models, which require presence and absence data as input. This can be done by creating pseudo-absence points in the Data tab.
1. In the project window, double-click the one-class species data file that you would like to add pseudo-absence points to, so that it is opened. If you do not have one-class data, you can create it by completing section 3.2.
2. Navigate to the Data tab on the menu bar.
3. Figure 3.1 shows several options for creating pseudo-absence points. Each option provides a different method of selecting the extent in which the pseudo-absence points will be created. For example, the option Create pseudo-absence points (with environmental map) creates pseudo-absence points within the extent of an environmental map. Click Create pseudo-absence points (with environmental map). You will be presented with the Create pseudo-absence points window (Figure 3.3).

Figure 3.3: Create pseudo-absence points

4. In the first dropdown box of the Create pseudo-absence points window, select the one-class species point layer to add pseudo-absence data to.
5. Select the data group that the environmental mask will be chosen from.
6. Select the environmental mask that defines the extent in which pseudo-absence points are generated.
7. The Presence points number box shows the number of points that the one-class layer currently has.
8. Choose how many pseudo-absence points to create:
   a. The first option is to specify a multiple of the number of presence points. Enter the multiple in the box labeled times of presence points. For example, if 2 is entered, the number of pseudo-absence points will be 2 times the number of presence points.
   b. The second option is to specify the number of pseudo-absence points directly. To do this, click the option next to the box labeled absence points and enter the desired number of pseudo-absence points.
9. After defining all of the parameters, click OK. A presence and absence data layer will be created for use in Presence and absence models.
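The two ways of choosing the number of pseudo-absence points in step 8, and the idea of drawing them inside a mask, can be sketched in Python as follows. This only illustrates the concept under stated assumptions: the function names, the grid-of-booleans mask representation, and the choice to exclude cells that already hold a presence point are hypothetical and are not taken from ModEco.

```python
import random


def pseudo_absence_count(n_presence, times=None, absolute=None):
    """Return the number of pseudo-absence points to create.

    Exactly one of `times` (a multiple of the presence count) or
    `absolute` (a direct count) must be given, mirroring the two
    options described in step 8.
    """
    if (times is None) == (absolute is None):
        raise ValueError("give exactly one of 'times' or 'absolute'")
    if absolute is not None:
        return int(absolute)
    return int(round(times * n_presence))


def sample_pseudo_absences(mask, presence_cells, n_points, seed=0):
    """Randomly pick n_points cells inside the mask.

    `mask` is a list of rows of booleans (True = inside the extent).
    Assumption: cells that already contain a presence point are skipped.
    """
    candidates = [
        (row, col)
        for row, values in enumerate(mask)
        for col, inside in enumerate(values)
        if inside and (row, col) not in presence_cells
    ]
    if n_points > len(candidates):
        raise ValueError("not enough free cells inside the mask")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(candidates, n_points)


# Hypothetical 2x3 mask with two presence cells.
mask = [[True, True, False],
        [True, True, True]]
presence = {(0, 0), (1, 2)}
n = pseudo_absence_count(len(presence), times=1)
print(sample_pseudo_absences(mask, presence, n))
```

With `times=2` and 10 presence points, `pseudo_absence_count` returns 20, matching the example in step 8 a.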
Section 4: Accuracy Assessments

As mentioned in section 2.4, accuracy assessments are already included in running a model. However, there are additional features available in the Assessments tab on the menu bar. Here we will go over how to use the TPR vs. predicted area plot based on classification model, the ROC curve based on classification model, the TPR vs. predicted area plot based on continuous prediction, and the ROC curve based on probabilistic prediction.

4.1: TPR vs. predicted area plot based on classification model

ModEco allows users to plot the true positive rate (TPR) against the predicted area to help them assess their models.

1. Obtain a one-class (presence-only) species dataset. If you do not have a one-class species dataset, please refer to section 3.2. If you have a one-class dataset but it is not in the current project, load it by referring to section 1.3. If you are not sure how to tell whether your dataset is a one-class species layer, please refer to section 3.1.
2. Once your one-class data is in the current project, click the Assessments tab on the menu bar.
3. Click the TPR vs. predicted area plot based on classification model option. You will be presented with three options: Support Vector Machine, BioCLIM and DOMAIN.
4. Depending on the option you choose, you will be presented with one of the three windows shown in Figures 4.1 a)-4.1 c).

Figure 4.1 a): TPR for SVM
Figure 4.1 b): TPR for BioCLIM
Figure 4.1 c): TPR for DOMAIN

5. In all three options, the user must choose an environmental data group in the Environmental data group drop box. This is simply the set of environmental layers that you would like to use for this function. If you have not created any environmental data groups, please refer to section 1.1.
6. In the Factors section, select the specific environmental layers that you would like to run the TPR vs. predicted area plot assessment with. Check or uncheck layers to include or exclude them from the assessment.
7. In the Species data points drop box, select the species dataset to be used in the assessment.
8. Depending on which TPR assessment type you have chosen (SVM, BioCLIM or DOMAIN), there is a different final parameter to define:
   o For the SVM version, there is the Vary Gamma or the Vary Cost option. The SVM version also has an extra button below the OK button called Set SVM, which opens a window for setting the parameters of the SVM.
   o For the BioCLIM version, there is the Percentile option.
   o For the DOMAIN version, there is the Similarity threshold option.
9. Consult the manual for information on model parameters.
10. Once all of the parameters have been selected, click the OK button to generate a curve similar to Figure 4.2.
Figure 4.2: Sample TPR curve

The resulting graph shows the relationship between the true positive rate and the area that was predicted by the model.

4.2: ROC Curve based on classification model

The Receiver Operating Characteristic (ROC) curve is a plot of the sensitivity (true positive rate) against 1-specificity (false positive rate) as the discrimination threshold is varied. The ROC curve, in contrast to the TPR curve, is useful for assessing species datasets that contain presence and absence data.

1. Obtain a presence and absence species dataset. If you do not have a presence and absence species dataset, please refer to section 3.3. If you have a presence and absence dataset but it is not in the current project, load it by referring to section 1.3. If you are not sure what type of species dataset you have, please refer to section 3.1.
2. Once the presence and absence data is in the current project, click the Assessments tab on the menu bar and then click the ROC curve based on classification model option. You will be presented with four options: Support Vector Machine, BioCLIM, DOMAIN, and the Generalized Linear Model.
3. Depending on the option you choose, you will be presented with one of the four windows shown in Figures 4.3 a)-4.3 d).
Figure 4.3 a): SVM ROC
Figure 4.3 b): BioCLIM ROC
Figure 4.3 c): DOMAIN ROC
Figure 4.3 d): GLM ROC

4. In all four options, the user must choose an environmental data group in the Environmental data group drop box. This is simply the set of environmental layers that you would like to use for this function. If you have not created any environmental data groups, please refer to section 1.1.
5. In the Factors section, select the specific environmental layers from the data group that you would like to run the ROC curve assessment with. Check or uncheck layers to include or exclude them from the assessment.
6. In the Species data points drop box, select the species dataset to be used in the assessment.
7. For the Support Vector Machine version of the ROC curve, the user can set the SVM parameters by clicking the Set SVM button. For the Generalized Linear Model, the user is required to select a link function. For more information about model parameters, see the manual.
8. After all of the parameters have been defined, click OK and the ROC curve will be generated.
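For readers who want to see how a ROC curve is put together, the following Python sketch builds the (false positive rate, true positive rate) points by sweeping a threshold over model scores, then estimates the area under the curve with the trapezoidal rule. This is the generic textbook construction, not ModEco's implementation; the function names and the example scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points as (false positive rate, true positive rate).

    `scores` are model outputs (higher = more likely presence) and
    `labels` are 1 for presence and 0 for absence. One point is
    produced per distinct threshold, starting from (0, 0).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both presence and absence points")

    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted present.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scores and presence/absence labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(round(area_under_curve(curve), 3))
```

A curve that hugs the top-left corner (area close to 1) indicates a model that separates presence from absence well, while a curve along the diagonal (area near 0.5) is no better than chance.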
Figure 4.4: Sample ROC Curve

As shown in Figure 4.4, the ROC curve shows the relationship between the true positive rate and the false positive rate.

Section 5: Utilities

It is important to examine the input environmental and species data when running a model. ModEco provides some basic functions that allow users to visualize the relationship between observed species localities and environmental features. This tutorial covers three of the basic utilities in ModEco: Factor Histogram, Scatter Plot, and Factor Importance Analysis.

5.1: Factor Histogram

The Factor histogram is designed for comparing the distributions of environmental variables between the observed species localities and the whole study area.

1. To create a factor histogram, click [Utilities->Factor histogram]. The Factor histogram window will be displayed (Figure 5.1).
2. Adjust the different parameters to analyze the data.
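The comparison that a factor histogram makes can be sketched in a few lines of Python: bin the values of one environmental factor for the whole study area and for the species localities, then compare the fraction of values in each bin. The function name, the bin edges and the elevation values below are hypothetical and serve only to illustrate the idea; they are not ModEco output.

```python
def relative_histogram(values, bin_edges):
    """Return the fraction of values falling in each bin.

    Bins are [edge_i, edge_i+1), except the last bin, which also
    includes its right edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for value in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= value < bin_edges[i + 1]
            if in_bin or (is_last and value == bin_edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


# Hypothetical elevation values (meters), for illustration only.
study_area = [50, 120, 180, 260, 340, 410, 520, 610, 730, 880]
species_localities = [210, 260, 300, 340, 390]
edges = [0, 200, 400, 600, 800, 1000]

print("study area:", relative_histogram(study_area, edges))
print("localities:", relative_histogram(species_localities, edges))
```

If the localities concentrate in a few bins while the study area is spread across many, the factor is likely to be informative for separating suitable from unsuitable areas.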
Figure 5.1: Factor Histogram

5.2: Scatter plot

The scatter plot is another graphical tool, used to evaluate how well two selected environmental factors discriminate the species distribution.

1. To create a scatter plot, navigate to the Utilities tab on the menu bar and click the Scatter plot button. The Scatter plot window will be displayed (Figure 5.2).

Figure 5.2: Scatter Plot

2. Change the Data group, X, Y and Sampling layers parameters to create different scatter plots.
3. You can also show or hide each type of point (presence, absence, unknown and overlapped) by selecting or unselecting the corresponding checkboxes.
5.3: Factor Importance Analysis

The factor importance analysis is used to examine the contributions of different environmental factors to the overall classification accuracy, based on a specific niche model.

1. Navigate to the Utilities tab on the menu bar and click Factor importance analysis.
2. Select one of the niche models to perform the analysis with.
3. Define the model parameters in the Model window (Figure 5.3) and click OK.

Figure 5.3: Sample of a Possible Model window

4. The Factor importance analysis window will now be displayed (Figure 5.4). It shows the kappa value with Only this factor, Without this factor, and With all factors.

Figure 5.4: Sample Factor importance analysis
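The three kappa columns can be understood through a small Python sketch that refits a model with only one factor, without that factor, and with all factors, and records kappa each time. Everything here is an illustrative assumption: a simple min/max envelope classifier stands in for the niche model, kappa is evaluated on the training points themselves, and the function names and sample values are hypothetical. ModEco's own procedure may differ.

```python
def kappa(tp, fp, fn, tn):
    """Cohen's kappa from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected) if expected < 1 else 0.0


def envelope_kappa(samples, labels, factors):
    """Fit a min/max envelope on presence samples and score it with kappa.

    A sample is predicted present when every selected factor lies within
    the range observed at presence points (a stand-in model, not ModEco's).
    """
    presence = [s for s, y in zip(samples, labels) if y == 1]
    bounds = {
        f: (min(p[f] for p in presence), max(p[f] for p in presence))
        for f in factors
    }
    tp = fp = fn = tn = 0
    for sample, label in zip(samples, labels):
        predicted = all(bounds[f][0] <= sample[f] <= bounds[f][1] for f in factors)
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return kappa(tp, fp, fn, tn)


def factor_importance(samples, labels, factors):
    """Kappa with only each factor, without it, and with all factors."""
    with_all = envelope_kappa(samples, labels, factors)
    table = {}
    for factor in factors:
        others = [f for f in factors if f != factor]
        table[factor] = {
            "only": envelope_kappa(samples, labels, [factor]),
            "without": envelope_kappa(samples, labels, others),
            "all": with_all,
        }
    return table


# Hypothetical samples: three presence points followed by three absences.
samples = [
    {"elev": 100, "temp": 12}, {"elev": 200, "temp": 15}, {"elev": 300, "temp": 18},
    {"elev": 150, "temp": 25}, {"elev": 500, "temp": 14}, {"elev": 600, "temp": 30},
]
labels = [1, 1, 1, 0, 0, 0]
for factor, row in factor_importance(samples, labels, ["elev", "temp"]).items():
    print(factor, {name: round(value, 3) for name, value in row.items()})
```

A factor whose "only" kappa is high and whose removal lowers the "without" kappa well below the "all" value contributes strongly to the classification.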