NeuralEnsembles: a neural network based ensemble forecasting program for habitat and bioclimatic suitability analysis

Jesse R. O'Hanley
Kent Business School, University of Kent, Canterbury, Kent CT2 7PE, United Kingdom
j.ohanley@kent.ac.uk

Draft: March 2008

Keywords: habitat suitability modeling, bioclimatic envelope modeling, ensemble forecasting, artificial neural networks, species presence/absence data
ABSTRACT

NeuralEnsembles is an integrated modeling and assessment tool for predicting areas of species habitat/bioclimatic suitability based on presence/absence data. This free, Windows-based program, which comes with a friendly graphical user interface, generates predictions using ensembles of artificial neural networks. Models can quickly and easily be produced for multiple species and subsequently be extrapolated either to new regions or under different future climate scenarios. An array of options is provided for optimizing the construction and training of ensemble models. Main outputs of the program include text files of suitability predictions, maps, and various statistical measures of model performance and accuracy.
SUMMARY

Over the past few decades, there has been a proliferation in the ecological and climate change literature dealing with both methodological aspects and applied studies involving species habitat/bioclimatic suitability models. Such models attempt to predict the potential occurrence of a species across some area of interest based on inferred correlations between observations of species presence (and possibly absence) and a small set of environmental or climatic variables, which are normally perceived to be biologically important in determining the species' distribution pattern (Guisan and Zimmermann 2000). Once a model has been parameterized for a chosen species, it can be used to predict areas of likely presence either within the sampling area or in an entirely new, unobserved region (Fielding and Haworth 1995). Alternatively, as is often the case in climate change studies, models can be used to project areas of potentially suitable climate space in either the past or, more commonly, the future under different climate change scenarios (Pearson and Dawson 2003).

More recently, a number of automated software programs have been developed to build species habitat/suitability models and output their results quickly. A few R based programs, namely BIOMOD (Thuiller 2003), GRASP (Lehmann et al. 2002) and PresenceAbsence (Freeman and Moisen 2008), are specifically designed for modeling species presence/absence data [1]. Most others, including the more widely used ones with friendly graphical user interfaces like Maxent (Phillips et al. 2006), BIOMAPPER (Hirzel et al. 2002), GARP (Stockwell and Peters 1999), DOMAIN (Carpenter et al. 1993), and BIOCLIM (Nix 1986), have been specially designed for modeling presence-only datasets [2].

[1] A type of dataset comprising a set of locations with both confirmed presence and confirmed absence of a species.
[2] Datasets comprising a set of confirmed presence locations for a species but no confirmed absence locations. Given a set of presence locations, suitability is then normally defined in a relative manner to either background environment data or a set of pseudo-absence locations (Pearce and Boyce 2006).
In this software note, I present a new, freely available Windows-based program called NeuralEnsembles version 1.0 (< which comes replete with an easy-to-use graphical user interface, for predicting areas of habitat/bioclimatic suitability based on species presence/absence data. Primary estimates of suitability are derived in NeuralEnsembles by training and running artificial neural networks (ANNs). ANNs are non-linear statistical models, inspired by the structure and function of the nervous system, that have the ability to learn underlying patterns of correlation between observed input (environmental/climatic variables) and target (species presence/absence) data. ANNs have been used with great success in a variety of species habitat/bioclimatic suitability analyses (Araújo et al. 2005; Berry et al. 2007; Pearson et al. 2002; Segurado and Araújo 2004; Thuiller 2003), in addition to many other environmental fields including remote sensing (Gopal and Woodcock 1996), climatology (Cavazos 1997), hydrology (Dawson and Wilby 2001), and geology (Lee et al. 2004). ANNs are implemented in NeuralEnsembles using a modification of the open source FANN (fast artificial neural network) library version written in C (D. Oberhoff, pers. comm.; Nissen 2008).

In comparison to other software applications, the most important distinguishing feature of NeuralEnsembles is the use of an ensemble (or committee) of multiple ANN submodels for making predictions about species habitat/bioclimatic suitability. Although the use of ensemble forecasting is still rather nascent within the ecological modeling literature (Araújo and New 2007), there is a large body of evidence from theory and practical work which clearly demonstrates the superiority of using an ensemble model over any single model (Sharkey 1999; Granitto et al. 2005). In practical terms, ensemble forecasting offers improved precision, as measured statistically by the accuracy of the combined predictions.
There are two basic execution modes for NeuralEnsembles, which can be set on the primary user interface (Figure 1). In standard training and projection mode, a user must input one or more species presence/absence data files and a single environmental/climatic training data file. The program uses these data during an iterative training (calibration) phase to produce a set of parameterized ANN submodels for each species. Projections from the individual ANNs are then combined to produce an ensemble forecast for the observed set of presence/absence locations within the study area. Note that the set of locations in a species presence/absence file need only be a subset of the full list of points provided in the environmental training file. The program automatically pairs each observed presence/absence location with its matching array of observed environmental data. As an option, a user can also produce multiple projections of habitat/bioclimatic suitability for different spatial regions or different time periods (e.g., under alternative future climate change scenarios) by simply loading one or more environmental projection files. This is particularly useful when the user wishes to produce a projection for an entire area in which only a subset of locations has observed presence/absence data on which to train a model.

The other basic execution mode for NeuralEnsembles is projection only. This is useful when a trained model has already been developed for a particular species and the user later wishes to use the model to make new projections. In this mode, no model training is carried out. Instead of inputting one or more species presence/absence files and an environmental training file, a user is prompted to input, for each species, the main directory where the outputs from the previous training run have been stored, along with any new set of environmental projection files on which a model is to be run.
Both types of input files should be formatted as space or tab delimited text files with or without a header line. The presence/absence file should have a row for each presence and absence location and a total of 3 columns (x y pres): the first two (x and y) define a pair of geographic coordinates (e.g., longitude and latitude or easting and northing) and the third (pres) is either 1 or 0, depending on the observed presence or absence of the species, respectively. Similarly, the environmental data file (as well as the environmental scenario files) should have a row for each environmental coordinate and a total of 2 + n columns (x y val_1 val_2 ... val_n): the first two (x and y) again correspond to a pair of geographic coordinates and the remaining n columns (val_1 val_2 ... val_n) hold a set of n environmental/climatic predictor variables. Although not strictly necessary, it is strongly suggested that the environmental/climatic variables be normalized before being loaded in NeuralEnsembles (e.g., by computing z-scores or by normalizing onto a 0-1 range using min and max values).

When in training and projection mode, various parameter settings are available in the options setting window (Figure 1) for controlling and optimizing the architecture and training of the ANN submodels. By default, individual ANN submodels are constructed in NeuralEnsembles as fully connected, feed-forward neural networks containing a single hidden layer with ⅔(n+1) hidden units, where n is the number of input variables. The use of sigmoid transfer functions in both the hidden and output layers ensures that the outputs from the ANNs range between 0 and 1 and can thus be interpreted as conditional probability estimates of species presence (Bishop 1995). As an option, both the number of hidden layers and the number of hidden units in each layer can be freely adjusted by the user.
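As an illustration only, the input conventions described above can be sketched in Python. Nothing below is taken from NeuralEnsembles itself: the function names are hypothetical, the header is detected simply by failing to parse as numbers, locations are paired on exactly equal coordinates, and the rounding applied to ⅔(n+1) is an assumption, since the text does not say how a fractional value is handled.

```python
import statistics

def parse_rows(text):
    """Parse space- or tab-delimited numeric rows; non-numeric lines (e.g. a header) are skipped."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            continue  # assumption: any line that is not fully numeric is a header
    return rows

def zscore_columns(env_rows):
    """Standardise each predictor column (everything after x and y) to z-scores."""
    n_pred = len(env_rows[0]) - 2
    cols = [[r[2 + j] for r in env_rows] for j in range(n_pred)]
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [r[:2] + [(r[2 + j] - means[j]) / sds[j] for j in range(n_pred)]
            for r in env_rows]

def pair_locations(pres_rows, env_rows):
    """Pair each (x, y, pres) record with the predictors stored at the same (x, y)."""
    env_by_xy = {(r[0], r[1]): r[2:] for r in env_rows}
    return [(env_by_xy[(x, y)], int(p)) for x, y, p in pres_rows
            if (x, y) in env_by_xy]

def default_hidden_units(n_inputs):
    """Default hidden-layer size 2/3 * (n + 1); rounding to an integer is an assumption."""
    return max(1, round(2 * (n_inputs + 1) / 3))

# Usage with made-up data: three environmental cells, two surveyed locations.
env = zscore_columns(parse_rows("x y temp rain\n0 0 10 100\n1 0 20 300\n2 0 30 200\n"))
pres = parse_rows("x y pres\n0 0 1\n1 0 0\n")
training_patterns = pair_locations(pres, env)
```

Because the presence/absence locations only need to be a subset of the environmental locations, the unsurveyed cell at (2, 0) is simply left out of the training patterns.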
Additionally, instead of having a fixed architecture, a user can decide to use an evolving network structure, based on the Cascade 2 training algorithm (Fahlman et al. 1996), which iteratively adds hidden units/layers to the network in order to optimize the hidden structure of the ANN. When using an evolving topology, hidden units are added to a network according to Akaike's Information Criterion (AIC) (Ren and Zhao 2002).

Although it is possible in NeuralEnsembles to train each ANN submodel independently using a simple bagging type procedure (Breiman 1996), a much more elaborate ensemble construction procedure called SECA (Granitto et al. 2005) is available and used by default. SECA, which stands for stepwise ensemble construction algorithm, attempts to optimize the performance of the entire ensemble through the sequential training and aggregation of the individual ANNs. This is accomplished by first generating for each ANN a separate calibration dataset via bootstrapping the available data, while using the complement set of unsampled data to form a matching validation dataset. In successive fashion, each ANN is then trained until the combined error of the current ANN and any previous-stage aggregate ensemble reaches an approximate minimum [3] in terms of total error on the current calibration and validation datasets. At each new stage, only the weights of the ANN currently being added are updated in the usual manner, while the weights of any previous-stage ANNs are kept constant. Once training is complete, the newly trained ANN is combined with the previous-stage aggregate model.

Ensemble models are by default constructed as a weighted average of the individual ANN submodels. While a simple unweighted average can also be used, weighting has the added benefit of putting greater weight on statistically better performing submodels, which in turn serves to increase the prediction power of the full ensemble. Per Granitto et al.'s (2005) weighted version of SECA (W-SECA), individual weights are computed by normalizing the inverse squared classification error of each ANN on a full dataset of available observations.

[3] Training is stopped when the combined error fails to decrease below a pre-set tolerance for a given number of training epochs or when a maximum number of training epochs has been reached.
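Two ingredients of the procedure described above lend themselves to a short sketch: the bootstrap split into calibration and validation sets, and the W-SECA style weights obtained by normalizing inverse squared errors. The Python below is illustrative only and does not reproduce the sequential training loop of SECA; the function names are hypothetical, and the errors are assumed to be strictly positive.

```python
import random

def bootstrap_split(n_obs, rng):
    """Draw n_obs indices with replacement; indices never drawn form the validation set."""
    calibration = [rng.randrange(n_obs) for _ in range(n_obs)]
    drawn = set(calibration)
    validation = [i for i in range(n_obs) if i not in drawn]
    return calibration, validation

def inverse_squared_error_weights(errors):
    """Weights proportional to 1 / error**2, normalised to sum to 1 (errors assumed > 0)."""
    inverse = [1.0 / (e * e) for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]

def weighted_ensemble(member_predictions, weights):
    """Weighted average of the submodel predictions for one location."""
    return sum(w * p for w, p in zip(weights, member_predictions))

# Usage with made-up numbers: the submodel with half the error gets four times the weight.
calib_idx, valid_idx = bootstrap_split(20, random.Random(1))
weights = inverse_squared_error_weights([0.1, 0.2])
suitability = weighted_ensemble([0.9, 0.5], weights)
```

With errors of 0.1 and 0.2, the weights come out at roughly 0.8 and 0.2, which shows how strongly the inverse squared rule favours the better performing submodel.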
Classification error is determined in NeuralEnsembles by means of the cross-entropy (CE) error function (Bishop 1995), making this another important distinguishing feature of the program. The most common error measure used in ANN training is the standard mean squared error (MSE). MSE derives from the maximum-likelihood principle when the target data follow a Gaussian distribution. While this is generally appropriate for regression type problems, it is not the most appropriate error measure for binary targets like species presence/absence data. In contrast, CE, which derives from the maximum-likelihood function for Bernoulli random variables, is a more natural error measure to use when dealing with classification-type problems (e.g., presence vs absence). The main benefit of using a CE error measure is an improved level of prediction accuracy as measured by Kappa and AUC (see below).

Other options include setting: (1) the number of submodels to be used in the ensemble; (2) several stopping conditions for controlling the duration of network training; (3) the type of training algorithm (standard back propagation, batch back propagation, Rprop and Quickprop) and associated learning parameter; (4) the initial random weights in the network; (5) the number of training runs for each ANN in order to minimize the error of a given submodel; and (6) random shuffling of training patterns on or off.

Key outputs of the model include: (1) text files of all model results for further analysis or manipulation inside a GIS; (2) maps of observed presence and predicted suitability within the study area; (3) various statistics for evaluating a model's accuracy based on discrimination ability and calibration; and (4) saved ANN parameter settings files for making any subsequent projections in projection only mode. Note that if additional environmental projection files have also been loaded, then maps of predicted suitability will also be generated for each possible scenario.
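The contrast between the CE and MSE error measures discussed above can be made concrete with a small sketch. This is a generic textbook formulation rather than the code used in NeuralEnsembles or FANN; averaging over observations and clipping probabilities away from 0 and 1 are assumptions made here for numerical convenience.

```python
import math

def cross_entropy(targets, probabilities, eps=1e-12):
    """Mean Bernoulli negative log-likelihood of 0/1 targets given predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite at p = 0 or p = 1
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)

def mean_squared_error(targets, probabilities):
    """Mean squared difference between 0/1 targets and predicted probabilities."""
    return sum((t - p) ** 2 for t, p in zip(targets, probabilities)) / len(targets)

# Usage: a confident but wrong prediction for an observed presence.
ce_wrong = cross_entropy([1], [0.01])
mse_wrong = mean_squared_error([1], [0.01])
```

For a single observation, MSE can never exceed 1, whereas CE grows without bound as a confident prediction approaches the wrong class, so confidently wrong submodels are penalized much more heavily under CE.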
Projection outputs include, for each location, the mean suitability value produced by the ensemble model as well as the standard error and 95% confidence interval half-width [4] for the estimated mean. Mean suitability values ranging between 0 and 1 are calculated as an unweighted or weighted average (see above) of the individual projections produced by each ANN submodel. Additionally, a binary prediction defining a location as either suitable (1) or unsuitable (0) is produced by applying a user-specified threshold to the mean suitability value. Options for the threshold include the maximum Kappa cutoff value, the sensitivity-specificity cross-over point defined on a receiver operating characteristic (ROC) plot, or the 99%, 95% or 90% sensitivity values (see below for details).

Plotting of maps (an optional setting) is carried out by running an automated program script written in R. R is a free and widely used programming language and software environment for statistical computing and graphics (R Development Core Team 2008). Consequently, map production requires that a sufficiently recent version of R already be installed on the user's computer (see the R website for instructions on downloading and installing). The two basic types of maps (Figure 2) produced for the study area and any environmental projection scenarios are (1) a suitability surface map showing mean suitability values for each location and (2) a suitability distribution map showing areas of potentially suitable or unsuitable habitat/bioclimatic space. For the study area, a map of observed presence locations is also plotted. As an option, the user can have the observed presence locations overlaid on the suitability distribution map in order to facilitate a simple visual inspection of model performance.
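The per-location projection outputs described above (mean, standard error, confidence interval half-width and thresholded binary prediction) can be sketched as follows. This is an assumption-laden illustration, not the program's code: it uses an unweighted mean, treats a mean equal to the threshold as suitable, and always uses a normal quantile for the interval, whereas the program is described as using a Student's t statistic for smaller ensembles. The function name and dictionary keys are hypothetical.

```python
import math
import statistics

def summarize_location(member_predictions, threshold=0.5, confidence=0.95):
    """Summarise the submodel projections for one location."""
    n = len(member_predictions)
    mean = statistics.fmean(member_predictions)
    # Standard error of the mean across the n submodels.
    std_error = statistics.stdev(member_predictions) / math.sqrt(n)
    # Normal quantile only (a simplification of the t-based interval for small ensembles).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return {
        "mean": mean,
        "std_error": std_error,
        "half_width": z * std_error,
        "suitable": int(mean >= threshold),  # assumption: ties count as suitable
    }

# Usage with four made-up submodel projections for a single location.
summary = summarize_location([0.6, 0.7, 0.8, 0.9], threshold=0.5)
```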
[4] Based on a Student's t-test statistic for ensembles with ≤100 submodels and standardized z-values for ensembles of size >100 submodels.

Key statistical outputs include a calibration plot showing the numerical accuracy of the predicted values (Vaughan and Ormerod 2005) and the two most common measures of discrimination accuracy [5] used in species distribution modeling: Cohen's Kappa statistic (K) and the Area Under the receiver operating characteristic Curve (AUC). Kappa provides a measure of similarity between spatial patterns, adjusted for chance agreement (Cohen 1960). Values of Kappa at or below 0 indicate agreement no better than chance between observed and projected distributions, while a value of 1 indicates perfect agreement. Because Kappa must be computed given a threshold for distinguishing presence from absence points, maximum values for Kappa are calculated by iteratively adjusting the threshold from 0 to 1 in fixed increments. AUC is determined from a plot of the Receiver Operating Characteristic (ROC) curve, which plots the model's sensitivity (the proportion of observed presences that are correctly predicted) against its false positive fraction (the proportion of observed absences that are falsely predicted as presences) for all possible classification thresholds. AUC provides an unbiased measure of a model's predictive accuracy that is independent of both species prevalence and classification threshold (Fielding and Bell 1997). Values for AUC range from 0.5 for models with no discrimination ability, to 1 for models with perfect discrimination.

Besides confidence intervals and one-tailed p-values of significance for both the Kappa and AUC statistics, the statistics summary also reports the CE error of the full ensemble and the average CE error of the individual ANN submodels. Under fairly general conditions, it can be shown that the CE error of the full ensemble should normally be less than or equal to the average of the individual ANNs (Bishop 1995). Hence, any positive difference between the two gives a clear measure of the benefit of using an ensemble forecast compared to any single model.

[5] Testing is performed by only combining predictions from networks that have not been trained on a particular input/target pattern.
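The two discrimination measures described above can be sketched in a few lines of Python. This is a generic illustration rather than the program's implementation: the function names are hypothetical, the threshold step of 0.01 is an assumption (the increment actually used is not recoverable from the text), and AUC is computed through its rank interpretation, i.e. the probability that a randomly chosen presence receives a higher score than a randomly chosen absence, which equals the area under the ROC curve.

```python
def cohen_kappa(observed, predicted):
    """Cohen's Kappa for two binary (0/1) sequences of equal length."""
    n = len(observed)
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    p_observed = (tp + tn) / n
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    if p_expected == 1.0:
        return 0.0  # degenerate case: only one class observed and predicted
    return (p_observed - p_expected) / (1.0 - p_expected)

def max_kappa_threshold(observed, scores, steps=100):
    """Scan thresholds 0, 1/steps, ..., 1 and return (best Kappa, threshold)."""
    best_kappa, best_threshold = -1.0, 0.0
    for i in range(steps + 1):
        threshold = i / steps
        predicted = [1 if s >= threshold else 0 for s in scores]
        kappa = cohen_kappa(observed, predicted)
        if kappa > best_kappa:
            best_kappa, best_threshold = kappa, threshold
    return best_kappa, best_threshold

def auc(observed, scores):
    """AUC as the fraction of presence/absence pairs ranked correctly (ties count half)."""
    presences = [s for o, s in zip(observed, scores) if o == 1]
    absences = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0
               for p in presences for a in absences)
    return wins / (len(presences) * len(absences))

# Usage with made-up observations and ensemble suitability scores.
kappa, cutoff = max_kappa_threshold([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
area = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

The pairwise AUC computation is quadratic in the number of locations, which is acceptable for a sketch but would normally be replaced by a rank-based calculation for large datasets.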
To cite NeuralEnsembles or acknowledge its use, please use the following, substituting the version of the application you are using for Version 1.0 along with the appropriate access date:

O'Hanley, J.R. NeuralEnsembles: a neural network based ensemble forecasting program for habitat and bioclimatic suitability analysis (Version 1.0). [Online] Available at: < (Access Date).

ACKNOWLEDGEMENTS

Partial funding for the NeuralEnsembles program was provided by the MONARCH and BRANCH projects. I would especially like to thank Daniel Oberhoff from the Fraunhofer Institute for Applied Information Technology for sharing his modified version of the FANN C library, which implements the cross-entropy error function. This has significantly added to the quality of the end-product.
REFERENCES

Araújo, M.B. and New, M. 2007. Ensemble forecasting of species distributions. - Trends in Ecology and Evolution 22:
Araújo, M.B., Pearson, R.G., Thuiller, W. and Erhard, M. 2005. Validation of species-climate impact models under climate change. - Global Change Biology 11:
Berry, P.M., O'Hanley, J.R., Thomson, C.L., Harrison, P.A., Masters, G.J. and Dawson, T.P. (eds.) 2007. Modelling Natural Resource Responses to Climate Change (MONARCH): MONARCH 3 Contract report. - UKCIP Technical Report, Oxford.
Bishop, C.M. 1995. Neural networks for pattern recognition. - Oxford University Press, Oxford.
Breiman, L. 1996. Bagging predictors. - Machine Learning 24:
Cavazos, T. 1997. Downscaling large-scale circulation to local winter rainfall in north-eastern Mexico. - International Journal of Climatology 17:
Cohen, J. 1960. A coefficient of agreement for nominal scales. - Educational and Psychological Measurement 20:
Dawson, C.W. and Wilby, R.L. 2001. Hydrological modelling using artificial neural networks. - Progress in Physical Geography 25:
Fahlman, S.E., Baker, L.D. and Boyan, J.A. 1996. The cascade 2 learning architecture. - Technical Report, CMU-CS-TR96-184, Carnegie Mellon University.
Fielding, A.H. and Bell, J.F. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. - Environmental Conservation 24:
Fielding, A.H. and Haworth, P.F. 1995. Testing the generality of bird-habitat models. - Conservation Biology 9:
Freeman, E.A. and Moisen, G. 2008. PresenceAbsence: an R package for presence absence analysis. - Journal of Statistical Software 23:
Gopal, S. and Woodcock, C. 1996. Remote sensing of forest change using artificial neural networks. - IEEE Transactions on Geoscience and Remote Sensing 34:
Granitto, P.M., Verdes, P.F. and Ceccatto, H.A. 2005. Neural network ensembles: evaluation of aggregation algorithms. - Artificial Intelligence 163:
Guisan, A. and Zimmermann, N.E. 2000. Predictive habitat distribution models in ecology. - Ecological Modelling 135:
Lee, S., Ryu, J.H., Won, J.S. and Park, H.J. 2004. Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. - Engineering Geology 71:
Nissen, S. 2008. Fast Artificial Neural Network Library (FANN). < (March 2008).
Nix, H.A. 1986. A biogeographic analysis of Australian Elapid Snakes. - In: Longmore, R. (ed.), Atlas of Elapid Snakes of Australia. Australian Flora and Fauna Series Number 7, Australian Government Publishing Service: Canberra, pp
Pearce, J.L. and Boyce, M.S. 2006. Modelling distribution and abundance with presence-only data. - Journal of Applied Ecology 43:
Pearson, R.G. and Dawson, T.E. 2003. Predicting the impacts of climate change on the distribution of species: are bioclimatic envelope models useful? - Global Ecology and Biogeography 12:
Pearson, R.G., Dawson, T.E., Berry, P.M. and Harrison, P.A. 2002. SPECIES: a spatial evaluation of climate impact on the envelope of species. - Ecological Modelling 154:
Phillips, S.J., Anderson, R.P. and Schapire, R.E. 2006. Maximum entropy modeling of species geographic distributions. - Ecological Modelling 190:
R Development Core Team 2008. R: A language and environment for statistical computing. - R Foundation for Statistical Computing, Vienna. < (March 2008).
Ren, L. and Zhao, Z. 2002. An optimal neural network and concrete strength modeling. - Advances in Engineering Software 33:
Segurado, P. and Araújo, M.B. 2004. An evaluation of methods for modelling species distributions. - Journal of Biogeography 31:
Sharkey, A.J.C. 1999. Combining artificial neural nets. - Springer, London.
Thuiller, W. 2003. BIOMOD - optimizing predictions of species distributions and projecting potential future shifts under global change. - Global Change Biology 9:
Vaughan, I.P. and Ormerod, S.J. 2005. The continuing challenges of testing species distribution models. - Journal of Applied Ecology 42:
Figure 1. The NeuralEnsembles main graphical user interface and options settings windows.

Figure 2. Sample suitability surface and suitability distribution maps for Boloria euphrosyne (Pearl-bordered Fritillary).
More informationAdvanced Ensemble Strategies for Polynomial Models
Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer
More informationTransferability and model evaluation in ecological niche modeling: a comparison of GARP and Maxent
Ecography 30: 550 560, 2007 doi: 10.1111/j.2007.0906-7590.05102.x # 2007 The Authors. Journal compilation # 2007 Ecography Subject Editor: Miguel Araújo. Accepted 25 May 2007 Transferability and model
More informationBeating the MLB Moneyline
Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationAnalysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk
Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
More informationGLOVE-BASED GESTURE RECOGNITION SYSTEM
CLAWAR 2012 Proceedings of the Fifteenth International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines, Baltimore, MD, USA, 23 26 July 2012 747 GLOVE-BASED GESTURE
More informationPoint Biserial Correlation Tests
Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable
More informationProgramming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
More informationFraud Detection for Online Retail using Random Forests
Fraud Detection for Online Retail using Random Forests Eric Altendorf, Peter Brende, Josh Daniel, Laurent Lessard Abstract As online commerce becomes more common, fraud is an increasingly important concern.
More informationComparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
More informationAUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.
AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree
More informationHow To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
More informationPLAANN as a Classification Tool for Customer Intelligence in Banking
PLAANN as a Classification Tool for Customer Intelligence in Banking EUNITE World Competition in domain of Intelligent Technologies The Research Report Ireneusz Czarnowski and Piotr Jedrzejowicz Department
More informationMachine Learning for Medical Image Analysis. A. Criminisi & the InnerEye team @ MSRC
Machine Learning for Medical Image Analysis A. Criminisi & the InnerEye team @ MSRC Medical image analysis the goal Automatic, semantic analysis and quantification of what observed in medical scans Brain
More informationApplication of Neural Network in User Authentication for Smart Home System
Application of Neural Network in User Authentication for Smart Home System A. Joseph, D.B.L. Bong, D.A.A. Mat Abstract Security has been an important issue and concern in the smart home systems. Smart
More informationNew Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationPredicting daily incoming solar energy from weather data
Predicting daily incoming solar energy from weather data ROMAIN JUBAN, PATRICK QUACH Stanford University - CS229 Machine Learning December 12, 2013 Being able to accurately predict the solar power hitting
More informationEFFICIENT DATA PRE-PROCESSING FOR DATA MINING
EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationA Genetic Algorithm-Evolved 3D Point Cloud Descriptor
A Genetic Algorithm-Evolved 3D Point Cloud Descriptor Dominik Wȩgrzyn and Luís A. Alexandre IT - Instituto de Telecomunicações Dept. of Computer Science, Univ. Beira Interior, 6200-001 Covilhã, Portugal
More informationNTC Project: S01-PH10 (formerly I01-P10) 1 Forecasting Women s Apparel Sales Using Mathematical Modeling
1 Forecasting Women s Apparel Sales Using Mathematical Modeling Celia Frank* 1, Balaji Vemulapalli 1, Les M. Sztandera 2, Amar Raheja 3 1 School of Textiles and Materials Technology 2 Computer Information
More informationStatistics in Retail Finance. Chapter 7: Fraud Detection in Retail Credit
Statistics in Retail Finance Chapter 7: Fraud Detection in Retail Credit 1 Overview > Detection of fraud remains an important issue in retail credit. Methods similar to scorecard development may be employed,
More informationEvaluation & Validation: Credibility: Evaluating what has been learned
Evaluation & Validation: Credibility: Evaluating what has been learned How predictive is a learned model? How can we evaluate a model Test the model Statistical tests Considerations in evaluating a Model
More informationFine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya
More informationNeural network software tool development: exploring programming language options
INEB- PSI Technical Report 2006-1 Neural network software tool development: exploring programming language options Alexandra Oliveira aao@fe.up.pt Supervisor: Professor Joaquim Marques de Sá June 2006
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationHYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION
HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan
More informationNeural Networks and Support Vector Machines
INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines
More informationA comparison of single and multiple response machine learning algorithms for species distribution modeling
A comparison of single and multiple response machine learning algorithms for species distribution modeling Julie A. Lapidus 1 and Eli L. Moss 2 1 Scripps College, 1030 Columbia Avenue Claremont, CA 91711
More informationNeural Networks and Back Propagation Algorithm
Neural Networks and Back Propagation Algorithm Mirza Cilimkovic Institute of Technology Blanchardstown Blanchardstown Road North Dublin 15 Ireland mirzac@gmail.com Abstract Neural Networks (NN) are important
More informationA Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data
A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt
More informationD-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationIBM SPSS Neural Networks 22
IBM SPSS Neural Networks 22 Note Before using this information and the product it supports, read the information in Notices on page 21. Product Information This edition applies to version 22, release 0,
More informationPrediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
More informationBetter credit models benefit us all
Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis
More informationFeature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde
More informationSub-pixel mapping: A comparison of techniques
Sub-pixel mapping: A comparison of techniques Koen C. Mertens, Lieven P.C. Verbeke & Robert R. De Wulf Laboratory of Forest Management and Spatial Information Techniques, Ghent University, 9000 Gent, Belgium
More informationJournal of Optimization in Industrial Engineering 13 (2013) 49-54
Journal of Optimization in Industrial Engineering 13 (2013) 49-54 Optimization of Plastic Injection Molding Process by Combination of Artificial Neural Network and Genetic Algorithm Abstract Mohammad Saleh
More informationMERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION
MERGING BUSINESS KPIs WITH PREDICTIVE MODEL KPIs FOR BINARY CLASSIFICATION MODEL SELECTION Matthew A. Lanham & Ralph D. Badinelli Virginia Polytechnic Institute and State University Department of Business
More informationGetting Even More Out of Ensemble Selection
Getting Even More Out of Ensemble Selection Quan Sun Department of Computer Science The University of Waikato Hamilton, New Zealand qs12@cs.waikato.ac.nz ABSTRACT Ensemble Selection uses forward stepwise
More informationModeling to improve the customer unit target selection for inspections of Commercial Losses in Brazilian Electric Sector - The case CEMIG
Paper 3406-2015 Modeling to improve the customer unit target selection for inspections of Commercial Losses in Brazilian Electric Sector - The case CEMIG Sérgio Henrique Rodrigues Ribeiro, CEMIG; Iguatinan
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationCash Forecasting: An Application of Artificial Neural Networks in Finance
International Journal of Computer Science & Applications Vol. III, No. I, pp. 61-77 2006 Technomathematics Research Foundation Cash Forecasting: An Application of Artificial Neural Networks in Finance
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationA Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
More informationWATER INTERACTIONS WITH ENERGY, ENVIRONMENT AND FOOD & AGRICULTURE Vol. II Spatial Data Handling and GIS - Atkinson, P.M.
SPATIAL DATA HANDLING AND GIS Atkinson, P.M. School of Geography, University of Southampton, UK Keywords: data models, data transformation, GIS cycle, sampling, GIS functionality Contents 1. Background
More informationNEURAL NETWORKS IN DATA MINING
NEURAL NETWORKS IN DATA MINING 1 DR. YASHPAL SINGH, 2 ALOK SINGH CHAUHAN 1 Reader, Bundelkhand Institute of Engineering & Technology, Jhansi, India 2 Lecturer, United Institute of Management, Allahabad,
More informationPerformance Based Evaluation of New Software Testing Using Artificial Neural Network
Performance Based Evaluation of New Software Testing Using Artificial Neural Network Jogi John 1, Mangesh Wanjari 2 1 Priyadarshini College of Engineering, Nagpur, Maharashtra, India 2 Shri Ramdeobaba
More informationSelf-Organising Data Mining
Self-Organising Data Mining F.Lemke, J.-A. Müller This paper describes the possibility to widely automate the whole knowledge discovery process by applying selforganisation and other principles, and what
More informationChapter 6. The stacking ensemble approach
82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described
More informationData Mining. Supervised Methods. Ciro Donalek donalek@astro.caltech.edu. Ay/Bi 199ab: Methods of Computa@onal Sciences hcp://esci101.blogspot.
Data Mining Supervised Methods Ciro Donalek donalek@astro.caltech.edu Supervised Methods Summary Ar@ficial Neural Networks Mul@layer Perceptron Support Vector Machines SoLwares Supervised Models: Supervised
More informationOpen Access Research on Application of Neural Network in Computer Network Security Evaluation. Shujuan Jin *
Send Orders for Reprints to reprints@benthamscience.ae 766 The Open Electrical & Electronic Engineering Journal, 2014, 8, 766-771 Open Access Research on Application of Neural Network in Computer Network
More informationModel Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
More informationTIBCO Spotfire Business Author Essentials Quick Reference Guide. Table of contents:
Table of contents: Access Data for Analysis Data file types Format assumptions Data from Excel Information links Add multiple data tables Create & Interpret Visualizations Table Pie Chart Cross Table Treemap
More informationClassification and Regression by randomforest
Vol. 2/3, December 02 18 Classification and Regression by randomforest Andy Liaw and Matthew Wiener Introduction Recently there has been a lot of interest in ensemble learning methods that generate many
More informationREVIEW OF ENSEMBLE CLASSIFICATION
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 2, Issue.
More informationII. Methods - 2 - X (i.e. if the system is convective or not). Y = 1 X ). Usually, given these estimates, an
STORMS PREDICTION: LOGISTIC REGRESSION VS RANDOM FOREST FOR UNBALANCED DATA Anne Ruiz-Gazen Institut de Mathématiques de Toulouse and Gremaq, Université Toulouse I, France Nathalie Villa Institut de Mathématiques
More informationStock Prediction using Artificial Neural Networks
Stock Prediction using Artificial Neural Networks Abhishek Kar (Y8021), Dept. of Computer Science and Engineering, IIT Kanpur Abstract In this work we present an Artificial Neural Network approach to predict
More informationTHE RISK DISTRIBUTION CURVE AND ITS DERIVATIVES. Ralph Stern Cardiovascular Medicine University of Michigan Ann Arbor, Michigan. stern@umich.
THE RISK DISTRIBUTION CURVE AND ITS DERIVATIVES Ralph Stern Cardiovascular Medicine University of Michigan Ann Arbor, Michigan stern@umich.edu ABSTRACT Risk stratification is most directly and informatively
More informationPredictive Vegetation Modelling: Comparison of Methods, Effect of Sampling Design and Application on Different Scales
Predictive Vegetation Modelling: Comparison of Methods, Effect of Sampling Design and Application on Different Scales *************************** Dissertation zur Erlangung des akademischen Grades doctor
More information