A Maximum Entropy Approach to Species Distribution Modeling
Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Canada, 2004. Copyright 2004 by the authors.

Steven J. Phillips ([email protected]), AT&T Labs Research, 180 Park Avenue, Florham Park, NJ
Miroslav Dudík ([email protected]) and Robert E. Schapire ([email protected]), Princeton University, Department of Computer Science, 35 Olden Street, Princeton, NJ

Abstract

We study the problem of modeling species geographic distributions, a critical problem in conservation biology. We propose the use of maximum-entropy techniques for this problem, specifically, sequential-update algorithms that can handle a very large number of features. We describe experiments comparing maxent with a standard distribution-modeling tool, called GARP, on a dataset containing observation data for North American breeding birds. We also study how well maxent performs as a function of the number of training examples and training time, analyze the use of regularization to avoid overfitting when the number of examples is small, and explore the interpretability of models constructed using maxent.

1. Introduction

We study the problem of modeling the geographic distribution of a given animal or plant species. This is a critical problem in conservation biology: to save a threatened species, one first needs to know where the species prefers to live, and what its requirements are for survival, i.e., its ecological niche (Hutchinson, 1957). The data available for this problem typically consists of a list of georeferenced occurrence localities, i.e., a set of geographic coordinates where the species has been observed. In addition, there is data on a number of environmental variables, such as average temperature, average rainfall, and elevation, which have been measured or estimated across a geographic region of interest. The goal is to predict which areas within the region satisfy the requirements of the species' ecological niche, and thus form part of the species' potential distribution (Anderson & Martínez-Meyer, 2004). The potential distribution describes where conditions are suitable for survival of the species, and is thus of great importance for conservation. It can also be used to estimate the species' realized distribution, for example by removing areas where the species is known to be absent because of deforestation or other habitat destruction. Although a species' realized distribution may exhibit some spatial correlation, the potential distribution does not, so considering spatial correlation is not necessarily desirable during species distribution modeling.

It is often the case that only presence data is available, indicating the occurrence of the species. Natural history museum and herbarium collections constitute the richest source of occurrence localities (Ponder et al., 2001; Stockwell & Peterson, 2002). Their collections typically have no information about the failure to observe the species at any given location; in addition, many locations have not been surveyed. In the lingo of machine learning, this means that we have only positive examples and no negative examples from which to learn. Moreover, the number of sightings (training examples) will often be very small by machine-learning standards, say a hundred or less.
Thus, the first contribution of this paper is the introduction of a scientifically important problem as a challenging domain for study by the machine learning community. To address this problem, we propose the application of maximum-entropy (maxent) techniques, which have been so effective in other domains, such as natural language processing (Berger et al., 1996). Briefly, in maxent, one is given a set of samples from a distribution over some space, as well as a set of features (real-valued functions) on this space. The idea of maxent is to estimate the target distribution by finding the distribution of maximum entropy (i.e., the one closest to uniform) subject to the constraint that the expected value of each feature under this estimated distribution matches its empirical average. This turns out to be equivalent, under convex duality, to finding the maximum-likelihood Gibbs distribution (i.e., a distribution that is exponential in a linear combination of the features). For species distribution modeling, the occurrence localities of the species serve as the sample points, the geographic region of interest is the space on which this distribution is defined, and the features are the environmental variables (or functions thereof). See Figure 1 for an example. In Section 2, we describe the basics of maxent in greater detail.

Iterative scaling and its variants (Darroch & Ratcliff, 1972; Della Pietra et al., 1997) are standard algorithms for computing the maximum entropy distribution. We use our own variant, which iteratively updates the weights on features sequentially (one by one) rather than in parallel (all at once), along the lines of Collins, Schapire and Singer (2002).
Figure 1. Left to right: Yellow-throated Vireo training localities from the first random partition; an example environmental variable (annual average temperature, higher values in red); the maxent prediction using linear, quadratic and product features; and the GARP prediction. Prediction strength is shown from white (weakest) to red (strongest); red can be interpreted as suitable conditions for the species.

This sequential approach is analogous to AdaBoost, which modifies the weight of a single feature (usually called a base or weak classifier in that context) on each round. As in boosting, this approach allows us to use very large feature spaces. One would intuitively expect an oversize feature space to be a problem for generalization, since it increases the possibility of overfitting, leading others to use feature selection for maxent (Berger et al., 1996). We instead use a regularization approach, introduced in a companion theoretical paper (Dudík et al., 2004), which allows one to prove bounds on the performance of maxent on finite data, even when the number of features is very large or even uncountably infinite. Here we investigate in detail the practical efficacy of the technique for species distribution modeling. We also describe a numerical acceleration method that speeds up learning.

In Section 3, we describe an extensive set of experiments we conducted comparing maxent to a widely used existing distribution modeling algorithm; results of the experiments are described in Section 4. Quite a number of approaches have been suggested for species distribution modeling, including neural nets, genetic algorithms, generalized linear models, generalized additive models, bioclimatic envelopes and more; see Elith (2002) for a comparison. From these, we selected the Genetic Algorithm for Ruleset Prediction (GARP) (Stockwell & Noble, 1992; Stockwell & Peters, 1999), because it has seen widespread recent use to study diverse topics such as global warming (Thomas et al., 2004), infectious diseases (Peterson & Shaw, 2003) and invasive species (Peterson & Robins, 2003); many further applications are cited in these references. GARP was also selected because it is one of the few methods available that does not require absence data (negative examples). We compare GARP and maxent using data derived from the North American Breeding Bird Survey (BBS) (Sauer et al., 2001), an extensive dataset consisting of thousands of occurrence localities for North American birds, used previously for species distribution modeling and in particular for evaluating GARP (Peterson, 2001). The comparison suggests that maxent methods hold great promise for species distribution modeling, often achieving substantially superior performance in controlled experiments relative to GARP.

In addition to comparisons with GARP, we performed experiments testing: (1) the performance of maxent as a function of the number of sample points available, so as to address the all-important question of how much data is enough; (2) the effectiveness of regularization in avoiding overfitting on small sample sizes; and (3) the effectiveness of our numerical acceleration methods. Lastly, it is desirable for a species distribution model to allow interpretation, so as to deduce the most important limiting factors for the species. A noted limitation of GARP is the difficulty of interpreting its models (Elith, 2002).
We show how the models generated by maxent can be put into a form that is easily understood and interpreted by humans.

2. The Maximum Entropy Approach

In this section, we describe our approach to modeling species distributions. As explained above, we are given a space X representing some geographic region of interest. Typically, X is a set of discrete grid cells; here we only assume that X is finite. We also are given a set of points $x_1, \ldots, x_m$ in X, each representing a locality where the species has been observed and recorded. Finally, we are provided with a set of environmental variables defined on X, such as precipitation, elevation, etc.

Given these ingredients, our goal is to estimate the range of the given species. In this paper, we formalize this rather vague goal within a probabilistic framework. Although this inevitably involves simplifying assumptions, what we gain is a language for defining the problem with mathematical precision, as well as a sensible approach for applying machine learning. Unlike others who have studied this problem, we adopt the view that the localities $x_1, \ldots, x_m$ were selected independently from X according to some unknown probability distribution $\pi$, and that our goal is to estimate $\pi$. At the foundation of our approach is the premise that the distribution $\pi$ (or a thresholded version of it) coincides with the biologists' concept of the species' potential distribution. Superficially, this is not unreasonable, although it does ignore the fact that some localities are more likely to have been visited than others. The distribution $\pi$ may therefore exhibit sampling bias, and will be weighted towards areas and environmental conditions that have been better sampled, for example because they are more accessible. That being said, the problem becomes one of density estimation: given $x_1, \ldots, x_m$ chosen independently from some unknown distribution $\pi$, we must construct a distribution $\hat\pi$ that approximates $\pi$.
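To make this setup concrete, here is a minimal sketch in Python/NumPy (our own illustration; the paper reports no code, and all names here are hypothetical). The region X is represented as a finite set of grid cells, and the empirical distribution and feature expectations are computed from the occurrence localities:

```python
import numpy as np

# F[x, j] = value of feature f_j at grid cell x (rows: cells of X, cols: features).
# samples = the grid-cell index of each of the m occurrence localities x_1..x_m.

def empirical_distribution(num_cells, samples):
    """pi_tilde(x) = |{i : x_i = x}| / m over the grid cells of X."""
    counts = np.bincount(samples, minlength=num_cells)
    return counts / len(samples)

def expectation(F, p):
    """p[f_j] = sum_x p(x) f_j(x) for every feature j, for any distribution p on X."""
    return F.T @ p
```

With these helpers, the empirical feature average discussed below is simply `expectation(F, empirical_distribution(len(F), samples))`.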
Common name            Abbreviation   # examples
Gray Vireo             GV                     78
Hutton's Vireo         HV                    198
Plumbeous Vireo        PV                    256
Philadelphia Vireo     PhV                   325
Bell's Vireo           BV                    419
Cassin's Vireo         CV                    424
Blue-headed Vireo      BhV                   973
White-eyed Vireo       WeV                  1271
Yellow-throated Vireo  YV                   1611
Loggerhead Shrike      LS                   1850
Warbling Vireo         WV                   2526
Red-eyed Vireo         RV                   2773

Table 1. Studied species, with number of presence records.

In constructing $\hat\pi$, we also make use of a given set of features $f_1, \ldots, f_n$, where $f_j : X \to \mathbb{R}$. These features might consist of the raw environmental variables, or they might be higher-level features derived from them (see Section 3.3). Let $f$ denote the vector of all $n$ features. For any function $f : X \to \mathbb{R}$, let $\pi[f]$ denote its expectation under $\pi$. Let $\tilde\pi$ denote the empirical distribution, i.e., $\tilde\pi(x) = |\{1 \le i \le m : x_i = x\}| / m$. In general, $\tilde\pi$ may be quite distant, under any reasonable measure, from $\pi$. On the other hand, for a given function $f$, we do expect $\tilde\pi[f]$, the empirical average of $f$, to be rather close to its true expectation $\pi[f]$. It is natural, therefore, to seek an approximation $\hat\pi$ under which each $f_j$'s expectation is equal (or at least very close) to $\tilde\pi[f_j]$.

There will typically be many distributions satisfying these constraints. The maximum entropy principle suggests that, from among all distributions satisfying these constraints, we choose the one of maximum entropy, i.e., the one closest to uniform. Here, as usual, the entropy of a distribution $p$ on X is defined to be $H(p) = -\sum_{x \in X} p(x) \ln p(x)$. Thus, the idea is to estimate $\pi$ by the distribution $\hat\pi$ of maximum entropy subject to the condition that $\hat\pi[f_j] = \tilde\pi[f_j]$ for all features $f_j$. Alternatively, we can consider all Gibbs distributions of the form $q_\lambda(x) = e^{\lambda \cdot f(x)} / Z_\lambda$, where $Z_\lambda = \sum_{x \in X} e^{\lambda \cdot f(x)}$ is a normalizing constant and $\lambda \in \mathbb{R}^n$. Then, following Della Pietra, Della Pietra and Lafferty (1997), it can be proved that the maxent distribution described above is the same as the maximum-likelihood Gibbs distribution, i.e., the distribution $q_\lambda$ that minimizes $\mathrm{RE}(\tilde\pi \,\|\, q_\lambda)$, where $\mathrm{RE}(p \,\|\, q) = \sum_{x \in X} p(x) \ln(p(x)/q(x))$ denotes relative entropy or Kullback-Leibler divergence. Note that the negative log likelihood $\tilde\pi[-\ln q_\lambda]$ (also called the log loss) differs from $\mathrm{RE}(\tilde\pi \,\|\, q_\lambda)$ only by the constant $H(\tilde\pi)$; we therefore use the two interchangeably as objective functions.

2.1. A sequential-update algorithm

There are a number of algorithms for finding the maxent distribution, especially iterative scaling and its variants (Darroch & Ratcliff, 1972; Della Pietra et al., 1997) as well as gradient and second-order descent methods (Malouf, 2002; Salakhutdinov et al., 2003). In this paper, we used a sequential-update algorithm that modifies one weight $\lambda_j$ at a time, as explored by Collins, Schapire and Singer (2002) in a similar setting. We chose this coordinate-wise descent procedure since it is easily applicable when the number of features is very large (or infinite). Specifically, our very simple algorithm works as follows. Assume without loss of generality that each feature $f_j$ is bounded in [0, 1]. On each of a sequence of rounds, we choose to update the feature $f_j$ for which $\mathrm{RE}(\tilde\pi[f_j] \,\|\, q_\lambda[f_j])$ is maximized, where $\lambda$ is the current weight vector (and where $\mathrm{RE}(p \,\|\, q)$, for $p, q \in [0,1]$, is binary relative entropy). We next update $\lambda_j \leftarrow \lambda_j + \alpha$, where

    $\alpha = \ln\left(\frac{\tilde\pi[f_j]\,(1 - q_\lambda[f_j])}{(1 - \tilde\pi[f_j])\,q_\lambda[f_j]}\right).$    (1)

The output distribution $\hat\pi$ is the one defined by the computed weights, i.e., $q_\lambda$.
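The following sketch implements this sequential update, reusing the array conventions of the earlier sketch; it is our illustrative reconstruction, not the authors' code. It assumes features scaled to [0, 1] and clips expectations away from 0 and 1 so the logarithms stay finite:

```python
import numpy as np

def gibbs(F, lam):
    """q_lambda(x) = exp(lambda . f(x)) / Z_lambda over the grid cells."""
    u = F @ lam
    u -= u.max()                       # stabilize the exponentials
    q = np.exp(u)
    return q / q.sum()

def binary_re(p, q):
    """Binary relative entropy RE(p || q) for p, q strictly inside (0, 1)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def sequential_maxent(F, samples, rounds=500, eps=1e-6):
    """Unregularized sequential-update maxent (Section 2.1, Eq. (1))."""
    pi_f = np.clip(F[samples].mean(axis=0), eps, 1 - eps)   # pi_tilde[f_j]
    lam = np.zeros(F.shape[1])
    for _ in range(rounds):
        q_f = np.clip(F.T @ gibbs(F, lam), eps, 1 - eps)    # q_lambda[f_j]
        j = np.argmax(binary_re(pi_f, q_f))                 # most-violated feature
        # Eq. (1): closed-form step for the chosen coordinate.
        lam[j] += np.log(pi_f[j] * (1 - q_f[j]) / ((1 - pi_f[j]) * q_f[j]))
    return lam
```

A call such as `lam = sequential_maxent(F, samples)` yields the fitted weights, and `gibbs(F, lam)` the estimated distribution over the grid.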
Essentially, this algorithm works by altering one weight $\lambda_j$ at a time so as to greedily maximize the likelihood (or an approximation thereof). The procedure is guaranteed to converge to the optimal maximum entropy distribution. The derivation of this algorithm, along with its proof of convergence, is given in a companion paper (Dudík et al., 2004) and is based on techniques explained by Della Pietra, Della Pietra and Lafferty (1997) as well as Collins, Schapire and Singer (2002). To accelerate convergence, we do a line search in each iteration: we evaluate the log loss when $\lambda_j$ is incremented by $2^i \alpha$ for $i = 0, 1, \ldots$ in turn, and choose the last $i$ before the log loss starts to increase. This is similar to line search methods described by Minka (2001).

2.2. Regularization

The basic approach described above computes the maximum entropy distribution $\hat\pi$ for which $\hat\pi[f_j] = \tilde\pi[f_j]$. However, we do not expect $\tilde\pi[f_j]$ to be equal to $\pi[f_j]$, but only close to it. Therefore, in keeping with our motivation, we can soften these constraints to the form

    $\bigl|\hat\pi[f_j] - \tilde\pi[f_j]\bigr| \le \beta_j$    (2)

where $\beta_j$ is an estimate of how close $\tilde\pi[f_j]$, being an empirical average, must be to its true expectation $\pi[f_j]$. Maximizing entropy subject to Eq. (2) turns out to be equivalent to finding the Gibbs distribution $\hat\pi = q_\lambda$ which minimizes

    $\mathrm{RE}(\tilde\pi \,\|\, q_\lambda) + \sum_j \beta_j\,|\lambda_j|.$    (3)

In other words, this approach is equivalent to maximizing the likelihood of the sought-after Gibbs distribution with (weighted) $\ell_1$-regularization. This form of regularization also makes sense because the number of training examples needed to approximate the best Gibbs distribution can be bounded when the $\ell_1$-norm of the weight vector $\lambda$ is bounded (see Dudík et al., 2004, for details). In a Bayesian framework, Eq. (3) corresponds to a negative log posterior under a Laplace prior. Other priors studied for maxent are Gaussian (Chen & Rosenfeld, 2000) and exponential (Goodman, 2003). Laplace priors have been studied in the context of neural networks by Williams (1995).

The regularized formulation can be solved using a simple modification of the above algorithm. On each round, a feature $f_j$ and value $\alpha$ are chosen so as to produce the largest decrease in (an approximation of) the regularized objective function of Eq. (3); the change works out to be

    $-\alpha\,\tilde\pi[f_j] + \ln\bigl(1 + (e^\alpha - 1)\,q_\lambda[f_j]\bigr) + \beta_j\bigl(|\lambda_j + \alpha| - |\lambda_j|\bigr).$

The optimizing $\alpha$ must be either $-\lambda_j$, or given by Eq. (1) with $\tilde\pi[f_j]$ replaced by $\tilde\pi[f_j] - \beta_j$ (provided $\lambda_j + \alpha \ge 0$) or by $\tilde\pi[f_j] + \beta_j$ (provided $\lambda_j + \alpha \le 0$). Thus, the best $\alpha$ (for a given $f_j$) can be computed by trying all three possibilities. Once $f_j$ and $\alpha$ have been selected, we simply update $\lambda_j \leftarrow \lambda_j + \alpha$. As before, this algorithm can be proved to converge to a solution of the problem described above.

Throughout our study we reduced the $\beta_j$ to a single regularization parameter $\beta$ as follows. We expect $|\tilde\pi[f_j] - \pi[f_j]| \approx \sigma[f_j]/\sqrt{m}$, where $\sigma[f_j]$ is the standard deviation of $f_j$ under $\pi$. We therefore approximated $\sigma[f_j]$ by the sample deviation $\tilde\sigma[f_j]$ and used $\beta_j = \beta\,\tilde\sigma[f_j]/\sqrt{m}$.
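A sketch of the regularized coordinate step just described (again our own illustration, under the sign conventions reconstructed above). For a candidate feature, it scores the at most three closed-form candidates for $\alpha$ by the change in the objective of Eq. (3) and returns the best, with $\beta_j$ set as $\beta\,\tilde\sigma[f_j]/\sqrt{m}$:

```python
import numpy as np

def reg_step(pi_fj, q_fj, lam_j, beta_j, eps=1e-6):
    """Best coordinate step alpha for one feature under the l1-regularized
    objective (Eq. (3)). Candidates: alpha = -lam_j, or Eq. (1) applied to
    pi_tilde[f_j] -/+ beta_j, each subject to its sign condition."""
    def eq1(p):
        p = np.clip(p, eps, 1 - eps)
        return np.log(p * (1 - q_fj) / ((1 - p) * q_fj))

    def delta(alpha):
        # Change in the regularized objective caused by the step alpha.
        return (-alpha * pi_fj
                + np.log(1 + (np.exp(alpha) - 1) * q_fj)
                + beta_j * (abs(lam_j + alpha) - abs(lam_j)))

    candidates = [-lam_j]
    a = eq1(pi_fj - beta_j)
    if lam_j + a >= 0:
        candidates.append(a)
    a = eq1(pi_fj + beta_j)
    if lam_j + a <= 0:
        candidates.append(a)
    return min(candidates, key=delta)   # most negative change = biggest decrease

def betas(F, samples, beta):
    """beta_j = beta * sample_std(f_j) / sqrt(m), with one global beta."""
    return beta * F[samples].std(axis=0) / np.sqrt(len(samples))
```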
Table 2. Omission rates in the equalized area test for GARP thresholds of 1 (left) and 10 (right), for each of the twelve species (GV through RV) and their average (Avg). The Area column gives the area of the species' potential distribution as produced by GARP; the other predictions are thresholded to give the same predicted area. The predictions analyzed are: maxent with linear (L); linear and quadratic (LQ); linear, quadratic and product (LQP); and threshold (T) features; and GARP. [The numeric entries of this table did not survive transcription.]

Table 3. AUC values averaged over 10 random partitions of occurrence localities, for each species; predictions analyzed are as in Table 2. [The numeric entries of this table did not survive transcription.]

Figure 2. ROC curves (true positive rate against false positive rate) for the first random partition of occurrence localities of the Loggerhead Shrike and the Yellow-throated Vireo. In both cases, maxent with linear features is the lowest curve, GARP is the second lowest, and the remaining three are very close together. Portions of the LQ and T curves are obscured by the LQP curve.

3. Experimental Methods

3.1. The Breeding Bird Survey

The North American Breeding Bird Survey (Sauer et al., 2001) is a data set with a large amount of high-quality location data. It is well suited to a first evaluation of maxent for species distribution modeling, as the generous quantity of data allows for detailed experiments and statistical analyses. It has also been used to demonstrate the utility of GARP (Peterson, 2001). Roadside surveys are conducted on standard routes during the peak of the nesting season. Each route consists of fifty stops located at 0.5-mile intervals. A three-minute count is conducted at each stop, during which the observer records all birds heard or seen within 0.25 mile of the stop. Data from all fifty stops are combined to obtain the set of species observed on the route. There are 4161 routes within the region covered by the environmental coverages described below.

3.2. Environmental Variables

The environmental variables (coverages) use a North American grid with 0.2-degree square cells, and are all included with the GARP distribution. Some coverages are derived from weather station readings during the period 1961 to 1990 (New et al., 1999); of these we use annual precipitation, number of wet days, average daily temperature, and temperature range. The remaining coverages are derived from a digital elevation model for North America, and consist of elevation, aspect, and slope. Each coverage is defined over a grid, of which 58,065 points have data for all coverages.

3.3. Experimental Design

We chose 12 of the 421 species included in the Breeding Bird Survey to model, and considered a route to be an occurrence locality for a species if it had a presence record for any year of the survey.
The chosen species and the number of routes where each has occurrence localities are shown in Table 1.
The occurrence data was divided into ten random partitions: in each partition, 50% of the occurrence localities were randomly selected for the training set, while the remaining 50% were set aside for testing.

We chose four feature types for maxent to use: the raw environmental variables (linear features); squares of environmental variables (quadratic features); products of pairs of environmental variables (product features); and binary features derived by thresholding environmental variables (threshold features). The latter are equal to one if an environmental variable is above some threshold, and zero otherwise. Use of linear features constrains the mean of each environmental variable in the maxent distribution; adding quadratic features also constrains the variance; and adding product features further constrains the covariance of each pair of environmental variables. On each training set, we ran maxent with four different subsets of the feature types: linear (L); linear and quadratic (LQ); linear, quadratic and product (LQP); and threshold (T); see the illustrative sketch below. We also ran GARP on each training set.

The outputs of GARP and maxent have quite different interpretations. Nevertheless, each can be used to (partially) rank all locations according to their habitability. To compare these rankings, we used receiver operating characteristic (ROC) curves. For each of the runs, we calculated the AUC (area under the ROC curve), and determined the average AUC over the ten occurrence data partitions. See Section 3.5 for further discussion of this metric. The AUC comparison is somewhat biased in maxent's favor, as a continuous prediction will typically have a higher AUC than a discrete prediction. We therefore performed a second comparison, in which we select operating thresholds for GARP that have been widely used in practice, and compare the algorithms only at those operating points. We call this an equalized area test; the details are as follows. We applied two thresholds to each GARP prediction, namely 1 and 10, corresponding to at least one, or all, best-subset models predicting presence (see Section 3.4 for GARP details). These are the most often used GARP thresholds (Anderson & Martínez-Meyer, 2004). For each of the two resulting predictions, we set thresholds for the maxent models that result in prediction of the same area (geographic extent) as GARP. The predictions, now binary and with the same predicted area, are then simply compared using omission rates (the fraction of test localities not predicted present). Again, averages were taken over the 10 random partitions of the occurrence data.

Most applications of species distribution modeling have much less data available than for North American birds. Indeed, species of conservation importance may have extremely few georeferenced locality records, often fewer than 10. To investigate the use of maxent in such limited-data settings, we performed experiments using limited subsets of the Breeding Bird data. We selected increasing subsets of training data in each partition, ran all four versions of maxent, and took the average AUC over the ten partitions. To determine the sensitivity of maxent to the value of β and its interaction with sample size, we varied β and the number of training examples, and took the average AUC over ten partitions for all four versions of maxent.
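To illustrate the four feature types, the following sketch (a hypothetical helper of our own, not from the paper) expands a matrix of raw environmental variables, assumed pre-scaled to [0, 1], into the L, Q, P and T feature families; the subsets used in the experiments are then just choices of `kinds`:

```python
import numpy as np

def make_features(E, kinds=("L", "Q", "P", "T"), n_thresholds=10):
    """Expand raw environmental variables E[x, k] (scaled to [0, 1]) into
    linear (L), quadratic (Q), product (P) and threshold (T) features,
    all of which stay within [0, 1] as Section 2.1 assumes."""
    cols, n = [], E.shape[1]
    if "L" in kinds:
        cols.append(E)                                   # raw variables
    if "Q" in kinds:
        cols.append(E ** 2)                              # squares
    if "P" in kinds:
        cols += [(E[:, a] * E[:, b])[:, None]            # pairwise products
                 for a in range(n) for b in range(a + 1, n)]
    if "T" in kinds:
        for t in np.linspace(0, 1, n_thresholds + 2)[1:-1]:
            cols.append((E > t).astype(float))           # step functions
    return np.hstack(cols)

# e.g., the LQ model: F = make_features(E, kinds=("L", "Q"))
```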
Lastly, to measure the effect of our acceleration method, we performed runs using the first random partition for the Loggerhead Shrike and the Yellow-throated Vireo, both with and without line search for α (as described in Section 2), and measured the log loss on both training and test data as a function of running time.

3.4. Algorithm implementations

For the maxent runs, we ran the iterative algorithm described in Section 2 for 500 rounds, or until the change in the objective function on a single round fell below a small convergence threshold. For the regularization parameter β, to avoid overfitting the test data, we used the same setting of 0.1 for all feature types, except threshold features, for which we used 1.0. In Section 4.4, we describe experiments showing how sensitive our results are to the choice of β.

To reduce the variability inherent in GARP's random search procedure, we made composite GARP predictions using the best-subsets procedure (Anderson et al., 2003), as was done in recent applications (Peterson et al., 2003; Raxworthy et al., 2004). We generated 100 binary models using GARP with default parameter values, then eliminated models with more than 5% intrinsic omission (negative prediction of training localities). If at most 10 models remained, they constituted the best subset; otherwise, we selected the 10 models whose predicted area was closest to the median of the remaining models. The composite prediction gives the number of best-subset models in which each point is predicted suitable (0-10). For Cassin's Vireo, the best subset was empty for most random partitions of occurrence localities, so we increased the intrinsic omission threshold to 10% for that species.

3.5. ROC curves

An ROC curve shows the performance of a classifier whose output depends on a threshold parameter. It plots the true positive rate against the false positive rate for each threshold: a point (x, y) indicates that for some threshold, the classifier classifies a fraction x of negative examples as positive and a fraction y of positive examples as positive. The curve is obtained by joining the dots. The area under an ROC curve (AUC) has a natural statistical interpretation: pick a random positive example and a random negative example; the AUC is the probability that the classifier correctly orders the two points (with random ordering in the case of ties). A perfect classifier therefore has an AUC of 1. However, to use ROC curves with presence-only data, we must interpret as negative examples all grid cells with no occurrence localities, even if they offer good environmental conditions for the species. The maximum attainable AUC is therefore less than one (Wiley et al., 2003), and is smaller for wider-ranging species.
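A sketch of the AUC computation under this presence-only convention (our illustration): positives are the grid cells containing test localities, negatives are all remaining cells, and tied prediction values receive average ranks, matching the random-ordering rule above:

```python
import numpy as np

def presence_only_auc(scores, positive):
    """AUC over grid cells: `scores` are predicted suitabilities, `positive`
    is a boolean mask of cells holding (test) occurrence localities. All
    other cells count as negatives, so the attainable maximum is below 1."""
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    order = np.argsort(scores)
    ranks = np.empty_like(scores)
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):                 # average ranks over ties
        tie = scores == s
        if tie.sum() > 1:
            ranks[tie] = ranks[tie].mean()
    n_pos, n_neg = positive.sum(), (~positive).sum()
    u = ranks[positive].sum() - n_pos * (n_pos + 1) / 2   # Mann-Whitney U
    return u / (n_pos * n_neg)
```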
4. Results

4.1. Equalized Area Test

The results of the equalized area test are shown in Table 2. With a threshold of 1, GARP predicts large areas as having suitable conditions for the species, and all algorithms have very low average omission (with the exception of GARP on Cassin's Vireo). A threshold of 10 causes less overprediction, and reveals more differences between the algorithms. The best results are obtained by maxent with two of the feature sets (LQP and T). These two are superior to GARP on all species, often very substantially; LQ is superior to GARP for all species but BhV.

Figure 3. Learning curves: AUC averaged over 10 partitions for four versions of maxent (L, LQ and LQP with β = 0.1; T with β = 1.0) as a function of the number of training examples (m), shown for eight species (Gray Vireo, Hutton's Vireo, Plumbeous Vireo, Blue-headed Vireo, White-eyed Vireo, Yellow-throated Vireo, Loggerhead Shrike, Red-eyed Vireo). Numbers of training examples are plotted on a logarithmic scale. The average AUC for GARP on all available training examples is included for comparison. Curves for the remaining species look qualitatively similar.

4.2. ROC analysis

Table 3 shows the AUC for each species, averaged over the 10 random partitions of the occurrence localities. Example ROC curves used in computing the averages can be seen in Figure 2, which shows the performance of the algorithms on the first random partition for the Loggerhead Shrike and the Yellow-throated Vireo. The AUC for maxent improves dramatically going from linear (L) to linear plus quadratic (LQ) features, with a small further improvement when product features are added (LQP). The AUC for threshold features (T) is similar to LQP. For all species, the AUC for GARP is lower than for all maxent feature sets except, in some cases, L. Note that GARP is disadvantaged in AUC comparisons by not distinguishing between points in its highest rank (those points predicted present in all best-subset models), as can be seen in Figure 2, where GARP loses area at the left end of the ROC curve. Nevertheless, wherever GARP has data points, maxent with the better feature sets is quite consistently as good as or better than GARP.

4.3. Learning Curve Experiments

Figure 3 shows the AUC averaged over 10 partitions for an increasing number of training examples, for eight of the species. We also include GARP results on full training sets as a baseline. As expected, models with a larger number of features tend to overfit small training sets, but they give more accurate predictions for large training sets. Linear models do not capture species distributions very well and are included only for completeness. With the exception of the Plumbeous Vireo, the three remaining versions of maxent outperform L models already for the smallest training sets. LQP models become better than LQ once training sets are sufficiently large; their performance, however, already matches that of LQ for smaller training sets. T models perform worse than both LQ and LQP for small training sets, but they slightly outperform LQP once training sets reach 400 examples. Learning curves for species with large numbers of examples indicate that, for both LQ and LQP, predictions close to optimal for those models are reached well before the full training set is used.

4.4. Sensitivity to Regularization

Figure 4 shows the sensitivity of maxent to the regularization value β for the LQP and T versions of maxent. For lack of space, we do not present results for the L and LQ versions, and we give sensitivity curves for only four species; curves for the remaining species look qualitatively similar. Note the remarkably consistent peak at β ≈ 1.0 for the threshold-feature curves; the theoretical reasons for this phenomenon require further investigation.
For LQP runs, the peaks are much less pronounced and do not appear at the same value of β across different species. The benefits of regularization in LQP runs diminish as the number of training examples increases (even more so for LQ and L runs, not presented here). This is because the relatively small number of features (compared with threshold features) naturally prevents overfitting of large training sets.

4.5. Feature Profiles

Maxent as we have described it returns a vector $\lambda$ that characterizes the Gibbs distribution $q_\lambda(x) = e^{\lambda \cdot f(x)}/Z_\lambda$ minimizing the (regularized) log loss. When each feature is derived from one environmental variable, the linear combination in the exponent of $q_\lambda$ can be decomposed into a sum of terms, each of which depends on a single environmental variable.
Figure 4. Sensitivity of maxent to regularization: AUC averaged over 10 partitions as a function of β, for threshold features (top) and for linear, quadratic and product features (bottom), for four species (Hutton's Vireo, Blue-headed Vireo, Yellow-throated Vireo, and Loggerhead Shrike). For a fixed value of β, maxent finds better solutions (with a larger AUC) as the number of examples grows. We ran maxent with 10, 32, 100 and 316 training examples; curves from bottom up correspond to these numbers, and curves for higher numbers are missing where fewer training examples were available. Values of β are of the form $10^{i/6}$ and are plotted on a logarithmic scale.

Figure 5. Feature profiles learned on the first partition of the Yellow-throated Vireo, for three types of maxent runs (threshold features with β = 1.0, linear+quadratic features with β = 0.1, and threshold features with β = 0.01) across the environmental variables (dtr6190_ann, h_aspect, h_dem, h_slope, pre6190_ann, tmp6190_ann, wet6190_ann). For every environmental variable, its additive contribution to the exponent of the Gibbs distribution is given as a function of its value. This contribution is the sum of the features derived from that variable, weighted by the corresponding lambdas. Profiles for the three types of runs have been shifted for clarity; this corresponds to adding a constant in the exponent, which has no effect on the resulting model since constants in the exponent cancel with the normalization factor.

Plotting the value of each term as a function of the corresponding environmental variable, we obtain feature profiles for the respective variables. This decomposition can be carried out for L, LQ and T models, but not for LQP models. Note that adding a constant to a profile has no impact on the resulting distribution, as constants in the exponent cancel with $Z_\lambda$. For L models the profiles are linear functions, for LQ models they are quadratic functions, and for T models they can be arbitrary step functions. These profiles provide a characterization of the distribution that is easier to understand than the raw vector $\lambda$.

Figure 5 shows feature profiles for an LQ run on the first partition of the Yellow-throated Vireo, and for two T runs with different values of β. The value β = 0.01 only prevents components of $\lambda$ from becoming extremely large, but does little to prevent heavy overfitting, with numerous peaks capturing single training examples. Raising β to 1.0 completely eliminates these peaks. This is especially prominent for the aspect variable, where the regularized T model as well as the LQ model show no dependence, while the insufficiently regularized T model overfits heavily. Note the rough agreement between the LQ profiles and the regularized T profiles. Peaks in these profiles can be interpreted as intervals of environmental conditions favored by a species. However, from a flat profile we may not conclude that the species' distribution does not depend on the corresponding variable, since variables may be correlated, and maxent will sometimes pick only one of a group of correlated variables.
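Recovering such a profile is mechanical once the features that depend on a given variable are known; a minimal sketch (our own illustration) for one variable:

```python
import numpy as np

def profile(values, weights, feature_fns):
    """Additive contribution of one environmental variable to the exponent of
    q_lambda: sum_j lambda_j * f_j(v), where every f_j in feature_fns depends
    only on this variable, e.g. v and v**2 for an LQ model, or step functions
    1[v > t] for a T model. Shifting the result by a constant is harmless:
    it cancels with the normalization factor Z_lambda."""
    v = np.asarray(values, dtype=float)
    return sum(w * fn(v) for w, fn in zip(weights, feature_fns))

# LQ profile of one variable with fitted weights (w1, w2):
# curve = profile(grid, [w1, w2], [lambda v: v, lambda v: v ** 2])
```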
4.6. Acceleration

For the LQP version of maxent, line search on α substantially accelerated convergence, measured in terms of log loss on both training and test data. Log loss on test data in the first partition decreased with running time (measured on a 1 GHz Pentium) for the Loggerhead Shrike (LS) and the Yellow-throated Vireo (YV), with and without line search, evaluated at 10, 50, 100 and 300 seconds. [The numeric entries of this table did not survive transcription.] The observed acceleration is similar to that obtained by Goodman (2002). Line search made no discernible difference for threshold features.
Indeed, while an approximation is made in the derivation of α in Sections 2.1 and 2.2, the derivation is exact for binary features; hence line search is not needed. Maxent was much faster with threshold features: log loss was within 0.001 of convergence in at most 50 seconds for both species.

5. Conclusions

Species distribution modeling represents a scientifically important area that deserves the attention of the machine learning community, while presenting it with some interesting challenges. In this work, we have shown how to use maxent to predict species distributions. Maxent requires only positive examples and, in our study, is substantially superior to the standard method, performing well with fairly few examples, particularly when regularization is employed. The models generated by maxent have a natural probabilistic interpretation, giving a smooth gradation from the most to the least suitable conditions. We have also shown that the models can be easily interpreted by human experts, a property of great practical importance. While maxent fits the problem of species distribution modeling cleanly and effectively, there are many other techniques that could be used, such as Markov random fields or mixture models. Alternatively, some of our assumptions could be relaxed, mainly that of the independence of sampling. In future work, we plan to address sampling bias and include it in the maxent framework in a principled manner. We leave the question of alternative techniques for attacking this problem open for future research.

Acknowledgments

Thanks to Rob Anderson, Michael Collins, Ned Horning, Claire Kremen, Chris Raxworthy, Sacha Spector and Eleanor Sterling for numerous helpful discussions. R. Schapire and M. Dudík received support through an NSF CCR grant. Part of this work was completed while R. Schapire was employed by AT&T Labs Research. M. Dudík was partially supported by a Gordon Wu fellowship.

References

Anderson, R. P., Lew, D., & Peterson, A. T. (2003). Evaluating predictive models of species' distributions: criteria for selecting optimal models. Ecological Modelling, 162.

Anderson, R. P., & Martínez-Meyer, E. (2004). Modeling species' geographic distributions for preliminary conservation assessments: an implementation with the spiny pocket mice (Heteromys) of Ecuador. Biological Conservation, 116.

Berger, A. L., Della Pietra, S. A., & Della Pietra, V. J. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22.

Chen, S. F., & Rosenfeld, R. (2000). A survey of smoothing techniques for ME models. IEEE Transactions on Speech and Audio Processing, 8.

Collins, M., Schapire, R. E., & Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48.

Darroch, J. N., & Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. The Annals of Mathematical Statistics, 43.

Della Pietra, S., Della Pietra, V., & Lafferty, J. (1997). Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19.

Dudík, M., Phillips, S. J., & Schapire, R. E. (2004). Performance guarantees for regularized maximum entropy density estimation. Proceedings of the 17th Annual Conference on Computational Learning Theory.

Elith, J. (2002). Quantitative methods for modeling species habitat: comparative performance and an application to Australian plants. In S. Ferson and M. Burgman (Eds.), Quantitative methods for conservation biology. New York: Springer-Verlag.

Goodman, J. (2002). Sequential conditional generalized iterative scaling. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 9-16).
Goodman, J. (2003). Exponential priors for maximum entropy models (Technical Report). Microsoft Research. (Available from joshuago/longexponentialprior.ps.)

Hutchinson, G. E. (1957). Concluding remarks. Cold Spring Harbor Symposia on Quantitative Biology, 22.

Malouf, R. (2002). A comparison of algorithms for maximum entropy parameter estimation. Proceedings of the Sixth Conference on Natural Language Learning.

Minka, T. (2001). Algorithms for maximum-likelihood logistic regression (Technical Report). CMU CALD. (Available from minka/papers/logreg.html.)

New, M., Hulme, M., & Jones, P. (1999). Representing twentieth-century space-time climate variability. Part 1: Development of a 1961-90 mean monthly terrestrial climatology. Journal of Climate, 12.

Peterson, A. T. (2001). Predicting species' geographic distributions based on ecological niche modeling. The Condor, 103.

Peterson, A. T., Papes, M., & Kluza, D. A. (2003). Predicting the potential invasive distributions of four alien plant species in North America. Weed Science, 51.

Peterson, A. T., & Robins, C. R. (2003). Using ecological-niche modeling to predict barred owl invasions with implications for spotted owl conservation. Conservation Biology, 17.

Peterson, A. T., & Shaw, J. (2003). Lutzomyia vectors for cutaneous leishmaniasis in southern Brazil: ecological niche models, predicted geographic distributions, and climate change effects. International Journal of Parasitology, 33.

Ponder, W. F., Carter, G. A., Flemons, P., & Chapman, R. R. (2001). Evaluation of museum collection data for use in biodiversity assessment. Conservation Biology, 15.

Raxworthy, C. J., Martínez-Meyer, E., Horning, N., Nussbaum, R. A., Schneider, G. E., Ortega-Huerta, M. A., & Peterson, A. T. (2004). Predicting distributions of known and unknown reptile species in Madagascar. Nature, 426.

Salakhutdinov, R., Roweis, S. T., & Ghahramani, Z. (2003). On the convergence of bound optimization algorithms. Uncertainty in Artificial Intelligence 19.

Sauer, J. R., Hines, J. E., & Fallon, J. (2001). The North American Breeding Bird Survey, results and analysis. USGS Patuxent Wildlife Research Center, Laurel, MD.

Stockwell, D., & Peters, D. (1999). The GARP modelling system: problems and solutions to automated spatial prediction. International Journal of Geographical Information Science, 13.

Stockwell, D. R. B., & Noble, I. R. (1992). Induction of sets of rules from animal distribution data: a robust and informative method of data analysis. Mathematics and Computers in Simulation, 33.

Stockwell, D. R. B., & Peterson, A. T. (2002). Controlling bias in biodiversity data. In J. M. Scott, P. J. Heglund, M. L. Morrison, J. B. Haufler, M. G. Raphael, W. A. Wall and F. B. Samson (Eds.), Predicting species occurrences: issues of accuracy and scale. Washington, DC: Island Press.

Thomas, C. D., Cameron, A., Green, R. E., Bakkenes, M., Beaumont, L. J., Collingham, Y. C., Erasmus, B. F. N., de Siqueira, M. F., Grainger, A., Hannah, L., Hughes, L., Huntley, B., van Jaarsveld, A. S., Midgley, G. F., Miles, L., Ortega-Huerta, M. A., Peterson, A. T., Phillips, O. L., & Williams, S. E. (2004). Extinction risk from climate change. Nature, 427.

Wiley, E. O., McNyset, K. M., Peterson, A. T., Robins, C. R., & Stewart, A. M. (2003). Niche modeling and geographic range predictions in the marine environment using a machine-learning algorithm. Oceanography, 16.
Williams, P. M. (1995). Bayesian regularization and pruning using a Laplace prior. Neural Computation, 7.
AP Physics 1 and 2 Lab Investigations
AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks
Linear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
Algorithmic Trading Session 1 Introduction. Oliver Steinki, CFA, FRM
Algorithmic Trading Session 1 Introduction Oliver Steinki, CFA, FRM Outline An Introduction to Algorithmic Trading Definition, Research Areas, Relevance and Applications General Trading Overview Goals
The CUSUM algorithm a small review. Pierre Granjon
The CUSUM algorithm a small review Pierre Granjon June, 1 Contents 1 The CUSUM algorithm 1.1 Algorithm............................... 1.1.1 The problem......................... 1.1. The different steps......................
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
Mean-Shift Tracking with Random Sampling
1 Mean-Shift Tracking with Random Sampling Alex Po Leung, Shaogang Gong Department of Computer Science Queen Mary, University of London, London, E1 4NS Abstract In this work, boosting the efficiency of
Advanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Conditional Random Fields: An Introduction
Conditional Random Fields: An Introduction Hanna M. Wallach February 24, 2004 1 Labeling Sequential Data The task of assigning label sequences to a set of observation sequences arises in many fields, including
IBM SPSS Direct Marketing 23
IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release
An Empirical Study of Two MIS Algorithms
An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. [email protected],
CS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
Transferability and model evaluation in ecological niche modeling: a comparison of GARP and Maxent
Ecography 30: 550 560, 2007 doi: 10.1111/j.2007.0906-7590.05102.x # 2007 The Authors. Journal compilation # 2007 Ecography Subject Editor: Miguel Araújo. Accepted 25 May 2007 Transferability and model
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.1, No.3,August 2013
FACTORING CRYPTOSYSTEM MODULI WHEN THE CO-FACTORS DIFFERENCE IS BOUNDED Omar Akchiche 1 and Omar Khadir 2 1,2 Laboratory of Mathematics, Cryptography and Mechanics, Fstm, University of Hassan II Mohammedia-Casablanca,
Machine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
GEOENGINE MSc in Geomatics Engineering (Master Thesis) Anamelechi, Falasy Ebere
Master s Thesis: ANAMELECHI, FALASY EBERE Analysis of a Raster DEM Creation for a Farm Management Information System based on GNSS and Total Station Coordinates Duration of the Thesis: 6 Months Completion
Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary
Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:
Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.
Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean
Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen
6.2.8 Neural networks for data mining
6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural
