Predicting the Next Pitch
Gartheeban Ganeshapillai, John Guttag
Massachusetts Institute of Technology, Cambridge, MA, USA

Abstract

If a batter can correctly anticipate the next pitch type, he is in a better position to attack it. That is why batteries worry about having signs stolen or becoming too predictable in their pitch selection. In this paper, we present a machine-learning based predictor of the next pitch type. This predictor incorporates information that is available to a batter, such as the count, the current game state, the pitcher's tendency to throw a particular type of pitch, etc. We use a linear support vector machine with a soft margin to build a separate predictor for each pitcher, and use the weights of the linear classifier to interpret the importance of each feature. We evaluated our method using the STATS Inc. pitch dataset, which contains a record of each pitch thrown in both the regular and post seasons. Our classifiers predict the next pitch more accurately than a naïve classifier that always predicts the pitch most commonly thrown by that pitcher. When our classifiers were trained on data from 2008 and tested on data from 2009, they provided a mean improvement on predicting fastballs of 18% and a maximum improvement of 311%. The most useful features in predicting the next pitch were the Pitcher/Batter prior, the Pitcher/Count prior, the previous pitch, and the score of the game.

1 Introduction

Batters often sit on a pitch. That is to say, they guess that a pitch will have certain attributes, e.g., be a fastball, and prepare themselves to swing if their guess is correct and not swing otherwise [1][2]. In this paper, we present a pitcher-specific machine-learning based system for predicting whether the next pitch will be of a specific type (e.g., a fastball).
This predictor incorporates some information about the current at bat and game situation (e.g., the previous pitch, the count, the current score differential, and the current base runners) [3] and some information about the pitcher's tendency to throw a pitch with that property in various situations (the prior).

We evaluated our method on an MLB STATS Inc. dataset, which contains a record of each pitch thrown in both the regular and post seasons [4]. We trained our model using the data from 2008, and tested it on the data from 2009. For predicting whether the next pitch would be a fastball, our classifier predicts, on average, 70% of the pitches accurately. This represents a mean improvement of 18% over a naïve model that uses the pitcher's historical fastball frequency to predict the next pitch type. For the 187 pitchers whose historical fastball frequency was between 30% and 70%, the mean improvement is 21%. For the 100 pitchers for whom the algorithm performed the best, the improvement averaged 43.5%.

2 Method

We pose the prediction of the next pitch as a binary classification problem. For instance, we try to predict whether the next pitch is a fastball or not, and use a binary classifier to model it. In Section 3, we present the results for six different pitch types. However, for simplicity of explanation, we will talk only about predicting fastballs in this section. Table 1 lists the features in the feature vector that forms the independent variable used to predict the dependent variable, i.e., the next pitch.

Table 1. Feature vector
Game performance / game state: Balls, Strikes, Outs, Inning, Handedness, Score differential, Bases loaded, Number of pitches thrown
Prior probability: Home team, Batting team, Count, Batter, Defense formation
Support of the priors: Home team, Batting team, Count, Batter, Defense formation
Batter profile: Slugging percentage, Runs; for each pitch class: Runs, Slugging percentage
Previous pitch: Pitch type, Pitch result, Velocity, Vertical zone, Horizontal zone
Gradient over last 3 pitches: Velocity, Vertical zone, Horizontal zone

Handedness denotes whether the pitcher and batter are of the same handedness. Prior probabilities are computed for all pitcher/variable pairs (e.g., pitcher/home team and pitcher/batter pairs). Score differential is the absolute value of the difference between the runs scored by both teams.

We build a separate model for each pitcher and train it using historical data. For each pitch thrown by that pitcher we derive a feature vector and associate a binary label (true for fastball and false for non-fastball) with that sample. The samples described by these feature vectors are not perfectly linearly separable.
That is to say, there does not exist a hyperplane such that all of the samples with a positive label lie on one side of the hyperplane and all of the samples with a negative label lie on the other. At first blush, this suggests that one should use a non-linear classifier, e.g., a support vector machine with a Gaussian kernel. Unfortunately, such classifiers produce a model that is difficult to interpret, i.e., it is hard to tell which features contribute most to the result. Consequently, we use a linear support vector machine classifier with a soft margin [5] to build our models.

A support vector machine is a classifier that, given a set of training samples marked as belonging to one of two categories, builds a model that assigns new samples to one category or the other. It constructs a hyperplane that separates the samples belonging to different categories. It minimizes the generalization error of the classifier by maximizing the distance to the hyperplane from the nearest samples on either side. The samples that define the hyperplane of the classifier are called support vectors (SV), and hence the classifier is named a support vector machine (see Figure 1) [5]. When the samples are not linearly separable, a linear support vector machine with a soft margin will choose a hyperplane that splits the examples as cleanly as possible, while still maximizing the distance to the nearest examples (see Figure 1).

Figure 1. Support Vector Machine and soft margin (axes: X1 (Feature 1), X2 (Feature 2); labels: separating hyperplane, soft margin, support vectors, negative and positive samples)

Sometimes, the values of features such as the pitcher/batter prior probability might be empty or unreliable because of small support, e.g., a particular pitcher might not have faced a particular batter, or might have thrown only a few pitches to that batter. In such cases, the prior probability may not be meaningful, and values with low support can be improved by shrinkage towards the global average [7]. The global average can be obtained from the pitcher's overall prior probability. For the pitcher-variable prior s, global average p, support n, and some constant β, the shrunk pitcher-variable prior ŝ is given by

\[ \hat{s} = \frac{n \cdot s + \beta \cdot p}{n + \beta} \]

3 Results

We use the SVM-Light tool to build our models [6]. We use the records from 2008 to train our model, and the records from 2009 to test it. We compare our method's accuracy (Ao) against the accuracy of a naïve model (An) that uses each pitcher's prior probability, i.e., the likelihood from his history. We measure the usefulness of our method by the improvement I over the pitcher's prior:

\[ I = \frac{A_o - A_n}{A_n} \times 100 \]

3.1 Predicting the next pitch type

We train a binary classifier to predict whether the next pitch is going to be a fastball or not. We consider those 359 pitchers who threw at least 300 pitches in both 2008 and 2009. The average accuracy of our model is 70%, compared to the naïve model's accuracy of 59.5%.
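The shrinkage rule from Section 2 and the improvement measure I defined in Section 3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names, the value beta = 10, and the sample inputs are assumptions chosen for the example.

```python
def shrunk_prior(s, p, n, beta=10.0):
    """Shrink a pitcher/variable prior s toward the pitcher's overall prior p.

    n is the support (number of pitches behind s). beta is a smoothing
    constant; beta=10 is an arbitrary illustrative value, not from the paper.
    """
    return (n * s + beta * p) / (n + beta)


def improvement(a_o, a_n):
    """Percentage improvement I of model accuracy a_o over naive accuracy a_n."""
    return (a_o - a_n) / a_n * 100.0


# Hypothetical pitcher: 60% fastballs overall, but 3 fastballs in 3 pitches
# to one batter. The raw prior of 1.0 is pulled most of the way back to 0.6.
print(round(shrunk_prior(s=1.0, p=0.6, n=3), 3))    # 0.692
# With 300 pitches of support the observed prior dominates.
print(round(shrunk_prior(s=1.0, p=0.6, n=300), 3))  # 0.987
# An accuracy of 72% against a naive accuracy of 36% is a 100% improvement.
print(round(improvement(0.72, 0.36), 1))            # 100.0
```

With zero support the function returns the global average p unchanged, which is the behavior one would want for a pitcher who has never faced a given batter.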
Average improvement on a by-pitcher basis was 18% across all the pitchers. On pitchers with a prior probability of throwing a fastball between 0.3 and 0.7, the average improvement is 21%, and for the 100 pitchers for whom we got the highest improvement, the average improvement is 43.5%. For pitchers with a very high prior there is not much room for improvement. For example, for Mariano Rivera our model improved the accuracy from 92.1% (naïve model) to 94.1%.

Since our model considers several factors, it performs well even when pitchers change their dominant pitch types. For instance, even when the frequency of fastballs thrown by Andy Sonnanstine changed from 61% in 2008 to 7% in 2009, our model achieved 32% accuracy in predicting the next pitch, whereas the naïve model managed only 8% accuracy.

Table 2. Performance on specific pitchers. Columns: Name, ERA (2009), Pitches (Training, Test), Accuracy (Ao, An), I.

Greatest improvement
Name                  Ao    An    I
Andy Sonnanstine      32%   8%    311%
Brian Bannister       47%   16%   196%
Miguel Batista        72%   36%   100%
Scott Feldman         65%   35%   85%
Rafael Perez          73%   40%   79%
Francisco Rodriguez   71%   44%   62%
Kyle McClellan        76%   49%   53%
Nick Blackburn        62%   41%   52%

Highest accuracy: Mariano Rivera, Tim Wakefield, Mark DiFelice, Bartolo Colon, Roy Corcoran, Matt Thornton, Aaron Cook, Jesus Colome
Ao An: 94% 92% 93% 89% 92% 92% % 88% 88% 88% 86% 86% 84% 79%
I: 1% 4% 1% 6%

Least accuracy: Andy Sonnanstine, Brian Bannister, Chad Durbin, Francisco Cordero, Jason Jennings, Braden Looper, Darren Oliver, Cliff Lee
Ao An: 32% 8% 47% 16% 56% 49% 58% 39% 51% 48% 47%
I: 311% 196% 16% 48% 14% 22% 27%

3.2 Most useful predictors

In our pitcher-specific model, we use a linear classifier. A linear classifier's weights can be interpreted to identify the most useful predictors. Figure 2 shows the distribution of weights (mean and standard deviation) across pitchers, and Table 3 lists the top 12 predictors and their mean weights across all the pitchers. Notice that home team does not appear in this table, suggesting that the stadium in which the game is played does not have much impact on pitch selection.
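The aggregation described above (mean and standard deviation of each weight across the per-pitcher classifiers, then a ranking of predictors) might look like the following sketch. The feature names and every number here are invented for illustration, and ranking by the absolute value of the mean weight is an assumption; the source does not state the exact ranking rule.

```python
from statistics import mean, pstdev

# Hypothetical per-pitcher weight vectors from linear classifiers.
# Feature names and all numbers are invented for illustration.
FEATURES = ["pitcher_batter_prior", "pitcher_count_prior",
            "prev_pitch_velocity", "home_team"]
weights_by_pitcher = {
    "pitcher_a": [0.9, 0.6, -0.3, 0.02],
    "pitcher_b": [0.7, 0.8, -0.5, -0.01],
    "pitcher_c": [1.1, 0.4, -0.1, 0.00],
}


def summarize_weights(weights_by_pitcher, features):
    """Return (feature, mean weight, std dev) rows, largest |mean| first."""
    # zip(*...) regroups the vectors so each tuple holds one feature's weights.
    columns = zip(*weights_by_pitcher.values())
    rows = [(name, mean(col), pstdev(col))
            for name, col in zip(features, columns)]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


for name, mu, sigma in summarize_weights(weights_by_pitcher, FEATURES):
    print(f"{name:22s} mean={mu:+.3f} std={sigma:.3f}")
```

Comparing weights across features in this way is only meaningful if the features were scaled to comparable ranges before training, which this sketch takes for granted.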
Table 3. Top 12 predictors (columns: Predictor, Weight, Normalized Weight)
Pitcher/Batter prior
Shrunk Pitcher/Batter prior
Pitcher/Count prior
Shrunk Pitcher/Count prior
Previous pitch's velocity
Velocity gradient
Previous pitch type
Inning
Outs
Score difference
Bases occupied

Figure 2. Distribution of classifier weights (axes: feature ID, linear classifier's weights)

3.4 Predictability

Next, we test the pitchers' predictability in various situations. We use our method's accuracy to quantify the predictability of a pitcher under various circumstances.

First, we look at the count (balls, strikes), ordered by its favorability towards the batter. Figure 3 is the scatter plot of the accuracy against the counts across pitchers, with the mean and standard deviation plotted on top. This pattern shows that pitchers are more predictable at counts that are less favorable to the pitcher, such as (3,0), than at more favorable counts such as (1,2).

Figure 3. Predictability against count

Next, we investigate whether a high score differential makes a pitcher more or less predictable. We look at two cases: the score differential is less than three (low), and the score differential is greater than three (high). We observe a pattern when we consider their variations against the inning (Figure 4). At the beginning of the game (before the 4th inning), accuracy is statistically significantly different (P-value 0.01) between these cases, i.e., the pitcher is less predictable when the score difference is high. This difference, however, disappears in the later innings. This suggests that perhaps starting pitchers are more likely to vary their pitch selection based on the score than are relievers.

Since there is not much room for improvement on pitchers with a high prior, we can expect the improvement to be low for these pitchers. However, the improvement has a clear downward trend against the pitcher's prior, starting as early as 0.5 (Figure 5). A plausible explanation is that pitchers with a dominant pitch type trust that pitch, and don't change their pitch selection based on other factors. Hence, the improvement is low for these pitchers.
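The "not much room for improvement" remark can be made concrete by computing the largest improvement I that any model could achieve for a given prior. The sketch below assumes that the naïve model's test accuracy equals max(prior, 1 - prior), i.e., that the pitcher's fastball frequency does not drift between the training and test seasons; the function names are illustrative. It covers only this arithmetic ceiling and does not model the behavioral explanation offered above.

```python
def naive_accuracy(prior):
    """Accuracy of always guessing the more likely class, if the prior holds."""
    return max(prior, 1.0 - prior)


def improvement_ceiling(prior):
    """Largest possible I (in percent): a perfect model versus the naive one."""
    a_n = naive_accuracy(prior)
    return (1.0 - a_n) / a_n * 100.0


for prior in (0.5, 0.7, 0.9):
    print(f"prior={prior:.1f} ceiling={improvement_ceiling(prior):.1f}%")
# prior=0.5 ceiling=100.0%
# prior=0.7 ceiling=42.9%
# prior=0.9 ceiling=11.1%
```

Under this assumption the ceiling already falls from 100% at a prior of 0.5 to about 11% at 0.9, so some downward trend in I is expected from the arithmetic alone.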
Figure 4. Influence of score differential and innings on accuracy

Figure 5. Relationship between prior and improvement

We didn't observe any significant patterns in predictability against other features such as the batter's OPS, home team, or defense configuration. Our model's performance is independent of the pitcher's ERA. Further, the mean accuracy was independent both of whether the pitcher was a starter, closer, or other reliever and of the number of pitches per pitcher in the training set.

3.3 Other pitch types

We also tested our method for other pitch types. Table 4 compares the improvement in the accuracy achieved by our model on different pitch types. Here, n is the number of pitchers. There is no pitcher with a prior probability between 0.3 and 0.7 for sinker or knuckleball.

Table 4. Comparison with other pitch types

Prior between 0.3 and 0.7
Pitch type                  n    Ao     An      I
Fastball                                57.9%   19.5%
Changeup                                64.8%   3.9%
Slider                                  61.5%   6.3%
Curve                                   64.3%   2.5%
Split-Finger or Forkball                68.     8%
Cut Fastball                9    56%    54.4%   3.4%

Prior between 0.4 and 0.6 (columns: n, I)

4 Conclusion

While most pitchers have a dominant pitch, there are clearly other factors that influence their pitch selection. For many pitchers these factors can be used to significantly improve one's ability to predict the type of the next pitch. We make no claim for the optimality of our choice of features. In fact, we expect that further study will lead to feature vectors that yield better performance.
Acknowledgement

We would like to thank STATS Inc. for providing us with the data. This work was supported by Quanta Computers Inc.

References

[1] Smith, David W. Do Batters Learn During a Game? Web. June 7, < September 8, 2010>
[2] Laurila, David. Prospectus Q & A: Joe Mauer. Baseball Prospectus. 8 July Web. 5 Jan <
[3] Appleman, David. Pitch Type Linear Weights. Fangraphs. 20 May Web. 5 Jan <
[4] STATS Inc. Web. 5 Jan <
[5] Mukherjee, S. and Vapnik, V. "Multivariate density estimation: A support vector machine approach". Technical Report, AI Memo 1653, MIT AI Lab
[6] T. Joachims, et al. Making large-scale SVM Learning Practical. "Advances in Kernel Methods - Support Vector Learning", MIT-Press,
[7] Robert M. Bell and Yehuda Koren. "Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights". In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining (ICDM '07).
Maximizing Precision of Hit Predictions in Baseball
Maximizing Precision of Hit Predictions in Baseball Jason Clavelli [email protected] Joel Gottsegen [email protected] December 13, 2013 Introduction In recent years, there has been increasing interest
Support Vector Machine (SVM)
Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j
Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet
Beating the MLB Moneyline
Beating the MLB Moneyline Leland Chen [email protected] Andrew He [email protected] 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series
SUNSET PARK LITTLE LEAGUE S GUIDE TO SCOREKEEPING. By Frank Elsasser (with materials compiled from various internet sites)
SUNSET PARK LITTLE LEAGUE S GUIDE TO SCOREKEEPING By Frank Elsasser (with materials compiled from various internet sites) Scorekeeping, especially for a Little League baseball game, is both fun and simple.
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
The Effects of Atmospheric Conditions on Pitchers
The Effects of Atmospheric Conditions on Rodney Paul Syracuse University Matt Filippi Syracuse University Greg Ackerman Syracuse University Zack Albright Syracuse University Andrew Weinbach Coastal Carolina
Data Mining in Sports Analytics. Salford Systems Dan Steinberg Mikhail Golovnya
Data Mining in Sports Analytics Salford Systems Dan Steinberg Mikhail Golovnya Data Mining Defined Data mining is the search for patterns in data using modern highly automated, computer intensive methods
Big Data Analytics CSCI 4030
High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising
Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers
Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables
Causal Inference and Major League Baseball
Causal Inference and Major League Baseball Jamie Thornton Department of Mathematical Sciences Montana State University May 4, 2012 A writing project submitted in partial fulfillment of the requirements
Early defect identification of semiconductor processes using machine learning
STANFORD UNIVERISTY MACHINE LEARNING CS229 Early defect identification of semiconductor processes using machine learning Friday, December 16, 2011 Authors: Saul ROSA Anton VLADIMIROV Professor: Dr. Andrew
Support Vector Machines Explained
March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
How to increase Bat Speed & Bat Quickness / Acceleration
How to increase Bat Speed & Bat Quickness / Acceleration What is Bat Speed? Bat Speed: Bat speed is measured in miles per hour (MPH) and considers only the highest speed of the bat head (peak velocity)
Suggested Practice Plan Rookie and Tee Ball
Suggested Practice Plan Rookie and Tee Ball Table of Contents General Coaching Tips --------------------------------------------------------- 2 Stretching Exercises ------------------------------------------------------------
Predicting Flight Delays
Predicting Flight Delays Dieterich Lawson [email protected] William Castillo [email protected] Introduction Every year approximately 20% of airline flights are delayed or cancelled, costing
A Guide to Baseball Scorekeeping
A Guide to Baseball Scorekeeping Keeping score for Claremont Little League Spring 2015 Randy Swift [email protected] Paul Dickson opens his book, The Joy of Keeping Score: The baseball world is divided into
FAST INVERSE SQUARE ROOT
FAST INVERSE SQUARE ROOT CHRIS LOMONT Abstract. Computing reciprocal square roots is necessary in many applications, such as vector normalization in video games. Often, some loss of precision is acceptable
Semi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
Machine Learning in Spam Filtering
Machine Learning in Spam Filtering A Crash Course in ML Konstantin Tretyakov [email protected] Institute of Computer Science, University of Tartu Overview Spam is Evil ML for Spam Filtering: General Idea, Problems.
Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
Anti-Spam Filter Based on Naïve Bayes, SVM, and KNN model
AI TERM PROJECT GROUP 14 1 Anti-Spam Filter Based on,, and model Yun-Nung Chen, Che-An Lu, Chao-Yu Huang Abstract spam email filters are a well-known and powerful type of filters. We construct different
Acceleration Introduction: Objectives: Methods:
Acceleration Introduction: Acceleration is defined as the rate of change of velocity with respect to time, thus the concepts of velocity also apply to acceleration. In the velocity-time graph, acceleration
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France [email protected] Massimiliano
Linear smoother. ŷ = S y. where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S
Linear smoother ŷ = S y where s ij = s ij (x) e.g. s ij = diag(l i (x)) To go the other way, you need to diagonalize S 2 Online Learning: LMS and Perceptrons Partially adapted from slides by Ryan Gabbard
Machine Learning Final Project Spam Email Filtering
Machine Learning Final Project Spam Email Filtering March 2013 Shahar Yifrah Guy Lev Table of Content 1. OVERVIEW... 3 2. DATASET... 3 2.1 SOURCE... 3 2.2 CREATION OF TRAINING AND TEST SETS... 4 2.3 FEATURE
LCs for Binary Classification
Linear Classifiers A linear classifier is a classifier such that classification is performed by a dot product beteen the to vectors representing the document and the category, respectively. Therefore it
LINEAR INEQUALITIES. less than, < 2x + 5 x 3 less than or equal to, greater than, > 3x 2 x 6 greater than or equal to,
LINEAR INEQUALITIES When we use the equal sign in an equation we are stating that both sides of the equation are equal to each other. In an inequality, we are stating that both sides of the equation are
Author Gender Identification of English Novels
Author Gender Identification of English Novels Joseph Baena and Catherine Chen December 13, 2013 1 Introduction Machine learning algorithms have long been used in studies of authorship, particularly in
Q1. The game is ready to start and not all my girls are here, what do I do?
Frequently Asked Softball Questions JGSL.ORG Q1. The game is ready to start and not all my girls are here, what do I do? A1. The game begins at the scheduled starting time. If you have less than seven
Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
An Exploration into the Relationship of MLB Player Salary and Performance
1 Nicholas Dorsey Applied Econometrics 4/30/2015 An Exploration into the Relationship of MLB Player Salary and Performance Statement of the Problem: When considering professionals sports leagues, the common
Forecasting and Hedging in the Foreign Exchange Markets
Christian Ullrich Forecasting and Hedging in the Foreign Exchange Markets 4u Springer Contents Part I Introduction 1 Motivation 3 2 Analytical Outlook 7 2.1 Foreign Exchange Market Predictability 7 2.2
Music Mood Classification
Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
Simple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set
EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin
Lecture 2: The SVM classifier
Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
6.4 Normal Distribution
Contents 6.4 Normal Distribution....................... 381 6.4.1 Characteristics of the Normal Distribution....... 381 6.4.2 The Standardized Normal Distribution......... 385 6.4.3 Meaning of Areas under
SVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
Tee Ball Practice Plans and Drills
Tee Ball Practice Plans and Drills Introduction: Whether you are a parent whose child is about to start Tee Ball for the first time or you are about to take on the responsibility of coaching a Tee Ball
How to Win at the Track
How to Win at the Track Cary Kempston [email protected] Friday, December 14, 2007 1 Introduction Gambling on horse races is done according to a pari-mutuel betting system. All of the money is pooled,
Table of Contents. Stretching Exercises --------------------------------------------------------------------5
Table of Contents General Coaching Tips --------------------------------------------------------------------3 Basic Practice Set-Up --------------------------------------------------------------------4
INTANGIBLES. Big-League Stories and Strategies for Winning the Mental Game in Baseball and in Life
Big-League Stories and Strategies for Winning the Mental Game in Baseball and in Life These Baseball IQ test forms are meant as a supplement to your book purchase. It was important to us to provide you
Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues
Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the
Evaluation of Machine Learning Techniques for Green Energy Prediction
arxiv:1406.3726v1 [cs.lg] 14 Jun 2014 Evaluation of Machine Learning Techniques for Green Energy Prediction 1 Objective Ankur Sahai University of Mainz, Germany We evaluate Machine Learning techniques
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
Statistical Learning for Short-Term Photovoltaic Power Predictions
Statistical Learning for Short-Term Photovoltaic Power Predictions Björn Wolff 1, Elke Lorenz 2, Oliver Kramer 1 1 Department of Computing Science 2 Institute of Physics, Energy and Semiconductor Research
Several Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
Pre-Season Pitching Program
Pre-Season Pitching Program One of the biggest challenges facing a coach is determining the best way to prepare pitchers for the opening of practice or tryouts. In professional baseball, managers pretty
2) The three categories of forecasting models are time series, quantitative, and qualitative. 2)
Exam Name TRUE/FALSE. Write 'T' if the statement is true and 'F' if the statement is false. 1) Regression is always a superior forecasting method to exponential smoothing, so regression should be used
SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH
1 SURVIVABILITY OF COMPLEX SYSTEM SUPPORT VECTOR MACHINE BASED APPROACH Y, HONG, N. GAUTAM, S. R. T. KUMARA, A. SURANA, H. GUPTA, S. LEE, V. NARAYANAN, H. THADAKAMALLA The Dept. of Industrial Engineering,
Support Vector Machines
Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric
Introduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
Unit 9 Describing Relationships in Scatter Plots and Line Graphs
Unit 9 Describing Relationships in Scatter Plots and Line Graphs Objectives: To construct and interpret a scatter plot or line graph for two quantitative variables To recognize linear relationships, non-linear
Supervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
Environmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
Chapter 9 Descriptive Statistics for Bivariate Data
9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to eplore the distribution of the values of a single variable)
Cross-Validation. Synonyms Rotation estimation
Comp. by: BVijayalakshmiGalleys0000875816 Date:6/11/08 Time:19:52:53 Stage:First Proof C PAYAM REFAEILZADEH, LEI TANG, HUAN LIU Arizona State University Synonyms Rotation estimation Definition is a statistical
A fast multi-class SVM learning method for huge databases
www.ijcsi.org 544 A fast multi-class SVM learning method for huge databases Djeffal Abdelhamid 1, Babahenini Mohamed Chaouki 2 and Taleb-Ahmed Abdelmalik 3 1,2 Computer science department, LESIA Laboratory,
Employer Health Insurance Premium Prediction Elliott Lui
Employer Health Insurance Premium Prediction Elliott Lui 1 Introduction The US spends 15.2% of its GDP on health care, more than any other country, and the cost of health insurance is rising faster than
Data Mining Techniques for Prognosis in Pancreatic Cancer
Data Mining Techniques for Prognosis in Pancreatic Cancer by Stuart Floyd A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUE In partial fulfillment of the requirements for the Degree
Chapter 10: Linear Kinematics of Human Movement
Chapter 10: Linear Kinematics of Human Movement Basic Biomechanics, 4 th edition Susan J. Hall Presentation Created by TK Koesterer, Ph.D., ATC Humboldt State University Objectives Discuss the interrelationship
Neural Networks and Support Vector Machines
INF5390 - Kunstig intelligens Neural Networks and Support Vector Machines Roar Fjellheim INF5390-13 Neural Networks and SVM 1 Outline Neural networks Perceptrons Neural networks Support vector machines
Intrusion Detection via Machine Learning for SCADA System Protection
Intrusion Detection via Machine Learning for SCADA System Protection S.L.P. Yasakethu Department of Computing, University of Surrey, Guildford, GU2 7XH, UK. [email protected] J. Jiang Department
Who Is Responsible For A Called Strike?
Who Is Responsible For A Called Strike? Joe Rosales, Scott Spratt Baseball Info Solutions Coplay, PA 18037 Abstract The purpose of
LOGISTIC REGRESSION ANALYSIS
LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic
Car Insurance. Havránek, Pokorný, Tomášek
Car Insurance Havránek, Pokorný, Tomášek Outline Data overview Horizontal approach + Decision tree/forests Vertical (column) approach + Neural networks SVM Data overview Customers Viewed policies Bought
FRICTION, WORK, AND THE INCLINED PLANE
FRICTION, WORK, AND THE INCLINED PLANE Objective: To measure the coefficient of static and kinetic friction between a block and an inclined plane and to examine the relationship between the plane's angle
Baseball and Statistics: Rethinking Slugging Percentage. Tanner Mortensen. December 5, 2013
Baseball and Statistics: Rethinking Slugging Percentage Tanner Mortensen December 5, 2013 Abstract In baseball, slugging percentage represents power, speed, and the ability to generate hits. This statistic
Simple Regression Theory II 2010 Samuel L. Baker
Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
A Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear
Detecting Corporate Fraud: An Application of Machine Learning
Detecting Corporate Fraud: An Application of Machine Learning Ophir Gottlieb, Curt Salisbury, Howard Shek, Vishal Vaidyanathan December 15, 2006 ABSTRACT This paper explores the application of several
Project: OUTFIELD FENCES
Project: OUTFIELD FENCES DESCRIPTION: In this project you will work with the equations of projectile motion and use mathematical models to analyze a design problem. Two softball fields in Rolla, Missouri
DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.
DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,
Harleysville Girls Softball Association Youth Softball 6U League Rules
Harleysville Girls Softball Association Youth Softball 6U League Rules ASA rules will apply except as listed below. It is the responsibility of the coach and parents of children to enforce the rules. Uniform
Introduction to Logistic Regression
OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction
Financial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2
Financial Mathematics and Simulation MATH 6740 1 Spring 2011 Homework 2 Due Date: Friday, March 11 at 5:00 PM This homework has 170 points plus 20 bonus points available but, as always, homeworks are graded
Comparing Methods to Identify Defect Reports in a Change Management Database
Comparing Methods to Identify Defect Reports in a Change Management Database Elaine J. Weyuker, Thomas J. Ostrand AT&T Labs - Research 180 Park Avenue Florham Park, NJ 07932 (weyuker,ostrand)@research.att.com
Predict the Popularity of YouTube Videos Using Early View Data
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
The Numbers Behind the MLB Anonymous Students: AD, CD, BM; (TF: Kevin Rader)
The Numbers Behind the MLB Anonymous Students: AD, CD, BM; (TF: Kevin Rader) Abstract This project measures the effects of various baseball statistics on the win percentage of all the teams in MLB. Data
Investigation of Support Vector Machines for Email Classification
Investigation of Support Vector Machines for Email Classification by Andrew Farrugia Thesis Submitted by Andrew Farrugia in partial fulfillment of the Requirements for the Degree of Bachelor of Software
ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015
ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1
HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
M.K. HOME TUITION Mathematics Revision Guides Level: GCSE Higher Tier HISTOGRAMS, CUMULATIVE FREQUENCY AND BOX PLOTS
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. Linear separation For two classes in R^d: simple idea: separate the classes
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES
CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,
Beating the NCAA Football Point Spread
Beating the NCAA Football Point Spread Brian Liu Mathematical & Computational Sciences Stanford University Patrick Lai Computer Science Department Stanford University December 10, 2010 1 Introduction Over
A Game Theoretical Framework for Adversarial Learning
A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,
