# Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Size: px
Start display at page:

Download "Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not."

Transcription

1 Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C - We want to build classifier C(X) that uses X to predict class label for Y Often we may be just as interested in estimating probability of each class, i.e. P(Y=y X=X) for y in C Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Simulated dataset used in book for illustration: Default in ISLR library X = (student, balance, income) Y = default (taking values Yes and No) n=10,000 values > head(default) default student balance income 1 No No No Yes No No No No > table(default\$default) No Yes Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani

2 4.2 Why not just use linear regression? It is technically possible, for binary data: Example of how we might do it for binary data: Convert Y from categorical to numeric: Yes = 1, No = 0 Least squares Linear regression estimates E(Y X) and for a binary variable Y, we know E(Y) = 0*Pr(Y=0) + 1*Pr(Y=1) = Pr(Y=1) We could always produce a classification by saying C(X) = "yes" if our predicted Y is > 0.5 But there are problems, even for the binary case: - We can get predictions that are < 0 or > 1 (linear regression case is on the left of Fig 4.2): So if we want probability estimates, a linear function is not a good idea. Other problems: With more than 2 classes, the conversion to 0/1 is problematic, since it implies an ordering of the different classes: Movie categories C = { Drama, Action, Comedy, Documentary } Choosing Y = 1 if Drama, 2 if Action, 3 if Comedy, 4 if Documentary would imply an ordering of the categories, and give us different results in a linear regression than Y = 1 if Action, 2 if Comedy, 3 if Drama, 4 if Documentary In the next two sections we will learn two methods for classiciation, Logistic regression and Linear Discriminant Analysis

3 4.3 Logistic Regression The logistic model with one X We will focus mostly on the binary case, in which the two classes are labelled as C = {0,1}. This is NOT the same as fitting a least squares regression with 0 / 1 response. First, consider the case of a single predictor, X, which we assume takes numeric values. Logistic regression writes p(x) = Pr(Y=1 X), and uses the form What does this look like? One curve is (beta0,beta1) = (-2,1) (0,2), or (2,0.5) We can re-express (4.2) as: Log odds or "logit" function So a unit change in X results in a change of beta_1 in the log-odds. - Not as nice an interpretation as linear regression - But the sign of the coefficients still gives an indication of the direction of the effect. - From plot of the logit function:beta_0 determines p(0) = exp(beta0) / (1+exp(beta0)) Thought provoking question: If our model is log( p(x) / (1-p(x) ) = beta0 + beta1 x, where's the error term? It's not on the right of the formula

4 4.3.2 Estimating the regression coefficients in a logistic regression We use maximum likelihood to estimate the parameters. Likelihood function = probability of the observed training data. Example: Y X likelihood = P(Y) = Pr(Y1=0)*Pr(Y2=0)*Pr(Y3=1)*Pr(Y4=0)*Pr(Y5=1) = [1-p(1.4)] * [1-p(2.4)] * p(3.5) * [1-p(3.6)] * p(5.0) In the above, expressions like p(1.4) are given by equation (4.2) for p(x) on previous page. These depend on beta0 and beta1. The general form of the likelihood function is Maximum likelihood estimation : choose beta0 and beta1 to maxiize the likelihood function. - That is, what values of beta0 and beta1 make the observed data most probable? This is easily done in R, using the "glm" function. For the "Default" dataset the above estimates indicate what? Making Predictions Default probability for an individual with a balance of \$1,000 is For a balance of \$2,000:

5 We can also make predictions for the effect of being a student on defaulting on your loan. First fit the logistic regression model: It appears that students are more likely to default on a loan than non-students. And the effect is significant (small p-value) Multiple Logistic Regression (logistic regression with more than one variable) What?? Negative??

6 What's going on with the negative "student" coefficient?... the idea of "Confounding" Left: Average default rates for nonstudents (blue) and students (orange) Right: Boxplots of credit card balance for nonstudents and students. Students tend to have higher balances than non-students, and default rates increase with credit card balance. So students tend to default more because they have higher balances. The two predictors "student" and "balance" are confounded. A simple linear regression with only "student" ignores balance, and since students have a higher balance, the estimated effect is that being a student increases the chance of defaulting. Note that for each level of balance (in left plot), students have a lower default rate than nonstudents. The multiple logistic regression is identifying the correct interpretation. That is, after adjusting for the effect of credit card balance, students are less likely to default Logistic regression for > 2 response classes Logistic regression is easily generalized to more than two response classes. One version, in the "glmnet" package in R, uses a linear function for each class: k = 1,..., K This is also referred to as multinomial regression

7 4.4 Linear Discriminant Analysis (LDA) Another way to estimate P(Y X) using linear function. Different from, but related to logistic regression. In some settings (well-separated classes, or small samples in which the within class-distribution of Xs is roughlyt normal), logistic regression estimates can be unstable. LDA can be more stable. LDA is also more popular than logistic methods for multiclass problems. Key idea: - Model the distribution of X within each class (i.e. model P(X Y) ). - Then use Bayes' theorem to turn this around and get an estimate of P(Y X) Bayes Theorem: Definitions: Bayes' Theorem is written in a slightly different form: We will typically use a normal distribution for f_k(x)

8 4.4.2 LDA for p=1 dimension Example: - Both rows: x in one class is N(-2,1), the other N(1,1) - Top row: pi1 = pi2 = Bottom row: pi1 =.8, pi2 =.2 Mathematical details: The Gaussian density has the form NOTE: we assume for now that variances in all classes are equal.

9 Plugging this density into Bayes formula gives the expression below for P(Y=k X=x): To classify at the value X=x, we must calculate p_k(x) for all k=1, 2,..., K classes, and then choose the class with the largest p_k(x). If we take logs of the above expression, and simplify, it turns out that p_k(x) is largest whenever the following discriminant score is largest (it is also calculated at a specific x for all k). Not that this discriminant score is a linear function of x. This is where the "linear" in LDA comes from. If we have K=2 classes and equal class probabilities pi_1 = pi_2 = 0.5, it's possible to show that the decision boundary is at the midpoint of the class means: Example with class means 1.5 and -1.5, and equal class probabilities: Left: Theoretical model, with ideal boundary Right: 20 observations sampled in each class, solid line = estimated decision boundary

10 Estimation: The above derivations assumed we knew the population means, the common variance, and the prior probabilities of each class. In practice, we estimate these from the training data and "plug in" the values to the above formulas. Above formula is the usual sample variance for x values belonging to class k. Note: These estimates are fast to compute, even for very large samples. Estimation of an LDA model is generally faster than a logistic regression, which uses iterative algorithms to fine the maximum likelihood estimates LDA with p > 1 dimensions

11 The univariate normal densities in the previous section are replaced by multivariate normal density functions. They have the form: The covariance matrix determines the shape of the density. It also determines the correlations between the different elements of the vector x. The plot on the previous page shows two cases, one where the covariance matrix is the identity, the other where it has diagonal elements of 1 and off-diagnoal elements of 0.7. Diagonal elements of the covariance matrix are the variances of the elements of x. Calculations using the above density, similar to what we did for p=1 dimension, give the discriminant function This may look messy, but careful examination indicates it is still a linear function of the elements of the vector x Illustration: K = 3 classes and p=2 dimensions We assume pi1 = pi2 = pi3 = 1/3 For plots on next page: 1. The dashed lines below are known as the Bayes decision boundaries. They come from the delta function above, and would give minimum possible error. Of course they are not known unless the mu's and Sigma are known. 2. The ellipses correspond to contours of equal height of the Multivariate Gaussian density function, and give an idea of the distribution of the data within each class. Notice how the shape, size and orientation of the ellipses are the same within each class. 3. In the right plot, samples of size 20 are simulated from each class. The color corresponds to the actual class from which they were simulated, so some points will fall outside the optimal decision boundaries (i.e. be misclassified).

12 Calculating class probabilities for LDA Once we have estimates of the discriminant functions above (delta_k(x)), we can use these to calculate the probabilities of the various classes: The "estimated" delta_k functions require estimation of multivariate means and covariance matrices. Formulas are similar to 1D case. Aside: Inferential statements, such as Confidence Intervals and Hypothesis Tests, are not as common in LDA as they are in logistic regression. This is partly because logistic regression is based on a model for random Y with fixed X, while LDA is based on a model for random X with fixed Y. Credit card default example: - Use two predictors (balance and student) - LDA gives the following results, presented as a "confusion matrix" comparing actual and predicted class: True Default class No Yes Total Predicted No class Yes Total Overall misclassification rate = (252+23) / = 275/10000 = 2.75% BUT, if we predicted "everyone will not default" our misclassification rate would be... Of the true No's we make 23/9667 = 0.2% errors (false positive rate) Of the true Yes's, we make 252/333 = 75.5% errors!!! (false negative rate)

13 Problem: we predicted classes using a threshold of 0.5 for the predicted class probability. - This minimize overall misclassification rate - But it is doesn't give accurate predictions for the "yes" class. Solution: use different threshold value (below: threshold = 0.2) Our overall error rate has increased to ( ) /10000 = 3.73%. - False negative rate (i.e. error rate for "yes") = 138/333 = 41.4% (better than 75.5% before) - False positive = 235/9667 = 2.4% (was 0.2%) We could consider varying the threshold between 0 and 0.5, and look at how our error rates change Another popular way of visualizing the tradeoff for different thresholds is the ROC curve - We vary the threshold from 0 to 1. - At each threshold value we get a false positive and a false negative rate (as in above plot) - Calculate "true positive" rate = (1 - false negative) = fraction of defaulters correctly identified. - Then we plot false positive vs. true positive - Put another way, we plot values on the orange curve vs. 1-values on the blue curve - So we want False positive = 0 and true positive = 1. This is the upper right corner in the plot below.

14 True positive rate = proportion of defaulters correctly identified this is 1-false negative rate Area under the ROC curve (AUC) is often used as a performance measure. High AUC is good. Dotted 45-degree line = result if "student" & "balance" had no relationship with Y In this example, the results for logistic regression are nearly identical to LDA. Note that the LDA model isn't completely sensible for this data. - "Student" = 1 if student, and 0 if non-student. - Why is this "not sensible"? What assumptions is LDA making about the Xs within each class? Quadratic Discriminant Analysis (QDA) Recall that LDA assumes - multivariate normal distribution within each class k - each class has a different mean vector - all classes have the same covariance matrix

15 Quadratic discriminant analysis allows the covariance matrices to be different within each class: LDA vs. QDA: LDA: QDA: By allowing the covariance matrices to be different, we get decision boundaries that are nonlinear. Another variation of LDA, not mentioned in the book is Naive Bayes - Naive Bayes assumes that each class has a different covariance matrix, like QDA - But it also assumes that the within-class covariance matrix is diagonal. - That is, it assumes that conditional on class, the predictors are uncorrelated. - This is a strong assumption - The assumption can be useful when we have very high dimensional data (many Xs, large p)

16 A naive Bayes model can also be used for non-normal data, or data with some normal and some non-normal (e.g. qualitative) variables: - we just assume that within each class the pdf is a product of one-dimensional pdfs:

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

### 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression

Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c11 2013/9/9 page 221 le-tex 221 11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial

### Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Statistical Data Mining Practical Assignment 3 Discriminant Analysis and Decision Trees In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

### Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

### Linear Classification. Volker Tresp Summer 2015

Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong

### Classification Problems

Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### We discuss 2 resampling methods in this chapter - cross-validation - the bootstrap

Statistical Learning: Chapter 5 Resampling methods (Cross-validation and bootstrap) (Note: prior to these notes, we'll discuss a modification of an earlier train/test experiment from Ch 2) We discuss 2

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

### LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

### CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

### Nominal and ordinal logistic regression

Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

### Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

### Machine Learning Logistic Regression

Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

### Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit scoring, contracting.

Prof. Dr. J. Franke All of Statistics 1.52 Binary response variables - logistic regression Response variables assume only two values, say Y j = 1 or = 0, called success and failure (spam detection, credit

### Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

### These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### APPLICATION OF DATA MINING TECHNIQUES FOR DIRECT MARKETING. Anatoli Nachev

86 ITHEA APPLICATION OF DATA MINING TECHNIQUES FOR DIRECT MARKETING Anatoli Nachev Abstract: This paper presents a case study of data mining modeling techniques for direct marketing. It focuses to three

### Introduction to nonparametric regression: Least squares vs. Nearest neighbors

Introduction to nonparametric regression: Least squares vs. Nearest neighbors Patrick Breheny October 30 Patrick Breheny STA 621: Nonparametric Statistics 1/16 Introduction For the remainder of the course,

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Linear Models for Classification

Linear Models for Classification Sumeet Agarwal, EEL709 (Most figures from Bishop, PRML) Approaches to classification Discriminant function: Directly assigns each data point x to a particular class Ci

### Multivariate Normal Distribution

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues

### Predict Influencers in the Social Network

Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

### Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

### Logistic regression modeling the probability of success

Logistic regression modeling the probability of success Regression models are usually thought of as only being appropriate for target variables that are continuous Is there any situation where we might

### A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

### Logistic Regression (1/24/13)

STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

### Detecting Corporate Fraud: An Application of Machine Learning

Detecting Corporate Fraud: An Application of Machine Learning Ophir Gottlieb, Curt Salisbury, Howard Shek, Vishal Vaidyanathan December 15, 2006 ABSTRACT This paper explores the application of several

### Classification by Pairwise Coupling

Classification by Pairwise Coupling TREVOR HASTIE * Stanford University and ROBERT TIBSHIRANI t University of Toronto Abstract We discuss a strategy for polychotomous classification that involves estimating

### Data Mining. Nonlinear Classification

Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15

### Predictive Modeling Techniques in Insurance

Predictive Modeling Techniques in Insurance Tuesday May 5, 2015 JF. Breton Application Engineer 2014 The MathWorks, Inc. 1 Opening Presenter: JF. Breton: 13 years of experience in predictive analytics

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

### The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities

The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities Elizabeth Garrett-Mayer, PhD Assistant Professor Sidney Kimmel Comprehensive Cancer Center Johns Hopkins University 1

### Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

### Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

### Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

### Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

### CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

### Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

### Christfried Webers. Canberra February June 2015

c Statistical Group and College of Engineering and Computer Science Canberra February June (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 829 c Part VIII Linear Classification 2 Logistic

### Pa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classiﬁca6on

Pa8ern Recogni6on and Machine Learning Chapter 4: Linear Models for Classiﬁca6on Represen'ng the target values for classifica'on If there are only two classes, we typically use a single real valued output

### Lecture 8: Signal Detection and Noise Assumption

ECE 83 Fall Statistical Signal Processing instructor: R. Nowak, scribe: Feng Ju Lecture 8: Signal Detection and Noise Assumption Signal Detection : X = W H : X = S + W where W N(, σ I n n and S = [s, s,...,

Lecture 4: Transformations Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture The Ladder of Roots and Powers Changing the shape of distributions Transforming

### Reject Inference in Credit Scoring. Jie-Men Mok

Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

### The Artificial Prediction Market

The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Multivariate Statistical Inference and Applications

Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim

### Sections 2.11 and 5.8

Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and

### SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

SF2940: Probability theory Lecture 8: Multivariate Normal Distribution Timo Koski 24.09.2015 Timo Koski Matematisk statistik 24.09.2015 1 / 1 Learning outcomes Random vectors, mean vector, covariance matrix,

### Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues

Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the

### QUALITY ENGINEERING PROGRAM

QUALITY ENGINEERING PROGRAM Production engineering deals with the practical engineering problems that occur in manufacturing planning, manufacturing processes and in the integration of the facilities and

### Precalculus REVERSE CORRELATION. Content Expectations for. Precalculus. Michigan CONTENT EXPECTATIONS FOR PRECALCULUS CHAPTER/LESSON TITLES

Content Expectations for Precalculus Michigan Precalculus 2011 REVERSE CORRELATION CHAPTER/LESSON TITLES Chapter 0 Preparing for Precalculus 0-1 Sets There are no state-mandated Precalculus 0-2 Operations

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### Multiple Choice Models II

Multiple Choice Models II Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical

### On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers

On the Path to an Ideal ROC Curve: Considering Cost Asymmetry in Learning Classifiers Francis R. Bach Computer Science Division University of California Berkeley, CA 9472 fbach@cs.berkeley.edu Abstract

### Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

### MAXIMIZING RETURN ON DIRECT MARKETING CAMPAIGNS

MAXIMIZING RETURN ON DIRET MARKETING AMPAIGNS IN OMMERIAL BANKING S 229 Project: Final Report Oleksandra Onosova INTRODUTION Recent innovations in cloud computing and unified communications have made a

### 1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006. Principal Components Null Space Analysis for Image and Video Classification

1816 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 7, JULY 2006 Principal Components Null Space Analysis for Image and Video Classification Namrata Vaswani, Member, IEEE, and Rama Chellappa, Fellow,

### Predict the Popularity of YouTube Videos Using Early View Data

000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

### A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois at Urbana-Champaign IBM T. J. Watson Research Center

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

### Geostatistics Exploratory Analysis

Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

### LCs for Binary Classification

Linear Classifiers A linear classifier is a classifier such that classification is performed by a dot product beteen the to vectors representing the document and the category, respectively. Therefore it

### Introduction to Logistic Regression

OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### Dimensionality Reduction: Principal Components Analysis

Dimensionality Reduction: Principal Components Analysis In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Computational Assignment 4: Discriminant Analysis

Computational Assignment 4: Discriminant Analysis -Written by James Wilson -Edited by Andrew Nobel In this assignment, we will investigate running Fisher s Discriminant analysis in R. This is a powerful

### Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

### Regularized Logistic Regression for Mind Reading with Parallel Validation

Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland

### Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to