Regression analysis. MTAT Data Mining Anna Leontjeva



1 Regression analysis MTAT Data Mining 2016 Anna Leontjeva


3 Previous lecture: supervised vs. unsupervised learning? [Figure: Iris setosa, Iris versicolor, Iris virginica]

4 Previous lecture: supervised vs. unsupervised learning? [Figure: R packages and their dependencies]

5 Previous lecture: supervised vs. unsupervised learning? The goal of the supervised approach is to learn a function that maps an input $x$ to an output $y$, given a labeled set of pairs $D = \{(x_i, y_i)\}_{i=1}^{N}$. The goal of the unsupervised approach is to learn interesting patterns given only the inputs $D = \{x_i\}_{i=1}^{N}$.


7 Previous lecture: classification or regression? Given $D = \{(x_i, y_i)\}_{i=1}^{N}$: classification: $y_i \in \{1, \dots, C\}$; regression: $y_i \in \mathbb{R}$.

8 Agenda KNN Linear regression Logistic regression Overfitting Regularization


10 Parametric and non-parametric methods (illustration by Kerby Rosanes): a parametric model has a fixed number of parameters (model-based), e.g. regression; in a non-parametric model the number of parameters grows with the amount of training data (instance-based), e.g. K-nearest neighbors.

11 Parametric and non-parametric methods (illustration by Kerby Rosanes), pros (+) and cons (-). Parametric: + faster to use; - stronger assumptions about the data distribution. Non-parametric: + more flexible; - computationally challenging.

12 Non-parametric: K-nearest neighbors (KNN). To classify a new point x:
- look at the K points in the training set that are closest to x (this requires a distance, e.g. Euclidean);
- count the members of each class in this set;
- assign x the class chosen by majority voting (or return the fraction of each class).
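The KNN steps above can be sketched in base R. This is a minimal illustration, not the lecture's own code: `knn_predict` is a hypothetical helper name, the distance is assumed to be Euclidean, and ties in the vote are broken in favour of the first factor level.

```r
# Minimal KNN classifier (hypothetical helper, for illustration only):
# Euclidean distance + majority vote; ties go to the first factor level.
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  train_x <- as.matrix(train_x)
  new_x <- as.matrix(new_x)
  preds <- character(nrow(new_x))
  for (i in seq_len(nrow(new_x))) {
    # Euclidean distance from the i-th new point to every training point
    d <- sqrt(rowSums(sweep(train_x, 2, new_x[i, ])^2))
    nearest <- order(d)[seq_len(k)]              # indices of the K closest points
    votes <- table(train_y[nearest])             # count members of each class
    preds[i] <- names(votes)[which.max(votes)]   # majority vote
  }
  factor(preds, levels = levels(train_y))
}

# Usage on the built-in iris data: 100 random flowers for training, 50 for testing
set.seed(1)
idx <- sample(nrow(iris), 100)
pred <- knn_predict(iris[idx, 1:4], iris$Species[idx], iris[-idx, 1:4], k = 5)
accuracy <- mean(pred == iris$Species[-idx])
print(accuracy)
```

Returning `votes / k` instead of the winning label would give the class fractions mentioned in the last step.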

14 KNN: simple concept; easy to implement; difficult to implement efficiently; not interpretable (instance-based); asymptotically optimal; suffers from the curse of dimensionality.


16 The curse of dimensionality: increasing the dimensionality (number of features) makes the data points sparse; definitions of density and distance between points become less meaningful; algorithms may perform poorly on high-dimensional data.
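A small simulation makes the point about distances concrete. The sketch below is an illustration under assumptions not in the slides (uniform random data in the unit hypercube, 500 points, and a hypothetical helper `distance_contrast`): it compares the nearest and farthest distances from a random query point. As the dimension grows the ratio approaches 1, so the "nearest" neighbours are barely closer than the farthest ones.

```r
# Ratio of nearest to farthest Euclidean distance from a random query point
# to n uniform random points in [0, 1]^d (hypothetical helper).
distance_contrast <- function(d, n = 500) {
  points <- matrix(runif(n * d), nrow = n)
  query <- runif(d)
  dists <- sqrt(rowSums(sweep(points, 2, query)^2))
  min(dists) / max(dists)
}

set.seed(42)
dims <- c(2, 10, 100, 1000)
contrast <- sapply(dims, distance_contrast)
names(contrast) <- dims
print(round(contrast, 3))   # grows towards 1 as d increases
```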

17 KNN Linear regression Logistic regression Overfitting Regularization

18 Parametric model: Linear regression

19 Simple linear regression. Task: given a list of observations $D = \{(x_i, y_i)\}_{i=1}^{N}$, find a line $\hat{y} = ax + b$ that approximates the correspondence in the data. [Figure: y plotted against x]

20 Simple linear regression: $y = \beta_0 + \beta_1 x + \epsilon$, where $y$ is the output (dependent variable, response) and $x$ is the input (independent variable, feature, explanatory variable, etc.).

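21 Simple linear regression: $y = \beta_0 + \beta_1 x + \epsilon$, where $\beta_0$ is the intercept (bias), the mean of $y$ when $x = 0$; $\epsilon$ is the noise (error term, residual), which captures what we are not able to predict with $x$; $\beta_1$ is the coefficient (slope, or weight $w$), which shows how much the output increases when the input increases by one unit.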

22 Simple linear regression: $y = \beta_0 + \beta_1 x + \epsilon$ [Figure]

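23 Simple linear regression: we search for a function $\hat{y} = f(x)$ that minimizes the mean squared error (MSE): $MSE = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i - \beta_0 - \beta_1 x_i)^2$, which means taking the derivatives with respect to $\beta_0$ and $\beta_1$ and solving the system of equations $\frac{\partial MSE}{\partial \beta_0} = 0$, $\frac{\partial MSE}{\partial \beta_1} = 0$.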
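The step from the two derivative conditions to the closed-form solution can be written out explicitly. A short derivation, using the same MSE as above: the first condition gives $\beta_0$, which is then substituted into the second.

```latex
\begin{aligned}
\frac{\partial MSE}{\partial \beta_0} &= -\frac{2}{N}\sum_{i=1}^{N}(y_i - \beta_0 - \beta_1 x_i) = 0
  &&\Rightarrow\; \beta_0 = \bar{y} - \beta_1\bar{x} \\
\frac{\partial MSE}{\partial \beta_1} &= -\frac{2}{N}\sum_{i=1}^{N}x_i(y_i - \beta_0 - \beta_1 x_i) = 0
  &&\Rightarrow\; \sum_{i=1}^{N}x_i\big[(y_i - \bar{y}) - \beta_1(x_i - \bar{x})\big] = 0 \\
&\text{since } \textstyle\sum_i \bar{x}(y_i - \bar{y}) = \sum_i \bar{x}(x_i - \bar{x}) = 0:
  &&\Rightarrow\; \beta_1 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}
\end{aligned}
```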


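25 Simple linear regression: solving $\frac{\partial MSE}{\partial \beta_0} = 0$, $\frac{\partial MSE}{\partial \beta_1} = 0$ gives the closed-form solution*
$\beta_0 = \bar{y} - \beta_1 \bar{x}, \qquad \beta_1 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}$, where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$.
*A closed-form solution solves a given problem in terms of functions and mathematical operations from a generally accepted set of operations.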
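As a sanity check, the closed-form formulas can be evaluated directly in R on the `faithful` data used in the next example and compared with `lm()`; the two should agree up to floating-point error. A minimal sketch:

```r
# Closed-form simple linear regression, compared with lm()
x <- faithful$waiting
y <- faithful$eruptions
beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0 <- mean(y) - beta1 * mean(x)

fit <- lm(eruptions ~ waiting, data = faithful)
print(cbind(closed_form = c(beta0, beta1), lm = unname(coef(fit))))
```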

26 Simple linear regression: example. A built-in R dataset: a collection of observations of the Old Faithful geyser in Yellowstone National Park, USA.
> data(faithful)
> head(faithful)
  eruptions waiting
  …
> dim(faithful)
[1] 272   2
eruptions: the duration of the geyser eruptions (in mins); waiting: the length of the waiting period until the next one (in mins).
> model <- lm(data=faithful, eruptions ~ waiting)
What model do we define here? What is the input and what is the output?

27 Simple linear regression: example
> summary(model)
Call:
lm(formula = eruptions ~ waiting, data = faithful)
Residuals:
    Min      1Q  Median      3Q     Max
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        …          …       …   <2e-16 ***
waiting            …          …       …   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: … on 270 degrees of freedom
Multiple R-squared: …, Adjusted R-squared: …
F-statistic: 1162 on 1 and 270 DF,  p-value: < 2.2e-16
The coefficient of determination is $R^2 = 1 - \frac{\sum_i (y_i - f(x_i))^2}{\sum_i (y_i - \bar{y})^2}$.
The fitted model is: eruptions = … + … × waiting
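The $R^2$ formula can be checked against the value reported by `summary()`. A small sketch, refitting the same model as on the slide:

```r
model <- lm(eruptions ~ waiting, data = faithful)
y <- faithful$eruptions
y_hat <- fitted(model)
# R^2 = 1 - (residual sum of squares) / (total sum of squares)
r_squared <- 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
print(c(manual = r_squared, summary = summary(model)$r.squared))
```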


29 Simple linear regression: example in R. The fitted model is: eruptions = … + … × waiting. What is the eruption time if waiting was 70?
> … + … * 70
[1] …
> coef(model)[[1]] + coef(model)[[2]]*70
Calculating predictions for a new set:
> test_set = data.frame(waiting=c(70,80,100))
> predict(model, newdata=test_set)

30 Machine learning secret sauce [Figure: Data, Train, Test, Remember]

31 Simple linear regression: example in R
library(ggplot2)
train_idx <- sample(nrow(faithful), 172)   # 172 rows for training, the remaining 100 for testing
train <- faithful[train_idx,]
test <- faithful[-train_idx,]
model <- lm(data=train, eruptions ~ waiting)
test$predictions <- predict(model, newdata=test)
MSE <- mean((test$eruptions - test$predictions)^2)
> MSE
[1] …
ggplot(train, aes(x=waiting, y=eruptions)) + geom_point() + geom_smooth(method='lm') + theme_bw()
ggplot(test, aes(x=eruptions, y=predictions)) + geom_point(color='red') + theme_bw() + geom_abline(intercept = 0, slope = 1)

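32 Multivariate linear regression: all the same, but instead of one feature, $x$ is a $k$-dimensional vector $x_i = (x_{i1}, x_{i2}, \dots, x_{ik})$. The model is the linear combination of all features: $\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$. In matrix representation, $\hat{\mathbf{y}} = X\boldsymbol{\beta}$:
$\begin{pmatrix}\hat{y}_1 \\ \vdots \\ \hat{y}_n\end{pmatrix} = \begin{pmatrix}1 & x_{11} & \dots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \dots & x_{nk}\end{pmatrix}\begin{pmatrix}\beta_0 \\ \vdots \\ \beta_k\end{pmatrix}$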

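33 Multivariate linear regression: recall that for simple regression we solved the system $\frac{\partial MSE}{\partial \beta_0} = 0$, $\frac{\partial MSE}{\partial \beta_1} = 0$, which gave $\beta_0 = \bar{y} - \beta_1 \bar{x}$ and $\beta_1 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}$. For multivariate regression the MSE is $MSE = \frac{1}{N}(\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}}) = \frac{1}{N}(\mathbf{y} - X\boldsymbol{\beta})^T(\mathbf{y} - X\boldsymbol{\beta})$, and its gradient is $\frac{\partial MSE}{\partial \boldsymbol{\beta}} = \frac{1}{N}\left(-2X^T\mathbf{y} + 2X^TX\boldsymbol{\beta}\right)$.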

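34 Multivariate linear regression: setting the gradient to zero, $-2X^T\mathbf{y} + 2X^TX\boldsymbol{\beta} = 0 \Rightarrow \boldsymbol{\beta} = (X^TX)^{-1}X^T\mathbf{y}$. The cost of the matrix inverse is high: inverting a $p \times p$ matrix (here $p = k + 1$) takes $O(p^3)$ operations with standard algorithms, so in practice iterative methods are used (e.g. gradient descent).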
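Both routes to $\boldsymbol\beta$ can be tried in base R. The sketch below is illustrative and relies on assumptions not in the slides: the built-in `mtcars` data (`mpg ~ wt + hp`), a hypothetical `gradient_descent` helper with a fixed learning rate, and standardized features so that one learning rate suits all coefficients. `solve(A, b)` solves the linear system without explicitly forming the inverse.

```r
# Illustrative data: built-in mtcars, mpg modelled by weight and horsepower
X <- model.matrix(~ wt + hp, data = mtcars)   # columns: (Intercept), wt, hp
y <- mtcars$mpg

# 1) Normal equations: solve (X^T X) beta = X^T y
beta_normal <- drop(solve(crossprod(X), crossprod(X, y)))

# 2) Batch gradient descent on MSE = (1/N) ||y - X beta||^2 (hypothetical helper)
gradient_descent <- function(X, y, lr = 0.1, iters = 5000) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  for (i in seq_len(iters)) {
    # gradient (2/N) X^T (X beta - y) = (1/N)(-2 X^T y + 2 X^T X beta)
    grad <- (2 / n) * drop(crossprod(X, X %*% beta - y))
    beta <- beta - lr * grad
  }
  beta
}

# Standardize the features (not the intercept column) so one learning rate works
centers <- colMeans(X[, -1])
scales <- apply(X[, -1], 2, sd)
Xs <- cbind(1, scale(X[, -1], center = centers, scale = scales))
gamma <- gradient_descent(Xs, y)
# Map the coefficients back to the original feature scale
beta_gd <- c(gamma[1] - sum(gamma[-1] * centers / scales), gamma[-1] / scales)

beta_lm <- coef(lm(mpg ~ wt + hp, data = mtcars))
print(rbind(normal = beta_normal, gd = unname(beta_gd), lm = beta_lm))
```

Without the standardization step, `hp` (in the hundreds) and `wt` (around 3) would need very different step sizes, and a single learning rate would either diverge or converge very slowly.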

35 Assumptions:
- the relationship between x and y is linear;
- y is distributed normally at each value of x;
- no heteroscedasticity (i.e. the variance of y does not change systematically with x);
- independence and normality of errors;
- lack of multicollinearity (features are not correlated).

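36 Multivariate linear regression: a linear model requires the parameters to be linear, not the features! For example, $y = \beta_0 + \beta_1 x + \beta_2 x^2$ is a linear model, and so is any model in which a feature is replaced by a transformation $x' = \phi(x)$, such as $x^z$, $\sqrt{x}$, $\log(x)$, ...; whereas $y = \beta_0 + \beta_1 x^{\beta_2}$ is not a linear model.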
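In R, "linear in the parameters" means that a transformed feature can simply be put into the formula of `lm()`. A sketch on simulated data (the data-generating coefficients 1, 2 and -0.5 are arbitrary choices for illustration): `I(x^2)` adds $x^2$ as an extra feature, and ordinary least squares recovers the quadratic.

```r
set.seed(7)
x <- runif(200, -3, 3)
# Nonlinear in x, but linear in the parameters
y <- 1 + 2 * x - 0.5 * x^2 + rnorm(200, sd = 0.3)

# I() protects ^ inside a formula, so x^2 enters as an additional feature
quad_fit <- lm(y ~ x + I(x^2))
print(coef(quad_fit))

# A log-transformed feature is likewise still a linear model
log_fit <- lm(eruptions ~ log(waiting), data = faithful)
print(coef(log_fit))
```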


38 Multivariate linear regression
> library(car)   # provides the Prestige dataset (via carData)
> head(Prestige)
                    education income women prestige census type
GOV.ADMINISTRATORS          …      …     …        …      …  prof
GENERAL.MANAGERS            …      …     …        …      …  prof
ACCOUNTANTS                 …      …     …        …      …  prof
PURCHASING.OFFICERS         …      …     …        …      …  prof
CHEMISTS                    …      …     …        …      …  prof
PHYSICISTS                  …      …     …        …      …  prof
> model_multivariate <- lm(data=Prestige, prestige ~ education + log(income, base=10) + women)
> summary(model_multivariate)
Call:
lm(formula = prestige ~ education + log(income, base = 10) + women, data = Prestige)
Residuals:
    Min      1Q  Median      3Q     Max
Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)                   …          …       …   …e-11 ***
education                     …          …       …  < 2e-16 ***
log(income, base = 10)        …          …       …   …e-10 ***
women                         …          …       …        …
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: … on 98 degrees of freedom
Multiple R-squared: …, Adjusted R-squared: 0.83
F-statistic: … on 3 and 98 DF,  p-value: < 2.2e-16
Interpret the coefficients.

39 KNN Linear regression Logistic regression Overfitting Regularization


41 Logistic regression is not regression!* It is classification: $y_i \in \{1, \dots, C\}$.
*It is called so due to its similarity to linear regression.

42 Binary logistic regression means that y is binary: $y \in \{0, 1\}$. $\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$

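43 Binary logistic regression (y is binary: 0 or 1) models the log odds of the probability of "success" as a linear function of the input features: $\mathrm{logit}(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$, where $\pi = P(y = 1 \mid x)$ and $1 - \pi = P(y = 0 \mid x)$.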

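44 Binary logistic regression: let $\pi = P(y = 1 \mid x)$ and denote $M := \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$. Then
$\ln\left(\frac{\pi}{1 - \pi}\right) = M \Rightarrow \frac{\pi}{1 - \pi} = \exp M \Rightarrow \pi = \exp M\,(1 - \pi) \Rightarrow \pi = \frac{\exp M}{1 + \exp M}$,
which is the sigmoid function.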
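The derivation can be checked numerically: the sigmoid should invert the log odds. A sketch in base R (`sigmoid` and `log_odds` are illustrative helper names; the built-in `plogis` and `qlogis` compute the same two maps). The form `1 / (1 + exp(-M))` is algebraically equal to `exp(M) / (1 + exp(M))` but does not overflow to `Inf / Inf` for large `M`.

```r
sigmoid <- function(M) 1 / (1 + exp(-M))   # equals exp(M) / (1 + exp(M))
log_odds <- function(p) log(p / (1 - p))   # the logit

M <- c(-5, -1, 0, 1, 5)
p <- sigmoid(M)
print(round(p, 4))
# Round trip: the log odds of sigmoid(M) give back M
print(log_odds(p))
```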

45 Binary logistic regression: sigmoid means S-shaped. It is also known as a squashing function, since it maps the whole real line to $(0, 1)$, which is necessary if the output is to be interpreted as a probability.

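46 Binary logistic regression [Figure: sigmoid curve between 0 and 1]. If we threshold the output at 0.5, we create a decision rule of the form $\hat{y} = 1 \iff p(y = 1 \mid x) > 0.5$, which is equivalent to $\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k > 0$, so the decision boundary is linear.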

47 Binary logistic regression: example in R
logit <- glm(data=train, as.factor(danger)~waiting, family='binomial')
summary(logit)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)        …          …       …        … ***
waiting            …          …       …        … ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Recall that $\mathrm{logit}(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$.
Coefficients are difficult to interpret directly, since they are on the log-odds scale. We can exponentiate the coefficients:
exp(coef(logit))
(Intercept)     waiting
    "0.000"     "1.893"

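48 Binary logistic regression: example in R. After exponentiation ((Intercept) "0.000", waiting "1.893"), the coefficients are expressed in terms of the odds $\frac{P(y = 1 \mid x)}{P(y = 0 \mid x)}$: each exponentiated coefficient is the factor by which the odds are multiplied when the corresponding input increases by one unit. Write down the model. Interpret. If an exponentiated coefficient is close to 1, it is not interesting, as a one-unit increase of x barely changes the odds of success.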
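The odds-ratio reading of an exponentiated coefficient can be verified on simulated data, where the true coefficients are known. In this sketch the values $\beta_0 = -0.5$, $\beta_1 = 0.8$ and the sample size are arbitrary assumptions: `exp(coef(fit))` for `x` should be close to $e^{0.8} \approx 2.23$, and the ratio of the predicted odds at $x = 1$ and $x = 0$ equals it exactly.

```r
set.seed(3)
n <- 5000
x <- rnorm(n)
p <- plogis(-0.5 + 0.8 * x)              # true P(y = 1 | x)
y <- rbinom(n, size = 1, prob = p)
fit <- glm(y ~ x, family = binomial)
odds_ratio <- unname(exp(coef(fit))["x"])
print(odds_ratio)

# Odds at x = 1 divided by odds at x = 0 equal exp(beta_1) of the fitted model
odds <- function(prob) prob / (1 - prob)
p_hat <- predict(fit, newdata = data.frame(x = c(0, 1)), type = "response")
ratio <- unname(odds(p_hat[2]) / odds(p_hat[1]))
print(ratio)
```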

49 Binary logistic regression: example in R
logit <- glm(data=train, as.factor(danger)~waiting, family='binomial')
test$predictions_probability <- predict(logit, newdata=test, type='response')
test$predictions_binary <- ifelse(test$predictions_probability <= 0.5, 0, 1)
table(real=test$danger, predictions=test$predictions_binary)
    predictions
real …
What is this table called?
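The table on this slide is a confusion matrix. Since the slide's `danger` variable is not defined here, the sketch below uses ten hypothetical labels and predicted probabilities (made up for illustration) to show how the same 0.5 threshold leads to the table and to the usual summary measures.

```r
# Hypothetical true labels and predicted probabilities, for illustration only
real <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)
prob <- c(0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3, 0.55, 0.45)
pred <- ifelse(prob <= 0.5, 0, 1)

# factor(..., levels = 0:1) keeps both rows/columns even if a class is absent
cm <- table(real = factor(real, levels = 0:1),
            predictions = factor(pred, levels = 0:1))
print(cm)

TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["0", "1"]; FN <- cm["1", "0"]
accuracy  <- (TP + TN) / sum(cm)
precision <- TP / (TP + FP)   # share of predicted 1s that are really 1
recall    <- TP / (TP + FN)   # share of real 1s that were found
print(c(accuracy = accuracy, precision = precision, recall = recall))
```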

50 Binary logistic regression: ROC curve
> library(pROC)
> roc_obj <- roc(response=test$danger, predictor=test$predictions_probability)
> roc_obj$auc
Area under the curve: …

51 Intuition behind ROC: step I: sort your data by score (highest first); step II: following this order, write down the true class of each point; step III: starting from the origin, go up for every 1 and right for every 0.
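The three steps translate almost line by line into base R. In this sketch (hypothetical helper and data; scores assumed to have no ties, since tied scores would need a diagonal segment) each "up" step has height 1/P and each "right" step width 1/N, where P and N are the numbers of positives and negatives, so the curve ends at (1, 1). The area under the staircase equals the fraction of (positive, negative) pairs that are ranked correctly, which the rank-based formula reproduces.

```r
# Hypothetical scores and true classes, for illustration only
score <- c(0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3, 0.55, 0.45)
label <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)

roc_points <- function(score, label) {
  lab <- label[order(score, decreasing = TRUE)]   # steps I and II
  # step III: up 1/P for every 1, right 1/N for every 0
  tpr <- c(0, cumsum(lab == 1) / sum(lab == 1))
  fpr <- c(0, cumsum(lab == 0) / sum(lab == 0))
  data.frame(fpr = fpr, tpr = tpr)
}

pts <- roc_points(score, label)
# Area under the staircase: each rightward step contributes width * current height
auc <- sum(diff(pts$fpr) * pts$tpr[-1])

# Same quantity via ranks: share of (positive, negative) pairs ranked correctly
n_pos <- sum(label == 1)
n_neg <- sum(label == 0)
auc_rank <- (sum(rank(score)[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(c(staircase = auc, ranks = auc_rank))
```

`plot(pts$fpr, pts$tpr, type = "l")` draws the staircase itself.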

52 Intuition behind ROC [Figure: ROC curves of our model (model1), a second model (model2) and a random guess]. model1: Area under the curve: …; model2: Area under the curve: …. Both our model and the second one predict correctly when they are confident in their scores, but model2 makes mistakes earlier.

53 KNN Linear regression Logistic regression Overfitting Regularization

54 Under- and Overfitting

55 How to detect overfitting (slides by Digvijay Singh)


57 How to detect overfitting: the model does not generalize.


60 Prediction error vs. model complexity: the bias-variance tradeoff. We want to choose a model that both accurately captures the patterns in the training data and generalizes well to unseen data. Unfortunately, there is a tradeoff between the two.

61 Bias-variance tradeoff. Bias: error from erroneous assumptions in the learning algorithm (underfitting). Variance: error from sensitivity to small fluctuations in the training data (overfitting).
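A common way to see bias and variance in practice is to compare training and test error as model complexity grows. The sketch below uses assumptions that are not in the slides: data simulated from $\sin(2\pi x)$ plus Gaussian noise, 30 training points, and polynomial degree as the complexity knob. Training MSE can only decrease as the degree grows (the models are nested), while test MSE typically falls and then rises again once the model starts fitting noise.

```r
set.seed(10)
make_data <- function(n) {
  x <- runif(n)
  data.frame(x = x, y = sin(2 * pi * x) + rnorm(n, sd = 0.3))
}
train <- make_data(30)
test <- make_data(1000)

mse <- function(model, data) mean((data$y - predict(model, newdata = data))^2)

degrees <- 1:10
errors <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)   # orthogonal polynomial of degree d
  c(degree = d, train = mse(fit, train), test = mse(fit, test))
}))
print(round(errors, 3))
```

Low degrees underfit (high bias: both errors high); high degrees overfit (high variance: small training error, larger test error).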

62 Bias-variance tradeoff. To reduce variance: dimensionality reduction, feature selection, a larger training set. To reduce bias: adding features. Both: tuning of hyperparameters.


64 Tuning of hyperparameters: for each method, the change that decreases variance (at the cost of higher bias): linear and logistic regression: regularization; K-nearest neighbors: increasing k; decision trees: pruning.

65 KNN Linear regression Logistic regression Overfitting Regularization

66 Regularization (for regression): recall that in regression we minimized the MSE, $\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$. It is the loss function $L$ for regression. Different methods have different loss functions that describe how to penalize errors.

67 Regularization (for regression): regularization $R$ imposes a penalty on the size of the coefficients: $L = MSE + R$, where $R$ can be $\lambda\|\boldsymbol{\beta}\|_1$ or $\lambda\|\boldsymbol{\beta}\|_2^2$.
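For the ridge penalty the minimizer still has a closed form, so the shrinkage effect can be seen directly. A sketch in base R under assumptions not taken from the slides: the features are standardized and centred so that the intercept is not penalized, the loss is $L = \frac{1}{N}\|\mathbf{y} - X\boldsymbol\beta\|^2 + \lambda\|\boldsymbol\beta\|_2^2$ (which gives $\boldsymbol\beta = (X^TX + N\lambda I)^{-1}X^T\mathbf{y}$), and the data are the built-in `mtcars`, with a hypothetical `ridge_fit` helper.

```r
# Ridge regression via its closed form (hypothetical helper, illustration only)
ridge_fit <- function(X, y, lambda) {
  # Centre X and y so the intercept is not penalized
  x_means <- colMeans(X)
  y_mean <- mean(y)
  Xc <- sweep(X, 2, x_means)
  yc <- y - y_mean
  n <- nrow(X)
  # Minimizer of (1/n) ||yc - Xc b||^2 + lambda * sum(b^2)
  b <- drop(solve(crossprod(Xc) + n * lambda * diag(ncol(X)), crossprod(Xc, yc)))
  c(intercept = y_mean - sum(x_means * b), b)
}

X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp", "cyl")]))
y <- mtcars$mpg
lambdas <- c(0, 0.1, 1, 10)
coefs <- sapply(lambdas, function(l) ridge_fit(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)
print(round(coefs, 3))   # coefficients shrink towards 0 but stay nonzero
```

With `lambda = 0` this reduces to ordinary least squares.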


69 Regularization (for regression): $L = MSE + R$, where $R$ can be:
- Lasso ($\ell_1$ norm): $R = \lambda\sum_{j=1}^{p}|\beta_j|$;
- Ridge ($\ell_2$ norm): $R = \lambda\sum_{j=1}^{p}\beta_j^2$.
Lasso results in many coefficients being exactly zero, thus performing feature selection. Ridge regression tends to keep all coefficients, but shrinks them towards zero.
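Lasso has no closed form in general, but coordinate descent with soft-thresholding is a standard way to compute it, and it shows where the exact zeros come from. A minimal sketch, not a production solver: the loss is assumed to be $\frac{1}{N}\|\mathbf{y} - X\boldsymbol\beta\|^2 + \lambda\sum_j|\beta_j|$ with centred data and an unpenalized intercept, the data are simulated (only $x_1$ and $x_2$ matter), and `lasso_cd` / `soft_threshold` are hypothetical helper names. For real use one would typically turn to a package such as glmnet, whose loss is scaled differently.

```r
# Soft-thresholding: shrink z towards 0 by gamma, setting it to exactly 0 if |z| <= gamma
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Lasso by cyclic coordinate descent (hypothetical helper, illustration only)
lasso_cd <- function(X, y, lambda, iters = 1000, tol = 1e-10) {
  x_means <- colMeans(X)
  y_mean <- mean(y)
  Xc <- sweep(X, 2, x_means)
  yc <- y - y_mean
  n <- nrow(Xc)
  p <- ncol(Xc)
  b <- rep(0, p)
  col_ss <- colSums(Xc^2)
  for (it in seq_len(iters)) {
    b_old <- b
    for (j in seq_len(p)) {
      # Partial residual without feature j, then exact 1-D minimization
      r_j <- yc - Xc[, -j, drop = FALSE] %*% b[-j]
      b[j] <- soft_threshold(sum(Xc[, j] * r_j), n * lambda / 2) / col_ss[j]
    }
    if (max(abs(b - b_old)) < tol) break
  }
  names(b) <- colnames(X)
  c(intercept = y_mean - sum(x_means * b), b)
}

set.seed(5)
n <- 100
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 * X[, 1] - 2 * X[, 2] + rnorm(n)   # x3, x4, x5 are irrelevant
print(round(lasso_cd(X, y, lambda = 0), 3))    # close to ordinary least squares
print(round(lasso_cd(X, y, lambda = 0.5), 3))  # irrelevant features typically become exactly 0
```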

70 Summary: KNN, linear regression, logistic regression, overfitting, regularization

71 Recommended literature


More information

Lecture 6. Artificial Neural Networks

Lecture 6. Artificial Neural Networks Lecture 6 Artificial Neural Networks 1 1 Artificial Neural Networks In this note we provide an overview of the key concepts that have led to the emergence of Artificial Neural Networks as a major paradigm

More information

Content-Based Recommendation

Content-Based Recommendation Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches

More information

Collaborative Filtering. Radek Pelánek

Collaborative Filtering. Radek Pelánek Collaborative Filtering Radek Pelánek 2015 Collaborative Filtering assumption: users with similar taste in past will have similar taste in future requires only matrix of ratings applicable in many domains

More information

L3: Statistical Modeling with Hadoop

L3: Statistical Modeling with Hadoop L3: Statistical Modeling with Hadoop Feng Li School of Statistics and Mathematics Central University of Finance and Economics Revision: December 10, 2014 Today we are going to learn...

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Economics of Strategy (ECON 4550) Maymester 2015 Applications of Regression Analysis

Economics of Strategy (ECON 4550) Maymester 2015 Applications of Regression Analysis Economics of Strategy (ECON 4550) Maymester 015 Applications of Regression Analysis Reading: ACME Clinic (ECON 4550 Coursepak, Page 47) and Big Suzy s Snack Cakes (ECON 4550 Coursepak, Page 51) Definitions

More information

Lecture 8 February 4

Lecture 8 February 4 ICS273A: Machine Learning Winter 2008 Lecture 8 February 4 Scribe: Carlos Agell (Student) Lecturer: Deva Ramanan 8.1 Neural Nets 8.1.1 Logistic Regression Recall the logistic function: g(x) = 1 1 + e θt

More information

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Practical Data Science with Azure Machine Learning, SQL Data Mining, and R Overview This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

More information

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal Learning Example Chapter 18: Learning from Examples 22c:145 An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc) of newly admitted patients. A decision is needed: whether

More information

CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs} September 22, 2014 Drs. Sha & Liu ({feisha,yanliu.cs} CSCI567 Machine Learning (Fall 2014) September 22, 2014 1 /

More information

Cross-validation for detecting and preventing overfitting

Cross-validation for detecting and preventing overfitting Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.

More information

JetBlue Airways Stock Price Analysis and Prediction

JetBlue Airways Stock Price Analysis and Prediction JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue

More information

CSE 473: Artificial Intelligence Autumn 2010

CSE 473: Artificial Intelligence Autumn 2010 CSE 473: Artificial Intelligence Autumn 2010 Machine Learning: Naive Bayes and Perceptron Luke Zettlemoyer Many slides over the course adapted from Dan Klein. 1 Outline Learning: Naive Bayes and Perceptron

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Lecture 2: The SVM classifier

Lecture 2: The SVM classifier Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function

More information

Week 5: Multiple Linear Regression

Week 5: Multiple Linear Regression BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School

More information

The Probit Link Function in Generalized Linear Models for Data Mining Applications

The Probit Link Function in Generalized Linear Models for Data Mining Applications Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

More information

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

More information

The Artificial Prediction Market

The Artificial Prediction Market The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory

More information

HT2015: SC4 Statistical Data Mining and Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford Bayesian Nonparametrics Parametric vs Nonparametric

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

CSC 411: Lecture 07: Multiclass Classification

CSC 411: Lecture 07: Multiclass Classification CSC 411: Lecture 07: Multiclass Classification Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 1, 2016 Urtasun, Zemel, Fidler (UofT) CSC 411: 07-Multiclass

More information

Better credit models benefit us all

Better credit models benefit us all Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis

More information

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F

E(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,

More information

Penalized Logistic Regression and Classification of Microarray Data

Penalized Logistic Regression and Classification of Microarray Data Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification

More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

More information


I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of

More information

Introduction to Machine Learning Using Python. Vikram Kamath

Introduction to Machine Learning Using Python. Vikram Kamath Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression

More information

Machine Learning Methods for Demand Estimation

Machine Learning Methods for Demand Estimation Machine Learning Methods for Demand Estimation By Patrick Bajari, Denis Nekipelov, Stephen P. Ryan, and Miaoyu Yang Over the past decade, there has been a high level of interest in modeling consumer behavior

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information