Regression analysis. MTAT Data Mining Anna Leontjeva
1 Regression analysis MTAT Data Mining 2016 Anna Leontjeva
2 Previous lecture Supervised vs. Unsupervised Learning?
3 Previous lecture Supervised vs. Unsupervised Learning? Iris setosa Iris versicolor Iris virginica
4 Previous lecture Supervised vs. Unsupervised Learning? R packages and their dependencies
5 Previous lecture Supervised vs. Unsupervised Learning? The goal of the supervised approach is to learn a function that maps input x to output y, given a labeled set of pairs D = {(x_i, y_i)}_{i=1..N}. The goal of the unsupervised approach is to learn interesting patterns given only an input D = {x_i}_{i=1..N}.
6 Previous lecture Classification or regression?
7 Previous lecture Classification or regression? D = {(x_i, y_i)}_{i=1..N}. Classification: y_i ∈ {1, …, C}. Regression: y_i ∈ ℝ.
8 Agenda KNN Linear regression Logistic regression Overfitting Regularization
9 Parametric and non-parametric methods (illustration by Kerby Rosanes) A parametric model has a fixed number of parameters (model-based); in a non-parametric model the number of parameters grows with the amount of training data (instance-based).
10 Parametric and non-parametric methods (illustration by Kerby Rosanes) A parametric model has a fixed number of parameters (model-based), e.g. regression; in a non-parametric model the number of parameters grows with the amount of training data (instance-based), e.g. K-nearest neighbors.
11 Parametric and non-parametric methods (illustration by Kerby Rosanes) Parametric: + faster to use; − stronger assumptions about the data distribution. Non-parametric: + more flexible; − computationally challenging.
12 Non-parametric: K-nearest neighbors (KNN) To classify a new point x: - look at the K points in the training set that are closest to x - count the members of each class in this set - assign a class to x by majority voting (or return a fraction for a class)
13 K-nearest neighbors (KNN) Define a distance, e.g. Euclidean. To classify a new point x: - look at the K points in the training set that are closest to x - count the members of each class in this set - assign a class to x by majority voting (or return a fraction for a class)
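The three steps above can be sketched directly in code. A minimal illustrative version in Python (the course itself uses R; the function names and the toy data here are our own, not from the slides):

```python
# Minimal KNN sketch: Euclidean distance + majority vote over the K closest points.
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, x, k):
    # train: list of (features, label) pairs; x: new point; k: number of neighbours
    neighbours = sorted(train, key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]  # majority vote

train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]
print(knn_predict(train, (0.2, 0.1), k=3))  # two nearby "a" points outvote one "b"
```

Returning `votes[label] / k` instead of the winning label gives the "fraction for a class" variant mentioned on the slide.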
14 KNN Pros: simple concept; easy to implement; asymptotically optimal. Cons: difficult to implement efficiently; not interpretable (instance-based); suffers from the curse of dimensionality.
16 The curse of dimensionality An increase in dimensionality (the number of features) leads to sparsity of the data points; definitions of density and distance between points become less meaningful, and algorithms may perform poorly on high-dimensional data.
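The "distance becomes less meaningful" effect is easy to demonstrate numerically: as dimensionality grows, the ratio between the nearest and the farthest neighbour distance approaches 1, so every point looks roughly equally far away. An illustrative Python sketch (the helper name and the uniform toy data are our own assumptions, not from the slides):

```python
# Contrast between nearest and farthest distance shrinks as dimension grows.
import math, random

def distance_contrast(dim, n_points=200, seed=0):
    rng = random.Random(seed)  # fixed seed keeps the demo deterministic
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [0.5] * dim        # centre of the unit hypercube
    dists = [math.dist(query, p) for p in points]
    return min(dists) / max(dists)  # ratio near 1 means all points look equally far

low = distance_contrast(2)
high = distance_contrast(500)
print(low, high)  # the contrast ratio rises toward 1 in high dimensions
```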
17 KNN Linear regression Logistic regression Overfitting Regularization
18 Parametric model: Linear regression
19 Simple linear regression Task: given a list of observations D = {(x_i, y_i)}_{i=1..N}, find a line ŷ = ax + b that approximates the correspondence in the data.
20 Simple linear regression y = β₀ + β₁x + ε, where y is the output (dependent variable, response) and x is the input (independent variable, feature, explanatory variable, etc.)
21 Simple linear regression y = β₀ + β₁x + ε: β₀ is the intercept (bias), the mean of y when x = 0; β₁ is the coefficient (slope, or weight w), showing how much the output increases if the input increases by one unit; ε is the noise (error term, residual), which captures what we are not able to predict with x.
22 Simple linear regression y = β₀ + β₁x + ε
23 Simple linear regression We search for a function ŷ = f(x) that minimizes the mean squared error (MSE): (1/N) Σ_{i=1..N} (y_i − ŷ_i)² = (1/N) Σ_{i=1..N} (y_i − β₀ − β₁x_i)², which means finding the derivatives with respect to β₀ and β₁ and solving the system of equations ∂MSE/∂β₀ = 0, ∂MSE/∂β₁ = 0.
24 Simple linear regression Solving ∂MSE/∂β₀ = 0, ∂MSE/∂β₁ = 0 gives: β₀ = ȳ − β₁x̄ and β₁ = Σ_{i=1..N}(x_i − x̄)(y_i − ȳ) / Σ_{i=1..N}(x_i − x̄)², where x̄ = (1/N) Σ_{i=1..N} x_i and ȳ = (1/N) Σ_{i=1..N} y_i.
25 Simple linear regression This is a closed-form solution*: β₀ = ȳ − β₁x̄, β₁ = Σ_{i=1..N}(x_i − x̄)(y_i − ȳ) / Σ_{i=1..N}(x_i − x̄)². *it solves a given problem in terms of functions and mathematical operations from a generally accepted set of operations
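The closed-form estimates translate directly into code. A sketch in plain Python (the lecture's own examples use R's lm(); `simple_ols` and the toy data are illustrative names of ours):

```python
# Closed-form OLS for simple linear regression:
#   beta1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  beta0 = y_bar - beta1 * x_bar
def simple_ols(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
            / sum((x - x_bar) ** 2 for x in xs)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

b0, b1 = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])  # data lies exactly on y = 1 + 2x
print(b0, b1)  # → 1.0 2.0
```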
26 Simple linear regression: example
Built-in R dataset: a collection of observations of the Old Faithful geyser in Yellowstone National Park, USA. The waiting column is the length of the waiting period until the next eruption (in mins); eruptions is the duration of the geyser eruptions (in mins).
> data(faithful)
> head(faithful)
> dim(faithful)
> model <- lm(data=faithful, eruptions ~ waiting)
What model do we define here? What is the input and what is the output?
27 Simple linear regression: example
> summary(model)
Call: lm(formula = eruptions ~ waiting, data = faithful)
The output lists the residual quartiles and the coefficient table: both the intercept and waiting have p-values < 2e-16 (***). Residual standard error on 270 degrees of freedom; F-statistic: 1162 on 1 and 270 DF, p-value < 2.2e-16.
R² = 1 − Σ(y_i − f(x_i))² / Σ(y_i − ȳ)²
The fitted model is: eruptions = β̂₀ + β̂₁ × waiting (with the estimates from the coefficient table).
28 Simple linear regression: example in R The fitted model is eruptions = β̂₀ + β̂₁ × waiting. What is the eruption time if waiting was 70?
29 Simple linear regression: example in R
The fitted model is eruptions = β̂₀ + β̂₁ × waiting, so for waiting = 70:
> coef(model)[[1]] + coef(model)[[2]]*70
Calculating predictions for a new set:
> test_set = data.frame(waiting=c(70,80,100))
> predict(model, newdata=test_set)
30 Machine learning secret sauce (diagram): split the Data into a Train set and a Test set, and remember to hold the test set out.
31 Simple linear regression: example in R
train_idx <- sample(nrow(faithful), 172)
train <- faithful[train_idx,]
test <- faithful[-train_idx,]
model <- lm(data=train, eruptions ~ waiting)
test$predictions <- predict(newdata=test, model)
MSE <- (1/nrow(test))*sum((test$eruptions - test$predictions)^2)
> MSE
ggplot(train, aes(x=waiting, y=eruptions)) + geom_point() + geom_smooth(method='lm') + theme_bw()
ggplot(test, aes(x=eruptions, y=predictions)) + geom_point(color='red') + theme_bw() + geom_abline(intercept = 0, slope = 1)
32 Multivariate linear regression All the same, but instead of one feature, x is a k-dimensional vector x_i = (x_{i1}, x_{i2}, …, x_{ik}); the model is a linear combination of all features: ŷ = β₀ + β₁x₁ + … + β_k x_k, or in matrix representation ŷ = Xβ, where X is the N × (k+1) design matrix (a column of ones for the intercept followed by the feature columns) and β = (β₀, β₁, …, β_k).
33 Multivariate linear regression Recall from simple regression the system of equations ∂MSE/∂β₀ = 0, ∂MSE/∂β₁ = 0 with solution β₀ = ȳ − β₁x̄, β₁ = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)². For multivariate regression the MSE is defined as MSE = (1/N)(y − Xβ)ᵀ(y − Xβ), with gradient ∂MSE/∂β ∝ −2Xᵀy + 2XᵀXβ.
34 Multivariate linear regression Setting −2Xᵀy + 2XᵀXβ = 0 gives β = (XᵀX)⁻¹Xᵀy. The complexity of the matrix inverse is high (O(n³) for the naive algorithm), so in practice iterative methods are used (e.g. gradient descent).
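A minimal sketch of such an iterative method: plain gradient descent on the simple-regression MSE (illustrative Python; the learning rate and step count are arbitrary choices of ours, not from the slides):

```python
# Gradient descent on MSE for y = b0 + b1*x, instead of the closed-form solution.
def gd_linreg(xs, ys, lr=0.01, steps=5000):
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        # gradients of MSE with respect to intercept and slope
        g0 = (-2 / n) * sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = (-2 / n) * sum((y - (b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

b0, b1 = gd_linreg([1, 2, 3, 4], [3, 5, 7, 9])  # data generated by y = 1 + 2x
print(round(b0, 3), round(b1, 3))  # converges to ≈ 1.0 2.0
```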
35 Assumptions - the relationship between x and y is linear - y is distributed normally at each value of x - no heteroscedasticity (i.e. the variance is not systematically changing) - independence and normality of errors - lack of multicollinearity (non-correlated features)
36 Multivariate linear regression A linear model requires the parameters to be linear, not the features! This is a linear model: y = β₀ + β₁x₁ + β₂x₂. This is also a linear model: y = β₀ + β₁x₁ + β₂x₂², obtained by transforming a feature, x′ = φ(x), e.g. x², √x, log(x)… This is not a linear model: one where the parameters enter non-linearly, e.g. y = β₀ + x^{β₁}.
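The point about transformed features can be checked numerically: fitting y against z = x² with the ordinary closed-form estimator from slide 24 recovers a quadratic relationship exactly, because the model is still linear in the parameters. An illustrative Python sketch (names and toy data are ours):

```python
# Fit y = b0 + b1*x^2 by running plain OLS on the transformed feature z = x^2.
def simple_ols(xs, ys):
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) \
         / sum((x - xb) ** 2 for x in xs)
    return yb - b1 * xb, b1

xs = [1, 2, 3, 4, 5]
ys = [2 + 3 * x * x for x in xs]   # data generated from y = 2 + 3x²
zs = [x * x for x in xs]           # feature transform z = x²
b0, b1 = simple_ols(zs, ys)
print(b0, b1)  # → 2.0 3.0 — the quadratic fit is still a linear model in β
```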
37 Multivariate linear regression > head(prestige) education income women prestige census type GOV.ADMINISTRATORS prof GENERAL.MANAGERS prof ACCOUNTANTS prof PURCHASING.OFFICERS prof CHEMISTS prof PHYSICISTS prof
38 Multivariate linear regression
> model_multivariate <- lm(data=prestige, prestige ~ education + log(income, base=10) + women)
> summary(model_multivariate)
Call: lm(formula = prestige ~ education + log(income, base = 10) + women, data = prestige)
The coefficient table shows that the intercept, education, and log(income, base = 10) are all highly significant (***); women is not. Residual standard error on 98 degrees of freedom; Adjusted R-squared: 0.83; F-statistic on 3 and 98 DF, p-value < 2.2e-16.
Interpret the coefficients.
39 KNN Linear regression Logistic regression Overfitting Regularization
40 Logistic regression is not regression!
41 Logistic regression is not regression!* It is classification: y_i ∈ {1, …, C}. *it is called so due to its similarity to linear regression
42 Binary logistic regression Binary means that y is binary: (0, 1). ŷ = β₀ + β₁x₁ + … + β_k x_k
43 Binary logistic regression y is binary: (0, 1). Logistic regression models the log odds of the probability of "success" as a linear function of the input features: logit(π) = ln(π / (1 − π)) = β₀ + β₁x₁ + … + β_k x_k, where π = P(y = 1 | x) and 1 − π = P(y = 0 | x).
44 Binary logistic regression Let π = P(y = 1 | x) and denote M := β₀ + β₁x₁ + … + β_k x_k. Then ln(π / (1 − π)) = M ⇒ π / (1 − π) = e^M ⇒ π = e^M (1 − π) ⇒ π + π e^M = e^M ⇒ π = e^M / (1 + e^M), the sigmoid function.
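A quick numeric check that the sigmoid really inverts the log odds, as the derivation above claims (illustrative Python, not part of the slides):

```python
# sigmoid(M) = exp(M) / (1 + exp(M)); applying the logit to it recovers M.
import math

def sigmoid(m):
    return math.exp(m) / (1 + math.exp(m))

for m in [-2.0, 0.0, 1.5]:
    p = sigmoid(m)
    # log odds of the sigmoid output equals the original linear predictor M
    assert abs(math.log(p / (1 - p)) - m) < 1e-9

print(sigmoid(0.0))  # → 0.5, the midpoint of the S-curve
```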
45 Binary logistic regression Sigmoid means S-shaped; it is also known as a squashing function, since it maps the real line to [0, 1], which is necessary if the output needs to be interpreted as a probability.
46 Binary logistic regression If we threshold the output at 0.5, we create a decision rule of the form ŷ = 1 ⟺ p(y = 1 | x) > 0.5
47 Binary logistic regression: example in R
logit <- glm(data=train, as.factor(danger)~waiting, family='binomial')
summary(logit)
The coefficient table shows significant estimates (***) for both the intercept and waiting. Recall that logit(π) = ln(π / (1 − π)) = β₀ + β₁x₁ + … + β_k x_k, so the raw coefficients are log odds and difficult to interpret directly. We can take the exponent of the coefficients:
> exp(coef(logit))
(Intercept) waiting
"0.000" "1.893"
48 Binary logistic regression: example in R (Intercept) "0.000", waiting "1.893". Now the coefficients express odds: P(y = 1 | x) / P(y = 0 | x). Write down the model and interpret it. If an exponentiated coefficient is close to 1, it is not interesting, as a one-unit increase of x does not change the odds of success.
49 Binary logistic regression: example in R
logit <- glm(data=train, as.factor(danger)~waiting, family='binomial')
test$predictions_probability <- predict(newdata=test, logit, type = 'response')
test$predictions_binary <- ifelse(test$predictions_probability<=0.5,0,1)
table(real=test$danger, predictions=test$predictions_binary)
What is this table called?
50 Binary logistic regression: ROC curve
> roc_obj <- roc(response=test$danger, predictor=test$predictions_probability)
> roc_obj$auc
Area under the curve:
51 Intuition behind ROC: Step I: sort your data according to the score. Step II: following the sorted order, write down the true class of each point. Step III: walking down the list, go up for a 1 and right for a 0.
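The three steps above can be turned into a small AUC computation: after sorting, every "right" step accumulates the current height of the curve, and normalizing by the rectangle area gives the AUC. An illustrative Python sketch (the function name and toy scores are our own; it assumes no tied scores):

```python
# AUC via the sort-and-walk construction: up for a positive, right for a negative.
def roc_auc(labels, scores):
    # step I + II: rank true labels by decreasing score
    ranked = [label for _, label in sorted(zip(scores, labels), reverse=True)]
    n_pos = sum(ranked)
    n_neg = len(ranked) - n_pos
    tp = area = 0
    for label in ranked:
        if label == 1:
            tp += 1        # step up
        else:
            area += tp     # step right: add the current height of the curve
    return area / (n_pos * n_neg)

print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # perfect ranking → 1.0
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # one mistake earlier → 0.75
```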
52 Intuition behind ROC: model1: Area under the curve: … model2: Area under the curve: … model2 makes mistakes earlier; both our model and the second one predict correctly when they are confident in their scores; the diagonal corresponds to a random guess.
53 KNN Linear regression Logistic regression Overfitting Regularization
54 Under- and Overfitting
55 How to detect overfitting! Slides by Digvijay Singh
56 How to detect overfitting
57 How to detect overfitting Model does not generalize
58 Bias-variance tradeoff (plot: prediction error vs. model complexity)
59 Bias-variance tradeoff (plot: prediction error vs. model complexity) We want to choose a model that both accurately captures patterns in the training data and generalizes well to unseen data.
60 Bias-variance tradeoff (plot: prediction error vs. model complexity) We want to choose a model that both accurately captures patterns in the training data and generalizes well to unseen data. Unfortunately, there is a tradeoff between the two.
61 Bias-variance tradeoff Bias - error from erroneous assumptions in the learning algorithm (underfitting) Variance - error from sensitivity to small fluctuations (overfitting)
62 Bias-variance tradeoff To reduce variance: dimensionality reduction, feature selection, a larger training set. To reduce bias: adding features. For both: tuning of hyperparameters.
64 Tuning of hyperparameters Method — remedy against over-/underfitting: linear and logistic regression — regularization; K-nearest neighbors — increase of k; decision trees — pruning.
65 KNN Linear regression Logistic regression Overfitting Regularization
66 Regularization (for regression) Recall that in regression we minimized the MSE: (1/N) Σ_{i=1..N} (y_i − ŷ_i)². It is the loss function L for regression. Different methods have different loss functions that describe how to penalize errors.
67 Regularization (for regression) Recall that in regression we minimized the MSE: (1/N) Σ_{i=1..N} (y_i − ŷ_i)², the loss function L for regression. Regularization imposes a penalty R on the size of the coefficients: L = MSE + λR, where R can be the ℓ1 norm or the squared ℓ2 norm.
68 Regularization (for regression) L = MSE + λR, where R can be the ℓ1 norm (Lasso) or the squared ℓ2 norm (Ridge). Lasso results in many coefficients being zero, thus performing feature selection; Ridge regression tends to keep all coefficients, but decrease them to small numbers.
69 Regularization (for regression) L = MSE + λR, where R = Σ_{j=1..p} |β_j| (Lasso, ℓ1 norm) or R = Σ_{j=1..p} β_j² (Ridge, ℓ2 norm). Lasso results in many coefficients being zero, thus performing feature selection; Ridge regression tends to keep all coefficients, but decrease them to small numbers.
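The shrinkage behaviour of Ridge is visible even in the one-feature case: for centered data with no intercept, minimizing Σ(y − βx)² + λβ² gives β = Σxy / (Σx² + λ), so the coefficient is pulled toward zero as λ grows. An illustrative Python sketch (this simplified setting and the names are our own assumptions, not the lecture's code):

```python
# One-feature ridge regression, centered data, no intercept:
# minimizing sum((y - b*x)^2) + lam*b^2 gives b = sum(x*y) / (sum(x^2) + lam).
def ridge_slope(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [-2, -1, 1, 2]
ys = [-4, -2, 2, 4]              # centered data with true slope 2
print(ridge_slope(xs, ys, 0.0))   # → 2.0: lambda = 0 recovers plain OLS
print(ridge_slope(xs, ys, 10.0))  # → 1.0: a larger lambda shrinks the coefficient
```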
70 Summary: KNN, linear regression, logistic regression, overfitting, regularization
71 Recommended literature
Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.
More informationJetBlue Airways Stock Price Analysis and Prediction
JetBlue Airways Stock Price Analysis and Prediction Team Member: Lulu Liu, Jiaojiao Liu DSO530 Final Project JETBLUE AIRWAYS STOCK PRICE ANALYSIS AND PREDICTION 1 Motivation Started in February 2000, JetBlue
More informationCSE 473: Artificial Intelligence Autumn 2010
CSE 473: Artificial Intelligence Autumn 2010 Machine Learning: Naive Bayes and Perceptron Luke Zettlemoyer Many slides over the course adapted from Dan Klein. 1 Outline Learning: Naive Bayes and Perceptron
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationLecture 2: The SVM classifier
Lecture 2: The SVM classifier C19 Machine Learning Hilary 2015 A. Zisserman Review of linear classifiers Linear separability Perceptron Support Vector Machine (SVM) classifier Wide margin Cost function
More informationWeek 5: Multiple Linear Regression
BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School
More informationThe Probit Link Function in Generalized Linear Models for Data Mining Applications
Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications
More informationChapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )
Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
More informationThe Artificial Prediction Market
The Artificial Prediction Market Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, Siemens Corporate Research 1 Overview Main Contributions A mathematical theory
More informationHT2015: SC4 Statistical Data Mining and Machine Learning
HT2015: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Bayesian Nonparametrics Parametric vs Nonparametric
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationCSC 411: Lecture 07: Multiclass Classification
CSC 411: Lecture 07: Multiclass Classification Class based on Raquel Urtasun & Rich Zemel s lectures Sanja Fidler University of Toronto Feb 1, 2016 Urtasun, Zemel, Fidler (UofT) CSC 411: 07-Multiclass
More informationBetter credit models benefit us all
Better credit models benefit us all Agenda Credit Scoring - Overview Random Forest - Overview Random Forest outperform logistic regression for credit scoring out of the box Interaction term hypothesis
More informationE(y i ) = x T i β. yield of the refined product as a percentage of crude specific gravity vapour pressure ASTM 10% point ASTM end point in degrees F
Random and Mixed Effects Models (Ch. 10) Random effects models are very useful when the observations are sampled in a highly structured way. The basic idea is that the error associated with any linear,
More informationPenalized Logistic Regression and Classification of Microarray Data
Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification
More informationIAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results
IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the
More informationCS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
More informationI L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
Beckman HLM Reading Group: Questions, Answers and Examples Carolyn J. Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Linear Algebra Slide 1 of
More informationIntroduction to Machine Learning Using Python. Vikram Kamath
Introduction to Machine Learning Using Python Vikram Kamath Contents: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. Introduction/Definition Where and Why ML is used Types of Learning Supervised Learning Linear Regression
More informationMachine Learning Methods for Demand Estimation
Machine Learning Methods for Demand Estimation By Patrick Bajari, Denis Nekipelov, Stephen P. Ryan, and Miaoyu Yang Over the past decade, there has been a high level of interest in modeling consumer behavior
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More information