Predicting Defaults of Loans using Lending Club's Loan Data


Oleh Dubno
Fall 2014, General Assembly Data Science
Link to my Developer Notebook (ipynb)

Background and Hypothesis:
The data comes from Lending Club (LC), a peer-to-peer lending company headquartered in San Francisco. LC began as an online consumer lending platform that enables borrowers to obtain a loan funded by individuals and institutions. LC only recently made its loans available to small businesses; I will be focusing on the former, the consumer loans. The dataset and the associated description of its features are downloadable from the LC site. It comes with 188,127 entries and 31 features.

Goal: Discover the features that are indicative of someone paying or defaulting on their loan.

Tools:
Logistic regression, Naïve Bayes, Decision Tree: to determine which features of the data set contribute towards someone repaying or defaulting on his or her loan, and, with the Decision Tree, to see how well the model performs against a test set.
Folium: to map the features of the dataset.

Plotting a bar chart of the loan statuses reveals seven unique values. The logistic regression requires only two. (see figure below) The focus is on predicting who repays or defaults on their loan. As a result, the Current category will be removed, the Fully Paid category will remain, and the remaining categories will be grouped and labeled Unpaid. The status is then converted to Boolean values: Unpaid 0 and Paid 1.
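The status recoding described above can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's code: the helper names `to_paid_flag` and `binarize` are hypothetical, and the sample status strings are assumed spellings that may differ from the actual LC export.

```python
def to_paid_flag(status):
    """Map one loan status to 1 (paid), 0 (unpaid) or None (dropped)."""
    if status == "Current":
        return None  # current loans are removed from the analysis
    return 1 if status == "Fully Paid" else 0  # everything else counts as Unpaid


def binarize(statuses):
    """Drop 'Current' loans and return the 0/1 flags for the rest."""
    flags = (to_paid_flag(s) for s in statuses)
    return [f for f in flags if f is not None]


# Assumed example statuses, for illustration only.
sample = ["Current", "Fully Paid", "Charged Off", "Late (31-120 days)", "Fully Paid"]
print(binarize(sample))  # [1, 0, 0, 1]
```

Grouping every non-paid, non-current status into a single Unpaid class keeps the target binary, which is what the logistic regression needs.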
The data has now been drastically reduced. Because Current is by far the largest category, removing it reduces the dataset to 54,419 entries. This is necessary, given that the goal is not to focus on current loans.

Data Overview
The average funded amount of an individual loan is $13, The minimum loan given out is $1,000.00 with a median amount of $12,000 and a maximum amount of just $35, The funded amount appears to be normally distributed and the numbers do not appear to be skewed. Good!

The average annual income is $71, with a minimum income of $4,800, a median of $62,000 and a maximum income of $7,141,778. The maximum value is a definite outlier, so the set will be limited to incomes of $200,000 or less.

Not surprisingly, as Annual Income goes up so does the Funded Amount. The sweet spot, after which Annual Income no longer predicts Funded Amount, seems to be at about the mean annual income of $72,000. I suppose the mean annual income of $72,000 matches the cutoff for loans at $35,000 for good reasons. Interestingly, Lending Club seems to have a strict policy limiting the Amount Funded according to the individual's Annual Income up to $72,000, after which the funded amount begins to vary.
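The income cap and the income-to-funded-amount relationship can be made concrete with a small least-squares sketch. Everything here is an assumption for illustration: the function names `fit_line` and `r_squared` are hypothetical, the five (income, funded amount) pairs are made up, and only the $200,000 cap comes from the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


INCOME_CAP = 200_000  # cap stated in the text

# Made-up (annual income, funded amount) pairs, for illustration only.
loans = [(40_000, 8_000), (60_000, 12_000), (72_000, 15_000),
         (150_000, 14_000), (7_000_000, 35_000)]

# Drop the income outliers before fitting.
kept = [(income, amount) for income, amount in loans if income <= INCOME_CAP]
incomes = [income for income, _ in kept]
amounts = [amount for _, amount in kept]

slope, intercept = fit_line(incomes, amounts)
print(round(r_squared(incomes, amounts, slope, intercept), 3))
```

Filtering before fitting matters because a single multi-million-dollar income would otherwise dominate the slope estimate.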
Let's run an OLS regression using Annual Income (the predictor) to predict Amount Funded (the explained variable). OLS (Ordinary Least Squares) attempts to predict the dependent variable, Amount Funded, from the independent variable, Annual Income, by fitting the line that minimizes the squared prediction errors. The OLS regression of Amount Funded on Annual Income (limiting the dataset to income <= $200,000) shows an R^2 of 0.201. This means that 20% of the variance in Funded Amount is explained by Annual Income. This, however, is a low R^2. With the assistance of the scatter plot, we do see that Annual Income is suggestive in determining the Funded Amount only up to an Annual Income of $72,000.

Logistic Regression
Next: four logistic regressions determining loan status

The first logistic regression uses the length of employment and the grade that the loan received from LC to predict loan status. Below is a chart highlighting the coefficients. In a logistic regression, a coefficient represents the change in the log-odds of the response variable for one unit of change in the predictor variable, so a positive coefficient means the chance of repayment rises with the predictor. In other words, a 1 year increase in employment length increases the chance of the loan being paid back by A 2 year increase in employment length increases the chance of the loan being paid back by , and so on.

It would be interesting to see how effective the grade that LC assigns to its loans is at predicting loan status. Some background: the provided grades range from A to G, A being the highest and G the lowest. I therefore mapped 7, the highest value, to A, 6 to B, 5 to C and so on down to 1 for G. As the grade increases by 1 grade value, the chance of the loan being paid off increases by Given that we're using a binary output of 0 as unpaid and 1 as paid, the closer the product of the grade and the coefficient is to 1, the higher the likelihood of the loan being paid off. Pretty much, if the grade is E (or 3) the chance of payback is very high.
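The grade mapping and the way a logistic coefficient turns into a probability can be sketched as follows. The A-to-G mapping onto 7-to-1 is the one described above; the names `GRADE_TO_SCORE`, `odds_ratio` and `paid_probability` are hypothetical, and the intercept and coefficient values are placeholders, not the fitted values from the report.

```python
import math

# A -> 7, B -> 6, ..., G -> 1, as described in the text.
GRADE_TO_SCORE = {grade: 7 - i for i, grade in enumerate("ABCDEFG")}


def odds_ratio(coef, delta=1.0):
    """Factor by which the odds of repayment change when the predictor rises by delta."""
    return math.exp(coef * delta)


def paid_probability(intercept, coef, grade):
    """Logistic model: probability of 'paid' for a given letter grade."""
    z = intercept + coef * GRADE_TO_SCORE[grade]
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder parameters, chosen only to show the mechanics.
example_intercept, example_coef = -0.5, 0.3
for grade in "AEG":
    print(grade, GRADE_TO_SCORE[grade],
          round(paid_probability(example_intercept, example_coef, grade), 3))
```

Note that the coefficient acts on the log-odds, so the probability comes from passing intercept plus coefficient times grade through the logistic function rather than from reading the product directly.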
The second logistic regression uses funded amount and annual income to predict loan status. The reason for such low coefficients for funded amount and annual income is that these predictors are dollar amounts in the thousands, while the explained variable, loan status, is binary, ranging from 0 to 1. Let's look at the amount funded. As the amount funded increases by $10,000 the chance of it getting paid back decreases by = (10,000 x ). Similarly, as annual income increases so does the chance of the loan being paid off. Intuitive, right? This is understandable and supported by the positive coefficient. In other words, as the annual income increases by $10,000 so does the chance of the loan being paid back by (10,000 x )

The third logistic regression uses home ownership status (Rent, Mortgage, Own, None, Other) to predict loan status. My understanding is that someone who puts OTHER for home ownership on the loan application either did not want to reveal their home ownership situation, is hiding something, or is bad at filling out applications. None could be an honest answer from someone who may be living with their parents. Regardless, it seems that if someone checks off OTHER and gets funded, then there's a very good chance of that individual defaulting on his or her loan.
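The "10,000 x coefficient" arithmetic for the second regression can be sketched as below. The coefficient values are placeholders, since the fitted values are not available here, and the function names are hypothetical. The sketch treats the product as a change in log-odds, which is the standard reading of a logistic coefficient.

```python
import math


def log_odds_change(coef_per_dollar, dollars):
    """Change in the log-odds of repayment for a given dollar increase."""
    return coef_per_dollar * dollars


def odds_multiplier(coef_per_dollar, dollars):
    """Factor applied to the odds of repayment for a given dollar increase."""
    return math.exp(log_odds_change(coef_per_dollar, dollars))


# Placeholder per-dollar coefficients, for illustration only.
funded_coef = -2e-5   # negative: larger loans are less likely to be repaid
income_coef = 1e-5    # positive: higher income, more likely to be repaid

for name, coef in [("funded amount", funded_coef), ("annual income", income_coef)]:
    print(name,
          round(log_odds_change(coef, 10_000), 3),
          round(odds_multiplier(coef, 10_000), 3))
```

Expressing the predictors in units of $10,000 before fitting would give the same model with coefficients 10,000 times larger, which are easier to read.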
The fourth logistic regression uses employment length (<1 year to 10+ years) to predict loan status. There doesn't immediately appear to be much variance between the generated coefficients for years employed. It looks like, so long as the person is employed, they will be paying back their loan. However, it holds true that if someone is unemployed or has less than a year of employment then they'll have a lower chance of repaying their loan. I didn't investigate what percentage of the <1 year group is employed or unemployed. Interestingly, and probably just a coincidence because the differences are marginal, a person employed for 4 years has the same coefficient for paying back their loan as someone employed for one year or less. Just an observation. I will not be pursuing that point any further.

To conclude the work on logistic regression: the data set has many features left unexplored, which I lacked the experience and time to explore. From the findings that I got, I can't speak definitively, but I would say avoid giving loans to people who don't specify home ownership and do give loans to people with higher income.

Decision Tree and The Confusion Matrix
A confusion matrix allows for more detailed analysis than the mere proportion of correct guesses. For instance, 177 loans that were actually paid were incorrectly predicted as unpaid. Based on the entries in the confusion matrix, the correct predictions are the entries on the diagonal of the matrix (31,594 loans among them) and the total number of incorrect predictions is (177 loans + 8,920 loans). The confusion matrix provides the information needed to determine how well a classification model performs. The performance metric, accuracy, summarizes this information with a single number: 0.777. Accuracy takes the total number of correct predictions and divides it by the total number of all predictions made.
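The accuracy calculation can be sketched directly from a confusion matrix. The labels below are a toy example, not the report's data, and the function names `confusion_counts` and `accuracy` are hypothetical; paid is treated as the positive class (1), matching the 0/1 coding used earlier.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) with 1 = paid as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1   # paid, predicted paid
        elif a == 0 and p == 0:
            tn += 1   # unpaid, predicted unpaid
        elif a == 0 and p == 1:
            fp += 1   # unpaid, predicted paid
        else:
            fn += 1   # paid, predicted unpaid
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Correct predictions (the diagonal) divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Toy labels, for illustration only.
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts, round(accuracy(*counts), 3))
```

Only the diagonal entries (tp and tn) count as correct; a misclassified loan such as a paid loan predicted as unpaid belongs in the denominator only.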
Mapping Paid and Unpaid Loans
The above map is referred to as a choropleth map, "a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed." (Wikipedia) As the intensity of the color increases (gets closer to 1), a larger share of the borrowers residing in that state have paid off their loans. The number near each point is the number of loans given in that state. By the looks of the map, Nebraska, Missouri, Oregon, Virginia, Montana, Wyoming and South Dakota are the states that are not too fortunate in repaying their loans. Of course this is an average of individual loans per state, ignoring specific regions within the state, and is not the best estimate of whether a funded individual in that state is likely to repay their loan. However, maybe the other features could help determine which states are less likely to pay off a loan.
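The two quantities shown on the map, the average paid flag per state and the number of loans per state, can be computed with a simple grouping step. This sketch uses made-up records and a hypothetical function name, and it stops short of drawing the map itself.

```python
from collections import defaultdict


def paid_rate_by_state(loans):
    """Return {state: (share of loans paid, number of loans)}."""
    totals = defaultdict(lambda: [0, 0])  # state -> [paid count, loan count]
    for state, paid in loans:
        totals[state][0] += paid
        totals[state][1] += 1
    return {state: (paid / count, count) for state, (paid, count) in totals.items()}


# Made-up (state, paid flag) records, for illustration only.
records = [("NE", 1), ("NE", 0), ("CA", 1), ("CA", 1), ("CA", 0), ("MS", 0)]
for state, (rate, count) in sorted(paid_rate_by_state(records).items()):
    print(state, round(rate, 2), count)
```

Keeping the loan count next to each rate is useful because a state with very few loans can show an extreme average that says little about its borrowers.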
Mapping Amount Funded
Knowing that as the amount funded increases so does the chance of the loan not being paid back, we can see that Mississippi is a state with a fairly large funded amount. According to the map on loan status, Mississippi is also a state that doesn't do too well in repaying its loans. On average, individuals receiving a loan in Mississippi are much more likely to default on their loan, as they are also more likely to receive bigger loans. Let's look further.
Mapping Annual Income
Several outliers in annual income have been removed from the data. Before removing the outliers, the income ranges from $33, to $7,241,778, which is an obscene amount. I limit it to $200, The map range reflects annual income up to $120,000. Interestingly, Mississippi is a state with an average income between $60k and $80k, the lowest payback rate and, on average, the highest loans.
Mapping The Grade Assigned to Individual Loans
Keeping on track with Mississippi, a state I'm not too familiar with, it also happens to have a terrible rating for loans according to the data. I can't quite understand why Lending Club, on average, would give a pretty poor grade to loans in Oregon. The average resident there has a fairly good income, but I guess income is not too predictive of a good grade. We can see that by looking at the income map presented before.
Mapping Employment Length
Mississippi appears to have fairly good employment length. It doesn't appear to be too predictive of the state's faulty loans.

Conclusion: Avoid Mississippi. Wish I could go further into this. Don't give a loan to someone who doesn't know his or her home ownership status.

Lending Club data download site: https://www.lendingclub.com/info/download-data.action
More informationUnderstanding Characteristics of Caravan Insurance Policy Buyer
Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended
More informationAnswer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade
Statistics Quiz Correlation and Regression  ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationONE HEN ACADEMY EDUCATOR GUIDE
ONE HEN ACADEMY EDUCATOR GUIDE 2013 One Hen, Inc. 3 OHA Module 3: Loans, Interest, & Borrowing Money This OHA Module introduces students to the common financial concepts of loans, loan interest, and the
More informationHypothesis Testing. Chapter Introduction
Contents 9 Hypothesis Testing 553 9.1 Introduction............................ 553 9.2 Hypothesis Test for a Mean................... 557 9.2.1 Steps in Hypothesis Testing............... 557 9.2.2 Diagrammatic
More informationSMR Research Corporation Stuart A. Feldstein, President
SMR Research Corporation Stuart A. Feldstein, President 300 Valentine Street Hackettstown, NJ 07840 Phone 9088527677 Fax 9088526884 Visit www.smrresearch.com Home Equity Lending To DebtFree Home Owners
More informationCRJ Doctoral Comprehensive Exam Statistics Friday August 23, :00pm 5:30pm
CRJ Doctoral Comprehensive Exam Statistics Friday August 23, 23 2:pm 5:3pm Instructions: (Answer all questions below) Question I: Data Collection and Bivariate Hypothesis Testing. Answer the following
More informationLecture 10: Regression Trees
Lecture 10: Regression Trees 36350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,
More informationData Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools
Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools Occam s razor.......................................................... 2 A look at data I.........................................................
More informationCorrelation and Regression (Exercises)
Chapter V Correlation and Regression (Exercises) 5.. The following table shows the annual income [ ] and years of education of persons: Person Income [ ] Years of Education 25 9 2 2 3 4 6 4 35 6 5 4 8
More informationData Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago sagarikaprusty@gmail.com Keywords:
More informationCash Rents Methodology and Quality Measures
ISSN: 2167129X Cash Rents Methodology and Quality Measures Released August 1, 2014, by the National Agricultural Statistics Service (NASS), Agricultural Statistics Board, United States Department of Agriculture
More informationClub Accounts. 2011 Question 6.
Club Accounts. 2011 Question 6. Anyone familiar with Farm Accounts or Service Firms (notes for both topics are back on the webpage you found this on), will have no trouble with Club Accounts. Essentially
More information11.3 BREAKEVEN ANALYSIS. Fixed and Variable Costs
385 356 PART FOUR Capital Budgeting a large number of NPV estimates that we summarize by calculating the average value and some measure of how spread out the different possibilities are. For example, it
More informationINTRODUCING AZURE MACHINE LEARNING
David Chappell INTRODUCING AZURE MACHINE LEARNING A GUIDE FOR TECHNICAL PROFESSIONALS Sponsored by Microsoft Corporation Copyright 2015 Chappell & Associates Contents What is Machine Learning?... 3 The
More informationVLSM CERTIFICATION OBJECTIVES Q&A. TwoMinute Drill Self Test VLSM 8.02 Route Summarization
8 VLSM CERTIFICATION OBJECTIVES 8.01 VLSM 8.02 Route Summarization Q&A TwoMinute Drill Self Test 228 Chapter 8: VLSM In Chapter 7, you were introduced to IP addressing and subnetting, including such topics
More informationData Visualization. BUS 230: Business and Economic Research and Communication
Data Visualization BUS 230: Business and Economic Research and Communication Data Visualization 1/ 16 Purpose of graphs and charts is to show a picture that can enhance a message, or quickly communicate
More information