Predicting Defaults of Loans using Lending Club's Loan Data
Oleh Dubno
Fall 2014, General Assembly Data Science
Link to my Developer Notebook (ipynb)

Background and Hypothesis:
The data comes from Lending Club (LC), a peer-to-peer lending company headquartered in San Francisco. LC began as an online consumer-lending platform that enables borrowers to obtain a loan funded by individuals and institutions. LC only recently made its loans available to small businesses; I will be focusing on the former, the consumer loans. The dataset and the associated description of its features are downloadable from the LC site. It contains 188,127 entries and 31 features.

Goal: Discover the features that are indicative of someone paying or defaulting on their loan.

Tools:
Logistic regression, Naïve Bayes, Decision Tree: to determine which features of the data set contribute towards someone repaying or defaulting on his or her loan, using the Decision Tree to see how well the model performs against a test set.
Folium: to map the features of the dataset.

An initial bar chart of the loan statuses reveals seven unique values. The logistic regression requires only two. (see figure below)

The focus is on predicting who repays or defaults on their loan. As a result, the Current category will be removed, the Fully Paid category will remain, and the rest of the categories will be grouped and labeled as Unpaid. This is then converted to Boolean values: Unpaid is 0 and Paid is 1.
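The relabeling described above can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's code: apart from "Current" and "Fully Paid", the status strings in the sample are assumed for the example, and `binarize_status` is a hypothetical helper name.

```python
def binarize_status(statuses):
    """Drop 'Current' loans; map 'Fully Paid' to 1 and every other status to 0."""
    return [1 if status == "Fully Paid" else 0
            for status in statuses
            if status != "Current"]

# Illustrative statuses only; the real dataset has seven distinct values.
sample = ["Fully Paid", "Current", "Charged Off", "Late (31-120 days)", "Fully Paid"]
labels = binarize_status(sample)
print(labels)  # [1, 0, 0, 1]
```

Filtering before mapping keeps the Current loans out of the Unpaid group, which matters because they have neither been repaid nor defaulted yet.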
The data has now been drastically reduced. Because Current is by far the largest category, removing it reduces the dataset to 54,419 entries. This is necessary, given that the goal is not to focus on current loans.

Data Overview

The average funded amount of an individual loan is $13, The minimum loan given out is $1,000.00, with a median amount of $12,000 and a maximum amount of just $35,000. The funded amount appears normally distributed, with no visible skew. Good!

The average annual income is $71, with a minimum income of $4,800, a median of $62,000 and a maximum income of $7,141,778. The maximum value is a definite outlier, so the set will be limited to incomes of $200,000 or less.

Not surprisingly, as Annual Income goes up so does the Funded Amount. The sweet spot, after which Annual Income does not predict Funded Amount, seems to be at about the mean annual income itself, $72,000. I suppose the mean annual income of $72,000 matches the cutoff for loans at $35,000 for good reasons. Interestingly, Lending Club seems to have a strict policy, limiting the Amount Funded according to the individual's Annual Income up to $72,000, after which it begins to vary.
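To make the outlier handling above concrete, here is a small sketch of summarizing incomes before and after applying the $200,000 cap. The income list is made up for illustration (only the minimum, median, maximum, and cap echo figures from the text), and `summarize` is a hypothetical helper.

```python
from statistics import mean, median

INCOME_CAP = 200_000  # cap stated in the text

def summarize(values):
    """Return the basic descriptive statistics used in the overview."""
    return {
        "min": min(values),
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
    }

# Illustrative incomes only, not the Lending Club data.
incomes = [4_800, 48_000, 62_000, 95_000, 7_141_778]
capped = [income for income in incomes if income <= INCOME_CAP]

print(summarize(incomes))
print(summarize(capped))
```

A single extreme value pulls the mean far above the median, which is why the cap is applied before looking at how income relates to the funded amount.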
Let's run an OLS regression using Annual Income (the predictor) to predict Amount Funded (the explained variable). OLS (Ordinary Least Squares) attempts to predict the dependent variable, Amount Funded, using the independent variable, Annual Income. The regression algorithm learns from this data to predict the right Amount Funded given the Annual Income.

The OLS regression with Annual Income set to predict Amount Funded (limiting the dataset to income <= $200,000) shows an R^2 of 0.201. This means that 20% of the variance in Funded Amount is explained by Annual Income. This, however, is a low R^2. With the assistance of the scatter plot, we do see that Annual Income is suggestive in determining the Funded Amount only up to an Annual Income of $72,000.

Logistic Regression

Next: four logistic regressions determining loan status.

The first logistic regression uses the length of employment and the grade that the loan received from LC to predict loan status. Below is a chart highlighting the coefficients. Each coefficient represents the change in the response (strictly, in the log-odds of the loan being paid) for a one-unit change in the predictor variable. In other words, a 1-year increase in employment length increases the chance of the loan being paid back by A 2-year increase in employment length increases the chance of the loan being paid back by , and so on.

It would be interesting to see how effective the grade that LC assigns to its loans is at predicting loan status. Some background: the provided grades range from A to G, A being the highest and G the lowest. I therefore mapped 7, the highest value, to A, 6 to B, 5 to C, and so on down to 1 for G. As the grade increases by 1 grade value, the chance of the loan being paid off increases by Recall that we are using a binary output of 0 as unpaid and 1 as paid. The closer the product of the grade and the coefficient is to 1, the higher the likelihood of the loan being paid off. Pretty much, if the grade is E (3) or better, the chance of payback is very high.
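Returning to the OLS step described above, the following is a minimal sketch of a one-predictor least-squares fit and its R^2, written in plain Python so it runs without extra libraries. The data points are invented for illustration and will not reproduce the 0.201 reported in the text; `ols_fit` is a hypothetical helper, not the notebook's code.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares; return (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Illustrative values in thousands of dollars (annual income, amount funded).
income = [30, 50, 72, 100, 150]
funded = [8, 12, 18, 15, 20]

slope, intercept, r2 = ols_fit(income, funded)
print(f"slope={slope:.4f} intercept={intercept:.2f} R^2={r2:.3f}")
```

R^2 is the share of the variance in the funded amount that the fitted line accounts for, which is the quantity the text interprets as "20% of the variance".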
The second logistic regression uses funded amount and annual income to predict loan status. The coefficients for funded amount and annual income are so small because the predictors are dollar amounts in the thousands, while the explained variable, loan status, is binary, ranging from 0 to 1. Let's look at the amount funded. As the amount funded increases by $10,000 the chance of it getting paid back decreases by = (10,000 x ). Similarly, as annual income increases so does the chance of the loan being paid off. Intuitive, right? This is understandable and supported by the positive coefficient In other words, as the annual income increases by $10,000 so does the chance of the loan being paid back by (10,000 x )

The third logistic regression uses home ownership status (Rent, Mortgage, Own, None, Other) to predict loan status. My understanding is that someone who puts OTHER for home ownership on the loan application either did not want to reveal their home ownership situation, is hiding something, or is bad at filling out applications. None could be an honest answer from someone who may be living with their parents. Regardless, it seems that if someone checks off OTHER and gets funded, then there's a very good chance of that individual defaulting on his or her loan.
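The "10,000 x coefficient" arithmetic above can be illustrated as follows. Because a logistic regression is linear in the log-odds, scaling a per-dollar coefficient by 10,000 gives the shift in log-odds, which is then converted back to a probability. The coefficient and the 80% baseline are purely hypothetical, since the fitted values are not preserved in the text.

```python
import math

def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_after_increase(base_prob, coef_per_dollar, delta_dollars):
    """Probability of repayment after raising a dollar-valued predictor by delta_dollars."""
    base_log_odds = math.log(base_prob / (1.0 - base_prob))
    return sigmoid(base_log_odds + coef_per_dollar * delta_dollars)

HYPOTHETICAL_COEF = -2e-5  # assumed per-dollar coefficient, for illustration only

new_prob = prob_after_increase(0.80, HYPOTHETICAL_COEF, 10_000)
print(f"{new_prob:.3f}")
```

The size of the change in probability depends on the starting probability, so multiplying the coefficient by the dollar change is an approximation of the change in chance rather than an exact figure.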
The fourth logistic regression uses employment length (<1 year to 10+ years) to predict loan status. There doesn't immediately appear to be much variation between the coefficients for years employed. It looks like, so long as the person is employed, they will be paying back their loan. However, it holds true that if someone is unemployed or has less than a year of employment, they'll have a lower chance of repaying their loan. I didn't investigate what share of the <1 year group is employed or unemployed. Interestingly, and probably just a coincidence because the differences are really marginal, a person employed for 4 years has the same coefficient for paying back their loan as someone employed for one year or less. Just an observation. I will not be pursuing that point any further.

To conclude the work on logistic regression: the data set has many features that I did not explore, for lack of experience and time. From the findings that I got, I can't speak definitively, but I would say avoid giving loans to people who don't specify home ownership, and do give loans to people with higher income.

Decision Tree and The Confusion Matrix

The confusion matrix allows for more detailed analysis than the mere proportion of correct guesses. For instance, 177 paid loans were incorrectly predicted as unpaid. Based on the entries in the confusion matrix, the total number of correct predictions made by the model is (177 loans + 31,594 loans) and the total number of incorrect predictions is (177 loans + 8,920 loans). The confusion matrix provides the information needed to determine how well a classification model performs. The performance metric, accuracy, summarizes this information with a single number: 0.777. Accuracy takes the total number of correct predictions and divides it by the total number of all predictions made.
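The accuracy figure can be checked directly from the four counts quoted above. Which count sits in which cell is my reading of the text (treating "paid" as the positive class), so the argument names below are assumptions; the arithmetic itself does not depend on the labels.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Counts from the text: correct = 177 + 31,594; incorrect = 177 + 8,920.
acc = accuracy(tp=31_594, tn=177, fp=8_920, fn=177)
print(round(acc, 3))  # 0.777
```

With so few loans correctly predicted as unpaid, most of the accuracy comes from the paid class, which is the kind of detail the confusion matrix exposes and the single accuracy number hides.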
Mapping Paid and Unpaid Loans

The above map is referred to as a choropleth map, "a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed." (Wikipedia) As the intensity of the color increases (gets closer to 1), on average the majority of the people residing in that state have paid off their loans. The number near each point is the number of loans given in that state. By the looks of the map, Nebraska, Missouri, Oregon, Virginia, Montana, Wyoming and South Dakota are states that do not fare too well in repaying their loans. Of course this is an average of individual loans per state, ignoring specific regions within the state, and it is not the best estimate of whether a funded individual in that state is likely to repay their loan. However, maybe the other features could help determine which state is less likely to pay off a loan.
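The per-state value that colors the map above is just the mean of the 0/1 loan status within each state, shown alongside the loan count. A minimal sketch of that aggregation, with made-up records and a hypothetical helper name:

```python
from collections import defaultdict

def payback_rate_by_state(records):
    """records: iterable of (state, paid) pairs with paid in {0, 1}.

    Returns {state: (share of loans paid, number of loans)}.
    """
    totals = defaultdict(lambda: [0, 0])  # state -> [paid count, loan count]
    for state, paid in records:
        totals[state][0] += paid
        totals[state][1] += 1
    return {state: (paid / count, count) for state, (paid, count) in totals.items()}

# Illustrative records only.
records = [("NY", 1), ("NY", 0), ("CA", 1), ("CA", 1), ("MS", 0)]
rates = payback_rate_by_state(records)
print(rates)
```

Keeping the count next to the rate helps when reading the map, since a state with very few loans can show an extreme average.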
Mapping Amount Funded

Knowing that as the amount funded increases so does the chance of the loan not being paid back, we can see that Mississippi is a state with a fairly large funded amount. Mississippi is also, according to the map on loan status, a state that doesn't do too well in repaying its loans. On average, individuals receiving a loan in Mississippi are much more likely to default on their loan, as they are also more likely to receive bigger loans. Let's look further.
Mapping Annual Income

Several outliers in annual income have been removed from the data. Before removing the outliers, the income ranges from $33, to $7,241,778, which is an obscene amount. I limit it to $200,000. The map's range reflects annual income up to $120,000. Interestingly, Mississippi is a state with an average income between 60k and 80k, yet it has the lowest payback rate and, on average, takes out the highest loans.
Mapping The Grade Assigned to Individual Loans

Keeping on track with Mississippi, a state I'm not too familiar with, it also happens to have a terrible rating for loans according to the data. I could understand why Lending Club, on average, would give a pretty poor grade to loans in Oregon. The average population there has a fairly good income, but I guess it's not too predictive of a good grade. We can see that by looking at the income map presented before.
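Averaging grades per state requires the letters to be numeric first. A small sketch using the A=7 through G=1 mapping described in the logistic regression section; the sample grades are illustrative and `mean_grade_score` is a hypothetical helper.

```python
# A is the best grade (7) and G the worst (1), as in the mapping described earlier.
GRADE_TO_SCORE = {grade: 7 - i for i, grade in enumerate("ABCDEFG")}

def mean_grade_score(grades):
    """Average numeric score for a list of letter grades."""
    scores = [GRADE_TO_SCORE[grade] for grade in grades]
    return sum(scores) / len(scores)

print(GRADE_TO_SCORE["E"])                # 3
print(mean_grade_score(["A", "C", "E"]))  # 5.0
```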
Mapping Employment Length

Mississippi appears to have fairly good employment. It doesn't appear to be too predictive of their faulty loans.

Conclusion: Avoid Mississippi. Wish I could go further into this. Don't give a loan to someone who doesn't know his or her homeownership status.

Lending Club data download site: data.action
More informationChart Pack. Table of Contents: In States That Don t Expand Medicaid, Who Gets New Coverage Assistance Under the ACA and Who Doesn t?
In States That Don t Expand Medicaid, Who Gets New Coverage Assistance Under the ACA and Who Doesn t? Chart Pack Table of Contents: Table 1. Median Income of Uninsured Adults, by State and Eligibility
More informationUpdates to Graphing with Excel
Updates to Graphing with Excel NCC has recently upgraded to a new version of the Microsoft Office suite of programs. As such, many of the directions in the Biology Student Handbook for how to graph with
More informationIBM SPSS Direct Marketing 22
IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release
More informationBIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
More informationCourse Overview Lean Six Sigma Green Belt
Course Overview Lean Six Sigma Green Belt Summary and Objectives This Six Sigma Green Belt course is comprised of 11 separate sessions. Each session is a collection of related lessons and includes an interactive
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationChapter 2: Descriptive Statistics
Chapter 2: Descriptive Statistics **This chapter corresponds to chapters 2 ( Means to an End ) and 3 ( Vive la Difference ) of your book. What it is: Descriptive statistics are values that describe the
More informationGetting Performance From Process Improvement
IT Metrics and Productivity e-newsletter Article Series: Getting Performance From Process Improvement By Michael West Article 3: Improving Performance Through Process Improvement In the first two articles
More informationDEMYSTIFYING BIG DATA. What it is, what it isn t, and what it can do for you.
DEMYSTIFYING BIG DATA What it is, what it isn t, and what it can do for you. JAMES LUCK BIO James Luck is a Data Scientist with AT&T Consulting. He has 25+ years of experience in data analytics, in addition
More informationKSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
More informationBalance Sheet. Financial Management Series #1 9/2009
Balance Sheet Prepared By: James N. Kurtz, Extension Educator Financial Management Series #1 9/2009 A complete set of financial statements for agriculture include: a Balance Sheet; an Income Statement;
More informationMethods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL
Paper SA01-2012 Methods for Interaction Detection in Predictive Modeling Using SAS Doug Thompson, PhD, Blue Cross Blue Shield of IL, NM, OK & TX, Chicago, IL ABSTRACT Analysts typically consider combinations
More informationIt Is In Your Interest
STUDENT MODULE 7.2 BORROWING MONEY PAGE 1 Standard 7: The student will identify the procedures and analyze the responsibilities of borrowing money. It Is In Your Interest Jason did not understand how it
More informationSUGI 29 Statistics and Data Analysis
Paper 194-29 Head of the CLASS: Impress your colleagues with a superior understanding of the CLASS statement in PROC LOGISTIC Michelle L. Pritchard and David J. Pasta Ovation Research Group, San Francisco,
More informationTutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
More informationCourse Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
More information