Course Syllabus. Purposes of Course:




Course Syllabus
Eco 5385.701 Predictive Analytics for Economists
Summer 2014
TTh 6:00-8:50 pm and Sat. 12:00-2:50 pm
First Day of Class: Tuesday, June 3
Last Day of Class: Tuesday, July 1
251 Maguire Building

This course is a follow-up to Eco 5350 Introductory Econometrics. Statistical methods used in engineering and computer science are introduced to complement the traditional economist's toolbox of business and economics decision-making tools.

Purposes of Course: There are several major purposes of this course. As a result of taking this course, the student should have an understanding of:

  The basics of supervised learning: prediction and classification
  Prediction models, including multiple linear regression, artificial neural networks, regression trees, and k-nearest neighbors
  Classification models, including logit/probit models, classification trees, and Naïve-Bayes models
  Model validation by means of data partitioning
  Methods of unsupervised learning: exploratory data analysis, principal components, cluster analysis, and association rules
  Ensemble modeling
  How to use standard data mining packages, including XLMiner and SPSS Modeler

Evaluation of the Student: The evaluation in the class consists of five parts:

  Quick Quizzes (20%)
  Homework Exercises (20%)
  Take-home project to be turned in on Thursday, June 26 (10%)
  Mid-Term Exam (25%)
  Final Exam (25%)

The Quick Quizzes (QQs) will consist of a short-answer and/or multiple-choice quiz administered in the first five minutes of class. Each quiz is meant to check whether you have retained the information from the previous lecture and whether you have done any assigned readings. In addition to keeping students current in the class and providing review material for the mid-term and final exams, the QQs allow me to keep track of student attendance. It has been my experience that for each Quick Quiz a student misses before the mid-term exam, the student, on average, loses 2.5% on his/her mid-term score. The bottom line is that it pays to come to class! I will drop your lowest QQ grade before calculating the QQ average.

With respect to homework exercises, students may confer with each other for programming advice and discussion of basic ideas, but in the final analysis each student is expected to write up his/her own homework answers and not make copies of others' homework. Copying someone else's homework to hand in as one's own work is a violation of the SMU Honor Code and will be dealt with according to the rules of the SMU Honor Code. The homework assignments are very important: the basic ideas they cover invariably show up on the mid-term and final exams. If you know you are going to miss a class on the day a homework exercise is due, hand in your homework in advance to receive full credit for your work. Any homework that is handed in late will be given a one-letter-grade reduction for each day of tardiness. I will drop the lowest exercise score before calculating the exercise average.

The mid-term exam will cover approximately 50% of the course material, and the final exam will be comprehensive, including to some degree the material from the mid-term exam. A comprehensive final forces the student to form an overall view of the course material and gives the student a chance to revisit what he/she might not have fully understood before taking the mid-term. In my opinion, developing an overall view of how the pieces of this course fit together is an important accomplishment for the student.

At roughly the half-way point in the class, I will assign a term project, which students are to complete individually. However, students should feel free to consult the professor with any questions they might have on the project. This project is governed by the SMU Honor Code. That is, you are to do your own work and not rely on the work of your classmates. You are to hand in the term project on Thursday, June 26.

Additional Details

Required Textbook: Data Mining for Business Intelligence (2nd ed., 2010), G. Shmueli, N.R. Patel, and P.C. Bruce
Classroom Website: http://faculty.smu.edu/tfomby/

Office: Room 301M, Umphrey Lee, 214-768-2559
E-mail address: tfomby@smu.edu
Office Hours: Tuesday and Thursday 3:30-5:00 PM or by appointment
Teaching Assistant: Jing Li, E-mail address: lij@smu.edu

Textbook and Computer Software: The required textbook for this course is Data Mining for Business Intelligence by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce (Wiley, 2nd ed., 2010), hereafter referred to as SPB. This book includes complimentary access to an Excel add-in called XLMiner. In the back of the book you will find an insert that contains the license for downloading the add-in to your computer from the website www.solver.com/xlminer and using it for a six-month period. This education version is slightly less equipped than the professional version of the add-in, but it will be adequate for the work we will be doing in this course. Once you have registered your copy of the SPB textbook, you will have online access to all of the datasets used as case studies in the textbook. We will also be using SPSS Modeler and the tutorial found at http://pic.dhe.ibm.com/infocenter/spssmodl/v15r0m0/index.jsp. Access to this software package can be obtained by direct download, through the Virtual Computer Lab (VCL) at https://apps.smu.edu/citrix/appssmuweb/, or on lab computers in the Economics department or other locations on campus.

Excused Absences: Students will be excused from taking the mid-term exam or the final exam only with a note from a physician or, in the case of a death in the family, with a note from a parent or guardian. Even with an excused absence, either of these exams must eventually be taken before a course grade will be assigned to the student. If you must miss a class due to legitimate circumstances beyond your control, be sure to contact me beforehand so that I will know of your circumstances. If you are excused, I will correspondingly excuse you from any QQ that is given that day. I want to emphasize that diligent attendance in this course is essential because much of the course material presented in class will come from my personal class notes and can't be found in the textbook.

Note: After 4 unexcused absences, I reserve the right to administratively drop students from the class.

Grading Scale: My grading scale in this course is as follows:

  92-100  A
  90-91   A-
  88-89   B+
  82-87   B
  80-81   B-
  78-79   C+
  72-77   C
  70-71   C-
  68-69   D+
  62-67   D
  60-61   D-
  0-59    F

Excused Absences for University Extracurricular Activities: Students participating in an officially sanctioned, scheduled University extracurricular activity will be given the opportunity to make up class assignments or other graded assignments missed as a result of their participation. It is the responsibility of the student to make arrangements with me prior to any missed scheduled examination or other missed assignment for making up the work.

Disability Accommodations: Students needing academic accommodations for a disability must first contact Ms. Rebecca Marin, Coordinator, Services for Students with Disabilities (8-4557) to verify the disability and establish eligibility for accommodations. They should then schedule an appointment with the professor to make appropriate arrangements (See University Policy No. 2.4).

Religious Observance: Religiously observant students wishing to be absent on holidays that require missing class should notify me in writing at the beginning of the semester, and should discuss, in advance, acceptable ways of making up any work missed because of the absence (See University Policy No. 1.9).
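
As a rough illustration of how the evaluation weights, the drop-the-lowest rules, and the grading scale stated in this syllabus fit together, here is a minimal Python sketch. The function names, the assumption that every component is scored on a 0-100 scale, and the handling of fractional scores (a score such as 87.75 falls in the band whose lower bound it meets, with no rounding) are illustrative assumptions, not policies stated in the syllabus. Late-homework penalties are not modeled.

```python
# Illustrative sketch only: names and the 0-100 scale are assumptions.

# Component weights as listed under "Evaluation of the Student".
WEIGHTS = {
    "quizzes": 0.20,
    "homework": 0.20,
    "project": 0.10,
    "midterm": 0.25,
    "final": 0.25,
}

# Lower bound of each band in the grading scale, highest first.
SCALE = [
    (92, "A"), (90, "A-"), (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"), (68, "D+"), (62, "D"),
    (60, "D-"), (0, "F"),
]


def average_dropping_lowest(scores):
    """Average the scores after dropping the single lowest one."""
    if not scores:
        return 0.0
    if len(scores) == 1:
        return float(scores[0])  # nothing left to average if we dropped it
    kept = sorted(scores)[1:]
    return sum(kept) / len(kept)


def course_score(quizzes, homework, project, midterm, final):
    """Weighted course score on a 0-100 scale."""
    parts = {
        "quizzes": average_dropping_lowest(quizzes),
        "homework": average_dropping_lowest(homework),
        "project": project,
        "midterm": midterm,
        "final": final,
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a numeric score to a letter (assumed: no rounding)."""
    for lower_bound, letter in SCALE:
        if score >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical student record, for illustration only.
    total = course_score(
        quizzes=[80, 90, 100, 60],
        homework=[70, 85, 95],
        project=80,
        midterm=85,
        final=90,
    )
    print(total, letter_grade(total))
```

Whether fractional totals are rounded before the scale is applied is not specified in the syllabus, so borderline cases should be confirmed with the instructor.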

Honor Code: All SMU students are bound by the Honor Code (see the SMU Student Handbook for a complete discussion of the SMU Honor Code). The code states that any giving or receiving of aid on academic work submitted for evaluation, without the express consent of the instructor, or the toleration of such action shall constitute a breach of the Honor Code. A violation can result in an F for the course and an Honor Code Violation on your transcript.

Topics

I. Introduction
  A. What is Data Mining?
  B. Terminology of Data Mining
  C. Types of Variables: Interval, Nominal (Unordered Categorical), and Ordinal (Ordered Categorical)
  D. Statistical Models vs. Machine Learning Models (Read Breiman article)
  E. Data Mining from a Process Perspective (Fig. 1.2 in SPB)
  F. Data Mining Methods Classified by Nature of the Data (Table 1.1 in SPB)
  References: SPB, Chapter 1 and Breiman, Leo (2001), "Statistical Modeling: The Two Cultures," Statistical Science, 16, 199-231. The Breiman article will be sent to students by class e-mail.

II. Overview of the Data Mining Process
  A. Core Ideas in Data Mining
    i. Classification
    ii. Prediction
    iii. Association Rules
    iv. Data Reduction
    v. Data Exploration
    vi. Data Visualization
  B. Supervised and Unsupervised Learning
  C. The Steps in Data Mining
  D. SEMMA (SAS) and CRISP (IBM)
  E. Preliminary Steps
    i. Sampling from a Database
    ii. Pre-processing and Cleaning the Data
    iii. Partitioning the Data: Training, Validation, and Test Data Sets
  F. Building a Model: An Example with Linear Regression
  References: SPB, Chapter 2. SAS_SEMMA.pdf and CRISP_DM.pdf. These pdf files will be sent to students by class e-mail.

III. Data Exploration and Data Refinement
  A. Data Summaries
  B. Data Visualization
  C. Treatment of Missing Observations
  D. Detection of Outliers: the Box Plot
  E. Correlation Analysis
  Reference: SPB, Chapter 3.

IV. Variable Importance and Dimension Reduction
  A. Binning: Reducing the Number of Categories in Categorical Variables
  B. Principal Component Analysis of Continuous Variables
  C. Dimension Reduction using Best Subset Regression Modelling Techniques
  D. Dimension Reduction using Bivariate Association Probabilities (as in the Feature Selection node in SPSS Modeler), and Regression and Classification Trees
  Reference: SPB, Chapter 4.

V. Evaluation Methods for Prediction and Classification Problems
  A. Prediction Measures: MAE, MSE, RMSE, MAPE, MSPE, and RMSPE
  B. Classification Measures: Classification Matrix, ROC Curves, Lift Charts, and Lift Charts that Incorporate Costs and Benefits
  C. The Role of Over-sampling in Classification Problems
  Reference: SPB, Chapter 5.

VI. Prediction Methods
  A. Linear Regression: Best Subset Selection
    i. Forward Selection
    ii. Backward Selection
    iii. Step-wise Regression (Efroymson's method)
    iv. All Subsets Regression (Mallows' Cp and Adjusted R-square criteria)
  B. k-Nearest Neighbors (k-NN)
  C. Regression Trees
    i. CART
    ii. CHAID
  D. Neural Nets
    i. Architecture of Neural Nets
      a. Neurons
      b. Input Layer
      c. Hidden Layers
      d. Output Layer
    ii. Fitting Neural Nets: Back Propagation
  E. Comparison of the Four Methods
  F. Model Averaging (Ensemble Model)
  References: SPB, Chapters 6, 7, 9, and 11.

Mid-Term Exam: Approximately Thursday, June 19

VII. Classification Methods
  A. The Naïve Rule
  B. Naïve-Bayes Classifier
  C. k-Nearest Neighbors
  D. Classification Trees
  E. Neural Nets
  F. Logistic Regression
  References: SPB, Chapters 6, 7, 8, 9, and 10.

VIII. Unsupervised Learning
  A. Association Rules
    i. Support and Confidence
    ii. The Apriori Algorithm
    iii. The Selection of Strong Rules
  B. Cluster Analysis
    i. Hierarchical Methods
    ii. Optimization and the K-means Algorithm
    iii. Similarity Measures
    iv. Other Distance Measures
  C. Text Mining
  References: SPB, Chapters 13, 14, and Classroom Discussion of Text Mining.

IX. Ensemble Methods
  A. Nelson and Granger-Ramanathan Methods for Continuous Targets
  B. Majority Voting for Categorical Targets
  C. Bagging
  D. Boosting
  Reference: Classroom lecture and handouts

FINAL EXAM: Tuesday, July 1, 6:00-9:00 PM, Room 251 Maguire Building