KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics




ROCHESTER INSTITUTE OF TECHNOLOGY
COURSE OUTLINE FORM

KATE GLEASON COLLEGE OF ENGINEERING
John D. Hromi Center for Quality and Applied Statistics

NEW (or REVISED) COURSE (KGCOE-CQAS-747 Principles of Statistical Data Mining):

1.0 Course Designations and Approvals

Required course approvals (Approval request date / Approval granted date):
Academic Unit Curriculum Committee
College Curriculum Committee

Optional designations (Is designation desired? / *Approval request date / **Approval granted date):
General Education: Yes / No
Writing Intensive: Yes / No
Honors: Yes / No

2.0 Course information:

Course title: KGCOE-CQAS-747 Principles of Statistical Data Mining
Credit hours: 3
Prerequisite(s): one course in basic statistics
Co-requisite(s): None
Course proposed by: Ernest Fokoué
Effective date: August 2013

Contact hours and maximum students/section:
Classroom: 3 contact hours, 25 students maximum
Lab: 0
Studio: 0
Other (specify): 0

2.a Course Conversion Designation*** (Please check which applies to this course).
*For more information on Course Conversion Designations please see page four.
Semester Equivalent (SE). Please indicate which quarter course it is equivalent to:
Semester Replacement (SR). Please indicate the quarter course(s) this course is replacing: 0307-846 Principles of Statistical Data Mining

September 2010

2.b Semester(s) offered (check): Fall (distance) / Spring (campus) / Summer / Other
All courses must be offered at least once every 2 years. If course will be offered on a bi-annual basis, please indicate here:

2.c Student Requirements
Students required to take this course (by program and year, as appropriate): None
Students who might elect to take the course: This is an elective for graduate students in the Advanced Certificate and MS programs in Applied Statistics. Graduate students in other programs who are interested in statistical data mining may also elect to take this class.

In the sections that follow, please use sub-numbering as appropriate (e.g., 3.1, 3.2, etc.)

3.0 Goals of the course (including rationale for the course, when appropriate):
For students:
3.1 To achieve a practical understanding of modern statistical data mining techniques.
3.2 To develop the ability to correctly apply modern data mining techniques to a variety of real-world case studies involving massive, high-dimensional, complex data.
3.3 To gain hands-on experience with data mining through case studies, including: describing website visitors, market basket analysis, describing customer satisfaction, predicting credit risk of small businesses, predicting e-learning student performance, predicting customer lifetime value, and operational risk management.

4.0 Course description (as it will appear in the RIT Catalog, including pre- and corequisites, and quarters offered).
Please use the following format:

Course: KGCOE-CQAS-846 Principles of Statistical Data Mining I

This course covers topics such as clustering, classification and regression trees, multiple linear regression under various conditions, logistic regression, PCA and kernel PCA, model-based clustering via mixtures of Gaussians, spectral clustering, text mining, neural networks, support vector machines, multidimensional scaling, variable selection, model selection, k-means clustering, k-nearest neighbors classifiers, statistical tools for modern machine learning and data mining, naïve Bayes classifiers, variance reduction methods (bagging) and ensemble methods for predictive optimality. This course is designed to provide the student with a solid, practical, hands-on introduction to the fundamentals of modern concepts and techniques of statistical data mining, with a strong emphasis on the wide applicability of these techniques to real-world problems. Throughout the course, many real-world case studies are used to motivate and explain the strengths and appropriateness of each method of interest. To ease the exploration of the techniques, SAS Enterprise Miner will be our main computing software. We will occasionally mention other notable software for data mining such as Rattle in the R environment. Topics throughout this course include, among others: Distance Measures in Data Mining, Hierarchical Clustering, Classification and Regression Trees, Multiple Linear Regression under various conditions, Logistic Regression for Pattern Recognition, Principal Components Analysis, Factor Analysis, Model-based Clustering via Mixtures of Gaussians, Spectral Clustering Techniques, Text Mining, Neural Networks for Classification and Regression, Support Vector Machines for Classification and Regression, Multidimensional Scaling, Variable Selection, Model Selection, k-means Clustering, k-Nearest Neighbors Classifiers, Statistical Tools for Modern Machine Learning and Data Mining, Bayes Classifiers, Fisher Linear Discriminant Analysis and Quadratic Discrimination, Variance Reduction Methods (Bagging), and Ensemble Methods for Predictive Optimality (Boosting and Random Forests). Prerequisite(s): one course in basic statistics. Class 3, Lab 0, Credit 3 (Fall-distance, Spring-campus)

5.0 Possible resources (texts, references, computer packages, etc.)

Required texts
5.1 Applied Data Mining for Business and Industry, 2nd ed., Paolo Giudici and Silvia Figini (2009), Wiley, ISBN 978-0-470-74582-3

Recommended texts
5.2 Statistical Data Mining Using SAS Applications, 2nd ed., George Fernandez (2009), CRC Press, ISBN 978-1-439-81075-3
5.3 Data Mining Using SAS Enterprise Miner, Randall Matignon (2009), Wiley
5.4 Getting Started with SAS Enterprise Miner (from SAS)
5.5 Applied Analytics Using SAS Enterprise Miner (from SAS)

6.0 Topics (outline):
6.1 Complex data structures and the emergence of Data Mining and Machine Learning
6.2 Measures of location and measures of variability
6.3 Distance measures, similarity measures and dependency measures
6.4 Multiple linear regression and its extensions to Radial Basis Function regression
6.5 Difference of focus between model identification and predictive optimality
6.6 Principles and applications of dimensionality reduction techniques
6.7 Principal Component Analysis and Singular Value Decomposition
6.8 Cluster analysis via Hierarchical and Non-Hierarchical Methods
6.9 Factor Analysis and Mixtures of Factor Analyzers
6.10 Multidimensional scaling and its relationship to other techniques
6.11 Model-Based Clustering via Mixtures of Gaussians
6.12 Logistic regression for Pattern Recognition
6.13 Linear and Quadratic Discriminant Analysis
6.14 Classification and Regression Trees
6.15 Neural networks: Multilayer Perceptron and Kohonen networks
6.16 Support Vector Machines for classification and regression
6.17 Nearest-neighbor models: k-means and k-Nearest Neighbors
6.18 Variance Reduction Techniques: Bagging Predictors
6.19 Non-parametric modeling and Bayesian modeling
6.20 Generalized linear models and log-linear models
6.21 Graphical models and their applications
6.22 Model evaluation and model selection techniques
6.23 Ensemble Methods for Predictive Optimality: Boosting
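As an illustration of topic 6.18 (bagging as a variance reduction technique), the following is a minimal sketch only. The course itself names SAS Enterprise Miner as its main software; Python, the toy data, and the helper names `fit_1nn` and `bagged_predict` are assumptions introduced here for illustration. The sketch averages the predictions of a deliberately unstable base learner (a 1-nearest-neighbor regressor) over bootstrap resamples of the training data.

```python
import random
from statistics import mean


def fit_1nn(xs, ys):
    """Return a 1-nearest-neighbour regressor (a deliberately high-variance base learner)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # The prediction is the response of the closest training point.
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def bagged_predict(xs, ys, x_new, n_estimators=50, seed=0):
    """Average the base learner's predictions over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(xs)
    preds = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_1nn([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x_new))
    return mean(preds)


# Hypothetical toy data, not taken from the course materials.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

single = fit_1nn(xs, ys)(2.4)        # one model: copies a single neighbour's response
bagged = bagged_predict(xs, ys, 2.4)  # ensemble: average over resampled models
print(single, bagged)
```

A single 1-nearest-neighbor model returns exactly one training response, so small changes in the data can change its prediction abruptly. The bagged prediction is an average over many resampled models, which is the mechanism by which bagging typically smooths out that instability.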


7.0 Intended course learning outcomes and associated assessment methods of those outcomes (please include as many Course Learning Outcomes as appropriate, one outcome and assessment method per row).

Assessment Method: Homework, Exams, Projects

Course Objectives

Level 2: Comprehension:
2.1 Understands the central role of model uncertainty in data mining, and maintains a keen awareness of the difference between accurate model identification and optimal prediction.
2.2 Appreciates and takes into account the ever-present bias/variance dilemma in model selection and model building, and strives to find solutions that achieve a bias/variance trade-off.
2.3 Knows when and how to combine unsupervised learning techniques (e.g., PCA for feature extraction) with supervised learning techniques (e.g., Neural Networks) to achieve optimality.
2.4 Recognizes when and how to use ensemble methods rather than select a single model, and also knows when to use variance reduction techniques like bagging.
2.5 Understands the profound meaning of the No Free Lunch theorem, refrains from relying solely on one single method of data mining, and always compares various methods before making recommendations.

Level 3: Application:
3.1 Identifies an interesting real-world engineering problem during the course of study and formulates it statistically.
3.2 Recognizes, for each real-world case study, which classes of data mining methods are more appropriate.
3.3 Uses statistical software like SAS Enterprise Miner to perform a thorough data mining analysis of real-world problems.

Level 4: Analysis:
4.1 Determines which statistical model(s) appear to be most appropriate for the task at hand in light of the graphs and descriptive statistics obtained from exploratory data analysis.
4.2 Fits the chosen plausible model(s) using a statistical software package like SAS Enterprise Miner, then extracts and interprets the estimates of the parameters.
4.3 Performs additional statistical hypothesis tests wherever needed.
4.4 Checks all the assumptions underlying each method/technique used.
4.5 Interprets the statistical estimation and prediction results produced by the software package.

Level 5: Synthesis:
5.1 Selects the best model according to some of the usual model selection criteria.
5.2 Provides any needed/required formal prediction or estimation.
5.3 Uses an ensemble (aggregation) of methods wherever the need arises.
5.4 Draws conclusions and interpretations about the original engineering task based on sound formal analysis like confidence intervals and results of hypothesis testing.

Level 6: Evaluation:
6.1 Evaluates several potential statistical models and decides on the most appropriate one for a given purpose.
6.2 Provides any needed/required formal prediction or estimation.
6.3 Makes recommendations in clear and non-technical language based on a thorough assessment of the statistical findings.
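Several of the outcomes above (comparing various methods before making recommendations, and selecting the best model by a model selection criterion) can be illustrated with k-fold cross-validation. The following is a minimal sketch, not the course's prescribed procedure: the course names SAS Enterprise Miner as its software, so Python, the toy data, and the helper names `fit_mean`, `fit_linear`, and `cv_mse` are assumptions made here for illustration. It compares a mean-only predictor with simple least-squares regression by their cross-validated mean squared error.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean of the response."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Simple least-squares regression of y on a single predictor x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_mse(fit, xs, ys, k=3):
    """k-fold cross-validated mean squared error; fold membership is index mod k."""
    n = len(xs)
    sq_errors = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        # Errors are measured only on points the model did not see.
        sq_errors.extend((ys[i] - model(xs[i])) ** 2 for i in test)
    return mean(sq_errors)


# Hypothetical toy data: a linear trend with a small alternating perturbation.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

scores = {
    "mean-only": cv_mse(fit_mean, xs, ys),
    "linear": cv_mse(fit_linear, xs, ys),
}
best = min(scores, key=scores.get)  # lowest held-out error wins
print(scores, "->", best)
```

The point of the design is that every candidate is scored on data it was not fitted to, so the comparison reflects predictive performance rather than goodness of fit on the training sample. Other criteria could be substituted for the squared-error score without changing the structure.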

8.0 Program outcomes and/or goals supported by this course

Relationship to Program Outcomes (1 = slightly, 2 = moderately, 3 = significantly)

Program Outcomes and/or Goals for CQAS (Level of Support: 1 / 2 / 3)

8.1 Advanced Certificate in Lean Six Sigma
8.1.1 Demonstrates a solid understanding of statistical thinking and Lean Six Sigma methodology in solving real-world problems.
8.1.2 Leads Lean Six Sigma improvement projects.

8.2 Advanced Certificate and Master of Science in Applied Statistics
8.2.1 Demonstrates a solid understanding of statistical thinking and applied statistics methodology in solving real-world problems.
8.2.2 Designs studies that are efficient and valid.
8.2.3 Analyzes data using appropriate statistical methods.
8.2.4 Communicates the results of statistical analysis with effective reports and presentations.
Note: Students obtaining the Advanced Certificate in Applied Statistics will not be expected to perform at the same level as students obtaining a Master of Science degree.

9.0 - Not Applicable
General Education Learning Outcome Supported by the Course, if appropriate (with Assessment Method):

Communication
  Express themselves effectively in common college-level written forms using standard American English
  Revise and improve written and visual content
  Express themselves effectively in presentations, either in spoken standard American English or sign language (American Sign Language or English-based Signing)
  Comprehend information accessed through reading and discussion

Intellectual Inquiry
  Review, assess, and draw conclusions about hypotheses and theories
  Analyze arguments, in relation to their premises, assumptions, contexts, and conclusions
  Construct logical and reasonable arguments that include anticipation of counterarguments
  Use relevant evidence gathered through accepted scholarly methods and properly acknowledge sources of information

Ethical, Social and Global Awareness
  Analyze similarities and differences in human experiences and consequent perspectives
  Examine connections among the world's populations
  Identify contemporary ethical questions and relevant stakeholder positions

Scientific, Mathematical and Technological Literacy
  Explain basic principles and concepts of one of the natural sciences
  Apply methods of scientific inquiry and problem solving to contemporary issues
  Comprehend and evaluate mathematical and statistical information
  Perform college-level mathematical operations on quantitative data
  Describe the potential and the limitations of technology
  Use appropriate technology to achieve desired outcomes

Creativity, Innovation and Artistic Literacy
  Demonstrate creative/innovative approaches to course-based assignments or projects
  Interpret and evaluate artistic expression considering the cultural context in which it was created

10.0 Other relevant information (such as special classroom, studio, or lab needs, special scheduling, media requirements, etc.)
None

*Optional course designation; approval request date: This is the date that the college curriculum committee forwards this course to the appropriate optional course designation curriculum committee for review. The chair of the college curriculum committee is responsible to fill in this date.

**Optional course designation; approval granted date: This is the date the optional course designation curriculum committee approves a course for the requested optional course designation. The chair of the appropriate optional course designation curriculum committee is responsible to fill in this date.

***Course Conversion Designations: Please use the following definitions to complete table 2.a on page one.
Semester Equivalent (SE): Closely corresponds to an existing quarter course (e.g., a 4 quarter credit hour (qch) course which becomes a 3 semester credit hour (sch) course). The semester course may develop material in greater depth or length.
Semester Replacement (SR): A semester course (or courses) taking the place of a previous quarter course(s) by rearranging or combining material from a previous quarter course(s) (e.g., a two-semester sequence that replaces a three-quarter sequence).
New (N): No corresponding quarter course(s).