AMIS 7640 Data Mining for Business Intelligence



Similar documents
AMIS 7640 Data Mining for Business Intelligence

Course Syllabus. Purposes of Course:

Course Description This course will change the way you think about data and its role in business.

How To Learn Data Analytics

Data Warehouses and Business Intelligence ITP 487 (3 Units) Fall Objective

Office: LSK 5045 Begin subject: [ISOM3360]...

CRN: STAT / CRN / INFO 4300 CRN

Lecture: Mon 13:30 14:50 Fri 9:00-10:20 ( LTH, Lift 27-28) Lab: Fri 12:00-12:50 (Rm. 4116)

MARK 7377 Customer Relationship Management / Database Marketing. Spring Last Updated: Jan 12, Rex Yuxing Du

College of Health and Human Services. Fall Syllabus

Business Intelligence and Analytics SCH-MGMT 553 (New course number being proposed) Tu/Th 11:15 AM 12:30 PM in SOM Lab 20

INFS5873 Business Analytics. Course Outline Semester 2, 2014

UNIVERSITY OF SOUTHERN CALIFORNIA Marshall School of Business BUAD 425 Data Analysis for Decision Making (Fall 2013) Syllabus

Data Mining is sometimes referred to as KDD and DM and KDD tend to be used as synonyms

Data Mining and Business Intelligence CIT-6-DMB. Faculty of Business 2011/2012. Level 6

BUSINESS INTELLIGENCE WITH DATA MINING FALL 2012 PROFESSOR MAYTAL SAAR-TSECHANSKY

not possible or was possible at a high cost for collecting the data.

Customer Relationship Management (BUSML 4212)

Subject Description Form

Prerequisites. Course Outline

2015 Workshops for Professors

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH

Principles of Data Mining by Hand&Mannila&Smyth

Syllabus. HMI 7437: Data Warehousing and Data/Text Mining for Healthcare

Information Management course

DATA MINING FOR BUSINESS INTELLIGENCE. Data Mining For Business Intelligence: MIS 382N.9/MKT 382 Professor Maytal Saar-Tsechansky

Introduction to Data Mining

MIS630 Data and Knowledge Management Course Syllabus

ANALYTICS CENTER LEARNING PROGRAM

MIS Big Data Information Systems

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

General Business 704: Data to Decisions Fall 2013 Wisconsin School of Business, UW-Madison. All class meetings will be held in 2294 Grainger.

Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

CIS 8695 Big Data Analytics

Business Intelligence: Effective Decision Making

Accounting : Accounting Information Systems and Controls. Fall 2015 COLLEGE OF BUSINESS AND INNOVATION

QMB Business Analytics CRN Fall 2015 T & R 9.30 to AM -- Lutgert Hall 2209

IN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

Database Marketing, Business Intelligence and Knowledge Discovery

QMB Business Analytics CRN Fall 2015 W 6:30-9:15 PM -- Lutgert Hall 2209

Intro. to Data Visualization Spring 2016

School of Business and Nonprofit Management Course Syllabus

ISQS 3358 BUSINESS INTELLIGENCE FALL 2014

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

Web Mining Seminar CSE 450. Spring 2008 MWF 11:10 12:00pm Maginnes 113

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

A Knowledge Management Framework Using Business Intelligence Solutions

BA 561: Database Design and Applications Acct 565: Advanced Accounting Information Systems Syllabus Spring 2015

Data Mining + Business Intelligence. Integration, Design and Implementation

COURSE SYLLABUS. Enterprise Information Systems and Business Intelligence

BUAD 310 Applied Business Statistics. Syllabus Fall 2013

Data Mining Algorithms Part 1. Dejan Sarka

MIS 6302.X02: Analytics and Information Technology The University of Texas at Dallas Spring 2014

BOR 6335 Data Mining. Course Description. Course Bibliography and Required Readings. Prerequisites

MIS Big Data Information Systems

CS 649 Database Management Systems. Fall 2011

COURSE PROFILE. Business Intelligence MIS531 Fall

Predictive Analytics Certificate Program

DATA MINING FOR BUSINESS ANALYTICS

Fluency With Information Technology CSE100/IMT100

Florida Gulf Coast University Lutgert College of Business Marketing Department MAR3503 Consumer Behavior Spring 2015

Audit Analytics. --An innovative course at Rutgers. Qi Liu. Roman Chinchila

ITIS5432 A Business Analytics Methods Thursdays, 2:30pm -5:30pm, DT701

Faculty of Science School of Mathematics and Statistics

City University of Hong Kong. Information on a Course offered by the Department of Management Sciences with effect from Semester A in 2012 / 2013

City University of Hong Kong. Information on a Course offered by Department of Information Systems with effect from Semester B in 2013 / 2014

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

L&I SCI 410: Database Information Retrieval Systems

Statistics for BIG data

Introduction. A. Bellaachia Page: 1

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

IST565 M001 Yu Spring 2015 Syllabus Data Mining

Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015

Data Warehousing and Data Mining

QMB 3302 Business Analytics CRN Spring 2015 T R -- 11:00am - 12:15pm -- Lutgert Hall 2209

Requirements Fulfilled This course is required for all students majoring in Information Technology in the College of Information Technology.

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Introduction to Data Mining and Business Intelligence Lecture 1/DMBI/IKI83403T/MTI/UI

269 Business Intelligence Technologies Data Mining Winter (See pages 8-9 for information about 469)

CSci 538 Articial Intelligence (Machine Learning and Data Analysis)

College of Charleston School of Business DSCI : Management Information Systems Fall 2014

Web Data Mining: A Case Study. Abstract. Introduction

LEARNING SOLUTIONS website milner.com/learning phone

5.5 Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall. Figure 5-2

Foundations of Business Intelligence: Databases and Information Management

Master of Science in Health Information Technology Degree Curriculum

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

SE 333/433 Software Testing and Quality Assurance

Upon successful completion of this course, a student will meet the following outcomes:

Transcription:

The Ohio State University The Max M. Fisher College of Business Department of Accounting and Management Information Systems AMIS 7640 Data Mining for Business Intelligence Autumn Semester 2014, Session 2 Contact Information: Instructor: Prof. Waleed A. Muhanna (muhanna.1@osu.edu) Phone: (614) 292-3808; Fax: 292-2118 Office: 400C Fisher Hall Office Hours: TR 1:00-2:00, and by appointment. Course Overview: Advances in information technologies and the increased digitization of business have led to an explosive growth in the amount of structured and unstructured data collected and stored in databases and other electronic repositories. Much but certainly not all of this data comes from operational business software (e.g., finance/accounting applications, Enterprise Resource Management (ERP), Customer Relationship Management (CRM), workflow and document management systems, surveillance and monitoring systems, and Web logs) and is often archived into vast data warehouses to become part of corporate memory. The result of this massive accumulation of data is that organizations have become data-rich yet still knowledge-poor. What can be learned from these mountains of data to improve decisions? How can an organization leverage its massive data warehouses for strategic advantage? A large number of methods with roots in statistics, informational retrieval and machine learning have been developed to address the issue of knowledge extraction from data sets both small and large. The term "data-mining" refers to this collection of methods. These methods have broad applications; they have been successfully applied in areas as diverse as market-basket analysis of scanner data, customer relationship management, churn analysis, direct marketing, fraud detection, click-stream analysis, personalization and recommendation systems, risk management and credit scoring. The key objectives of this course are two-fold: (1) to provide you with a theoretical and practical understanding of core data mining concepts and techniques; and (2) to provide you with hands-on experience in applying these techniques to practical real-word business problems using commercial data mining software. As an applied course, the emphasis will be less on the inner working of each method and more on when and how to use each technique and how to interpret the results. Business Intelligence is a process for increasing the competitive advantage of a business through the intelligent use of available data in decision making. Business intelligence systems combine operational and other data with data mining tools to improve the timeliness and quality 1

of inputs to decision processes. Broadly defined, data mining is the process of selection, exploration and modeling of (often, but not necessarily, large) data sets, in order to discover predictive and descriptive models and patterns. It encompasses both top-down (confirmatory or hypothesis driven) analysis using traditional statistical techniques and bottom-up (exploratory) analysis using database and machine learning techniques to discover regularities, relations, or local structure/patterns that are at first unknown. (The confirmative tools of top-down analysis can then be used to confirm the discoveries and evaluate the quality of decisions based on those discoveries.) In keeping with this broad definition, topics and related methods discussed include information retrieval and enterprise reporting, classification, predictive modeling, clustering, association rules mining, and social network analysis. The application of these methods will be illustrated using modern software tools via examples, homework assignments and group term projects. Upon completion of this course, students should be able to: 1. Fully appreciate the concept of data as a strategic resource; 2. Use existing data retrieval and manipulation tools for data/information extraction and enterprise reporting; 3. Understand how and when data mining can be used as a problem-solving technique; 4. Describe different methods of data mining; 5. Select an appropriate data mining technique for a specific problem; 6. Use existing data mining software to mine a prepared data set; and 7. Interpret and evaluate the results of data mining. Prerequisites: The course is specifically designed with MBA/MACC students as the intended target audience. The key prerequisites consist of good graduate standing and completion of an introductory course in probability and statistics. Assignments do not involve programming, per se, and no prior professional IT experience is assumed. Course Materials: Textbook: Data Mining for Business Intelligence, 2 nd Edition, by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce (Wiley: 2010). A set of articles, assignments, tutorials, data sets, lecture notes, and various supplementary materials which will made available through the course homepage at: http://fisher.osu.edu/~muhanna.1/datamining.html Course Organization: The course will be run as a mixture of lectures, in-class demonstrations, assignments, and classroom discussions. Readings will be from the required text together with other supplementary materials. Some material will be covered only in the readings; other will be covered only in lecture which may depart from the text in either content or order. To maximize 2

learning, classroom discussion and the amount of time spent on different topics will be adjusted according to the background and interests of the students. Assignments In addition to the reading requirements from the text and the supplementary materials, there will be 4-5 homework assignments, spaced out over the course of the 7-weeks term. They are designed to reinforce your understanding of the topics covered. Assignments are to be handed in on or before the class period of the due date. No late work is accepted. A limited amount of cooperation among students on homework and lab assignments is permitted. You may discuss with classmates general solution strategies. However, everyone should independently do and turn in his/her own work. Exam There will be one in-class exam: a final exam. The exam is closed-book and closed-notes, and it will be held in accordance with Fisher Graduate Programs schedule during the final examination period on Tuesday, December 16, 2014 at 3:00 p.m. The exam is designed to assess each student's (a) command of factual knowledge and concepts from the course; and (b) his or her ability to integrate and generalize these concepts and principles and apply them to new situations. The format of the exam will primarily be problems and short essay questions. The final exam must be taken during its scheduled time; make up exams will only be given for special and compelling cases, in accordance with University guidelines. Team-Based Term Project Students will have the opportunity to further sharpen their skills and acquire hands-on experience with practical databases and real data mining problems through a term project. The projects will be carried out in teams of 3-4 students and involve the use of DM software. Although I am generally open to suggestions, each project will normally involve the selection, design, and performance of a data mining plan using a public data set (such as those provided by the SAS Institute or in the UCI KDD Archive (http://kdd.ics.uci.edu/) or a non-proprietary data set available through private student contacts. A handout about the project will be made available online at the beginning of the course. Teams will submit a written project proposal partway through the term, followed by a written report and, if time permits, brief class presentation on the project during the last class meeting. Software The methods discussed in this class are computationally intensive and non-trivial; they cannot be performed using Excel. Fortunately, these methods have matured enough to the point where they are now implemented in commercial software. We will use Microsoft Access to familiarize you with relational query language SQL, the industry standard for data extraction, summarization and enterprise reporting. XLMiner, an EXCEL add-in, will be introduced in class and used by students to do assignments and solve business problems using data mining techniques. If you buy a new copy of the textbook, your copy should include a complementary 3

6-month license to XLMiner (in the back of the book you will find an insert that contains the license for downloading the add-in). Alternatively, I will provide you with a special Textbook Code and Course Code that will enable you to download the software, and use it throughout the term with a 140-day license. Participation A portion of the final grade will be based on your class attendance and active participation, elements that are crucial to the success of class meetings. Attendance refers to punctual attendance. Your fellow students and I will expect you to come fully prepared to answer questions and discuss the assigned readings. Each individual is expected to actively and constructively contribute to class discussions. Good contributions transcend assigned readings and are inspired, timely, analytical, and relevant to the topics discussed. Students can also earn participation credit by drawing attention to related development, information and resources dealing with related topics. Your class participation grade will reflect my judgment of the quality and quantity of your contributions during the entire term. Cold calling: On occasion, I will make cold calls. This is not intended to put you on the spot but to encourage class discussion and participation. Evaluation: 30% of the final grade will be based on graded homework assignments. The final exam will account for 40% of your grade. The group term project will account for 20% of the grade. The remaining 10% is assigned to class participation. Final grades will be based on overall class performance. Feedback and Continuous Improvement: Students are strongly encouraged to visit with me in my office and/or use e-mail to ask questions, to share suggestions about any aspect of the course, or to clear up possible points of confusion. I will use your feedback to continuously improve and fine-tune the coverage levels and the teaching/learning processes. Please note that I may not always be able to make all of the changes suggested, but I will do my best to accommodate your suggestions. Standards of Integrity and Conduct: Academic integrity is essential to maintaining an environment that fosters excellence in teaching, research, and other educational and scholarly activities. Each student in this course is expected to be familiar with and abide by the principles and standards set forth in The Ohio State University s code of student conduct and code of academic conduct. You can view these documents or download pdf versions at: http://studentaffairs.osu.edu/resource_csc.asp http://www.gradsch.ohio-state.edu/academic-and-research-misconduct.html 4

It is also expected that each student will behave in a manner that is consistent with the Fisher Honor Statement, which reads as follows: As a member of the Fisher College of Business Community, I am personally committed to the highest standards of behavior. Honesty and integrity are the foundations from which I will measure my actions. I will hold myself accountable to adhere to these standards. As a future leader in the community and business environment, I pledge to live by these principles and celebrate those who share these ideals. Students with Disabilities: Any student who feels s/he may need an accommodation based on the impact of a disability should contact me privately to discuss your specific needs. I rely on the Office for Disability Services for assistance in verifying the need for accommodations and developing accommodation strategies. If you have special needs and have not previously contacted the Office for Disability Services, I encourage you to do so. Tentative Course Schedule: The following schedule gives the general plan for the course; changes may be made at my discretion but are designed to optimize the quality and flow of the content. The course web site gives the dynamic picture and is an integral part of the class; please make sure to check it on a regular basis. Session & Date Session 1 (T 10/21) Topics and Required Readings Course Introduction Overview/goals of data mining Myths about data mining The Data Mining process Data, data everywhere, The Data deluge, The Economist, 2/10. Big Data: The Management Revolution, HBR, 10/12. Cases: o Diamonds in the Data Mine HBR, 5/03 (PDF) o A golden vein The Economist, 1/04. (PDF) o How Verizon cut Customer churn Financial Express, 10/03. (PDF) TB: Chapters 1 & 2 Session 2 (R 10/23) Data Extraction and Manipulation The Relational Data Model and Relational DBMS Enterprise Reporting 5

Relational Algebra SQL: The Relational Query Language Assignment 1 Session 3 (T 10/28) OLAP and Multidimensional Data Analysis Datawarehousing and Multidimensional Databases Data Quality Summarization and Data Cubes OLAP Tools and Pivot Tables (Check course web site) An Introduction to OLAP Multidimensional Terminology and Technology (PDF) Session 4 (R 10/30) Sessions 5 & 6 (T 11/4 & R 11/6 ) Data Exploration and Dimension Reduction Data Summarization and Visualization Correlation Analysis Principal Component Analysis TB: Chapters 3 & 4 Classification and Predictive Modeling Decision Tree induction Model Evaluation and Interpretation TB: Chapters 9 & 5 Assignment 2 Sessions 7 & 8 (R 11/13 & T 11/18) Predictive Modeling Using Regression Review of OLS Regression Logistic Regression Model Evaluation and Interpretation TB: Chapters 6 & 10 Assignment 3 6

Session 9 (R 11/20) Predictive Modeling Using Neural Networks & Ensemble Methods Introduction to Neural Networks Neural Networks vs. Regression Model Ensemble TB: Chapter 11 Sessions 10 & 11 (T 11/25 & T 12/2) Association & Market-Basked Analysis Frequent Itemset and Association Rule Mining Pattern evaluation (subjective and objective interestingness measures) Sequential patterns TB: Chapter 13 R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules, Proc. 20th Int. Conf. Very Large Data Bases (VLDB), 1994. (PDF; only skim) Assignment 4 Sessions 12 & 13 (R 12/4 & T 12/9) Cluster Analysis Segmentation and Personalization The K-means algorithm Hierarchical (Agglomerative) Clustering Cluster Validation and Interpretation TB: Chapter 14 Assignment 5 Session 14 (R 12/11) Link and Social Network Analysis Network Centrality / Centralization Identifying Prestige / Influence Discovering Key Opinion Leaders Tracking Organized Networks (T 12/16) Final Exam 7

About The Instructor: Dr. Waleed A. Muhanna is a Professor and Department Chair of Accounting & Management Information Systems at the Fisher College of Business, The Ohio State University. He received his undergraduate degree in computer science from the University of Tulsa, and holds a master s degree in computer science and doctorate in management information systems from the University of Wisconsin Madison. Dr. Muhanna s teaching and consulting activities span a number of areas, with particular emphasis on e-commerce, data management and mining, and information systems strategy. Professor Muhanna s current research focuses on IT strategy, data analytics, assessing the business value of information technology, and understanding the impact of information technology, including the Internet, on organizations and markets. His other research interests include trust and reputation online, e-commerce strategy, model and database management systems, and system performance modeling and evaluation. Professor Muhanna has published numerous articles in scholarly journals, including Management Science, MIS Quarterly, Strategic Management Journal, Decision Sciences, the Journal of Information Systems, the International Journal of Accounting Information Systems, ACM Transactions on Computer Systems, IEEE Transactions on Software Engineering, Communications of the ACM, Decision Support Systems, Information & Management, European Journal of Operational Research, Computers in Human Behavior, and the Annals of Operations Research. He recently served as the Director of the Ph.D. Program in Accounting & MIS, and currently serves as the Academic Director of the Center for Business Performance Management. Professor Muhanna is the past Vice-Chair of INFORMS' Information Systems Society, has recently completed his service as an Associate Editor at Information Systems Research, and currently serves on the editorial boards of leading academic journals, including Management Science, Information Technology and Management, and the International Journal of Accounting Information Systems. 8