IST565 M001 Yu Spring 2015 Syllabus Data Mining




Draft updated 10/28/2014

Instructor: Professor Bei Yu
Email: byu.teaching@gmail.com
Office: Hinds 320
Office hour: TBD
TA: TBD
TA Office hour: TBD
Classroom: Hinds 117
Class time: 3:45-5:05 Wednesdays

Note on Class Delivery Format

In MySlice the class time is listed as Mon 3:45-5:05 in Hinds 013 and Wed 3:45-5:05 in Hinds 117. However, the actual face-to-face time is Wednesdays only. This is a flipped class, meaning students will spend half of the class time on self-paced study, watching videos and slides provided by the instructor, and come to Hinds 117 every Wednesday afternoon 3:45-5:05 for question answering, group discussions, and hands-on exercises. Students are required to bring a laptop with Weka installed to the face-to-face sessions. If you have questions regarding the format of the class, please email me at byu.teaching@gmail.com.

Difference between IST565 Data Mining and IST736 Text Mining

A number of students have asked what the main difference is between the two courses, data mining and text mining. Here is a brief answer. The two classes share a theoretical foundation in machine learning, so the fundamental concepts in machine learning, such as classification and clustering, are covered in both classes. However, the two classes differ in the following aspects:

Content wise, the data mining class focuses on structured data, meaning the data sets we work with in class are usually in .csv format. Text mining focuses on unstructured text data, which comes as words. How to convert text to numbers that still bear the meaning of the text is an important topic in text mining. In text mining we will also have to deal with some problems that do not exist in mining structured data, such as the subjectivity of decisions. For example, how do we determine if a tweet is positive, negative, or neutral? Different people might give different assessments.

Technology wise, the data mining class uses a GUI-based tool such as Weka to analyze data, and thus neither requires nor practices programming skills. The text mining class uses scikit-learn, a Python library used from the command line, which requires students to learn and practice Python programming on a Linux platform. Students without a programming background can still take the class but are expected to spend a little more time on programming.
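To make the idea of "converting text to numbers" a little more concrete, here is a minimal bag-of-words sketch in plain Python. It is only an illustration under simple assumptions (whitespace tokenization, raw word counts), not the method taught in IST736; the function name `bag_of_words` and the example tweets are hypothetical.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn each document into a vector of word counts over a shared vocabulary.

    Assumption for illustration: words are split on whitespace and lowercased.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is every distinct word seen in any document, in sorted order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical example tweets.
tweets = ["great phone great price", "terrible battery"]
vocabulary, vectors = bag_of_words(tweets)
print(vocabulary)
print(vectors)
```

Each tweet becomes one row of numbers with one column per vocabulary word, which is the same structured shape as a .csv data set. Word order is lost in this representation, which is one reason that preserving the meaning of text is a hard problem.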

I. Course Description and Objectives

This course will introduce popular data mining methods for extracting knowledge from data. The principles and theories of data mining methods will be discussed and related to the issues that arise in applying data mining to problems. Students will also acquire hands-on experience using state-of-the-art software to develop data mining solutions to scientific and business problems. The focus of this course is on understanding data and how to formulate data mining tasks in order to solve problems using the data. The topics of the course will include the key tasks of data mining, including data preparation, concept description, association rule mining, classification, clustering, evaluation and analysis. Through the exploration of the concepts and techniques of data mining and practical exercises, students will develop skills that can be applied to business, science or other organizational problems.

The format of the class meetings will be a combined lecture and lab format, with lectures and class discussions to cover material and lab time to investigate small examples for the topic of the week. There will be weekly readings based on the textbook and on other materials which will be posted online.

Upon completion of this course, students are expected to be able to:
Understand the fundamental processes, concepts and techniques of data mining,
Develop familiarity with data mining techniques and be able to apply them to real-world problems,
Advance their understanding of contemporary data mining systems.

II. Course Materials

Required textbook
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (2005) Introduction to Data Mining. (Free sample chapters are available at the authors' website, http://www-users.cs.umn.edu/~kumar/dmbook/index.php)

Software
Required: Weka
Optional: Rapid-Miner, Scikit-Learn, SPSS, R

All of these software packages have been installed on the lab computers. You can access them through the remote lab. All software packages except SPSS are open-source, and you may install them on your own computer. The university bookstore offers an SPSS license at a student rate of $50/year (http://its.syr.edu/licenses/spss%20statistics.html).

III. Assessment

Your final grade is determined by your performance on the items in the table below. An overview of each item is provided in the remainder of this section.

Assessment Item                    Weight %
Class exercises                    15
10 homework assignments            60
Project proposal                   Commented, not graded
Project presentation and report    25
Total                              100

Class exercises

Students are required to participate in class discussions and exercises held on the discussion forums in BlackBoard. These exercises are designed to encourage students to practice their newly learned knowledge, and thus the grading is based on participation only, not performance. Participation in the exercise forums will be tallied every week. If there are x exercises throughout the semester and a student finishes y of them, the student's grade is y/x*15.

Homework assignments

10 homework assignments will be given during the semester. You are free to discuss the assignments with your classmates, but you must write up the report all by yourself. Plagiarism cases will be reported to the university.

Assignments must be professionally prepared and submitted electronically to BlackBoard. All assignments should be submitted as Word files named HW_Num_Lastname_Firstname.doc(x), e.g. HW_1_Smith_John.doc. Please DO NOT submit PDF files, because commenting is poorly supported in PDF files. Grades and comments for the assignments will be made available in BlackBoard. A common error analysis file will be released when assignments are returned, one week after submission.

To ensure fast return, all assignments should be submitted on time each Tuesday night at 11pm. Late submissions will be penalized 20% for any part of the first 24 hours, 50% for any part of the second 24 hours, and 100% thereafter. Solutions to the assignments will be posted soon after the cutoff time, so that students receive at least some general feedback before grading is completed.
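The arithmetic in the rules above can be sketched in a few lines of Python. This is an informal illustration, not an official grading tool: the function names are hypothetical, the file name pattern assumes names without underscores, and the syllabus does not say whether the late penalty is taken from the earned score or from the maximum score, so the sketch only returns the penalty fraction.

```python
import re


def exercise_points(finished, total, weight=15):
    """Class-exercise credit, computed as y / x * 15 (participation only)."""
    if total <= 0:
        raise ValueError("total number of exercises must be positive")
    return finished / total * weight


def late_penalty(hours_late):
    """Fraction of credit lost for a submission that is hours_late hours late."""
    if hours_late <= 0:
        return 0.0   # on time
    if hours_late <= 24:
        return 0.20  # any part of the first 24 hours
    if hours_late <= 48:
        return 0.50  # any part of the second 24 hours
    return 1.0       # more than 48 hours late


# Assumed pattern for HW_Num_Lastname_Firstname.doc(x); names may not contain "_".
HOMEWORK_FILENAME = re.compile(r"^HW_\d+_[^_]+_[^_]+\.docx?$")


def is_valid_filename(name):
    """Check a file name against the homework naming convention."""
    return HOMEWORK_FILENAME.match(name) is not None


print(exercise_points(8, 10))
print(late_penalty(0.5), late_penalty(30), late_penalty(72))
print(is_valid_filename("HW_1_Smith_John.doc"))
```

For example, a student who finishes 8 of 10 exercises earns 12 of the 15 exercise points, and a homework submitted 30 hours after the deadline falls in the 50% penalty band.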
Course project

The objective of the project is to use the main skills taught in this class to solve a real data mining problem. You can either work alone or form a team of up to 3 students (including yourself) to work on a project throughout the semester and turn in the reports at each checkpoint. Students in a team should state their individual contributions in the reports.

Checkpoint 1: project proposal
Your proposal should include an overview of the data mining problem, the data set you will use and its availability, and your proposed data mining approach.

Checkpoint 2: project presentation and final project report
Before the end of the semester, you will present your data mining project to the whole class, and at the same time submit a final project report. The presentation is a good opportunity to explain your project to the instructor and your fellow students. The final project report should detail the data mining problem, its significance and broader impact, the data mining approaches, the results, and the interpretation of the discovered patterns.

Grading

An internal i-school faculty survey indicated that the average expectation amongst faculty members for grade distribution in graduate classes was as follows: A (38%); B (48%); C (12%) and F (3%), although grade distribution may vary among classes, and grades are not curved in this class. For this class, an "A" would mean the student has the capability to independently solve a simple data mining task.

Below is a common formula for number-to-letter grade conversion. However, because of the varied level of background knowledge on data analysis among our students, some students might have a bumpier start than others. What really matters is whether a student becomes a good data analyst upon finishing the class. Therefore, a trend analysis of each student will be conducted at the end of the semester to determine the student's final grade.

Grade    Points
A        93-100
A-       90-92
B+       87-89
B        83-86
B-       80-82
C+       77-79
C        73-76
C-       70-72
D        60-69
F        0-59

IV. Course policies

Communications

This course will use the SU BlackBoard System as the main communication platform. Students are required to check their BlackBoard accounts on a regular basis. Important announcements will be posted to the Announcements board, which automatically sends each announcement to students' syr email accounts. Failure to read the class announcements will not be considered a suitable excuse for not being informed. BlackBoard can be accessed at http://blackboard.syr.edu. Questions regarding BlackBoard should be directed to ilms@syr.edu or Peggy Brown at 315-443-9370.

All emails to the instructor should be sent to byu.teaching@gmail.com with a subject line starting with IST 565. This email account is dedicated to teaching matters and has higher priority than my official email byu@syr.edu. All your emails will be archived for easy follow-up afterwards.
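As a small illustration of the number-to-letter conversion table in the Grading section, the lookup can be sketched in Python as follows. This is a sketch under stated assumptions, not the instructor's grading procedure: the name `letter_grade` is hypothetical, fractional scores are assumed to fall into the band whose lower bound they reach, and the final grade also depends on the trend analysis described in that section.

```python
# Lower bound of each band from the conversion table, highest first.
GRADE_CUTOFFS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"), (0, "F"),
]


def letter_grade(points):
    """Map a 0-100 point total to a letter using the table's lower bounds."""
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    for cutoff, letter in GRADE_CUTOFFS:
        if points >= cutoff:
            return letter


for score in (95, 90, 86.5, 59):
    print(score, letter_grade(score))
```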

Academic Integrity

The academic community of Syracuse University and of the School of Information Studies requires the highest standards of professional ethics and personal integrity from all members of the community. Violations of these standards are violations of a mutual obligation characterized by trust, honesty, and personal honor. As a community, we commit ourselves to standards of academic conduct, impose sanctions against those who violate these standards, and keep appropriate records of violations. The academic integrity statement can be found at: http://supolicies.syr.edu/ethics/acad_integrity.htm.

Respect Intellectual Property Rights and cite all sources in your work. Any valid citation style may be used. The following link may be used for further information regarding appropriate citation styles: http://researchguides.library.syr.edu/citation.

Students with Disabilities

If you believe that you need accommodations for a disability, please contact the Office of Disability Services (ODS), http://disabilityservices.syr.edu, located in Room 309 of 804 University Avenue, or call (315) 443-4498 for an appointment to discuss your needs and the process for requesting accommodations. ODS is responsible for coordinating disability-related accommodations and will issue students with documented disabilities Accommodation Authorization Letters, as appropriate. Since accommodations may require early planning and generally are not provided retroactively, please contact ODS as soon as possible.

Ownership of Student Work

This course may use course participation and documents created by students for educational purposes. In compliance with the Federal Family Educational Rights and Privacy Act, works in all media produced by students as part of their course participation at Syracuse University may be used for educational purposes, provided that the course syllabus makes clear that such use may occur. It is understood that registration for and continued enrollment in a course where such use of student works is announced constitutes permission by the student. After such a course has been completed, any further use of student works will meet one of the following conditions: (1) the work will be rendered anonymous through the removal of all personal identification of the work's creator/originator(s); or (2) the written permission of the creator/originator(s) will be secured. As generally accepted practice, honors theses, graduate theses, graduate research projects, dissertations, or other exit projects submitted in partial fulfillment of degree requirements are placed in the library, University Archives, or academic departments for public reference.

Class Schedule

Week  Date   Topic                                    Readings (Tan et al.)  Item release  Item due              Item return
1     01/14  Introduction                             Ch. 1                  HW1
2     01/21  Data preparation                         Ch. 2                  HW2           HW1
3     01/28  Data exploration                         Ch. 3                  HW3           HW2                   HW1
4     02/04  Classification algorithm: decision tree  Ch. 4.1-4.3            HW4           HW3                   HW2
5     02/11  Model evaluation                         Ch. 4.4-4.6            HW5           HW4                   HW3
6     02/18  Classification algorithm: naïve Bayes    Ch. 5.3                HW6           HW5                   HW4
7     02/25  Classification algorithm: knn and SVMs   Ch. 5.2, 5.5           HW7           HW6                   HW5
8     03/04  Clustering algorithm: k-Means            Ch. 8.1-8.2            HW8           HW7                   HW6
      03/11  Spring Break
9     03/18  Clustering: HAC; Project proposal        Ch. 8.3                HW9           HW8                   HW7
10    03/25  Association rules                        Ch. 6                  HW10          HW9                   HW8
11    04/01  Text mining                                                                   HW10                  HW9
12    04/08  Text mining                                                                                         HW10
13    04/15  Project clinic
14    04/22  Project presentation                                                          Final project report