CS 6220: Data Mining Techniques Course Project Description



College of Computer and Information Science, Northeastern University, Spring 2013

General Goal

In this project, you will apply the data mining algorithms and techniques learned in class to a real-world problem. Choose any problem that interests you and formalize it as a data mining task. Then obtain data related to the task, apply data mining algorithms to it, and evaluate and compare those algorithms. Finally, submit a report together with your data and code.

Detailed Stages and Deadlines

1. Form Groups
   Group size: 3-4 students
   Deadline: Jan. 21 (11:59pm)
   Where to submit: Blackboard
   What to submit: Group name; group members; group leader
   Points: 1 point

2. Project Proposal
   Deadline: Feb. 11 (11:59pm)
   Where to submit: Blackboard
   What to submit: A 2-page proposal including
     2.1 Problem and goal: What do you want to solve? Why do you think it is important? What results do you expect?
     2.2 Formalization into a data mining task: e.g., frequent pattern mining, classification, or clustering.
     2.3 Data plan: What kind of data? Where and how will you get the data? Make sure you get the data in time.
     2.4 Schedule: a detailed plan of your project
   Points: 5 points
   Note: We will discuss your proposal with every group later that week.

3. Midterm Check
   Deadline: Mar. 18 (11:59pm)
   Where to submit: Blackboard
   What to submit: A temporary report
     A draft of the report
     A discussion of your progress
     Issues and difficulties you have encountered
   Points: 2 points
   Note: We will discuss your progress with every group later that week.

4. Final Report
   Deadline: Apr. 15 (11:59pm)
   Where to submit: Blackboard
   What to submit: A final report, data, and code
     The report covers problem introduction, formalization, algorithms, experiment results, etc.
   Points: 12 points + 5 extra points

Grading

Total credit: 20 points of regular credit and 5 points of extra credit
1. Group formation (1 point)
2. Proposal (5 points)
3. Midterm check (2 points)
4. Data and code (5 points)
   Any programming language
   Documentation
   Own implementation
5. Final report (7 points)
   At least two algorithms and two evaluation methods
6. Additional features (5 extra points)
   Novelty of the problem
   Your own data
   More than two algorithms/evaluation methods
   Other innovative features (e.g., a new algorithm)

Grading: Collaboration Rules

Every member of a team gets the same score (to encourage teamwork).
Exception: the team has the right to identify someone as a free rider, and we will lower that member's score.
Include a table describing your division of work. An example:

Task                                      People
1. Collecting and preprocessing data      Student A
2. Implementing Algorithm 1               Student B
3. Implementing Algorithm 2               Student C and D
4. Evaluating and comparing algorithms    Student A
5. Writing report                         Student B and C

Peer evaluation

Resources and References

Datasets
  UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
  DBLP four-area dataset: http://www.ccs.neu.edu/home/yzsun/data/dblp four area.zip
Sample Projects
  Students' previous machine learning projects at Stanford University: http://cs229.stanford.edu/projects2011.html

A Simple Example: Email Classification

Problem: Determine whether a given email is spam or not
Data Mining Task: Binary classification
Data: UCI spam data set (http://archive.ics.uci.edu/ml/datasets/spambase)
  Number of instances: 4601
  Number of attributes: 57
  The last column denotes whether the email was spam (1) or not (0)
  Partition it into a training set and a test set
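The partition step above could be sketched as follows. This is a minimal illustration using only the Python standard library, not a prescribed method: the function names, the 30% test fraction, the seed, and the tiny made-up rows are all assumptions chosen for the example. For the real data you would load the spam data set's rows (features first, label in the last column) in place of `toy_rows`.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_features_label(rows):
    """Separate each row into its feature values and its last-column label."""
    features = [row[:-1] for row in rows]
    labels = [int(row[-1]) for row in rows]
    return features, labels


# Hypothetical stand-in data: two features followed by a 0/1 label.
toy_rows = [[0.1 * i, float(i % 3), i % 2] for i in range(10)]

train_rows, test_rows = split_train_test(toy_rows, test_fraction=0.3, seed=42)
X_train, y_train = split_features_label(train_rows)
X_test, y_test = split_features_label(test_rows)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting matters when the file is sorted by class; fixing the seed keeps the same split across runs so that different algorithms are compared on identical data.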

Algorithms
  Naive Bayesian classifier
  Artificial neural network
  AdaBoost
Evaluation and Comparison
  Error rate
  ROC and AUC
  Speed
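As a rough illustration of the three evaluation criteria, the sketch below computes an error rate, an AUC value, and a wall-clock timing in plain Python. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for the example. The AUC here is computed through its pairwise interpretation (the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half), which is simple but quadratic in the number of examples.

```python
import time


def error_rate(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical classifier output: true labels and predicted spam scores.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

err = error_rate(y_true, y_pred)
auc, seconds = timed(roc_auc, y_true, scores)
print(f"error rate = {err:.3f}, AUC = {auc:.3f}, time = {seconds:.6f}s")
```

Error rate depends on the chosen threshold, while AUC summarizes ranking quality over all thresholds, so reporting both gives two complementary evaluation methods. The same `timed` helper can wrap a training or prediction call to compare speed across algorithms.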

Have fun with your project!