270107 - MD - Data Mining



Similar documents
LI - Logics in Information Technology

TDS - Socio-Environmental Data Science

240ST014 - Data Analysis of Transport and Logistics

WSE - Writing Skills for Engineering

SICSB - Information Systems and Communications for Health Services

IES - Introduction to Software Engineering

Data Mining course Master in Information Technologies Enginyeria Informàtica Tomàs Aluja. LIAM EIO. UPC Lluis Belanche LSI. UPC

Faculty of Science School of Mathematics and Statistics

Learning outcomes. Knowledge and understanding. Competence and skills

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

INFO-N1O23 - Informatics

MS1b Statistical Data Mining

COURSE PLAN BDA: Biomedical Data Analysis Master in Bioinformatics for Health Sciences Academic Year Qualification.

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

GPS - Software Project Management

SI - Computer Security

ATV - Lifetime Data Analysis

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

ADL - Longitudinal Data Analysis

INEP-I3O23 - Introduction to Software Engineering

INF - Computer Science

PI - Internet Protocols

Subject Description Form

CS Data Science and Visualization Spring 2016

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Statistics W4240: Data Mining Columbia University Spring, 2014

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

TXC - Computer Network Technology

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

DAIC - Advanced Experimental Design in Clinical Research

IN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015

Assessing Data Mining: The State of the Practice

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH

240IOI21 - Operations Management

Software Development Training Camp 1 (0-3) Prerequisite : Program development skill enhancement camp, at least 48 person-hours.

Course Syllabus Business Intelligence and CRM Technologies

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

DEGREE CURRICULUM BIG DATA ANALYTICS SPECIALITY. MASTER in Informatics Engineering

Top Top 10 Algorithms in Data Mining

FOPR-I1O23 - Fundamentals of Programming

Meta-learning. Synonyms. Definition. Characteristics

Learning From Mechanical Failure in Engineering

Teaching model: C1 a. General background: 50% b. Theory-into-practice/developmental 50% knowledge-building: c. Guided academic activities:

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

College of Health and Human Services. Fall Syllabus

Introduction to Data Science: CptS Syllabus First Offering: Fall 2015

COURSE PROFILE. Business Intelligence MIS531 Fall

ICPSR Summer Program

Data Mining. Concepts, Models, Methods, and Algorithms. 2nd Edition

Top 10 Algorithms in Data Mining

Business Intelligence for The Internet of Things

ADSO-I5O01 - Operating Systems Administration

NS - Network Security

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Project Management

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Project Management

Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015

Fortgeschrittene Computerintensive Methoden

Data Mining Carnegie Mellon University Mini 2, Fall Syllabus

Principles of Data Mining by Hand&Mannila&Smyth

SEAX-C9X44 - Network Security and Administration

ICPSR Summer Program, 2014

240ST Operations Management in the Supply Chain

CSci 538 Articial Intelligence (Machine Learning and Data Analysis)

FBIO - Fundations of Bioinformatics

HT2015: SC4 Statistical Data Mining and Machine Learning

Information Management course

Course Title: Advanced Topics in Quantitative Methods: Educational Data Science Practicum

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

FEN - Financial Engineering: Applications to Information Technology Projects

Comparison of Data Mining Techniques used for Financial Data Analysis

DSA - Service and Application Design

DATA MINING TECHNIQUES AND APPLICATIONS

Machine Design and Manufacturing Technologies

How To Do Data Mining In R

Course Syllabus. Purposes of Course:

Data Mining: Overview. What is Data Mining?

ENGSOF2 - Software Engineering II

An Overview of Knowledge Discovery Database and Data mining Techniques

The University of Jordan

King Saud University

Statistics Graduate Courses

Master Specialization in Knowledge Engineering

Business Intelligence. Data Mining and Optimization for Decision Making

Information and Decision Sciences (IDS)

Machine Learning for Data Science (CS4786) Lecture 1

Supply Chain Management

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

ISSUES IN RULE BASED KNOWLEDGE DISCOVERING PROCESS

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

EEAE - Fundamentals of Economics, Environmental Economics and Ecological Economics

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Visualization of textual data: unfolding the Kohonen maps.

Flight Mechanics

Data Mining. Nonlinear Classification

Regularized Logistic Regression for Mind Reading with Parallel Validation

Transcription:

Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 015 70 - FIB - Barcelona School of Informatics 715 - EIO - Department of Statistics and Operations Research 73 - CS - Department of Computer Science BACHELOR'S DEGREE IN INFORMATICS ENGINEERING (Syllabus 010). (Teaching unit Optional) 6 Teaching languages: Catalan Teaching staff Coordinator: - Karina Gibert Oliveras (karina.gibert@upc.edu) - Mario Martín Muñoz (mmartin@cs.upc.edu) Prior skills Foundations of probability and statistics. Basic Programming in R Requirements - Prerequisite PE - Prerequisite PRO Degree competences to which the subject contributes Specific: CSI.. To conceive, deploy, organize and manage computer systems and services, in business or institutional contexts, to improve the business processes; to take responsibility and lead the start-up and the continuous improvement; to evaluate its economic and social impact. CSI.3. To demonstrate knowledge and application capacity of extraction and knowledge management systems. CSI.6. To demonstrate knowledge and capacity to apply decision support and business intelligence systems. Generical: G3. THIRD LANGUAGE: to know the English language in a correct oral and written level, and accordingly to the needs of the graduates in Informatics Engineering. Capacity to work in a multidisciplinary group and in a multi-language environment and to communicate, orally and in a written way, knowledge, procedures, results and ideas related to the technical informatics engineer profession. G9. PROPER THINKING HABITS: capacity of critical, logical and mathematical reasoning. Capacity to solve problems in her study area. Abstraction capacity: capacity to create and use models that reflect real situations. Capacity to design and perform simple experiments and analyse and interpret its results. Analysis, synthesis and evaluation capacity. 1 / 10

Teaching methodology The learning methodology will consist in the analysis of case studies concerning complex data sets from real problems. From these problems the body of necessary scientific knowledge will be introduced. The theoretical and practical lessons are interleaved such that programming and/or integration of data mining functions enhance the assimilation of the various concepts explained. The open programming environment R will be used in the laboratory. The laboratory classes will be devoted to solving problems related to the knowledge provided in the theory classes and to the resolution by the students of a similar problem. This problem may include the resolution of very brief conceptual questions and will be delivered for its evaluation. Finally, the students must complete two full practical works, a statistical modeling problem and a modelling problem of the "scientific", "transaction" or "marketing" kind (only one of them must be chosen by the student). This last practical work will be presented orally to the whole class. Learning objectives of the subject 1.Knowing the types of the main problems of Data Mining.Data quality assesment and preprocessing 3.Problem solving: identify the statistical and/or machine learning techniques more appropriate to solve the problem 5.Implement simple learning algorithms 6.Validation of results 7.Presentation of results in a professional environment for decision making Study load Total learning time: 150h Theory classes: 30h 0.00% Practical classes: 0h 0.00% Laboratory classes: 30h 0.00% Guided activities: 6h 4.00% Self study: 84h 56.00% / 10

Content Introduction to Data Mining. Statistical modeling and types of problems: analysis of binary data ("transactions"), analysis of scientific data and analysis of data from enterprises Visualization and dimensionality reduction Feature selection and extraction. Visualization of multivariate data. Clustering Direct partitioning methods, hierarchical methods and expectation maximization Predictive Methods Regressió lineal múltiple i generalitzada. Regressió Logística. Xarxes Neuronals Decision Trees Classification and regression trees (CART). Validation protocols and data resampling Holdout, cross-validation and the bootstrap 3 / 10

Generation of association rules A-priori and Eclat algorithms. Discriminant Analysis Bayesian decision theory. LDA and QDA Discriminant Analysis and Naïve Bayes Non parametric discrimination Nearest neighbours Regression Shrinkage and Variable Selection Regularized linear regression. LASSO and the Elastic Net methods. Formal concept analysis Formal method for pattern finding Preprocessing a 4 / 10

Bagging i ensemble methods Bagging i ensemble methods 5 / 10

Planning of activities Development Unit 1 Hours: h Theory classes: h Laboratory classes: 0h Self study: 0h 1 A review of R language Hours: 6h Theory classes: 0h Laboratory classes: 6h Self study: 0h Development of item Hours: 16h Theory classes: 4h Laboratory classes: 4h Self study: 8h Development of item 3 Hours: 9h Laboratory classes: h Development of Item 4 Hours: 11h Laboratory classes: 4h 6 / 10

Development of item 5 Hours: 9h Laboratory classes: h Development of Item 6 Hours: 7h Laboratory classes: 0h Development of Item 7 Hours: 9h Laboratory classes: h Development of Item 8 Hours: 11h Laboratory classes: 4h 7 / 10

Development of Item 9 Hours: 11h Laboratory classes: h Self study: 6h Development of Item 10 Hours: 13h Laboratory classes: 4h Self study: 6h 6 Practice 1 Hours: 3h Guided activities: 3h Self study: 0h, 3, 5, 6 Practice Hours: 3h Guided activities: 3h Self study: 0h 3, 5, 6, 7 8 / 10

Qualification system The evaluation of the course will be based on the grade obtained in the exercises developed during the lab sessions. On the other hand there will be two practical works. For each practical work, the student will deliver the corresponding written report. Finally, at the end of the course, the students must present orally the second practical work. The student will be required to show the necessary reasoning as well as English skills. These skills will be are evaluated using the corresponding rubrics. The overall laboratory grade is the average of the grades obtained for the exercises developed out of the laboratory sessions. The final mark will be obtained as follows: Lab = overall laboratory grade PR1 = grade for the first practical work PR = grade for the second practical work Final grade = 0.*Labo + 0.4*Pr1 + 0.4*Pr In both practical works (counting 40% each), 35% corresponds to the technical correction and 5% corresponds to the 'reasoning' generic competence, so that this competence gets an overall weight of 10% of the final grade. 9 / 10

Bibliography Basic: Hand, D.J. Construction and assessment of classification rules. Wiley, 1997. ISBN 978-0-471-96583-1. Hastie, T.; Tibshirani, R.; Friedman, J. The elements of statistical learning: data mining, inference, and prediction. nd ed. Springer, 009. ISBN 9780387848570. Hernández Orallo, J.; Ramírez Quintana, M.J.; Ferri Ramírez, C. Introducción a la minería de datos. Pearson, 004. ISBN 978840540917. Maindonald, J.H.; Braun, J. Data analysis and graphics using R: an example-based approach. 3rd ed. Cambridge University, 010. ISBN 97805176939. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern classification. nd ed. John Wiley & Sons, 001. ISBN 0-471-05669-3. Complementary: Aluja Banet, T.; Morineau, A. Aprender de los datos: el análisis de componentes principales: una aproximación desde el Data Mining. EUB, 1999. ISBN 9788483104. Others resources: Hyperlink http://www.cran.es.r-project.org http://www.kdnuggets.com/ http://www.cs.waikako.ac.nz 10 / 10