DSSP Data Science Starter Program - Polytechnique

A novel professional training on Data Science and Big Data, offered by École Polytechnique jointly by the Applied Mathematics and Informatics departments.

Year 1 / October 3 - December 13, 2014

1. Target Audience and Prerequisite(s)

The proposed modules are suitable for anyone with some basic knowledge of Computer Science or Statistics. No programming experience is required. The program is designed for individuals (researchers and practitioners). The concepts and training delivered in this program enable a sound understanding of the context and challenges of Big Data, which shape the evolution of the sciences and of many business domains. The program suits both early-career professionals and senior managers who need an understanding of this challenging area and its applications.

2. Data Science Starter Program

The training program is aimed at professionals and executives and comprises taught modules, labs and homework. It addresses state-of-the-art topics in Data Science and Big Data, ranging from data collection, storage and processing to analytics and visualization, as well as a range of real-world applications and business/laboratory cases. The program is broad in scope and will cover, in sufficient detail, the methods and tools needed to tackle big data problems.

2.1 Master Structure

The training spans 140 taught hours (Friday and Saturday, in October/November). Each training day consists of 2 x 3h slots + a 1h conference/invited talk. The thematic articulation is as follows:

Week 1. Data Science introduction. Big Data ecosystem: players, software, hardware. Data project cycle/management. Legal issues/security framework.

Week 2. Data Management. Database / SQL, data cleaning, normalization, feature selection & creation, spectral decompositions and dimensionality reduction.

Weeks 3-5. Data Analysis and Machine Learning. Descriptive (data quality). Exploratory (summary statistics, correlation, ANOVA). Inferential (theory of generalization, sampling, statistical testing). Predictive (supervised, unsupervised machine learning).

Weeks 6-7. Cloud computing & Big Data. Introduction to the basics of the cloud computing paradigm and to performance evaluation for applications in the cloud. Basic concepts of Big Data: Hadoop/MapReduce as a programming model for distributed processing of large datasets. Introduction to NoSQL languages.

Week [...]. Graph & Text Mining and Bigdata Camp. Methods and tools for pre-processing, indexing, querying, retrieval and ranking of text at the document and collection levels. Algorithms for text-oriented applications in web and social networks. Methods and tools for pre-processing graphs and for searching, ranking and evaluating nodes and communities.
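To make the MapReduce programming model named in the cloud computing module more concrete, here is a minimal single-machine sketch in plain Python. It is an illustration only, not course material and not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit one (word, 1) pair per word, as a mapper would."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, imitating the shuffle step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence on a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data big graphs", "big text"]))
```

The point of the split is that each mapper only needs its own document and each reducer only needs the values for its own key, which is what allows the work to be distributed.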

2.2 Courses structure and Syllabus

Introduction to Data Science
Objective: To present a big picture of Data Science as well as of its cycles.
Syllabus:
Big Data ecosystem: players, software, hardware
Data project cycle/management
Legal/security framework

Data Management
Objective: To present the foundations of data management: accessing the data stored in a database and (pre)processing it to prepare its analysis.
Syllabus:
Databases, SQL, design
Data processing: normalization, feature selection & creation, spectral decompositions and dimensionality reduction

Data Analysis and Machine Learning
Objective: To present the basics of Data Analysis and Machine Learning: how to describe and explore a dataset, and how to use data to find hidden information and make predictions with statistical and machine learning algorithms.
Syllabus:
Looking at the data: Descriptive statistics, PCA and dimension reduction, Statistical testing
Unsupervised clustering: Clustering, K-Means and K-Means++, DBSCAN, Hierarchical clustering
Linear model and diagnostic: Generalization theory, Prediction vs inference, Linear model and diagnostic
Logistic regression: Logistic regression and variable selection, Overfitting and Cross validation, Metric choice (AUC, Precision/Recall, F-Score, ...)
Machine Learning: Empirical criterion minimization, SVM, Regularization for SVM and logistic regression
Tree methods and ensemble methods: Classification And Regression Tree, Bagging and boosting
Further topics: Naive Bayes, Non-parametric methods, Neural networks and deep learning, Spectral clustering

Cloud Computing & Bigdata
Objective: Introduce the basics of the cloud-computing paradigm. Understand performance evaluation for applications in the cloud. Understand the basic concepts of Hadoop/MapReduce as a programming model for distributed processing of large datasets.
Syllabus:
Overview of Computing Paradigms: Grid Computing, Cluster Computing, Distributed Computing, Utility Computing, Cloud Computing
Cloud Computing Architecture - Comparison with traditional computing architecture (client/server)
Services provided at various levels, Role of Network protocols, Web services
Service Management in Cloud Computing
Data security and privacy issues
Principles of parallel processing and distributed systems
Functional programming and parallel algorithms for MapReduce
Hadoop storage, DFS, Cluster architecture
Visual Analytics

Graph & Text Mining
Objective: Graphs and texts are ubiquitous in social and web data. This module provides methods and tools for pre-processing, indexing, querying, retrieval and ranking of text at the document and collection levels. It also describes algorithms for text-oriented applications in web and social networks. For graphs, the objective is to provide methods and tools for pre-processing graphs and for searching, ranking and evaluating nodes and communities.
Syllabus:
Community mining methods, graph clustering methods (min-cut, spectral clustering), Spectral Clustering of Graph Data
Ranking algorithms (PageRank), Ranking evaluation measures (Kendall Tau, NDCG), Degeneracy (k-core & extensions)
Feature extraction for text, scoring, term weighting & the vector space representation, indexing, retrieval functions: term-frequency/inverse-document-frequency (TF-IDF), BM25
Web Mining. Web personalization and recommendations (collaborative filtering)
Web Advertising (Google AdWords, 2nd price auctions, campaign design principles, natural language generation for snippets, campaign optimization algorithms)

Bigdata Camp
Objective: Apply the techniques described in the previous lectures to a case study from an industrial or academic problem, using state-of-the-art methods and machine learning tools.

Conferences - Invited talks
Objective: This is a horizontal activity spanning the whole duration of the master, with invited speakers from academia and industry presenting topics and experiences from data science and big data case studies.
Syllabus:
Case study from industry or academia
Workshops from machine learning challenges
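As an illustration of the term weighting listed in the Graph & Text Mining syllabus, the sketch below computes TF-IDF weights in plain Python. It assumes one common variant (relative term frequency multiplied by log(N / document frequency)) and whitespace tokenization; the function name tf_idf is hypothetical, and the course may well use a different weighting or a library implementation.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document.

    Assumed variant: weight = (count / document length) * log(N / df),
    where N is the number of documents and df the number of documents
    containing the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    for doc_weights in tf_idf(["graph mining", "text mining"]):
        print(doc_weights)
```

With this variant, a term that occurs in every document gets weight zero, which is the intended effect: it does not help to tell documents apart.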

3. Teaching staff

Faculty and short CVs

S. Gaiffas (CMAP)
Stéphane Gaïffas is Professeur Chargé at the department of applied mathematics of Ecole Polytechnique. He does research in Statistics and Machine Learning, with current applications to web marketing, social networks, and health records data in partnership with Caisse Nationale d'Assurance Maladie. He defended his PhD in Statistics on «Nonparametric Regression and Inhomogeneous Information» under the supervision of Marc Hoffman at LPMA - Univ. Denis Diderot in [...]. He was Maître de Conférences at LSTA - Univ. Paris 6 between 2007 and [...]. For three years he has been a scientific consultant on machine learning and big data for several French companies.

C. Giatsidis (LIX)
Christos Giatsidis is currently a post-doctoral researcher in the Computer Science Laboratory at Ecole Polytechnique in France. He received his Diploma in Computer Science from the Athens Univ. of Economics & Business, Greece, in 2009 and his PhD from Ecole Polytechnique, under the supervision of Prof. Michalis Vazirgiannis. In 2014 he received a "thesis prize" for his thesis entitled "Graph Mining and Community Detection with Degeneracy". He has experience in both research and industry. His recent industrial work includes predicting a player's obsession for a large French company in the gambling industry and building a prediction model for component failure for a big aeronautics company. His research interests include data/graph mining and algorithms for big data management.

B. Kegl (LAL), https://users.lal.in2p3.fr/kegl/
Balázs Kégl received the Ph.D. degree in computer science from Concordia University, Montreal, in [...]. From January to December 2000 he was a Postdoctoral Fellow at the Department of Mathematics and Statistics at Queen's University, Kingston, Canada, holding an NSERC Postdoctoral Fellowship. He was an Assistant Professor from 2001 to 2006 in the Department of Computer Science and Operations Research at the University of Montreal. Since 2006 he has been a research scientist in the Linear Accelerator Laboratory of the CNRS (DR since 2013). He has published more than a hundred papers on unsupervised and supervised learning (principal curves, intrinsic dimensionality estimation, boosting), large-scale Bayesian inference and optimization, and on various applications ranging from music and image processing to systems biology and experimental physics. In his current position he has been the head of the AppStat team, working on machine learning and statistical inference problems motivated by applications in high-energy particle and astroparticle physics. Since 2014, he has been the chair of the Center for Data Science of the University of Paris Saclay.

E. Le Pennec (CMAP)
Erwan Le Pennec has been an Associate Professor (Professeur associé) at the Applied Math department of École Polytechnique since September [...]. He does his research in statistics and signal processing at the CMAP of the same school. He did a Signal Processing PhD with Stéphane Mallat at the Centre de Mathématiques Appliquées of École Polytechnique. The subject of his thesis is the introduction of geometry in image representation. He defended it on December 19, 2002; its title is Bandelettes et représentations géométriques des images (Bandelets and geometric representation of images). In [...], he worked as a "post-doc" in a joint project between the CMAP and Let It Wave, a company created by Stéphane Mallat, Christophe Bernard, Jérôme Kalifa and himself to exploit their research on bandelets. From 2004 to 2010, he was a "Maître de Conférences" (Assistant Professor) at the university Paris Diderot (Paris 7) in the "laboratoire de Probabilités et Modèles Aléatoires" (Statistics team). From 2010 to 2013, he was a "Chargé de Recherche" (Research Associate) in the SELECT project of Inria Saclay, a project in which he had already worked in [...]. He has also accompanied Let It Wave as a scientific consultant, even after it was sold to Zoran.

Eric Matzner-Lober (CMAP)
Eric Matzner-Lober has been professor of Statistics at Rennes 2 university since 2007; he is also affiliated with Los Alamos National Laboratory. From this year on, he is also a part-time professor at Ecole Polytechnique. He is a specialist in nonparametric statistics and machine learning. He is a renowned expert in R, a language for which he runs a book series. He also founded a statistics consulting company that has been bought by a major consulting actor.

M. Vazirgiannis (LIX)
Dr. Vazirgiannis is a Professor in LIX, Ecole Polytechnique. He is currently working in the area of Data Science for Big Data, aiming at harnessing the potential of machine learning algorithms for large-scale data sets including text and graphs. More specifically, his current work is on graph degeneracy for large-scale graph mining, graph-based text retrieval, learning models from time series data and text mining for the web (i.e. advertising, news streams). He teaches data mining and machine learning for big data at Ecole Polytechnique. He has supervised nine completed Ph.D. theses and is supervising six more. He has published chapters in books and encyclopedias, two international books and more than one hundred twenty (120) papers in international refereed journals and conferences. He has received the ERCIM and Marie Curie EU fellowships. He has also coauthored three patents and attracted significant R&D funding, including national and international research & development projects. Currently he leads industrial projects in the area of large-scale machine learning.

4. Master Schedule (3/10/2014 - 13/12/2014)

Session  Date        Topic                               Teaching Faculty             Amphi
1        3/10/2014   Introduction to Data Science        Gaiffas, Le Pennec, Matzner  Painlevé
2        4/10/2014   Introduction to Data Science        Gaiffas, Le Pennec, Matzner  Painlevé
3        10/10/2014  Data Management                     Giatsidis, Vazirgiannis      Painlevé
4        11/10/2014  Data Management                     Giatsidis, Vazirgiannis      Painlevé
5        17/10/2014  Data Analysis and Machine Learning  Gaiffas, Le Pennec, Matzner  Painlevé
6        18/10/2014  Data Analysis and Machine Learning  Gaiffas, Le Pennec, Matzner  Painlevé
7        24/10/2014  Data Analysis and Machine Learning  Gaiffas, Le Pennec, Matzner  Painlevé
8        25/10/2014  Data Analysis and Machine Learning  Gaiffas, Le Pennec, Matzner  Painlevé
9        07/11/2014  Data Analysis and Machine Learning  Gaiffas, Le Pennec, Matzner  Painlevé
10       08/11/2014  Data Analysis and Machine Learning  Gaiffas, Le Pennec, Matzner  Painlevé
11       14/11/2014  Cloud Computing & Bigdata           Gaiffas, Matzner             Painlevé
12       15/11/2014  Cloud Computing & Bigdata           Gaiffas, Matzner             Painlevé
13       21/11/2014  Cloud Computing & Bigdata           Giatsidis, Vazirgiannis      Painlevé
14       22/11/2014  Cloud Computing & Bigdata           Giatsidis, Vazirgiannis      Painlevé
15       28/11/2014  Graph/Text Mining                   Vazirgiannis, Giatsidis      Painlevé
16       29/11/2014  Graph/Text Mining                   Vazirgiannis, Malliaros      Painlevé
17       5/12/2014   Bigdata Camp                        Kegl, Giatsidis              Painlevé
18       6/12/2014   Bigdata Camp                        Kegl, Giatsidis              Labs
19       12/12/2014  Bigdata Camp                        Kegl, Giatsidis              Painlevé
20       13/12/2014  Bigdata Camp                        Kegl, Giatsidis              Painlevé


Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction Data Mining and Exploration Data Mining and Exploration: Introduction Amos Storkey, School of Informatics January 10, 2006 http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction Welcome Administration

More information

High Productivity Data Processing Analytics Methods with Applications

High Productivity Data Processing Analytics Methods with Applications High Productivity Data Processing Analytics Methods with Applications Dr. Ing. Morris Riedel et al. Adjunct Associate Professor School of Engineering and Natural Sciences, University of Iceland Research

More information

Journée Thématique Big Data 13/03/2015

Journée Thématique Big Data 13/03/2015 Journée Thématique Big Data 13/03/2015 1 Agenda About Flaminem What Do We Want To Predict? What Is The Machine Learning Theory Behind It? How Does It Work In Practice? What Is Happening When Data Gets

More information

No BI without Machine Learning

No BI without Machine Learning No BI without Machine Learning Francis Pieraut francis@qmining.com http://fraka6.blogspot.com/ 10 March 2011 MTI-820 ETS Too Much Data Supervised Learning (classification) Unsupervised Learning (clustering)

More information

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction to Machine Learning Lecture 1 Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Introduction Logistics Prerequisites: basics concepts needed in probability and statistics

More information

Master of Science in Health Information Technology Degree Curriculum

Master of Science in Health Information Technology Degree Curriculum Master of Science in Health Information Technology Degree Curriculum Core courses: 8 courses Total Credit from Core Courses = 24 Core Courses Course Name HRS Pre-Req Choose MIS 525 or CIS 564: 1 MIS 525

More information

What is Data Science? Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014

What is Data Science? Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014 What is Data Science? { Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014 Let s start with: What is Data? http://upload.wikimedia.org/wikipedia/commons/f/f0/darpa

More information

270107 - MD - Data Mining

270107 - MD - Data Mining Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 015 70 - FIB - Barcelona School of Informatics 715 - EIO - Department of Statistics and Operations Research 73 - CS - Department of

More information

Prerequisites. Course Outline

Prerequisites. Course Outline MS-55040: Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot Description This three-day instructor-led course will introduce the students to the concepts of data mining,

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate

Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Data Science and Business Analytics Certificate Data Science and Business Intelligence Certificate Description The Helzberg School of Management has launched two graduate-level certificates: one in Data

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning.  CS 2750 Machine Learning. Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x-5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health

Example application (1) Telecommunication. Lecture 1: Data Mining Overview and Process. Example application (2) Health Lecture 1: Data Mining Overview and Process What is data mining? Example applications Definitions Multi disciplinary Techniques Major challenges The data mining process History of data mining Data mining

More information

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address

More information

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

More information

International Workshop on Big Data Analytics for Advanced Databases (BIGDATA, 2016)

International Workshop on Big Data Analytics for Advanced Databases (BIGDATA, 2016) International Workshop on Big Data Analytics for Advanced Databases (BIGDATA, 2016) Call for Papers AIM and SCOPE There is an exponential growth in digital data with unprecedented new platforms derived

More information

EECS 445: Introduction to Machine Learning Winter 2015

EECS 445: Introduction to Machine Learning Winter 2015 Instructor: Prof. Jenna Wiens Office: 3609 BBB wiensj@umich.edu EECS 445: Introduction to Machine Learning Winter 2015 Graduate Student Instructor: Srayan Datta Office: 3349 North Quad (**office hours

More information

CSci 538 Articial Intelligence (Machine Learning and Data Analysis)

CSci 538 Articial Intelligence (Machine Learning and Data Analysis) CSci 538 Articial Intelligence (Machine Learning and Data Analysis) Course Syllabus Fall 2015 Instructor Derek Harter, Ph.D., Associate Professor Department of Computer Science Texas A&M University - Commerce

More information

Knowledge Discovery and Data Mining 1 (VO) (707.003)

Knowledge Discovery and Data Mining 1 (VO) (707.003) Knowledge Discovery and Data Mining 1 (VO) (707.003) Denis Helic KTI, TU Graz Oct 1, 2015 Denis Helic (KTI, TU Graz) KDDM1 Oct 1, 2015 1 / 55 Lecturer Name: Denis Helic Office: IWT, Inffeldgasse 13, 5th

More information

Predictive Data modeling for health care: Comparative performance study of different prediction models

Predictive Data modeling for health care: Comparative performance study of different prediction models Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar

More information

SUPPLY CHAIN AND BUSINESS TECHNOLOGY MANAGEMENT Section 61.50

SUPPLY CHAIN AND BUSINESS TECHNOLOGY MANAGEMENT Section 61.50 SUPPLY CHAIN AND BUSINESS TECHNOLOGY MANAGEMENT Section 61.50 Faculty Professor and Chair of the Department RUSTAM VAHIDOV, PhD Georgia State University Professors CLARENCE BAYNE, PhD McGill University

More information

Data Mining + Business Intelligence. Integration, Design and Implementation

Data Mining + Business Intelligence. Integration, Design and Implementation Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution

More information

An interdisciplinary model for analytics education

An interdisciplinary model for analytics education An interdisciplinary model for analytics education Raffaella Settimi, PhD School of Computing, DePaul University Drew Conway s Data Science Venn Diagram http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

More information

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

More information

for the Field of Electrical and Information Engineering 1. Introduction: the doctorate in the framework of the European policy of education

for the Field of Electrical and Information Engineering 1. Introduction: the doctorate in the framework of the European policy of education 2BNew Trends of Doctoral Studies in Europe: Special Considerations for the Field of Electrical and Information Engineering Olivier Bonnaud, Michael H.W. Hoffmann The authors are members of EAEEIE, IEEE,

More information

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

More information

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) (See also General Regulations) Any publication based on work approved for a higher degree should contain a reference to

More information

Doctor of Philosophy in Computer Science

Doctor of Philosophy in Computer Science Doctor of Philosophy in Computer Science Background/Rationale The program aims to develop computer scientists who are armed with methods, tools and techniques from both theoretical and systems aspects

More information

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??

More information

An Introduction to Health Informatics for a Global Information Based Society

An Introduction to Health Informatics for a Global Information Based Society An Introduction to Health Informatics for a Global Information Based Society A Course proposal for 2010 Healthcare Industry Skills Innovation Award Sponsored by the IBM Academic Initiative submitted by

More information

Parallel and Distributed Data Analytics (PDDA 2014)

Parallel and Distributed Data Analytics (PDDA 2014) Ecole d Eté CEA-EDF-Inria 16 au 20 Juin 2014 CEA Cadarache Parallel and Distributed Data Analytics (PDDA 2014) Organizers: CEA: Michael Aupetit (LIST, Saclay) EDF: Georges Hébrail (R&D, Clamart) Inria:

More information

Machine learning for algo trading

Machine learning for algo trading Machine learning for algo trading An introduction for nonmathematicians Dr. Aly Kassam Overview High level introduction to machine learning A machine learning bestiary What has all this got to do with

More information

CURRICULUM VITAE. August 2008 now: Lecturer in Analysis at the University of Birmingham.

CURRICULUM VITAE. August 2008 now: Lecturer in Analysis at the University of Birmingham. CURRICULUM VITAE Name: Olga Maleva Work address: School of Mathematics, Watson Building, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK Telephone: +44(0)121 414 6584 Fax: +44(0)121 414 3389

More information

Using Data Mining and Machine Learning in Retail

Using Data Mining and Machine Learning in Retail Using Data Mining and Machine Learning in Retail Omeid Seide Senior Manager, Big Data Solutions Sears Holdings Bharat Prasad Big Data Solution Architect Sears Holdings Over a Century of Innovation A Fortune

More information

Scalable Developments for Big Data Analytics in Remote Sensing

Scalable Developments for Big Data Analytics in Remote Sensing Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,

More information

M E M O R A N D U M. Faculty Senate Approved April 2, 2015

M E M O R A N D U M. Faculty Senate Approved April 2, 2015 M E M O R A N D U M Faculty Senate Approved April 2, 2015 TO: FROM: Deans and Chairs Becky Bitter, Sr. Assistant Registrar DATE: March 26, 2015 SUBJECT: Minor Change Bulletin No. 11 The courses listed

More information

At a Glance A short portrait of the Technical University of Crete

At a Glance A short portrait of the Technical University of Crete At a Glance A short portrait of the Technical University of Crete Contact: Technical University of Crete Public & International Relations Department University Campus Akrotiri 731 00 Chania Crete Greece

More information

ADVANCED MACHINE LEARNING. Introduction

ADVANCED MACHINE LEARNING. Introduction 1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures

More information

MSCA 31000 Introduction to Statistical Concepts

MSCA 31000 Introduction to Statistical Concepts MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced

More information