DATA SCIENCE Workshop November 12-13, 2015


Paris-Dauphine University, Place du Maréchal de Lattre de Tassigny, Paris

The DATA SCIENCE Workshop will be held at Paris-Dauphine University on November 12th and 13th, 2015. It is a satellite workshop of SDA2015.

Organizers: Edwin Diday, Patrice Bertrand (CEREMADE, Paris-Dauphine University); Tristan Cazenave, Suzanne Pinson (LAMSADE, Paris-Dauphine University)

Registration is free but mandatory due to the limited number of participants. To register, please send an e-mail to E. Diday (diday@ceremade.dauphine.fr). Interested students may have their travel expenses reimbursed; please send your request with a CV once your registration is confirmed.

CONTEXT AND AIM OF THIS «DATA SCIENCE» SATELLITE WORKSHOP

A Data Scientist is someone who is able to extract new knowledge from Standard, Big and Complex Data: unstructured data, unpaired samples, and multi-source data (such as mixtures of numerical, textual, image and social network data). Such data can be fused into classes of row statistical units, which are then considered as new statistical units. These classes can be described by vectors of intervals, probability distributions, weighted sequences, functions, and the like, in order to express the within-class variability. One advantage of this approach is that data that are unstructured and samples that are unpaired at the level of row units become structured and paired at the level of classes. The study of this new type of data, built in order to describe classes in an explanatory way, has led to a new domain called Symbolic Data Analysis (SDA). Recently, four international journals have published special issues on SDA, including ADAC, which is now known as a leading international journal of classification.
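As a rough illustration of the class-level description outlined above, the following minimal sketch groups raw units into classes and describes each numerical variable of a class by the interval [min, max]. The function name `aggregate_to_intervals`, the field names and the toy transactions are all hypothetical and are not taken from the workshop material; intervals are only one of the possible symbolic descriptions mentioned.

```python
from collections import defaultdict


def aggregate_to_intervals(rows, class_key, numeric_keys):
    """Group raw units by class and describe each numeric variable by (min, max)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_key]].append(row)

    symbolic = {}
    for label, members in groups.items():
        # One interval per variable expresses the within-class variability.
        symbolic[label] = {
            key: (min(m[key] for m in members), max(m[key] for m in members))
            for key in numeric_keys
        }
    return symbolic


# Hypothetical raw units: individual transactions labelled by customer.
transactions = [
    {"customer": "A", "amount": 12.0, "items": 1},
    {"customer": "A", "amount": 40.0, "items": 3},
    {"customer": "A", "amount": 25.5, "items": 2},
    {"customer": "B", "amount": 7.0, "items": 2},
    {"customer": "B", "amount": 9.5, "items": 5},
]

print(aggregate_to_intervals(transactions, "customer", ["amount", "items"]))
```

Each customer becomes a single higher-level unit whose description keeps the spread of the original rows rather than a single averaged number.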

In this satellite meeting of the next SDA 2015 workshop (Orléans, November 17-19), the talks will concern the state of the art and recent advances in SDA or, more generally, visualization in Data Science.

SCHEDULE

Nov. 12th
14:00 to 14:15  Welcome speech
14:15 to 15:15  Lynne Billard: Data Science and Statistics, followed by Maximum Likelihood Estimation for Interval-valued Data
15:15 to 15:45  Chun-houh Chen: Matrix Visualization: New Generation of Exploratory Data Analysis
15:45 to 16:15  Coffee Break
16:15 to 16:45  Oldemar Rodríguez: Shrinkage linear regression for symbolic interval-valued variables
16:45 to 17:30  Edwin Diday, Richard Emilion: Symbolic Bayesian Networks

Nov. 13th
8:30            Welcome Breakfast
9:15 to 10:00   Edwin Diday: Thinking by classes in Data Sciences: the Symbolic Data Analysis paradigm for Big and Complex Data
10:00 to 10:45  Oldemar Rodríguez: Probabilistic/statistical setting of SDA
10:45 to 11:00  Coffee Break
11:00 to 11:45  Richard Emilion: Latest developments of the RSDA: An R package for Symbolic Data Analysis
12:00 to 14:00  Lunch
14:00 to 14:45  Manabu Ichino: The Lookup Table Regression Model for Symbolic Data
14:45 to 15:30  Paula Brito: Multivariate Parametric Analysis of Interval Data
15:30 to 16:00  Coffee Break
16:00 to 16:45  Chun-houh Chen: Some Extensions of Matrix Visualization: the GAP Approach for Standard and Symbolic Data Analysis
16:45 to 17:30  Cheng Wang: Multiple Correspondence Analysis for Mixed Symbolic Data

ABSTRACTS

Nov. 12th

Lynne Billard (University of Georgia, USA)
Title: Maximum Likelihood Estimation for Interval-valued Data
Abstract: Bertrand and Goupil (2000) obtained empirical formulas for the mean and variance of interval-valued observations. These are in effect moment estimators. We show how, under certain probability assumptions, these are the same as the maximum likelihood estimators for the corresponding population parameters.

Chun-houh Chen (Institute of Stat. Science, Academia Sinica, Taiwan)
Title: Matrix Visualization: New Generation of Exploratory Data Analysis
Abstract: "It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it" (Exploratory Data Analysis: John Tukey, 1977). Data analysts and statistics practitioners nowadays face difficulties in understanding data of ever higher dimension and ever more complex nature, while conventional graphics/visualization tools do not meet their needs. It is the statistician's responsibility to come up with a graphics/visualization environment that helps users really understand what one CAN DO with complex data generated by modern techniques and sophisticated experiments. Matrix visualization (MV) for continuous, binary, ordinal, and nominal data, with various types of extensions, provides users with more comprehensive information embedded in complex high-dimensional data than conventional EDA tools such as the boxplot and scatterplot combined with dimension reduction techniques such as principal component analysis and multiple correspondence analysis. In this talk I'll summarize our work on creating an MV environment for conducting statistical analyses and on introducing statistical concepts into the MV environment for visualizing more versatile and complex data structures. Many real-world examples will be demonstrated in this talk to illustrate the strength of MV for visualizing all types of datasets collected from scientific experiments and social surveys.
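The empirical formulas of Bertrand and Goupil (2000) cited in the Billard abstract above can be sketched as follows. This is a minimal illustration, assuming each observation is an interval [a, b] whose values are uniformly spread over the interval; the function names are hypothetical and the code is not taken from the talk.

```python
def interval_mean(intervals):
    """Empirical mean: (1 / 2n) * sum(a_i + b_i)."""
    n = len(intervals)
    return sum(a + b for a, b in intervals) / (2 * n)


def interval_variance(intervals):
    """Empirical variance:
    (1 / 3n) * sum(a_i^2 + a_i*b_i + b_i^2) - (1 / 4n^2) * (sum(a_i + b_i))^2.
    """
    n = len(intervals)
    second_moment = sum(a * a + a * b + b * b for a, b in intervals) / (3 * n)
    return second_moment - interval_mean(intervals) ** 2


# Illustrative data: two adjacent intervals that together cover [0, 4].
observations = [(0.0, 2.0), (2.0, 4.0)]
print(interval_mean(observations), interval_variance(observations))
```

When every interval degenerates to a single point (a = b), both functions reduce to the usual mean and population variance of classical data, which is a quick sanity check on the formulas.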
Oldemar Rodríguez (University of Costa Rica, San José, Costa Rica)
Title: Shrinkage linear regression for symbolic interval-valued variables
Abstract: This paper proposes a new approach to fitting a linear regression for symbolic interval-valued variables, which improves on both the Center Method suggested by Billard and Diday (2006) and the Center and Range Method suggested by Lima-Neto, E.A. and De Carvalho, F.A.T. (2008). Just as in the Center Method and the Center and Range Method, the proposed methods fit the linear regression model on the midpoints and, as an additional variable, on the half-lengths (ranges) of the intervals taken by the predictor variables in the training data set; to fit these regression models, however, the Ridge Regression, Lasso, and Elastic Net methods proposed by Tibshirani, R., Hastie, T., and Zou, H. are used. The lower and upper bounds of the interval response (dependent) variable are predicted from its midpoints and ranges, which are estimated from the shrinkage linear regression models fitted on the midpoints and the ranges of the interval-valued predictors. The methods presented in this document are applied to three real data sets (the cardiological interval data set, the Prostate interval data set and the US Murder interval data set) in order to compare their performance and ease of interpretation with those of the Center Method and the Center and Range Method. For this evaluation, the root-mean-squared error and the correlation coefficient are used. In addition, the reader may use all the methods presented here and verify the results with the RSDA package written in the R language, which can be downloaded and installed directly from CRAN.

Edwin Diday (Ceremade, University Paris-Dauphine, France), Richard Emilion (University of Orléans, France)
Title: Symbolic Bayesian Networks
Abstract: We first consider an n x p table of probability vectors. Each vector in column j is the probability distribution of a random variable taking values in a finite set Vj that only depends on j, with |Vj| = dj. Column j is considered as a sample of size n of a random distribution Pj on Vj. We consider the problem of building a Bayesian network from these samples in order to express the joint distribution of (P1, ..., Pj, ..., Pp). This problem is very popular for estimating the joint distribution of p real-valued random variables; we extend it to the case of random distributions. A first solution to the finite sets case consists in discretizing the probability vectors. A second solution consists in using partial distance correlations (Székely-Rizzo, Ann. Stat. 2014) that evaluate the influence of Pj on Pj'. The general case will be discussed.

Nov. 13th

Edwin Diday (Ceremade, Paris-Dauphine University, France)
Title: Thinking by classes in Data Sciences: the Symbolic Data Analysis paradigm for Big and Complex Data
Abstract: Data science is, in general terms, the extraction of knowledge from data, considered as a science by itself. Symbolic Data Analysis (SDA) gives a new way of thinking in Data Sciences by extending standard data to symbolic data in order to extract knowledge from aggregated classes of individual entities. SDA was born from the classification domain by considering classes of a given population to be units of a higher-level population to be studied. Such classes allow a summary of the population and often represent the real units of interest. In order to take care of the variability between the members of each class, these classes are described by intervals, distributions, sets of categories or numbers (sometimes weighted), and the like. In that way, we obtain new kinds of data expressing variability, called "symbolic" as they cannot be reduced to numbers without losing much information. The aim of SDA is to study and extract new knowledge from these new kinds of data by, at least, an extension of Computer Statistics and Data Mining to symbolic data. We show that SDA is a new paradigm which opens up a vast domain of research and applications to standard, complex and big data.

Richard Emilion (University of Orléans, France)
Title: Probabilistic/statistical setting of SDA
Abstract: Given some raw units described by some variables and a specific class variable, Symbolic Data Analysis (SDA) deals with objects described by probability distributions describing classes of raw units. Our SDA formalism hinges on the notion of random distribution. In the case of paired samples, we show that this formalism depends on a regular conditional probability existence theorem. We also show the interest of SDA in the case of unpaired samples. We then discuss the extension of some classical methods such as PCA and probabilistic classification.

Oldemar Rodríguez (University of Costa Rica, San José, Costa Rica)
Title: Latest developments of the RSDA: An R package for Symbolic Data Analysis
Abstract: This package aims to run some Symbolic Data Analysis models. Symbolic Data Analysis was proposed by Professor E. Diday in 1987 in his paper "Introduction à l'approche symbolique en Analyse des Données", Premières Journées Symbolique-Numérique, Université Paris IX Dauphine, Décembre 1987. A very good reference on symbolic data analysis is "From the Statistics of Data to the Statistics of Knowledge: Symbolic Data Analysis" by L. Billard and E. Diday, published in the Journal of the American Statistical Association, June 2003, Vol. 98. The main purpose of Symbolic Data Analysis is to replace a set of rows (cases) in a data table with a concept (second-order statistical unit).
For example, all of the transactions performed by one person (or any object) can be replaced by a single transaction that summarizes all the original ones (a symbolic object), so that millions of transactions can be summarized in a single one that preserves the customary behavior of the person. This is possible because the fields of the new transaction can hold not only numbers (as in current transactions) but also objects such as intervals, histograms, or rules. This representation of an object as a conjunction of properties fits within a data analytic framework concerning symbolic data and symbolic objects, which has proven useful in dealing with big databases. In RSDA version 1.2, methods such as centers interval principal components analysis, histogram principal components analysis, multi-valued correspondence analysis, interval multidimensional scaling (INTERSCAL), symbolic hierarchical clustering, and CM, CRM, Lasso, Ridge and Elastic Net linear regression models for interval variables have been implemented. This new version also includes new features for manipulating symbolic data through a new data structure that implements Symbolic Data Frames, and methods for converting SODAS and XML SODAS files to RSDA files.

Manabu Ichino (Tokyo Denki University, Japan)
Title: The Lookup Table Regression Model for Symbolic Data
Abstract: This paper presents preliminary research on the lookup table regression model for symbolic data. We apply the quantile method to the given symbolic data table of size (N objects) × (d feature variables), and we represent each object by (m+1) d-dimensional numerical vectors, called the quantile vectors, for a preselected integer m. The integer m controls the granularity of the descriptions of symbolic objects. In the new data table of size {N × (m+1)} × d, we reorder the N × (m+1) rows according to the values of the selected objective variable, from the smallest to the largest. For each of the remaining d-1 features, i.e., columns, we segment the feature values into blocks so that the generated blocks satisfy the monotone property. We discard columns that have only a single block. Then, we segment the objective variable according to the blocks of the remaining explanatory feature variables. Finally, we obtain a lookup table of size N' × d', where N' ≤ N × (m+1) and d' ≤ d. Each element of the table is an interval value corresponding to a segmented block. We realize the interval-value estimation rule for the objective variable by searching for the nearest element in the lookup table. We present examples to illustrate the lookup table regression model.

Paula Brito (Porto University, Portugal)
Title: Multivariate Parametric Analysis of Interval Data
Abstract: In this work we focus on the study of interval data, i.e., when the variables' values are intervals of R. Parametric probabilistic models for interval-valued variables have been proposed and studied in (Brito & Duarte Silva, 2012). These models are based on the representation of each observed interval by its MidPoint and LogRange, and Multivariate Normal and Skew-Normal distributions are assumed for the whole set of 2p MidPoints and LogRanges of the original p interval-valued variables. The intrinsic nature of the interval-valued variables leads to different structures of the variance-covariance matrix, represented by different possible configurations. For all cases, maximum likelihood estimators of the corresponding parameters have been derived.
This framework may be applied to different statistical multivariate methodologies, thereby allowing for inference approaches for symbolic data. The proposed modelling has first been applied to (M)ANOVA of interval data, using a likelihood-ratio approach. Linear and quadratic models for discriminant analysis of data described by interval-valued variables have been obtained, and their performance compared with alternative distance-based approaches. We have also addressed the problem of mixture distributions, developing model-based clustering using the proposed models. For the Gaussian model, the problem of outlier identification is addressed, using Mahalanobis distances based on robust estimations of the joint mean values and the covariance matrices. This modelling, for the Gaussian case, has been implemented in the R package MAINT.DATA, available on CRAN. MAINT.DATA introduces a data class for representing interval data and includes functions for modelling and analysing these data. In particular, maximum likelihood estimation and statistical tests for the different configurations considered are addressed. Methods for (M)ANOVA and Linear and Quadratic Discriminant Analysis of this data class are also currently provided.

Chun-houh Chen (Institute of Stat. Science, Academia Sinica, Taiwan)
Title: Some Extensions of Matrix Visualization: the GAP Approach
Abstract: Exploratory data analysis (EDA, Tukey, 1977) has been extensively used for nearly 40 years, yet the boxplot and scatterplot are still the major EDA tools for visualizing continuous data in the 21st century. Many extended modules of matrix visualization via the Generalized Association Plots (GAP) approach have been developed or are under development. Some details of the following MV modules will be provided in this talk:

1. Matrix visualization for high-dimensional categorical data structure. For categorical data, MCA (multiple correspondence analysis) is the most popular method for visualizing the reduced joint space of samples and variables of categorical nature. But, like its continuous counterpart PCA (principal component analysis), MCA loses its efficiency when data dimensionality gets really high. In this study we extend the framework of matrix visualization from continuous data to categorical data. Categorical matrix visualization can effectively present complex information patterns for thousands of subjects on thousands of categorical variables in a single matrix visualization display.

2. Matrix Visualization for High-Dimensional Data with a Cartography Link. When a cartography link is attached to each subject of a high-dimensional categorical data set, it is necessary to use a geographical map to illustrate the pattern of subject (region) clusters with variable groups embedded in the high-dimensional space. This study presents an interactive cartography system with systematic color-coding, obtained by integrating homogeneity analysis into matrix visualization.

3. Matrix visualization for symbolic data analysis. Symbolic data analysis (SDA) has gained popularity over the past few years because of its potential for handling data having a dependent and hierarchical nature.
Here we introduce matrix visualization (MV) for visualizing and clustering SDA data, using interval-valued symbolic data as an example; it is by far the most popular SDA data type in the literature and the most commonly encountered one in practice. Many MV techniques for visualizing and clustering conventional data are adapted to SDA data, and several techniques are newly developed for SDA data. Various examples of data with simple to complex structures are used to illustrate the proposed methods.

4. Covariate-adjusted matrix visualization via correlation decomposition. In this study, we extend the framework of matrix visualization (MV) by incorporating a covariate adjustment process through the estimation of conditional correlations. MV can explore the grouping and/or clustering structure of high-dimensional large-scale data sets effectively without dimension reduction. The benefit lies in the exploration of conditional association structures among the subjects or variables, which cannot be done with conventional MV. Several biomedical examples will be used to illustrate the versatility of GAP-based matrix visualization.

Cheng Wang (Beihang University), Edwin Diday, Richard Emilion, Huiwen Wang
Title: Multiple Correspondence Analysis for Mixed Symbolic Data
Abstract: As cross-platform data collection technology develops rapidly and the big data era arrives, a single table often contains a mixture of single-valued data, histogram data, compositional data and functional data, which can be called mixed feature data. Different types of data may belong to different spaces, which makes crosstab analysis across several data types a rather complicated problem. In this paper, we propose a Multiple Correspondence Analysis (MCA) for mixed data to detect and represent the underlying structures involved. Before MCA, we first transform the different types of data into vector data, which is further converted to nominal data. Two ways of converting the vector data to nominal data are considered: hierarchical clustering and discretization. An empirical analysis is conducted to compare the performance of MCA for mixed data based on these two approaches.
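The discretization route described in the last abstract can be sketched as follows: numerical columns are cut into bins to obtain nominal codes, and the codes are expanded into the 0/1 indicator matrix that MCA takes as input. This is a minimal sketch under stated assumptions: equal-width binning and the function names `discretize` and `indicator_matrix` are choices made here for illustration, since the abstract does not say which discretization the authors use, and the MCA step itself is not shown.

```python
def discretize(values, n_bins):
    """Map numeric values to bin codes 0..n_bins-1 using equal-width bins (an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def indicator_matrix(columns, n_bins):
    """Build the 0/1 indicator (disjunctive) table from a list of numeric columns."""
    coded = [discretize(col, n_bins) for col in columns]
    n_rows = len(columns[0])
    matrix = []
    for i in range(n_rows):
        row = []
        for codes in coded:
            # One block of n_bins cells per variable, with a single 1 in it.
            row.extend(1 if codes[i] == k else 0 for k in range(n_bins))
        matrix.append(row)
    return matrix


# Illustrative data: two numeric variables observed on three units.
columns = [[0.0, 5.0, 10.0], [1.0, 1.0, 3.0]]
for row in indicator_matrix(columns, n_bins=2):
    print(row)
```

Every row of the indicator matrix contains exactly one 1 per variable, which is the structure a correspondence analysis of the table relies on.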

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Exploratory Data Analysis with MATLAB

Exploratory Data Analysis with MATLAB Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

TOWARD BIG DATA ANALYSIS WORKSHOP

TOWARD BIG DATA ANALYSIS WORKSHOP TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)

More information

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

Generalized association plots (GAP): Dimension free information visualization environment for multivariate data structure

Generalized association plots (GAP): Dimension free information visualization environment for multivariate data structure Generalized association plots (GAP): Dimension free information visualization environment for multivariate data structure Chun-houh Chen, hun-chuan Chang, Yueh-Yun Chi, and Chih-Wen Ou-Young Academia inica,

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY

PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY QÜESTIIÓ, vol. 25, 3, p. 509-520, 2001 PRACTICAL DATA MINING IN A LARGE UTILITY COMPANY GEORGES HÉBRAIL We present in this paper the main applications of data mining techniques at Electricité de France,

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

Marketing Mix Modelling and Big Data P. M Cain

Marketing Mix Modelling and Big Data P. M Cain 1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

How To Identify Noisy Variables In A Cluster

How To Identify Noisy Variables In A Cluster Identification of noisy variables for nonmetric and symbolic data in cluster analysis Marek Walesiak and Andrzej Dudek Wroclaw University of Economics, Department of Econometrics and Computer Science,

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

How To Understand The Theory Of Probability

How To Understand The Theory Of Probability Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

More information

2015 Workshops for Professors

2015 Workshops for Professors SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

More information

Lecture 2. Summarizing the Sample

Lecture 2. Summarizing the Sample Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

IBM SPSS Direct Marketing 23

IBM SPSS Direct Marketing 23 IBM SPSS Direct Marketing 23 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 23, release

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

Exploratory data analysis (Chapter 2) Fall 2011

Exploratory data analysis (Chapter 2) Fall 2011 Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

More information

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

More information

Geostatistics Exploratory Analysis

Geostatistics Exploratory Analysis Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Classification Problems

Classification Problems Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

IBM SPSS Direct Marketing 19

IBM SPSS Direct Marketing 19 IBM SPSS Direct Marketing 19 Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This document contains proprietary information of SPSS

More information

Data analysis process

Data analysis process Data analysis process Data collection and preparation Collect data Prepare codebook Set up structure of data Enter data Screen data for errors Exploration of data Descriptive Statistics Graphs Analysis

More information

Statistical Machine Learning

UoC Stats 37700, Winter quarter. Lecture 4: classical linear and quadratic discriminants. Linear separation: for two classes in R^d, simple idea: separate the classes

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Topics: Exploratory Data Analysis, Summary Statistics, Visualization. What is data exploration?

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Overview: This 4-day class is the first of the two data science courses taught by Rafal Lukawiecki. Some of the topics will be

MATHEMATICAL METHODS OF STATISTICS

By HARALD CRAMER, PROFESSOR IN THE UNIVERSITY OF STOCKHOLM. Princeton, PRINCETON UNIVERSITY PRESS, 1946. TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

Least Squares Estimation

SARA A VAN DE GEER. Volume 2, pp 1041-1045 in Encyclopedia of Statistics in Behavioral Science. ISBN-13: 978-0-470-86080-9, ISBN-10: 0-470-86080-4. Editors Brian S Everitt & David

Teaching Multivariate Analysis to Business-Major Students

Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore. 1. Introduction: During the last two or three decades, multivariate statistical analysis

Statistics Graduate Courses

STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr. arr.). Organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

Multivariate Normal Distribution

Lecture 4, July 21, 2011. Advanced Multivariate Statistical Methods, ICPSR Summer Session #2. Last Time: Matrices and vectors, Eigenvalues

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Data are interesting because they help us understand the world. Genomics: Massive Amounts

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

Data Exploration and Preprocessing. Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)

References: Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.

Jiří Kléma, Department of Cybernetics, Czech Technical University in Prague, http://ida.felk.cvut.cz Syllabus: Lecture, Lecturer, Content. 1. J. Kléma Introduction,

COLLEGE OF SCIENCE. John D. Hromi Center for Quality and Applied Statistics

ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM. COLLEGE OF SCIENCE, John D. Hromi Center for Quality and Applied Statistics. NEW (or REVISED) COURSE: COS-STAT-747 Principles of Statistical Data Mining

THE MULTIVARIATE ANALYSIS RESEARCH GROUP. Carles M Cuadras, Departament d'Estadística, Facultat de Biologia, Universitat de Barcelona

The set of statistical methods known as Multivariate Analysis covers a

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM. KATE GLEASON COLLEGE OF ENGINEERING, John D. Hromi Center for Quality and Applied Statistics. NEW (or REVISED) COURSE (KGCOE-CQAS-747- Principles of

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

by Tan, Steinbach, Kumar. What is data exploration? A preliminary

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Amhmed A. Bhih, School of Electrical and Electronic Engineering; Princy Johnson, School of Electrical and Electronic Engineering; Martin

Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics. Curtis Seare. Copyright: Vault Analytics, July 2010. Contents. Section I: Background Information. Why use Predictive Analytics? How to use

DISCRIMINANT FUNCTION ANALYSIS (DA)

John Poulsen and Aaron French. Key words: assumptions, further reading, computations, standardized coefficients, structure matrix, tests of significance. Introduction: Discriminant

Multivariate Statistical Inference and Applications

ALVIN C. RENCHER, Department of Statistics, Brigham Young University. A Wiley-Interscience Publication. JOHN WILEY & SONS, INC. New York Chichester Weinheim

COMP 5318 Data Exploration and Analysis Chapter 3

What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping

Introduction to Regression and Data Analysis

Statlab Workshop: Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell, October 28, 2008. I. The basics. A. Types of variables. Your variables may take several forms, and it

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean

by Philip Kostov and Seamus McErlean. Working Paper, Agricultural and Food Economics, Queen

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyright 2013, SAS Institute Inc. All rights reserved.

ANALYTICS LIFECYCLE: Evaluate & Monitor Model Formulate Problem Data Preparation Deploy Model Data Exploration Validate Models

Easily Identify the Right Customers

PASW Direct Marketing 18 Specifications. You want your marketing programs to be as profitable as possible, and gaining insight into the information contained in your

How to report the percentage of explained common variance in exploratory factor analysis

UNIVERSITAT ROVIRA I VIRGILI, Tarragona 2013. Please reference this document as: Lorenzo-Seva, U. (2013). How to report

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and Advanced SAS Programming. Brigham Young University Department of Statistics offers an Applied Statistics and

Multivariate Analysis of Ecological Data

MICHAEL GREENACRE, Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain. RAUL PRIMICERIO, Associate Professor of Ecology, Evolutionary Biology

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Review Overview Linear separability Non-linear classification Linear Support Vector Machines

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Mgt 540 Research Methods, Data Analysis. Additional sources. Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

IBM SPSS Direct Marketing 20

Note: Before using this information and the product it supports, read the general information under Notices on p. 105. This edition applies to IBM SPSS Statistics 20 and to

How To Cluster

Data Clustering, Dec 2nd, 2013, Kyrylo Bessonov. Talk outline: Introduction to clustering; Types of clustering (Supervised, Unsupervised); Similarity measures; Main clustering algorithms (k-means, Hierarchical); Main

Handling attrition and non-response in longitudinal data

Longitudinal and Life Course Studies 2009, Volume 1, Issue 1, pp 63-72. Harvey Goldstein, University of Bristol. Correspondence: Professor H. Goldstein

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Introduction: In the previous chapter, we explored a class of regression models having particularly simple analytical

Customer Analytics. Turn Big Data into Big Value

All Your Data Integrated in Just One Place. BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

Elements of statistics (MATH0487-1)

Prof. Dr. Dr. K. Van Steen, University of Liège, Belgium, December 10, 2012. Introduction to Statistics: Basic Probability Revisited, Sampling, Exploratory Data Analysis -

MATH2210 Notebook 1 Fall Semester 2016/2017. 1 MATH2210 Notebook 1. 1.1 Solving Systems of Linear Equations

Prepared by Professor Jenny Baglivo. © Copyright 2009-2017 by Jenny A. Baglivo. All Rights Reserved.

Linear Threshold Units

$h(x) = +1$ if $w_1 x_1 + \dots + w_n x_n \ge w_0$, and $-1$ otherwise. We assume that each feature $x_j$ and each weight $w_j$ is a real number (we will relax this later). We will study three different algorithms for learning linear

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

For 2015 Examinations. Aim: The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

Machine Learning Logistic Regression

Jeff Howbert, Introduction to Machine Learning, Winter 2012. Logistic regression: Name is somewhat misleading. Really a technique for classification, not regression.

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen

LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION. Chapter 3: Data Preprocessing. Data Preprocessing: An Overview. Data Quality. Major

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

by Tan, Steinbach, Kumar. What is data exploration? A preliminary exploration of the data to better understand its characteristics.

Introduction to Principal Components and Factor Analysis

Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Conference overview: 1. Overview of KDD and data mining 2. Data

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

In this practical we discuss linear and quadratic discriminant analysis and tree-based classification techniques.

Business Analytics using Data Mining Project Report. Optimizing Operation Room Utilization by Predicting Surgery Duration

Project Team 4: 102034606 WU, CHOU-CHUN; 103078508 CHEN, LI-CHAN; 102077503 LI, DAI-SIN

2. Simple Linear Regression

Research methods - II. Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

Big Data: a new era for Statistics

Richard J. Samworth. Abstract: Richard Samworth (1996) is a Professor of Statistics in the University's Statistical Laboratory, and has been a Fellow of St John's since

Dimensionality Reduction: Principal Components Analysis

In data mining one often encounters situations where there are a large number of variables in the database. In such situations it is very likely

CHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS

Exploratory factor analysis (EFA) is used to determine the number of continuous latent variables that are needed to

Strategic Online Advertising: Modeling Internet User Behavior with

Patrick Johnston, Nicholas Kristoff, Heather McGinness, Phuong Vu, Nathaniel Wong, Jason Wright with William T. Scherer and Matthew

Visualization of textual data: unfolding the Kohonen maps.

Ludovic Lebart, CNRS - GET - ENST, 46 rue Barrault, 75013, Paris, France (e-mail: ludovic.lebart@enst.fr). Abstract. The Kohonen self organizing

SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011

Statistical techniques to be covered. Explore relationships among variables: Correlation, Regression/Multiple regression, Logistic regression, Factor analysis

Implications of Big Data for Statistics Instruction 17 Nov 2013

Mark L. Berenson, Montclair State University. MSMESB Mini Conference, DSI Baltimore, November

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Recall areal data, also known as lattice data, are data $Y(s)$, $s \in D$, where $D$ is a discrete index set. This usually corresponds to data

Machine Learning for Data Science (CS4786) Lecture 1

Tu-Th 10:10 to 11:25 AM, Hollister B14. Instructors: Lillian Lee and Karthik Sridharan. ROUGH DETAILS ABOUT THE COURSE. Diagnostic assignment 0 is out:

Monitoring chemical processes for early fault detection using multivariate data analysis methods

Bring data to life. By Dr Frank Westad, Chief Scientific Officer, CAMO Software

Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues

Teaching an old dog some new tricks. Acknowledgments: Colleagues Dean Foster in Statistics, Lyle Ungar in Computer Science. Bob Stine, Department of Statistics, The School of the

High-Dimensional Data Visualization by PCA and LDA

Chaur-Chin Chen, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan. Abbie Hsu, Institute of Information Systems & Applications,

MS1b Statistical Data Mining

Yee Whye Teh, Department of Statistics, Oxford. http://www.stats.ox.ac.uk/~teh/datamining.html Outline: Administrivia and Introduction, Course Structure, Syllabus, Introduction to

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning, Data Mining, Knowledge Discovery in Data Bases: Their mutual relations. Data Mining, Knowledge Discovery in Databases,