# Exploratory Data Analysis with MATLAB

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an informa business A CHAPMAN & HALL BOOK

2 Table of Contents Preface to the Second Edition Preface to the First Edition xiii xvii Parti Introduction to Exploratory Data Analysis Chapter 1 Introduction to Exploratory Data Analysis 1.1 What is Exploratory Data Analysis Overview of the Text A Few Words about Notation Data Sets Used in the Book Unstructured Text Documents Gene Expression Data Oronsay Data Set Software Inspection Transforming Data Power Transformations Standardization Sphering the Data Further Reading 25 Exercises 27 Part II EDA as Pattern Discovery Chapter 2 Dimensionality Reduction Linear Methods 2.1 Introduction Principal Component Analysis PCA PCA Using the Sample Covariance Matrix PCA Using the Sample Correlation Matrix How Many Dimensions Should We Keep? Singular Value Decomposition SVD Nonnegative Matrix Factorization 47 vii

3 Vlll Exploratory Data Analysis with MATLAB, 2 nd Edition 2.5 Factor Analysis Fisher's Linear Discriminant Intrinsic Dimensionality Nearest Neighbor Approach Correlation Dimension Maximum Likelihood Approach Estimation Using Packing Numbers Summary and Further Reading 72 Exercises 73 Chapter 3 Dimensionality Reduction Nonlinear Methods 3.1 Multidimensional Scaling MDS Metric MDS Nonmetric MDS Manifold Learning Locally Linear Embedding Isometric Feature Mapping ISOMAP Hessian Eigenmaps Artificial Neural Network Approaches Self-Organizing Maps Generative Topographic Maps Ill Curvilinear Component Analysis Summary and Further Reading 121 Exercises 122 Chapter 4 Data Tours 4.1 Grand Tour Torus Winding Method Pseudo Grand Tour Interpolation Tours Projection Pursuit Projection Pursuit Indexes Posse Chi-Square Index Moment Index Independent Component Analysis Summary and Further Reading 151 Exercises 152 Chapter 5 Finding Clusters 5.1 Introduction Hierarchical Methods 157

4 Table of Contents ix 5.3 Optimization Methods fc-means Spectral Clustering Document Clustering Nonnegative Matrix Factorization Revisited Probabilistic Latent Semantic Analysis Evaluating the Clusters Rand Index Cophenetic Correlation Upper Tail Rule Silhouette Plot Gap Statistic Summary and Further Reading 197 Exercises 200 Chapter 6 Model-Based Clustering 6.1 Overview of Model-Based Clustering Finite Mixtures Multivariate Finite Mixtures Component Models Constraining the Covariances Expectation-Maximization Algorithm Hierarchical Agglomerative Model-Based Clustering Model-Based Clustering MBC for Density Estimation and Discriminant Analysis Introduction to Pattern Recognition Bayes Decision Theory Estimating Probability Densities with MBC Generating Random Variables from a Mixture Model Summary and Further Reading 241 Exercises 244 Chapter 7 Smoothing Scatterplots 7.1 Introduction Loess Robust Loess Residuals and Diagnostics with Loess Residual Plots Spread Smooth Loess Envelopes Upper and Lower Smooths Smoothing Splines Regression with Splines Smoothing Splines Smoothing Splines for Uniformly Spaced Data Choosing the Smoothing Parameter 281

5 x Exploratory Data Analysis with MATLAB, 2 nd Edition 7.7 Bivariate Distribution Smooths Pairs of Middle Smoothings Polar Smoothing Curve Fitting Toolbox Summary and Further Reading 293 Exercises 294 Part III Graphical Methods for EDA Chapter 8 Visualizing Clusters 8.1 Dendrogram Treemaps Rectangle Plots ReClus Plots Data Image Summary and Further Reading 323 Exercises 324 Chapter 9 Distribution Shapes 9.1 Histograms Univariate Histograms Bivariate Histograms Boxplots The Basic Boxplot Variations of the Basic Boxplot Quantile Plots Probability Plots Quantile-Quantile Plot Quantile Plot Bagplots Rangefinder Boxplot Summary and Further Reading 359 Exercises 361 Chapter 10 Multivariate Visualization 10.1 Glyph Plots Scatterplots D and 3-D Scatterplots Scatterplot Matrices Scatterplots with Hexagonal Binning 372

6 Table of Contents xi 10.3 Dynamic Graphics Identification of Data Linking Brushing Coplots Dot Charts Basic Dot Chart Multiway Dot Chart Plotting Points as Curves Parallel Coordinate Plots Andrews' Curves Andrews' Images More Plot Matrices Data Tours Revisited Grand Tour Permutation Tour Biplots Summary and Further Reading 411 Exercises 413 Appendix A Proximity Measures A.l Definitions 417 A.1.1 Dissimilarities 418 A.1.2 Similarity Measures 420 A.1.3 Similarity Measures for Binary Data 420 A.l.4 Dissimilarities for Probability Density Functions 421 A.2 Transformations 422 A.3 Further Reading 423 Appendix В Software Resources for EDA B.l MATLAB Programs 425 B.2 Other Programs for EDA 429 B.3 EDA Toolbox 431 Appendix С Description of Data Sets 433 Appendix D Introduction to MATLAB D.l What Is MATLAB? 439 D.2 Getting Help in MATLAB 440 D.3 File and Workspace Management 440

7 XII Exploratory Data Analysis with MATLAB, 2 nd Edition D.4 Punctuation in MATLAB 443 D.5 Arithmetic Operators 444 D.6 Data Constructs in MATLAB 444 Basic Data Constructs 444 Building Arrays 445 Cell Arrays 445 Structures 447 D.7 Script Files and Functions 447 D.8 Control Flow 448 for Loop 448 while Loop 449 if-else Statements 449 switch Statement 449 D.9 Simple Plotting 450 D.10 Where to get MATLAB Information 452 Appendix E MATLAB Functions E.l MATLAB 455 E.2 Statistics Toolbox 457 E.3 Exploratory Data Analysis Toolbox 458 E.4 EDA GUI Toolbox 459 References 475 Author Index 497 Subject Index 503

### Exploratory Data Analysis GUIs for MATLAB (V1) A User s Guide

Exploratory Data Analysis GUIs for MATLAB (V1) A User s Guide Wendy L. Martinez, Ph.D. 1 Angel R. Martinez, Ph.D. Abstract: The following provides a User s Guide to the Exploratory Data Analysis (EDA)

### CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint

### Computer Science and Data Analysis Series. Exploratory Data Analysis with MATLAB

Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Chapman & Hall/CRC Series in Computer Science and Data Analysis The interface between the computer and statistical sciences

### Applied Multivariate Analysis

Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models

### Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

### The Value of Visualization 2

The Value of Visualization 2 G Janacek -0.69 1.11-3.1 4.0 GJJ () Visualization 1 / 21 Parallel coordinates Parallel coordinates is a common way of visualising high-dimensional geometry and analysing multivariate

### Computer-Aided Multivariate Analysis

Computer-Aided Multivariate Analysis FOURTH EDITION Abdelmonem Af if i Virginia A. Clark and Susanne May CHAPMAN & HALL/CRC A CRC Press Company Boca Raton London New York Washington, D.C Contents Preface

### Data visualization and clustering. Genomics is to no small extend a data science

Data visualization and clustering Genomics is to no small extend a data science [www.data2discovery.org] Data visualization and clustering Genomics is to no small extend a data science [Andersson et al.,

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

### The Visual Statistics System. ViSta

Visual Statistical Analysis using ViSta The Visual Statistics System Forrest W. Young & Carla M. Bann L.L. Thurstone Psychometric Laboratory University of North Carolina, Chapel Hill, NC USA 1 of 21 Outline

### MS1b Statistical Data Mining

MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

### CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or

### Lecture 2. Summarizing the Sample

Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

### Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

### A Solution Manual and Notes for: Exploratory Data Analysis with MATLAB by Wendy L. Martinez and Angel R. Martinez.

A Solution Manual and Notes for: Exploratory Data Analysis with MATLAB by Wendy L. Martinez and Angel R. Martinez. John L. Weatherwax May 7, 9 Introduction Here you ll find various notes and derivations

### Geostatistics Exploratory Analysis

Instituto Superior de Estatística e Gestão de Informação Universidade Nova de Lisboa Master of Science in Geospatial Technologies Geostatistics Exploratory Analysis Carlos Alberto Felgueiras cfelgueiras@isegi.unl.pt

### Statistical Models in Data Mining

Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### SOFTWARE TESTING AS A SERVICE

SOFTWARE TESTING AS A SERVICE ASHFAQUE AHMED (g) CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an informa business AN AUERBACH BOOK

### DIGITAL IMAGE PROCESSING AND ANALYSIS

DIGITAL IMAGE PROCESSING AND ANALYSIS Human and Computer Vision Applications with CVIPtools SECOND EDITION SCOTT E UMBAUGH Uffi\ CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is

### Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

### Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

### Data Mining Part 2. Data Understanding and Preparation 2.1 Data Understanding Spring 2010

Data Mining Part 2. and Preparation 2.1 Spring 2010 Instructor: Dr. Masoud Yaghini Introduction Outline Introduction Measuring the Central Tendency Measuring the Dispersion of Data Graphic Displays References

Ronald Christensen Advanced Linear Modeling Multivariate, Time Series, and Spatial Data; Nonparametric Regression and Response Surface Maximization Second Edition Springer Preface to the Second Edition

### Functional Data Analysis with R and MATLAB

J.O. Ramsay Giles Hooker Spencer Graves Functional Data Analysis with R and MATLAB Springer Contents 2 Introduction to Functional Data Analysis. What Are Functional Data?.. Data on the Growth of Girls..2

### APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

### Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

### INTERACTIVE DATA EXPLORATION USING MDS MAPPING

INTERACTIVE DATA EXPLORATION USING MDS MAPPING Antoine Naud and Włodzisław Duch 1 Department of Computer Methods Nicolaus Copernicus University ul. Grudziadzka 5, 87-100 Toruń, Poland Abstract: Interactive

### Principal Component Analysis

Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

### Contents. List of Figures. List of Tables. List of Examples. Preface to Volume IV

Contents List of Figures List of Tables List of Examples Foreword Preface to Volume IV xiii xvi xxi xxv xxix IV.1 Value at Risk and Other Risk Metrics 1 IV.1.1 Introduction 1 IV.1.2 An Overview of Market

### Exploratory Data Analysis Using Radial Basis Function Latent Variable Models

Exploratory Data Analysis Using Radial Basis Function Latent Variable Models Alan D. Marrs and Andrew R. Webb DERA St Andrews Road, Malvern Worcestershire U.K. WR14 3PS {marrs,webb}@signal.dera.gov.uk

### MTH 140 Statistics Videos

MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative

### DATA SCIENCE Workshop November 12-13, 2015

DATA SCIENCE Workshop November 12-13, 2015 Paris-Dauphine University Place du Maréchal de Lattre de Tassigny, 75016 Paris DATA SCIENCE Workshop will be held at Paris-Dauphine university, on November 12

### Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### MEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS. Biostatistics

MEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS title- course code: Program name: Contingency Tables and Log Linear Models Level Biostatistics Hours/week Ther. Recite. Lab. Others Total Master of Sci.

### So which is the best?

Manifold Learning Techniques: So which is the best? Todd Wittman Math 8600: Geometric Data Analysis Instructor: Gilad Lerman Spring 2005 Note: This presentation does not contain information on LTSA, which

### A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization Ángela Blanco Universidad Pontificia de Salamanca ablancogo@upsa.es Spain Manuel Martín-Merino Universidad

### What is Data mining?

STAT : DATA MIIG Javier Cabrera Fall Business Question Answer Business Question What is Data mining? Find Data Data Processing Extract Information Data Analysis Internal Databases Data Warehouses Internet

### 7 Time series analysis

7 Time series analysis In Chapters 16, 17, 33 36 in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### Data Visualization. Principles and Practice. Second Edition. Alexandru Telea

Data Visualization Principles and Practice Second Edition Alexandru Telea First edition published in 2007 by A K Peters, Ltd. Cover image: The cover shows the combination of scientific visualization and

### Parallel Computing for Data Science

Parallel Computing for Data Science With Examples in R, C++ and CUDA Norman Matloff University of California, Davis USA (g) CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint

### EFFECTIVE NON-PROFIT MANAGEMENT

American Society for Public Administration Series in Public Administration and Public Policy Advancing excellence in public service.., EFFECTIVE NON-PROFIT MANAGEMENT Context, Concepts, and Competencies

### Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

Brochure More information from http://www.researchandmarkets.com/reports/2170926/ Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd

### Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

### Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction

### Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### Univariate and Multivariate Methods PEARSON. Addison Wesley

Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston

### Signal Processing for Intelligent Sensor Systems with MATLAB 9

Signal Processing for Intelligent Sensor Systems with MATLAB 9 Second Edition David С Swanson @ CRC Press Taylor & Francis Gro jroup Boca Raton London New York CRC Press is an imprint of the Taylor & Francis

### Dimensionality Reduction - Nonlinear Methods

Chapter 3 Dimensionality Reduction - Nonlinear Methods This chapter covers various methods for nonlinear dimensionality reduction, where the nonlinear aspect refers to the mapping between the highdimensional

### HDDVis: An Interactive Tool for High Dimensional Data Visualization

HDDVis: An Interactive Tool for High Dimensional Data Visualization Mingyue Tan Department of Computer Science University of British Columbia mtan@cs.ubc.ca ABSTRACT Current high dimensional data visualization

### Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

### Applied Computational Economics and Finance

Applied Computational Economics and Finance Mario J. Miranda and Paul L. Fackler The MIT Press Cambridge, Massachusetts London, England Preface xv 1 Introduction 1 1.1 Some Apparently Simple Questions

### Cloud Computing. and Scheduling. Data-Intensive Computing. Frederic Magoules, Jie Pan, and Fei Teng SILKQH. CRC Press. Taylor & Francis Group

Cloud Computing Data-Intensive Computing and Scheduling Frederic Magoules, Jie Pan, and Fei Teng SILKQH CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor

### Clustering Hierarchical clustering and k-mean clustering

Clustering Hierarchical clustering and k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: A quick review partition genes into distinct sets with high homogeneity

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS

USING SELF-ORGANIZING MAPS FOR INFORMATION VISUALIZATION AND KNOWLEDGE DISCOVERY IN COMPLEX GEOSPATIAL DATASETS Koua, E.L. International Institute for Geo-Information Science and Earth Observation (ITC).

### The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA

Paper 156-2010 The Forgotten JMP Visualizations (Plus Some New Views in JMP 9) Sam Gardner, SAS Institute, Lafayette, IN, USA Abstract JMP has a rich set of visual displays that can help you see the information

### WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise

### Selection of the Suitable Parameter Value for ISOMAP

1034 JOURNAL OF SOFTWARE, VOL. 6, NO. 6, JUNE 2011 Selection of the Suitable Parameter Value for ISOMAP Li Jing and Chao Shao School of Computer and Information Engineering, Henan University of Economics

### Dimension Reduction. Wei-Ta Chu 2014/10/22. Multimedia Content Analysis, CSIE, CCU

1 Dimension Reduction Wei-Ta Chu 2014/10/22 2 1.1 Principal Component Analysis (PCA) Widely used in dimensionality reduction, lossy data compression, feature extraction, and data visualization Also known

### Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.

Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,

### 240ST014 - Data Analysis of Transport and Logistics

Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 240 - ETSEIB - Barcelona School of Industrial Engineering 715 - EIO - Department of Statistics and Operations Research MASTER'S

### Contributions to high dimensional statistical learning

Contributions to high dimensional statistical learning Stéphane Girard INRIA Rhône-Alpes & LJK (team MISTIS). 655, avenue de l Europe, Montbonnot. 38334 Saint-Ismier Cedex, France Stephane.Girard@inria.fr

### BNG 202 Biomechanics Lab. Descriptive statistics and probability distributions I

BNG 202 Biomechanics Lab Descriptive statistics and probability distributions I Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential

### Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

### Data Exploration Data Visualization

Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### Introduction to Financial Models for Management and Planning

CHAPMAN &HALL/CRC FINANCE SERIES Introduction to Financial Models for Management and Planning James R. Morris University of Colorado, Denver U. S. A. John P. Daley University of Colorado, Denver U. S.

### RnavGraph: A visualization tool for navigating through high-dimensional data

Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session IPS117) p.1852 RnavGraph: A visualization tool for navigating through high-dimensional data Waddell, Adrian University

### Data Preparation and Statistical Displays

Reservoir Modeling with GSLIB Data Preparation and Statistical Displays Data Cleaning / Quality Control Statistics as Parameters for Random Function Models Univariate Statistics Histograms and Probability

### Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

### Design of Enterprise Systems

Design of Enterprise Systems Theory, Architecture, and Methods Ronald E. Giachetti CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an

### Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

### Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.

### Regression Modeling Strategies

Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

### METHODS IN MEDICAL INFORMATICS

Chapman & Hall/CRC Mathematical and Computational Biology Series METHODS IN MEDICAL INFORMATICS Fundamentals of Healthcare Programming in Perln Pythoni and Ruby Jules J- Berman TECHNISCHE INFORMATION SBIBLIOTHEK

### CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

Examples: Regression And Path Analysis CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS Regression analysis with univariate or multivariate dependent variables is a standard procedure for modeling relationships

### Business Analytics. Methods, Models, and Decisions. James R. Evans : University of Cincinnati PEARSON

Business Analytics Methods, Models, and Decisions James R. Evans : University of Cincinnati PEARSON Boston Columbus Indianapolis New York San Francisco Upper Saddle River Amsterdam Cape Town Dubai London

### Data Visualization in R

Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 2014 Introduction Motivation for Data Visualization Humans are outstanding at detecting

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### NEURAL NETWORKS A Comprehensive Foundation

NEURAL NETWORKS A Comprehensive Foundation Second Edition Simon Haykin McMaster University Hamilton, Ontario, Canada Prentice Hall Prentice Hall Upper Saddle River; New Jersey 07458 Preface xii Acknowledgments

### Multivariate Data Visualization

Multivariate Data Visualization c A.I. McLeod and S.B. Provost, January 26, 2001. Introduction Multivariate analysis deals with the statistical analysis of observations where there are multiple responses

### Exploratory Data Analysis

Exploratory Data Analysis Johannes Schauer johannes.schauer@tugraz.at Institute of Statistics Graz University of Technology Steyrergasse 17/IV, 8010 Graz www.statistics.tugraz.at February 12, 2008 Introduction

### Development and Management

Cloud Database Development and Management Lee Chao CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Croup, an Informa business AN AUERBACH BOOK

### SOFTWARE TESTING. A Craftsmcm's Approach THIRD EDITION. Paul C. Jorgensen. Auerbach Publications. Taylor &. Francis Croup. Boca Raton New York

SOFTWARE TESTING A Craftsmcm's Approach THIRD EDITION Paul C. Jorgensen A Auerbach Publications Taylor &. Francis Croup Boca Raton New York Auerbach Publications is an imprint of the Taylor & Francis Group,

### ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 1, JANUARY 2002 237 ViSOM A Novel Method for Multivariate Data Projection and Structure Visualization Hujun Yin Abstract When used for visualization of

### Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

### Quality Management. Theory and Application PETER D. MAUCH. Ltfi) CRC Press. \ V J Taylor & Francis Group. ^ ^ Boca Raton London New York

Quality Management Theory and Application PETER D. MAUCH Ltfi) CRC Press \ V J Taylor & Francis Group ^ ^ Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an Informa business

### Advanced Signal Processing and Digital Noise Reduction

Advanced Signal Processing and Digital Noise Reduction Saeed V. Vaseghi Queen's University of Belfast UK WILEY HTEUBNER A Partnership between John Wiley & Sons and B. G. Teubner Publishers Chichester New

1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures

### DATA ANALYTICS USING R

DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data