Design & Analysis of Ecological Data. Landscape of Statistical Methods...

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Design & Analysis of Ecological Data. Landscape of Statistical Methods..."

Transcription

1 Design & Analysis of Ecological Data Landscape of Statistical Methods: Part 3 Topics: 1. Multivariate statistics 2. Finding groups - cluster analysis 3. Testing/describing group differences 4. Unconstratined ordination 5. Constrained ordination The Landscape The basic statistical model: Y = deterministic part + stochastic part P Univariate P Multivariate P Linear P Nonlinear P Smoothed P Distribution P Heterogeneity P Autocorrelation P Multiple levels P Random noise

2 Mulivariate statistics Why do we need multivariate statistics? P Reflect more accurately the true multidimensional nature of natural systems P Provide a way to handle large data sets with large numbers of variables P Provide a way of summarizing redundancy in large data sets P Provide rules for combining variables in an "optimal" way Mulivariate statistics Why do we need multivariate statistics? P Provide a means of detecting and quantifying truly multivariate patterns that arise out of the correlational structure of the variable set P Provide a means of exploring complex data sets for patterns and relationships from which hypotheses can be generated and subsequently tested experimentally

3 What is mulivariate statistics? y = x1 + x xj y1 + y yi = x y1 + y yi = x1 + x xj y1 + y yi Multivariate Statistics Regression Analysis of Variance Contingency Tables, etc. Multivariate ANOVA Discriminant Analysis CART,MRPP,MANTEL Canonical Corr. Analysis Constrained ordination Unconstrained ordination Cluster Analysis Mulivariate methods P Finding groups (Cluster analysis) P Testing for groups (e.g., MRPP, MANTEL) P Discriminating among groups (e.g., DA, ISA, mcart) P Unconstrained ordination (e.g., PCA, CA, NMDS) P Constrained ordination (e.g., RDA, CCA, CAPS) Large family of techniques with similar goals; operating on data sets for which pre-specified, well-defined groups do "not" exist; characteristics of the data are used to assign entities into artificial groups

4 Finding groups cluster analysis PCan we organize sampling entities (e.g., sites) into discrete classes, such that within-group similarity is maximized and among-group similarity is minimized? Sites A Species B C D Finding groups cluster analysis Nonhierarchical clustering: P NHC methods merely assign each entity to a cluster, placing similar entities together in order to maximize within-cluster homogeneity K-means clustering + Group Centroids? + Seeds?

5 Finding groups cluster analysis Hierarchical clustering: P HC methods combine similar entities into classes or groups and arrange these groups into a hierarchy that reveals relationships among the entities classified Mulivariate methods P Finding groups (Cluster analysis) P Testing for groups (e.g., MRPP, MANTEL) P Discriminating among groups (e.g., DA, ISA, mcart) P Unconstrained ordination (e.g., PCA, CA, NMDS) P Constrained ordination (e.g., RDA, CCA, CAPS) Family of different methods for testing and/or describing differences among prespecified, well-defined groups based on a set of discriminating variables

6 Discriminating among groups Id Species Ccov Snags CHgt 1 A A A B B B C C C P Are pre-defined groups of entities (e.g,. species) significantly different from each other and, if so, how do they differ? Testing for group differences P Are groups significantly different? (How valid are the groups?) < Multivariate Analysis of Variance (MANOVA) < Multi-Response Permutation Procedures (MRPP) < Analysis of Group Similarities (ANOSIM) < Mantel s Test (MANTEL) P How do groups differ? (Which variables best distinguish among the groups?) < Discriminant Analysis (DA) < Classification and Regression Trees (CART) < Logistic Regression (LR) < Indicator Species Analysis (ISA)

7 Mulivariate methods P Finding groups (Cluster analysis) P Testing for groups (e.g., MRPP, MANTEL) P Discriminating among groups (e.g., DA, ISA, mcart) P Unconstrained ordination (e.g., PCA, CA, NMDS) P Constrained ordination (e.g., RDA, CCA, CAPS) A family of different methods for organizing sampling entities (e.g., species, sites, observations, etc.) along continuous gradients based on a set of interdependent variables Unconstrained ordination PCan we organize entities (e.g., sites) along one or more gradients based on their relationships among the interdependent variables?

8 Unconstrained ordination Canopy Snag Canopy Obs Cover Density Height X N Centroid PC1 X 1 PC1 =.8x 1 -.4x 2 +.1x 3 PC2 = -.1x 1 -.1x 2 +.9x 3 X 2 PC2 Unconstrained ordination 2d ordi bubble plot 3d ordi scatter plot

9 Unconstrained ordination Ordi trajectories Ordi overlays Unconstrained ordination Linear Quadratic Smooth Nonlinear P Principal components analysis (PCA) P Factor analysis (FA) P Multidimensional scaling (MDS/PCO) P ML-Unconstrained linear ordination (ULO) P Correspondence analysis (CA & DCA) P ML-Unconstrained quadratic ordination (UQO) P ML-Unconstrained additive ordination (UAO) P Nonmetric multidimensional scaling (NMDS)

10 Mulivariate methods P Finding groups (Cluster analysis) P Testing for groups (e.g., MRPP, MANTEL) P Discriminating among groups (e.g., DA, ISA, mcart) P Unconstrained ordination (e.g., PCA, CA, NMDS) P Constrained ordination (e.g., RDA, CCA, CAPS) A family of different methods for extending unconstrained ordination in which the solution is constrained to be expressed by ancillary variables Constrained ordination Species Sites A B C D E Tree cover Snag density Shrub cover P Can bird community patterns be explained by measured environmental variables?

11 Constrained ordination P The triplot displays the major patterns in the species data with respect to the environmental variables Tri =(1) Samples (2) Species (3) Environment Constrained ordination 2d ordi bubble plot Ordi overlays

12 Constrained ordination Variance partioning Space Structure Patch Stand Patch Plot Floristics Abiotic Landscape Misc. Landscape Composition Landscape stand configuration Landscape patch configuration Constrained ordination P Constrained analysis of principal coordinates (CAP) P Redundancy analysis (RDA) Species abundance P Canonical correspondence analysis (CCA) Environmental gradient

Important Characteristics of Cluster Analysis Techniques

Important Characteristics of Cluster Analysis Techniques Cluster Analysis Can we organize sampling entities into discrete classes, such that within-group similarity is maximized and amonggroup similarity is minimized according to some objective criterion? Sites

More information

Analysing Ecological Data

Analysing Ecological Data Alain F. Zuur Elena N. Ieno Graham M. Smith Analysing Ecological Data University- una Landesbibliothe;< Darmstadt Eibliothek Biologie tov.-nr. 4y Springer Contents Contributors xix 1 Introduction 1 1.1

More information

Step 5: Conduct Analysis. The CCA Algorithm

Step 5: Conduct Analysis. The CCA Algorithm Model Parameterization: Step 5: Conduct Analysis P Dropped species with fewer than 5 occurrences P Log-transformed species abundances P Row-normalized species log abundances (chord distance) P Selected

More information

Applied Multivariate Analysis

Applied Multivariate Analysis Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models

More information

DISCRIMINANT FUNCTION ANALYSIS (DA)

DISCRIMINANT FUNCTION ANALYSIS (DA) DISCRIMINANT FUNCTION ANALYSIS (DA) John Poulsen and Aaron French Key words: assumptions, further reading, computations, standardized coefficents, structure matrix, tests of signficance Introduction Discriminant

More information

Multivariate Analysis Techniques in Environmental Science

Multivariate Analysis Techniques in Environmental Science 23 Multivariate Analysis Techniques in Environmental Science Mohammad Ali Zare Chahouki Department of Rehabilitation of Arid and Mountainous Regions, University of Tehran, Iran 1. Introduction One of the

More information

Multivariate Analysis. Overview

Multivariate Analysis. Overview Multivariate Analysis Overview Introduction Multivariate thinking Body of thought processes that illuminate the interrelatedness between and within sets of variables. The essence of multivariate thinking

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Course Agenda. First Day. 4 th February - Monday 14.30-19.00. 14:30-15.30 Students Registration Polo Didattico Laterino

Course Agenda. First Day. 4 th February - Monday 14.30-19.00. 14:30-15.30 Students Registration Polo Didattico Laterino Course Agenda First Day 4 th February - Monday 14.30-19.00 14:30-15.30 Students Registration Main Entrance Registration Desk 15.30-17.00 Opening Works Teacher presentation Brief Students presentation Course

More information

4.7. Canonical ordination

4.7. Canonical ordination Université Laval Analyse multivariable - mars-avril 2008 1 4.7.1 Introduction 4.7. Canonical ordination The ordination methods reviewed above are meant to represent the variation of a data matrix in a

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

What do the dimensions or axes represent? Ordinations

What do the dimensions or axes represent? Ordinations Ordinations Goal order or arrange samples along one or two meaningful dimensions (axes or scales) Ordinations Not specifically designed for hypothesis testing Similar to goals of clustering, so why not

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools 2009-2010 Week 1 Week 2 14.0 Students organize and describe distributions of data by using a number of different

More information

MULTIVARIATE DATA ANALYSIS i.-*.'.. ' -4

MULTIVARIATE DATA ANALYSIS i.-*.'.. ' -4 SEVENTH EDITION MULTIVARIATE DATA ANALYSIS i.-*.'.. ' -4 A Global Perspective Joseph F. Hair, Jr. Kennesaw State University William C. Black Louisiana State University Barry J. Babin University of Southern

More information

Computer program review

Computer program review Journal of Vegetation Science 16: 355-359, 2005 IAVS; Opulus Press Uppsala. - Ginkgo, a multivariate analysis package - 355 Computer program review Ginkgo, a multivariate analysis package Bouxin, Guy Haute

More information

DATA ANALYTICS USING R

DATA ANALYTICS USING R DATA ANALYTICS USING R Duration: 90 Hours Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

More information

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19 PREFACE xi 1 INTRODUCTION 1 1.1 Overview 1 1.2 Definition 1 1.3 Preparation 2 1.3.1 Overview 2 1.3.2 Accessing Tabular Data 3 1.3.3 Accessing Unstructured Data 3 1.3.4 Understanding the Variables and Observations

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis ERS70D George Fernandez INTRODUCTION Analysis of multivariate data plays a key role in data analysis. Multivariate data consists of many different attributes or variables recorded

More information

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model Assumptions Assumptions of linear models Apply to response variable within each group if predictor categorical Apply to error terms from linear model check by analysing residuals Normality Homogeneity

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Semester 2 Statistics Short courses

Semester 2 Statistics Short courses Semester 2 Statistics Short courses Course: STAA0001 - Basic Statistics Blackboard Site: STAA0001 Dates: Sat 10 th Sept and 22 Oct 2016 (9 am 5 pm) Room EN409 Assumed Knowledge: None Day 1: Exploratory

More information

Chapter 14: Analyzing Relationships Between Variables

Chapter 14: Analyzing Relationships Between Variables Chapter Outlines for: Frey, L., Botan, C., & Kreps, G. (1999). Investigating communication: An introduction to research methods. (2nd ed.) Boston: Allyn & Bacon. Chapter 14: Analyzing Relationships Between

More information

Design and Analysis of Ecological Data Conceptual Foundations: Ec o lo g ic al Data

Design and Analysis of Ecological Data Conceptual Foundations: Ec o lo g ic al Data Design and Analysis of Ecological Data Conceptual Foundations: Ec o lo g ic al Data 1. Purpose of data collection...................................................... 2 2. Samples and populations.......................................................

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

Data Mining and Visualization

Data Mining and Visualization Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Semester 1 Statistics Short courses

Semester 1 Statistics Short courses Semester 1 Statistics Short courses Course: STAA0001 Basic Statistics Blackboard Site: STAA0001 Dates: Sat. March 12 th and Sat. April 30 th (9 am 5 pm) Assumed Knowledge: None Course Description Statistical

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Visualizing non-hierarchical and hierarchical cluster analyses with clustergrams

Visualizing non-hierarchical and hierarchical cluster analyses with clustergrams Visualizing non-hierarchical and hierarchical cluster analyses with clustergrams Matthias Schonlau RAND 7 Main Street Santa Monica, CA 947 USA Summary In hierarchical cluster analysis dendrogram graphs

More information

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection

Adequacy of Biomath. Models. Empirical Modeling Tools. Bayesian Modeling. Model Uncertainty / Selection Directions in Statistical Methodology for Multivariable Predictive Modeling Frank E Harrell Jr University of Virginia Seattle WA 19May98 Overview of Modeling Process Model selection Regression shape Diagnostics

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

Patterns in the species/environment relationship depend on both scale and choice of response variables

Patterns in the species/environment relationship depend on both scale and choice of response variables OIKOS 105: 117/124, 2004 Patterns in the species/environment relationship depend on both scale and choice of response variables Samuel A. Cushman and Kevin McGarigal Cushman, S. A. and McGarigal, K. 2004.

More information

9.2 User s Guide SAS/STAT. Introduction. (Book Excerpt) SAS Documentation

9.2 User s Guide SAS/STAT. Introduction. (Book Excerpt) SAS Documentation SAS/STAT Introduction (Book Excerpt) 9.2 User s Guide SAS Documentation This document is an individual chapter from SAS/STAT 9.2 User s Guide. The correct bibliographic citation for the complete manual

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Department of Industrial Engineering

Department of Industrial Engineering Department of Industrial Engineering Master of Engineering Program in Industrial Engineering (International Program) M.Eng. (Industrial Engineering) Plan A Option 2: Total credits required: minimum 39

More information

Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico

Cluster Analysis. Isabel M. Rodrigues. Lisboa, 2014. Instituto Superior Técnico Instituto Superior Técnico Lisboa, 2014 Introduction: Cluster analysis What is? Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from

More information

ECOL 411/ MARI451 Reading Ecology and Special Topic Marine Science - (20 pts) Course coordinator: Professor Stephen Wing Venue: Department of Marine

ECOL 411/ MARI451 Reading Ecology and Special Topic Marine Science - (20 pts) Course coordinator: Professor Stephen Wing Venue: Department of Marine ECOL 411/ MARI451 Reading Ecology and Special Topic Marine Science - (20 pts) Course coordinator: Professor Stephen Wing Venue: Department of Marine Science Seminar Room (140) Time: Mondays between 12-3

More information

Multivariate Analysis of Ecological Communities in R: vegan tutorial

Multivariate Analysis of Ecological Communities in R: vegan tutorial Multivariate Analysis of Ecological Communities in R: vegan tutorial Jari Oksanen June 10, 2015 Abstract This tutorial demostrates the use of ordination methods in R package vegan. The tutorial assumes

More information

10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

10-601. Machine Learning. http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html 10-601 Machine Learning http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html Course data All up-to-date info is on the course web page: http://www.cs.cmu.edu/afs/cs/academic/class/10601-f10/index.html

More information

Exploratory Data Analysis with MATLAB

Exploratory Data Analysis with MATLAB Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton

More information

Overview of Factor Analysis

Overview of Factor Analysis Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

More information

Multivariate Analysis of Ecological Data. Jan Lepš & Petr Šmilauer

Multivariate Analysis of Ecological Data. Jan Lepš & Petr Šmilauer Multivariate Analysis of Ecological Data Jan Lepš & Petr Šmilauer Faculty of Biological Sciences, University of South Bohemia České Budějovice, 1999 Foreword This textbook provides study materials for

More information

Introduction to Longitudinal Data Analysis

Introduction to Longitudinal Data Analysis Introduction to Longitudinal Data Analysis Longitudinal Data Analysis Workshop Section 1 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section 1: Introduction

More information

Graduate Programs in Statistics

Graduate Programs in Statistics Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

More information

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.7 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Linear Regression Other Regression Models References Introduction Introduction Numerical prediction is

More information

Simple Linear Regression Chapter 11

Simple Linear Regression Chapter 11 Simple Linear Regression Chapter 11 Rationale Frequently decision-making situations require modeling of relationships among business variables. For instance, the amount of sale of a product may be related

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

Lecture - 32 Regression Modelling Using SPSS

Lecture - 32 Regression Modelling Using SPSS Applied Multivariate Statistical Modelling Prof. J. Maiti Department of Industrial Engineering and Management Indian Institute of Technology, Kharagpur Lecture - 32 Regression Modelling Using SPSS (Refer

More information

Maximising the value of pxrf data

Maximising the value of pxrf data Maximising the value of pxrf data Michael Gazley Senior Research Scientist 13 November 2015 With contributions from: Katie Collins, Ben Hines, Louise Fisher, June Hill, Angus McFarlane, Jess Robertson

More information

Big Data Analytics and Optimization

Big Data Analytics and Optimization Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e e.edu.in http://www.insof LIST OF COURSES Essential Business Skills for a Data Scientist...

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Clustering and Data Mining in R

Clustering and Data Mining in R Clustering and Data Mining in R Workshop Supplement Thomas Girke December 10, 2011 Introduction Data Preprocessing Data Transformations Distance Methods Cluster Linkage Hierarchical Clustering Approaches

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

More information

NC STATE UNIVERSITY Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids

NC STATE UNIVERSITY Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids Exploratory Analysis of Massive Data for Distribution Fault Diagnosis in Smart Grids Yixin Cai, Mo-Yuen Chow Electrical and Computer Engineering, North Carolina State University July 2009 Outline Introduction

More information

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Clustering Algorithms Contents K-means Hierarchical algorithms Linkage functions Vector quantization Clustering Formulation Objects.................................

More information

CLUSTERING (Segmentation)

CLUSTERING (Segmentation) CLUSTERING (Segmentation) Dr. Saed Sayad University of Toronto 2010 saed.sayad@utoronto.ca http://chem-eng.utoronto.ca/~datamining/ 1 What is Clustering? Given a set of records, organize the records into

More information

USE OF REMOTE SENSING FOR MONITORING WETLAND PARAMETERS RELEVANT TO BIRD CONSERVATION

USE OF REMOTE SENSING FOR MONITORING WETLAND PARAMETERS RELEVANT TO BIRD CONSERVATION USE OF REMOTE SENSING FOR MONITORING WETLAND PARAMETERS RELEVANT TO BIRD CONSERVATION AURELIE DAVRANCHE TOUR DU VALAT ONCFS UNIVERSITY OF PROVENCE AIX-MARSEILLE 1 UFR «Sciences géographiques et de l aménagement»

More information

A new method for non-parametric multivariate analysis of variance

A new method for non-parametric multivariate analysis of variance Austral Ecology (2001) 26, 32 46 A new method for non-parametric multivariate analysis of variance MARTI J. ANDERSON Centre for Research on Ecological Impacts of Coastal Cities, Marine Ecology Laboratories

More information

PCA, Clustering and Classification. By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker

PCA, Clustering and Classification. By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker PCA, Clustering and Classification By H. Bjørn Nielsen strongly inspired by Agnieszka S. Juncker Motivation: Multidimensional data Pat1 Pat2 Pat3 Pat4 Pat5 Pat6 Pat7 Pat8 Pat9 209619_at 7758 4705 5342

More information

Ecological Methodology Second Edition

Ecological Methodology Second Edition Ecological Methodology Second Edition Charles J. Krebs University of British Columbia Technische Universitat Darmstadt FACHBEREICH 10 BIOLOGIE Bi bliothek SchnittspahnstraBe 10 D-64 28 7 Darmstadt Inv.-Nr.

More information

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

More information

Big Data Analytics and Optimization

Big Data Analytics and Optimization Big Data Analytics and Optimization C e r t i f i c a t e P r o g r a m i n E n g i n e e r i n g E x c e l l e n c e C e r t i f i c a t e P r o g r a m s i n A c c e l e r a t e d E n g i n e e r i n

More information

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification

Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Tail-Dependence an Essential Factor for Correctly Measuring the Benefits of Diversification Presented by Work done with Roland Bürgi and Roger Iles New Views on Extreme Events: Coupled Networks, Dragon

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Variation in Bird Communities Among Three Silvicultural Treatments in Northern Hardwood Forests

Variation in Bird Communities Among Three Silvicultural Treatments in Northern Hardwood Forests Variation in Bird Communities Among Three Silvicultural Treatments in Northern Hardwood Forests Katherine E. Brashear Graduate Research Assistant UWSP College of Natural Resources 10 May 2006 Introduction

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

Better decision making under uncertain conditions using Monte Carlo Simulation

Better decision making under uncertain conditions using Monte Carlo Simulation IBM Software Business Analytics IBM SPSS Statistics Better decision making under uncertain conditions using Monte Carlo Simulation Monte Carlo simulation and risk analysis techniques in IBM SPSS Statistics

More information

Quantitative Marketing Marketing Orientation Spring Semester 2016 Master in Management GSEM University of Geneva

Quantitative Marketing Marketing Orientation Spring Semester 2016 Master in Management GSEM University of Geneva Quantitative Marketing Marketing Orientation Spring Semester 2016 Master in Management GSEM University of Geneva SYLLABUS Faculty: Prof. Marcel Paulssen, University of Geneva e-mail: marcel.paulssen@unige.ch

More information

When to Use a Particular Statistical Test

When to Use a Particular Statistical Test When to Use a Particular Statistical Test Central Tendency Univariate Descriptive Mode the most commonly occurring value 6 people with ages 21, 22, 21, 23, 19, 21 - mode = 21 Median the center value the

More information

Figure 1. IBM SPSS Statistics Base & Associated Optional Modules

Figure 1. IBM SPSS Statistics Base & Associated Optional Modules IBM SPSS Statistics: A Guide to Functionality IBM SPSS Statistics is a renowned statistical analysis software package that encompasses a broad range of easy-to-use, sophisticated analytical procedures.

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Instructions for SPSS 21

Instructions for SPSS 21 1 Instructions for SPSS 21 1 Introduction... 2 1.1 Opening the SPSS program... 2 1.2 General... 2 2 Data inputting and processing... 2 2.1 Manual input and data processing... 2 2.2 Saving data... 3 2.3

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

More information

Species Associations: The Kendall Coefficient of Concordance Revisited

Species Associations: The Kendall Coefficient of Concordance Revisited Species Associations: The Kendall Coefficient of Concordance Revisited Pierre LEGENDRE The search for species associations is one of the classical problems of community ecology. This article proposes to

More information

Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis

Joint Use of Factor Analysis (FA) and Data Envelopment Analysis (DEA) for Ranking of Data Envelopment Analysis Joint Use of Factor Analysis () and Data Envelopment Analysis () for Ranking of Data Envelopment Analysis Reza Nadimi, Fariborz Jolai Abstract This article combines two techniques: data envelopment analysis

More information

CHAPTER 3 COMMONLY USED STATISTICAL TERMS

CHAPTER 3 COMMONLY USED STATISTICAL TERMS CHAPTER 3 COMMONLY USED STATISTICAL TERMS There are many statistics used in social science research and evaluation. The two main areas of statistics are descriptive and inferential. The third class of

More information

Cluster analysis with SPSS: K-Means Cluster Analysis

Cluster analysis with SPSS: K-Means Cluster Analysis analysis with SPSS: K-Means Analysis analysis is a type of data classification carried out by separating the data into groups. The aim of cluster analysis is to categorize n objects in k (k>1) groups,

More information

How is Big Data Different? A Paradigm Shift

How is Big Data Different? A Paradigm Shift How is Big Data Different? A Paradigm Shift Jennifer Clarke, Ph.D. Associate Professor Department of Statistics Department of Food Science and Technology University of Nebraska Lincoln ASA Snake River

More information

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs 1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

More information

Lecture 2. Summarizing the Sample

Lecture 2. Summarizing the Sample Lecture 2 Summarizing the Sample WARNING: Today s lecture may bore some of you It s (sort of) not my fault I m required to teach you about what we re going to cover today. I ll try to make it as exciting

More information

Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Multivariate Analysis of Variance (MANOVA) Aaron French, Marcelo Macedo, John Poulsen, Tyler Waterson and Angela Yu Keywords: MANCOVA, special cases, assumptions, further reading, computations Introduction

More information

NCSS Statistical Software. K-Means Clustering

NCSS Statistical Software. K-Means Clustering Chapter 446 Introduction The k-means algorithm was developed by J.A. Hartigan and M.A. Wong of Yale University as a partitioning technique. It is most useful for forming a small number of clusters from

More information

Classification of Bad Accounts in Credit Card Industry

Classification of Bad Accounts in Credit Card Industry Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition

More information

CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka

CLASSIFICATION AND CLUSTERING. Anveshi Charuvaka CLASSIFICATION AND CLUSTERING Anveshi Charuvaka Learning from Data Classification Regression Clustering Anomaly Detection Contrast Set Mining Classification: Definition Given a collection of records (training

More information

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009

Cluster Analysis. Alison Merikangas Data Analysis Seminar 18 November 2009 Cluster Analysis Alison Merikangas Data Analysis Seminar 18 November 2009 Overview What is cluster analysis? Types of cluster Distance functions Clustering methods Agglomerative K-means Density-based Interpretation

More information

What is Data Mining? MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling

What is Data Mining? MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling. MS4424 Data Mining & Modelling MS4424 Data Mining & Modelling MS4424 Data Mining & Modelling Lecturer : Dr Iris Yeung Room No : P7509 Tel No : 2788 8566 Email : msiris@cityu.edu.hk 1 Aims To introduce the basic concepts of data mining

More information

Lecture 9: Introduction to Pattern Analysis

Lecture 9: Introduction to Pattern Analysis Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns

More information

SPSS: AN OVERVIEW. Seema Jaggi and and P.K.Batra I.A.S.R.I., Library Avenue, New Delhi-110 012

SPSS: AN OVERVIEW. Seema Jaggi and and P.K.Batra I.A.S.R.I., Library Avenue, New Delhi-110 012 SPSS: AN OVERVIEW Seema Jaggi and and P.K.Batra I.A.S.R.I., Library Avenue, New Delhi-110 012 The abbreviation SPSS stands for Statistical Package for the Social Sciences and is a comprehensive system

More information

Economic Order Quantity and Economic Production Quantity Models for Inventory Management

Economic Order Quantity and Economic Production Quantity Models for Inventory Management Economic Order Quantity and Economic Production Quantity Models for Inventory Management Inventory control is concerned with minimizing the total cost of inventory. In the U.K. the term often used is stock

More information

AN OBJECT-BASED APPROACH FOR WETLAND HABITATS INVENTORY AND ASSESSMENT USING ALOS AVNIR-2 AND FIELD DATA

AN OBJECT-BASED APPROACH FOR WETLAND HABITATS INVENTORY AND ASSESSMENT USING ALOS AVNIR-2 AND FIELD DATA AN OBJECT-BASED APPROACH FOR WETLAND HABITATS INVENTORY AND ASSESSMENT USING ALOS AVNIR-2 AND FIELD DATA Moschos Vogiatzis (1), Christos G. Karydas (1), Thomas K. Alexandridis (1), Nikolaos Silleos (1),

More information