Introduction to Multivariate Analysis



Similar documents
Data Mining: Algorithms and Applications Matrix Math Review

Multivariate Normal Distribution

Precalculus REVERSE CORRELATION. Content Expectations for. Precalculus. Michigan CONTENT EXPECTATIONS FOR PRECALCULUS CHAPTER/LESSON TITLES

Simple Regression Theory II 2010 Samuel L. Baker

MATH BOOK OF PROBLEMS SERIES. New from Pearson Custom Publishing!

Sections 2.11 and 5.8

Multivariate Analysis of Ecological Data

Manifold Learning Examples PCA, LLE and ISOMAP

Systems of Linear Equations

with functions, expressions and equations which follow in units 3 and 4.

Glencoe. correlated to SOUTH CAROLINA MATH CURRICULUM STANDARDS GRADE 6 3-3, , , 4-9

03 The full syllabus. 03 The full syllabus continued. For more information visit PAPER C03 FUNDAMENTALS OF BUSINESS MATHEMATICS

Copyright PEOPLECERT Int. Ltd and IASSC

CRLS Mathematics Department Algebra I Curriculum Map/Pacing Guide

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

How To Use Statgraphics Centurion Xvii (Version 17) On A Computer Or A Computer (For Free)

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

We shall turn our attention to solving linear systems of equations. Ax = b

Definition 8.1 Two inequalities are equivalent if they have the same solution set. Add or Subtract the same value on both sides of the inequality.

Alabama Department of Postsecondary Education

Anchorage School District/Alaska Sr. High Math Performance Standards Algebra

Introduction to Matrix Algebra

Algebra 1 Course Information

Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Dimensionality Reduction: Principal Components Analysis

Lecture notes on linear algebra

Standards and progression point examples

Summation Algebra. x i

is the degree of the polynomial and is the leading coefficient.

Algebra Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

How To Understand And Solve A Linear Programming Problem

Pennsylvania System of School Assessment

CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

-- Martensdale-St. Marys Community School Math Curriculum

α = u v. In other words, Orthogonal Projection

Iris Sample Data Set. Basic Visualization Techniques: Charts, Graphs and Maps. Summary Statistics. Frequency and Mode

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Vector Notation: AB represents the vector from point A to point B on a graph. The vector can be computed by B A.

Florida Algebra 1 End-of-Course Assessment Item Bank, Polk County School District

Multivariate Analysis of Variance (MANOVA): I. Theory

Notes on Determinant

Common Core Unit Summary Grades 6 to 8

Data exploration with Microsoft Excel: analysing more than one variable

LINEAR INEQUALITIES. less than, < 2x + 5 x 3 less than or equal to, greater than, > 3x 2 x 6 greater than or equal to,

Similarity and Diagonalization. Similar Matrices

Higher Education Math Placement

1 Introduction to Matrices

AMATH 352 Lecture 3 MATLAB Tutorial Starting MATLAB Entering Variables

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Multiple regression - Matrices

Session 7 Bivariate Data and Analysis

CONTENTS PREFACE 1 INTRODUCTION 1 2 DATA VISUALIZATION 19

DERIVATIVES AS MATRICES; CHAIN RULE

Module 3: Correlation and Covariance

Algebra I Vocabulary Cards

096 Professional Readiness Examination (Mathematics)

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

Factor Analysis. Chapter 420. Introduction

Diablo Valley College Catalog

Volumes of Revolution

MATH2210 Notebook 1 Fall Semester 2016/ MATH2210 Notebook Solving Systems of Linear Equations... 3

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Linear Algebra Notes for Marsden and Tromba Vector Calculus

FURTHER VECTORS (MEI)

How To Understand And Solve Algebraic Equations

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

STAT 360 Probability and Statistics. Fall 2012

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

Pre-Algebra Academic Content Standards Grade Eight Ohio. Number, Number Sense and Operations Standard. Number and Number Systems

EQUATIONS and INEQUALITIES

NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS

Math 1. Month Essential Questions Concepts/Skills/Standards Content Assessment Areas of Interaction

KEANSBURG SCHOOL DISTRICT KEANSBURG HIGH SCHOOL Mathematics Department. HSPA 10 Curriculum. September 2007

Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups

for the Common Core State Standards 2012

Curriculum Map Statistics and Probability Honors (348) Saugus High School Saugus Public Schools

WHERE DOES THE 10% CONDITION COME FROM?

Acquisition Lesson Planning Form Key Standards addressed in this Lesson: MM2A2c Time allotted for this Lesson: 5 Hours

Grade 6 Mathematics Common Core State Standards

Please start the slide show from the beginning to use links. Click here for active links to various courses

For example, estimate the population of the United States as 3 times 10⁸ and the

Matrix Differentiation

Mathematics Online Instructional Materials Correlation to the 2009 Algebra I Standards of Learning and Curriculum Framework

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

Lecture 1: Systems of Linear Equations

Introduction to Principal Component Analysis: Stock Market Values

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

HIBBING COMMUNITY COLLEGE COURSE OUTLINE

Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Fairfield Public Schools

CHOOSING A COLLEGE. Teacher s Guide Getting Started. Nathan N. Alexander Charlotte, NC

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

A Concrete Introduction. to the Abstract Concepts. of Integers and Algebra using Algebra Tiles

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Preparation and Statistical Displays

Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the school year.

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Transcription:

Introduction to Multivariate Analysis Lecture 1 August 24, 2005 Multivariate Analysis Lecture #1-8/24/2005 Slide 1 of 30

Today s Lecture Today s Lecture Syllabus and course overview Chapter 1 (a brief review, really): Data organization/notation Graphical techniques Distance measures Introduction to SAS Lecture #1-8/24/2005 Slide 2 of 30

Who are you? To help all of use get to know each other better, please tell us your: Name Department and specialty Where you are from originally Where you did you undergraduate work Lecture #1-8/24/2005 Slide 3 of 30

Syllabus Syllabus discussion Multivariate Lecture #1-8/24/2005 Slide 4 of 30

Multivariate Statistics Multivariate A taxonomy of multivariate statistical analyses shows that most techniques fall into one of the following categories: 1 Data reduction or structural simplification 2 Sorting and grouping 3 Investigation of the dependence among variables 4 Prediction 5 Hypothesis construction and testing Lecture #1-8/24/2005 Slide 5 of 30

Arrays As a precursor of things to come, here is a preview of the ways data are organized in this book/course Multivariate data are a collection of observations (or measurements) of: p variables (k = 1,,p) n items (j = 1,,n) items can also be though of as subjects/examinees/individuals or entities (when people are not under study) In some disciplines (such as educational measurement), items are considered the variables collected per individual Lecture #1-8/24/2005 Slide 6 of 30

x jk = measurement of the k th variable on the j th entity Variable 1 Variable 2 Variable k Variable p Item 1: x 11 x 12 x 1k x 1p Item 2: x 21 x 22 x 2k x 2p Item j: x j1 x j2 x jk x jp Item n: x n1 x n2 x nk x np Lecture #1-8/24/2005 Slide 7 of 30

Arrays Arrays To represent the entire collection of items and entities, a rectangular array can be constructed: x 11 x 12 x 1k x 1p x 21 x 22 x 2k x 2p X = x j1 x j2 x jk x jp x n1 x n2 x nk x np In the next class, we will learn about how arrays like this have an algebra that makes life somewhat easier All arrays will be symbolized by boldfaced font Lecture #1-8/24/2005 Slide 8 of 30

Array Example Arrays So, putting things all together, envision standing outside of the Kansas Union Bookstore, asking people for receipts You are interested in looking at two variables: Variable 1: the total amount of the purchase Variable 2: the number of books purchased You find four people, and here is what you see observe (with notation: x 11 = 42 x 21 = 52 x 31 = 48 x 41 = 58 x 12 = 4 x 22 = 5 x 32 = 4 x 42 = 3 Lecture #1-8/24/2005 Slide 9 of 30

Array Example (Continued) The data array would the look like: Arrays X = x 11 x 12 x 21 x 22 x 31 x 32 x 41 x 42 = 42 4 52 5 48 4 58 3 Notice for any variable, x jk : The first subscript (j) represents the ROW location in the data array The second subscript (k) represents the COLUMN location in the data array Lecture #1-8/24/2005 Slide 10 of 30

Review Sample Mean Sample Variance Sample Correlation When we have a large amount of data, it is often hard to get a manageable description of the nature of the variables under study For this reason (and as a way of introducing a review topics from previous courses), descriptive statistics are used Such descriptive statistics include: Means Variances Covariances Correlations Lecture #1-8/24/2005 Slide 11 of 30

Sample Mean For the k th variable, the sample mean is: Sample Mean Sample Variance Sample Correlation x k = 1 n n j=1 x jk An array of the means for all p variables then looks like this (which we will come to know as the mean vector): x 1 x 2 x = x 3 x 4 Lecture #1-8/24/2005 Slide 12 of 30

Sample Variance For the k th variable, the sample variance is: Sample Mean Sample Variance Sample Correlation s 2 k = s kk = 1 n n (x jk x k ) 2 j=1 Note the kk subscript, this will be important because the equation that produces the variance for a single variable is a derivation of the equation of the covariance for a pair of variables Also note the division by n Reasons for this will become apparent in the near future For a pair of variables, i and k, the sample covariance is: s ik = 1 n n (x ji x i )(x jk x k ) j=1 Lecture #1-8/24/2005 Slide 13 of 30

Sample Covariance Matrix Sample Mean Sample Variance Sample Correlation Making an array of all sample covariances give us: s 11 s 12 s 1p s 21 s 22 s 2p S n = s p1 s p2 s pp Lecture #1-8/24/2005 Slide 14 of 30

Sample Correlation Sample Mean Sample Variance Sample Correlation Sample covariances are dependent upon the scale of the variables under study For this reason, the correlation is often used to describe the association between two variables For a pair of variables, i and k, the sample correlation is found by dividing the sample covariance by the product of the standard deviation of the variables: r ik = s ik sii skk The sample correlation: Ranges from -1 to 1 Measures linear association Is invariant under linear transformations of i and k Is a biased estimator Lecture #1-8/24/2005 Slide 15 of 30

Sample Correlation Matrix Sample Mean Sample Variance Sample Correlation Making an array of all sample covariances give us: 1 r 12 r 1p r 21 1 r 2p R = r p1 r p2 1 Lecture #1-8/24/2005 Slide 16 of 30

Displaying multivariate data can be difficult due to our natural limitations of 3-dimensions Several simple ways of displaying data include: Bivariate scatterplots Three-dimensional scatterplots Bivariate Scatterplots Trivariate Scatterplots Stars Chernoff Faces Dendrograms Variable Space Network Diagrams But you already know those, some plots that can be achieved by multivariate methods include: Stars Chernoff faces Lecture #1-8/24/2005 Slide 17 of 30

Bivariate Scatterplots Lecture #1-8/24/2005 Slide 18 of 30

Trivariate Scatterplots Lecture #1-8/24/2005 Slide 19 of 30

But you already know those plots Bivariate Scatterplots Trivariate Scatterplots Stars Chernoff Faces Dendrograms Variable Space Network Diagrams Some plots that can be achieved by multivariate methods include: Stars Chernoff faces Dendrograms Bivariate plots, but of the variable space Network graphs Lecture #1-8/24/2005 Slide 20 of 30

Stars Lecture #1-8/24/2005 Slide 21 of 30

Chernoff Faces Lecture #1-8/24/2005 Slide 22 of 30

Dendrograms Lecture #1-8/24/2005 Slide 23 of 30

Variable Space Plots Lecture #1-8/24/2005 Slide 24 of 30

Network Diagrams P010000 P000000 P001000 P110001 P011001 P110101 P111100 P001001 P111001 P101000 P001100 P101001 P111101 P101100 P100000 P101101 P111111 P100100 P011101 P100101 P001101 P000100 P000101 P100001 P000001 P011000 P010001 P110000 P010100 P011110 P011010 P111000 P011100 P010101 P110100 P011111 P101111 P100111 P001111 P101011 P111010 P010111 P110010 P111110 P110111 P111011 P101110 P100011 P110110 P110011 P011011 P000111 P010110 P010011 P000011 P010010 P001110 P100110 P101010 P001011 P000110 P001010 P100010 P000010 Pajek Lecture #1-8/24/2005 Slide 25 of 30

A great number of multivariate techniques revolve around the computation of distances: Distances between variables Distances between entities The formula for the Euclidean distance formula between the coordinate pair P = (x 1, x 2 ) and the origin P = (0, 0): d(o, P) = x 2 1 + x2 2 Lecture #1-8/24/2005 Slide 26 of 30

Elaborate discussions of distance measures will be found later in the class Just keep in mind that there are statistical analogs to distance measures, taking the variability of variables into account Also be aware that there are literally an infinite number of distance measures! A distance measure must satisfy the following: d(p, Q) = d(q, P) d(p, Q) > 0 if P Q d(p, Q) = 0 if P = Q d(p, Q) d(p, R) + d(r, Q) (known as the triangle inequality) Lecture #1-8/24/2005 Slide 27 of 30

Introduction to SAS SAS has a reputation for beingwellunliked by many in the social sciences Why bother teaching it in this course? New focus on SAS in our department Adding value to your degree (check out amstatorg or dicecom for details) For some good things about SAS, check out http://wwwpbsorg/cringely/pulpit/pulpit20020411html Lecture #1-8/24/2005 Slide 28 of 30

Final Thought We just introduced what this course will be about Things will become increasingly relevant as time progresses Final Thought Next Class Please be patient with SAS We will now head down to the lab for a SAS introduction session Lecture #1-8/24/2005 Slide 29 of 30

Next Time Matrix algebra (Chapter 2, Supplement 2A) SAS proc iml Final Thought Next Class Lecture #1-8/24/2005 Slide 30 of 30