Statistical Analysis in MATLAB. Hot Topic 18 Jan 2006 Sanjeev Pillai BARC



From this document you will learn the answers to the following questions:

Where can you find data in a workpace?

What is standard scientific computing?

Similar documents
STATISTICAL ANALYSIS WITH EXCEL COURSE OUTLINE

AMATH 352 Lecture 3 MATLAB Tutorial Starting MATLAB Entering Variables

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

SPSS Tests for Versions 9 to 13

Additional sources Compilation of sources:

Beginner s Matlab Tutorial

Analysis of Variance ANOVA

(!' ) "' # "*# "!(!' +,

Introduction. Statistics Toolbox

Module 5: Statistical Analysis

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Linear Models in STATA and ANOVA

Bowerman, O'Connell, Aitken Schermer, & Adcock, Business Statistics in Practice, Canadian edition

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

TABLE OF CONTENTS. About Chi Squares What is a CHI SQUARE? Chi Squares Hypothesis Testing with Chi Squares... 2

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

Study Guide for the Final Exam

Statistics. One-two sided test, Parametric and non-parametric test statistics: one group, two groups, and more than two groups samples

Simple Linear Regression Inference

Section 13, Part 1 ANOVA. Analysis Of Variance

The Statistics Tutor s Quick Guide to

Bill Burton Albert Einstein College of Medicine April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Minitab Session Commands

18.06 Problem Set 4 Solution Due Wednesday, 11 March 2009 at 4 pm in Total: 175 points.

StatCrunch and Nonparametric Statistics

First-year Statistics for Psychology Students Through Worked Examples. 3. Analysis of Variance

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Statistical Functions in Excel

The Projection Matrix

AC - Clinical Trials

Data Analysis Tools. Tools for Summarizing Data

CHAPTER 14 ORDINAL MEASURES OF CORRELATION: SPEARMAN'S RHO AND GAMMA

T-test & factor analysis

Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) (d) 20 (e) 25 (f) 80. Totals/Marginal

Question 2: How do you solve a matrix equation using the matrix inverse?

One-Way Analysis of Variance (ANOVA) Example Problem

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

2+2 Just type and press enter and the answer comes up ans = 4

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

Data Mining: Algorithms and Applications Matrix Math Review

Introduction to Matlab

Simple Predictive Analytics Curtis Seare

Non-Inferiority Tests for Two Means using Differences

Package HHG. July 14, 2015

Multivariate Statistical Inference and Applications

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

Lecture 5: Singular Value Decomposition SVD (1)

Matrix Algebra in R A Minimal Introduction

Figure 1. An embedded chart on a worksheet.

Systat: Statistical Visualization Software

Guide to Microsoft Excel for calculations, statistics, and plotting data

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Projects Involving Statistics (& SPSS)

Review Jeopardy. Blue vs. Orange. Review Jeopardy

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Skewed Data and Non-parametric Methods

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

Tutorial 5: Hypothesis Testing

Chi-square test Fisher s Exact test

Parametric and Nonparametric: Demystifying the Terms

Nonparametric Statistics

MATH 304 Linear Algebra Lecture 18: Rank and nullity of a matrix.

Descriptive Statistics

Appendix 4 Simulation software for neuronal network models

MS-EXCEL: STATISTICAL PROCEDURES

Statistical issues in the analysis of microarray data

Statistiek II. John Nerbonne. October 1, Dept of Information Science

Package dsstatsclient

Basic Concepts in Matlab

SPSS ADVANCED ANALYSIS WENDIANN SETHI SPRING 2011

CHOOSING A COLLEGE. Teacher s Guide Getting Started. Nathan N. Alexander Charlotte, NC

Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions

MAT 200, Midterm Exam Solution. a. (5 points) Compute the determinant of the matrix A =

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Linear Algebra Review. Vectors

1 Topic. 2 Scilab. 2.1 What is Scilab?

Module 4 (Effect of Alcohol on Worms): Data Analysis

Comparing Means in Two Populations

Introduction to StatsDirect, 11/05/2012 1

Descriptive Analysis

Performance Metrics for Graph Mining Tasks

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

Similar matrices and Jordan form

Using Microsoft Excel for Probability and Statistics

Data Mining Techniques Chapter 5: The Lure of Statistics: Data Mining Using Familiar Tools

NCSS Statistical Software

NCSS Statistical Software

Introduction to Statistics with GraphPad Prism (5.01) Version 1.1

UNIVERSITY OF NAIROBI

Statistics for Sports Medicine

Lecture 2 Matrix Operations

13 MATH FACTS a = The elements of a vector have a graphical interpretation, which is particularly easy to see in two or three dimensions.

MEASURES OF LOCATION AND SPREAD

Data analysis process

Multivariate Analysis of Variance (MANOVA)

UNDERSTANDING THE INDEPENDENT-SAMPLES t TEST

Introduction to Statistics with SPSS (15.0) Version 2.3 (public)

CALCULATIONS & STATISTICS

Transcription:

Statistical Analysis in MATLAB Hot Topic 18 Jan 2006 Sanjeev Pillai BARC

MATLAB Basic Facts n MATrix LABoratory n Standard scientific computing software n Interactive or programmatic n Wide range of applications n Bioinformatics and Statistical toolboxes n Product of MathWorks (Natick, MA) n Available at WIBR (~20 licenses now)

Basic operations n Primary data structure is a matrix n To create a matrix a = [1 2 3 4] % creates a row vector b = 1:4 % creates a row vector c = pi:-0.5:0 % creates a row vector d = [1 2;4 5;7 8] % creates a 3x2 matrix n Operations on matrices a+c % adds a and c to itself if dimensions agree d % transposes d into a 2x3 matrix size(d) % gives the dimensions of d x*y % multiplies x with y following matrix rules x.* y % element by element multiplication

Basic operations n Accessing matrix values d(3,2) % retrieves the 3 rd rw, 2 nd cl element of d d(3,:) % all elements of the 3 rd row d(:,2) % all elements of the 2 nd column d(1:2,2) % 1 st to 2 nd row, 2 nd column n Assigning values to matrix elements d(1,1)=3; % assigns 3 to (r1,c1) d([1 2],:)=d([3 3],:) % change the first 2 rows to the 3 rd d=d^2 % squares all values in d

Basic operations n Strings Row vectors that can be concatenated x = Matlab y = class z = [x y] % z gets Matlab class n Useful functions doc, help % for help with various matlab functions whos % Lists all the variables in current workspace clear % clears all variables in the current workspace

Read/Write Data (File I/O) n Several data formats supported text, xls, csv, jpg, wav, avi etc. n From the prompt or using Data Import n Read into variables in the workspace [V1 V2 V3..] = textread( filename, format ) eg. [l,o] = textread('energy.txt','%f%f','delimiter',',','headerlines', 1,'emptyvalue',NaN); n Treated as regular matlab variables n Write out into files fid=fopen( en.txt, w ); fprintf(fid, %f\t%f\n,[lean;obese]); fclose(fid); xlswrite('energy.xls',[num2cell([lean obese])]);

Basic Statistics in Matlab n mean(lean) % calculates the mean n median(lean) n std(obese(finite(obese))) % ignores the NaNs n Visualize data boxplot([lean,obese],'labels',{'lean','obese'}) Select variables from workspace Use the plotting tool from the interface

Hypothesis testing n One sample z-test Done to test a sample statistic against an expected value (population parameter) Done when the population sd is known ztest(vector,mean,sd); [h,p,ci,zscore]=ztest(vector,mean,sigma,alpha,tail) n One sample t-test Done when the population sd is not known. [h,p,ci,tscore]=ttest(vector,mean,alpha,tail)

Two-sample tests n Paired samples Data points match each other Eg. before/after drug treatment [h,p,ci,stats]=ttest(d1,d2,alpha) n Independent samples Data points not related Eg. Data from 2 groups of people [h,p,ci,stats]=ttest2(d1,d2,alpha)

Test for assumptions n Data is normally distributed Paired: Delta is normally distributed Independent: Both data sets are normal normplot(var) or qqplot(var) or qqplot(v1,v2) n Data is homogenous (equal variances) F-test Tests whether the ratio of the variances is 1. [h,p,ci,stats]=vartest2(g1,g2,0.01)

Non-parametric tests n Data need not be normal n Compare ranks instead of values n By ranking the signs or sums n Wilcoxon signed rank test (one sample or paired samples) [p,h,stats]=signrank(var1,var2) n Wilcoxon rank sum test (Independent samples) [p,h,stats]=ranksum(var1,var2)

Multiple hypothesis correction n Applied when a test is done several times Significance occurs just by chance Eg. Microarray analysis (wild type vs mutant) n Bonferroni correction Multiply raw p-value with the number of repetitions for i=1:number_of_reps n calculate p-value for each n correct each p-value n store in a data structure end

Comparing proportions n Analyze proportions instead of values n Chi-square test No single command in matlab x= [matrix of contingency table]; e= sum(x')'*sum(x)/sum(sum(x)); X2=(x-e).^2./e X2=sum(sum(X2)) df=prod(size(x)-[1,1]) P=1-chi2cdf(X2,df)

Some more tests n Enrichment analysis Is the given data enriched for a category? Used widely in biological data analysis Hypergeometric probability analysis n Y = hygecdf(x,m,k,n); n Correlation Identify correlation between paired values From -1 to +1: perfect +ve and inverse correlations n [R,P] = corrcoef(x,y);

Matlab resources n Online help http://www.mathworks.com/access/helpdesk/help/helpdesk.shtml n Open source user community Someone may have already done what you need http://www.mathworks.com/matlabcentral/ n Topics not covered Scripts and functions Complex data structures Programming