FUNCTIONAL DATA ANALYSIS: INTRO TO R's FDA




EXAMPLE IN R...fda.txt

DOCUMENTATION: found on the fda website, via the link under software (http://www.psych.mcgill.ca/misc/fda/). It includes a nice 2005 document with examples and explanations.

CODE: available from CRAN or the above website (in R: use the package menu to download and install).

ERRORS IN FDA: please let me know about errors, for posting to Jim Ramsay.

STEPS IN USING FDA

- Choose a basis and set up the basis functions. The choice might depend on the range of the t values, but not on the y values or on specific t-values. Result: basisfd (a basis object).
- Turn vectors into functions using the data (t's and y's) and basisfd. This can be done by least squares or by lightly smoothing the data. Result: datafd (a functional data object).
- Plot, and summarize with pointwise means and standard deviations, using datafd.
- Align (warp), functional principal components, linear discriminant analysis, derivatives, ...

basisfd: Basis Object (class=bs)

Example:
    basisfd <- create.bspline.basis(rangeval=c(0, 1), nbasis=NULL, norder=4, breaks=NULL)

You must specify:
- the type of basis: Fourier, B-spline, power, constant (one function, φ(t) ≡ 1), exponential (φ_j(t) = exp(α_j t)), polygonal (piecewise linear), polynomial
- range = c(a, b): the t values are in the interval [a, b]
- the number of basis functions (or something that determines this, like the number of knots)
- some parameters (depending on the basis chosen): e.g. knot values, degree, period (for Fourier series)

Miscellaneous: dropind: leave out basis functions; quadvals and values: used for integrals and derivatives of basis functions.
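As an illustration of a non-spline choice, here is a minimal sketch of a Fourier basis, where the extra parameter is the period. It assumes the fda package is installed from CRAN; the object name fbasis is made up for this example.

```r
library(fda)  # assumption: fda is installed

# Fourier basis on [0, 1]: a constant plus sine/cosine pairs, so nbasis is odd.
fbasis <- create.fourier.basis(rangeval = c(0, 1), nbasis = 5, period = 1)

fbasis$type      # type of basis stored in the object
fbasis$nbasis    # number of basis functions
fbasis$rangeval  # the interval [a, b]
```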

BSPLINE BASIS: parameters are similar, but not identical, to those of bs

    create.bspline.basis(rangeval=c(0, 1), nbasis=NULL, norder=4, breaks=NULL)

- RANGEVAL: range for the independent variable; default is [0, 1].
- BREAKS (= KNOTS): include interior knots and boundary knots, in increasing order; the boundary knots must equal RANGEVAL. Default: equally spaced, with RANGEVAL as boundary knots.
- NORDER (must be between 1 and 20): the degree + 1 (order = 4 means piecewise cubic).
- NBASIS: number of basis functions to use.
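A minimal sketch of these parameters in use, assuming the fda package is installed. The grid tgrid and the matrix phi are names chosen for this example only.

```r
library(fda)  # assumption: fda is installed

# Cubic B-spline basis (order 4) on [0, 1]. Only nbasis is given,
# so the knots are equally spaced with rangeval as boundary knots.
basisfd <- create.bspline.basis(rangeval = c(0, 1), nbasis = 7, norder = 4)

# Evaluate every basis function on a grid:
# one row per t-value, one column per basis function.
tgrid <- seq(0, 1, length.out = 101)
phi <- eval.basis(tgrid, basisfd)

dim(phi)
range(rowSums(phi))  # B-spline basis functions sum to 1 at each t
```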

    create.bspline.basis(rangeval=c(0, 1), nbasis=NULL, norder=4, breaks=NULL)

NOTE: we must have
    nbasis = degree + # interior knots + 1 = norder + length(breaks) - 2
We needn't specify all three of nbasis, norder, breaks. We won't always get an error if we specify all three with nbasis not equal to norder + length(breaks) - 2.

R code...
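The counting rule can be checked directly. This is a small sketch, not the original slide code, and it assumes the fda package is installed; the break values are made up.

```r
library(fda)  # assumption: fda is installed

norder <- 4
breaks <- seq(0, 1, by = 0.25)  # 5 breaks: 2 boundary knots + 3 interior knots

# Give norder and breaks only; nbasis is then implied.
b <- create.bspline.basis(rangeval = c(0, 1), norder = norder, breaks = breaks)

b$nbasis                     # implied number of basis functions
norder + length(breaks) - 2  # the formula from the note: 4 + 5 - 2
```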

datafd: Functional Data Object

Turn vectors into functions using the data (t's and y's) and basisfd. This can be done by lightly smoothing the data:

    datafdpar <- fdPar(basisfd, 2, lambda)      ## info on smoothing
    datalist <- smooth.basis(x, y, datafdpar)   ## data
    datalist$fd                                 # this is the functional data object

or by least squares:

    data2fd(y, argvals=seq(0, 1, len = n), basisfd, fdnames=defaultnames, argnames=c("time", "reps", "values"))
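The light-smoothing route might look like the following on simulated data. This is a sketch under assumptions: the fda package is installed, the curves are made-up noisy sine waves, and the value of lambda is an arbitrary small number chosen for illustration.

```r
library(fda)  # assumption: fda is installed

set.seed(1)
n <- 51
nrep <- 5
tvals <- seq(0, 1, length.out = n)
# Simulated data (hypothetical): n by nrep matrix, one noisy curve per column.
y <- sapply(1:nrep, function(i) sin(2 * pi * tvals) + rnorm(n, sd = 0.1))

basisfd <- create.bspline.basis(rangeval = c(0, 1), nbasis = 15, norder = 4)

# Light smoothing: penalize the second derivative with a small lambda.
datafdpar <- fdPar(basisfd, Lfdobj = 2, lambda = 1e-6)
datalist <- smooth.basis(tvals, y, datafdpar)
datafd <- datalist$fd   # the functional data object

dim(datafd$coefs)       # nbasis by nrep matrix of basis coefficients
```

With lambda this small the fit stays close to the data, in line with the goal of changing the data as little as possible at this stage.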

Turn vectors into functions using the data (t's and y's) and basisfd (continued)

- data2fd can handle NA's and can handle different t-values for each individual. Since it's least squares (no penalty), we can get some undesirable results.
- smooth.basis can't have NA's, and must have the same t's for each individual.

POSSIBLE ACTIONS:
- Use smooth.basis.
- Check the data2fd fit. Looks OK? Great.
- Lightly smooth the data outside of the fda library, then use smooth.basis.
- Outside of the fda library: interpolate to fill in missing data values.

GOAL: do not change the data much at all. This is initial processing.

Output: a basis class object, coeff, and fdnames (fdnames is an optional input, too, for plotting).
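The last action, filling in missing values outside of fda, could be done with base R's approx. The data below are invented for the example, and linear interpolation is just one possible choice.

```r
# Hypothetical curve with two missing responses.
tvals <- seq(0, 1, by = 0.1)
y <- c(0.0, 0.6, NA, 0.9, 0.6, 0.0, -0.6, NA, -0.9, -0.6, 0.0)

# Linear interpolation at the missing t-values, using only the observed points.
ok <- !is.na(y)
y_filled <- y
y_filled[!ok] <- approx(tvals[ok], y[ok], xout = tvals[!ok])$y

y_filled  # observed values are unchanged; only the NA's are replaced
```

The filled-in vector has no NA's, so it can then be passed to smooth.basis.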

FDNAMES: list of length 3, used for labelling plots
- first: argument value (e.g. "age"), default = "time"
- second: a vector for replication (e.g. mouse id), default = reps1, reps2, ...
- third: response (e.g. "body mass"), default = "values"

Difference between fdnames and argnames in data2fd??
- fdnames[2] is a vector (one entry per curve)
- argnames[2] is the name of fdnames[2], e.g. "mouse"

data2fd and smooth.basis are also used for turning a differential operator into a function.

smooth.basis

    datafdpar <- fdPar(basisfd, 2, lambda)      ## info on smoothing
    datalist <- smooth.basis(x, y, datafdpar)   ## data
    datalist$fd                                 # this is the functional data object

    fdPar(fdobj=fd(), Lfdobj=int2Lfd(0), lambda=0, estimate=TRUE, penmat=NULL)

Lfdobj = integer or differential operator (gives the penalty on the function). Lfdobj = 2 gives penalty = ∫ [f''(t)]² dt.

    smooth.basis(argvals, y, fdParobj, wtvec=rep(1,n), dffactor=1, fdnames=list(NULL, dimnames(y)[2], NULL))

R code...
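For reference, the criterion that penalized smoothing with Lfdobj = 2 minimizes can be written out as below. The symbols w_j (entries of wtvec), c_k (basis coefficients), Φ, W, and R are notation introduced here, not taken from the slides.

```latex
\[
\mathrm{PENSSE}_\lambda(f) \;=\; \sum_{j=1}^{n} w_j \bigl[y_j - f(t_j)\bigr]^2
\;+\; \lambda \int \bigl[f''(t)\bigr]^2 \, dt,
\qquad f(t) = \sum_{k=1}^{\mathrm{nbasis}} c_k \, \phi_k(t)
\]
\[
\hat{c} \;=\; \bigl(\Phi^\top W \Phi + \lambda R\bigr)^{-1} \Phi^\top W y,
\qquad \Phi_{jk} = \phi_k(t_j), \quad
W = \mathrm{diag}(w_1, \dots, w_n), \quad
R_{kl} = \int \phi_k''(t)\, \phi_l''(t)\, dt
\]
```

Setting lambda = 0 removes the penalty term and leaves the weighted least squares fit.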

data2fd

    data2fd(y, argvals=seq(0, 1, len = n), basisfd, fdnames=defaultnames, argnames=c("time", "reps", "values"))

This fits by ordinary least squares, not penalized least squares.

Y, ARGVALS (NA's permitted - only in y??); nrep = number of reps/curves
- If all individuals are observed at the same argument values t_1, ..., t_n: y is the n by nrep matrix of responses and argvals can be (t_1, ..., t_n).
- If all individuals are observed at the same argument values t_1, ..., t_n, but with some missing values: do as above, but use NA's at the missing y-values.
- If individual i is observed at (t_i1, ..., t_in) (same length for all individuals): both y and argvals are n by nrep.
- If individual i is observed at (t_i1, ..., t_in_i):
  - let T_1, ..., T_K be the union of all distinct t-values;
  - y is K by nrep with many NA's, and argvals is K by nrep.
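The last case (different t-values per individual) needs some bookkeeping before any fda call. The base R sketch below builds the K by nrep matrices from invented data. Filling every column of argvals with T_1, ..., T_K is an assumption about what the slide intends.

```r
# Hypothetical raw data: each individual is observed at its own t-values.
obs <- list(
  list(t = c(0.0, 0.5, 1.0),  y = c(1.0, 2.0, 3.0)),
  list(t = c(0.0, 0.25, 1.0), y = c(1.5, 1.8, 2.9)),
  list(t = c(0.5, 0.75),      y = c(2.2, 2.6))
)

# T_1, ..., T_K: sorted union of all distinct t-values.
Tall <- sort(unique(unlist(lapply(obs, function(o) o$t))))
K <- length(Tall)
nrep <- length(obs)

# y is K by nrep, with NA wherever an individual was not observed.
y <- matrix(NA_real_, nrow = K, ncol = nrep)
for (i in seq_len(nrep)) {
  y[match(obs[[i]]$t, Tall), i] <- obs[[i]]$y
}

# Assumption: argvals repeats T_1, ..., T_K in every column.
argvals <- matrix(Tall, nrow = K, ncol = nrep)
```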

Some Other Objects/Classes
- bivariate functional data class bifd, for functions of two variables
- linear differential operator object (via Lfd)
- functional parameter object (via fdPar)

SUMMARY/GRAPHICS COMMANDS FOR FD OBJECTS

PLOT

    plot.fd(fd, Lfd, matplt=TRUE, href=TRUE, nex, ...)

- fd is a functional data object to be plotted.
- Lfd: what derivative do you want to plot? (0 = function, 1, 2, ...). We can make this a complicated differential operator: e.g. plot m (t) sin(2πt) m(t)...
- matplt = T: plot all curves at once; = F: plot one at a time.
- href = T: plot the x-axis; = F: omit the x-axis.
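A short sketch of plotting a derivative, assuming a current CRAN version of fda, where the plot method for fd objects takes the derivative through the Lfdobj argument. The two curves are noise-free and invented for the example, and the basis settings are arbitrary.

```r
library(fda)  # assumption: fda is installed

tvals <- seq(0, 1, length.out = 101)
# Two noise-free curves (hypothetical), one per column.
y <- cbind(sin(2 * pi * tvals), cos(2 * pi * tvals))

basisfd <- create.bspline.basis(rangeval = c(0, 1), nbasis = 23, norder = 6)
datafd <- smooth.basis(tvals, y, fdPar(basisfd, Lfdobj = 2, lambda = 1e-10))$fd

plot(datafd)              # the curves themselves (derivative order 0)
plot(datafd, Lfdobj = 1)  # first derivatives

# The same derivative as numbers: a 101 by 2 matrix of f'(t) values.
d1 <- eval.fd(tvals, datafd, Lfdobj = 1)
```

Order 6 is used here rather than order 4 so that the derivative curves are themselves smooth.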

MEAN, SD, CENTER, VAR
- mean.fd(fd): point-wise mean
- std.fd(fd): point-wise sd's
- center.fd(fd): subtract the point-wise average from all curves
- var.fd(fd1, fd2): a bivariate functional data object

R code...
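These summaries might be used as follows. This is a sketch on simulated data, not the original slide code; it assumes the fda package is installed, and the object names (mfd, sdfd, centfd, covbifd) are made up.

```r
library(fda)  # assumption: fda is installed

set.seed(2)
tvals <- seq(0, 1, length.out = 41)
# Simulated data (hypothetical): six noisy quadratic curves, one per column.
y <- sapply(1:6, function(i) i * tvals^2 + rnorm(41, sd = 0.05))

basisfd <- create.bspline.basis(rangeval = c(0, 1), nbasis = 8, norder = 4)
datafd <- smooth.basis(tvals, y, fdPar(basisfd, Lfdobj = 2, lambda = 1e-6))$fd

mfd <- mean(datafd)          # point-wise mean (dispatches to the fd method)
sdfd <- std.fd(datafd)       # point-wise standard deviation
centfd <- center.fd(datafd)  # each curve minus the mean curve
covbifd <- var.fd(datafd)    # covariance surface, a bivariate fd object

# Check on a grid: the mean curve matches the mean of the evaluated curves.
vals <- eval.fd(tvals, datafd)
max(abs(eval.fd(tvals, mfd) - rowMeans(vals)))
```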