Data analysis manual for eye-tracking, for use with R / growth curve analysis
(t.lentz@let.ru.nl) (tomlentz@gmail.com)
Nijmegen, 13 October 2011
Analysis ET

Overview
Steps: gather data; organise data; statistical analysis (data inspection, data modelling).
Specifically: Tobii output; Perl scripts; R, growth curve analysis, mixed models.
Tobii output: lines of text
Data properties (preamble):
Recording date: 8/3/2011
Recording time: 8:50:53:468 (corresponds to time 0)
Study: P Stragegy
Subject: AbelG
Recording: AbelG
Screen resolution: 1280 x 1024
Coordinate unit: Pixels
Filter settings: Eye: Average; Fixation radius: 30; Min duration: 100

Tobii output: lines of text (data)
Columns: Timestamp, Number, GazepointX (L), GazepointY (L), CamX (L), CamY (L), Distance (L), Pupil (L), Validity (L), GazepointX (R), GazepointY (R), CamX (R), CamY (R), Distance (R), Pupil (R), Validity (R), Fixation, GazepointX, GazepointY, Event, Event Key, Data 1, Data 2, Description
Example line: 2 1 -1280 -1024 -1.000 -1.000 -1.000 -1.000 4 -1280 -1024 -1.000 -1.000 -1.000 -1.000 4 217 5 5 ShowAVI 8 0 0 C1 xvid.avi
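A minimal R sketch of reading one such export into a data frame. It assumes the columns are tab-separated and that the column header is the first line starting with "Timestamp" (both assumptions; check your own export). The helper name read_tobii is hypothetical.

```r
# Hypothetical helper: read one Tobii text export into a data frame.
# Assumes tab-separated columns and a preamble before the header line.
read_tobii <- function(path) {
  lines <- readLines(path)
  # Locate the header line (assumed to start with "Timestamp")
  header_line <- grep("^Timestamp", lines)[1]
  if (is.na(header_line)) stop("no header line starting with 'Timestamp'")
  # Skip the preamble; keep column names as written (e.g. "GazepointX (L)")
  read.delim(path, skip = header_line - 1, check.names = FALSE,
             stringsAsFactors = FALSE)
}

# Usage with a tiny made-up file
tmp <- tempfile(fileext = ".txt")
writeLines(c("Recording date: 8/3/2011",
             "Subject: AbelG",
             "Timestamp\tNumber\tEvent",
             "0\t1\tShowAVI",
             "17\t2\t"), tmp)
et <- read_tobii(tmp)
et
```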
Using scripts
Each Tobii measurement is a time point in a trial; a trial is one participant seeing one item (participant x item).
Every measurement has to be anchored to the time in its trial (the movie) and enhanced with all trial information (a disk space hog).
You can do this by copying by hand, or with a script.

Script possibilities
Customised scripts (with the structure just explained).
One general script will be made available (soon, ask me).
Tell me your wishes (now).
The scripts run on Perl (try to install it, then to learn it).
My script(s)
combinefixations.pl processes every fixation line (after the header) of every file in a directory.
It uses two additional files: participant and item.
It makes fixed assumptions about where the information is.
Everything is text, i.e. readable and adaptable.
Perl is available for all platforms (often pre-installed).

Example output
Columns: the Tobii columns (Timestamp ... Description), followed by Participant, Gender, Difficulty, Version, CDIc, CDIp, Production, Region, Correct, Targetword, Condition
Example lines:
0 12 57 1495 0.644 0.711 583.435 4.177 0 56 1476 0.414 0.730 571.450 4.140 0 1 62 1477 AbelGCMD m SP SP a 79 67 NA buiten NA koe d
17 13 57 1503 0.643 0.711 583.435 4.122 0 48 1497 0.413 0.731 571.450 4.128 0 1 62 1477 AbelGCMD m SP SP a 79 67 NA buiten NA koe d
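For readers who prefer to stay in R, here is a hypothetical sketch of the first step such a script performs: reading every export in a directory and stacking the results. It is not combinefixations.pl itself. It assumes tab-separated files with a header line starting with "Timestamp", and takes the participant name from the file name (the name may also be in the preamble). The function name combine_dir is made up.

```r
# Hypothetical sketch: read every export file in a directory and stack them,
# adding the participant name taken from the file name.
combine_dir <- function(dir, pattern = "\\.txt$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  pieces <- lapply(files, function(f) {
    lines <- readLines(f)
    header_line <- grep("^Timestamp", lines)[1]    # assumed header marker
    d <- read.delim(f, skip = header_line - 1, check.names = FALSE,
                    stringsAsFactors = FALSE)
    d$Participant <- sub(pattern, "", basename(f))  # file name without extension
    d
  })
  do.call(rbind, pieces)                            # one data frame for all files
}

# Usage with two made-up files
demo_dir <- file.path(tempdir(), "etdemo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Subject: A", "Timestamp\tNumber", "0\t1", "17\t2"),
           file.path(demo_dir, "AbelG.txt"))
writeLines(c("Subject: B", "Timestamp\tNumber", "0\t1"),
           file.path(demo_dir, "BertK.txt"))
combined <- combine_dir(demo_dir)
combined
```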
Script data sources: Tobii data
Careful! Perl can overwrite files without warning.
Make a directory with a copy of the data.
Make the original read-only.

Script data sources: extra information
Item: a trial starts with a ShowAVI event; the item name is the movie name; the trial ends at the next ShowAVI.
Participant: one file per participant; the name is in the preamble or in the file name.
Time: 3 measurements every 50 ms (0, 17, 34, 50). Time is expressed relative to the movie and relative to the item.
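The trial rules above can be sketched in R as follows. This is an illustration, not the Perl script: it assumes a data frame with the Tobii columns Timestamp, Event and Description, and the function name anchor_trials and the new columns Trial, MovieTime and Item are hypothetical.

```r
# Sketch: every ShowAVI event starts a new trial that lasts until the next
# ShowAVI; time is re-expressed relative to the start of the movie.
anchor_trials <- function(d) {
  is_start <- !is.na(d$Event) & d$Event == "ShowAVI"
  d$Trial <- cumsum(is_start)              # 0 = samples before the first movie
  start_time <- d$Timestamp[is_start]      # movie onset per trial
  movie <- d$Description[is_start]         # item name = movie name
  in_trial <- d$Trial > 0
  d$MovieTime <- NA
  d$Item <- NA
  d$MovieTime[in_trial] <- d$Timestamp[in_trial] - start_time[d$Trial[in_trial]]
  d$Item[in_trial] <- movie[d$Trial[in_trial]]
  d
}

# Usage with made-up samples (two movies, 3 samples each)
d <- data.frame(Timestamp   = c(0, 17, 34, 50, 67, 84),
                Event       = c("ShowAVI", "", "", "ShowAVI", "", ""),
                Description = c("C1 xvid.avi", "", "", "C2 xvid.avi", "", ""),
                stringsAsFactors = FALSE)
a <- anchor_trials(d)
a[, c("Trial", "MovieTime", "Item")]
```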
Script data sources: files
Make structured files (in Excel: Save as CSV).
Items: ID (unique!), information, onset / time point.
Participants: ID (unique!), gender, group, ...
More than one onset is no problem: keep the onset in the same column of the items file, and use different item files for other analyses.

Script data sources: run the script
Windows: Run (Dutch Windows: Uitvoeren), type cmd.
Mac / Unix: open a terminal.
Then type: perl combinefixations.pl resultsdir participants items output
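In R, the same enrichment with participant and item information is a pair of merges. The sketch below is illustrative only: the column names Participant, Item, Gender, Onset, MovieTime and TimeOnset are assumptions, and the small data frames stand in for the CSV files (in practice read them with read.csv).

```r
# Sketch: add participant and item information to every sample with merge().
samples <- data.frame(Participant = c("AbelG", "AbelG", "AbelG"),
                      Item        = c("C1", "C1", "C1"),
                      MovieTime   = c(0, 17, 34),
                      stringsAsFactors = FALSE)
participants <- data.frame(Participant = "AbelG", Gender = "m",
                           stringsAsFactors = FALSE)
items <- data.frame(Item = "C1", Onset = 17, stringsAsFactors = FALSE)

# IDs must be unique, otherwise merge() duplicates samples
stopifnot(!anyDuplicated(participants$Participant), !anyDuplicated(items$Item))

combined <- merge(merge(samples, participants, by = "Participant"),
                  items, by = "Item")
combined$TimeOnset <- combined$MovieTime - combined$Onset  # time since item onset
combined
```

Using a different items file with another onset column value only changes TimeOnset, which matches the advice to keep the onset in the same column and swap files per analysis.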
Raw outcome
The output of the script is text (tab-separated); open it as a spreadsheet.
Open it in Excel, in one of two ways: (1) import the data from file, or (2) give the file the extension .csv and open it with Excel.
Check the alignment: is every value in the right column?

R
What is R?
Open source: anyone can contribute to R. It is more flexible than closed proprietary packages. It is free!
Type and ask: instead of pointing and clicking through menus, you type commands, asking for exactly what you want.
Object oriented: the architecture of R is built around the concept of objects.
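The alignment check can also be done in R before opening anything in Excel. A minimal sketch, assuming a tab-separated file whose first line is the header; check_alignment is a hypothetical name.

```r
# Sketch: check that every line of a tab-separated file has as many fields
# as the header. Returns TRUE if all lines agree.
check_alignment <- function(path) {
  n <- count.fields(path, sep = "\t", quote = "", comment.char = "")
  bad <- which(n != n[1])
  if (length(bad) > 0)
    message("Lines with a different number of fields (blank lines not counted): ",
            paste(bad, collapse = ", "))
  length(bad) == 0
}

# Usage with two made-up files
good <- tempfile()
writeLines(c("a\tb", "1\t2"), good)
bad_file <- tempfile()
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5"), bad_file)
check_alignment(good)
check_alignment(bad_file)
```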
Syntax
Functions: the smallest unit of command is a function, for example sin for the sine.
Functions and objects: a function call has the syntax function(arguments).
Argument types: arguments are objects, either
atoms: numbers, characters, truth values;
complex objects: files, vectors, matrices, functions, named atoms.

Syntax example: sine
Atomic argument example: you can ask the function sin to calculate the sine of half a radian. This is not only an R function, but also a mathematical one.
Hands on
Needed: an R installation, available at r-project.org.
Start R (click the icon, or type R in a terminal window). You should see something like:
R version 2.12.0 (2010-10-15)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
...
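A short illustration of the argument types listed above. In R terms an "atom" is an atomic vector of length one; the object names below are made up.

```r
# Atoms: a single number, character string or truth value
class(0.5)      # "numeric"
class("koe")    # "character"
class(TRUE)     # "logical"

# Complex objects built from atoms
v <- c(0.5, 1, 1.5)                   # vector
m <- matrix(1:6, nrow = 2)            # matrix with 2 rows and 3 columns
named <- c(onset = 17, offset = 50)   # named atoms
f <- sin                              # a function is an object too
f(0.5)                                # same as sin(0.5)
```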
Syntax example: sine at the R prompt
If you type the name alone at the prompt:
> sin
R tells you what the name refers to. Give it an argument:
> sin(0.5)
Now the answer appears:
[1] 0.4794255
The argument 0.5 is a number, interpreted as radians (not degrees).

Syntax: return values
Returning an object: functions usually return objects. Store the result with the assignment operator -> (or the more common <-):
> sin(0.5) -> sineofahalf
> sineofahalf
[1] 0.4794255
Objects (conceptually important!)
They have a name (give them one!).
They have content (other objects or atoms, combinations, function algorithms).
Type the name, get the content (try pi).
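Since sin() works in radians, angles in degrees must be converted first. A small sketch; deg2rad is a hypothetical helper, not a base R function.

```r
# sin() expects radians: half a radian, as in the example above
sin(0.5)                  # 0.4794255 (printed to 7 significant digits)

# To use degrees, convert first: 30 degrees = 30 * pi / 180 radians
deg2rad <- function(deg) deg * pi / 180   # hypothetical helper
sin(deg2rad(30))          # 0.5 (up to floating-point rounding)

# Both assignment directions store the same object
sin(0.5) -> sineofahalf
sineofahalf2 <- sin(0.5)
identical(sineofahalf, sineofahalf2)
```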
Syntax: more complicated functions
Example: plot
> plot(sin)
Here a function takes another function as its argument. By default, plot draws it over the range 0 to 1.
Improve this with optional arguments such as xlim:
> plot(sin, xlim = c(0,2*pi), xlab = "rad", ylab = "sine")
Optional arguments have a default value, which is normally adequate.
HELP!
> help(plot)
shows the meaning of the arguments and the defaults of the optional arguments.

Syntax: function syntax recap
In plot(sin, xlim = c(0,2*pi), xlab = "rad", ylab = "sine"):
plot is the function; sin is an object given as argument; xlim = c(0,2*pi) is an optional argument whose value is itself a function call; xlab = "rad" and ylab = "sine" are optional arguments with string values.
Nested functions: a function with arguments, like c(0,2*pi) (concatenate), can be nested in another function like plot. The higher function takes the return value. Note that pi is an object too!
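A small sketch of nesting: the inner call is evaluated first and the outer function receives its return value, so storing the inner result under a name (x_range, a made-up name) gives the same result. For example, plot(sin, xlim = x_range, xlab = "rad", ylab = "sine") draws the same graph as the nested version.

```r
# The inner call c(0, 2 * pi) is evaluated first
x_range <- c(0, 2 * pi)      # concatenate two numbers into a vector
x_range                      # [1] 0.000000 6.283185

# Nesting: sin() is applied to the return value of seq()
y <- sin(seq(0, 2 * pi, length.out = 5))
round(y, 3)                  # [1]  0  1  0 -1  0
```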
Data entry
Basic hygiene: open a text file for your commands and comments.
Change the R working directory with setwd("your dir").
Import your data with read.csv or read.table.
Nice trick: file.choose() returns a file name, so you can pick the file in a dialog:
> read.csv(file.choose()), header = T)
What went wrong? (The parenthesis after file.choose() closes read.csv too early. The correct call is read.csv(file.choose(), header = TRUE).)

Data entry: inspect the data
Formats: the read.xx functions return a data frame, consisting of vectors (the columns).
read.csv has default values for the field separator (sep = ",") and the text delimiter (quote = "\"").
Put the result into a named object, e.g. data.
Check the first rows with head(data).
Do not ask for the whole contents, but for a summary: summary(data).
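A runnable sketch of the import-and-inspect routine. To run without a file dialog it reads CSV text inline through the text argument of read.csv; the columns and the participant BertK are made up.

```r
# Read CSV text (inline here instead of file.choose(), so the sketch runs anywhere)
data <- read.csv(text = "Participant,Gender,Correct\nAbelG,m,1\nAbelG,m,0\nBertK,f,1",
                 header = TRUE, stringsAsFactors = FALSE)
head(data)      # first rows (up to 6)
str(data)       # type of each column vector
summary(data)   # a summary per column instead of the full contents
```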
Graphical inspection
Back to infant eye-tracking.
Reminder: all fixations have been saved separately. The script calculates whether they are correct (from the predefined region and the item file).
Rationale: the dependent variable is the probability of a correct fixation.
Time window: without looking at the conditions in the hypothesis, carve out the pattern in the grand mean.

Graphical inspection: plotting the grand mean
Package sciplot: install it once (install.packages("sciplot")), then load it:
> require(sciplot)
lineplot.ci
> lineplot.ci(TimeOnset, Correct, data = data, legend = T, main = "Correct fixations (only fixations)", xlab = "Time since onset (ms)", ylab = "Proportion", ci.fun = function(x) c(mean(x), mean(x)), ylim = c(0,1), fix = T)
Because ci.fun returns the mean as both the lower and the upper limit, no confidence interval bars are drawn; only the mean per time point is plotted. R is case-sensitive: use the column names exactly as they appear in your data (here TimeOnset and Correct).
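What the grand-mean plot shows can also be computed directly: the proportion of correct fixations per time point. A minimal sketch on made-up data, assuming a numeric TimeOnset column and a 0/1 Correct column.

```r
# Toy data: 4 samples at each of 3 time points, Correct coded 0/1
data <- data.frame(TimeOnset = rep(c(0, 17, 33), each = 4),
                   Correct   = c(0, 0, 1, 0,  0, 1, 1, 0,  1, 1, 1, 0))
# Mean of a 0/1 variable = proportion correct per time point
grand <- aggregate(Correct ~ TimeOnset, data = data, FUN = mean)
grand           # proportions: 0.25, 0.50, 0.75
```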
[Figure: Correct fixations (only fixations). Proportion (0 to 1) against time since onset (ms), from about -2650 to 3400 ms.]

Graphical inspection: plotting the grand mean in a time window
lineplot.ci
> lineplot.ci(TimeOnset, Correct, data = data, legend = T, main = "Correct fixations (only fixations)", xlab = "Time since onset (ms)", ylab = "Proportion", ci.fun = function(x) c(mean(x), mean(x)), ylim = c(0.2,0.8), subset = (TimeOnset > -700 & TimeOnset < 1300), fix = T)
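The same window can be stored as a data frame for the later steps. This sketch assumes that windowfix, the data frame used in the later lineplot.ci and model calls, holds the fixations inside this window; the toy values are made up.

```r
# Keep only the samples inside the time window used in the plot
data <- data.frame(TimeOnset = c(-1000, -700, -500, 0, 1299, 1300, 2000),
                   Correct   = c(0, 1, 0, 1, 1, 0, 1))
windowfix <- subset(data, TimeOnset > -700 & TimeOnset < 1300)
windowfix$TimeOnset   # -500 0 1299: the strict inequalities exclude -700 and 1300
```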
[Figure: Correct fixations (only fixations), time window. Proportion (0.2 to 0.8) against time since onset (ms), from about -683 to 1150 ms.]

Graphical inspection: plotting per condition
lineplot.ci
> lineplot.ci(TimeOnset, Correct, group = Type, data = windowfix, legend = T, col = colors, main = "Correct fixations (only fixations)", xlab = "Time since onset (ms)", ylab = "Proportion", ci.fun = function(x) c(mean(x), mean(x)), ylim = c(0,1), fix = T)
Here windowfix holds the fixations within the time window, and colors must be a vector of colours (one per condition) that you defined beforehand; otherwise R finds its built-in function colors().
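The per-condition plot shows the proportion correct per time point and condition. A sketch on made-up data, assuming columns TimeOnset, Type (conditions IC and SP, as in the figure legend) and Correct; the colour choice is hypothetical.

```r
windowfix <- data.frame(TimeOnset = rep(c(0, 17), times = 4),
                        Type      = rep(c("IC", "SP"), each = 4),
                        Correct   = c(0, 1, 1, 1,  0, 0, 1, 0))
# Proportion correct per time point within each condition
by_type <- aggregate(Correct ~ TimeOnset + Type, data = windowfix, FUN = mean)
by_type

# One colour per condition, defined before lineplot.ci(..., col = colors).
# Note: this masks R's built-in colors() function in your workspace.
colors <- c(IC = "black", SP = "red")
```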
[Figure: Correct fixations (only fixations) per condition (IC, SP). Proportion (0 to 1) against time since onset (ms), from about -583 to 1083 ms.]

Growth curves on fixations
Categorical data: logistic regression.
Mixed models for binomial data (looking / not looking) are possible (Jaeger, 2008)[1].
Per time point, you can only look (1, true, success) or not look (0).
There is no deviation, there are no outliers, etc.: it is not a continuous variable.

[1] T. F. Jaeger. Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4):434-446, 2008.
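A minimal illustration of logistic regression for a 0/1 outcome, using plain glm() on simulated data (no random effects, so this is not yet a mixed model). The model works on the log-odds scale; back-transforming the intercept of an intercept-only model gives the observed proportion.

```r
set.seed(1)
Correct <- rbinom(200, size = 1, prob = 0.6)   # simulated looks: 1 = correct
m0 <- glm(Correct ~ 1, family = binomial)
coef(m0)                 # intercept on the log-odds (logit) scale
plogis(coef(m0))         # back-transformed to a proportion
mean(Correct)            # the observed proportion: the same value
```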
Growth curves on fixations
Growth curves are traditionally used for development over months or years (Mirman, Dixon, & Magnuson, 2008)[2].
Multiple levels of time (Barr, 2008)[3].
They capture spikes and delays.

[2] D. Mirman, J. A. Dixon, and J. S. Magnuson. Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4):475-494, 2008.
[3] D. J. Barr. Analyzing visual world eyetracking data using multilevel logistic regression. Journal of Memory and Language, 59(4):457-474, 2008.

Growth curves and logistic regression
Advantages: a better fit, which is both more sensitive and more conservative; flexible; a new standard?
Read Journal of Memory and Language, 59(4).
See danmirman.org/gca.
Growth curves and logistic regression: time components
Number of components: polynomials $t^0, t^1, \dots, t^n$, where the highest power $n$ is the number of bends + 1.
Add components to improve the model fit for the grand mean.

Example: three/four levels
Meaning of the components:
$t^0$: height of the line, independent of time
$t^1$: increase or decrease, linear
$t^2$: peaking, quadratic
$t^3$: bending back, cubic
Add to the data frame: make polynomial time vectors for the window.
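One way to make these time vectors in R is poly(), which returns orthogonal polynomials: the columns have unit length, are uncorrelated with each other and sum to zero, so the components do not compete with each other or with the intercept. This is a sketch on a toy time grid; naming the columns ot1 to ot3 follows the model formulas used for the mixed models, and using poly() for them is an assumption.

```r
# Toy time grid for the window
windowfix <- data.frame(TimeOnset = seq(-683, 1150, by = 17))
t <- poly(windowfix$TimeOnset, degree = 3)   # columns: linear, quadratic, cubic
windowfix$ot1 <- t[, 1]
windowfix$ot2 <- t[, 2]
windowfix$ot3 <- t[, 3]

round(crossprod(unclass(t)), 10)   # identity matrix: unit length, mutually orthogonal
round(colSums(t), 10)              # zeros: orthogonal to the intercept
```

With real data, TimeOnset repeats across trials; poly() then works on all rows, which keeps the columns orthogonal over the whole data frame.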
[Figures: the polynomial time vectors t[, 1] (linear), t[, 2] (quadratic) and t[, 3] (cubic), plotted against TIME (about -500 to 1000 ms).]

Growth curves and logistic regression: make the mixed models
Without conditions:
> m.base <- glmer(Correct ~ 1 + (1 | Gender/Participant) + (1 | Targetword), data = windowfix, family = binomial)
> m.base1 <- glmer(Correct ~ ot1 + (1 | Gender/Participant) + (1 | Targetword), data = windowfix, family = binomial)
> m.base2 <- glmer(Correct ~ ot1 + ot2 + (1 | Participant) + (1 | Targetword), data = windowfix, family = binomial)
> m.base3 <- glmer(Correct ~ ot1 + ot2 + ot3 + (1 | Participant) + (1 | Targetword), data = windowfix, family = binomial)
These functions come from the package lme4 (load it with require(lme4)). Older versions of lme4 accepted lmer(..., family = binomial); current versions use glmer for binomial models. ot1, ot2 and ot3 are the polynomial time vectors added to the data frame.
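Each baseline model adds one time component, so the models are nested and can be compared with a likelihood-ratio test. To keep the sketch self-contained it uses plain glm() on simulated data, without the random effects; it is a simplified stand-in, not the mixed models above. With lme4, the analogous call would be anova(m.base, m.base1, m.base2, m.base3). The simulated effect sizes are made up.

```r
set.seed(2)
TimeOnset <- rep(seq(-500, 1000, by = 50), times = 20)
t <- poly(TimeOnset, 3)
d <- data.frame(ot1 = t[, 1], ot2 = t[, 2], ot3 = t[, 3])
# Simulated looks whose log-odds rise and then level off
d$Correct <- rbinom(nrow(d), 1, plogis(0.2 + 8 * d$ot1 - 4 * d$ot2))

g0 <- glm(Correct ~ 1,               family = binomial, data = d)
g1 <- glm(Correct ~ ot1,             family = binomial, data = d)
g2 <- glm(Correct ~ ot1 + ot2,       family = binomial, data = d)
g3 <- glm(Correct ~ ot1 + ot2 + ot3, family = binomial, data = d)
anova(g0, g1, g2, g3, test = "Chisq")   # one likelihood-ratio test per added component
```

A component is usually kept only if it improves the fit significantly, which is one way to decide how many bends the data need.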
References
Barr, D. J. (2008). Analyzing visual world eyetracking data using multilevel logistic regression. Journal of Memory and Language, 59(4), 457-474.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434-446.
Mirman, D., Dixon, J. A., & Magnuson, J. S. (2008). Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4), 475-494.