Statistical Considerations in Magnetic Resonance Imaging of Brain Function




Statistical Considerations in Magnetic Resonance Imaging of Brain Function Brian D. Ripley Professor of Applied Statistics University of Oxford ripley@stats.ox.ac.uk http://www.stats.ox.ac.uk/~ripley

Acknowledgements Most of the computations were done by Jonathan Marchini (EPSRC-funded D.Phil student). Data, background and advice provided by Peter Styles (MRC Biochemical and Clinical Magnetic Resonance Spectroscopy Unit) and Stephen Smith (Oxford Centre for Functional Magnetic Resonance Imaging of the Brain).

Functional Imaging Functional PET and MRI are used for studies of brain function: give a subject a task and see which area(s) of the brain "light up". Functional studies were done with PET in the late 1980s and early 1990s; now fMRI is becoming possible (it needs powerful magnets, which in Oxford means 3 Tesla). PET has lower resolution, say 3 x 3 x 7 mm voxels at best, so although 128 x 128 x 80 (say) grids might be used, this is done by sub-sampling. Comparisons are made between PET images in two states (e.g. "rest" and "stimulus") and analysis is made on the difference image. PET images are very noisy, and results are averaged across several subjects. fMRI has a higher spatial resolution, and temporal resolution of around one second, so stimuli are applied for a period of about 30 secs, images taken around every 3 secs, with several repeats of the stimulus being available for one subject.

The commonly addressed statistical issue is: has the brain state changed, and if so, where?

Neurological Change A longer-term view of function is the change of tissue state and neurological function after traumatic events such as a stroke or tumour growth and removal. The aim here is to identify tissue as normal, impaired or dead, and to compare images from a patient taken over a period of several months. In MRI one can trade temporal, spatial and spectral resolution. In MR spectroscopy the aim is a more detailed chemical analysis at a fairly low spatial resolution. In principle chemical shift imaging provides a spectroscopic view at each of a limited number of voxels: in practice attention is concentrated on certain aspects of the chemical composition.

Pilot Study Our initial work has been exploring T1 and T2 images (the conventional MRI measurements) to classify brain tissue automatically, with the aim of developing ideas to be applied to spectroscopic measurements at lower resolutions. Consider the image to be made up of white matter, grey matter, CSF (cerebrospinal fluid) and skull. The initial aim is reliable automatic segmentation.

Some Data T1 (left) and T2 (right) MRI sections of a normal human brain. This slice is 172 x 208 pixels.

Data from the same image in T1-T2 space.

Imaging Imperfections The clusters in the T1-T2 plot were surprisingly diffuse. Known imperfections were: (a) Mixed voxel / partial volume effects: the tissue within a voxel may not be all of one class. (b) A bias field, in which the mean intensity from a tissue type varies across the image. This effect is thought to vary approximately multiplicatively and to consist of a radial component plus a linear component, the latter varying from day to day. (c) The point spread function: because of bandwidth limitations in the Fourier domain in which the image is acquired, the observed image is the true image convolved with a spatial point spread function of sinc (sin x/x) form. The effect can sometimes be seen at sharp interfaces (most often the skull/tissue interface) as a rippling effect, but is thought to be small.
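The sinc point spread in (c) can be illustrated with a short sketch (a hypothetical 1-D example, not the actual acquisition): truncating the Fourier representation of a step edge, mimicking limited k-space coverage, reproduces the rippling seen at sharp interfaces.

```python
import numpy as np

def bandlimit(signal, keep_fraction=0.25):
    """Zero out high spatial frequencies, mimicking limited k-space coverage."""
    spectrum = np.fft.fft(signal)
    n = len(signal)
    cutoff = int(n * keep_fraction / 2)
    spectrum[cutoff:n - cutoff] = 0.0   # discard frequencies above the cutoff
    return np.real(np.fft.ifft(spectrum))

# a sharp "skull/tissue" step edge
edge = np.concatenate([np.zeros(64), np.ones(64)])
blurred = bandlimit(edge)

# the band-limited edge overshoots 1.0 and undershoots 0.0 near the
# interface -- the Gibbs ringing described in the text
print(blurred.max() > 1.0)   # True
```

Truncation in the Fourier domain is exactly convolution with a sinc kernel in the spatial domain, which is why the ripples appear at the sharpest transitions.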

Bias Fields There is an extensive literature on bias field correction. One approach uses a stochastic process prior for the bias field, and is thus another re-invention of the ideas known as kriging in the geostatistical literature. Based on experience with the difficulty of choosing the degree of smoothing and the lack of resistance to outliers (kriging is based on assumptions of Gaussian processes) we prefer methods with more statistical content and control. Our basic model is

log Y_ij = mu + beta_class(ij) + s(i, j) + eps_ij

for the intensity at voxel (i, j), studied independently for each of the T1 and T2 responses. Here s(i, j) is a spatially smooth function. Of course, the equation depends on the classification, which will itself depend on the predicted bias field. This circularity is resolved by an iterative procedure, starting with no bias field.
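The iterative circularity can be sketched as follows, with a Gaussian smoother standing in for loess and nearest-mean classification standing in for the posterior weighting; the class means, smoothing bandwidth and image sizes are all illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correct_bias(log_img, class_means, n_iter=5, sigma=8.0):
    """Alternate a voxel classification with a smooth bias-field estimate."""
    bias = np.zeros_like(log_img)                # start with no bias field
    for _ in range(n_iter):
        corrected = log_img - bias
        # nearest-mean classification (stand-in for posterior weighting)
        cls = np.argmin(np.abs(corrected[..., None] - class_means), axis=-1)
        residual = log_img - class_means[cls]
        bias = gaussian_filter(residual, sigma)  # smooth surrogate for loess
    return bias, cls

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=(64, 64))          # two synthetic tissue classes
means = np.array([1.0, 2.0])                       # assumed class log-intensities
true_bias = 0.1 * np.linspace(-1, 1, 64)[None, :]  # linear bias component
log_img = means[truth] + true_bias + 0.01 * rng.standard_normal((64, 64))

est_bias, est_cls = correct_bias(log_img, means)
# the estimated field recovers the left-right trend of the true bias
print(est_bias[:, -1].mean() > est_bias[:, 0].mean())   # True
```

Each pass re-classifies voxels using the current bias estimate, then re-fits the smooth field to the class-mean residuals, mirroring the iteration described above.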

Estimation If the classification were known we would use a robust method that fits a long-tailed distribution for eps_ij, unconstrained terms beta_c for each class, and a smooth function s. We cope with the unknown classes in two ways. In the early stages of the process we only include data points whose classification is nearly certain, and later we use

log Y_ij = mu + sum over classes c of beta_c p(c | Y_ij) + s(i, j) + eps_ij

that is, we average the class term over the posterior probabilities for the current classification. For the smooth term s we initially fitted a linear trend plus a spline model in the distance from the central axis of the magnet, but this did not work well, so we switched to loess. Loess is based on fitting a linear surface locally, plus approximation techniques to avoid doing of the order of 27,000 fits.

Fits of bias fields Fitted bias fields for T1 (left) and T2 (right) images. The bias fields for these images are not large and change intensity by 5-10%.

Clustering model Our model is that for the bias-corrected images (Y_ij / s-hat(i, j)), the T1 and T2 measurements jointly follow a mixture of six distributions. The bivariate distributions for the six classes will have elliptical contours. We have experimented with normal and bivariate t distributions: the latter are often desirable as the normal density decays extremely rapidly. The classes should correspond to white matter, grey matter, CSF, two types of skull (the distribution of skull seems to split into two ellipses) and "outlier", a diffuse distribution picking up the isolated points. Our fitting approach is unsupervised, but the initial locations of the clusters are taken from a reference model whose clusters were labelled by interactive visualization.

Fitting the model There is an extensive literature on fitting mixtures of densities, and on using them as a model for classification. There are two distinct approaches in the literature, and the one we have taken is Bayesian rather than maximum likelihood in character. That is, we estimate the parameters in a finite mixture model

p(T1, T2) = sum from i=1 to 6 of pi_i p_i(T1, T2; phi_i)

and then use the posterior probabilities for a soft classification. This is in contrast to, e.g., Banfield & Raftery (1993), who simultaneously maximize over the parameters and the classification of the observations. Fitting the parameters (pi_i) and (phi_i) is currently by maximum likelihood, although we will investigate using Markov chain Monte Carlo methods for a full Bayesian scheme with linked hyperpriors.
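The soft-classification fit can be sketched with a small EM implementation; for brevity this assumes two bivariate normal components and synthetic data, rather than the six classes (and bivariate t options) used above.

```python
import numpy as np

def em_mixture(X, n_iter=50):
    """EM for a 2-component bivariate normal mixture, soft classification."""
    n, d = X.shape
    k = 2
    pi = np.full(k, 1.0 / k)
    # deterministic initialisation: the two extreme points along the first axis
    mu = X[[np.argmin(X[:, 0]), np.argmax(X[:, 0])]].copy()
    cov = np.array([np.cov(X.T)] * k)
    for _ in range(n_iter):
        # E-step: responsibilities, i.e. posterior probabilities p(c | Y)
        resp = np.empty((n, k))
        for c in range(k):
            diff = X - mu[c]
            quad = np.einsum('ni,ij,nj->n', diff, np.linalg.inv(cov[c]), diff)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[c]))
            resp[:, c] = pi[c] * np.exp(-0.5 * quad) / norm
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi_c, mu_c and Sigma_c from the responsibilities
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for c in range(k):
            diff = X - mu[c]
            cov[c] = (resp[:, c, None] * diff).T @ diff / nk[c]
    return pi, mu, resp

# two well-separated synthetic "tissue" clusters in (T1, T2) space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, (200, 2)),
               rng.normal([3.0, 3.0], 0.3, (200, 2))])
pi, mu, resp = em_mixture(X)
```

The rows of `resp` are the posterior probabilities used for the soft classification; maximizing over them instead, as in Banfield & Raftery, gives a hard classification at each step.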

Unsupervised Approach Banfield & Raftery (1993, Biometrics): T1, T2 and PD images of an MRI brain slice with 26,100 voxels.

They claim to find seven clusters:

Spatial locations of the 7 clusters, labelled in retrospect as A: Bone, B: Air, C: White matter, D: Fluids, E: Muscle, F: Fat and G: Grey matter.

Our Results Classified image and T1-T2 plot showing the classification with normally-distributed clusters.

Classification before and after removing the bias field.

A Second Dataset Raw T1 and T2 images.

Second data set: before bias-field correction.

Second data set: after bias-field correction.

Outliers and anomalies We have found our scheme to be quite robust to variation in imaging conditions and to different normal subjects. The background class helps greatly in achieving this robustness as it mops up the observations which do not agree with the model. However, outliers can be more extreme:

T1-T2 plot of a slice of a brain with a pathology.

This illustrates the dangers of classifying all the points. This is a particularly common mistake when neural networks are used for classification, and we have seen MRI brain scans classified by neural networks where common sense suggested that an "outlier" report was the appropriate one. The procedure presented here almost entirely ignores the spatial nature of the image. For some purposes this would be a severe criticism, as contextual classification would be appropriate. However, our interest in these images is not in producing a pretty picture but indeed in the anomalies, and for that we prefer to stay close to the raw data. The other interest is in producing summary measures that can be compared across time.

Part 2: Statistics of fmri Data

SPM Statistical Parametric Mapping is a widely used program and methodology of Friston and co-workers, originating with PET. The idea is to map t-statistic images, and to set a threshold for statistical significance. In PET the t-statistic is of a comparison between states over a number of subjects, voxel by voxel: the numerator is an average over subjects of the difference in response in the two states, and the denominator is an estimate of the standard error of the numerator. The details differ widely between studies, in particular whether a pixel-by-pixel or global estimate of variance is used.
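The voxel-by-voxel t statistic can be sketched in a few lines; the subject count, image dimensions and activation amplitude below are illustrative, not from any study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, shape = 12, (8, 8)
# per-subject rest/stimulus difference images (synthetic)
diff = rng.standard_normal((n_subjects, *shape))
diff[:, 4, 4] += 2.0                     # one truly activated voxel

mean = diff.mean(axis=0)                              # numerator: average difference
se = diff.std(axis=0, ddof=1) / np.sqrt(n_subjects)   # voxel-wise standard error
t_map = mean / se                                     # the t-statistic image

print(np.unravel_index(t_map.argmax(), shape))   # (4, 4)
```

A "global" variance variant would replace `se` by a single pooled estimate over all voxels, which is the main axis along which published analyses differ.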

Example PET Statistics Images From Holmes et al (1996). Mean difference image. Voxel-wise variance image.

Voxel-wise t statistic image. Smoothed variance image. Resulting t statistic image.

Multiple comparisons Finding the voxel(s) with the highest SPM values should detect the areas of the brain with most change, but does not say whether they are significant changes. The t distribution might apply at one voxel, but it does not apply to the voxel with the largest response. Conventional multiple comparison methods (e.g. Bonferroni) will greatly over-compensate, as the voxel values are far from independent, so the effective number of observations is far fewer than the number of voxels (which might themselves represent sub-sampling).

Three main approaches:
1. (High) level crossings of Gaussian stochastic processes (Worsley et al).
2. Randomization-based analysis (Holmes et al).
3. Variability within the time series at a voxel.
Clearly components of variance need to be considered carefully (and have not been). Issue: do we want the voxels of highest value, or do we want regions of high value?

Euler Characteristics The Worsley et al approach is based on modelling the SPM image X_ijk as a Gaussian (later relaxed) stochastic process in continuous space with a Gaussian autocorrelation function (possibly geometrically anisotropic). The autocorrelation function must be estimated from the data, but to some considerable extent is imposed by low-pass filtering. For such processes there are results (Hasofer, Adler) on the level sets {x : X(x) > x_0}. These will be made up of components, themselves containing holes. The results are on the expected Euler characteristic (number of components minus number of holes) as a function of x_0; for large x_0 there is a negligible probability of a hole, and the number of components is approximately Poisson distributed. Thus we can choose x_0 such that under the null hypothesis

P(X(x) > x_0 for any x in A) is approximately 5%.

Note that this is based on variability within a single image to address the multiple comparisons point.

Randomization-based Statistics Classical statistical inference for designed experiments is based on the uncertainty introduced by the randomization, and not on any natural variability (an approach associated with Fisher and Yates). A typical PET or fMRI experiment compares two states, say A and B. If there is no difference between the states we can flip the labels within each pair (for each subject in PET, for each repetition within a subject in fMRI). If there are n pairs, there are 2^n possible A-B or B-A labellings. If there is no difference, these all give equally likely values of an observed statistic, so we compare the observed statistic to the permutation distribution. We can choose any statistic that can be computed fairly easily.
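The sign-flipping scheme can be sketched directly; the paired differences below are made up for illustration.

```python
import numpy as np
from itertools import product

def permutation_pvalue(diffs):
    """Two-sided p-value for the mean paired difference under sign flips."""
    diffs = np.asarray(diffs, dtype=float)
    observed = abs(diffs.mean())
    count = 0
    for signs in product([1.0, -1.0], repeat=len(diffs)):  # all 2^n labellings
        if abs((diffs * np.array(signs)).mean()) >= observed:
            count += 1
    return count / 2 ** len(diffs)

# eight made-up paired A-B differences, all positive
pvalue = permutation_pvalue([1.2, 0.8, 1.5, 0.9, 1.1, 0.7, 1.3, 1.0])
print(pvalue)   # 2/256 = 0.0078125
```

Only the all-positive and all-negative labellings match the observed mean in magnitude here, so the exact p-value is 2/2^8; for large n one samples labellings at random instead of enumerating all 2^n.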

The permutation distribution is often remarkably well approximated by a t distribution. Classic example (Box, Hunter & Hunter, 1977). [Figure: empirical and hypothesized t CDFs; the permutation distribution (solid line, the empirical d.f.) closely tracks the t_9 CDF.]

Time-Series-based Statistics The third component of variability is within the time series at each voxel. Suppose there were no difference between A and B. Then we have a stationary autocorrelated time series, and we want to estimate its mean and the standard error of that mean. This is a well-known problem in the output analysis of (discrete-event) simulations. More generally, we want the means of the A and B phases, and there will be a delayed response (approximately known) giving a cross-over effect. Instead, use a matched filter (a sine wave?) to extract the effect, and estimated autocorrelations (as in Hannan estimation) or spectral theory to estimate variability. For a sine wave the theory is particularly easy: the log absolute value of the response has a Gumbel distribution with location depending on the true activation.
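The matched-filter idea can be sketched for a single voxel: correlate the time series with a sine wave at the stimulus on/off frequency (100 scans with a 30s-on/30s-off cycle at 3s intervals gives 5 cycles). The amplitudes and noise level are illustrative, and the Gumbel distribution theory is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans = 100
t = np.arange(n_scans)
stimulus = np.sin(2 * np.pi * 5 * t / n_scans)   # 5 on/off cycles over the run

active = 0.8 * stimulus + rng.standard_normal(n_scans)  # responding voxel
quiet = rng.standard_normal(n_scans)                    # background voxel

def filtered_response(series):
    """Magnitude of the projection onto the matched sine filter."""
    return abs(series @ stimulus) / np.sqrt(stimulus @ stimulus)

# the responding voxel stands out strongly against the background voxel
print(filtered_response(active), filtered_response(quiet))
```

Taking the log absolute value of this projection over all voxels gives the statistic whose null distribution the slides describe.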

fMRI Example Data on a 64 x 64 x 14 grid of voxels. (Illustrations omit the top and bottom slices and the front and back slices, all of which show considerable activity, probably due to registration effects.) A series of 100 images at 3 sec intervals: a visual stimulus was applied after 30 secs for 30 secs, and the A-B pattern repeated 5 times. Conventionally the images are filtered in both space and time, with high-pass filtering to remove trends and low-pass filtering to reduce noise (and to make the Euler characteristic results valid). The resulting t-statistic images are shown on the next slide. These have variances estimated for each voxel based on the time series at that voxel.

[Figure: conventional t-statistic images, slices 1-12.]

Alternative Analyses We also worked with the raw data, and matched a filter to the expected pattern of response (square-wave input, modified by the haemodynamic response). This produced much more extreme deviations from the background variation, and much more compact areas of response. [Figure: histogram of the log absolute filtered response.]

[Figure: log absolute filtered response, with small values coloured as background, slices 1-12.]