Machine learning for Neuroimaging: an introduction

Similar documents
Neuroimaging module I: Modern neuroimaging methods of investigation of the human brain in health and disease

runl I IUI%I/\L Magnetic Resonance Imaging

Obtaining Knowledge. Lecture 7 Methods of Scientific Observation and Analysis in Behavioral Psychology and Neuropsychology.

COST AID ASL post- processing Workshop

Cognitive Neuroscience. Questions. Multiple Methods. Electrophysiology. Multiple Methods. Approaches to Thinking about the Mind

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Data, Measurements, Features

Morphological analysis on structural MRI for the early diagnosis of neurodegenerative diseases. Marco Aiello On behalf of MAGIC-5 collaboration

Functional neuroimaging. Imaging brain function in real time (not just the structure of the brain).

Introduction to Pattern Recognition

Software Packages The following data analysis software packages will be showcased:

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

MEDIMAGE A Multimedia Database Management System for Alzheimer s Disease Patients

2 Neurons. 4 The Brain: Cortex

Diffusione e perfusione in risonanza magnetica. E. Pagani, M. Filippi

Processing Strategies for Real-Time Neurofeedback Using fmri

Unsupervised and supervised dimension reduction: Algorithms and connections

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

life science data mining

Azure Machine Learning, SQL Data Mining and R

Maschinelles Lernen mit MATLAB

Network Analysis I & II

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

The Wondrous World of fmri statistics

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Subjects: Fourteen Princeton undergraduate and graduate students were recruited to

Statistical Considerations in Magnetic Resonance Imaging of Brain Function

advancing magnetic resonance imaging and medical data analysis

The Scientific Data Mining Process

Diffusion MRI imaging for brain connectivity analysis

Identifying Group-wise Consistent White Matter Landmarks via Novel Fiber Shape Descriptor

Free software solutions for MEG/EEG source imaging

fmri 實 驗 設 計 與 統 計 分 析 簡 介 Introduction to fmri Experiment Design & Statistical Analysis

Machine Learning for Medical Image Analysis. A. Criminisi & the InnerEye MSRC

Update: MRI in Multiple sclerosis

Multivariate analyses & decoding

Contributions to high dimensional statistical learning

OUTLIER ANALYSIS. Data Mining 1

MHI3000 Big Data Analytics for Health Care Final Project Report

Granger Causality in fmri connectivity analysis

Data Mining - Evaluation of Classifiers

Medical Image Processing on the GPU. Past, Present and Future. Anders Eklund, PhD Virginia Tech Carilion Research Institute

Alzheimer s Disease Image Segmentation with SelfOrganizing Map Network

University of California San Francisco, CA, USA 4 Department of Veterans Affairs Medical Center, San Francisco, CA, USA

Advanced MRI methods in diagnostics of spinal cord pathology

Machine Learning Algorithms for GeoSpatial Data. Applications and Software Tools

Principles of Data Mining by Hand&Mannila&Smyth

Model-free Functional Data Analysis MELODIC Multivariate Exploratory Linear Optimised Decomposition into Independent Components

GE Medical Systems Training in Partnership. Module 8: IQ: Acquisition Time

Bayesian Penalized Methods for High Dimensional Data

The Data Mining Process

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Proceedings of the Symposium on Big Data Initiatives for Connectomic Research 2015 International conference on Brain Informatics and Health

ADVANCED MACHINE LEARNING. Introduction

Functional magnetic resonance imaging as a tool to study the brain organization supporting hearing and communication.

TINA Demo 003: The Medical Vision Tool

Introduction to Data Mining

Machine Learning Introduction

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

A Survey on Pre-processing and Post-processing Techniques in Data Mining

Integrated Data Mining Strategy for Effective Metabolomic Data Analysis

Short Courses in Imaging Sciences

Data Mining and Machine Learning in Bioinformatics

Multisensor Data Fusion and Applications

Review of Biomedical Image Processing

Machine Learning with Brain Graphs. functional imaging in systems neuroscience]

NeuroVault and the vision for data sharing in neuroimaging. Chris Gorgolewski Max Planck Research Group: Neuroanatomy & Connectivity Leipzig, Germany

Neuroimaging Big Data Challenges and Computational Workflow Solutions

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Statistical Models in Data Mining

Cross-Validation. Synonyms Rotation estimation

Principles of functional Magnetic Resonance Imaging

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

Medical Imaging, Acquisition and Processing

Supervised and unsupervised learning - 1

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

Monotonicity Hints. Abstract

Predict the Popularity of YouTube Videos Using Early View Data

QAV-PET: A Free Software for Quantitative Analysis and Visualization of PET Images

A VISUALIZATION TOOL FOR FMRI DATA MINING NICU DANIEL CORNEA. A thesis submitted to the. Graduate School - New Brunswick

Classic EEG (ERPs)/ Advanced EEG. Quentin Noirhomme

Data Mining Part 5. Prediction

Final Project Report

Music Mood Classification

Single trial analysis for linking electrophysiology and hemodynamic response. Christian-G. Bénar INSERM U751, Marseille

Data Mining: An Overview. David Madigan

Forecasting Trade Direction and Size of Future Contracts Using Deep Belief Network

A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML

Graph Analysis of fmri data

Automatic Morphological Analysis of the Medial Temporal Lobe

2. MATERIALS AND METHODS

How are Parts of the Brain Related to Brain Function?

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

ScienceDirect. Brain Image Classification using Learning Machine Approach and Brain Structure Analysis

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

Effects of Achievement Goals on Challenge Seeking and Feedback Processing: Behavioral and fmri Evidence

How To Cluster

Neural Network Add-in

Transcription:

Machine learning for Neuroimaging: an introduction Bertrand Thirion, INRIA Saclay Île de France, Parietal team http://parietal.saclay.inria.fr bertrand.thirion@inria.fr 1

Outline Neuroimaging in 5' Some learning problems in neuroimaging: Medical diagnosis & evaluation of risk factors Study of between subject-variability Brain reading Brain connectivity mapping Common technical challenges Handbook of Functional MRI Data Analysis Russell A. Poldrack,Jeanette Mumford,Thomas Nichols 2

NeuroImaging: modalities and aims 'Functional' (time resolved) modalities: fmri, EEG, MEG vs 'anatomical' (spatially resolved) modalities: T1MRI, DW-MRI 3

Neuroimaging modalities: T1 MRI T1 MRI yields Iconic (voxel-based) statistics Skull CSF gyrus GM sulcus density of grey matter (voxel-based morphometry) Cortical thickness Gyrification ratio WM Landmarks-based statistics Sulcus shape/orientation 102 to 106 variables (1mm)3 4

Neuroimaging modalities: DW-MRI Diffusion MRI: measurement of water diffusion in all directions in the white matter Resolution: (2mm)3, 30-60 directions Yields the local direction of fiber bundles that connect brain regions fibers/bundles can be reconstructed through tractography algorithms Statistical measurement on bundles (counting, fractional anisotropy, direction) 5

NeuroImaging modalities: fmri BOLD signal: measures blood oxygenation in regions where synaptic activity occurs Used to detect functionally specialized regions But indirect measurement Not a true quantitative measurement Can also be used to characterize network structure from brain signals 102 to 106 observations Resolution (2-3mm)3, TR = 2-3s 6

Neuroimaging data pre-processing Data depend on various acquisition parameters (TR, TE, resolution, FOV...) But also on multiple preprocessing steps, which are standardized, but there is room for optimization Experimental paradigm and contrasts Raw fmri data Space-time relaignment Spatial normalization GLM analysis Hemodynamic reponse function Activation parameters Statistical Parametric Maps Noise model 7

NeuroImaging: modalities and aims Provide some biomarkers for diagnostic/prognostic, study of risk factors for various brain diseases Psychiatric diseases Neuro-degenerative diseases, Brain lesions (strokes...) Understand brain organization and related factors: brain mapping, connectivity, architecture, development, aging, relation to behavior, relation to genetics Study chronometry of brain processes (MEG) Build brain computer interfaces (EEG) 8

Outline Neuroimaging in 5' Some learning problems in neuroimaging: Medical diagnosis & evaluation of risk factors Study of between subject-variability Brain reading Brain connectivity mapping Common technical challenges 9

learning in Neuroimaging: Medical diagnosis and evaluation of risk factors Rationale: brain images provide quantitative measurements of brain organization that reflect brain disease, abnormalities etc. Cortical thickness (T1-MRI) Brain shape/folding (T1-MRI) Brain anatomical connections (DW-MRI) Neural activation (BOLD) Vascular structure/density (BOLD, ASL) Alzheimer in VBM, [Klöppel et al., Brain 2008] Different approaches for population comparison: classical statistics, population discrimination, outlier detection 10

Diagnosis based on medical images Neuroimaging data X-MRI, PET,... Brain descriptors - local (gyrification ratio on anatomical image) - global (functional integration of brain systems) - more meaningful than raw data + denoising Classification/ Regression / outlier detection Accuracy of the prediction? Discriminating information? Fundamental difficulty: Necessity to coregister brain anatomically but risk of masking brain shape differences 11

Study of between-subject variability Brain diseases are extreme case of normal variability Between-subject variability is a prominent effect in neuroimaging: hard to characterize as such how much of it can be explained using other data? Data easier to acquire on normal populations Confrontation to behavioral data Confrontation to genetic data Perspective of individualized treatments 12

Study of between-subject variability The major challenge here is to discover statistical associations between complex, high-dimensional variables (regression) Frequently handled as unsupervised problems: describe the density of the data based on observations (manifold learning, mixture modeling) image phenotype p( Image Phenotype image ) p( genetic ) Gene Image Imaging as an intermediate (endo)phenotype 13

Brain reading Definition: Use of functional neuroimaging data to infer the subject's behaviour typically the brain response related to a certain stimulus Similar to BCI -to some extent without time constraints More emphasis on model correctness Popular due to its sensitivity to detect smallamplitude but distributed brain responses Rationale: population coding 14

Brain reading: population coding Different spatial models of the functional organization of neural networks Not a unique kind of pattern for the spatial organization of the neural code. This is further confounded by between-subject variability 15

Brain reading / Reverse inference Aims at predicting a cognitive variable decoding brain activity [Dehaene et al. 1998, Cox et al. 2003] 16

Brain reading / open issues Do we want this... ### Compute the prediction accuracy for the different folds (i.e. session) cv_scores = cross_val_score(anova_svc, X, y, cv=cv, n_jobs=-1, verbose=1, iid=true) ### Return the corresponding mean prediction accuracy classification_accuracy = np.sum(cv_scores) / float(n_samples) print "Classification accuracy: %f" % classification_accuracy, \ >>> print "Classification accuracy: %f" % classification_accuracy, \ " / Chance level: %f" % (1. / n_conditions) Classification accuracy: 0.744213 / Chance level: 0.125000 or that? 17

Brain reading/open issues a classifier trained to discriminate left versus right saccades can also decode mental arithmetics: left saccade subtraction right saccade addition This generalization occurs only when based on two regions of the parietal cortex This shows that the same neural populations are involved in ocular saccades and arithmetics [Knops et al., science 2009] 18

Functional connectivity mapping Definition: consists in deriving a quantitative measure of brain networks integration based on functional neuroimaging observation Rationale Popularity of resting-state fmri. Model-driven approach (SEM, DCM): restrictive hypotheses Learning problems Segment regions based on connectivity information (common to many neuroimaging problems) Inference of connectivity models 19

Learning in FCM (1) [Varoquaux et al. IPMI 2011] Learn a spatial model (atlas) from the resting state data ICA, clustering provide little guarantees on the result Dictionary learning can be used instead The population-level model adapts to individual configurations Note: Atlas learning is not tied to Functional Connectivity Mapping, but is important in different contexts (parcellations) 20

Learning in FCM (2) Next: Given a set of regions, quantify properly their interactions/integration of the underlying networks Threshold correlation graph graph statistics, graph embedding Learn covariance model between the set of regions (partial correlations) Do statistical inference on these objects 21

Outline Neuroimaging in 5' Some learning problems in neuroimaging: Medical diagnosis & evaluation of risk factors Study of between subject-variability Brain reading Brain connectivity mapping Common technical challenges 22

Technical challenges in MLNI Low SNR in the data Only a fraction of the data is modeled (BOLD) Presence of structured noise (noise is not i.i.d. Gaussian!) + non-stationarity in time and space Few salient structures (resting-state fmri...) Size of the data 104 to 106 voxels in most settings Compared to 10 to 102 samples available Related to the particular learning problems 23

Technical challenges in MLNI Diagnostic/classification problems Needs accuracy mostly (+ robustness) Suffers from curse of dimensionality, but this is well addressed in the literature [generic approaches perform well] But: not the main aim of most neuroimaging studies Need a large set of tools to be compared against each other - Need to take into account some priors on the data/true model (smoothness, sparsity) 24

Technical challenges in MLNI Recovery: retrieve the true model that accounts for the data This is the main topic of all neuroimaging / brain mapping / decoding literature. Suffers much more from feature dimensionality and correlation Virtually in-addressed/unseen so far I. Rish, HBM 2011 1. learn EN model for pain perception rating using first 120 TRs for training and next 120 TRs for testing. 2. Find best-predicting 1000 voxels using EN, delete them, find next 1000 best-predicting, etc. Does the predictive accuracy degrade sharply? Surprisingly, the answer is NO 25

Technical challenges in MLNI Unsupervised problems: find the right model that accounts for the data, without making in prediction (number PCA components, clusters etc.) Suffers from data correlation, lack of salient structure, Very difficult (no happy curve around) Likelihood of left out data in a 3-fold stratified cross-validation on restingstate fmri data 26

This is not a conclusion To me Neuroimaging methodologists need Implementations of various ML tools Only the easily available tools are used; this is part of the success of SVM Open-source, tested, documented ;-) Guidelines on cross-validation (people tend to use leave-oneout without further thinking) Steady warning against the danger of overfitting (the more complex the method, the most likely overfit occurs) Working on only one dataset is one type of over-fitting Understand the limitations of current methods regarding recovery and unsupervised problems. 27