Machine Learning for Medical Image Analysis
A. Criminisi & the InnerEye team @ MSRC
Medical image analysis: the goal
Automatic, semantic analysis and quantification of what is observed in medical scans (big data; 2D, 3D, 4D images). Examples: brain MR, spine CT, lungs X-ray.
- Normal or diseased? How big is the diseased area? How fast is it progressing? Is the treatment working?
- Fast data retrieval from databases; retrieving similar cases.
Diagnosing is easy. Measuring is hard!
Medical image analysis: do we need machine learning?
Example: how big is the tumour? We need to segment the tumour automatically, separating the actively proliferating tumour region from the healthy region.
- We need to look beyond single pixels, to context.
- A cascade of manually written if-then statements is too brittle.
- We need to select discriminative features, and their combinations, from training data.
Medical image analysis: what kind of machine learning?
- Efficient classifier training.
- Efficient testing/runtime.
- Interpretability of the classifier; no black boxes. (Why does it work? What are its limitations?)
Medical image analysis: what about the data?
- We need labelled training data (expensive).
- Non-ML algorithms are no different: you still need labelled data for validation.
- Data is being collected and organized at http://research.microsoft.com/medimaging
Machine learning in medical image analysis @ MSRC
The InnerEye project
- Measuring brain tumours
- Localizing and identifying vertebrae
- Kinect for surgery
Decision forests
Decision trees
A decision tree asks a sequence of simple questions, e.g.: Is the top part blue? Is the bottom part green? Is the bottom part blue?
A. Criminisi, J. Shotton. Springer 2013
Decision trees
A decision tree is nothing but a typical if-then algorithm; however, it is learned automatically from data. This is the source of its interpretability.
A. Criminisi, J. Shotton. Springer 2013
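As a toy illustration of this idea (not from the original slides), here is a minimal sketch of a learned tree being applied as nested if-then tests; the feature indices, thresholds, and class labels are hypothetical stand-ins for what training would produce automatically.

```python
# Minimal sketch of a decision tree as a learned if-then cascade.
# The structure (feature indices, thresholds, leaf labels) is hypothetical;
# in practice it is chosen automatically during training.

class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, label=None):
        self.feature = feature      # which feature this node tests
        self.threshold = threshold  # split threshold learned from data
        self.left = left            # subtree if the test fails
        self.right = right          # subtree if the test passes
        self.label = label          # class label (leaf nodes only)

def predict(node, x):
    """Route a sample x down the tree: each internal node is one if-then test."""
    if node.label is not None:           # reached a leaf
        return node.label
    if x[node.feature] > node.threshold:
        return predict(node.right, x)
    return predict(node.left, x)

# Example tree: "is the top part blue?" then "is the bottom part green?" etc.,
# encoded here as thresholds on colour-channel features.
tree = Node(feature=0, threshold=0.5,
            left=Node(label="class A"),
            right=Node(feature=1, threshold=0.5,
                       left=Node(label="class B"),
                       right=Node(label="class C")))

print(predict(tree, [0.8, 0.2]))  # -> "class B"
```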
Decision forests
Decision forests can be used for: classification, regression, density estimation, manifold learning, and semi-supervised classification.
The Sherwood forest software library is also available for download.
Regression forests
- Training data: input data points whose outputs/labels are continuous.
- Node weak learner: the split function applied at each internal node; randomness provides regularization (a regularized training loss).
- Objective function for node j (training node j): an information gain defined on continuous variables.
- Leaf model: the prediction model stored at each leaf.
A. Criminisi, J. Shotton. Springer 2013
Regression forests: training objective
Computing the regression information gain at node j, via the differential entropy of a Gaussian, yields our regression information gain.
A. Criminisi, J. Shotton. Springer 2013
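The equations themselves were figures in the original slide; the following LaTeX sketch reconstructs the standard definitions from the cited book (Criminisi & Shotton, Springer 2013) under its Gaussian leaf model, so the exact notation here is an assumption.

```latex
% Information gain at node j, with S_j the training set reaching the node
% and S_j^L, S_j^R the subsets sent to its left/right children:
I_j = H(S_j) - \sum_{i \in \{L,R\}} \frac{|S_j^i|}{|S_j|} \, H(S_j^i)

% Differential entropy of a d-variate Gaussian over the continuous output y,
% with covariance \Lambda_y(S) estimated from the set S:
H(S) = \frac{1}{2} \log\!\left( (2\pi e)^d \, \big|\Lambda_y(S)\big| \right)

% Substituting and dropping constants gives the regression information gain:
I_j = \log \big|\Lambda_y(S_j)\big|
      - \sum_{i \in \{L,R\}} \frac{|S_j^i|}{|S_j|} \log \big|\Lambda_y(S_j^i)\big|
```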
Regression forests: the ensemble model
The forest output probability is obtained by combining the posteriors of the individual trees t = 1, 2, 3, ...
A. Criminisi, J. Shotton. Springer 2013
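The ensemble formula was also a figure; in the cited book the forest posterior is the average of the tree posteriors, which can be written as follows (notation assumed):

```latex
% Forest posterior: average of the T tree posteriors p_t(y | x)
p(y \,|\, \mathbf{x}) = \frac{1}{T} \sum_{t=1}^{T} p_t(y \,|\, \mathbf{x})
```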
Regression forests: probabilistic, non-linear regression
Figure: training points; training the different trees in the forest (max tree depth D = 2); testing the different trees; tree posteriors; forest posterior.
Generalization properties:
- Smooth interpolating behaviour in the gaps between training data.
- Uncertainty increases with distance from the training data.
A. Criminisi, J. Shotton. Springer 2013
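A minimal sketch of this behaviour, assuming scikit-learn is available (this is not the book's implementation, which uses Gaussian leaf models): each tree is queried separately, and the spread of the per-tree predictions serves as a simple proxy for the forest's uncertainty.

```python
# Sketch: per-tree predictions from a random forest give a simple
# uncertainty estimate that tends to grow in regions far from the data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# 1D training data with a gap in the middle of the input range.
x_train = np.concatenate([rng.uniform(0, 3, 40), rng.uniform(7, 10, 40)])
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.size)

forest = RandomForestRegressor(n_estimators=50, max_depth=2, random_state=0)
forest.fit(x_train.reshape(-1, 1), y_train)

# Query each tree individually and aggregate.
x_test = np.linspace(0, 10, 200).reshape(-1, 1)
per_tree = np.stack([t.predict(x_test) for t in forest.estimators_])
mean = per_tree.mean(axis=0)    # forest prediction
spread = per_tree.std(axis=0)   # disagreement between trees

# Tree disagreement tends to be larger in the gap (x in [3, 7])
# than where the training data is dense.
print("spread near data:", spread[x_test.ravel() < 3].mean())
print("spread in gap:   ", spread[(x_test.ravel() > 4) & (x_test.ravel() < 6)].mean())
```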
Regression forests: multi-modal posteriors
Figure: forest posteriors (mean and mode) for six different experiments (experiments 5 to 10) and varying tree depth D.
A. Criminisi, J. Shotton. Springer 2013
Regression forests: comparison with Gaussian processes
In regression forests the posterior can be multimodal; in Gaussian processes (GPs) the posterior is Gaussian (unimodal).
Observations:
- Similar behaviour of the mean, but forests can capture multimodality.
- Forests are a parallel algorithm that does not need large matrix inversions.
- In forests, the quality and shape of the uncertainty depend on the leaf model.
A. Criminisi, J. Shotton. Springer 2013
Anatomy localization in CT scans
Anatomy localization
- Direct mapping of voxels to organ bounding boxes.
- No search, no sliding window.
- No atlas registration.
Key idea: each voxel votes (probabilistically) for the position of each organ's bounding box in the input CT scan, producing the output anatomy localization.
A. Criminisi, et al. Medical Image Analysis 2013
Anatomy localization: the labelled database
Different image croppings, noise levels, contrast/no-contrast, resolutions, scanners, body shapes/sizes, and patient positions.
Anatomy localization via regression forests
Each voxel in the volume votes for the positions of the six bounding-box sides. We wish to learn a set of discriminative points (landmarks, clusters) which can predict, e.g., the kidney position with high confidence.
- Input data point: a voxel position in the volume.
- Output: continuous bounding-box positions for multiple organs (relative displacements).
- Node split function: a feature response computed as a mean over displaced 3D boxes.
- Node optimization/training: error in model fit, measured as the weighted uncertainty over all organs.
- Leaf model: Gaussian representations of the output distributions.
In effect, the forest regresses an n-dimensional piece-wise constant model.
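The formulas on this slide were figures; the following is a hedged LaTeX sketch of the setup as described above, with notation assumed: each voxel casts a Gaussian vote for the relative displacement to the six faces of each organ's bounding box.

```latex
% For organ class c, the regression target of voxel v is its relative
% displacement to the 6 axis-aligned faces of the organ's bounding box:
\mathbf{d}_c(v) \in \mathbb{R}^6

% Each leaf l stores a Gaussian model of the displacements of the
% training voxels that reached it:
p(\mathbf{d}_c \,|\, l) = \mathcal{N}\!\big(\mathbf{d}_c;\;
    \bar{\mathbf{d}}_c(l),\; \Lambda_c(l)\big)

% At test time every voxel votes through its leaf's Gaussian, and the votes
% are aggregated over all voxels and all trees to localize each bounding box.
```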
Anatomy localization: context-rich features
Figure: possible visual features; computing the feature response; capturing spatial context.
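As a hedged sketch of this kind of context-rich feature (the exact definition in the paper may differ): the response for a voxel is the mean intensity over a 3D box displaced from that voxel, optionally minus the mean over a second displaced box. All names and parameters below are illustrative.

```python
# Sketch of a context-rich feature: the response at a reference voxel is
# the mean intensity inside a 3D box displaced from that voxel. Comparing
# distant regions lets a single feature capture long-range spatial context.
# Box offsets/sizes here are illustrative; during training they are sampled
# randomly and the most discriminative ones are kept.
import numpy as np

def box_mean(volume, voxel, offset, size):
    """Mean intensity of a 3D box of the given size, displaced by
    `offset` from `voxel`. Coordinates are clipped to the volume."""
    lo = np.clip(np.asarray(voxel) + offset, 0, np.asarray(volume.shape) - 1)
    hi = np.clip(lo + size, 1, volume.shape)
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]].mean()

def feature_response(volume, voxel, offset1, size1, offset2=None, size2=None):
    """One-box or two-box (difference) feature response."""
    r = box_mean(volume, voxel, offset1, size1)
    if offset2 is not None:
        r -= box_mean(volume, voxel, offset2, size2)
    return r

# Toy usage on a random volume.
vol = np.random.default_rng(0).random((64, 64, 64))
print(feature_response(vol, voxel=(32, 32, 32),
                       offset1=np.array([10, 0, 0]), size1=np.array([5, 5, 5]),
                       offset2=np.array([-10, 0, 0]), size2=np.array([5, 5, 5])))
```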
Anatomy localization: quantitative results
Experimental setup:
- Forest trained on 55 volumes and tested on the remaining 45.
- MedINRIA used for atlas-based registration; the best atlas was selected to run optimally on the testing set!
- Errors are defined as the differences between the detected box sides and the ground truth.
Computational efficiency:
- About 1-2 seconds per tree (unoptimized C++ code).
- All trees can be trained/tested in parallel, independently from one another.
Anatomy localization: landmark discovery
Here the system is trained to detect the left and right kidneys. It learns to use the bottom of the lungs and the top of the pelvis to localize the kidneys with the highest confidence (interpretability).
Figure: input CT scan and the detected landmark regions.
A. Criminisi, et al. Medical Image Analysis 2013
Segmentation of Multiple Sclerosis Lesions
Segmentation of multiple sclerosis lesions
E. Geremia, et al. NeuroImage 2011
Segmentation of multiple sclerosis lesions: interpretability
E. Geremia, et al. NeuroImage 2011
Semantic image registration
Further applications: semantic image registration
Running anatomy localization on two scans of the same patient: Img. 1 (Apr 26th) and Img. 2 (Apr 21st).
A. Criminisi, et al. Medical Image Analysis 2013
Further applications: semantic image registration
Registration of longitudinal (same-patient) CT images. The energy function combines mutual information with the KL divergence of the organ location probabilities:

$$\hat{T} = \arg\min_{T} E(T), \qquad E(T) = \mathrm{MI}\big(I(\mathbf{x}),\, J(T(\mathbf{x}))\big) + \mathrm{KL}\big(I(\mathbf{x}),\, J(T(\mathbf{x}))\big)$$

A 1D illustration of the energy: shifting one image with respect to the other in the X direction and computing the energy. Running semantic registration warps img. 1 into img. 2.
E. Konukoglu et al. 2011
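To make the 1D illustration concrete, here is a minimal sketch (assuming numpy; not the authors' code) that estimates a histogram-based mutual information between a fixed image and shifted versions of a moving image; this corresponds only to the MI term of an energy like the one above, with the peak indicating alignment. The KL term over organ location probabilities would be added analogously.

```python
# Sketch: mutual information between a fixed image and an x-shifted moving
# image, estimated from their joint intensity histogram. Evaluating it over
# a range of shifts traces a 1D profile like the slide's illustration.
import numpy as np

def mutual_information(a, b, bins=32):
    """Histogram-based MI between two equally-shaped images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                      # avoid log(0)
    return (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()

rng = np.random.default_rng(0)
fixed = rng.random((128, 128))
moving = np.roll(fixed, 5, axis=1)    # moving image = fixed, shifted by 5

# Evaluate MI for a range of trial x-shifts; the peak sits at the true shift.
shifts = range(-10, 11)
mi = [mutual_information(fixed, np.roll(moving, s, axis=1)) for s in shifts]
print("best shift:", list(shifts)[int(np.argmax(mi))])   # -> -5
```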
Further applications: semantic image registration
Registration of different patients' CT images.
- Elastix (no semantics): the moving image remains stuck in an incorrect local minimum (target image vs. moving image after convergence). [S. Klein, M. Staring, K. Murphy, M.A. Viergever, J.P.W. Pluim, "elastix: a toolbox for intensity-based medical image registration," IEEE Transactions on Medical Imaging, vol. 29, no. 1, pp. 196-205, January 2010]
- Our semantic registration: better alignment of the organs thanks to the anatomy-localization probabilities; the optimization stays in the correct basin of convergence. [E. Konukoglu, A. Criminisi, S. Pathak, D. Robertson, S. White, D. Haynor, and K. Siddiqui, "Robust Linear Registration of CT Images using Random Regression Forests," in SPIE Medical Imaging, February 2011]
Further applications: semantic image registration
Registration of different patients' CT images. Our proposed energy (top curve, solid) has fewer local minima than the conventional mutual-information-based one (bottom curve, dotted):

$$\hat{T} = \arg\min_{T} E(T), \qquad E(T) = \mathrm{MI}\big(I(\mathbf{x}),\, J(T(\mathbf{x}))\big) + \mathrm{KL}\big(I(\mathbf{x}),\, J(T(\mathbf{x}))\big)$$

E. Konukoglu et al. 2011
Further applications: semantic image registration
Quantitative comparison on hundreds of image-pair registrations: percentage of failed pair-wise registrations at different error thresholds.
E. Konukoglu et al. 2011
© 2013 Microsoft Corporation. All rights reserved.