A Feature-based Approach to Big Data Medical Image Analysis
Matthew Toews, Christian Wachinger, Raul San Jose Estepar, William Wells III
École de Technologie Supérieure, Montreal, Canada
BWH, Harvard Medical School
CSAIL, Massachusetts Institute of Technology
http://www.matthewtoews.com
July 3, 2015
Context
Big data: massive digital memories, rapid data transmission; large-scale data mining, novel discoveries, ...
Big medical image data sets: e.g. 10K subjects, 20K lung CTs, 3.8 TB; per-subject labels, disease stage
Can we leverage this data? Computer-assisted diagnosis, image biomarker discovery
Challenge
Efficient image-to-image correspondence
E.g. N = 20K lung CT volumes: pairwise correspondence is O(N^2), which is intractable
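To make the quadratic cost concrete, the number of distinct image pairs for N = 20,000 volumes is:

```latex
\binom{N}{2} = \frac{N(N-1)}{2} = \frac{20000 \times 19999}{2} = 199{,}990{,}000 \approx 2 \times 10^{8}
```

At an illustrative rate of one pairwise registration per second (an assumed figure, not a measurement), this amounts to roughly 2 × 10^8 seconds, or over six years of computation.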
Most Relevant Prior Work
Nearest-neighbor classification (Cover & Hart 1967): as N → ∞, the error is upper bounded by twice the optimal Bayes error, a motivation for big data
Scale-invariant feature transform, SIFT (Lowe 2004): identify and match distinctive keypoints in images
Efficient NN correspondence via random KD-trees: O(N log N)
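The two ideas on this slide, the Cover & Hart 1-NN rule and a tree index that avoids comparing a query against every stored point, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slide cites random KD-trees (several randomized trees searched approximately), whereas this sketch swaps in a single exact KD-tree for clarity. The names `build_kdtree`, `nearest` and `nn_classify` are hypothetical. Exact KD-tree search is fast in low dimensions but degrades toward brute force for high-dimensional descriptors such as 64-element SIFT, which is one reason randomized approximate trees are preferred in practice.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KDNode:
    """One stored point, its label, and the axis it splits on."""

    def __init__(self, point: Point, label: int, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]):
        self.point = point
        self.label = label
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(items: List[Tuple[Point, int]], depth: int = 0) -> Optional[KDNode]:
    """Recursively split at the median along an axis that cycles with depth."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return KDNode(point, label, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def nearest(node: Optional[KDNode], query: Point,
            best: Optional[Tuple[float, KDNode]] = None) -> Optional[Tuple[float, KDNode]]:
    """Exact 1-NN search that prunes subtrees which cannot hold a closer point."""
    if node is None:
        return best
    d = math.dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the current best.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def nn_classify(tree: KDNode, query: Point) -> int:
    """1-NN rule: return the label of the nearest stored point."""
    result = nearest(tree, query)
    assert result is not None
    return result[1].label


if __name__ == "__main__":
    train = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((5.0, 5.0), 1), ((6.0, 5.0), 1)]
    tree = build_kdtree(train)
    print(nn_classify(tree, (5.2, 4.9)))  # → 1
```

Sorting at every level keeps the sketch short at the cost of an O(N log² N) build; a production index would use linear-time median selection or a library implementation.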
3D SIFT Features
[Figure: Lung CT volume, σ]
Geometry: location, scale, orientation
Appearance descriptor: gradient orientation histogram, 64 elements, rank-ordering
Efficient and Robust Model-to-Image Alignment using 3D Scale-Invariant Features. M. Toews, W.M. Wells III, MedIA 2013
SIFT-Rank: Ordinal Description for Invariant Feature Correspondence. M. Toews, W.M. Wells III, CVPR 2009
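The rank-ordering step of the descriptor can be sketched as follows, assuming (as in SIFT-Rank) that each histogram element is replaced by its rank within the descriptor and that rank vectors are then compared with a Euclidean-type distance. `sift_rank` and `squared_distance` are hypothetical names, ties are broken by element index (an assumption), and extracting the 64-element gradient histogram from a CT volume is not shown. The point being illustrated is invariance: any monotonically increasing change to the histogram values produces the same rank vector.

```python
from typing import List, Sequence


def sift_rank(descriptor: Sequence[float]) -> List[int]:
    """Replace each element by its rank (0 = smallest) within the descriptor.

    Ties are broken by element index, a simplifying assumption.
    """
    order = sorted(range(len(descriptor)), key=lambda k: (descriptor[k], k))
    ranks = [0] * len(descriptor)
    for rank, k in enumerate(order):
        ranks[k] = rank
    return ranks


def squared_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Squared Euclidean distance between two rank vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    d = [0.1, 0.5, 0.3, 0.9]
    print(sift_rank(d))  # → [0, 2, 1, 3]
    # Squaring non-negative values is monotonic, so the ranks do not change.
    print(squared_distance(sift_rank(d), sift_rank([x * x for x in d])))  # → 0
```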
3D SIFT Features: Applications
Classifying Alzheimer's disease, discovering image biomarkers
Modeling infant brain development
Aligning images: robust, multi-modal, group-wise
Segmenting organs in full-body CT
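Analysis: Kernel Density Estimation
Estimate the maximum a posteriori (MAP) subject label C given the feature descriptor set $F = \{f_i\}$:
$p(C \mid F) \propto p(C) \prod_i p(f_i \mid C)$
Kernel density estimate of $p(f_i \mid C)$ from the K nearest neighbors $\mathrm{KNN}_i$ of $f_i$, where $C_j$ is the label of $f_j$, N is the total number of features and $N_C$ the number of features with label C:
$p(f_i \mid C) \propto 1 + \frac{N}{N_C} \sum_{j:\, f_j \in \mathrm{KNN}_i,\, C_j = C} \exp\left(-\frac{\|f_i - f_j\|^2}{\alpha_i^2}\right)$
Adaptive kernel bandwidth, the distance to the nearest neighbor: $\alpha_i = \min_j \|f_i - f_j\|$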
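A minimal sketch of the per-feature likelihood, assuming the form p(f_i | C) ∝ 1 + (N / N_C) Σ exp(−‖f_i − f_j‖² / α_i²) over neighbors f_j in KNN_i with label C, and α_i equal to the distance to the nearest neighbor. The function name `kde_likelihoods`, its input format (precomputed neighbor distances and labels), and the small `eps` guard against a zero bandwidth for exact duplicates are hypothetical choices for illustration.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def kde_likelihoods(neighbors: Sequence[Tuple[float, Hashable]],
                    class_counts: Dict[Hashable, int],
                    eps: float = 1e-12) -> Dict[Hashable, float]:
    """Unnormalized p(f_i | C) for every label C, from the KNN set of one feature f_i.

    neighbors:    (distance ||f_i - f_j||, label C_j) for each f_j in KNN_i.
    class_counts: N_C, the number of stored features carrying each label C.
    """
    n_total = sum(class_counts.values())
    # Adaptive bandwidth: distance to the nearest neighbor (eps avoids division by zero).
    alpha = max(min(d for d, _ in neighbors), eps)
    sums = {c: 0.0 for c in class_counts}
    for d, c in neighbors:
        sums[c] += math.exp(-(d * d) / (alpha * alpha))
    # The additive 1 keeps every label's likelihood strictly positive.
    return {c: 1.0 + (n_total / class_counts[c]) * s for c, s in sums.items()}


if __name__ == "__main__":
    lik = kde_likelihoods([(1.0, "a"), (2.0, "b")], {"a": 10, "b": 10})
    print(max(lik, key=lik.get))  # → a
```

Labels with no neighbor in KNN_i receive the floor value 1, so a single unmatched feature cannot drive a label's posterior product to zero.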
Analysis: Kernel Density Estimation
On-the-fly parameter estimation
Lazy learning: easy to incorporate new data
MAP estimation, for each feature $f_i \in F$:
1) Identify the KNN correspondence set
2) Compute $p(f_i \mid C)$ and accumulate the posterior product $p(C \mid F) \propto p(C) \prod_i p(f_i \mid C)$
O(log N) per feature
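The two-step MAP loop above can be sketched end to end as follows. This is an illustration under stated assumptions, not the authors' software: KNN search is brute force here (the O(log N) cost on the slide relies on a KD-tree index instead), the likelihood uses the form 1 + (N / N_C) Σ exp(−‖f_i − f_j‖² / α_i²), the prior p(C) defaults to uniform, and the product is accumulated as a sum of logarithms to avoid numerical underflow. `knn` and `map_label` are hypothetical names, and the usage data are toy 2-D vectors rather than real descriptors.

```python
import math
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def knn(query: Vector, database: Sequence[Tuple[Vector, Hashable]],
        k: int) -> List[Tuple[float, Hashable]]:
    """Brute-force K nearest neighbors; a KD-tree index would replace this step."""
    scored = [(math.dist(query, f), c) for f, c in database]
    scored.sort(key=lambda t: t[0])
    return scored[:k]


def map_label(features: Sequence[Vector],
              database: Sequence[Tuple[Vector, Hashable]],
              k: int = 5,
              prior: Optional[Dict[Hashable, float]] = None,
              eps: float = 1e-12) -> Hashable:
    """Return argmax_C [log p(C) + sum_i log p(f_i | C)] for a bag of features F."""
    counts: Dict[Hashable, int] = {}
    for _, c in database:
        counts[c] = counts.get(c, 0) + 1
    n_total = len(database)
    if prior is None:
        prior = {c: 1.0 / len(counts) for c in counts}  # assumption: uniform prior
    log_post = {c: math.log(prior[c]) for c in counts}
    for f in features:
        neighbors = knn(f, database, k)          # step 1: KNN correspondence set
        alpha = max(neighbors[0][0], eps)        # adaptive bandwidth (nearest distance)
        sums = {c: 0.0 for c in counts}
        for d, c in neighbors:
            sums[c] += math.exp(-(d * d) / (alpha * alpha))
        for c in counts:                         # step 2: posterior product, in log space
            log_post[c] += math.log(1.0 + (n_total / counts[c]) * sums[c])
    return max(log_post, key=log_post.get)


if __name__ == "__main__":
    # Toy database with two hypothetical labels, 0 and 4.
    db = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((0.0, 0.2), 0),
          ((5.0, 5.0), 4), ((5.1, 5.0), 4), ((5.0, 5.2), 4)]
    print(map_label([(0.05, 0.05), (0.1, 0.1)], db, k=3))  # → 0
```

Because nothing is trained, adding a new labeled image only means appending its features to `database`, which is the lazy-learning property noted on the slide.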
COPD
Chronic Obstructive Pulmonary Disease: a major cause of chronic morbidity and mortality
COPDGene data: 21 sites, 10K subjects, 20K images, 95M features
5-category disease stage labels (GOLD score)
Regan, Elizabeth A., et al. "Genetic epidemiology of COPD (COPDGene) study design." COPD: Journal of Chronic Obstructive Pulmonary Disease 7.1 (2011)
COPD Classification
Label $C \in \{0, \dots, 4\}$: GOLD disease stage
Maximum a posteriori estimation: $C^* = \arg\max_C p(C \mid F)$
< 1 second per image
State-of-the-art GOLD prediction accuracy
[Figure: GOLD labels vs. predicted GOLD]
COPD: Distinct Phenotypes
[Figure source: Frank H. Netter, MD and Artist]
COPD: Distinct Phenotypes
Blue bloaters vs. pink puffers
COPD: Phenotype-Informative Features?
Musculoskeletal features
Other Aspects
Same-subject identification: label C = subject ID
Perfect identification across breathing states
65 highly similar images identified
20 known duplicate subjects identified via DNA
Other Aspects
Significant data reduction
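A back-of-envelope estimate of the reduction for the COPDGene data (3.8 TB of CT, 95M features), assuming each feature is stored only as its 64-element rank-ordered descriptor at one byte per element (ranks 0 to 63 fit in a byte) and ignoring feature geometry and index overhead:

```latex
95 \times 10^{6} \times 64~\text{B} = 6.08 \times 10^{9}~\text{B} \approx 6.1~\text{GB},
\qquad
\frac{3.8 \times 10^{12}~\text{B}}{6.08 \times 10^{9}~\text{B}} = 625
```

Under these assumptions the features occupy about 1/625 of the raw image data; the actual ratio depends on the storage format used.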
Other Aspects
Feature geometry unused: estimation from appearance descriptors only
Subject images are unaligned: bag-of-features
Software implementation available
References
1) M. Toews, C. Wachinger, R. San Jose Estepar, et al. "A Feature-based Approach to Big Data Analysis of Medical Images." Information Processing in Medical Imaging (IPMI), 2015.
2) C. Wachinger, M. Toews, et al. "Keypoint Transfer Segmentation." Information Processing in Medical Imaging (IPMI), 2015.
3) G. Gill, M. Toews, R. R. Beichel. "Robust initialization of active shape models for lung segmentation in CT scans: a feature-based atlas approach." International Journal of Biomedical Imaging, 2014, p. 479154.
4) M. Toews, W. M. Wells III. "Efficient and robust model-to-image alignment using 3D scale-invariant features." Medical Image Analysis, vol. 17, no. 3, 2013, pp. 271-282.
5) M. Toews, W. M. Wells III, L. Zöllei. "A feature-based developmental model of the infant brain in structural MRI." Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2012, Lecture Notes in Computer Science, vol. 7511, pp. 204-211. Springer Berlin Heidelberg, 2012.
6) M. Toews, W. M. Wells III, D. L. Collins, T. Arbel. "Feature-based morphometry: discovering group-related anatomical patterns." NeuroImage, vol. 49, no. 3, 2010, pp. 2318-2327.
7) M. Toews, W. M. Wells III. "SIFT-Rank: Ordinal description for invariant feature correspondence." IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, June 20-25, 2009, pp. 172-177.
8) M. Toews, T. Arbel. "A statistical parts-based model of anatomical variability." IEEE Transactions on Medical Imaging, vol. 26, no. 4, 2007, pp. 497-508.
Thank You