1 Big Data Medical Imaging Brett Cowan Centre for Advanced MRI University of Auckland
2 Overview Medical Imaging 2/40
1. Medical imaging and Big Data: medical imaging is producing huge quantities of complex, high-quality data. What do we do with it all?
2. Big Data in action, the Cardiac Atlas Project: international collaboration, data ownership, data sharing and infrastructure
3. Data analysis: new statistical analyses of shape, disease classifiers, and improved diagnostic accuracy
4. A new approach to clinical trials? Can we get more for less out of international clinical trials?
3 Magnetic Resonance Imaging (MRI) Medical Imaging 3/40
in just 20 years, MRI has revolutionised medical imaging
without radiation, without any known harmful effects and without even touching the patient, MRI produces diagnostic images of virtually photographic quality
the Auckland District Health Board PACS system has 15 TB on-line (one year of imaging), 27 TB near-line (3 years of imaging) and 40 TB off-line
ten years ago, one year of imaging was <1 TB; now it is 15 TB
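The storage figures above imply a compound annual growth rate. A minimal sketch of the arithmetic, assuming the "<1 TB" figure is taken as exactly 1 TB (so the result is a lower bound); the variable names are illustrative only:

```python
# Annual growth rate implied by the archive figures quoted on the slide.
# Assumption: "<1 TB" ten years ago is treated as exactly 1 TB, which makes
# the computed rate a lower bound on the true growth rate.
tb_then = 1.0   # TB per year of imaging, ten years ago (upper bound)
tb_now = 15.0   # TB per year of imaging, now
years = 10

growth_factor = (tb_now / tb_then) ** (1 / years)  # per-year multiplier
annual_growth_pct = (growth_factor - 1) * 100

print(f"at least {annual_growth_pct:.0f}% growth per year")
```

Under that assumption the yearly imaging volume has grown by roughly a third every year for a decade.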
4 Faster Image Acquisition Medical Imaging 4/40
[Timeline: FLASH (25 seconds); 2005 SSFP (14 seconds); 2010 Accelerated (5 seconds); 2013 Realtime (1 second)]
5 Neurological MRI Medical Imaging 5/40 [Figure panels: T1-weighted; T2-weighted]
6 Tractography and fMRI Medical Imaging 6/40 [Figure panels: tractography; fMRI]
7 Data or Information or Knowledge? Medical Imaging 7/40
in medical imaging, the data are usually grey scale pixel values
they are not demographics (52 years old), blood pressure (155/95) or a genetic sequence (ACAT); they are just numbers representing grey scale (or colour) in an image
these data are not information in the sense of many other datasets
we must align images into a common reference frame, segment features of interest, measure distances, thickness and volume, and define shapes or regions of interest
this process is time consuming relative to scan acquisition time (the acquisition to analysis ratio)
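To make the step from pixel values to a measurement concrete, here is a minimal sketch of the simplest possible pipeline: threshold the grey levels to segment a region, then convert the pixel count to an area. The image values, pixel spacing and threshold are all made up for illustration; real cardiac segmentation is far more involved.

```python
# A tiny 2D "image": grey-scale pixel values (hypothetical, 0-255).
image = [
    [10, 12, 11, 13, 10],
    [11, 200, 210, 205, 12],
    [10, 198, 220, 207, 11],
    [12, 202, 215, 12, 10],
    [11, 10, 12, 11, 13],
]
pixel_spacing_mm = (1.5, 1.5)  # assumed row and column spacing in mm
threshold = 128                # assumed grey-level cut-off


def segment(img, thresh):
    """Return a binary mask: True where the pixel is brighter than thresh."""
    return [[value > thresh for value in row] for row in img]


def area_mm2(mask, spacing):
    """Area of the segmented region = pixel count x area of one pixel."""
    count = sum(sum(row) for row in mask)
    return count * spacing[0] * spacing[1]


mask = segment(image, threshold)
region_area = area_mm2(mask, pixel_spacing_mm)
print(region_area)  # 8 bright pixels x 2.25 mm^2 each = 18.0
```

Only after a step like this does the image yield a number (an area, a thickness, a volume) that can sit alongside age or blood pressure in a dataset.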
8 Image Processing Medical Imaging 8/40
[Figure panels: edge detection; non-rigid registration; feature tracking]
a wide range of image processing techniques are used, such as machine learning, finite element modeling and human interaction
these are computationally (or time) intensive
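As an illustration of the first technique named above, the following is a minimal sketch of edge detection using a central-difference gradient magnitude. This is a deliberately simple stand-in, not the method used in the project, and the test image is hypothetical.

```python
def gradient_magnitude(img):
    """Central-difference gradient magnitude for interior pixels.

    Border pixels are left at 0.0 because they lack a neighbour on one side.
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (img[r][c + 1] - img[r][c - 1]) / 2  # horizontal change
            gy = (img[r + 1][c] - img[r - 1][c]) / 2  # vertical change
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# Hypothetical image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 100, 100] for _ in range(4)]
edges = gradient_magnitude(image)
print(edges[1])  # [0.0, 50.0, 50.0, 0.0]
```

Even this toy version visits every pixel; applied to every frame of every slice of every study, the cost adds up quickly.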
9 The Cardiac Atlas Project (CAP) CAP Project 9/40
collecting data is expensive, and the data are reusable
the Cardiac Atlas Project (CAP) is an international Big Data project funded by the NIH and led from Auckland
subcontracts were awarded to collaborators at Johns Hopkins, UCLA, and a Los Angeles supercomputing centre (Centre for Computational Biology)
the aim is to collate cardiac (image) data from large international clinical trials into a web-accessible Big Data database for reuse by any legitimate researcher
other aims were to create an infrastructure for managing approval for data use, and to create advanced statistical analysis and display tools
the first two contributing clinical trials were the MESA and DETERMINE trials
Fonseca et al. Bioinformatics 27(16): ; 2011
10 CAP Project Case Study CAP Project 10/40
1. The overall rationale and strategy
2. Ownership of data
3. Project infrastructure
4. Data analysis and results
11 The Big Data Strategy CAP Project 11/40
[Diagram: Data Acquisition (RADIOLOGY); Heart Modelling (BIOENGINEERING); Patient Diagnosis (MEDICINE); Statistical Analysis (BIG DATA); Software Development (COMPUTER SCIENCE)]
12 Ownership and Rights to Big Data CAP Project 12/40
13 Data Ownership: Data Has Value CAP Project 13/40
Who owns the rights to medical imaging trial data? Who can use it and for what purpose?
Participant has rights, certainly: they must provide informed consent, informed in that they fully understand the risks and benefits, and what the information will be used for. Ethics committees will not usually give permission for the data to be used by anyone for anything in the future
Researcher has rights, often jealously guarded as a strategic advantage for publication and career progression
Institution has rights, but what if an individual leaves the University? Do they have the right to take all of the data with them?
Funder has rights, especially when they are a commercial entity such as Big Pharma, or if there are valuable patents at stake
14 Ethical Approval (IRB) CAP Project 14/40
[Figure: scale of IRB requirements from high to low: individual consent required; application to IRB required; investigator can make the decision; not human subjects research]
15 HIPAA and Anonymisation of Metadata CAP Project 15/40
the convenience and power of electronic data is also its Achilles heel
the DICOM standard allows private information to be included in image files, which in many cases is not compliant with HIPAA (1996)
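A minimal sketch of what metadata anonymisation involves, assuming the image header has already been read into a plain dictionary keyed by DICOM attribute keyword. The list of identifying keywords is a small illustrative subset, the study code `CAP-0001` is hypothetical, and this is not a complete de-identification procedure.

```python
# Illustrative subset of DICOM attributes that identify a person or site.
# A real de-identification profile covers many more attributes than this.
IDENTIFYING_KEYWORDS = {
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
    "InstitutionName",
}


def anonymise(header, study_code):
    """Return a copy of header with identifying attributes removed.

    The original PatientID is replaced by a study-specific code so that
    images from the same participant can still be linked to each other.
    """
    cleaned = {k: v for k, v in header.items() if k not in IDENTIFYING_KEYWORDS}
    cleaned["PatientID"] = study_code
    return cleaned


# Hypothetical header values for illustration only.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "1234567",
    "PatientBirthDate": "19610504",
    "Modality": "MR",
    "SliceThickness": "6.0",
}
cleaned = anonymise(header, "CAP-0001")
print(cleaned)
```

The acquisition parameters needed for analysis (modality, slice thickness) survive, while the fields that identify the participant do not.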
16 Project Infrastructure CAP Project 16/40
17 Database CAP Project 17/40
18 Calculation of Volume and Mass CAP Project 18/40 Complete mathematical representation of the left ventricle
19 Creation of a Mathematical Model CAP Project 19/40 Using image processing (and operator input), the raw images are converted into a beating mathematical heart in the database. This allows any parameter to be determined without further analysis.
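The project derives volume and mass from a full mathematical model of the ventricle. As a simpler stand-in, the sketch below uses slice summation: contour areas times slice thickness give volume, and myocardial volume times tissue density gives mass. The contour areas and slice thickness are hypothetical, and the density of 1.05 g/mL is a commonly used value assumed here rather than taken from the talk.

```python
MYOCARDIAL_DENSITY_G_PER_ML = 1.05  # assumed tissue density


def volume_ml(areas_mm2, slice_thickness_mm):
    """Slice summation: sum of area x thickness, converted from mm^3 to mL."""
    return sum(areas_mm2) * slice_thickness_mm / 1000.0


def lv_mass_g(epi_areas_mm2, endo_areas_mm2, slice_thickness_mm):
    """Mass = (epicardial volume - endocardial volume) x tissue density."""
    wall_ml = (volume_ml(epi_areas_mm2, slice_thickness_mm)
               - volume_ml(endo_areas_mm2, slice_thickness_mm))
    return wall_ml * MYOCARDIAL_DENSITY_G_PER_ML


def ejection_fraction(edv_ml, esv_ml):
    """Fraction of the end-diastolic volume ejected in each beat."""
    return (edv_ml - esv_ml) / edv_ml


# Hypothetical contour areas (mm^2) on four short-axis slices, 10 mm thick.
thickness = 10.0
epi_ed = [3000, 3400, 3300, 2700]   # epicardial, end-diastole
endo_ed = [1500, 1800, 1700, 1200]  # endocardial, end-diastole
endo_es = [600, 700, 650, 450]      # endocardial, end-systole

edv = volume_ml(endo_ed, thickness)
esv = volume_ml(endo_es, thickness)
mass = lv_mass_g(epi_ed, endo_ed, thickness)
ef = ejection_fraction(edv, esv)
print(edv, esv, round(mass, 1), round(ef, 2))
```

With a model stored in the database, quantities like these can be recomputed on demand rather than re-measured from the images.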
20 Reproducibility and Accuracy CAP Project 20/40
Scan-rescan variability
  25 patients with moderate to severe MR
  Scanned twice at a six week interval
  Coefficient of variation 3%
  LVM difference -1.1 ± 5.7 g (~0.6%)
Accuracy
  12 animals (9 dogs, 3 pigs); data courtesy David Fieno and Paul Finn
  LVM determined by weight at autopsy
  LVM difference 2.1 ± 4.3 g (~3%)
[Plots: difference (g) against average LVM (g); difference (g) against postmortem LVM (g)]
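Figures of the form "difference = mean ± SD" come from paired measurements. A minimal sketch of the calculation on made-up scan and rescan values follows; the coefficient of variation has several definitions in use, and the one here (SD of the paired differences over the overall mean) is an assumption, not necessarily the one behind the 3% quoted above.

```python
from statistics import mean, stdev

# Hypothetical left ventricular mass (g) for five patients, scanned twice.
scan_1 = [150, 162, 171, 140, 155]
scan_2 = [152, 160, 173, 141, 157]

# Paired differences, one per patient.
differences = [a - b for a, b in zip(scan_1, scan_2)]

mean_diff = mean(differences)  # systematic bias between the two scans
sd_diff = stdev(differences)   # sample SD of the differences (spread)

# One possible coefficient of variation: SD of differences / overall mean.
overall_mean = mean(scan_1 + scan_2)
cov_pct = sd_diff / overall_mean * 100

print(f"LVM difference {mean_diff:.1f} ± {sd_diff:.1f} g")
print(f"coefficient of variation {cov_pct:.1f}%")
```

The same calculation applies to the accuracy comparison, with the autopsy weight taking the place of the second scan.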
21 Segmentation Challenge CAP Project 21/40 (a) Basal slice (b) Mid-ventricular slice (c) Apical slice Suinesiaputra et al. Medical Image Analysis In press 2013
22 Modal Analysis CAP Project 22/40 Lewandowski et al. Circulation 2013;127:
23 Identification of Myocardial Infarction CAP Project 23/40 [Figure panels: Anterior; Lateral; Septal, each with orientation labels L, P, A, S] Medrano-Gracia et al. JCMR 15:80 ; 2013
24 Modeling of Stiffness, Stress and Strain CAP Project 24/40 Normal Non-Ischaemic HF Vicky Wang, Martyn Nash, STACOM
25 Classification of Disease CAP Project 25/40 Does the patient have the disease? Medrano-Gracia, PhD thesis, 2013
26 Data Analysis CAP Project 26/40 How bad is the disease? Medrano-Gracia, PhD thesis, 2013
27 A Second Example: the Coronary Arteries CAP Project 27/40 [Diagram: image data; clinical problem; database; statistics]
28 Catalonia Prospective trials 28/40
29 Cardiovascular Risk in Catalonia Prospective trials 29/40
If we wanted to determine the cardiovascular risk profile in Catalonia, how would we do this? The options run from specific and high cost to generic and low cost:
1. Recruit 5,000 normal Catalonians (preferably in 1948) and follow them for 50 years (similar to the definitive Framingham study)
2. Recruit 5,000 normal Catalonians and follow them for five years (an abbreviated Framingham study)
3. Use the Framingham results and add local correction factors from small studies where there are obvious discrepancies
4. Read the Framingham publications and speculate on how the data applies locally
30 The VERIFICA Study Prospective trials 30/40
The Framingham function adapted to local population characteristics accurately and reliably predicted the 5-year CHD risk for patients aged years, in contrast with the original function, which consistently overestimated the actual risk.
An overestimation of about 60% was observed in the United Kingdom; however, this is far from the >260% overestimation observed in Spain
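One common way to adapt a Cox-type risk function to a local population is to keep the original coefficients and swap in the local baseline survival and local mean risk-factor values. The sketch below shows that idea only; it is not necessarily the exact VERIFICA procedure, and every coefficient, mean and survival value is invented for illustration.

```python
import math


def chd_risk(betas, values, means, baseline_survival):
    """Cox-type risk: 1 - S0 ** exp(sum(beta * x) - sum(beta * mean))."""
    linear = sum(b * x for b, x in zip(betas, values))
    linear_at_mean = sum(b * m for b, m in zip(betas, means))
    return 1.0 - baseline_survival ** math.exp(linear - linear_at_mean)


# Hypothetical two-factor model: age (years) and smoking (0 or 1).
betas = [0.05, 0.6]
patient = [60, 1]

# Original cohort: its own mean risk factors and baseline survival.
original_risk = chd_risk(betas, patient, means=[50, 0.40],
                         baseline_survival=0.95)

# Local recalibration: same coefficients, local means and local survival.
local_risk = chd_risk(betas, patient, means=[52, 0.35],
                      baseline_survival=0.98)

print(f"original function: {original_risk:.3f}")
print(f"locally adapted:   {local_risk:.3f}")
```

In a population with a lower underlying event rate, the original function overestimates risk for the same patient, which is the pattern reported for Spain.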
31 The Catalonian Risk Table Prospective trials 31/40
32 Cost-Benefit Ratio of Clinical Trials Prospective trials 32/40
[Figure: cost against benefit, with regions marked "Not fundable" and "Fundable". Labels: NZ specific data (trial performed in New Zealand); generic data (trial performed overseas); "Can we do better in this range?"]
33 The MESA Trial Prospective trials 33/40 The Multi-Ethnic Study of Atherosclerosis (MESA) is a study of the characteristics of subclinical cardiovascular disease and the risk factors that predict progression to clinical disease. 6,814 asymptomatic men and women aged have been recruited (38% white, 28% African-American, 22% Hispanic, and 12% Asian).
34 MESA Investigations Prospective trials 34/40
extensive physical exam to determine:
  coronary calcification
  ventricular mass and function by MRI
  flow-mediated endothelial vasodilation
  carotid intimal-medial wall thickness and presence of echogenic lucencies in the carotid artery
  lower extremity vascular insufficiency
  arterial wave forms
  electrocardiographic (ECG) measures
  standard coronary risk factors
  socio-demographic factors
  lifestyle factors, and psychosocial factors
blood samples are being assayed for putative biochemical risk factors and stored for case-control studies
DNA is being extracted and lymphocytes immortalized for study of candidate genes and, possibly, genome-wide scanning
participants are being followed for identification and characterization of cardiovascular disease events, including acute myocardial infarction and other forms of coronary heart disease (CHD), stroke, and congestive heart failure; for cardiovascular disease interventions; and for mortality
35 The Jackson Heart Study Prospective trials 35/40 The objective of the Jackson Heart Study is to investigate the causes of cardiovascular disease (CVD) in African Americans (n=5301) with an emphasis on manifestations related to hypertension (such as remodeling of the left ventricle of the heart, coronary artery disease, heart failure, stroke and renal vascular disease). The MESA and Jackson Heart Studies are using software developed in Auckland to analyse all of their cardiac MRI image data. The data and results are fully compatible with all of the work already done here.
36 New Zealand Fingerprinting Prospective trials 36/40
could we perform the baseline MESA investigations on a group of 100 participants in New Zealand?
this group could be defined geographically, by age, by gender, or by a specific risk factor
this would provide a mean and standard deviation for each investigation
there would also be a profile for each individual participant
together these data would represent a fingerprint for this group in New Zealand
it is highly likely that some participants (and groups of participants) in MESA will have a similar group (and individual) fingerprint
could we then follow them in the MESA trial?
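The fingerprinting and matching idea can be sketched as follows: summarise the local sample by the mean and standard deviation of each investigation, then keep the trial participants whose values lie close to that summary. The field names, values, distance measure (root-mean-square z-score) and cut-off are all assumptions made for illustration, not a description of an existing method.

```python
from statistics import mean, stdev


def fingerprint(group):
    """Mean and SD of each investigation across a group of participants."""
    names = list(group[0].keys())
    return {
        name: (mean([p[name] for p in group]), stdev([p[name] for p in group]))
        for name in names
    }


def distance(participant, fp):
    """Root-mean-square z-score of one participant against a fingerprint."""
    z_squared = [((participant[name] - m) / s) ** 2
                 for name, (m, s) in fp.items()]
    return (sum(z_squared) / len(z_squared)) ** 0.5


def match(candidates, fp, max_distance):
    """Keep the candidates that lie within max_distance of the fingerprint."""
    return [c for c in candidates if distance(c, fp) <= max_distance]


# Hypothetical local sample: LV mass (g) and systolic blood pressure (mmHg).
nz_sample = [
    {"lvm_g": 140, "sbp_mmhg": 120},
    {"lvm_g": 150, "sbp_mmhg": 130},
    {"lvm_g": 160, "sbp_mmhg": 140},
]
# Hypothetical participants from the large international trial.
trial_participants = [
    {"lvm_g": 152, "sbp_mmhg": 128},
    {"lvm_g": 190, "sbp_mmhg": 165},
]

nz_fingerprint = fingerprint(nz_sample)
matched = match(trial_participants, nz_fingerprint, max_distance=1.0)
print(matched)  # only the first participant resembles the local sample
```

The matched participants form the cohort whose outcomes would then be read from the international trial's follow-up.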
37 Matching Prospective trials 37/40 [Figure: fingerprinting (n = 100); matching (n = 1000)]
38 Future Overall of Clinical Trials Prospective trials 38/40
Outcomes for New Zealand
  data which reflects NZ subgroups
  direct application of results
  amplification of sample size by 10×
  trial cost met internationally
  prospective study design
  international collaboration and engagement
  development of fingerprinting and matching technologies
[Diagram, from Local (New Zealand) to Global: New Zealand sample (n=100); fingerprint; statistical matching; NZ cohort identified in baseline data (n=1000); NZ cohort followed in trial; 5, 10, 15 and 20 year outcomes]
39 Big Data Summary 39/40
What is big data for us in Medical Imaging?
it is large and expanding medical imaging databases, preferably shared internationally
there are issues of data ownership, appropriate ethical consents, data anonymisation and access control
computationally intensive image processing is required to convert data into useful information
computational and statistical anatomy and pathology (rather than feature labeling, or calculation of simple distance or volumes)
image data may be represented using mathematical models to recreate physiological and pathophysiological shape, features and motion
statistical analyses of shape, disease classifiers, calculation of new parameters (like stress) become possible
clinical trial focus devices and pharmaceuticals
40 Thank you to - Summary 40/40 Alistair Young Avan Michael Jae Do Randall Carissa Lana Agustín Ben John Yingmin Pau Wenchao Paul Finn