Big data in health research Professor Tony Blakely Burden of Disease Epidemiology, Equity and Cost Effectiveness Programme 1
Structure Added value of big data. Examples: Linked census health data Linked health administrative data Longitudinal data Genetic data and epidemiology Challenges Opportunities 2
NZCMS: method in one slide. Anonymous and probabilistic record linkage. [Diagram: 1991 census cohort (0-74 yr olds) linked to subsequent deaths.] 3
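The slide names probabilistic record linkage without giving details. As a rough illustration only, the sketch below scores a candidate record pair with Fellegi-Sunter style agreement weights, one common formulation of probabilistic linkage. The field names and the m- and u-probabilities are hypothetical, not taken from NZCMS.

```python
import math

# Hypothetical linkage fields with (m, u) probabilities; illustrative values only.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELDS = {
    "sex": (0.99, 0.50),
    "date_of_birth": (0.97, 0.01),
    "area_code": (0.90, 0.05),
}


def match_weight(agreements):
    """Sum the log2 likelihood-ratio weights over all linkage fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if agreements[field]:
            total += math.log2(m / u)              # agreement adds evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return total


all_agree = match_weight({"sex": True, "date_of_birth": True, "area_code": True})
dob_differs = match_weight({"sex": True, "date_of_birth": False, "area_code": True})
print(f"all fields agree:     {all_agree:.2f}")
print(f"date of birth differs: {dob_differs:.2f}")
```

Pairs scoring above an upper threshold would be accepted as links and pairs below a lower threshold rejected; where those thresholds sit is a study-specific choice.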
Life expectancy. [Figure: life expectancy (y-axis, 45 to 85 years) by year (1941 to 2011) for Non-Māori males, Non-Māori females, Māori males and Māori females.] Without linkage of census and mortality data, we would have overestimated Māori LE from the 1980s to the mid 1990s. 4
NZCMS and CancerTrends (incidence) findings. [Figures: breast cancer incidence rates by ethnicity; breast cancer mortality rates by ethnicity.] Suggestion: survival gaps are widening faster than incidence gaps. 5
Rate ratios of mortality among 45-74 year olds for nil cf. post-school education, before and after adjusting for smoking. [Figure: RR (y-axis, 1.0 to 1.5), adjusted for age and ethnicity versus additionally adjusted for smoking, for females and males in 1981-84 and 1996-99.] Reduction in excess RR (i.e. RR-1) due to adjusting for smoking, as labelled on the chart: 3%, 16%, 11%, 21%. 6
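The "reduction in excess RR" on this slide is the share of the excess rate ratio (RR-1) that disappears after adjustment. A minimal sketch of that arithmetic follows; the input rate ratios are hypothetical, since the chart values themselves are not recoverable from the slide text.

```python
def excess_rr_reduction(rr_unadjusted, rr_adjusted):
    """Proportion of the excess rate ratio (RR - 1) removed by adjustment."""
    excess = rr_unadjusted - 1.0
    if excess == 0:
        raise ValueError("no excess risk to explain when the unadjusted RR is 1")
    return (rr_unadjusted - rr_adjusted) / excess


# Hypothetical example: RR falls from 1.40 to 1.32 after adjusting for smoking.
reduction = excess_rr_reduction(1.40, 1.32)
print(f"{reduction:.0%} of the excess RR is attributed to smoking")
```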
HealthTracker linked health data: hospital costs paid by the Ministry or DHBs (case mix cost weights); outpatient costs (contracted purchase units); GP visits (average capitation cost only); general medical subsidy for GP visits outside enrolled PHO; emergency department (triage level contracted purchase unit cost for event); community pharmacy, and more recently hospital pharmacy, costs (excluding non-subsidised medications); lab tests funded by Vote:Health. 7
HealthTracker colon cancer costs: females aged 62.5 yrs, by time pre/post diagnosis. [Figure: cost per person-month (y-axis, $0 to $20,000) across three phases. Pre-diagnosis: 6-11 mth, 1-5 mth, <1 mth. Post-diagnosis and not within a year of death: <1 mth, 1-5 mth, 6-11 mth, 12-23 mth, 24+ mth. Pre-death from cancer: 6-11 mth, 1-5 mth, <1 mth.] 8
Economic evaluation: system costs. [Diagram: INTERVENTION leads to a change in health. Cost of intervention: health sector (C1), other sectors (C2), patient/family (C3), productivity losses (C4). Consequences: DALYs averted. Downstream costs averted/incurred: health sector (S1), other sectors (S2), patient/family (S3), productivity gains (S4).] New Zealand, with NHI-linked health datasets, has a wonderful tool for calculating these health system costs: HealthTracker. 9
HPV vaccination: cost-effectiveness plane. [Figure: cost (y-axis, $0 to $30,000,000) against HALYs gained (x-axis, 0 to 700) for four programmes: Girls only current prog (1G); Girls only intensified school-only prog (1G); Girls & Boys current prog (1G+B); Girls & Boys intensified school-only prog (2G+B).] 10
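Points on a cost-effectiveness plane are usually compared through the incremental cost-effectiveness ratio (ICER): extra cost divided by extra health gained. The sketch below shows that calculation with hypothetical costs and HALYs; none of the numbers are read off the HPV chart.

```python
def icer(cost_new, halys_new, cost_ref, halys_ref):
    """Incremental cost per HALY gained for a programme versus a comparator."""
    extra_halys = halys_new - halys_ref
    if extra_halys == 0:
        raise ValueError("ICER is undefined when both options give the same health gain")
    return (cost_new - cost_ref) / extra_halys


# Hypothetical comparison: programme B costs $20m for 500 HALYs,
# programme A costs $10m for 300 HALYs.
ratio = icer(20_000_000, 500, 10_000_000, 300)
print(f"${ratio:,.0f} per additional HALY")
```

Whether a given ratio counts as good value depends on the willingness-to-pay threshold applied, which the slide does not state.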
Longitudinal causal inference. [Diagram: repeated measures H1, H2, H3 and L1, L2, L3 over time, together with Z.] Does a change in H cause a change in L, or vice versa?
Does change in income predict change in self-rated health? Longitudinal data: SoFIE-Health
Does change in income predict change in self-rated health? No.
Model | Variable | Odds ratio | 95% confidence interval
Amalgamated conditional logit regression model | Household annual income* | 1.009 | 0.995 to 1.023
Hybrid proportional odds model | Household annual income* | 1.006 | 0.997 to 1.015
Supported by international literature, but counter to most people's expectations. Imlach Gunasekara, Soc Sci Med 2011
HDL and myocardial infarction: an example of big gene data internationally. HDL is accepted as a (causal) risk factor for IHD; so much so that HDL is a target for pharmaceutical companies (big $$$). But really?
Mendelian randomisation = genetic variation as an instrumental variable. Z: Instrument = genes that predict HDL. X: Exposure = HDL. Y: Outcome = myocardial infarction. U: Unmeasured confounders. Usually requires massive datasets (in this case 20,000 MI cases and 100,000 controls with blood), but incredibly good for causal inference. 15
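To make the Z, X, Y, U set-up concrete, the simulation below compares a naive regression of Y on X with the instrumental-variable (Wald ratio) estimate cov(Z,Y)/cov(Z,X). It is a toy sketch under loud assumptions: a continuous outcome instead of case-control MI data, a single variant with uniformly distributed allele counts, made-up effect sizes, and a true causal effect of X on Y set to zero.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=0.0, seed=1):
    """Simulate instrument Z, exposure X and outcome Y with an unmeasured confounder U."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        g = rng.choice([0, 1, 2])            # allele count (simplified to uniform)
        u = rng.gauss(0, 1)                  # unmeasured confounder
        exposure = 0.5 * g + u + rng.gauss(0, 1)
        outcome = true_effect * exposure - u + rng.gauss(0, 1)
        z.append(g)
        x.append(exposure)
        y.append(outcome)
    return z, x, y


z, x, y = simulate()
naive_slope = covariance(x, y) / covariance(x, x)  # confounded by U
wald_ratio = covariance(z, y) / covariance(z, x)   # uses only variation in X driven by Z
print(f"naive slope: {naive_slope:.2f}")
print(f"Wald ratio:  {wald_ratio:.2f}")
```

Because U lowers Y while raising X, the naive slope looks protective even though X has no effect, whereas the Wald ratio stays close to zero. The instrument explains only a small share of the variation in X, which is one reason real analyses need very large samples.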
No association of HDL with MI: major body of knowledge overturned.
Major implications: drug discovery; pharmaceutical business; CVD risk calculators (although HDL may still be useful for prediction, emphasising that prediction is not the same as causation). And how did we miss this in observational epidemiology? Measurement error and residual confounding.
Structure Added value of big data. Examples. Challenges Need vision Need champions Need capacity to use big data to add value Need funding Opportunities 18
Vision: Cancer Collections Framework. Hewlett Packard Report for NZHIS and NSU, 2006. Cancer Collections: 5 Year Vision. [Diagram: roadmap from Current State through Year 1, Year 2, Year 3 and Year 5 across four tracks: System, Information/Reporting, Data Collection/Structure and Data Access. Legible milestones include: NCMD Business Case developed and approved; first phase of NCMD implemented (2 cancer specialities/tumour sites); facilitate national view by using NCMD with links from NMDS, NZCR and Mortality; use current reporting/data extraction channels (e.g. PHO Performance Indicators site); set standard reports for Key Answers using existing reporting tools; develop front-end reporting tools within existing NZHIS reporting channels; links created between Mortality and NZCR, and between BSA and NZCR; links between NZCR and Mortality automated; trial of Information Laboratory for users to access linked data from cancer collections and related datasets; establish a process to increase cancer-related capture (e.g. palliative care, primary care, private) by working with other directorates and developments within the Ministry.] 19
Vision: Cancer Collections Framework. Hewlett Packard Report for NZHIS and NSU, 2006. [Diagram labels. Data access: reports available to answer key critical questions based on data 2+ years old; set of standard reports readily available (recent data) with limited ability to generate ad hoc reports; on-line access to standard reports only; on-line access to filtered/pre-formatted data and pre-structured reports (real-time/recent data); (authorised) on-line access through to base data (that meets standard alignment requirements); Data Laboratory, able to undertake complex ad hoc analysis in a supported or facilitated, real or virtual environment; widespread authorised data access. Data collection/structure: no change to current data structures, some additional data collected; data held in national collections linked, with the ability to add data fields where extraction or linking is not onerous and is value-adding; all information linked by NHI through the cancer continuum for individuals diagnosed with cancer; uniform data collection/structure. System: loosely linked system with search capability; on-line guide to answering key questions with links to particular reports; front-end search and compilation tool with links to key databases; user-friendly interface for information search and reporting; data warehouse for high-speed integrated analysis and sophisticated research.] 20
Making vision happen: challenging. Still not there, many committees later. Researcher/clinician/manager enthusiasm hits the reality of: data dictionaries and definitions; systems of collecting the data (Who? How? When?); reliability and validity of data; fitting it in with existing data; cost; linking it with biological samples and trial networks; demonstrating value; privacy, confidentiality and ethics. 21
Challenge: Champions Census mortality and census cancer linkage would not have happened (as soon) without: Vision and championing of the then Government Statistician An emerging researcher looking for a PhD HealthTracker would not have happened (as soon) without drive of staff within Ministry Order of magnitude up is whole of cancer collections, not only requiring champion(s) but coordination, leadership, resources, etc. This is not easy. 22
Challenge: Capacity. Capacity is needed to assemble and maintain big data, but also to make good use of it: provision to likely users; users capable of using it well, e.g. longitudinal data analyses, comparative effectiveness research, econometric and epidemiological skills; funding. 23
Challenge: Cost. New Zealand is a small country: it may cost just as much to run a birth cohort study in New Zealand as in Australia to achieve internal validity (e.g. sample size). Or, put another way, New Zealand does not have economies of scale. Are we even able to get to the table for new drug trials? Numbers; registries; tissue samples. 24
Structure Added value of big data. Examples. Challenges Opportunities 25
Opportunities E.g. HealthTracker, virtual access to data, joining in clinical data, etc. 26
Opportunities: Use what we have well Examples: NHI (VIEW/PREDICT, HealthTracker, etc) Growing Up Synthesis to answer research and policy questions through modelling Contributing data to international collaborations 27
Opportunities: Internet & social media Mountains of data: Twitter Facebook Websites How do we use machine learning and other methods to ask questions of, follow up and retrieve data from free living humans? Texting health messages just scratching the surface Innovation needed (e.g. monitoring how social media discussions alter as a result of health promotion campaigns) 28
Big data in health research Professor Tony Blakely Examples of added value of big data Challenges Opportunities Burden of Disease Epidemiology, Equity and Cost Effectiveness Programme 29