Using Ultra-Large Data Sets in Healthcare New Questions-New Answers




David Hartzband, D.Sc.
Director, Technology Research, RCHN Community Health Foundation & Lecturer, Engineering Systems Division, Massachusetts Institute of Technology

Big Data?
Big Data is the management & use of ultra-large amounts of information, where:
- Management & use = efficient storage, search, analysis & visualization
- Ultra-large = more than 1 Petabyte of data
For scale:
- 1 byte = a single printed character
- 5 million bytes (5 Megabytes or 5MB) = complete printed works of Shakespeare
- 4,000 times that (20 billion bytes, 20 Gigabytes) = complete recorded works of Beethoven
- 500 times that (10 trillion bytes, 10 Terabytes) = the printed works in the Library of Congress
- 100 times that (1 quadrillion bytes, 1 Petabyte) is a lot!
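The scale ladder above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, which is what the byte counts on the slide imply; the variable names and the `human` helper are illustrative only.

```python
# Decimal (SI) units, matching the byte counts quoted on the slide.
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15

shakespeare = 5 * MB                     # complete printed works of Shakespeare
beethoven = 4000 * shakespeare           # complete recorded works of Beethoven
library_of_congress = 500 * beethoven    # printed works in the Library of Congress
one_petabyte = 100 * library_of_congress


def human(n_bytes):
    """Format a byte count using the largest decimal unit that fits."""
    for name, size in (("PB", PB), ("TB", TB), ("GB", GB), ("MB", MB)):
        if n_bytes >= size:
            return f"{n_bytes / size:g} {name}"
    return f"{n_bytes} bytes"


for label, size in [
    ("Shakespeare", shakespeare),
    ("Beethoven", beethoven),
    ("Library of Congress", library_of_congress),
    ("Ultra-large threshold", one_petabyte),
]:
    print(f"{label}: {human(size)}")
```

Each step multiplies the previous one, so the chain 5 MB, 20 GB, 10 TB, 1 PB is internally consistent.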

Is this Real?
- 8 years ago, when I was a VP at the EMC Corporation, Merck asked my group if we could manage a 1 PB submission to the FDA
- Today Google has about 2 PBs of information under management for Google Earth
- Typical EHR record size per patient ranges from 1MB for a healthy young person and 40MB for a middle-aged person with some health issues (not counting images) to 3-5GB for a person with several health issues (including images)*
* SearchStorage.com

Translate this to Kaiser
- Kaiser Permanente has 8.8M members*. Based on the estimates of EHR record size (taking the upper range of 3-5GB per patient), KP would have between about 26.4PB & 44PB of patient data under management just from EHR data, including images & annotations
- Just by raw size, this is up to 4400 Libraries of Congress: not a meaningful or imaginable concept
- By some estimates, total size of digitized patient data in the US might be as large as 600PB-10EB (10 exabytes)
* http://xnet.kp.org/newscenter/aboutkp/fastfacts.html, accessed 9/19/11
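The Kaiser estimate follows from multiplying the member count by the per-patient record size. The sketch below assumes every one of the 8.8M members has a record in the 3-5GB range quoted earlier (an upper-end assumption, since healthy patients have far smaller records) and uses the 10 TB Library of Congress figure from the scale slide.

```python
GB = 10**9
TB = 10**12
PB = 10**15

members = 8_800_000                       # Kaiser Permanente membership from the slide
record_low, record_high = 3 * GB, 5 * GB  # assumed per-patient record size, images included
library_of_congress = 10 * TB             # scale figure from the earlier slide

low_total = members * record_low
high_total = members * record_high

print(f"low estimate:  {low_total / PB:.1f} PB")
print(f"high estimate: {high_total / PB:.1f} PB")
print(f"Libraries of Congress at the high estimate: {high_total // library_of_congress}")
```

Under these assumptions the totals come to 26.4 PB and 44 PB, and the high estimate equals 4400 Libraries of Congress.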

OK, This is Big, But
- Kaiser is never going to try to analyze all 44PB of data at once
- Analysis of any kind is typically done on cohorts of patients that number in the 1000s
- The San Diego Supercomputer Center currently has 16TB* of CMS data under management: Medicaid claims for the past 5 years (minus some States)
- But what if analysis could be done on much larger numbers? What kinds of questions could you ask? What kinds of analysis could you do? What could it tell you?
* Natasha Balac, SDSC, personal communication

Questions?
- Say Kaiser (or HHS, or NY etc.) wanted to look at how many patients had flu (or flu-like symptoms) by analyzing patterns in EHR data, even if flu was not diagnosed, on a weekly basis for 2006-2010 (260 weeks)
- What if they wanted to correlate length of acute respiratory infections with administered doses of specific antibiotics?
- What if they wanted to model the course of seasonal respiratory infections & their response to different drug therapies?
- What if they wanted to enhance those results by using data from social media & other sources to develop new epidemiologic indicators?
- What if they wanted to determine the cost (to Kaiser) of those infections & correlate that with specific drug therapies?
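As an illustration of the second question, correlating infection length with antibiotic doses could start with something as simple as a Pearson correlation coefficient. The sketch below is not the author's method; the `episodes` values are made up purely to exercise the function, and a real analysis would need to control for confounders such as severity and age.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical episodes: (antibiotic doses administered, days of illness).
# These numbers are invented for illustration and carry no clinical meaning.
episodes = [(0, 12), (2, 11), (4, 9), (6, 8), (8, 7), (10, 5)]

doses = [d for d, _ in episodes]
days = [t for _, t in episodes]
r = pearson(doses, days)
print(f"correlation between doses and days of illness: {r:.3f}")
```

A value near -1 or +1 indicates a strong linear relationship; correlation alone does not establish that the therapy caused the change.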

New Process & Analyses
- Data acquisition from EHRs, PHRs, other structured clinical & demographic data, the Web & other unstructured sources such as social media
- Aggregation of ultra-large analysis set, use of new database & data transformation technologies such as NoSQL DBs, MapReduce, UIMA etc.
- Use of new tools to define analysis or models including R, Hadoop
- Requires new skills to design analysis & interpret results
- Leaders include Google, IBM, Amazon, EMC, MongoDB, Opera Solutions, 1010Data, Quantivo, Zillabyte
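To make the MapReduce idea concrete, here is a toy, in-memory sketch of the map, shuffle and reduce phases applied to the weekly flu-like-symptom count posed earlier. It does not use Hadoop or any real framework API. The record layout, the `FLU_LIKE` symptom set and the "two or more matching symptoms" rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical symptom vocabulary; a real system would derive this from coded EHR data.
FLU_LIKE = {"fever", "cough", "sore throat", "body aches"}


def map_record(record):
    """Map phase: emit ((ISO year, ISO week), 1) for visits with >= 2 flu-like symptoms."""
    matched = FLU_LIKE & set(record["symptoms"])
    if len(matched) >= 2:
        iso = record["visit_date"].isocalendar()
        yield (iso[0], iso[1]), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the counts for one week."""
    return key, sum(values)


def weekly_flu_like_counts(records):
    pairs = (pair for record in records for pair in map_record(record))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# Invented sample visits, not real patient data.
records = [
    {"visit_date": date(2009, 10, 5), "symptoms": ["fever", "cough"]},
    {"visit_date": date(2009, 10, 7), "symptoms": ["cough", "body aches", "fever"]},
    {"visit_date": date(2009, 10, 8), "symptoms": ["sprained ankle"]},
    {"visit_date": date(2009, 10, 14), "symptoms": ["fever", "sore throat"]},
]

counts = weekly_flu_like_counts(records)
print(counts)
```

Because each record is mapped independently and each week is reduced independently, both phases can be spread across many machines, which is what makes the pattern attractive at petabyte scale.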

Hasn't This Been Done?
- Yes for 1000s of patients, maybe even 10s of thousands of patients, not for millions
- The difference is between a 90% (1 in 10 error rate) & a 99.9999% (1 in 1,000,000 error rate) confidence level
- Analysis using this much data produces results that are close to certain
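Two pieces of arithmetic sit behind this slide. The first converts a "1 in N" error rate into a confidence percentage. The second is one common way to see why larger cohorts tighten results: the textbook normal-approximation margin of error for an estimated proportion, which shrinks with the square root of the sample size. The second formula is a standard illustration rather than the author's stated method, it covers sampling error only (not bias or data-quality problems), and the 10% prevalence is an assumed example value.

```python
import math


def confidence_percent(one_in):
    """Confidence level, in percent, for an error rate of 1 in `one_in`."""
    return 100.0 * (1.0 - 1.0 / one_in)


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation 95% interval for a proportion p at sample size n."""
    return z * math.sqrt(p * (1.0 - p) / n)


for one_in in (10, 1_000_000):
    print(f"1 in {one_in:,} errors -> {confidence_percent(one_in):.4f}% confidence")

assumed_prevalence = 0.10  # hypothetical proportion being estimated
for n in (1_000, 10_000, 1_000_000, 8_800_000):
    moe = margin_of_error(assumed_prevalence, n)
    print(f"n = {n:>9,}: +/- {moe:.5f}")
```

Moving from a cohort of 1,000 to 1,000,000 patients narrows the interval by a factor of about 31.6 (the square root of 1,000).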

Calling Dr. Watson
- Early medical expert systems: Mycin (Stanford) diagnosed bacterial infections, about 600 rules, data entered for each diagnosis, 69% effective (10% better than human), but never used
- Current: Active Health (Aetna), still rule based (10,000s of rules), uses knowledge base created by medical & IT staff, produces medical alerts based on current research & best practice
- IBM Dr. Watson: adapted from Watson hardware/software system, Deep Question Answering (DeepQA), content acquired from Web or specified documents (EHR, etc.), analyzes questions, generates & evaluates hypotheses, generates & evaluates answers, proposes diagnosis & treatment (now allied with WellPoint, 33.3M members)

We are just Starting
- Systems now can be directed at ultra-large scale analysis & predictive modeling, not just diagnosis
- Many companies, big & small, are developing tools for data acquisition, analysis & modeling at this scale: IBM, Oracle as well as 100s of start-ups & smaller companies
- Healthcare will benefit:
  - Outcomes improved through discovery of new evidence-based practices
  - Cost control through integrated clinical & financial analysis
  - Public health improved through use of more accurate models

Continue the Discussion
dhartzband@rchnfoundation.org
dhartz@mit.edu