A Case Study of Hadoop in Healthcare



Similar documents
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

The Future of Data Management

HDP Hadoop From concept to deployment.

Architecting for the Internet of Things & Big Data

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

HDP Enabling the Modern Data Architecture

The Future of Data Management with Hadoop and the Enterprise Data Hub

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

The 4 Pillars of Technosoft s Big Data Practice

Comprehensive Analytics on the Hortonworks Data Platform

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

The Internet of Things and Big Data: Intro

BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP

Integrating a Big Data Platform into Government:

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

IBM Big Data Platform

Achieving Business Value through Big Data Analytics Philip Russom

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Big Data Management and Security

Analytics Industry Trends Survey. Research conducted and written by:

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Big Data and Analytics in Government

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

VIEWPOINT. High Performance Analytics. Industry Context and Trends

#TalendSandbox for Big Data

Self-service BI for big data applications using Apache Drill

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Business Intelligence for Big Data

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

How To Handle Big Data With A Data Scientist

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

The Enterprise Data Hub and The Modern Information Architecture

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Making Sense of Big Data in Insurance

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

Traditional BI vs. Business Data Lake A comparison

Data Virtualization A Potential Antidote for Big Data Growing Pains

Integrated Big Data: Hadoop + DBMS + Discovery for SAS High Performance Analytics

Are You Big Data Ready?

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Self-service BI for big data applications using Apache Drill

Luncheon Webinar Series May 13, 2013

Spatio-Temporal Networks:

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

W H I T E P A P E R. Building your Big Data analytics strategy: Block-by-Block! Abstract

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

UNIFY YOUR (BIG) DATA

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer,

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Why Spark on Hadoop Matters

Upcoming Announcements

Cisco IT Hadoop Journey

Big Data and the Data Lake. February 2015

The 3 questions to ask yourself about BIG DATA

Next-Generation Cloud Analytics with Amazon Redshift

INTELLIGENT BUSINESS STRATEGIES WHITE PAPER

Building Your Big Data Team

How To Use Big Data For Business

Management Consulting Systems Integration Managed Services WHITE PAPER DATA DISCOVERY VS ENTERPRISE BUSINESS INTELLIGENCE

How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)

Hadoop Ecosystem B Y R A H I M A.

IBM Big Data in Government

Architecting your Business for Big Data Your Bridge to a Modern Information Architecture

Reference Architecture, Requirements, Gaps, Roles

Has been into training Big Data Hadoop and MongoDB from more than a year now

Getting Started Practical Input For Your Roadmap

Big Data Integration: A Buyer's Guide

Big Data Infrastructure at Spotify

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

Apache Hadoop: Past, Present, and Future

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Big Data on Microsoft Platform

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Analytics and Big Data with the PI System Part 2: Statistical Analytics

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

Safe Harbor Statement

Cloudera Enterprise Data Hub in Telecom:

Talend Big Data. Delivering instant value from all your data. Talend

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Cost-Effective Business Intelligence with Red Hat and Open Source

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Large scale processing using Hadoop. Ján Vaňo

Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering

Transcription:

Leading a Healthcare Company to the Big Data Promised Land: A Case Study of Hadoop in Healthcare Mohammad Quraishi (IT Senior Principal - Cigna) atif71@gmail.com

About me BS in Computer Science and Engineering from University of Connecticut In the Healthcare Industry for over 19 years Programmer most of my career - Architect, Designer Worked in the SOA space for a number of years Lead engineer in the mobile application space Now Lead engineer in the Big Data Analytics Space - Hadoop In my spare time Love to travel with the family Video games, music, movies Community relations work Fan of College basketball 2

Breakdown of the Hadoop Journey 1 2 3 Making the case Vision Architecture The blowback What we accomplished Roadmap to the future Lessons Learned Questions? 3

The Elephant in the room Image Credit: Guian Bolisay/Flickr 4

What s the problem? We already have a mature data analysis infrastructure 5

And it looks something like this What we already do We have independent data marts We have the Hub-and-spoke architecture, the centralized warehouse 6

What is the vision? The ability to perform Descriptive, Predictive and Prescriptive Analytics Remove the traditional IT barriers separating the business users from insights 7

Benefits of Big Data Hadoop has the lowest cost per TB ratio of any data technology available Getting started with Hadoop is fairly inexpensive Entry-level clusters relatively inexpensive Grow in small steps 8

Benefits of Big Data You don t have to throw away data anymore 9

Vision - Reference Architecture 3 Logs Web IVR Portal Mobile Log files to HDFS Live Data Streams Web Analytics Event detection(storm) Realtime Feed Flume NoSQL Storing weblogs Data from tables to HDFS 2 SQOOP/ Flume Chronos Edge'Node'For'Hadoop'Client Python Hive/Impala (SQL) Jobs Scalding (Scala) Cascading (Java) SAS Pentaho R? Analysis/Modeling'Tools Tableau Spotfire Platfora Analysis 7 Cognos Microstrategy? Real%me'Data'Store'or'event'processing Flume SQOOP 1 MapReduce'Distributed'Programming'Framework *Use Spark Streaming and append to Hadoop output - Realtime events Data'Science'Tools' Visualization 6 HDFS filestack SQOOP Hadoop'Cluster'running'HDFS'and'MapReduce Includes'Management,'Monitoring'and'Security Map Reduce jobs Batch Teradata RDBMS External'Hadoop'Output Or'in'HDFS 4 CED/ Claims Clinical Data RDBMS Teradata 5 Hadoop' Cluster'#2 8 Hadoop' Cluster'#3 Back'up'or'copy'data'from'HDFS'to'a' redundant'cluster'for'quick'recovery *'For'future'implementa%on'TBD 10

The Initial Evaluation Vendor Evaluation: Which relationship best fits our needs without lock-in? Selection of use cases for demonstration Visualization of those use cases 11

Use Case 1 12

Use Case 2 13

Success Ready to tackle tougher more complicated problems Went out looking for more use cases 14

Ran into misconceptions Let s use Hadoop as ETL Help us move data. Can we back up data for archiving? 15

& Challenges 16

But Why? Overuse of the words Big & Data There was an overlap with other tools and platforms Hadoop looked like a swiss army knife Will it take over the world and replace other platforms? 17

Broader impact - Business Benefits Building a Customer Persona Service Ops efficiency Being Customer Centric Product Efficiency Brand Impact 18

Broader impact - IT Benefits Predictive threat modeling Data Archival Network Efficiency 19

Hadoop and Big Data Big Data = Hadoop + Relational + other suitable task related technologies Hadoop is complementary 20

Hadoop is Complementary Hadoop excels at processing and analyzing large volumes of distributed, unstructured, structured and semi-structured data in batch or near real-time fashion for analysis NoSQL databases are adept at storing and serving up multistructured data in near-real time for web-based applications Massively parallel OLAP databases are best at providing analysis of large volumes of mainly structured data - Teradata SAS/R - Modeling and Business Intelligence Tableau - Visualization 21

Embrace the Most Important Change: Culture Democratize your data and reap the benefits 22

Why is Hadoop Complementary? 23

What we accomplished? Evangelized Hadoop Linked Hadoop to BI Tools R on Hadoop A fail fast iterative analytics approach 24

λ Lambda Architecture as the foundation Credit Nathan Marz - Big Data 25

What we accomplished? ETL - Ingest, Transform and Move patterns Logs generated from consumer channels were ingested with Flume Standardized on Parquet (Storage) and Snappy (Compression) Lifecycle and organization of Data on HDFS LUKS - dm-crypt for data at rest encryption Sentry and LDAP for Role Based Access Control 26

A Custom NLP Framework 27

A Roadmap to the Future 28

A Roadmap to the Future Data Driven Solutions + FP Functional Programming: I came for the concurrency, but I stayed for the Data Science Dean Wampler 29

The Hadoop Stack Advanced View There s also Workflow Management with Oozie. 30

Lessons Learned Overuse of the words Big & Data The overlap Everyone found a use for Hadoop Big Change/Baby Steps Agility + Process = Cognitive Dissonance 31

Healthcare company needs Security Vendors Vendor Partnerships 32

WWYS Difficult to see. Always in motion is the future Yoda Many of the truths that we cling to depend on our point of view. Yoda The Journey of a thousand miles begins with one cluster 33

Questions? Mohammad Quraishi (IT Senior Principal - Cigna) atif71@gmail.com 34