Ali Eghlima Ph.D Director of Bioinformatics. A Bioinformatics Research & Consulting Group

Similar documents
Datenverwaltung im Wandel - Building an Enterprise Data Hub with

So What s the Big Deal?

BIRT in the World of Big Data

A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

How To Handle Big Data With A Data Scientist

The Next Wave of Data Management. Is Big Data The New Normal?

Peninsula Strategy. Creating Strategy and Implementing Change

The Future of Data Management

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

Copyright (c) 2012, Meta Business Systems. Mario Bojilov Meta Business Systems 20 February 2013

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

Embedded inside the database. No need for Hadoop or customcode. True real-time analytics done per transaction and in aggregate. On-the-fly linking IP

The Future of Data Management with Hadoop and the Enterprise Data Hub

Big Data and Data Science: Behind the Buzz Words

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Secure Cloud Computing Concepts Supporting Big Data in Healthcare. Ryan D. Pehrson Director, Solutions & Architecture Integrated Data Storage, LLC

HDP Hadoop From concept to deployment.

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out

The Rembrandt Group Strategies for BIG DATA

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

INTRODUCTION TO CASSANDRA

The Big Deal about Big Data. Mike Skinner, CPA CISA CITP HORNE LLP

Taming Big Data Storage with Crossroads Systems StrongBox

Big Data a threat or a chance?

Big Data Challenges in Bioinformatics

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

HP Vertica at MIT Sloan Sports Analytics Conference March 1, 2013 Will Cairns, Senior Data Scientist, HP Vertica

BIG DATA TOOLS. Top 10 open source technologies for Big Data

Enterprise Operational SQL on Hadoop Trafodion Overview

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Cloud Computing and Big Data What Technical Writers Need to Know

How To Scale Out Of A Nosql Database

Transforming the Telecoms Business using Big Data and Analytics

Big Data. Lyle Ungar, University of Pennsylvania

HDP Enabling the Modern Data Architecture

WHITE PAPER WHY ORGANIZATIONS NEED LTO-6 TECHNOLOGY TODAY

Data Refinery with Big Data Aspects

Laurence Liew General Manager, APAC. Economics Is Driving Big Data Analytics to the Cloud

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Manifest for Big Data Pig, Hive & Jaql

Protecting Big Data Data Protection Solutions for the Business Data Lake

How To Create A Data Visualization With Apache Spark And Zeppelin

Turning Big Data into Big Decisions Delivering on the High Demand for Data

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Many government agencies are requiring disclosure of security breaches. 32 states have security breach similar legislation

Hadoop Big Data for Processing Data and Performing Workload

Building Out Your Cloud-Ready Solutions. Clark D. Richey, Jr., Principal Technologist, DoD

NextGen Infrastructure for Big DATA Analytics.

THE AGE OF BIG DATA. Chula DataScience

The 3 questions to ask yourself about BIG DATA

Big Data and Healthcare Payers WHITE PAPER

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

W H I T E P A P E R. Building your Big Data analytics strategy: Block-by-Block! Abstract

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Customized Report- Big Data

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Open Source for Cloud Infrastructure

How To Understand The Business Case For Big Data

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

EMC BACKUP MEETS BIG DATA

BIG DATA CHALLENGES AND PERSPECTIVES

Integrating a Big Data Platform into Government:

Big Data: Overview and Roadmap eglobaltech. All rights reserved.

Texas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson

BIG DATA TRENDS AND TECHNOLOGIES

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

OnX Big Data Reference Architecture

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

SUSE Storage. FUT7537 Software Defined Storage Introduction and Roadmap: Getting your tentacles around data growth. Larry Morris

Open source large scale distributed data management with Google s MapReduce and Bigtable

Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

Agenda. Big Data & Hadoop ViPR HDFS Pivotal Big Data Suite & ViPR HDFS ViON Customer Feedback #EMCVIPR

Transcription:

A Bioinformatics Research & Consulting Group Adding Omics Data to Electronic Health Record, A paradigm Shift in Big Data Modeling, Analytics and Storage management for Healthcare and Life Sciences Organizations Ali Eghlima Ph.D Director of Bioinformatics

Before we start: About me Ali Eghlima Expert BioSystems, EVP, and Director of Bioinformatics Data Scientist, Software/System/Solution Architect Five Years as Sr. Principal Engineer at Raytheon Leading R&D Projects in Enterprise Architecture, Cyber Security, Huge Big Data Analytics, Real-Time Distributed Big Data Collection and Analysis 20-years Career as Senior Consulting Engineer at DEC, Compaq, and HP Primary Technical Expertise Big Data Analytics, Real-time Distributed Computing, High Availability, Cyber Security, Cluster and Cloud Technology, High Performance Computing, Numerical Analysis Pioneer and Advocate in Cluster & Cloud Computing Ph.D from RPI, MS and Engineering degrees from MIT

Agenda Characteristics of Healthcare & life sciences data Review, Data integrity/privacy/cyber Security concerns of major healthcare/research Centers Review current technology, and common systems architecture used for Big Data Analytics in Health Sciences vs other industries. Issues, challenges and potential solutions for real-time and archived data storage managements Present scalable open source computing platform to manage Exabyte class datasets Concluding Remarks 3

Characteristics of Healthcare Data vs Other Data Almost permanent It is being owned by individual Data ownership after individual death is unknown ( offspring, siblings, other family members)

Example: Storage/Dataset Size Health Sciences vs Other Industries Financial Number of Accounts From 10000 to 300 Millions Storage per Account - ~Gigs or less Total Storage From ~Tens of Terabytes to ~300 Petabytes Healthcare Number of Patients From 10000 to 300 Millions Storage per Patient From ~Gigabytes Today to ~ Many Terabytes in future Total Storage From ~ 20 Petabytes to ~600 Exabyte

Example: Storage/Dataset Size Healthcare: Now vs. Future 700000 Total Storage in Petabytes 600000 500000 400000 300000 Total Storage in Petabytes 200000 100000 0 Healthcare Now (without Omics data) Healthcare Future (with Omics data)

Total Archived Capacity Total Archived Capacity, by Content Type, Worldwide, 2008-2015 (Petabytes) 7

It is not just genomic data Nature 494 February 2013 Big biology: The omes puzzle Where once there was the genome, now there are thousands of omes. Nature goes in search of the ones that matter. 8

Data integrity/privacy/cyber Security concerns of major healthcare/research Centers 9

Theft of Healthcare Identity Data Consequences Medical services, devices and prescription drugs Physician information to create fake prescriptions and then resell the medicine online. File false claims to insurance companies and government agencies 10

Theft of Healthcare Identity Data Value Credit Card info $1 Personal Identification Information (PII) for $10- $12 Patient Records for $50 Source: 1 - Medical Identity Fraud Alliance, The Growing Threat of Medical Identity Fraud: A Call To Action, July 2013, accessed at http://medidfraud.org/wp- content/uploads/2013/07/mifa-growing- Threat-07232013.pdf. 2 - David Carr, Healthcare Data Breaches to Surge in 2014, InformationWeek Healthcare, Dec. 26, 2013, accessed at http://www.informationweek.com/healthcare/policy-and-regulation/healthcaredatabreaches-to-surge-in-2014/d/d-id/1113259. 11

Theft of Healthcare Identity Data is Growing 2010 1.42 Million 2011 1.49 Million 2012 1.85 Million Source: Ponemon Institute, Fourth Annual Benchmark Study on Patient Privacy and Data Security, March 2014, accessed at http://lpa.idexpertscorp.com/acton/attachment/6200/f- 012c/1/-/-/-/-/ID%20 Experts%204th%20Annual%20Patient%20Privacy%20%26%20Data%20Security%20Re port%20final%20%281%29.pdf 12

Healthcare Data Security Threat (reported by healthcare provider) Employee negligence Unsecured mobile devices Security gaps with business associates Evolving criminal threats New vulnerabilities under the Affordable Care Act Survey participants had strong reservations about the security of Health Information Exchanges (HIEs): A third said they don t plan to participate in HIEs because they are not confident enough in the security and privacy of patient data shared on the exchanges http://www2.idexpertscorp.com/ponemon-report-on-patient-privacy-datasecurity-incidents/ 13

Technology, and Common Systems Architecture used for Big Data Analytics in Health Sciences vs other industries 14

Cloud Computing? Private Public Community

Private cloud? From Webopedia Private cloud is the phrase used to describe a cloud computing platform that is implemented within the corporate firewall, under the control of the IT department. A private cloud is designed to offer the same features and benefits of public cloud systems, but removes a number of objections to the cloud computing model including control over enterprise and customer data, worries about security, and issues connected to regulatory compliance.

Cloud or Public cloud Network Cloud In telecommunications, a cloud refers to a public or semipublic space on transmission lines (such as T1 or T3) that exists between the end points of a transmission Cloud Computing Cloud computing is a type of computing that relies on sharing computing resources rather than having local servers Consumer - Software as a Service (SaaS) Developers and Architects Platform as a Service (PaaS) IT Pros and system administrators - Infrastructure as a Service (IaaS)

Community Cloud? Centralized Distributed

Centralized Community Cloud? Multi-Tenant Infrastructure Shared Among Several Organizations with Common Computing Concerns/Requirements Higher Level of Security, Privacy, and Performance (Compare to Public Cloud) Pay-as-you-go Billing Structure Cost, less than Private more than Public

Examples: Centralized Community Cloud Industry A Community Cloud XaaS Industry B Community Cloud XaaS Customer C Customer A Customer B June 1, 2011 NYSE Technologies Introduces the World s First Capital Markets Community Platform

Secure/Trusted Distributed Community Cloud Enc/Dec Policy-based Access Control Private Cloud Three Enc/Dec Internet Policy-based Access Control Private Cloud One Enc/Dec Policy-based Access Control Private Cloud Two

Issues, challenges and potential solutions for real-time and archived data storage managements 22

Archiving, Tape Technology Ultirum LTO: Capacity per Tape 6.25 Terabytes Cost (tape) 1.3 cent per GB 250 million LTO tapes have been shipped Total, shipped Capacity ~ 100 Exabyte's Sony's new magnetic tape technology: Capacity - 185 TB per cartridges Announced at the INTERMAG Europe 2014 23

LTO ULTRIUM Technology Roadmap 24

Scalable Open Source Computing Platform to manage Exabyte class datasets Linux Hadoop MapReduce R Accumulo 25

Technology Stack Source: SQRRL Enterprise 2014 26

Partial listing of NoSQL database engines for managing Exabyte class datasets Accumulo Cassandra Couchbase CouchDB ElasticSearch Hbase Hypertable MongoDB Neo4j Redis Riak Scalaris VoltDB 27

Summary Challenges; System performance and Storage Management Storage requirements, and associated computing power and network infrastructure performance will increase by at least three order of magnitude, just to keep up with today computing systems performance Total Patient EHR, Data Storage ~ Zettabyte 28

Summary (cont.) Main challenge; Common data model and data store technology Standardizations and adaptation of common data store and data model technologies 29

Concluding Remarks A Bioinformatics Research & Consulting Group

5 V s of Big Data Volume: From Terabytes, Petabytes, Exabytes,... Variety: Structured data, semi-structured date, unstructured data and images. Velocity: Conducting real-time analytics and ingesting streaming data feeds in addition to batch processing. Value: Using commodity hardware instead of expensive specialized appliances. Veracity: Leveraging data from a variety of domains. 31