Big Data and the Data Lake February 2015
My Vision: Our Mission
Data Intelligence is a broad term for the real, meaningful insights that can be extracted from your data: truths that you can act on. The goal of any project, interface, or other work by this team should always be to get the data intelligence out of the data.
Precision Medicine at Wake Forest Baptist
Precision medicine is changing the landscape of cancer treatment at Wake Forest Baptist, allowing us to provide our patients with more precise, targeted therapies. Using the latest DNA sequencing technology, our experienced team of oncologists and geneticists can identify the genetic makeup of a patient's tumor and tailor treatment to the specific cancer mutations (abnormalities). Our goal is to provide the best individualized cancer therapy designed for you.
Targeted Cancer Therapy: No two cancers are alike. Every cancer has a unique genetic code, or blueprint, that shapes how it spreads and grows. Through genomic sequencing, our physicians can uncover genetic abnormalities or changes in a tumor that drive the growth of cancer. We then select treatments to specifically target these genes and attack the cancer, while sparing healthy tissues that the body needs. For adults and children who have active cancer and whose treatment is no longer working, precision medicine may be an option.
Patient-Centric/Care Coordination Focus
Wake Cloud: Hybrid Cloud Solution
Greater agility, leveraging existing skills and processes.
- Self Service
- EMC Cloud Service Provider
- Management & Orchestration
- Converged Infrastructure
- Software-Defined Data Center: Compute, Storage, Network
- VMware vCloud Air
- ViPR Software-Defined Storage
- Data Protection: VMAX, VNX, Isilon, Data Domain, Avamar, VPLEX & RP
Business Benefits
Enterprise Information Market Trends (demand over time, 2015-2016)
- Information Discovery and Visualization
- BI on the Go
- Content Intelligence and Discovery
- BI on the Cloud
- Big Data
- Advanced Analytics: Predictive, Statistical, Data & Decision Sciences
- Social Decision Making: Social Media and Analytics
- Managing the Information Value Chain: Big Data Architectures, Quality, Governance
- Cohesive Information Architectures with Master Data Management
Journey to the Data-Driven Enterprise
Steps:
- Archive (data migration): realize cost efficiencies and extend the life of existing systems
- Insights (data analysis): integrate all existing data to generate business insights
- Apps (data-driven apps): build apps to assist with or take automated actions from the insights generated
- Business Models (business transformation): create new revenue streams leveraging new data and new insights
- Repeatable Framework (experimentation platform): a platform for experimenting with data-driven business models and innovation
Technology progresses from a Data Lake to Platform as a Service; the target audience progresses from Manager to IT Leaders to Business Leader to CEO.
"Data lakes take advantage of commodity cluster computing techniques for massively scalable, low-cost storage of data files in any format." (Oliver Halter, PricewaterhouseCoopers LLP)
Healthcare Data Lake Concept: Wake Lake
Wake Data Lake Functional Modules
Pivotal Functional Big Data Module (76 TB usable, with allowance for growth):
- GemFire XD brings real-time data processing and analytics capabilities
- Future home of High Performance Computing, Research, and Translational Medicine
- Integration of relational DBs with unstructured data
Greenplum Functional MPP Module (27.5 TB):
- Greenplum DB module (required)
- Future home of Enterprise Data Warehouse 2.0, TDW, and other relational databases
- Commercial-quality tools to manage big data, allowing relational data access (SQL)
Platform roles: in-memory data grid, analytic data warehouse, applications, BI/analytics tools, (big) data staging platform, data science (a staging sketch follows below).
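For the data staging role listed above, here is a minimal sketch of staging flat files into the Greenplum module through a gpfdist external table (a standard Greenplum loading pattern). The hosts, ports, and table and column names are hypothetical illustrations, not part of the Wake Lake design.

```python
# Minimal sketch: staging flat files into Greenplum via a gpfdist external table.
# gpfdist and the external-table syntax are standard Greenplum features; the
# hosts, ports, schemas, and column names below are hypothetical examples.
import psycopg2

DDL_AND_LOAD = """
CREATE EXTERNAL TABLE ext_stage_labs (
    patient_id  bigint,
    lab_code    text,
    result_val  numeric,
    drawn_at    timestamp
)
LOCATION ('gpfdist://etl-host.example.org:8081/labs_*.csv')
FORMAT 'CSV' (HEADER);

-- stage.labs is a pre-existing (hypothetical) target table.
INSERT INTO stage.labs SELECT * FROM ext_stage_labs;
"""

conn = psycopg2.connect(host="greenplum-master.example.org",
                        dbname="edw", user="etl", password="changeme")
with conn, conn.cursor() as cur:
    cur.execute(DDL_AND_LOAD)  # loads in parallel through all Greenplum segments
```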
EMC Data Lake Reference Architecture
Apache Hadoop is at the heart of a data lake. EMC supports data lakes with enterprise management and enhanced data services provided by Pivotal.
- HAWQ (Advanced Database Services): a full-featured SQL interface to data in Hadoop
- Spring XD: the Spring programming framework lets you build Hadoop applications in a standardized, extensible fashion
- Hadoop Virtualization Extensions (HVE): Pivotal integrates the open-source Hadoop Virtualization Extensions, bringing the flexibility of virtual infrastructure to Hadoop
- Command Center: makes Pivotal HD enterprise-ready with automated deployment, configuration, monitoring, and control
- GemFire XD: brings real-time data processing and analytics capabilities to the 3rd platform
- Data Loader: high-performance data loading built to ingest hundreds of terabytes an hour
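To illustrate the HAWQ SQL interface described above, here is a minimal sketch of an ad hoc query over Hadoop-resident data. HAWQ speaks the PostgreSQL wire protocol, so a standard Postgres driver works; the host, database, table, and column names are hypothetical examples.

```python
# Minimal sketch: querying Hadoop-resident data through HAWQ's SQL interface.
# Connection details, table, and columns are hypothetical examples.
import psycopg2

conn = psycopg2.connect(
    host="wakelake-hawq.example.org",  # hypothetical HAWQ master host
    port=5432,
    dbname="wakelake",
    user="analyst",
    password="changeme",
)

with conn, conn.cursor() as cur:
    # Ordinary ad hoc SQL over files stored in HDFS.
    cur.execute(
        """
        SELECT encounter_year, COUNT(*) AS encounters
        FROM patient_encounters
        GROUP BY encounter_year
        ORDER BY encounter_year
        """
    )
    for year, encounters in cur.fetchall():
        print(year, encounters)
```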
And the Data Goes Where?
Your Use Case                  | GemFire                                  | Greenplum                  | Pivotal HD with HAWQ
When do I need it?             | Now                                      | Later                      | Later
What do I want to do with it?  | Singular event processing, transactions  | Structured analytics       | Exploratory analytics
How will I query and search?   | Structured, regular                      | Ad hoc SQL                 | Unstructured/unknown
How do I need to store it?     | Temporary                                | I do, but not required to  | I must, and I am required to
Where is it coming from?       | Events/stream, file, ETL                 | File, ETL                  | File, ETL
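As a toy illustration of the routing logic in the table above, here is a small helper that maps two of the table's questions to a suggested platform. The function and its categories are illustrative only and not part of any Pivotal tooling.

```python
# Toy sketch of the "where does the data go?" decision table above.
# The categories and routing rules mirror the table; nothing here is a real API.

def route_workload(needed_now: bool, analytics_style: str) -> str:
    """Suggest a target platform from two of the table's questions.

    analytics_style: "events" (singular event processing / transactions),
                     "structured" (structured analytics, ad hoc SQL), or
                     "exploratory" (unstructured / unknown queries).
    """
    if needed_now or analytics_style == "events":
        return "GemFire"             # real-time, temporary, event/stream sources
    if analytics_style == "structured":
        return "Greenplum"           # structured analytics, ad hoc SQL, file/ETL
    return "Pivotal HD with HAWQ"    # exploratory analytics over raw files

print(route_workload(needed_now=True, analytics_style="events"))        # GemFire
print(route_workload(needed_now=False, analytics_style="structured"))   # Greenplum
print(route_workload(needed_now=False, analytics_style="exploratory"))  # Pivotal HD with HAWQ
```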
In-Database Analytics: Detail
Data Access & Query Layer: ODBC, JDBC, SQL
In-Database Analytics (inside the Greenplum Database):
- Embedded: Greenplum DB Embedded Analytics, Greenplum Spatial, Greenplum Text
- Partner: SAS Scoring Accelerator, SAS/HPA High Performance Analytics, SAS Access, SAS Grid
- Open-Source: MADlib open-source analytical algorithms
- Customized: customized MADlib
- User-Written: user-written analytical algorithms
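To make the MADlib layer above concrete, here is a minimal sketch of training a linear regression in-database with MADlib's linregr_train function, run through a standard PostgreSQL driver. The connection settings, table, and column names are hypothetical examples.

```python
# Minimal sketch: running a MADlib algorithm in-database on Greenplum.
# madlib.linregr_train() is a real MADlib function; the table, columns,
# and connection settings below are hypothetical.
import psycopg2

conn = psycopg2.connect(host="greenplum-master.example.org",
                        dbname="edw", user="analyst", password="changeme")

with conn, conn.cursor() as cur:
    # Train a linear regression entirely inside the database (no data movement).
    cur.execute("""
        SELECT madlib.linregr_train(
            'lab_results',                       -- source table
            'lab_results_model',                 -- output (model) table
            'ldl_cholesterol',                   -- dependent variable
            'ARRAY[1, age, bmi, statin_dose]'    -- independent variables
        )
    """)
    # Inspect the fitted coefficients and goodness of fit from the model table.
    cur.execute("SELECT coef, r2 FROM lab_results_model")
    print(cur.fetchone())
```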
Chorus Analytics Studio
- Create, store, and share visual analytic workflows
- Build analytic flows for Greenplum, HAWQ, and Hadoop
- Powered by Alpine and MADlib
- 75+ drag-and-drop operators for the entire analytics process
- MADlib algorithms run in-database
Data & Analytics Technology Ecosystem: Analytics, Business Intelligence, Data Integration, Social Media Services, Data Modeling
How Does This Work in Practice?
Store Everything: obsessively collect data, keep it forever, and put the data in one place (see the sketch after this list).
Analyze Anything: cleanse, organize, and manage your data lake; make the right tools available; use the resources wisely to compute, analyze, and understand data.
Build the Right Thing: use insights to iteratively improve your product.
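As one concrete way to "put the data in one place," here is a minimal sketch of landing a raw extract in HDFS unchanged, using the standard hdfs dfs command line. The file paths and landing-zone layout are hypothetical, and a configured Hadoop client is assumed.

```python
# Minimal sketch: landing a raw file in the data lake (HDFS) as-is.
# Uses the standard `hdfs dfs` CLI via subprocess; paths are hypothetical
# and the Hadoop client is assumed to be installed and configured.
import subprocess
from datetime import date

local_file = "/data/exports/encounters_2015-02-01.csv"      # hypothetical extract
landing_dir = f"/lake/raw/encounters/{date.today():%Y/%m/%d}"

# Create the dated landing directory, then copy the file in unchanged.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", landing_dir], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", local_file, landing_dir], check=True)
```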
Questions?