Big Data Analytics Best Practices




1 Big Data Analytics Best Practices. Marshall Presser, Federal Field CTO, Greenplum

2 Big Data Makes the Mainstream

3 WHAT DOES IT TAKE?

4 1. New Applications MADlib

5 2. New Skill Sets -- Data Science

6 3. The Right Platforms Structured and Unstructured Data Clusters

7 The Goal: The Predictive Enterprise
Data-driven decisions:
- Deliver maximum business value from all the available data
- Predict outcomes using advanced analytics
- Leverage data science to gain deep insight about the business
- Turn insight into action with new applications

8 Federal Agency Requirements for Big Data Analytics
Intelligence Community:
- Counter-Terrorism
- Counter-Intelligence
- Cyber-Security
- Intelligence Analysis
Department of Defense:
- Data to Decisions: reduce the cycle time and manpower requirements
- Cyber Science & Technology: efficient and effective cyber capabilities
- Counter Weapons of Mass Destruction: secure, monitor, track and eliminate weapons of mass destruction
Financials:
- Fraud Detection
- Insider Trading
- Risk Analytics
Homeland Security:
- Identity Verification
- Transportation Security
- Border Security
- Immigration Control
- Investigations on massive amounts of data
- Maritime Domain Awareness
Department of Justice:
- Counter-terrorism & Foreign Intelligence Defense
- Organized Crime Investigations
- Drug Enforcement & Illicit Drug Traffic Reduction
Healthcare & Citizen Benefits:
- Fraud, Waste & Abuse Detection
- Accurate Patient Identification & Treatment
- Accurate Benefit Distribution and Monitoring
- Healthcare Exchanges

9 Big Data Initiatives in the US Federal Government
- Economic forecasting: mortgage foreclosures
- Health economics: fraudulent claim analytics
- Internet security: web log analytics
- Climatology: numeric weather forecasting and storm path prediction
- Nuclear energy: simulations of subatomic reactions, power from fusion
- Healthcare: individually based optimal treatment patterns
- Genetics: drug therapy, Human Genome project
- Medicine: advanced imaging techniques
- Government operations: waste, fraud, abuse, optimal operations

10 To Hadoop or Not to Hadoop?
- SQL: strong ecosystem, rich tool set, large developer community, very efficient on structured data
- Hadoop: more versatile on unstructured data, cost-efficient, schema on read
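"Schema on read" means raw records are stored untouched and a structure is imposed only when a query reads them. The following is a minimal Python sketch of that idea, not Hadoop code: the sample records, the field names, and the helper read_with_schema are all hypothetical and chosen only for illustration.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical raw records, stored exactly as they arrived (no schema enforced on write).
RAW_LINES = [
    '{"user": "alice", "bytes": 512, "path": "/index.html"}',
    '{"user": "bob", "bytes": "2048"}',
    "not json at all",
]

def read_with_schema(
    lines: Iterable[str], schema: Dict[str, Callable]
) -> Iterator[dict]:
    """Apply a schema (field name -> cast function) while reading.

    Records that cannot be parsed or do not fit the schema are skipped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable line: ignore it at read time
        if not isinstance(record, dict):
            continue
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, ValueError, TypeError):
            continue  # record lacks a field or a value cannot be cast

# Two different "views" over the same raw data, each defined at read time.
USAGE_SCHEMA = {"user": str, "bytes": int}
PATH_SCHEMA = {"path": str}

usage_rows = list(read_with_schema(RAW_LINES, USAGE_SCHEMA))
path_rows = list(read_with_schema(RAW_LINES, PATH_SCHEMA))
```

A relational database would instead reject or coerce the second and third records at load time (schema on write); here the same stored lines serve two different schemas without being reloaded.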

11 Hadoop is not Nirvana
But there are many people using it despite the problems:
- Data movement in and out of HDFS is cumbersome
- Name Node failure
- Written in Java, performance issues
- 3x data duplication is wasteful
- Code base immature compared to SQL
- Management and admin not well developed
- Not a lot of Map/Reduce expertise
- Performance can be erratic, sub-standard
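The "3x data duplication" point refers to HDFS block replication, whose default factor is 3. A rough sizing sketch in Python follows; the function name and the 100 TB figure are illustrative assumptions, and the calculation ignores compression and temporary space.

```python
def raw_capacity_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Raw disk needed to hold `logical_tb` of data at a given replication factor."""
    return logical_tb * replication_factor

# Hypothetical example: 100 TB of logical data at the default replication factor.
needed_tb = raw_capacity_tb(100)
usable_fraction = 100 / needed_tb  # share of raw disk that holds unique data
```

Under these assumptions, only about a third of the raw disk holds unique data, which is the cost the slide calls wasteful.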

12 ETL: The Hadoop Killer App?
Data processing on Hadoop:
- Data volume or lack of structure overwhelms ETL tools
- Use Hadoop to process transformations on raw data
- Load summarized data into an analytical database (GPDB)
Leverage the best of RDBMS & NoSQL:
- Integration of structured and unstructured data
Tackle petabyte-scale datasets:
- Sensor networks, social applications, online advertising apps
New data for ad hoc analysis and modeling:
- Social media sentiment analysis, online advertising optimization, computer security
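The pattern on this slide (transform raw data with map and reduce steps, then load only the summary into an analytical database) can be sketched in plain Python. This is an illustration under stated assumptions, not Hadoop or Greenplum code: the log format, the function names, and the table daily_traffic are hypothetical, and an in-memory SQLite database stands in for the analytical database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw log lines: "<day> <host> <status> <bytes>".
RAW_LOG = [
    "2014-03-01 host-a 200 512",
    "2014-03-01 host-b 500 128",
    "2014-03-01 host-a 200 256",
    "malformed line",
    "2014-03-02 host-a 404 64",
]

def map_line(line):
    """Map step: emit ((day, host), bytes) pairs; drop lines that do not parse."""
    parts = line.split()
    if len(parts) != 4 or not parts[3].isdigit():
        return []
    day, host, _status, size = parts
    return [((day, host), int(size))]

def reduce_pairs(pairs):
    """Reduce step: request count and total bytes per (day, host) key."""
    totals = defaultdict(lambda: [0, 0])
    for key, size in pairs:
        totals[key][0] += 1
        totals[key][1] += size
    return {key: tuple(value) for key, value in totals.items()}

def load_summary(conn, summary):
    """Load step: write only the summarized rows into the SQL database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_traffic "
        "(day TEXT, host TEXT, requests INTEGER, bytes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_traffic VALUES (?, ?, ?, ?)",
        [(d, h, n, b) for (d, h), (n, b) in sorted(summary.items())],
    )
    conn.commit()

pairs = [pair for line in RAW_LOG for pair in map_line(line)]
summary = reduce_pairs(pairs)

conn = sqlite3.connect(":memory:")  # stand-in for the analytical database
load_summary(conn, summary)
rows = conn.execute(
    "SELECT host, SUM(bytes) FROM daily_traffic GROUP BY host ORDER BY host"
).fetchall()
```

The raw lines never reach the SQL side; analysts query the much smaller summary table with ordinary SQL, which is the division of labor the slide proposes.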

13 Hybrid Solutions: Using SQL and Hadoop in a single application
[Diagram labels: Raw Data (Relational, Text, Video, Audio, Logfiles); Hadoop Cluster; Interesting Stuff; Greenplum MPP Database; Archive]

14 How To Get Started
- Start with a small, manageable first project.
- Find a first problem that is important, but not mission critical.
- Show success and ROI.
- Take an existing application that is too slow or not answering questions.
- Involve LOB users from the beginning.
- Set reasonable expectations.
- Avoid extensive coding and development for the first project.
- Don't boil the ocean.
- Hire outside expertise; train your staff.