Delivering Value with Big Data. Copyright 2014 World Wide Technology, Inc. All rights reserved.




WWT Big Data Leadership Team

James Bigger, Principal Consultant: 20 years of management consulting and entrepreneurial experience. Expertise in financial services, insurance and telecom. Prior consulting experience with Opera Solutions and A. T. Kearney. Ph.D. in Physics from Oxford University.

Brian Vaughan, Principal Consultant: 15 years of management consulting, analytics and software experience. Expertise in healthcare and insurance. Prior experience with Opera Solutions, Mitchell Madison Group and Broadlane. Ph.D. in Physics from Stanford University.

Chris Ward, Principal Consultant: 20 years in management consulting and executive leadership. Expertise in retail, marketing, hospitality and financial services. Prior consulting experience with Opera Solutions and The Boston Consulting Group. BA from Princeton University, MBA from the University of Virginia.

Jason Lu, Chief Scientist: 18 years of analytics and software development experience. Expertise in financial services, healthcare, insurance, retail and marketing science. Prior analytics development experience at Opera Solutions, FICO and J.D. Power and Associates. Ph.D. in Physics from Stanford University.

Matt DuBell, Principal Systems Engineer: Over 20 years of experience in a range of IT and security disciplines. Responsible for deploying large, secure, Hadoop-based platforms for the U.S. Government. 10 years of international experience implementing networking and virtual data center environments. Undergraduate degree from AIU.

Prem Jain, Principal Architect: Over 20 years of experience in the enterprise datacenter, building innovative solutions in Big Data, storage, HPC, virtualization, data migration and enterprise applications. Formerly lead architect for NetApp's Big Data solutions, and led the development of the FlexPod Select solutions. B.S. in Electrical Engineering.

Mike McGlynn, VP Emerging Technologies: 25 years of government service at the National Security Agency. At the NSA, Mike led the design and development of next-generation cyber systems: real-time systems, situational awareness tools, and command and control capabilities. M.S. in Computer Science from Johns Hopkins. B.S. in Mathematics.

Yoni Malchi, Engagement Manager: Over 7 years of experience in management and analytics consulting. Led engagements in telecom at Opera Solutions. Previous experience performing predictive analytics for NASA and USAF at The Aerospace Corporation. Ph.D. in Mechanical Engineering from Pennsylvania State University.

Chris Infanti, Engagement Manager: Over 8 years of experience in analytics consulting and delivery management. Ran engagements in wealth management, corporate security, marketing, education and transportation at Opera Solutions and IBM Global Business Services. BS in Mathematics from Georgetown University.

Jamie Milne, Engagement Manager: Over 7 years of management consulting and entrepreneurial experience. Expertise in the financial services, travel, and retail sectors across the US and Europe. Led Big Data strategy and analytical engagements at Opera Solutions. MSci in Astrophysics from the University of Cambridge.

Big Data Capabilities
Big Data projects operate at the intersection of business, science, and technology.

BUSINESS
- Highlights areas of high opportunity
- Drives focus on value creation

DATA SCIENCE (illustrated on the slide by the Fourier series $f(x) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L} \right)$)
- Solves business problems
- Proves solutions based on empirical evidence

TECHNOLOGY
- Captures and stores data on business
- Facilitates the operation of data science

The Big Data Software Stack
The big data ecosystem includes open source and proprietary distributions that span the stack from ingest through analytics.

[Diagram: a layer-by-layer table of the stack. The cell-to-layer alignment did not survive extraction, so the labels are grouped by type below.]

User/machine workflow: ACQUIRE, ORGANIZE, ANALYZE, DECIDE
Layers: INGEST, FILE SYSTEM/DATABASE, MANAGEMENT, TRANSFORM, ANALYTICS DATABASE, ACCESS/QUERIES, DATA ANALYTICS
Table columns: LAYER, PROPERTIES, OPTIONS, EXAMPLES OF PRODUCTS, INTEGRATED OFFERINGS
Properties: Real Time & Batch; Optimized for high-volume reads; Flexible, Compressed, Fast Read; Fast, Scalable; Flexible interfaces to accept data
Options: OLAP; Natural Language; Custom Analytics; Custom APIs; SQL; Columnar; In Memory; Parallel RDBMS; Provisioning; Maintenance; HDFS; Parallel, Distributed NoSQL (Document, Key-Value, Wide Column); Batch; Streaming
Examples of products: R, PYTHON, SAS, SPSS, SPLUNK, TALEND, COGNOS, ORACLE OBIEE, MICROSTRATEGY, BUSINESS OBJECTS, TERADATA, NETEZZA, GREENPLUM, VERTICA, MapReduce, SQL, PIG, HIVE, OOZIE (job flow), ZOOKEEPER, HADOOP, CASSANDRA, HBASE, MONGODB, CLOUDERA, HORTONWORKS, MAPR, PIVOTALHD, SQOOP, FLUME
Integrated offerings: EMC/PIVOTAL HD/GREENPLUM; HP/VERTICA/CLOUDERA; ORACLE BIG DATA EXADATA/EXALYTICS; IBM INFOSPHERE BIGINSIGHTS; SAP HANA; TERRACOTTA BIGMEMORY
Legend labels: OPEN SOURCE, COMMERCIAL, PLUS OPEN SOURCE SOLUTIONS
Data source categories: Enterprise Structured, Enterprise Unstructured, 3rd Party, Web/Unstructured
Data source examples: ODS, Data Warehouse, Call Center, Server Logs, Financial, Demographic

Dual Approach to Delivering Big Data Solutions
WWT offers customers both strategic and tactical approaches to derive value from the application of Big Data analytics and technology.

BUSINESS IMPACT: Extract value from data to drive multiple use cases
Consulting services: Big Data Strategy, Big Data POCs, Big Data Sustainment

TECHNOLOGY OPTIMIZATION: Accomplish data tasks faster, cheaper, better
Offerings: Data Warehouse Optimization, SAP HANA Implementation

Defining The Opportunity Is The Starting Point
The power of Big Data lies in bringing together data in a timely fashion from sources within and external to the enterprise, structured and unstructured, to create a complete view of critical issues. That complete view enables advanced analytics to unlock key insights that drive significant value.

Outcome: Clearly defined use cases with the potential to deliver significant value by distilling vast data into new, previously unknowable intelligence
Analytics: Advanced machine learning techniques to analyze data and mine for insights to drive critical decisions
Data: Structured or unstructured, internal or external, requiring new methods of storage/integration
Technology: Emerging/new technology stacks using scalable, distributed architectures

Case Study: City of Dallas
Big Data Initiative - Overview

OBJECTIVE
Formulate a Big Data strategy for the City of Dallas, assessing potential opportunities and creating an implementation roadmap to capture them.

SCOPE
- Dallas Police Department
- Court and Detention Services
- Dallas Water Utilities
- Code Compliance
- Office of Financial Services
- Human Resources
- Department of Public Works
- Sustainable Development & Construction Services
- Dallas Fire-Rescue
- Equipment & Building Services
- Streets

KEY DELIVERABLES
Data Environment Assessment
- View of the current data environment at the source level, including volume of data
- Summary of current data challenges, both organization-wide and by department where applicable
- High-level summary of external data sources
Big Data Needs Document
- Definition of Big Data in the City of Dallas context
- Detailed documentation of 30+ use cases, outlining required data sources, data sharing needs, and a complexity/value breakdown
- Key dependencies and considerations for each use case
Data Management Strategy
- Documentation of key strategic short-term and long-term objectives in areas of infrastructure technology and data management
- High-level roadmap that addresses data quality, governance, operations, and security, taking key objectives into account
- Proposed Big Data roles and responsibilities
Big Data Roadmap
- Use cases organized in a roadmap timeline with clearly outlined data and technology architecture dependencies, and documentation of the criteria used to prioritize use cases
- Data management strategy timeline that shows key milestones for both data management policy enactment and organizational changes
- Approach to deploying the capabilities needed to implement the roadmap

Case Study: City of Dallas
Property 360
Description: Create a 360-degree view of a property using data from multiple departments, raising the effectiveness and awareness of Code Compliance inspectors.

APPROACH
Integrate Data Sources
Join information on a property from multiple data sources across departments, including:
- SDC Posse for building permit and owner information
- DWU SAP Billing for current tenant information
- Code CRMS for Code inspection history
- DFR for fire inspection and incident history
- DPD RMS for police incident history
- Third-party information on area demographics
Make Information Available to Inspectors
Make data available to inspectors in the field in ways that will impact their operational effectiveness:
- Create a simple, mobile device-accessible visualization of data for a given address
- Basic information on building owner and occupancy history to decrease time spent on looking up tenant information
- Timeline of building inspection history to avoid repeat inspections and gain intelligence from other departments
- DPD and demographics data incorporated to increase safety

EVALUATION
Strategic Alignment: Useful for many other departments, including DPD, DFR, DWU, SDC
Considerations:
- Field mobility will increase effectiveness of program
- Security is important, especially with DPD data
Risks: Ability to identify keys to join data across multiple data sets
Dependencies:
- Consolidation of data from multiple departments
- Visualization (preferably mobile)
Impact, Complexity:
- Increase efficiency and safety of Code Compliance inspectors by making all property data available
- Data set created here will be useful to other departments, and will be a foundation for other use cases
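The join-key risk noted for Property 360 can be made concrete with a small sketch. It assumes, purely for illustration, that each departmental extract carries a free-text `address` field and that a normalized address is an acceptable join key. The source names, field names, and sample records below are hypothetical and are not taken from the City of Dallas systems.

```python
import re
from collections import defaultdict

# Hypothetical suffix table; a real project would need a far fuller one.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "boulevard": "blvd"}

def normalize_address(raw):
    """Lower-case, strip punctuation, and unify a few street suffixes."""
    text = re.sub(r"[^a-z0-9 ]", " ", raw.lower())
    return " ".join(SUFFIXES.get(token, token) for token in text.split())

def build_property_view(sources):
    """Group records from every source under one normalized address key."""
    view = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for record in records:
            view[normalize_address(record["address"])][source_name].append(record)
    return {key: dict(by_source) for key, by_source in view.items()}

# Illustrative records only (assumed shapes, not real data).
sources = {
    "permits": [{"address": "100 Main Street", "owner": "A. Owner"}],
    "code_inspections": [{"address": "100 MAIN ST.", "result": "pass"}],
    "fire_incidents": [{"address": "200 Elm Avenue", "type": "alarm"}],
}
property_view = build_property_view(sources)
```

Here the two spellings of the Main Street address land under one key, which is the behavior the use case depends on. Addresses that differ in unit numbers or misspellings would still need a stronger matching strategy.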

Case Study: Casino
Visibility Into A Customer's Journey
The ability to combine and analyse multiple data sources very rapidly to understand the hidden drivers of individual customer behaviour enables the changing of long-term behaviour through personalized curricula and aspirational treatments.

Internal Data
- Casino: ratings by game type (tables, slots, poker)
- Hotel: hotel reservations and transactions
- Marketing: offers and mails
- Customer: behavioral and profile indicators
External Data
- Demographics: demographic appended data and customer geocoding

Longitudinal 360° customer view

Customer Profile
- Customer: XXXXXXX
- Male, 52
- Resides in Orlando, FL
- 2 trips in 2010: table only, no slot play
- Zip code annual household income: $100,000

Customer Activity (timeline spanning Feb. 2011 to Jun. 2011)
- Feb. 1: $75 free slot play offer received by email. No response
- Mar. 3: 2 free nights offer (Apr 6-8) for 2 received by email
- Apr. 6: Checked in at 2:15pm with his wife, ordered room service at 6pm. Played tables and poker for 3 hrs 20 mins (ADT $345)
- Apr. 7: No play. Ate at restaurant at 12:30pm
- Apr. 8: Played tables for 2 hrs (ADT $180). Joint account created. Checked out at 11:40am
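A longitudinal customer view of this kind amounts to interleaving time-stamped events from separate systems. The following is a minimal sketch under stated assumptions: each feed is already sorted by time, the year 2011 is taken from the timeline labels, and the feed names (`marketing`, `hotel`, `dining`) and tuple layout are hypothetical rather than the casino's actual schema.

```python
from datetime import datetime
from heapq import merge

# Each feed holds (timestamp, source, description) tuples, pre-sorted by time.
# Feed names and the tuple layout are assumptions made for this sketch.
marketing = [
    (datetime(2011, 2, 1, 0, 0), "marketing", "$75 free slot play offer sent by email"),
]
hotel = [
    (datetime(2011, 4, 6, 14, 15), "hotel", "checked in"),
    (datetime(2011, 4, 8, 11, 40), "hotel", "checked out"),
]
dining = [
    (datetime(2011, 4, 7, 12, 30), "dining", "ate at restaurant"),
]

def build_timeline(*feeds):
    """Merge pre-sorted event feeds into one chronological list."""
    return list(merge(*feeds, key=lambda event: event[0]))

timeline = build_timeline(marketing, hotel, dining)
for when, source, what in timeline:
    print(f"{when:%b %d %H:%M}  {source:<9}  {what}")
```

`heapq.merge` is used because it streams sorted inputs without loading and re-sorting everything, which matters more as the number of feeds and events grows.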

Case Study: Consumer Electronics
Social Media Analytics
Social media tools typically focus on monitoring past and present activity. Predictive analytics allows users to identify important threads and intervene early, shifting the focus to future activity.

- Word cloud shows ongoing buzz and sentiment
- Tabular view shows emerging themes and sentiment, virality score and recommended time window for action
- Details on particular themes or attributes
- Forecasts of trends, with a mechanism to intervene in attributes that are going viral

Case Study: Credit Card
First-Party Fraud
A Fortune 50 financial credit card issuer transformed its approach to detecting bust-out fraud.

BACKGROUND
- Bust-outs drove $350MM+ in losses annually
- Over 90% of accounts were identified too late in the process to stop fraud; scoring accounts in near-real time is an analytic and business necessity

[Figure: Current Bust-out Detection Timeliness. Frequency distribution over days, with an axis running from < -14 to > 5. 91% of cases fall under "Bust-out before detection"; the remainder fall under "Detection before bust-out".]

APPROACH & RESULTS
- Customer activity patterns were monitored daily to identify patterns predictive of bust-outs
- A multitude of new metrics (e.g. transaction activity, payment activity and other variables) were defined and used in the detection algorithm
- A new neural-net-based predictive model significantly improved detection accuracy and detected bust-outs 5 days earlier

[Figure: Model Lift Curve. Bust-out capture rate (0% to 100%) against population capture (0% to 100%) for four curves: Neural Network, Logistic Model, Existing Score, Random.]

Benefits from predictive model
Reduce bust-out losses through:
- Predicting bust-out accounts earlier
- Prioritizing predicted cases to increase the manual review hit rate and the number of bust-outs detected

Impact             Old   New
Lead Time (days)   -     5
Action Rate (%)    7     25
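A lift curve like the one on this slide plots, for each share of the population reviewed in score order, the share of all bust-outs captured. The sketch below shows one common way to compute those points. The scores and labels are made-up illustration data, and `capture_curve` is a hypothetical helper, not the issuer's model or scoring code.

```python
def capture_curve(scores, labels, steps=10):
    """Return (population fraction, fraction of positives captured) pairs.

    Accounts are ranked from highest to lowest score; at each cutoff we
    count how many of the true bust-outs (label 1) fall above the cutoff.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(label for _, label in ranked)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    curve = []
    for step in range(1, steps + 1):
        cutoff = round(len(ranked) * step / steps)
        captured = sum(label for _, label in ranked[:cutoff])
        curve.append((step / steps, captured / total_positives))
    return curve

# Illustrative data only: 10 accounts, 3 of which are bust-outs.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
curve = capture_curve(scores, labels, steps=5)
```

A model with no predictive power would trace the diagonal (reviewing 20% of accounts captures about 20% of bust-outs), which is the "Random" reference line. The further a curve bows above that diagonal, the more bust-outs a fixed review budget catches.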


Current Data Warehouse Environment

[Diagram: Source Systems (Operational Systems such as ERP and CRM, Third Party Data, Unstructured Data) feed Data Preparation (ETL, e.g. Informatica), which loads the Data Warehouse (e.g. Teradata, Netezza) at $17K/TB. The warehouse holds Hot Data and Cold Data and runs further Data Preparation (ELT). An Access Layer serves Reporting & Analytics. Numbered callouts 1 to 4 are explained below.]

1. Large amounts of unstructured data do not make it into the DW due to rigid schema
2. Preparation of data for the warehouse discards potentially valuable data
3. Additional preparation runs on the DW, increasing storage and decreasing performance
4. Cold data dominates DW storage and is rarely accessed by end users

Data Warehouse Optimization (DWO)

[Diagram: Source Systems (Operational Systems such as ERP and CRM, Third Party Data, Unstructured Data) load All Data into Hadoop at ~$2K/TB, where Data Preparation runs. Hot Data moves on to the Data Warehouse (e.g. Teradata, Netezza) at ~$17K/TB. An Access Layer serves Reporting & Analytics. Numbered callouts 1 to 4 are explained below.]

1. Unstructured data can now be loaded into Hadoop in native format
2. Low-cost Hadoop environment enables retention of all source data for analysis
3. Data warehouse performance increases and storage cost decreases
4. Users can access Hadoop directly for some analytics and reports, further decreasing DW storage and processing requirements
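The storage argument behind DWO can be checked with simple arithmetic. The sketch below uses the two per-terabyte figures quoted on the slides ($17K/TB for the warehouse, ~$2K/TB for Hadoop). The 100 TB warehouse size and 70 TB of cold data are hypothetical inputs chosen only to illustrate the calculation. The model is deliberately simplified: it counts storage cost only and ignores migration effort, licensing, and compute.

```python
DW_COST_PER_TB = 17_000      # from the slides: data warehouse, about $17K/TB
HADOOP_COST_PER_TB = 2_000   # from the slides: Hadoop, about $2K/TB

def storage_cost(total_tb, cold_tb):
    """Compare storage cost before and after moving cold data to Hadoop.

    Simplified model: before, everything sits in the warehouse; after,
    only hot data stays there and cold data is held in Hadoop.
    """
    hot_tb = total_tb - cold_tb
    before = total_tb * DW_COST_PER_TB
    after = hot_tb * DW_COST_PER_TB + cold_tb * HADOOP_COST_PER_TB
    return before, after

# Hypothetical sizing: 100 TB warehouse, 70 TB of it rarely accessed.
before, after = storage_cost(total_tb=100, cold_tb=70)
savings = before - after
print(f"before: ${before:,}  after: ${after:,}  savings: ${savings:,}")
```

Under these assumed inputs the saving comes almost entirely from the cold fraction, so estimating how much warehouse data is genuinely cold is the number worth measuring first.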

Advanced Technology Center

Services: Demonstrations, Workshops, Hands-on Labs, Proofs of Concept, Advisory Services, Benchmarking
Engagement stages: EXPLORE, EVALUATE, ARCHITECT, IMPLEMENT

NETWORK
- Next Generation Networking
- Nexus (7K, 5K, 3K & 2K)
- Virtual Networking (Nexus 1000v)
- OTV, LISP, Fabric Path
- Layer 2 Extension
- DR/BC Networking
SECURITY
- Cybersecurity Solutions
- BYOD (Bring Your Own Device)
- Secure Mobility Jukebox
- ISE & RSA
- ASA 1000v
- VSG (Virtual Security Gateway)
COLLABORATION
- Unified Communications (also on UCS)
- Tandberg Video
- VXI (View and XenDesktop)
- WebEx, Call Center and Collaboration Solutions
- Phones, Backpacks and Soft Phone Clients
- TelePresence and Business Video Solutions
DATA CENTER
- Vblock, FlexPod and CloudSystem Matrix
- EMC and NetApp Storage
- vSphere / XenServer
- vCloud Director
- VDI (View / XenDesktop)
- Cisco CIAC and BMC CLM
- EMC's UIM and Cloupia
- FAST
- MDC (Mobile Data Center) Solutions
BIG DATA
- Cisco UCS C220, C240
- HP DL380, Nexus 2200, UCS 6296
- FlexPod Select, Isilon storage
- Cloudera, MapR, PivotalHD
- Cloud Foundry
- Velocidata Appliance
- Next Generation provisioning tools

Local, DAS, and NAS Infrastructures in the ATC

[Diagram: a matrix of four reference architectures by stack layer. Column alignment did not survive extraction, so each row below lists its entries in slide order.]

REFERENCE ARCHITECTURE 1: HP, Internal Local Storage
REFERENCE ARCHITECTURE 2: UCS, NetApp, Direct Attached Storage
REFERENCE ARCHITECTURE 3: UCS, Isilon, Network Storage
REFERENCE ARCHITECTURE 4: SAP HANA

VISUALIZATION: TABLEAU | TABLEAU | TABLEAU
ANALYTICS TOOLS and STREAMING TOOLS: SPARK, R, KAFKA, SPARK, MADLIB, PYTHON, STORM, TRIDENT | SPARK, R, KAFKA, SPARK, MADLIB, PYTHON, STORM, TRIDENT | PYTHON, KAFKA, STORM | SAP HANA
ANALYTICS DATABASES: IMPALA, HIVE, HBASE, HAWQ | IMPALA, HIVE, HBASE, HAWQ | HIVE, HBASE
FILE SYSTEM/DATABASES: CLOUDERA, HORTON, PIVOTALHD, MAPR | CLOUDERA, HORTON, PIVOTALHD, MAPR | HORTON, CLOUDERA, HORTON, MAPR
NETWORK: NEXUS 2200, UCS 6296UP, NEXUS 2232PP, UCS 6296, NEXUS 2200, UCS B BLADES
COMPUTE: HP DL 380, UCS-C220M3, UCS-C240, UCS-B440M2
STORAGE: JBOD SATA, NETAPP E5460, ISILON, HITACHI
DATA categories: Enterprise Structured, Enterprise Unstructured, 3rd Party, Web/Unstructured
DATA examples: ODS, Data Warehouse, Call Center, Server Logs, Financial, Demographic

First Step: Big Data Workshop