Big Data Leadership Team

Chris Ward, Principal Consultant
- 20 years in management consulting and executive leadership
- Expertise in retail, marketing, hospitality and financial services
- Prior consulting experience with Opera Solutions and The Boston Consulting Group
- BA from Princeton University, MBA from the University of Virginia Darden School of Business

James Bigger, Principal Consultant
- 20 years of management consulting and entrepreneurial experience
- Expertise in financial services, insurance and telecom
- Prior consulting experience with Opera Solutions and A. T. Kearney
- Ph.D. in Physics from Oxford University

Brian Vaughan, Principal Consultant
- 15 years of management consulting, analytics and software experience
- Expertise in healthcare and insurance
- Prior experience with Opera Solutions, Mitchell Madison Group and Broadlane
- Ph.D. in Physics from Stanford University

Prem Jain, Principal Consultant
- 20 years of technology experience in enterprise datacenter technologies
- Has built innovative solutions in Big Data, storage, HPC, virtualization, data migration and enterprise applications
- Formerly at NetApp, where he was the lead architect for Big Data and FlexPod solutions

Matt DuBell, Principal Engineer
- 20 years of experience in a range of IT and security disciplines
- Responsible for deploying large, secure, Hadoop-based platforms for the U.S. Govt.
- 10 years of international experience implementing networking and virtual data center environments
- Undergraduate degree from AIU
Big Data Team

Jason Lu, Chief Scientist
- Eighteen years of analytics and software development experience
- Expertise in financial services, healthcare, insurance, retail and marketing science
- Prior analytics development experience at Opera Solutions, FICO and J.D. Power and Associates
- Ph.D. in Physics from Stanford University

Yoni Malchi, Consulting Manager
- Worked as an Engagement Manager for predictive analytics consulting engagements
- Experience in both the financial services and telecommunications industries, bridging the gap between the business and data scientists
- Ph.D. in Mechanical Engineering in 2007, followed by 4 years in the aerospace industry

Jamie Milne, Consulting Manager
- Over 7 years of management consulting and entrepreneurial experience
- Expertise in the financial services, travel and retail sectors across the US and Europe
- Led Big Data strategy and analytical engagements at Opera Solutions
- MSci in Astrophysics from the University of Cambridge

Chris Infanti, Consulting Manager
- 8+ years of experience in big data analytics consulting
- Experience in business development and delivery of analytics projects in the education, wealth management, public safety, corporate security, online subscription, transportation and retail sectors
- B.S. in Mathematics and B.A. in English Literature from Georgetown University

Virtual Team
- BDAs, Analytic Programmers, Storage Specialists, Network Architects, Hadoop Administrators and other professionals
- Many years of experience architecting, deploying and managing compute, storage, network, Hadoop ecosystem and database solutions for Fortune 500 companies
- Augments the expertise of the core Big Data Leadership Team
Volume, Variety and Velocity of Data are Exploding

The production of data is expanding at an astonishing rate. Drivers include the switch from analog to digital technologies and the creation of structured and unstructured data by individuals and companies via social media and the Web.

Volume
[Chart: Enterprise Managed Data vs. Enterprise Created Data, in ZB, 2010 to 2020]

Variety
[Chart: Unstructured vs. Structured data storage, in EB, 2009 to 2014]

Velocity
Every 60 Seconds:
- 98,000+ tweets
- 695,000 status updates
- 11 million instant messages
- 698,445 Google searches
- 168 million+ emails sent
- 1,820 TB of data created
- 217 new mobile web users

The need to process more data faster to respond to dynamic business trends has brought new requirements for database architectures.

We believe the industry stands at the cusp of the most significant revolution in database and, therefore, application architectures in the past 20 years.
Vendor Landscape Is Crowded and Growing

Value chain segments:
- Data Sources & Capture
- IT Infrastructure
- Data Management & Integration
- Analytics Platforms & Solutions
- Analytics Services & Support

Vendor categories:
- Data Vendors
- Infrastructure Vendors
- Open Data Platforms
- Vertical Analytics Solutions
- Proprietary Data Platform
- Analytics Service Provider
- Extended Infrastructure + Data Platforms
- System Integrators
- Specialized End-to-End Solutions
Key Big Data Technologies (from FOUNDATIONAL to EMERGING)

Hadoop: Distributed File System and Processing Language
Characteristics:
- Parallel storage/processing
- Flexible programming model
- Horizontal scaling
- Batch processing
Enablement / Uses:
- Pre-processing of data for analytics
- ETL for transforming unstructured data to structured
- Data summarization

NoSQL: Non-relational Key-Value Database
Characteristics:
- Fast read/write
- Real-time query
- Horizontal scaling
- Simple programming model
- Dynamic schema
Enablement / Uses:
- Real-time ingest
- Rapid retrieval
- Input to MapReduce

Columnar: Column-Oriented Database Analytics
Characteristics:
- Relational
- Efficient compression
- Optimized for fast read of many/all records
Enablement / Uses:
- On-Line Analytics Processing (OLAP)
- Data storage and retrieval for advanced analytics

In-Memory: In-Memory Database and Processing
Characteristics:
- Relational
- Random access
- Extremely fast
Enablement / Uses:
- Complex Event Processing
- Real-Time Analytics
- Potential to use a common database for transactions and analytics
The Big Data Software Stack

The big data ecosystem includes open source and proprietary distributions that span the stack from ingest through analytics.

User/machine workflow: ACQUIRE, ORGANIZE, ANALYZE, DECIDE

Layers (top to bottom), with properties and options:
- ANALYTICS: Real Time & Batch. Options: OLAP, Natural Language, Custom Analytics
- ACCESS/QUERIES: Optimized for high-volume reads. Options: Custom APIs, SQL
- ANALYTICS DATABASE: Flexible, Compressed, Fast Read. Options: Columnar, In-Memory, Parallel RDBMS
- TRANSFORM: Fast, Scalable. Options: MapReduce
- MANAGEMENT: Provisioning, Maintenance
- FILE SYSTEM/DATABASE: Parallel, Distributed. Options: HDFS, NoSQL (Document, Key-Value, Wide Column)
- INGEST: Interfaces to accept data. Options: Batch, Streaming

Examples of products:
- Open source: R, Python, SQL, Pig, Hive, Hadoop, ZooKeeper, Cassandra, HBase, MongoDB, Sqoop, Flume
- Commercial: SAS, SPSS, Teradata, Netezza, Greenplum, Vertica, Cloudera, Hortonworks, MapR, PivotalHD, Splunk, Talend, MicroStrategy, Business Objects, Cognos, Oracle OBIEE Plus
- Integrated offerings: EMC/Pivotal HD/Greenplum, HP/Vertica/Cloudera, Oracle Big Data Exadata/Exalytics, IBM InfoSphere BigInsights, SAP HANA, Terracotta BigMemory

Data:
- Enterprise Structured: ODS, Data Warehouse
- Enterprise Unstructured: Call Center, Server Logs
- 3rd Party Web/Unstructured: Financial, Demographic
Dual Approach to Delivering Big Data Solutions

WWT offers customers both strategic and tactical approaches to derive value from the application of Big Data analytics and technology.

BIG DATA BUSINESS IMPACT: Extract value from data to drive multiple use cases
- Strategic Roadmap
- Big Data Strategy
- Use Case Design
- Use Case PoC
- Analytics Development
- Workflow Integration

BIG DATA TECHNOLOGY OPTIMIZATION: Accomplish data tasks faster, cheaper, better
- Data Warehouse Optimization
- ETL/ELT Offload
- Data Lake Creation
- SAP HANA Implementation
- Big Data Stack Build / Optimization
- Production Support & Sustainment
Defining The Opportunity Is The Starting Point

The power of Big Data lies in bringing together data in a timely fashion from sources within and external to the enterprise, structured and unstructured, to create a complete view of critical issues, thereby enabling advanced analytics to unlock key insights that drive significant value.

- Outcome: Clearly defined use cases with the potential to deliver significant value by distilling vast data into new, previously unknowable intelligence
- Analytics: Advanced machine learning techniques to analyze data and mine for insights to drive critical decisions
- Data: Structured or unstructured, internal or external, requiring new methods of storage/integration
- Technology: Emerging/new technology stacks using scalable, distributed architectures
MINING COMPANY

PROJECT SCOPE
- 252 trucks
- 200 sensors per truck
- 7 mine sites
- 10,000 readings per second

DISPARATE DATA SETS
- Integrating 15+ siloed data sources in multiple file formats
- 10 terabytes of data
- 3-year historical data ecosystem
- Equipment maintenance (SAP)
- Dispatch & operator (Teradata)
- Fuel, oil, analysis, etc. (SQL Server)
- Truck sensor data (OSI PI Server)

[Diagram: truck data loggers feeding the Hadoop infrastructure to build a 360° view of each machine, with sensor data plotted over time]

HADOOP INFRASTRUCTURE
- Established Big Data infrastructure
- Migrated and normalized data sets
- Developing visualizations, tools and predictive analytics
- Data/analytics-driven timing for preventative maintenance (e.g. oil changes) on individual trucks

Stratifying alarms:
1. Urgent component problem
2. Critical sensor problem
3. Important/not urgent component/sensor problem
4. Not important component/sensor problem
5. Noise (ignore)

Urgent component failure models: engine, transmission, differentials, torque converters, final drives

BUSINESS IMPACT
- Higher equipment up-time
- Reduced critical component failure
- Better preventative maintenance
- Increased productivity
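The five alarm tiers listed on the mining slide can be illustrated with a small routing function. This is only a sketch: the slide names the tiers but not the rules behind them, so the `source` and `severity` labels and the mapping below are hypothetical, not the models used in the project.

```python
from enum import IntEnum


class AlarmTier(IntEnum):
    """The five alarm tiers named on the slide."""
    URGENT_COMPONENT = 1
    CRITICAL_SENSOR = 2
    IMPORTANT_NOT_URGENT = 3
    NOT_IMPORTANT = 4
    NOISE = 5


# Hypothetical input labels; the slide does not define them.
SOURCES = {"component", "sensor"}
SEVERITIES = {"high", "medium", "low", "noise"}


def stratify_alarm(source: str, severity: str) -> AlarmTier:
    """Map an alarm to a tier using assumed, illustrative rules."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if severity == "noise":
        return AlarmTier.NOISE
    if severity == "high":
        # High-severity alarms split by origin: component vs. sensor.
        if source == "component":
            return AlarmTier.URGENT_COMPONENT
        return AlarmTier.CRITICAL_SENSOR
    if severity == "medium":
        return AlarmTier.IMPORTANT_NOT_URGENT
    return AlarmTier.NOT_IMPORTANT


if __name__ == "__main__":
    print(stratify_alarm("component", "high").name)
    print(stratify_alarm("sensor", "low").name)
```

In practice the severity would presumably come from the predictive failure models rather than a hand-assigned label.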
Data Warehouse Optimization: Value Proposition

Augmenting the Data Warehouse with a less expensive Hadoop system allows companies to free up valuable space on their DW systems to run faster queries and analysis, whilst storing large volumes of their data universe.

Full Data Universe (both scenarios): Web logs, Payments, Scheduling, CRM, Social Media, Billing

CURRENT (Traditional Data Warehouse holding cold, warm and hot data)
1. A significant amount of data that may be valuable in the future is thrown out during the ETL process
2. About 50% of the data brought into a typical Data Warehouse system is rarely accessed
3. About 80% of the queries and reporting performed on highly used data do not need to run at DW speeds

PROPOSED (WWT Hadoop Appliance holding cold and warm data; Traditional Data Warehouse holding warm and hot data)
1. Utilize additional Hadoop-based storage to store the full data universe
2. Move cold/warm data, ETL workflows and ELT scripts to Hadoop, taking advantage of lower cost per TB
3. Continue to take advantage of DW agility and speed in real-time analysis and querying
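The hot/warm/cold split described on the data warehouse slide can be sketched as a simple placement rule. This is a minimal illustration only: the day thresholds, the `Dataset` fields and the sample sizes are hypothetical placeholders, not figures from the deck, and a real assessment would use observed query logs.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the slide gives no numeric definition of hot/warm/cold.
HOT_MAX_DAYS = 30
WARM_MAX_DAYS = 180


@dataclass(frozen=True)
class Dataset:
    name: str
    size_tb: float
    days_since_last_access: int


def temperature(ds: Dataset) -> str:
    """Classify a dataset as hot, warm or cold by recency of access."""
    if ds.days_since_last_access <= HOT_MAX_DAYS:
        return "hot"
    if ds.days_since_last_access <= WARM_MAX_DAYS:
        return "warm"
    return "cold"


def placement(ds: Dataset) -> str:
    """Keep hot data in the warehouse; offload warm and cold data to Hadoop."""
    return "data_warehouse" if temperature(ds) == "hot" else "hadoop"


def plan_offload(datasets: list[Dataset]) -> dict[str, float]:
    """Total TB per target system under the placement rule above."""
    totals = {"data_warehouse": 0.0, "hadoop": 0.0}
    for ds in datasets:
        totals[placement(ds)] += ds.size_tb
    return totals


if __name__ == "__main__":
    sample = [
        Dataset("billing", 2.0, 3),      # illustrative values only
        Dataset("web_logs", 5.0, 90),
        Dataset("archive", 3.0, 400),
    ]
    print(plan_offload(sample))
```

The slide shows warm data on both systems in the proposed state, so sending all warm data to Hadoop is a simplification made here to keep the rule to one line.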
Four Major Big Data Challenges

In our meetings with customers, four issues are consistently brought up as major challenges in creating a big data capability that can effectively support the business units.

Defining the outcome
- What problem/opportunity are we pursuing?
- What is the value that can be created?

Deploying new technologies and combining with existing architecture
- How do we create an effective integrated Big Data stack?
- What new technologies do we need and how do they fit together?

Navigating a crowded and evolving vendor landscape
- How do we separate marketing hype from reality?
- Who should we use? Who can we trust?

Organizing for success
- Where does Big Data fit?
- Who is responsible for data integrity?
- Where do we find the critical resources needed to deliver Big Data solutions?
Four Stages Of A Big Data Deployment

Stages: Plan, Design, Pilot, Scale
Tracks: Analytics-Ready Infrastructure and Solution Development

Plan
- Develop a roadmap for implementing Big Data
- Use case exploration
- Data Governance, Infrastructure and Analytics ownership
- Define high impact use cases

Design
- Design and test appropriate reference architectures
- Create detailed description of selected pilot use cases
- Analytics workflow integration
- Test various reference architectures

Pilot
- Stand up reference architecture
- Design the pilot: success criteria, timeline, scope
- Identify and prepare data
- Build analytical models
- Design workflow

Scale
- Implement, manage and monitor
- Implement design changes from pilot learnings
- Invest in software development as necessary to improve UI
- Prepare ETL process for scale
- Build out infrastructure as required to support rollout

WWT Services
1. Strategic Roadmap: use case definition; organizational alignment; Big Data architecture high-level design
2. Big Data Stack Build: detailed design of Big Data architecture and BOM; procure, configure and deploy Big Data stack
3. Proof of Concept: POC design; analytical models; customer data loaded, processed and analyzed
4. Production Support: operationalizing POC; infrastructure sustainment; training; ongoing support

Indicative Infrastructure

EXAMPLE STARTER KIT (Big Data Solution Stack)
- 2 UCS 6296PP
- 2 Nexus 2232PP
- 16 Cisco UCS C240
- EMC Isilon
- Software: PivotalHD, Greenplum, etc.

EXAMPLE SCALE-OUT HARDWARE (multiple expansion racks)
- 2 Nexus 2232PP Fabric Extenders
- 16 Cisco UCS C240
- EMC Isilon
Advanced Technology Center

A highly collaborative ecosystem to design, build, educate, demo & deploy advanced technology solutions for our customers & partners.

ENTERPRISE NETWORKS
- Next Generation Networking
- Nexus (7K, 5K, 3K & 2K)
- Virtual Networking (Nexus 1000v)
- OTV, LISP, Fabric Path
- Layer 2 Extension
- DR/BC Networking

SECURITY
- BYOD (Bring Your Own Device) & Secure Mobility
- Jukebox
- ISE & RSA
- ASA 1000v
- VSG (Virtual Security Gateway)
- Cyber Security Solutions

COLLABORATION
- Unified Communications
- Tandberg Video
- VXI (View & XenDesktop)
- WebEx, Call Center & Collaboration Solutions
- Phones, Backpacks & Soft Phone Clients
- Telepresence & Business Video

DATA CENTER
- Vblock, FlexPod & CloudSystem Matrix
- EMC & NetApp Storage
- vSphere / XenServer
- vCloud Director
- VDI (View / XenDesktop)
- Cisco CIAC & BMC CLM
- EMC's UIM & Cloupia
- FAST
- MDC (Mobile Data Center) Solutions

BIG DATA
- Cisco UCS C220, C240
- HP DL380
- Nexus 2200, UCS 6296
- FlexPod Select, Isilon storage
- Cloudera, MapR, PivotalHD
- Cloud Foundry
- Velocidata Appliance
- Next Generation provisioning tools
Big Data Environment Set-up: ATC Reference Architectures

Four analytics-ready infrastructure stacks have been developed in the ATC to showcase Big Data technologies.

- REFERENCE ARCHITECTURE 1: HP, Internal Local Storage (Current)
- REFERENCE ARCHITECTURE 2: UCS, NetApp Direct Attached Storage (In Process)
- REFERENCE ARCHITECTURE 3: UCS, Isilon Network Storage (In Process)
- REFERENCE ARCHITECTURE 4: SAP HANA

Components shown across the four stacks, by layer:
- ANALYTICS TOOLS: MicroStrategy, Java, R, Python
- ANALYTICS DATABASES: Impala, HBase, HAWQ, SAP HANA
- FILE SYSTEM/DATABASES: Hortonworks, Cloudera, MapR, PivotalHD, GemFire
- INGEST: Velocidata, Splunk
- NETWORK: Nexus 2200, Nexus 2232PP, UCS 6296UP, UCS 6296
- COMPUTE: HP DL380, UCS-C220M3, UCS-C240, UCS B blades
- STORAGE: JBOD SATA, NetApp E5460, Isilon, Hitachi

Data:
- Enterprise Structured: ODS, Data Warehouse
- Enterprise Unstructured: Call Center, Server Logs
- 3rd Party Web/Unstructured: Financial, Demographic
How to Leverage ATC Architectures

We use the ATC for a variety of customer and partner use cases, ranging from technology testing to full solution deployment.

Function: Description

Proof of Concept
- Test customer solutions prior to full onsite implementation, e.g. run use case analytical models and architectures on Big Data machines
- Create a Big Data hardware/software stack, potentially with client data

Vendor Comparison
- Compare Big Data solutions to provide insight into the strengths and weaknesses of each
- Run bake-offs to gauge how well a full solution performs when built from certain components

Field Demo
- Showcase Big Data capabilities by hosting demos of WWT PoCs and analysis
- Enable virtual access for field engineers to run customer demos

Performance Benchmarking
- Run benchmark tests to measure speed and performance of Big Data technologies, including competing Hadoop distributions and storage options

Technology Evaluation
- Evaluate new technologies in the ATC as they are released, allowing our engineers to get up to speed before working in customer environments

Training
- Hold training courses for customers and partners that allow them to work with Big Data software and hardware in a highly customizable environment that reaches across a variety of vendors
WWT Big Data Workshop

WHAT IS IT?
A full-day interactive session with WWT consultants and Data Scientists designed to increase your understanding of Big Data and help you outline your strategy for using Big Data analytics solutions to add value.

- IDENTIFY clear use cases that can't be identified with the current setup
- DETERMINE which of the use cases can benefit from WWT capabilities
- ESTIMATE the use cases' potential impact and ease of implementation
- CHOOSE high-value, actionable use cases

WHAT TO EXPECT
- Highly skilled consultants and engineers
- Emerging technology
- Customized technical and strategic whiteboard session
- Best practices
- Expert insight
- Use cases and success stories

[Chart: $ Impact vs. Ease of Implementation, highlighting the high-value, actionable use case]
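The ESTIMATE and CHOOSE steps of the workshop amount to placing each use case on the impact vs. ease chart and picking from the high-impact, high-ease quadrant. The sketch below shows one way this could be done; the 0 to 10 scoring scale, the thresholds, the ranking by product and the example use cases are all assumptions made for illustration, not part of the workshop material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    impact: float  # assumed 0-10 score for $ impact
    ease: float    # assumed 0-10 score for ease of implementation


def choose_use_cases(cases, min_impact=5.0, min_ease=5.0):
    """Return use cases in the high-impact, high-ease quadrant, best first.

    Ranking by impact * ease is an arbitrary illustrative choice.
    """
    selected = [c for c in cases if c.impact >= min_impact and c.ease >= min_ease]
    return sorted(selected, key=lambda c: c.impact * c.ease, reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates with made-up scores.
    candidates = [
        UseCase("predictive maintenance", impact=9, ease=6),
        UseCase("ETL offload", impact=6, ease=8),
        UseCase("real-time pricing", impact=8, ease=2),
        UseCase("log archiving", impact=3, ease=9),
    ]
    for case in choose_use_cases(candidates):
        print(case.name)
```

Use cases that score high on impact but low on ease are dropped here; in a real session they would more likely be parked for a later phase than discarded.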