A Whole New World: Big Data Technologies. Big Discovery, Big Insights, Endless Possibilities. Dr. Phil Shelley
Why Big Data Technology?
(Chart: query execution time vs. data size)
- EDW: days
- Hadoop: hours
- Presto: minutes to seconds
- In-Memory: milliseconds
Data sizes range from gigabytes through 100s of terabytes to petabytes.
ETL is obsolete. ETL is a half century old.
- Legacy: Transactions -> ETL -> BI
- Modern: Transactions -> Data Hub -> Real-Time Analytics
Data Silos to Data Insight
- Single copy: a single point of truth
- Structured for analysis
- Loaded in near-real-time
- With data silos, analytics and discovery are impossible, or very difficult and slow
- With NoSQL, analytics and discovery are easier and fast
Why Big Data: From an IT Perspective
- Responsiveness: business users no longer have to wait for an ETL setup or change. Data is available without IT involvement; there is no concept of building a schema or cube before the business can use the data.
- ETL complexity is no longer needed: with a data hub, source once and re-use many times. ETL changes to EL-T-T-T... (load once, transform as often as needed).
- Latency in data is a thing of the past: analysis is routinely possible within minutes of data creation.
- Long-running workloads can be eliminated or executed at any time; run times are a fraction of the original clock time.
- Batch processing on mainframes or Data Warehouse workloads, moved to Hadoop, run 10, 50, even 100 times faster.
- Intelligent Archive: put your archive/tape data on Hadoop and make it an intelligent archive, with the ability to run analytics or join it with other data.
- Modernize legacy: mainframe MIPS reduction has a very attractive ROI.
- Move Data Warehouse workloads: reduce cost, go faster, avoid traditional warehouses.
- The perpetual MDM challenge can take a completely different path.
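A minimal sketch of the Source-Once, Re-Use-many EL-T-T-T pattern described above, using Python's stdlib sqlite3 in place of a real Hadoop/NoSQL hub (all table and column names are hypothetical): the raw data is landed once, untransformed, and each consumer derives its own result from that single copy instead of requiring a new ETL pipeline.

```python
import sqlite3

# Stand-in for the data hub; in practice this would be HDFS/Hive/HBase.
hub = sqlite3.connect(":memory:")

# E + L: land the raw source data once, untransformed.
hub.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL)")
hub.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "EAST", 120.0), (2, "WEST", 80.0), (3, "EAST", 50.0)],
)

# T, T, T...: each consumer transforms the single raw copy on demand,
# with no upstream ETL change required.
by_region = hub.execute(
    "SELECT region, SUM(amount) FROM raw_orders GROUP BY region ORDER BY region"
).fetchall()
large_orders = hub.execute(
    "SELECT order_id FROM raw_orders WHERE amount > 100"
).fetchall()

print(by_region)     # [('EAST', 170.0), ('WEST', 80.0)]
print(large_orders)  # [(1,)]
```

The point of the pattern is the second half: adding another "T" is just another query against the raw copy, not a new extract from the source system.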
Why Big Data: From a Business Perspective
- Focus on questions that bring value from the data: test hypotheses without data or system restrictions.
- Capture and link data from many, disparate sources.
- Focus on discovery: clustering, heat mapping, linkages, text analytics. Drive the business based on those discoveries.
- Now the inquisitive mind is needed (the data scientist): "I wonder if...", then test it and measure the outcome.
- Ask the previously impossible questions: almost unlimited compute and storage; capture and discover value from the full history and granularity.
- The trend is real-time, automated discovery and response.
Making Hadoop a Data Hub is increasingly accepted as a game-changing trend. NoSQL and in-memory are driving massive change in the Data Warehousing space, shrinking the role of traditional Data Warehouses.
(Diagram: DATA IN from transactional system sources and an ESB; 100% of source data lands in a GOLD Data Hub with a batch processing area, temporary marts and persisted marts; DATA OUT to an EDW or in-memory high-speed marts, real-time databases and real-time analytics; user access throughout.)
Leveraging the Source-Once, Re-Use approach drives efficiency, reduces data silos, reduces latency and time to value, massively improves analytics and discovery, and greatly reduces costs.
Layered Data Architecture
- Sources: transactional systems, portals, ESB (feeding Hadoop/HBase).
- Staging/Incoming: persisted, full history, full detail; concatenated, de-normalized; data quality applied.
- GOLD: single point of truth; de-normalized, concatenated, segmented.
- Marts: user-defined marts, persisted marts, temporary marts, export marts, security-driven marts; batch working area.
- Downstream: EDW appliance, in-memory.
- Users: high-speed SQL, drill-down and interaction.
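The layer flow above can be sketched in a few lines, again with stdlib sqlite3 standing in for Hadoop and with hypothetical table names: raw rows land in staging with full detail, a data-quality gate promotes clean rows into GOLD as the single point of truth, and a user-defined mart is then carved from GOLD for one audience.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Staging layer: full history, full detail, exactly as received
# (amounts arrive as text, including a bad row).
db.execute("CREATE TABLE staging_sales (sale_id INTEGER, cust TEXT, amount TEXT)")
db.executemany(
    "INSERT INTO staging_sales VALUES (?, ?, ?)",
    [(1, "acme", "100.5"), (2, "acme", "bad"), (3, "beta", "40")],
)

# Data quality + promotion to GOLD: only rows that pass the checks
# become the single point of truth, typed on the way in.
db.execute("CREATE TABLE gold_sales (sale_id INTEGER, cust TEXT, amount REAL)")
clean = []
for sale_id, cust, amount in db.execute("SELECT * FROM staging_sales"):
    try:
        clean.append((sale_id, cust, float(amount)))  # quality gate
    except ValueError:
        pass  # rejected rows would go to an error/quarantine area
db.executemany("INSERT INTO gold_sales VALUES (?, ?, ?)", clean)

# User-defined mart: a segmented summary derived from GOLD.
mart = db.execute(
    "SELECT cust, SUM(amount) FROM gold_sales GROUP BY cust ORDER BY cust"
).fetchall()
print(mart)  # [('acme', 100.5), ('beta', 40.0)]
```

Note the design choice the slide implies: staging keeps everything (including the bad row, for history and reprocessing), while GOLD only ever holds quality-checked data.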
Use-Case Examples
- Performance
- Modernization
- Batch processing
- ETL replacement
- Cost reductions
Use-Case Examples: Performance
- Long-running batch jobs
- Slow ETL
- Slow queries due to system load
- Analytics conflicting with transactional systems
- Data latency
- Systems unable to meet production schedules
- Job dependencies causing schedule unreliability
Use-Case Examples: Modernization
- Retire obsolete systems
- Code written in languages that are hard to support
- Skill sets that are becoming harder to find
- Applications that are hard to maintain
- Older systems that will not scale with the business
Use-Case Examples: Batch Processing
- Faster, easier to maintain, cheaper
- Typical source platforms: mainframe COBOL, HP-UX, AIX, Sun, DB2, Oracle
Use-Case Examples: ETL Replacement
- A sweet spot for Hadoop
- Massive performance improvement
- Shrink the ETL footprint; reduce licensing costs
- Reduce the number of data copies
- Reduce latency from data creation to data use
- Build a data integration hub
- Move from ETL to EL-T-T-T...
Use-Case Examples: Cost Reductions
- Hadoop can be 100% open-source
- Very large cost-reduction opportunities: mainframe MIPS, DB2, Oracle, ETL tools, EDW (Teradata, Netezza)
Generate and Prioritize Use-Cases
(Chart axes: Business Value/Cost Savings vs. Complexity/Cost/Time to Implement)
Consider Your Options
Program Macro View
- POC to discover data sourcing, integration and use-case challenges
- Establish landscape
- Source data for initial use-cases
- Establish governance and security
- Implement initial use-cases
Tracks and Milestones
- Establish Landscape: build dev; build prod; apply security, access methods and controls; determine access and visualization tools; implement initial visualization tools
- Source Data for Initial Use-Cases: identify data; one-time source of dev data; establish data architecture; build sourcing and QA scripts; initial load; schedule incremental loads; validate sourcing
- Establish Governance and Security: design data source standards; establish governance roles; design security and access controls; validate governance and security
- Implement Initial Use-Cases: POC of 1st use-case; design use-cases; develop use-cases; validate use-cases
Acceleration
The same tracks and milestones as the previous slide. Start ASAP.
Big Data: Fast Start
- Know where to start
- Know how to start
- Know how to avoid traps
- How to integrate Big Data tools into your enterprise
- Vendor agnostic
- Understand how to think Big Data, but start small
Starting to think about Big Data? Wondering how to put all the pieces together?
Many companies are considering how to begin their Big Data journey. Over 80% are uncertain about how to start and what the best use-cases are. Fear of making strategic and costly mistakes stalls most companies, and uncertainty prevails about options, products and services.
- Which Hadoop distribution?
- Do I need other products or tools?
- What vendors should I go with?
- Will I end up locked in, in this fast-moving space?
- What about the skills of my team, and how do I close the gaps?
- What options do I have to outsource or supplement my internal team?
Newton Park Partners gives you the start you need
- An interactive model to bring you to the point of starting your Big Data journey
- Workshops to mature ideas and use-cases to the point of POC, fast
- Led by Dr. Phil Shelley, NPP brings years of Big Data planning and operational expertise: what works and what does not
- With a former CIO/CTO/CEO leading your kick-start, you are sure to be talking the same language
- Vendor agnostic: simply IT management experience in the Big Data space
NPP will walk you through the steps to your Big Data ventures...
- Background on Big Data tools and trends
- Discovery workshop with your team
- Human talent review
- Governance and security guidance
- Enterprise data architecture
- Vendor, partner and tool selection
- Use-case generation for your 1st POC
In one month from start to POC-ready
- Executive alignment: are you ready to start?
- Discovery
- Use-case generation
- Talent assessment
- Enterprise data architecture alignment
- Governance and security alignment
- POC options identified
Workshop Agenda (Example)

Duration: 3 hours
Focus: Big Data overview; use-case examples; how Big Data tools can be used; company objectives; unmet business needs
Attendees: CIO; CIO direct reports; business unit heads

Duration: 3 hours
Focus: use-case generation; use-case prioritization
Attendees: CIO; CIO direct reports

Duration: multiple 1-hour sessions
Focus: Hadoop overview; why Hadoop and NoSQL; building Hadoop and NoSQL environments; data integration; governance and data quality; infrastructure requirements; Hadoop operations; HR and talent requirements; vendor and commercial product options; BI tools
Attendees: IT functional leaders, as applicable to each topic; project manager(s)
Phil Shelley phil@consultnpp.com