BIG DATA: FROM HYPE TO REALITY
Leandro Ruiz, Presales Partner for C&LA, Teradata
Evolution in the Use of Information
From Insights to Actions:
> REPORTING: What happened?
> ANALYZING: Why did it happen?
> PREDICTING: What will happen?
> OPERATIONALIZING: What is happening now?
> ACTIVATING: Make it happen!
Supporting capabilities: Batch Reports; Ad Hoc, BI Tools; Predictive Models; Link to Operational Systems; Automated Linkages
Does this process change with Big Data? No. Do the tools you need change? Yes.
Big Data is an Evolution, not a Revolution
Flow: DATA -> INSIGHTS -> ACTIONS
> Insights: Predictions, Events, Patterns, Hypothesis Testing
> Actions: Strategic Actions, Operational Actions
Flow: BIG DATA -> INSIGHTS -> ACTIONS
Is the ultimate USE of Big Data different? No.
Evolution: What's Not New?
#1 ECONOMICS
> You still need to determine what data has value, how long to store it, and where to store it
> All your systems still need to prove ROI
#2 TOOLS
> Exploit parallelism, even on new data types
> Need for tools to interoperate easily
> Ease of use, ease of access
#3 ARCHITECTURE
> Need flexibility, so you can add more tools
> Need to leverage technologies you already have
> Overall system must be production-ready, tested, reliable
Evolution: What's Different? Disruptive?
#1 New ECONOMICS
> Increased amounts of data you can afford to capture
#2 New TOOLS
> INSIGHTS FROM NEW DATA TYPES: quickly find the signal in the noise, integrate
> DISCOVERY PROCESS: Fast! Capture and analyze data without the usual rigor, because much of it will not go into the EDW
#3 New ARCHITECTURE FRAMEWORK
> A hybrid ecosystem that makes it easy to use both old and new tools, on old and new data
Gartner Recommends: Shift from a Single Platform to an Ecosystem
"Logical" Data Warehouse
We will abandon the old models based on the desire to implement for high-value analytic applications.
Big Analytics: The Problem
[Diagram: Data Warehouse/Business Intelligence and Advanced Analytics shown as separate environments]
Proliferation of Big Data analytics environments has resulted in fragmented data, higher costs, expensive skills, and longer time to insight.
Discovery (Advanced Analytics)
The Problem
> Data Warehouse/Business Intelligence and Advanced Analytics as separate environments
> Proliferation of Big Data analytics environments has resulted in fragmented data, higher costs, expensive skills, and longer time to insight
The Solution
> Integrated Discovery Platform (IDP): SQL Framework Access Layer, Pre-Built Analytics Functions
> Integrated discovery analytics provides deeper insight, integrated access, ease of use, and lower costs
UNIFIED DATA ARCHITECTURE: System Conceptual View
[Diagram: SOURCES connect through MOVE, MANAGE, and ACCESS layers to ANALYTIC TOOLS & APPS and USERS]
> SOURCES: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
> PLATFORMS: INTEGRATED DATA WAREHOUSE, DATA PLATFORM, DISCOVERY PLATFORM
> ANALYTIC TOOLS & APPS: Marketing, Applications, Business Intelligence, Data Mining, Math and Stats, Languages
> USERS: Marketing Executives, Operational Systems, Customers, Partners, Frontline Workers, Business Analysts, Data Scientists, Engineers
What's Changed: Architecture Framework
PROCESS FLOW: DATA -> INSIGHTS -> ACTION
> DATA: Fast Loading, Filtering and Processing, Online Archival
> INSIGHTS: Discovery; Pattern Detection: Path, Graph, Time-series analysis; New Models and Model Factors
> ACTION: Reports, Dashboards, Real-time Recommendations, Operational Insights, Rules Engines
> SOURCES: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
> ANALYTIC TOOLS: Marketing, Applications, Business Intelligence, Data Mining, Math and Stats, Languages
> USERS: Marketing Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts
Integrated Analytics: Operationalizing Insights in the Enterprise
INTEGRATED DATA WAREHOUSE
> Single view of your business
> Cross-functional analysis
> Shared source of relevant, consistent, integrated data
> Load once, use many times
> Lowest cost of ownership
> Fast new applications time-to-market
APPLICATIONS: Applications, Business Intelligence, Data Mining, Math and Stats
USERS: Marketing Executives, Operational Systems, Frontline Workers, Customers, Partners, Executives, Business Analysts
Capture, Store, Refine: Capturing Data for Storage and Processing
DATA PLATFORM
> Raw data capture
> History or long-term storage
> Low-cost archival
> Transformations
> Structured, semi-structured
> Sessionize, remove XML tags, extract key words
> Simple math at scale
> Batch processing
APPLICATIONS: Data Mining, Machine Learning, Languages
USERS: Programmer, Data Scientists
Hadoop: Requirements for Staging, Preprocessing, Simple Analytics
Land/source operational data
> Only one extract from source system
History or long-term storage
> Low-cost storage
Preprocess data
> Sessionize data, remove XML tags
Transformations
> Structured and semi-structured
Exploration
> Investigate value of new data sources
Batch scoring
Single-subject reporting

Cost: Low cost/value equation for data size
Depth: More data/raw data for small user community
Multi-Structure: Raw data (typically web logs) stored for later parsing
Non-SQL Analytics: Workload requires procedural programming or MapReduce
Flexibility: Access to raw data, no production constraints, no IT governance
Parallel App: Applications that require an MPP application environment
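The preprocessing step named above (sessionize data, remove XML tags) can be sketched in a few lines of Python. This is a minimal illustration, not how Hadoop or any Teradata product implements it: the 30-minute inactivity timeout, the function names, and the sample clicks are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter


def strip_tags(text):
    """Remove XML/HTML-style tags with a simple regex (not a full XML parser)."""
    return re.sub(r"<[^>]+>", "", text)


def sessionize(events, timeout=timedelta(minutes=30)):
    """Assign a session number to each (user_id, timestamp) event.

    A new session starts whenever the gap since the same user's previous
    event exceeds `timeout`. The 30-minute default is an assumption.
    """
    result = []
    ordered = sorted(events, key=itemgetter(0, 1))
    for user_id, user_events in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        previous = None
        for _, ts in user_events:
            if previous is not None and ts - previous > timeout:
                session_id += 1
            result.append((user_id, ts, session_id))
            previous = ts
    return result


# Hypothetical sample data, for illustration only.
clicks = [
    ("u1", datetime(2013, 1, 1, 9, 0)),
    ("u1", datetime(2013, 1, 1, 9, 10)),
    ("u1", datetime(2013, 1, 1, 11, 0)),
    ("u2", datetime(2013, 1, 1, 9, 5)),
]
sessions = sessionize(clicks)
cleaned = strip_tags("<event><name>login</name></event>")
```

At scale the same logic would run as a batch job partitioned by user, which is why the slide lists it under staging and preprocessing rather than interactive analytics.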
SQL-MapReduce Analytics: Unlocking Hidden Value in (Any) Data
ASTER DISCOVERY PLATFORM
> Interactive data discovery: Web clickstream, social; Set-top box analysis; CDRs, sensor logs, JSON
> Flexible evolving schema
> MapReduce, Graph, SQL, statistics, text
> Structured and multistructured data
> Patented SQL-MapReduce
> 100+ packaged functions
APPLICATIONS: Languages, Business Intelligence, Data Mining, Math and Stats
USERS: Marketing Executives, Operational Systems, Data Scientists, Customers, Partners, Business Analysts
DATA -> INSIGHTS: Discovery -> ACTION
[Diagram: the Unified Data Architecture view (SOURCES; MOVE, MANAGE, ACCESS; GOVERNANCE & INTEGRATION TOOLS; INTEGRATED DATA WAREHOUSE, DATA PLATFORM, DISCOVERY PLATFORM; ANALYTIC TOOLS; USERS) annotated with the two activities below]
DISCOVERY
> Raw data acquisition, transformation
> Nuggets visualization
> Combine new with old
> New insight generation
> Generate hypotheses
HYPOTHESIS TESTING
> Use of new data, insights to augment predictive models
> Try process and action changes
> Experiment design, testing
> Results analysis
> Fast fail, or move into production
Teradata Aster's SNAP Framework
> TEXT, STATS, PATH, SQL, MAP REDUCE, GRAPH
> SNAP FRAMEWORK: INTEGRATED OPTIMIZER, INTEGRATED EXECUTOR, UNIFIED SQL INTERFACE
> STORAGE SYSTEM AND SERVICES: ROW STORE, COLUMN STORE, FILE STORE
A New Analytical Approach: A Single SQL Statement to Acquire, Prepare, Analyze & Visualize
[Diagram: Social Media, ERP, Text, CRM, Hadoop, and EDW sources pass through Acquisition, Preparation, Analysis, and Visualization in the Teradata Aster Discovery Platform on the way to Users]
Single SQL statement:
    SELECT * FROM npathviz(
      ON SELECT * FROM npath (
        ON (SELECT * FROM SESSIONIZE (
          ON SELECT * FROM LOAD_FROM_TD_HADOOP)
        PARTITION BY sba_id
        SYMBOLS (
          event LIKE '%EXTERIOR LIGHTING%' AS START_EVENT,
          event NOT LIKE '%BRAKE SYSTEM%' AS NEXT_EVENT)
        RESULT ( )
      ) n;
Benefits:
> Single solution & workflow, single skill set
> Shared metadata, data, insights
> Fastest time to value, easy iterations & speed of analysis
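To make the path-analysis idea in the statement above more concrete, here is a rough Python sketch of what a path match over partitioned events might look like. It is not the Aster nPath implementation. The two predicates mirror the SYMBOLS clause on the slide, but the pattern itself (a START_EVENT row immediately followed by a NEXT_EVENT row) is an assumption, since the slide does not show a PATTERN clause. The `seq` ordering column, the function names, and the sample rows are hypothetical.

```python
from collections import defaultdict


def is_start(event):
    # Mirrors: event LIKE '%EXTERIOR LIGHTING%' AS START_EVENT
    return "EXTERIOR LIGHTING" in event


def is_next(event):
    # Mirrors: event NOT LIKE '%BRAKE SYSTEM%' AS NEXT_EVENT
    return "BRAKE SYSTEM" not in event


def find_paths(rows):
    """rows: iterable of (sba_id, seq, event) tuples.

    Returns, per sba_id, the (start, next) event pairs in which a
    START_EVENT row is immediately followed by a NEXT_EVENT row.
    The "immediately followed by" pattern is an assumption.
    """
    partitions = defaultdict(list)
    for sba_id, seq, event in rows:
        partitions[sba_id].append((seq, event))  # PARTITION BY sba_id

    paths = {}
    for sba_id, items in partitions.items():
        items.sort()  # order each partition by the (hypothetical) seq column
        events = [event for _, event in items]
        matches = [
            (a, b)
            for a, b in zip(events, events[1:])
            if is_start(a) and is_next(b)
        ]
        if matches:
            paths[sba_id] = matches
    return paths


# Hypothetical sample rows, for illustration only.
rows = [
    ("A", 1, "EXTERIOR LIGHTING FAULT"),
    ("A", 2, "ENGINE COOLING"),
    ("B", 1, "EXTERIOR LIGHTING FAULT"),
    ("B", 2, "BRAKE SYSTEM WARNING"),
]
paths = find_paths(rows)
```

The point of the slide is that the platform expresses this partition, order, and match sequence declaratively in one SQL statement instead of in hand-written procedural code like the above.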
Organizations Face Several Obstacles Building Big Data Systems on Their Own
> Difficulty deploying and integrating new systems
> Difficulty managing multiple systems, new types of data
> Difficulty providing accessibility to fast insights on big data
> Difficulty finding skilled analysts, e.g., data scientists
Source: Big Analytics 2012 Survey, Teradata
Filtering: Teradata SQL-H
Give business users on-the-fly access to data in Hadoop
> Trusted: Use existing tools/skills and enable self-service BI with granular security
> Standard: 100% ANSI SQL access to Hadoop data
> Fast: Queries run on Teradata or Aster, data accessed from Hadoop
> Efficient: Intelligent data access leveraging the Hadoop HCatalog
[Diagram: Teradata SQL-H, Aster SQL-H; Hadoop MR, HCatalog, Hive, Pig; Hadoop Layer: HDFS]
Fabric Based Computing Optimized for BI
The backbone of UDA
> High performance infrastructure
> Aggregate Teradata IDW, Aster Discovery and Hadoop
> Industry approach optimized for Big Analytics use
Key Teradata Elements
> BYNET V5 software protocol on InfiniBand interconnect
> Teradata Managed Servers
> System management across all of the FBC
Teradata Viewpoint: Single Operational View (SOV) for Teradata, Aster, & Hadoop
Creation of new portlets:
> Node Monitor (Aster & Hadoop)
> Aster Completed Processes
> Hadoop Services
Integration into existing portlets:
> Monitoring: System Health, Metrics Analysis, Metrics Graph, Capacity Heatmap, Space Usage, Query Monitor (TDB & Aster)
> Admin: Alert Viewer, Alert Setup, Teradata Systems, Role Manager
Teradata Aster Big Analytics Appliance
First Deeply Integrated SQL, MapReduce and Hadoop Appliance
UNIQUE FEATURES
1. Modular Aster and 100% open-source Hortonworks nodes
2. First ANSI SQL & HCatalog integration via SQL-H
3. Only ANSI SQL & MapReduce integration: SQL-MapReduce
4. Most manageable Hadoop: Teradata Viewpoint & TVI
5. Comprehensive Discovery Portfolio: 100+ pre-built functions
6. Fully engineered and supported by Teradata, backed by Hortonworks' world-class Hadoop team
7. Cascading InfiniBand switches, hot node/cluster expansion
Benefits
> Leverage existing investments in standard BI, ETL tools & people with SQL skills
> Industry's highest performance platform for Big Analytics
> Lowest TCO (technology + people), highest ROI, and fastest time to value
Thank You!
Questions and Answers