Teradata's Big Data Technology Strategy & Roadmap
Artur Borycki, Director, International Solutions Marketing
18 March 2014
Agenda
> Introduction and level-set
> Enabling the Logical Data Warehouse
> Any Data
> Any Analytic
> Virtual Compute
> Summary & conclusions
Introduction and level-set
WHAT IS BIG DATA?
BIG DATA IS NOT A TECHNOLOGY
BIG DATA IS NOT THE THREE V'S
BIG DATA IS NOT A USE CASE
BIG DATA IS NOT AN ARCHITECTURE
BIG DATA IS A MOVEMENT DEMANDING MORE ANALYTICS ON ALL DATA
CREATE A DATA CULTURE
Data-Driven Business Success: strategic, operational, and cultural
Big Data Adoption in 2013 Shows Substance Behind the Hype (Gartner, N = 465; multiple responses allowed; values are percentage of respondents)

Business issue | Now addressing (%) | Likely to address in 12-14 months (%)
Enhanced customer experience | 55 | 9
Process efficiency | 49 | 15
New products/new business model | 42 | 17
More targeted marketing | 41 | 12
Cost reduction | 37 | 13
Improved risk management | 32 | 9
Monetize information directly | 23 | 9
Regulatory compliance | 17 | 10
Enhanced security capabilities | 16 | 13
Others | 5 | 3
DATA > INSIGHT > ACTION
Why: companies that exploit ALL their data gain a competitive advantage.
How: implement an enterprise data architecture with three components: staging, discovery, and the data warehouse (DW).
But you don't throw away what you've already done and start over...
The four forces are leading to the rise of the Logical Data Warehouse
Unified Data Architecture:
> Sources: ERP, SCM, CRM, images, audio and video, machine logs, text, web and social
> Move, manage, access: data platform, data warehouse, discovery platform
> Analytic tools & apps: marketing, applications, business intelligence, data mining, math and stats, languages
> Users: marketing executives, operational systems, customers, partners, frontline workers, business analysts, data scientists, engineers
Teradata Unified Data Difference
Enabling the Logical Data Warehouse
Teradata's technology strategy: enable the Logical Data Warehouse, a.k.a. the Unified Data Architecture
> Any Data: structured, schemaless, or name-value pair
> Any Analytic: path, graph, affinity, time-series, text, and more
> Virtual Compute: transparent orchestration of analytic services throughout the Unified Data Architecture
> Seamless data synchronisation: 1-click data movement and management throughout the Unified Data Architecture
> Simplified Systems Management & Administration: single pane of glass admin; multiple moving parts that look like one system (and manage themselves wherever possible); proactive monitoring & alerting
Any Data
The Internet of Things and the evolution of information management
Spectrum: schema on load, key-value pair, schema on read
> Toward schema on load: increased ceremony (integrity, query performance)
> Toward schema on read: increased flexibility and load performance
Teradata's Integrated Big Data Appliance is optimised for set-based analytics on structured data
Uses: contextual analytics, resource flexibility, always on, corporate memory
Example workloads: deep analytics, data labs, data refinery, Hadoop integration, ad hoc projects, peak workload assist, disaster recovery, high availability, archive reporting & retrieval, audit and compliance
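The appliance can also support management and analytics of name-value pair data today:
> Early binding (load time): ETL maps source data to a schema as it is loaded into the data warehouse
> Late binding (runtime): weblogs are stored as a CLOB, and BI tools apply structure at query time using SQL plus parse/extract functions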
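The following minimal Python sketch makes the early/late binding distinction concrete. It assumes weblog records arrive as `&`-separated `name=value` strings, which is a hypothetical format. The raw strings are kept untouched, as they would be in a CLOB column. The hypothetical helper `nvp` extracts a value only when a query runs. This illustrates late binding in general; it is not Teradata's own name-value pair operator.

```python
def nvp(raw, name, pair_delim="&", value_delim="="):
    """Hypothetical helper: return the value for `name` in a name-value-pair
    string, or None if absent. Parsing happens at query time (late binding)."""
    for pair in raw.split(pair_delim):
        key, sep, value = pair.partition(value_delim)
        if sep and key == name:
            return value
    return None

# Load time: raw weblog lines stored as-is, like a CLOB column; no schema applied.
# (Hypothetical sample data.)
WEBLOG_CLOB = [
    "user=42&page=/home&ms=120",
    "user=7&page=/cart",
    "page=/home&user=42&ms=95",
]

def avg_ms_for_page(rows, page):
    """Runtime: which names matter, and their types, is decided only now."""
    times = []
    for raw in rows:
        if nvp(raw, "page") != page:
            continue
        ms = nvp(raw, "ms")
        if ms is not None:
            times.append(int(ms))  # the type is also bound at read time
    return sum(times) / len(times) if times else None

print(avg_ms_for_page(WEBLOG_CLOB, "/home"))
```

Field order differs between records and some records lack `ms`. Neither matters, because nothing is enforced until the query asks for a name.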
Native JSON support is coming in Teradata 15.0. Dot notation navigates a JSON column directly in SQL:

SELECT box.mfg_line.product.color       AS "Color",
       box.mfg_line.product.size        AS "Size",
       box.mfg_line.product.prod_id     AS "Prod_ID",
       box.mfg_line.product.create_time AS "Create_Time"
FROM mfgtable
WHERE CAST(box.MFG_Line.Product.Create_Time AS TIMESTAMP) >= TIMESTAMP '2013-06-16 00:00:00'
  AND box.mfg_line.product.prod_id = 96;

Color  Size   Prod_ID  Create_Time
-----  -----  -------  -------------------
Blue   Small  96       2013-06-17 20:07:27
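This Python sketch mimics the logic of the JSON query above. It assumes the `box` column holds JSON text shaped like the hypothetical sample rows. For each document it walks a dotted path, filters on `create_time` and `prod_id`, and projects four fields. `get_path` and `run_query` are hypothetical helpers. The sample rows are invented for illustration, with the first one mirroring the result shown on the slide. The sketch does not model Teradata's JSON type, and Python dictionary keys are case-sensitive.

```python
import json
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

def get_path(doc, dotted):
    """Follow a dotted path such as 'mfg_line.product.color'; None if a step is missing."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# Hypothetical contents of the JSON column "box" in mfgtable.
BOX_COLUMN = [
    '{"mfg_line": {"product": {"color": "Blue", "size": "Small", "prod_id": 96, "create_time": "2013-06-17 20:07:27"}}}',
    '{"mfg_line": {"product": {"color": "Red", "size": "Large", "prod_id": 96, "create_time": "2013-06-01 08:00:00"}}}',
    '{"mfg_line": {"product": {"color": "Green", "size": "Small", "prod_id": 12, "create_time": "2013-06-18 09:30:00"}}}',
]

def run_query(rows, since, prod_id):
    """Return (Color, Size, Prod_ID, Create_Time) tuples matching the WHERE clause."""
    cutoff = datetime.strptime(since, TS_FORMAT)
    prefix = "mfg_line.product."
    results = []
    for text in rows:
        box = json.loads(text)
        created = get_path(box, prefix + "create_time")
        if created is None:
            continue  # no timestamp, so the row cannot satisfy the filter
        # Equivalent of the CAST(... AS TIMESTAMP) >= ... AND prod_id = ... predicate
        if (datetime.strptime(created, TS_FORMAT) >= cutoff
                and get_path(box, prefix + "prod_id") == prod_id):
            fields = ("color", "size", "prod_id", "create_time")
            results.append(tuple(get_path(box, prefix + f) for f in fields))
    return results

print(run_query(BOX_COLUMN, "2013-06-16 00:00:00", 96))
```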
Need to manage and process large volumes of file-based data? We have you covered:
> Aster and Teradata SQL-H
> Teradata Studio with Smart Loader
> Value-added software from partners
> Teradata Viewpoint
> Teradata Vital Infrastructure
> Teradata Connector for Hadoop (TDCH)
> Intelligent start and stop
> NameNode failover
> Teradata Distribution for Hadoop (based on Hortonworks HDP)
> Optimized hardware for Hadoop
> BYNET V5 40 Gb/s InfiniBand interconnect
One solution, many uses: contextual analytics, resource flexibility, always on, corporate memory
Data held spans unrefined multi-structured data, unrefined structured data, raw data, current data, IDW data (years 1-5 and years 5-10), and archival data
Any Analytic
Need to move subsets of that data into the exploration & discovery environment without transformation? SQL has been described as "intergalactic data speak": it is the lingua franca of relational database technology. But relational theory treats tables as unordered sets, so ordering doesn't matter, and SQL's support for iterative and relationship analytics is correspondingly weak. What if we could elegantly extend SQL to include iterative styles of analytics?
Teradata-Aster: runs MapReduce, speaks SQL
MapReduce-based path analytics:

SELECT * FROM npath (
    ON ( )
    PARTITION BY sba_id
    ORDER BY datestamp
    MODE (NONOVERLAPPING)
    PATTERN ('(OTHER_EVENT FEE_EVENT)+')
    SYMBOLS (
        event LIKE '%REVERSE FEE%' AS FEE_EVENT,
        event NOT LIKE '%REVERSE FEE%' AS OTHER_EVENT
    )
    RESULT ( )
) n;
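The matching that npath performs can be approximated in a minimal Python sketch. It assumes hypothetical `(sba_id, datestamp, event)` tuples and proceeds in four steps:
> Partition by `sba_id`.
> Order by `datestamp`.
> Map each row to a one-letter symbol.
> Let a regular expression play the role of PATTERN.

Python's `re.finditer` returns non-overlapping matches, which is analogous to MODE (NONOVERLAPPING). Aster's actual matching semantics and RESULT clause are not reproduced here.

```python
import re
from collections import defaultdict

def npath_like(rows):
    """Hypothetical sketch: find runs matching (OTHER_EVENT FEE_EVENT)+ per sba_id."""
    partitions = defaultdict(list)           # PARTITION BY sba_id
    for sba_id, datestamp, event in rows:
        partitions[sba_id].append((datestamp, event))
    matches = []
    for sba_id, events in partitions.items():
        events.sort(key=lambda e: e[0])      # ORDER BY datestamp
        # SYMBOLS: 'F' for FEE_EVENT (LIKE '%REVERSE FEE%'), 'O' for OTHER_EVENT
        symbols = "".join("F" if "REVERSE FEE" in ev else "O" for _, ev in events)
        # PATTERN ('(OTHER_EVENT FEE_EVENT)+'); finditer never overlaps matches
        for m in re.finditer(r"(?:OF)+", symbols):
            matches.append((sba_id, [ev for _, ev in events[m.start():m.end()]]))
    return matches

# Hypothetical account events
SAMPLE = [(1, 1, "DEPOSIT"), (1, 2, "REVERSE FEE"), (1, 3, "WITHDRAWAL"),
          (1, 4, "REVERSE FEE"), (1, 5, "DEPOSIT"),
          (2, 1, "REVERSE FEE"), (2, 2, "DEPOSIT")]
print(npath_like(SAMPLE))
```

Encoding each row as a single character is what lets an ordinary regex stand in for the row-level pattern. That mapping step is the part SQL alone handles poorly.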
Graph basics
> Graphs model relationships between objects such as people, products, processes, and bank accounts
> Graphs are made up of vertices, or nodes (the entities), and lines called edges (the relationships) that connect them
Two major categories of graph technologies:
> Navigational: graph databases (Neo4j), RDF/SPARQL (IBM, Oracle)
> Analytical: graph engines (Aster, Google, Hadoop Giraph)
Aster SQL-GR engine: built on a scalable BSP (bulk synchronous parallel) framework to enable big graph processing
Features:
> Native graph processing
> Massively scalable, not bound by memory limits
> Pre-built graph functions
> Integrated with SQL
> Designed for analytics
Benefits:
> Richer insights with powerful graph processing
> Large-scale graph processing with the best price/performance
> Brings graph processing to the SQL audience
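BSP processing runs in supersteps:
> Each active vertex sends messages to its neighbours.
> A barrier ensures every message is delivered.
> Vertices update their state, and those with nothing new vote to halt.

The Python sketch below shows that pattern using connected components by minimum-label propagation, a common textbook example chosen here for illustration. It runs on one machine, and `bsp_connected_components` is a hypothetical name; it is not the SQL-GR API.

```python
from collections import defaultdict

def bsp_connected_components(edges):
    """Pregel-style sketch: each vertex ends up with the smallest vertex id
    reachable from it. Returns (labels, number_of_supersteps)."""
    neighbors = defaultdict(set)
    for u, v in edges:                      # treat edges as undirected
        neighbors[u].add(v)
        neighbors[v].add(u)
    label = {v: v for v in neighbors}
    active = set(neighbors)                 # every vertex starts active
    supersteps = 0
    while active:
        # Compute/communicate phase: active vertices send their current label.
        inbox = defaultdict(list)
        for v in active:
            for n in neighbors[v]:
                inbox[n].append(label[v])
        # Barrier: all messages are collected before any vertex updates.
        active = set()
        for v, msgs in inbox.items():
            smallest = min(msgs)
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)               # changed: stays active next superstep
        # Vertices that did not change have voted to halt.
        supersteps += 1
    return label, supersteps

print(bsp_connected_components([(1, 2), (2, 3), (4, 5)]))
```

Because updates happen only after the barrier, the vertex loop could be spread across workers without changing the result. That property is what lets BSP engines scale beyond one machine's memory.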
Teradata-Aster's SNAP Framework will soon enable more analytic engines and more native data stores
> Unified SQL interface
> Analytic engines: SQL, MapReduce, graph, path, statistics, text
> SNAP Framework: integrated optimizer, integrated executor
> Storage system and services: row store, column store, file store
Virtual Compute
Virtual Compute capability: enabling the UDA vision
From Teradata Database to:
> Hadoop: remote, push-down processing in Hadoop; bi-directional data movement; leverages the Hive query language (pushes foreign grammar); results returned to Teradata for additional processing
> Teradata Aster Database: leverages SQL-MR functions in Aster; passes SQL-MR syntax/grammar to Aster; pushes a local TD table for remote processing; SQL-MR functions (e.g. npath, Sessionize) executed in Aster
> Teradata Database appliance: Teradata-to-Teradata SQL sub-query sent to the appliance; additional processing in the Teradata IDW using data from the appliance
> Grid: leverages grid compute (SAS, Perl, Python, Ruby, R); data streamed from TD to grid nodes for processing; isolates compute resource use and potential faults from the database
Remote processing on Hadoop: leverage data platform resources, reduce data movement
> Query submitted through Teradata
> Sent to Hadoop through Hive
> MapReduce processing on Hadoop
> Results returned to Teradata
> Additional processing joins the results with data in Teradata
> Final results sent back to the application/user
Available in Teradata 15.0!
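The remote-processing flow can be modelled as a minimal Python sketch to show why push-down reduces data movement. It assumes hypothetical weblog rows `(customer_id, url)` held in Hadoop and a hypothetical customer table held in Teradata. `remote_aggregate` stands in for the Hive/MapReduce step and returns only per-customer counts, not the raw rows. `local_join` stands in for the additional processing in Teradata. Both functions are illustrative names, not Teradata's remote-query interface.

```python
from collections import Counter

def remote_aggregate(weblog_rows):
    """Stand-in for the Hive/MapReduce step: runs where the data lives and
    ships back one count per customer instead of every raw row."""
    return Counter(customer_id for customer_id, _url in weblog_rows)

def local_join(customers, remote_counts):
    """Stand-in for the additional processing in Teradata: join the small
    remote result with local customer data (customers with no visits get 0)."""
    return sorted(
        (name, segment, remote_counts.get(cid, 0))
        for cid, (name, segment) in customers.items()
    )

# Hypothetical data on each side
WEBLOG = [(1, "/a"), (1, "/b"), (2, "/a")]
CUSTOMERS = {1: ("Ann", "Gold"), 2: ("Bob", "Silver"), 3: ("Cy", "Gold")}

counts = remote_aggregate(WEBLOG)   # only len(counts) rows "cross the wire"
print(local_join(CUSTOMERS, counts))
```

With real weblogs, the returned counts are far smaller than the raw rows. That size gap is the saving the slide refers to.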
Execute SQL-MR functions in Aster: leverage pre-packaged functions in Aster
> Query submitted through Teradata
> SQL-MR request sent to Aster
> Sessionize function performed in Aster
> Results returned to Teradata
> Additional processing using the session results in Teradata
> Final results sent back to the application/user
Available in a future release
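Sessionization groups each user's clicks into sessions, starting a new session when the gap between consecutive clicks exceeds a timeout. This minimal Python sketch shows that idea. It assumes hypothetical `(user, timestamp)` pairs and a timeout in the same units as the timestamps. Aster's Sessionize function has its own parameters and output columns, which this sketch does not reproduce.

```python
from collections import defaultdict

def sessionize(clicks, timeout):
    """Return (user, timestamp, session_id) rows; session ids restart at 0 per
    user and increase when the gap to the previous click exceeds `timeout`."""
    by_user = defaultdict(list)
    for user, ts in clicks:                  # partition by user
        by_user[user].append(ts)
    out = []
    for user in sorted(by_user):
        session, prev = 0, None
        for ts in sorted(by_user[user]):     # order by timestamp
            if prev is not None and ts - prev > timeout:
                session += 1                 # gap too long: start a new session
            out.append((user, ts, session))
            prev = ts
    return out

# Hypothetical clicks, timestamps in seconds, 30-minute timeout
print(sessionize([("u1", 0), ("u1", 100), ("u1", 2000), ("u2", 50)], timeout=1800))
```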
Summary and conclusions
Teradata's technology strategy: enable the Logical Data Warehouse, a.k.a. the Unified Data Architecture
> Any Data: name-value pair operators (available now); JSON (Teradata 15.0); Aster File System (Aster 6.0)
> Any Analytic: BSP-based graph engine (Aster 6.0); more analytic engines coming to the Aster SNAP framework soon
> Virtual Compute: fabric-based computing (available now, with further enhancements & extensions planned); transparent orchestration (starting in Teradata 15.0)
> Seamless data synchronisation: Unity Data Mover & Unity Ecosystem Manager (available now for multi-Teradata system environments; support for Aster and Hadoop coming soon)
> Simplified Systems Management & Administration: Viewpoint provides single pane of glass management and administration (available now, with further enhancements & extensions planned)
The UDA provides cost-effective storage for any data
Why the UDA framework is important: Hadoop, JSON store, NoSQL store
THE BEST TECHNOLOGY FOR ENABLING A DATA CULTURE IS THE UNIFIED DATA ARCHITECTURE
THE BIG ALL DATA IS DATA A MOVEMENT
UNIFIED DATA ARCHITECTURE MOVEMENT