Please give us your feedback
Session BB4089
Speakers: Claude Lorenson, Ph.D., and Wendy Harms
Use the mobile app to complete a session survey:
1. Access My Schedule.
2. Click on this session.
3. Go to Rate & Review.
If the session is not on your schedule, find it via the session scheduler, click on this session, and then go to Rate & Review.
Thank you for providing your feedback, which helps us enhance content for future events.
Unlock your insights with Big Data in a box
Claude Lorenson, Ph.D., Cloud & Enterprise Marketing Group, Microsoft
Wendy Harms, HP Converged Systems
June 2014
Agenda
The modern data warehouse
Insights from your data
Microsoft Analytics Platform System: Big Data, Performance, Value
The traditional data warehouse
[Diagram: data sources and non-relational data]
Insights from all your data
Enrich and optimize your data with data from non-traditional sources
Roadblocks to evolving to a modern data warehouse
Keep legacy investment: limited scalability and ability to handle new data types
Acquire a Big Data solution: significant training and data silos
Buy a new tier-one hardware appliance: high acquisition and migration costs
Acquire business intelligence: complex, with low adoption
Introducing the Microsoft Analytics Platform System
The turnkey modern data warehouse appliance
Relational and non-relational data in a single appliance
Enterprise-ready Hadoop
Integrated querying across Hadoop and SQL Server Parallel Data Warehouse (PDW) using T-SQL
Direct integration with Microsoft BI tools such as Microsoft Excel
Near real-time performance with In-Memory Columnstore
Ability to scale out to accommodate growing data
Removal of data warehouse bottlenecks with massively parallel processing (MPP) SQL Server
Concurrency that fuels rapid adoption
Industry's lowest data warehouse appliance price per terabyte
Value through a single appliance solution
Value with flexible hardware options using commodity hardware
Microsoft Analytics Platform System
The turnkey modern data warehouse appliance
What is Big Data and why is it valuable to the business?
Evolution in the nature and use of data in the enterprise
[Chart: value to the business versus data complexity (variety and velocity), with data volumes reaching petabytes; stages: historical analysis, insight analysis, predictive analytics, predictive forecasting]
What is Hadoop?
Operational services: Ambari, Oozie, Falcon
Data services: Flume, Sqoop, load and extract (NFS, WebHDFS), HBase, Pig, Hive and HCatalog
Core services: MapReduce, YARN, HDFS
[Diagram: a Hadoop cluster of nodes, each providing compute and storage]
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware.
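To make "distributed data processing" concrete, here is a minimal single-process Python sketch of the MapReduce model: a map phase emits key/value pairs for each input split, a shuffle groups the values by key, and a reduce phase aggregates each group. It is a conceptual illustration only, not Hadoop's Java MapReduce API; the function names and sample splits are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as happens between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: List[str]) -> Dict[str, int]:
    # On a real cluster, each split would be mapped on the node that stores it;
    # here every phase runs sequentially in one process.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(intermediate))


if __name__ == "__main__":
    print(word_count(["big data in a box", "big insights from big data"]))
```

The point of the split is that map and reduce work only on local slices of data, which is what lets Hadoop spread the job across commodity nodes.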
Hadoop alone is not the answer to all Big Data challenges
Steep learning curve; slow and inefficient
Learn new skills for the Hadoop ecosystem
Move HDFS data into the warehouse before analysis
Build, integrate, manage, maintain, and support ETL
[Diagram: new data sources, the Hadoop ecosystem, ETL, and T-SQL]
APS (Analytics Platform System) delivers enterprise-ready Hadoop with HDInsight
Manageable, secured, and highly available Hadoop integrated into the appliance
High performance, tuned within the appliance
End-user authentication with Active Directory
100 percent Apache Hadoop
Managed and monitored using System Center
Accessible insights for everyone with Microsoft BI tools
[Diagram: SQL Server Parallel Data Warehouse and Microsoft HDInsight connected through PolyBase]
Connecting islands of data with PolyBase
Bringing Hadoop point solutions and the data warehouse together for users and IT
Provides a single T-SQL query model for PDW and Hadoop with rich T-SQL features, including joins without ETL
Uses the power of MPP to enhance query execution performance
Supports Microsoft Azure HDInsight to enable new hybrid cloud scenarios
Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera
[Diagram: a SELECT statement against SQL Server Parallel Data Warehouse uses PolyBase to return a single result set drawing on Microsoft HDInsight, Microsoft Azure HDInsight, Hortonworks for Windows and Linux, and Cloudera]
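The "joins without ETL" idea can be modeled in a few lines: relational rows are joined with delimited records that are read at query time instead of being loaded into the warehouse first. The Python sketch below is only a conceptual model of that behavior. It does not use PolyBase or any APS interface, and the `customers` table, the comma-delimited sales file, and the `join_external` function are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical relational table (stands in for a PDW table): customer_id -> name
customers: Dict[int, str] = {1: "Kim", 2: "Lee", 3: "Park"}

# Hypothetical delimited file in HDFS (stands in for external data):
# customer_id,store,amount
hdfs_file = io.StringIO("1,Seoul,120.50\n3,Busan,80.00\n1,Busan,19.50\n4,Incheon,5.00\n")


def join_external(rows: Dict[int, str], external_lines: Iterable[str]) -> List[dict]:
    """Inner-join relational rows with delimited records parsed at query time (no load step)."""
    result = []
    for customer_id, store, amount in csv.reader(external_lines):
        cid = int(customer_id)
        if cid in rows:  # inner join: skip records with no matching customer
            result.append({"name": rows[cid], "store": store, "amount": float(amount)})
    return result


for record in join_external(customers, hdfs_file):
    print(record)
```

In the product, the equivalent would be a single T-SQL SELECT with a JOIN; the sketch only shows why no separate ETL step is needed when the external records are parsed as part of the query.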
Use cases where PolyBase simplifies using Hadoop data
Bringing islands of Hadoop data together
Running high-performance queries against Hadoop data
Archiving data warehouse data to Hadoop (move)
Exporting relational data to Hadoop (copy)
Importing Hadoop data into a data warehouse (copy)
Big Data insights for anyone
New insights with familiar tools through native Microsoft BI integration
Everyone using Microsoft BI tools: takes advantage of the high adoption of Excel, Power View, PowerPivot, and SQL Server Analysis Services, and minimizes IT intervention for discovering data with tools such as Microsoft Excel
Power users: enables DBAs and power users to join relational and Hadoop data with T-SQL
Data scientists: offers Hadoop tools such as MapReduce, Hive, and Pig
Shinsegae Corporation, a major department store chain in Korea, needed better performance for customer data mining and basket purchase analysis. Shinsegae took advantage of the integration of PDW and Hadoop to combine 40 terabytes of data, and was pleased to see PolyBase perform nearly twice as fast as its best Hive/Hadoop environment.
"We are really satisfied with the performance of PolyBase to allow us to join relational and Hadoop data (weather data, board data, text data) faster and easier. PolyBase is a really powerful feature of PDW to deploy a Big Data system. PolyBase is one of the reasons we selected PDW as our Big Data platform."
#1 retail company in Korea
Performance and scale limitations of a traditional data warehouse
Scale up: each increase in capacity requires a forklift upgrade, with diminishing scale as requirements grow
Rowstore: data is queried by row, giving sub-optimal performance for many data warehouse queries
[Diagram: a table with columns C1 to C4 and rows R1 to R6, stored row by row across pages 1 to 3]
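The rowstore limitation can be illustrated with a toy model: in a row-oriented layout, summing one column still reads every row's full record, while a column-oriented layout reads only that column's values. The Python sketch below counts the values touched under each layout for a four-column, six-row table like the one on the slide. It is a simplified model (no pages, I/O, or compression) with hypothetical function names, not a description of how SQL Server physically stores data.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, int, int, int]  # columns C1..C4


def rowstore_sum(rows: Sequence[Row], col: int) -> Tuple[int, int]:
    """Row layout: read each whole row to extract one column. Returns (sum, values_touched)."""
    total, touched = 0, 0
    for row in rows:
        touched += len(row)  # the full record is read, not just the needed column
        total += row[col]
    return total, touched


def to_columns(rows: Sequence[Row]) -> List[List[int]]:
    """Column layout: store each column's values contiguously."""
    return [list(column) for column in zip(*rows)]


def columnstore_sum(columns: List[List[int]], col: int) -> Tuple[int, int]:
    """Column layout: read only the requested column. Returns (sum, values_touched)."""
    values = columns[col]
    return sum(values), len(values)


rows = [(r, r * 10, r * 100, r * 1000) for r in range(1, 7)]  # R1..R6
print(rowstore_sum(rows, 1))                 # (210, 24): sums C2 but touches all 24 values
print(columnstore_sum(to_columns(rows), 1))  # (210, 6): sums C2 touching only 6 values
```

The gap grows with the number of columns a table has, which is why wide fact tables with narrow analytic queries favor columnar storage.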
Scaling out your data to petabytes
Scale-out technologies in the Analytics Platform System
Scale out: multiple nodes with dedicated CPU, memory, and storage
Ability to incrementally add hardware for near-linear scale to multiple petabytes
Ability to handle query complexity and concurrency at scale
No forklift upgrade of the prior warehouse to increase capacity
Ability to scale out HDInsight and PDW
[Diagram: PDW and HDInsight nodes scaling from 0 terabytes to 6 petabytes]
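In an MPP design such as PDW, the rows of a hash-distributed table are spread across compute nodes by hashing a distribution column; each node aggregates its own slice in parallel, and the partial results are then combined. The Python sketch below models that flow with a thread pool standing in for nodes. The node count, the use of CRC32 as the hash, the `sales` data, and all function names are assumptions for illustration and do not reflect PDW's actual internals.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple
import zlib

Sale = Tuple[str, float]  # (store, amount); store is the hypothetical distribution column


def distribute(rows: List[Sale], nodes: int) -> List[List[Sale]]:
    """Assign each row to a node by hashing the distribution column."""
    buckets: List[List[Sale]] = [[] for _ in range(nodes)]
    for store, amount in rows:
        # crc32 is deterministic across runs, unlike Python's salted hash() for str
        buckets[zlib.crc32(store.encode()) % nodes].append((store, amount))
    return buckets


def local_aggregate(bucket: List[Sale]) -> Dict[str, float]:
    """Each 'node' sums amounts per store over its own rows only."""
    partial: Dict[str, float] = defaultdict(float)
    for store, amount in bucket:
        partial[store] += amount
    return dict(partial)


def mpp_sum_by_store(rows: List[Sale], nodes: int = 4) -> Dict[str, float]:
    buckets = distribute(rows, nodes)
    # Threads stand in for compute nodes working in parallel.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(local_aggregate, buckets))
    # Combine step: merge the partial aggregates into one result.
    result: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for store, total in partial.items():
            result[store] += total
    return dict(result)


sales = [("Seoul", 10.0), ("Busan", 5.0), ("Seoul", 2.5), ("Incheon", 1.0)]
print(mpp_sum_by_store(sales))
```

Because the query logic does not depend on the node count, adding nodes spreads the same work more thinly, which is the intuition behind the slide's near-linear scale-out claim.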
Blazing-fast performance
MPP and In-Memory Columnstore for next-generation performance
Up to 100x faster queries and up to 15x more compression (updateable clustered columnstore vs. a table with customary indexing)
Parallel query execution
Store data in columnar format for massive compression
Load data into or out of memory for next-generation performance, with up to 60% improvement in data loading speed
Updateable and clustered for real-time trickle loading
[Diagram: columnstore index representation; a query executes in parallel and returns results]
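Part of why columnar storage compresses well is that the values within a single column are often repetitive, so techniques such as run-length encoding (RLE) can collapse runs of equal values. The Python sketch below applies RLE to one hypothetical column. It is one illustrative technique, not SQL Server's actual columnstore encoding, and the reduction it shows depends entirely on the made-up data.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(column: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(encoded: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in encoded for _ in range(count)]


# A hypothetical sorted, low-cardinality column (e.g. a region code) compresses well.
region = ["APAC"] * 500 + ["EMEA"] * 300 + ["NA"] * 200
encoded = rle_encode(region)
print(encoded)                        # [('APAC', 500), ('EMEA', 300), ('NA', 200)]
print(len(region), "->", len(encoded), "stored entries")
assert rle_decode(encoded) == region  # the encoding is lossless
```

The same column stored row by row would interleave region codes with other fields, breaking up the runs that RLE relies on.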
Rapid adoption fueled by concurrency and mixed workloads
Great performance under stress
[Diagram: sources (ERP, CRM, and LOB apps, SQL Server SMP, Hadoop and Big Data) loaded via ETL/ELT with SSIS, DQS, and MDS or via ETL/ELT with DWLoader into the Analytics Platform System (PDW, PolyBase, HDInsight), serving SSRS and SSAS, BI tools, and ad-hoc queries]
MEC, a global media agency, uses SQL Server PDW with in-memory technology to cut query time, helping marketers unlock the value of their data.
"SQL Server Parallel Data Warehouse gives us massively parallel advantages. Whereas it would take up to four hours to run queries scaling across multiple nodes, now it takes just minutes."
Giving you what you need when you need it
HP's rapid data warehouse appliance evolution
2011: HP Enterprise Data Warehouse Appliance, the first HP/Microsoft DW appliance
2013: HP AppSystem for Microsoft SQL Server 2012 Parallel Data Warehouse, a massive jump in scalability and functionality, with up to 6 PB and a Hadoop connector
2014: HP ConvergedSystem 300 for Microsoft Analytics Platform, a new high-performance, high-availability platform with fully integrated Hadoop
[Chart axes: scalability and functionality]
Base Parallel Data Warehouse (PDW) components
HP ConvergedSystem 300 for Microsoft Analytics Platform
APS rack and network: InfiniBand (data network) and Ethernet (management network) connectivity
1 x HP 642 Shock Intelligent Series Rack
2 x FDR InfiniBand switches
2 x HP 5120 switches
2 x power distribution units, single-phase or multi-phase (priced separately)
PDW Region Base Scale Unit (Control): 1 x Orchestration Server (PDW), 1 x Failover Server (PDW), 1 x optional Failover Server (PDW); a virtualized control and management node, with a failover node for high availability (HA)
PDW Region Base Scale Unit (Processing): 2 x PDW Data Servers and 1 x DAS storage block (1 TB, 2 TB, or 3 TB choice)
PDW: massively parallel scale-out query processing
PLUS integrated analysis with Hadoop-based HDInsight
One appliance for integrated analysis of HDInsight non-relational data and SQL Server PDW relational data
100% Apache Hadoop-based data platform
Manage data of any size or type, relational or non-relational
Perform more complex analysis faster with SQL Server PDW's PolyBase
HDI Region Base Scale Unit (Control): 1 x Orchestration Server (HDInsight), 1 x Failover Server (HDInsight)
HDI Region Base Scale Unit (Processing): 2 x HDI Data Servers and 1 x DAS storage block (1 TB, 2 TB, or 3 TB choice)
Scale quickly and easily: add up to 3 x HDI Data Scale Units and 1 x HDI Failover Server per rack
Flexible, mix-and-match PDW and HDInsight
Modular, flexible, and cost-effective: configured at the factory or expandable in the field
[Diagram: example rack configurations, each with Orchestration Servers plus a mix of PDW Region Base Scale Units, PDW Data Scale Units, HDI Region Base Scale Units, and HDI Data Scale Units]
Massive scalability
HP ConvergedSystem 300 for Microsoft Analytics Platform
Up to 6 PB in the PDW region (based on 5:1 compression)
Up to 1.2 PB in the HDInsight region
Up to 64 nodes per workload region
Base rack: fully populated with 8 nodes; easily expand by adding racks
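The 6 PB PDW figure assumes 5:1 compression, so the physical storage behind it works out to 6 / 5 = 1.2 PB. A small helper makes that arithmetic explicit in both directions. The function names are hypothetical, and the 5:1 ratio is the slide's stated assumption rather than a guarantee for any particular data set.

```python
def effective_capacity_pb(physical_pb: float, compression_ratio: float) -> float:
    """User-data capacity when data compresses at compression_ratio:1."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return physical_pb * compression_ratio


def physical_capacity_pb(effective_pb: float, compression_ratio: float) -> float:
    """Physical storage needed to hold effective_pb of user data at compression_ratio:1."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return effective_pb / compression_ratio


print(physical_capacity_pb(6.0, 5.0))   # 1.2
print(effective_capacity_pb(1.2, 5.0))  # back to roughly 6 PB of user data
```

If real data compresses at, say, 3:1 instead, the same physical storage holds proportionally less, which is why capacity planning should use a ratio measured on representative data.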
Simplified management for increased return on investment (ROI)
Exclusive to HP: HP Support Pack
Validates Microsoft Reference Architecture (MRA) compliance
Reports the serial numbers and rack locations of all devices
Provides diagnostic tools to validate the configuration
Provides Proactive Care reporting for support
Includes a ConvergedSystem firmware/driver update package
Unique HP tools help ensure the solution runs at optimal performance and delivers the expected ROI.
The Royal Bank of Scotland, the leading UK provider of corporate banking services, needed a powerful analytics platform to improve performance and customer services. The bank implemented a Microsoft SQL Server 2012 Parallel Data Warehouse appliance, increasing productivity by 40 percent for faster response to business needs.
"I knew that it would be easy for my team to transition from managing SQL Server databases to SQL Server 2012 PDW, and the solution cost about 85 percent less than products from other vendors."
Microsoft Analytics Platform System
No-compromise modern data warehouse solution
Meeting today's Big Data analytics requirements
Enterprise-ready Hadoop with HDInsight and the simplicity of PolyBase
Optimized performance with MPP technology and In-Memory Columnstore
Providing value with a low total cost of ownership (TCO)
For more information
Hewlett-Packard
HP ConvergedSystem 300 for Microsoft Analytics Platform: hp.com/go/convergedsystem/cs300aps
HP ActiveAnswers: hp.com/solutions/activeanswers
Microsoft
Microsoft Analytics Platform System: microsoft.com/aps
Thank you