How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW
Roger Breu, PDW Solution Specialist, Microsoft Western Europe
Marcus Gullberg, PDW Partner Account Manager, Microsoft Sweden
SQL Server PDW - Your socket to Big Data HP AppSystem for SQL 2012 Parallel Data Warehouse (PDW)
Agenda
» The Modern Data Warehouse and socket to Big Data
» SQL Server Parallel Data Warehouse (PDW) Overview: fast, scalable, lowest TCO
» Interesting customer cases
» Q&A
The world of data has changed
» Data explosion: 10x increase every five years; 85% from new data types
» Consumerization of IT: 4.3 connected devices per adult; 27% using social media input
The Large Hadron Collider produces 1 PB/sec
But hold on, I'm not CERN and I don't have a Large Hadron Collider
But you do have: sensors, clicks, logs, transactional records, call centers, images, documents, signals from social media, simulations
A Definition
» Traditional data is highly structured: traditional databases are organized around planned queries
» Big Data is all this and more: volume, variety, velocity (and volatility & variability)
Common Big Data customer scenarios
Gain competitive advantage by moving first and fast in their industry
» IT infrastructure optimization
» Churn analysis
» Legal discovery
» Social network analysis
» Natural resource exploration
» Weather forecasting
» Traffic flow optimization
» Healthcare outcomes
» Web app optimization
» Fraud detection
» Life sciences research
» Advertising analysis
» Equipment monitoring
» Smart meter monitoring
Twitter Analytics with Microsoft Excel Demo
Is this Big Data? Is this Enterprise Ready?
Big Data is more than just new BI! Big Data is more than just Hadoop!
The Big (Data) picture
[Diagram labels: new data sources, traditional data sources, data, insights, value ($$$)]
The Traditional Data Warehouse
1. Increasing data volumes
2. Real-time data
3. New data sources & types
4. Cloud-born data
[Diagram labels: Data sources, Non-Relational Data]
The Modern Data Warehouse Data sources Non-Relational Data
Demo: Combining structured and semistructured data with SQL Server PDW and Polybase
And remember, it's not just working with Twitter data
Socket for Big Data: Hadoop access made simple (Polybase feature in SQL Server PDW)
[Diagram: DATA SOURCES (new sources: web logs, email, sensor data, social media; traditional sources: RDBMS, OLTP, OLAP), DATA SYSTEMS (SQL Server PDW, HDInsight), APPLICATIONS (Microsoft applications)]

    -- Regular relational table stored in PDW
    CREATE TABLE Customers ([user_id] INT, name NVARCHAR(50));

    -- External table whose rows live in a file on HDFS
    CREATE EXTERNAL TABLE ClickEvent (url VARCHAR(50), event_date DATE, user_id VARCHAR(50))
    WITH (LOCATION = 'hdfs://myhadoop:5000/clickstream/click.txt');

    -- One query joins the relational table with the Hadoop data
    SELECT COUNT(*)
    FROM Customers c
    JOIN ClickEvent e ON c.[user_id] = e.[user_id]
    WHERE c.name = 'Jones';
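To make the semantics of the query on this slide concrete, here is a minimal local sketch in Python. It is not PolyBase and does not touch HDFS: it uses the standard-library `sqlite3` module, and the pipe-delimited sample lines, the `CLICK_LINES` name, and the `count_clicks_for` helper are all assumptions made for illustration. It only shows the idea of treating a delimited text file as a table and joining it to relational rows.

```python
import sqlite3

# Hypothetical sample lines standing in for the click.txt file (url|event_date|user_id).
CLICK_LINES = [
    "http://example.com/a|2013-05-01|1",
    "http://example.com/b|2013-05-01|2",
    "http://example.com/c|2013-05-02|1",
]


def count_clicks_for(name, customers, click_lines):
    """Join relational customer rows with parsed click lines and count the matches."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Customers (user_id INTEGER, name TEXT)")
    con.execute("CREATE TABLE ClickEvent (url TEXT, event_date TEXT, user_id TEXT)")
    con.executemany("INSERT INTO Customers VALUES (?, ?)", customers)
    # Each text line becomes one row of the "external" table.
    con.executemany(
        "INSERT INTO ClickEvent VALUES (?, ?, ?)",
        [line.split("|") for line in click_lines],
    )
    # Same shape as the slide's query; the cast makes the INT/VARCHAR comparison explicit.
    (count,) = con.execute(
        "SELECT COUNT(*) FROM Customers c "
        "JOIN ClickEvent e ON c.user_id = CAST(e.user_id AS INTEGER) "
        "WHERE c.name = ?",
        (name,),
    ).fetchone()
    con.close()
    return count


customers = [(1, "Jones"), (2, "Smith")]
print(count_clicks_for("Jones", customers, CLICK_LINES))
```

In this sketch the text lines are copied into a table before the join; the point of the slide is that PDW lets the same kind of join be written against data that stays in Hadoop.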
Parallel or not parallel?
Scale up (SMP)
» Traditional approach
» Build for specific requirement
» Build HA etc. additionally
» Maintain and tune (load/file distribution)
» Unknown future workloads
» Still a very good data mart solution in a hub-and-spoke architecture with SQL Server PDW
Scale out (MPP)
» Modern way of data warehousing
» Resilient & predictable
» Big data / DW best practices in a box
» Deploy fast and drive value
» Built-in HA
» Scalable (start small/grow when needed)
[Diagram: SQL Instance #1 through SQL Instance #8, each with its own storage]
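The scale-out idea on this slide, many SQL instances that each own a slice of the data, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PDW's actual implementation: the function names, the use of `zlib.crc32` as the hash, and the sample `sales` rows are all hypothetical, and the per-node work runs sequentially here rather than in parallel.

```python
import zlib
from collections import defaultdict


def node_for(key, node_count):
    """Pick a node by hashing the distribution key (crc32 keeps it deterministic)."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def distribute(rows, key, node_count):
    """Spread rows over nodes so that each node holds only its own slice."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key], node_count)].append(row)
    return nodes


def local_sum(rows, group_key, value_key):
    """Aggregate the rows held by a single node."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


def parallel_sum(nodes, group_key, value_key):
    """Combine the per-node partial results into one final answer."""
    combined = defaultdict(int)
    for node_rows in nodes:
        for group, subtotal in local_sum(node_rows, group_key, value_key).items():
            combined[group] += subtotal
    return dict(combined)


# Hypothetical sales rows, distributed over eight nodes by store_id.
sales = [{"store_id": i % 5, "amount": i} for i in range(100)]
nodes = distribute(sales, "store_id", node_count=8)
print(parallel_sum(nodes, "store_id", "amount"))
```

Because every row with the same `store_id` lands on the same node, each node can aggregate its slice independently, and adding nodes shrinks the slice each one has to scan.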
Performance comparison Traditional Data Warehouse vs. Modern Data Warehouse (SQL Server PDW) Demo
Interesting customer cases
Large fresh-food retailer in Scandinavia uses SQL Server PDW to improve customer basket analysis and optimize marketing campaigns
BENEFITS
» Further scale the existing SQL Server based solution and get ready for Big Data
» SQL Server PDW is a hosted solution from their infrastructure outsourcer
» Build on existing SQL Server knowledge
One of the largest fresh-food retailers in Scandinavia uses SQL Server PDW as the backend for a Microsoft BI application handling billions of rows of point-of-sale (POS) data. The customer can now run lightning-fast customer basket analysis, campaign management, and receipt line analysis.
Leading Scandinavian logistics company chooses SQL Server PDW to lower its TCO
BENEFITS
» Lower TCO due to standardization (previously had DWH on Oracle and SQL Server)
» Expand to meet future capacity requirements
» Lower price/TB compared to existing SAN-based solutions
The customer operates in Denmark and Sweden, is a large player in the Nordic advertising market, and is a leading operator of logistics services to, from, and within the Nordic region. PDW will be used as the primary data warehouse platform. Previously the customer operated multiple data warehouse environments on different platforms. SQL Server PDW will be the new standard and will also support big data in the future.
AMD boosts data warehouse performance with PDW
BENEFITS
» Cuts storage costs through data compression
» Improves data-loading performance
» Reduces support work needed by an estimated 90 percent
In testing semiconductor wafers, AMD uses a data warehouse to process and analyze one terabyte of data each week. When its data warehouse began foundering under the load, AMD switched to Microsoft SQL Server 2008 R2 Parallel Data Warehouse. With the new system, AMD has increased performance and gained a sustainable, scalable solution.
http://www.microsoft.com/casestudies/case_study_detail.aspx?casestudyid=710000001887
Direct Edge gets 142X performance gain with Parallel Data Warehouse (PDW) Appliance
BENEFITS
» Large US stock exchange needed an MPP appliance for better performance, scale, and a complete BI solution
» PDW delivered a 142X query performance gain, linear scale, and a complete BI solution
» The PDW appliance is the BI backbone for Direct Edge
Direct Edge, one of the largest equities exchanges in the world, wanted a better, faster business intelligence (BI) solution for its financial analysts to use to create reports. Direct Edge implemented a data warehouse and BI solution based on Microsoft SQL Server 2008 R2 Parallel Data Warehouse, which has given the company more visibility into its data. The firm can also provide analytical reports in seconds rather than hours and can better drive business growth.
http://www.microsoft.com/casestudies/case_study_detail.aspx?casestudyid=710000002540
Upgrading SQL Server to PDW gains 100x improvement
BENEFITS
» Boosts query performance by 100 times
» Gets critical business data to analysts faster
» Scales to meet future data growth
The SQL Server Parallel Data Warehouse solution helps Hy-Vee recognize changes in its customers' buying habits. Hy-Vee can then respond to those changes before competitors do, which gives it a huge business advantage.
http://www.microsoft.com/casestudies/microsoft-sql-server-2008-r2-enterprise/hy-vee/hy-vee-boosts-performance-speeds-data-Delivery-and-Increases-Competitiveness/710000000776
With PDW: From batch to real time insights
A Classic Dark Data Example: Audit & Fraud Detection at a large Oil & Gas Company
» Collect data from all audited source systems (ETL Performance): 6h 12min improvement
» Run multiple Auditing Reports in parallel + multiple types + concurrency (Query Concurrency / Reporting Scalability): handled without impact
» Expand data window to address regulatory requirements (Reporting Scalability): processing time only increased by 47%
» Run multiple Auditing Reports in parallel with expanded dataset (Validate future growth / Reporting Scalability): 100x more data, min. increase in runtimes
» Create Auditing Reports and Datasets faster (Reporting Performance): 5x-192x faster
» Drill down into SSAS OLAP Cubes faster (Processing Performance / Validate future growth): 20x faster
Summary
Massively Parallel Processing (MPP) engine
PDW is the SQL Server scale-out solution
» Scale-out MPP provides near-linear scale-out
» Massively Parallel Processing (MPP) architecture
» Scale out: incrementally add hardware for near-linear scale
» Shared nothing
» 10X faster than SMP DW
» Compute-heavy tasks
» Near-linear scale
» Easy to scale (no forklift)
Seamlessly add capacity
Start small with a few TB and linearly scale out
» Add capacity: smallest (0TB) to largest (6PB)
» Start small with a warehouse of a few terabytes
» Add capacity up to 6 petabytes
» Start small and grow
» Largest Warehouse PB
» Minimal downtime
Columnstore gives next-gen performance
Lightning-fast data query processing
Columnstore provides dramatic performance
» Updateable and clustered columnstore
» Stores data in columnar format
» Memory-optimized for next-generation performance
» Updateable to support bulk and/or trickle loading
» Up to 50X faster
» Up to 15x compression
» Save time and costs
» Real-time DW
[Diagram labels: Customer, Products, Sales, Supplier, Country]
PolyBase for transparent access to Hadoop
Fundamental breakthrough in data processing
SQL Server 2012 PDW, powered by PolyBase
» Single query; structured and unstructured
» Query and join Hadoop tables with relational tables
» Use standard SQL language (SELECT, FROM, WHERE)
» Existing SQL skillset
» No IT intervention
» Save time and costs
» Analyze all data types
[Diagram: SQL query spanning Database and HDFS (Hadoop)]
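Why a columnar layout compresses well and scans quickly can be shown with a toy Python example. This is a sketch only: it uses plain run-length encoding as one illustrative technique and makes no claim about how the PDW columnstore is implemented internally. The sample rows and the helper names `to_columns`, `rle_encode`, and `rle_decode` are assumptions for illustration.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sales rows; values in one column tend to repeat.
rows = [
    {"country": c, "product": p, "sales": s}
    for c, p, s in [
        ("SE", "milk", 3), ("SE", "milk", 5), ("SE", "bread", 2),
        ("DK", "bread", 7), ("DK", "bread", 1), ("DK", "milk", 4),
    ]
]

columns = to_columns(rows)
encoded = rle_encode(columns["country"])
print(encoded)                # long runs of equal values collapse into a few pairs
print(sum(columns["sales"]))  # an aggregate reads only the one column it needs
```

Storing each column together puts similar values next to each other, which is what makes the encoding effective, and a query that aggregates `sales` never has to read `country` or `product`.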
Q&A roger.breu@microsoft.com
Additional resources
» SQL Server Parallel Data Warehouse (PDW) Landing Page: www.microsoft.com/pdw
» Introduction to Polybase: http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx
» Price/TB comparison: http://www.valueprism.com/resources/resources/resources/pdw%20compete%20pricing%20final.pdf
» HP QuickSpecs: http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.html and http://h18000.www1.hp.com/products/quickspecs/13830_div/13830_div.pdf
» Brand New SQL Server PDW Overview Whitepaper: http://download.microsoft.com/download/d/2/0/d20e1c5f-72ea-4505-9f26-fef9550efd44/sql Server 2012 Parallel Data Warehouse - A Breakthrough Platform.docx
» Modernize your Data Warehouse: www.upgradetopdw.com
Thank you very much for your attention!
Learn more about this topic
Use HP's Augmented Reality (AR) to access more content
1. Launch the HP AR app*
2. View this slide through the app
3. Unlock additional information!
*Available on the App Store and Google Play
Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.