
Microsoft Analytics Platform System: The turnkey modern data warehouse appliance. Stefan Cronjaeger, June 2014

Agenda: Modern Data Warehouse; Big Data application examples; Analytics Platform System architecture; Hadoop; integration of Hadoop and APS (APS with external Hadoop clusters, APS with Hadoop in the cloud, APS with integrated Hadoop).

Data sources

Data sources: non-relational data

Big Data: variety, velocity, volume and analytics. Sources: web, sensor and machine logs, social media, business apps.

Technologies to drive Big Data

What to do with the data: geo analysis, forecasting, customer interaction, keywords and sentiment, churn, customer segmentation, shopping basket and recommendation, scoring and outlier detection.

Examples for sentiment analysis: not only marketing. Approach: browse blogs, Twitter, news articles and newsgroups; extract keywords, pairs of keywords and sentiments; then analyze and correlate them. Campaign supervision: political campaigns and keywords, marketing campaigns, trend analysis. Quality assurance: analyze internal technical discussion groups to get early warning of possible technical issues. Supply chain for fashion: look in fashion blogs and discussion groups to forecast demand for specific fashion articles.
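The extraction steps above (keywords, keyword pairs, sentiment) can be sketched in a few lines. This is a minimal sketch, assuming a tiny hand-made stop-word list and sentiment lexicon (`STOP_WORDS` and `SENTIMENT` are hypothetical) and reading "pairs of key words" as keywords that co-occur in the same post; real pipelines use proper text analytics.

```python
"""Toy keyword, keyword-pair and sentiment extraction over short posts.

Assumptions: the tiny STOP_WORDS and SENTIMENT lexicons are hypothetical
stand-ins for real text analytics, and "pairs of key words" is read as
keywords that co-occur in the same post.
"""
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

STOP_WORDS = {"i", "the", "a", "is", "and", "this", "my"}      # hypothetical
SENTIMENT = {"love": 1, "great": 1, "broken": -1, "slow": -1}  # hypothetical


def keywords(post: str) -> List[str]:
    """Lower-case words with punctuation stripped, minus stop words."""
    words = (w.strip(".,!?").lower() for w in post.split())
    return [w for w in words if w and w not in STOP_WORDS]


def analyze(posts: List[str]) -> Tuple[Counter, Counter, Dict[str, int]]:
    """Return keyword counts, co-occurring keyword-pair counts and a score per post."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    scores: Dict[str, int] = {}
    for post in posts:
        kws = keywords(post)
        word_counts.update(kws)
        # Each unordered pair of distinct keywords counts once per post.
        pair_counts.update(combinations(sorted(set(kws)), 2))
        # Lexicon-based score: positive words add 1, negative words subtract 1.
        scores[post] = sum(SENTIMENT.get(w, 0) for w in kws)
    return word_counts, pair_counts, scores


if __name__ == "__main__":
    posts = ["I love this phone, great camera!", "My phone is broken and slow."]
    words, pairs, scores = analyze(posts)
    print(words.most_common(1))  # [('phone', 2)]
    print(scores)
```

Correlating pair counts with sentiment scores over time is then the "analyze and correlate" step.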

Structured data: fraud detection in large amounts of financial data, and where to look. Not all digits are equal! About 130 years ago, Simon Newcomb noticed that more numbers start with the digit 1 than with any other digit; the effect was later rediscovered by Benford. The idea: look into the numbers (e.g., a balance sheet), check how their leading digits are usually distributed, and look for deviations. Applications: tax fraud in balance sheets (actually used by auditors), manipulated numbers in scientific publications, fraud in elections and in election campaign financing.
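The first-digit test described above can be sketched as follows. Benford's law gives the probability of leading digit d as log10(1 + 1/d); the sketch compares observed leading-digit counts with that expectation using a Pearson chi-square statistic, which is one common choice of deviation measure (an assumption here, not necessarily what auditors or the cited study use). A large value only marks data for a closer look.

```python
"""Benford's-law screening of amounts with a first-digit chi-square test.

Benford's law gives the probability of leading digit d (1..9) as
log10(1 + 1/d). The statistic below measures how far observed leading-digit
counts are from that expectation; a large value marks data worth a closer
look, not proof of manipulation. The sample data is made up.
"""
import math
from collections import Counter
from typing import Dict, Iterable


def benford_expected() -> Dict[int, float]:
    """Expected share of each leading digit under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x: float) -> int:
    """First significant digit, read from scientific notation.

    The value is rounded to 7 significant digits first, e.g. 1234.5 -> '1.234500e+03'.
    """
    return int(f"{abs(x):e}"[0])


def benford_chi_square(values: Iterable[float]) -> float:
    """Pearson chi-square statistic of leading digits against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    observed = Counter(digits)
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in benford_expected().items())


if __name__ == "__main__":
    # Powers of two are a classic Benford-like sequence; a column of 9s is not.
    print(round(benford_chi_square([2.0 ** k for k in range(1, 1001)]), 2))
    print(round(benford_chi_square([9.0] * 100), 2))
```

With 9 digit classes there are 8 degrees of freedom, so values far above the chi-square critical value for 8 degrees of freedom point to a distribution that does not look Benford-like.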

An application of Benford's law: differences in number statistics for EU reporting of social data and deficit data, by country. Source: Bernhard Rauch, Max Göttsche, Gernot Brähler & Thomas Kronfeld (2014), "Deficit versus social statistics: empirical evidence for the effectiveness of Benford's law", Applied Economics Letters, 21:3, 147-151.



About Analytics Platform System

PDW logical architecture. Components: a control node (virtualized), compute/storage nodes (virtualized), database host servers with direct attached storage nodes, and a virtualization spare; client queries arrive at the control host node. All servers are virtualization hosts running Windows Server 2012; the control and compute nodes are virtual, and all of them run SQL Server 2012. The control node spreads data and workload across the compute nodes, so data loads run in parallel and take advantage of the power of all nodes. The nodes are connected by a fast InfiniBand interconnect.
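How a control node can "spread data and workload" is easiest to see with hash distribution plus a scatter/gather aggregate. This is a minimal sketch, assuming `zlib.crc32` as a stand-in for PDW's internal hash function and a hypothetical `NUM_COMPUTE_NODES`; it shows the idea, not PDW's implementation.

```python
"""Sketch of MPP hash distribution and a scatter/gather aggregate.

Assumptions: zlib.crc32 stands in for PDW's internal hash function and
NUM_COMPUTE_NODES is a hypothetical node count. Rows are spread across
compute nodes by hashing a distribution column; each node aggregates only
its own rows, and the control node merges the small partial results.
"""
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_COMPUTE_NODES = 4  # hypothetical

Row = Tuple[str, int]  # (distribution key, amount)


def node_for(key: str, num_nodes: int = NUM_COMPUTE_NODES) -> int:
    """Deterministically map a distribution key to a compute node."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: List[Row]) -> Dict[int, List[Row]]:
    """Control node: route every row to the node owning its key's hash."""
    nodes: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        nodes[node_for(row[0])].append(row)
    return nodes


def local_sum(rows: List[Row]) -> Dict[str, int]:
    """Compute node: aggregate only the rows stored locally."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def sum_by_key(rows: List[Row]) -> Dict[str, int]:
    """Scatter the work to all nodes, then gather the partial results."""
    # In an MPP system each local_sum call runs on its own node in parallel.
    partials = [local_sum(node_rows) for node_rows in distribute(rows).values()]
    result: Dict[str, int] = {}
    for partial in partials:
        # Rows with the same key share a node, so partial key sets never overlap.
        result.update(partial)
    return result


if __name__ == "__main__":
    sales = [("cust-1", 10), ("cust-2", 5), ("cust-1", 7), ("cust-3", 1)]
    print(sum_by_key(sales))
```

Because the aggregate groups by the distribution key, no row has to move between compute nodes; grouping by another column would require a redistribution step first.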

Scalability: massively parallel and shared nothing. Capacity grows from the smallest configuration (0 TB) to the largest (5 PB): start small with a warehouse of a few terabytes and add capacity up to 5 petabytes just by adding scale units. An SMP system would have needed to be completely reconfigured.

Rack configuration: 2 InfiniBand FDR 36-port switches, 2 Ethernet switches (5120-24G), a control node (DL360p), a failover node (DL360p) and space for customer use. Base Unit (2 compute nodes): 2 ProLiant DL360p compute nodes and a storage block (D6000) with 70 drives. Each Scale Unit adds another 2 ProLiant DL360p compute nodes and a storage block (D6000) with 70 drives, bringing the rack to 4 nodes (1st Scale Unit), 6 nodes (2nd) and 8 nodes (3rd). The Base Unit has an approximate usable storage capacity of 75 TB, based on 5:1 compression. Three additional Scale Units fit into one rack, for up to 300 TB of usable storage, and multiple racks can be configured for more. The 1 TB drives can be replaced with 2 TB or 3 TB drives for double or triple capacity. However, multiple Scale Units provide better performance than one Base Unit with larger drives: for example, 3 Scale Units with 1 TB drives perform much better than 1 Base Unit with 3 TB drives. A backup node and landing zone (ETL storage) are not included; customers can order whatever they want for backup purposes and install it themselves.
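The sizing rules above reduce to simple arithmetic, which also shows why more Scale Units beat bigger drives: capacity can be equal while the number of compute nodes differs. A back-of-the-envelope sketch, assuming the slide's figures (about 75 TB usable per unit with 1 TB drives at 5:1 compression, 2 compute nodes per unit, capacity scaling linearly with drive size); it is not an official sizing tool.

```python
"""Back-of-the-envelope APS rack sizing, using the figures quoted on the slide.

Assumptions (from the slide, not from an official sizing tool): each unit
(Base Unit or Scale Unit) adds 2 compute nodes and roughly 75 TB of usable
space with 1 TB drives at 5:1 compression; larger drives scale capacity linearly.
"""

USABLE_TB_PER_UNIT_1TB_DRIVES = 75  # ~75 TB usable per unit at 5:1 compression
COMPUTE_NODES_PER_UNIT = 2          # 2 ProLiant DL360p compute nodes per unit
MAX_UNITS_PER_RACK = 4              # Base Unit + 3 additional Scale Units


def usable_capacity_tb(units: int, drive_tb: int = 1) -> int:
    """Approximate usable capacity for `units` units with `drive_tb`-TB drives."""
    return units * USABLE_TB_PER_UNIT_1TB_DRIVES * drive_tb


def compute_nodes(units: int) -> int:
    """Number of compute nodes, which is what drives parallel query throughput."""
    return units * COMPUTE_NODES_PER_UNIT


if __name__ == "__main__":
    print(usable_capacity_tb(MAX_UNITS_PER_RACK))  # full rack, 1 TB drives: 300
    # Same capacity, different parallelism:
    print(usable_capacity_tb(3), compute_nodes(3))              # 225 6
    print(usable_capacity_tb(1, drive_tb=3), compute_nodes(1))  # 225 2
```

Three units with 1 TB drives and one unit with 3 TB drives both give about 225 TB, but the first spreads each query across 6 compute nodes instead of 2.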

Software. Windows Server 2012: the control node, management node and compute nodes run in a virtualized environment. Workload management: workload classes. System Center 2012: a single user interface for managing PDW, the OS, BI, custom apps and the private cloud. xVelocity: in-memory execution and clustered columnstore. SQL Server 2012 inside, with Visual Studio Data Tools and Power View directly on PDW. Big Data integration with PolyBase: T-SQL queries against Hadoop through external tables.

A multi-region/workload appliance

What is Hadoop? Hadoop = MapReduce + HDFS: a distributed, scalable system on commodity hardware, composed of the HDFS distributed file system and the MapReduce programming model. The wider ecosystem includes HBase (column database), Hive, HCatalog, Pig, Oozie, Mahout, R, Cascading, Flume, Sqoop, Avro, Zookeeper and Ambari, alongside other stores such as Cassandra, CouchDB and MongoDB.
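The MapReduce programming model named above can be illustrated with the classic word count. This is a single-process sketch of the map, shuffle and reduce data flow only; it does not use Hadoop's API, and in Hadoop the map and reduce tasks run in parallel on the nodes holding the HDFS blocks, with the framework doing the shuffle.

```python
"""A single-process sketch of the MapReduce programming model (word count).

This only illustrates the map -> shuffle -> reduce data flow; it is not Hadoop's
API. In Hadoop the map and reduce tasks run in parallel on the nodes that hold
the HDFS blocks, and the framework performs the shuffle over the network.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in an input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all intermediate values by key, as the framework would.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key.
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(map_reduce(["big data big insight", "data warehouse"]))
    # {'big': 2, 'data': 2, 'insight': 1, 'warehouse': 1}
```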

APS: Parallel Data Warehouse and HDInsight regions. Rack layout: control node, failover node, Hadoop head node and redundant Hadoop head node; PDW scale units forming the PDW region; Hadoop scale units forming the Hadoop region; and space for customer use. Configurable: a minimum of one PDW region, plus additional PDW scale units and additional HDI scale units.

HDI region overview. In a nutshell, it's an HDInsight instance running on an appliance; HDInsight is the Microsoft-branded Hortonworks distribution. It is an integrated appliance for running a PDW region and an HDI region: PDW is offered as a stand-alone workload on the appliance, while HDI is offered only as an add-on to PDW, as a scale unit. Based on V2 hardware. High availability for the head node is provided via Windows Failover Clustering (WFC); data node high availability is provided via HDFS/MapReduce mechanisms. Security add-ons address security issues not covered by standard Hadoop, including support for multiple user accounts.

Query Hadoop data with T-SQL using PolyBase: bringing the worlds of big data and the data warehouse together for users and IT. A SELECT issued against SQL Server Parallel Data Warehouse returns one result set that can combine PDW data with Hadoop data from Microsoft HDInsight, Windows Azure HDInsight, Cloudera or Hortonworks (Windows, Linux). A single T-SQL query model for PDW and Hadoop, with rich T-SQL features including joins, without ETL. Leverages the power of MPP to enhance query execution performance. Supports Windows Azure HDInsight to enable new hybrid cloud scenarios. Queries non-Microsoft Hadoop distributions such as Hortonworks and Cloudera.

Big data insights for any user. Native Microsoft BI integration lets any user create new insights with familiar tools, leveraging the high adoption of Excel, Power View, Power Pivot and SSAS, with no IT intervention required. Power users, data scientists and everyone else using Microsoft BI tools can analyze PDW and Hadoop data in the same view.

Differentiation: Freedom of deployment options and hybrid solutions

APS Management Console: PDW and appliance


PolyBase use case category 1: integration with external Hadoop clusters

Listening to SQL customers: ShinSeGae. Investing in an online shopping website ("Korea's Amazon"), using SQL Server 2012 PDW and HDP 1.3/HDP 2.0 on Linux. What they want: 1. We want to perform complex data mining on customer purchase data (basket analysis). 2. We want to understand the social media data (reviews, Twitter), specifically around our products and stores. 3. We will use Hadoop to keep all of our data, envisioned to be around 480 TB; PDW will be the efficient analysis engine for the hot data. 4. PDW and PolyBase are much faster than Hive. 5. We're interested in using data mining cloud services in Azure (hybrid scenarios). Microsoft NDA - Material

Listening to SQL customers: TeleCom. Understanding network quality, using SQL Server 2012 PDW and Cloudera 4.5 on Linux. What they want: 1. We collect millions of network records daily for quality assessment and capacity planning. 2. Hadoop will be used for storage and ETL of these network record files. 3. PDW for more operational analysis, ad-hoc analysis and operational reports. 4. We are using PolyBase along with Oozie-based orchestration for seamless, automated integration.

Solution architecture: integration with an external Hadoop cluster (1). PolyBase integrates with various Hadoop distributions: Hortonworks HDP 1.x and 2.x (Windows Server and Linux) and Cloudera CDH 4.x (Linux). Push-down computation with the AU1 release: computation is pushed to where the data resides (Hadoop as a query execution and processing aid), transparently for users, with no need to learn MapReduce. Seamless query experience through external tables, plus simplified and parallelized ETL through T-SQL (CTAS for import, CETAS for export). Integration with third-party tools and the Microsoft insights/BI layer: existing applications simply work, because external tables are populated through the application layer like regular tables. SQL Server security model: you decide who sees what type of data, with the SQL Server permission model adapted for each PolyBase object (external table, data source and file format). Diagram: your apps (PowerPivot and PowerView), web, social and mobile apps, sensors and RFID; the APS control and data nodes, where the PolyBase/APS query engine works with an external data source, an external file format and an external table.
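The benefit of pushing computation to where the data resides comes down to how many rows cross the network. A toy comparison, assuming each inner list stands for the rows on one Hadoop data node (a hypothetical setup); it models the idea only, not how PolyBase decides whether to push a predicate down.

```python
"""Toy comparison of rows moved with and without predicate push-down.

Hypothetical setup: each inner list in `splits` stands for the rows stored on
one Hadoop data node. Without push-down every row is shipped to the warehouse
and filtered there; with push-down the filter runs next to the data and only
matching rows cross the network. This models the idea only, not how PolyBase
decides whether to push a predicate down.
"""
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def scan_without_pushdown(splits: List[List[Row]], pred: Predicate) -> Tuple[List[Row], int]:
    """Ship every row, then filter on the warehouse side."""
    shipped = [row for split in splits for row in split]
    return [row for row in shipped if pred(row)], len(shipped)


def scan_with_pushdown(splits: List[List[Row]], pred: Predicate) -> Tuple[List[Row], int]:
    """Filter inside each split, then ship only the matches."""
    shipped = [row for split in splits for row in split if pred(row)]
    return shipped, len(shipped)


if __name__ == "__main__":
    splits = [
        [{"url": "www.microsoft.com"}, {"url": "example.org"}],
        [{"url": "example.net"}, {"url": "www.microsoft.com"}, {"url": "example.com"}],
    ]

    def wanted(row: Row) -> bool:
        return row["url"] == "www.microsoft.com"

    print(scan_without_pushdown(splits, wanted)[1])  # 5 rows moved
    print(scan_with_pushdown(splits, wanted)[1])     # 2 rows moved
```

Both strategies return the same rows; only the volume of data transferred differs, which is what matters when the external table holds terabytes.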

T-SQL examples: integration with an external Hadoop cluster (2)

Creating the external data source, file format and table:

    CREATE EXTERNAL DATA SOURCE [HDP2.0]
    WITH (TYPE = HADOOP,
          LOCATION = 'hdfs://hdp:8020',
          JOB_TRACKER_LOCATION = 'hdp:50300');

    CREATE EXTERNAL FILE FORMAT MyRCFile
    WITH (FORMAT_TYPE = RCFILE,
          -- full Hive class name of the SerDe the slide calls LazyBinarySerDe
          SERDE_METHOD = 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe');

    CREATE EXTERNAL TABLE ClickStream (
        url varchar(50),
        event_date date,
        user_ip varchar(50)  -- needed by the join with PDW_User
    )
    WITH (DATA_SOURCE = [HDP2.0],
          LOCATION = '/employees/employee.txt',
          FILE_FORMAT = MyRCFile);

Querying Hadoop data together with PDW data:

    SELECT u.user_name
    FROM ClickStream AS cs
    JOIN PDW_User AS u ON cs.user_ip = u.user_ip
    WHERE cs.url = 'www.microsoft.com';

Persistently exporting (CETAS) and importing (CTAS):

    CREATE EXTERNAL TABLE Web_Sales
    WITH (LOCATION = '/TPCDS/web_sales/',
          DATA_SOURCE = [HDP2.0],
          FILE_FORMAT = MyRCFile)
    AS SELECT u.* FROM PDW_User AS u;

    CREATE TABLE PDW_Sales
    WITH (DISTRIBUTION = HASH(id))
    AS SELECT * FROM Web_Sales;

Solution architecture (details): ShinSeGae. 1. Web log data (160 GB daily) from the tracking log servers of the SSG.com online shopping mall (renewal), with complex event processing (Storm) and message queues (Kafka, open source), is exposed as external PolyBase tables A, B and C. 2. Unstructured and semi-structured text data (board/SNS/internal text, weather, etc.) is exposed as external PolyBase tables D, E and F. 3. Company emails and campaign mails are exposed as external PolyBase tables G, H and I. HDP 1.3 on Linux (5-10 servers) holds the raw/cold data. PolyBase queries run over 10 Gb Ethernet to APS/PDW, which serves as operational data store and EDW, with the recent/hot data stored in PDW. Consumers: EIS, OLAP (Tabular), data mining and visualization (Silverlight) for BI analysts, plus analytic information for the recommendation engine and personalized advertising (right customer targeting).

Solution architecture (details): TeleCom. Network logs (more than 300 GB per day) from high-frequency event processing are captured in Cloudera CDH 4 on HP (18+ servers), which holds the raw/cold data (a petabyte of network logs). They are exposed as external PolyBase tables A, B and C, using Hive's metadata store (HCatalog). PolyBase queries run over InfiniBand to APS/PDW, which serves as operational data store and EDW for the hot operational data. Oozie workflows trigger PolyBase queries through remote procedure calls via stored procedures. Outputs: network quality analysis and capacity planning, visualized with PowerPivot for BI analysts, planners and decision-makers.

PolyBase use case category 2: integration with Microsoft Azure

Listening to SQL customers (5): Government. Bridging the gap between cloud and on-premises. Current POC: SQL Server 2012 PDW and HDInsight on Azure. What they want: 1. HDInsight/Hadoop in the cloud to store and massage our raw data (XML files) generated by our web application. 2. PDW to keep the data on-premises (a legal requirement) and to provide an efficient query engine for analysis. 3. PolyBase is a great way of accessing our files in the cloud via simple T-SQL. 4. With this solution, we can let web users quickly ask questions while the heavier, more complex business analysis is done by PDW users.

Solution architecture: hybrid scenarios. PolyBase is the key integrating feature, connecting APS with external Hadoop, the HDInsight region and Azure Storage. Data aging strategies: cold data is aged to Azure Storage, while APS and the HDInsight region hold hot and warm data. APS acts as a modern cloud endpoint for Azure (over the public Internet or Azure ExpressRoute), allowing seamless querying of hot data and aged cold data, and as a gateway that lets users query all on-premises data via Power BI. Diagram: on-premises or private cloud with your apps and Microsoft or third-party applications, the APS control and data nodes and PolyBase; Microsoft Azure with Azure HDInsight and Azure Storage.

T-SQL examples:

    CREATE EXTERNAL DATA SOURCE WASB
    WITH (TYPE = HADOOP,
          LOCATION = 'wasbs://dailylogs@myaccount.blob.core.windows.net');

    -- MyDelimitedText is assumed to be an existing external file format
    CREATE EXTERNAL TABLE clickstream_hdinsights (url varchar(50), event_date date)
    WITH (DATA_SOURCE = WASB,
          LOCATION = '/input/log1.txt',
          FILE_FORMAT = MyDelimitedText);

    -- the slide gives no join predicate between the two tables
    SELECT * FROM clickstream_hdinsights, PDW_Table;
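The data aging strategy above can be expressed as a simple age-based routing rule. A toy sketch, assuming three tiers named after the slide (APS for hot data, the HDInsight region for warm data, Azure Storage for aged cold data); the day thresholds `HOT_DAYS` and `WARM_DAYS` are hypothetical placeholders, not recommendations.

```python
"""A toy data-aging policy: route records to hot, warm or cold storage by age.

The tier names mirror the slide (APS for hot data, the HDInsight region for
warm data, Azure Storage for aged cold data); the day thresholds are
hypothetical placeholders, not recommendations.
"""
from datetime import date
from typing import Dict, Iterable, List

HOT_DAYS = 90    # hypothetical: keep roughly the last quarter in PDW
WARM_DAYS = 365  # hypothetical: keep up to a year in the HDInsight region


def tier_for(event_date: date, today: date) -> str:
    """Pick a storage tier from the record's age in days."""
    age = (today - event_date).days
    if age <= HOT_DAYS:
        return "hot: APS/PDW"
    if age <= WARM_DAYS:
        return "warm: HDInsight region"
    return "cold: Azure Storage"


def plan_aging(event_dates: Iterable[date], today: date) -> Dict[str, List[date]]:
    """Group record dates by the tier they should live in."""
    plan: Dict[str, List[date]] = {}
    for d in event_dates:
        plan.setdefault(tier_for(d, today), []).append(d)
    return plan


if __name__ == "__main__":
    today = date(2014, 6, 30)
    dates = [date(2014, 6, 1), date(2013, 12, 1), date(2012, 1, 1)]
    for tier, ds in plan_aging(dates, today).items():
        print(tier, ds)
```

Whatever the thresholds, the point is that queries through APS can still reach the cold tier via PolyBase external tables, so aging data out does not make it unreachable.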

Solution architecture (details): Government. Web applications (a tax-filing web application with e-invoice, and other web feeds) generate tons of small XML files (~7 KB each). HDI tools on Azure transform them into large text files of ~10 GB each, exposed as external WASB tables. Azure Blob Storage serves as a cheap data store, an alternative to an on-premises Hadoop solution. PolyBase queries run over the public Internet or Azure ExpressRoute to APS/PDW, which serves as operational data store and EDW and provides fast query response and processing of hot data for the Microsoft BI stack and IBM Cognos.
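Turning many ~7 KB XML files into ~10 GB text files matters because Hadoop handles a few large files far more efficiently than millions of tiny ones: each file costs NameNode metadata and typically at least one input split. A minimal sketch of the batching step, assuming a greedy, arrival-order packing strategy (an assumption; the slide does not say how the files were combined).

```python
"""Grouping many small files into large batches before analysis.

Assumption: files are packed greedily, in arrival order, into batches that
stay under a target size. Only the sizes (~7 KB inputs, ~10 GB outputs) come
from the slide; the packing strategy is illustrative.
"""
from typing import List

KB = 1024
GB = 1024 ** 3


def pack_into_batches(file_sizes: List[int], target: int = 10 * GB) -> List[List[int]]:
    """Return lists of file sizes; each batch stays at or below `target`
    unless a single file is already larger than the target."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes:
        # Close the current batch if adding this file would overflow it.
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    batches = pack_into_batches([7 * KB] * 10, target=20 * KB)
    print([len(b) for b in batches])  # [2, 2, 2, 2, 2]
```

With 7 KB files and a 10 GB target, each batch would hold roughly 1.5 million files, replacing that many tiny HDFS files with a single large one.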

PolyBase use case category 3: unified appliance with PDW and HDInsight regions

Listening to SQL customers (6): Beverage & Vending Machines. What are you drinking? Why is the machine down? POC: SQL Server/APS with PDW and HDI regions. What they want: 1. We want a complete solution stack: we do not have Hadoop experts in-house and don't have the money to get them. 2. We want to store all raw data coming from vending machines in Hadoop. 3. A 360-degree view of all our data: structured customer data and unstructured data coming from vending machines. 4. Predictive maintenance of machines.

Solution architecture: unified APS appliance. Multi-workload support with PDW and HDInsight regions in one unified appliance: HDInsight is powered by HDP bits, and there is no need to deal with multiple support teams ("better together"). Seamless, performant query experience through PolyBase: external tables can be used for HDI data, and the PDW data nodes are connected to the Hadoop data nodes via a high-speed network (InfiniBand). Simplified management and monitoring: one consistent monitoring experience through the appliance management tools. Diagram: your apps (PowerPivot and PowerView), web, social and mobile apps, sensors and RFID; the APS control and data nodes holding distributed and replicated tables and external tables; the HDI name and data nodes.

T-SQL examples:

    CREATE EXTERNAL DATA SOURCE HDI_R
    WITH (TYPE = HADOOP,
          LOCATION = 'hdfs://htukia-c-hhn01:8020',
          JOB_TRACKER_LOCATION = 'htukia-c-hhn01:50300');

    -- MyDelimitedText is assumed to be an existing external file format
    CREATE EXTERNAL TABLE HDI_Region (url varchar(50), event_date date)
    WITH (DATA_SOURCE = HDI_R,
          LOCATION = '/input/log1.txt',
          FILE_FORMAT = MyDelimitedText);

    -- the slide gives no join predicate between the two tables
    SELECT * FROM HDI_Region, PDW_Table;

Solution architecture (details): internal Microsoft data scientists. About 3 TB of msn.com web traffic log files from Microsoft servers are analyzed on an APS with a PDW region (full rack) and an HDI region (1 scale unit), connected via InfiniBand, with a secure gateway and AD integration, and managed through System Center and the Admin Console. Data scientist group 1 uses chains of Hive queries and Power Query via Hive ODBC. Data scientist group 2 uses PolyBase with existing tooling (T-SQL, BI tools) to process complex analytical queries with a consistent management experience, using Power Query, Power View and Power Map, and running analytical queries via SSDT.

Microsoft Digital Crime Unit (DCU): part of Microsoft LCA (Legal and Corporate Affairs), mandated to help protect the Internet. DCU's challenge: effectively combating digital crime requires collecting huge amounts of data from multiple sources. DCU needs to be able to process tens of terabytes daily and house petabytes of historical data (accessible as needed), house hundreds of terabytes from multiple sources in an easily queryable form, and use leading-edge business intelligence and visualization tools.

DCU big data solution (diagram). Data sources: sinkholes, passive DNS, files and third-party security information. Components: a 30-node Hadoop cluster on Windows, 500 TB SAN storage, Microsoft SQL StreamInsight, SQL Azure/Azure, SSIS, an HP EDW appliance with Microsoft PDW, and an HP Business Decision Appliance. Users: corporate security officers, DCU investigators and analysts, working with predictive analytics, embedded BI, Excel with PowerPivot, SharePoint, SSRS, SSAS, Power View and PowerPoint.

Microsoft Digital Crime Unit (currently being implemented). Pipeline (diagram): stages Drop, Extract, Load and Transform across Hadoop, SSIS, PDW and SSAS, each serving as a data source for Microsoft BI. Configuration: 30-node Hadoop on Windows Server; a control rack and a 10-node PDW data rack; HP BDA (Business Decision Appliance) upgraded to SQL Server 2012 BI. Voyage is currently implementing the PDW and BI portions of the project.

Why two storage platforms? Hadoop versus Parallel Data Warehouse:
Storage capacity: Hadoop in the petabytes; PDW in the hundreds of terabytes.
Loading: Hadoop offers a simplified load (just drop unstructured or semi-structured files); PDW needs a more complex ETL process to transform data into reporting-optimized database structures.
Query optimization: none in Hadoop; in PDW, structures can be optimized for common query patterns.
Queried by: IT professionals (Hadoop); business analysts (PDW).
Querying multiple sources: complex and slow in Hadoop; PDW is optimized for fast queries against key data from multiple sources.
Role: Hadoop is DCU's centralized data warehouse; its simple load and high capacity make it optimal for storing huge volumes of data. PDW is DCU's data mart platform: easily accessible, intuitive data structures, and blazing fast for querying data.

APS differentiators. Part of a product family: from standalone SQL Server to cloud service offerings. TCO: very low, especially when looking at the whole bundle: ETL (SSIS), PDW, data marts (SQL Server) and analytics (SSAS, SSRS). Appliance: much lower effort for DBAs. Microsoft product stack integration: SSIS, SSAS, SSRS, PowerPivot, System Center, and integration with cloud services. Linear scaling via a shared-nothing architecture. xVelocity: columnstore and in-memory execution. PolyBase: integration with big data and Hadoop. Integrated HDInsight: fast InfiniBand interconnect, management and security. "Microsoft exhibits one of the best value propositions on the market with a low cost and a highly favorable price/performance ratio" (Gartner, February 2012).

Columnstore: up to 100x faster queries and up to 15x more compression (updatable clustered columnstore vs. a table with customary indexing). Data is stored in columnar format for massive compression and loaded into or out of memory for next-generation performance, with parallel query execution producing the query results. The columnstore is updatable and clustered, for real-time trickle loading.
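Why columnar storage compresses so well can be shown with run-length encoding of a single column. This is a generic illustration only, not the SQL Server columnstore format, which uses its own combination of encodings; it assumes nothing beyond the standard library.

```python
"""Run-length encoding of a single column, to show why columnar data compresses.

A generic illustration only: it is not the SQL Server columnstore format,
which uses its own combination of encodings. Keeping a column's values
together places equal values side by side, especially after sorting, so runs
collapse into (value, count) pairs.
"""
from itertools import groupby
from typing import List, Tuple


def rle_encode(column: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive equal values into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(encoded: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run length) pairs back into the original column."""
    return [value for value, count in encoded for _ in range(count)]


if __name__ == "__main__":
    country = ["DE", "DE", "DE", "FR", "FR", "DE"]
    print(rle_encode(country))          # [('DE', 3), ('FR', 2), ('DE', 1)]
    print(rle_encode(sorted(country)))  # [('DE', 4), ('FR', 2)]
```

In a row store the same values would be interleaved with other columns of each row, so these runs would never form.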

Concurrency that fuels rapid adoption: great performance with mixed workloads. Diagram components: ERP, CRM and LOB apps; ETL/ELT with SSIS, DQS and MDS into SQL Server SMP; ETL/ELT with DWLoader into the Analytics Platform System (PDW); PolyBase connecting to Hadoop/big data and HDInsight; SSRS/SSAS, BI tools and ad hoc queries.

MEC, a global media agency, uses SQL Server PDW with in-memory technology to cut query time, helping marketers unlock the value of their data. "SQL Server Analytics Platform System gives us massively parallel advantages. Whereas it would take up to four hours to run queries scaling across multiple nodes, now it takes just minutes."

Value through a single flexible appliance solution. Why Analytics Platform System when I have SQL Server? A single appliance solution (PDW, PolyBase, HDInsight): reduce the data center footprint; lower energy costs and usage; accelerate time to value and insights, with no forklift required for scaling out; simplify management with built-in System Center; reduce tuning efforts while retaining high performance.

Value through a single flexible appliance solution. Why Analytics Platform System when I have SQL Server? Your choice of hardware (PDW, PolyBase, HDInsight): an integrated support plan with a single Microsoft contact; co-engineered with HP, Dell and Quanta following best practices; pre-configured, built and tuned software and hardware; leading performance with commodity hardware.

CROSSMARK needed faster and more detailed insight into terabytes of information about product supply and demand. They deployed a turnkey business intelligence solution from Microsoft and HP based on Microsoft SQL Server Parallel Data Warehouse. People can now instantly create their own reports with SQL Server Power View and PowerPivot for Excel, and they can build those reports 50 percent to many times faster than with the previous system.