SQL Server PDW. Artur Vieira Premier Field Engineer




Agenda
  1. Introduction to MPP and PDW
  2. PDW Architecture and Components
  3. Data Structures
  4. PDW Tools: Data Load / Data Output / Administrative Console
  5. References

Why choose MPP?

Why choose MPP: until today
How we currently handle large data volumes:
- Scale up (SMP)
- Easy development and coding
- Resource contention: shared disk, shared CPUs, shared memory
Eventually the data volume grows until we reach physical limits, and then we have to start thinking about other solutions.

Why choose MPP: now
- Shared nothing: each node has its own memory, CPU, and disks
- Linear scaling
- Fault tolerant
- Reduction of bottlenecks
- Distributed architecture

Now we have SQL Server PDW: a Tier-1 Enterprise Data Warehouse Appliance
- High scalability, from tens to hundreds of terabytes
- High performance through the MPP system
- Flexibility and choice: deployment options through a distributed architecture
- Up to 480 cores, 4 TB RAM, 700 TB of data
- Data loads of 1.5 TB/hour in a single rack
- Can store 2.3 trillion rows in a single table with 700 TB of data
- Tested database backups running at up to 5 TB/hour
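To put the capacity figures quoted above in perspective, here is a back-of-the-envelope calculation in Python. It is only a sketch: it assumes decimal terabytes (the slide does not say TB or TiB), the function names are made up for illustration, and the 30 TB volume is a hypothetical example rather than a number from the presentation.

```python
TB = 10**12  # assumption: decimal terabytes; the slide does not specify

def avg_bytes_per_row(total_tb, rows):
    """Average row size implied by a table size and a row count."""
    return total_tb * TB / rows

def hours_to_move(volume_tb, rate_tb_per_hour):
    """Time needed to load or back up a volume at a sustained rate."""
    return volume_tb / rate_tb_per_hour

# 2.3 trillion rows in 700 TB implies roughly 300 bytes per row.
print(round(avg_bytes_per_row(700, 2.3e12)))

# Hypothetical 30 TB data set at the quoted single-rack load rate (1.5 TB/hour)
# and at the quoted backup rate (5 TB/hour).
print(hours_to_move(30, 1.5))
print(hours_to_move(30, 5))
```

Real throughput depends on row width, file format, and hardware, so these numbers are upper-bound estimates at best.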

PDW Architecture

PDW Architecture and components
Components:
- Control Rack
- Data Rack (up to 4 racks)
- Redundancy
- Connectivity


PDW Architecture and components
- Domain Controller running Active Directory
- Commands the entire appliance
- Handles the coordination between all the servers

PDW Architecture and components
- Handles the SQL requests

PDW Architecture and components
- Stores staging data
- Runs the loader process for loading tables
- Dedicated storage

PDW Architecture and components
- Coordinates database backups across all nodes
- Hosts third-party software to facilitate backup copies to external devices

PDW Architecture and components
- Highly tuned SQL Server node with standard interfaces
- N+1 cluster


Development / Test PDW Appliance
Dev/test rack (diagram labels): Control Node, Management Servers, Database Servers, Storage Nodes, Landing Zone, Backup Node
- Each user accessing the appliance requires a unique Developer License.
- Developer Licenses include full software functionality, for development, test, or demo use only, on as many appliances as necessary.

Distributed Data Warehouse Architecture
- Departmental Reporting
- Regional Reporting
- Central EDW Hub
- High-Performance Reporting
- Mobile Applications
- Regional Reporting with Business Decision Appliance
- Third-Party Data Integration
- Landing Zone
- ETL Tools
- Third-Party RDBMS

Data Structures

PDW Data Structures
Replicated tables:
- Duplicate copy of the entire table on all compute nodes
- Used for smaller lookup tables, generally 5 GB or smaller
Distributed tables:
- Rows are distributed among all compute nodes
- Distribution is based on a distribution key
- All compute nodes hold a portion of the table
- Even distribution depends on the choice of distribution key
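The dependence of even distribution on the distribution key can be illustrated with a small simulation. This is a minimal sketch, not PDW's actual placement logic: the MD5-based hash, the eight-node count, and all function names are assumptions chosen for the example.

```python
import hashlib
from collections import Counter

def node_for_key(key, num_nodes):
    """Map a distribution-key value to a node with a stable hash (MD5 is an assumption)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def rows_per_node(keys, num_nodes):
    """Count how many rows land on each node for a given key column."""
    counts = Counter(node_for_key(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]

def skew_ratio(counts):
    """Largest node load divided by the ideal even load (1.0 means perfectly even)."""
    ideal = sum(counts) / len(counts)
    return max(counts) / ideal

# High-cardinality key (for example a surrogate ID): rows spread almost evenly.
even = rows_per_node(range(100_000), num_nodes=8)

# Low-cardinality key (only 3 distinct values): at most 3 nodes receive any rows.
skewed = rows_per_node([i % 3 for i in range(100_000)], num_nodes=8)

print("even  :", even, round(skew_ratio(even), 2))
print("skewed:", skewed, round(skew_ratio(skewed), 2))
```

With the low-cardinality key most nodes stay empty while a few carry all the rows, so a query over that table runs only as fast as its most loaded node.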

Replicating Tables
Smaller dimension tables are replicated on every compute node.
- dimtime: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
- dimproduct: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
- dimstore: Store Dim ID, Store Name, Store Mgr, Store Size
- dimmktcampaign: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
- factsales: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold

Distributing Tables
The larger fact table is hash distributed across all compute nodes.
- factsales: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
- dimtime: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
- dimproduct: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
- dimstore: Store Dim ID, Store Name, Store Mgr, Store Size
- dimmktcampaign: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
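Combining a hash-distributed fact table with replicated dimension tables means each compute node can join its slice of the fact table against a local copy of the dimension, and only small partial results need to be merged. The Python sketch below models that idea in memory. The node count, the sample rows, the store names, and the function names are all hypothetical, and the modulo placement stands in for whatever hash PDW really uses.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical number of compute nodes

# Replicated dimension: every node holds the full table (store_id -> store_name).
dim_store = {1: "Lisbon", 2: "Porto", 3: "Faro"}

# Fact rows as (store_id, qty_sold); hypothetical sample data.
fact_sales = [(1, 5), (2, 3), (1, 2), (3, 7), (2, 1), (3, 4), (1, 6)]

def distribute(rows, num_nodes):
    """Place each fact row on a node based on its distribution key (store_id)."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[0] % num_nodes].append(row)
    return nodes

def local_join_and_sum(fact_slice, dim):
    """Runs on one node: join its fact slice to its local dimension copy."""
    totals = defaultdict(int)
    for store_id, qty in fact_slice:
        totals[dim[store_id]] += qty
    return dict(totals)

def combine(partials):
    """Runs on the coordinating node: merge the per-node partial results."""
    final = defaultdict(int)
    for partial in partials:
        for name, qty in partial.items():
            final[name] += qty
    return dict(final)

node_slices = distribute(fact_sales, NUM_NODES)
partials = [local_join_and_sum(s, dim_store) for s in node_slices]
result = combine(partials)
print(result)
```

No fact rows move between nodes during the join, which is the main reason small lookup tables are worth replicating.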

PDW Tools

PDW Tools
From the landing zone or an external computer:
- DWLoader utility
- SSIS: PDW Destination Adapter
- DML: INSERT-SELECT or CTAS
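As an illustration of the CTAS (CREATE TABLE AS SELECT) option, the following Python helper assembles a statement that picks either hash distribution or replication for the new table. It is a sketch under assumptions: the helper name and the table and column names are invented for the example, and identifiers are not escaped, so it should not be fed untrusted input.

```python
def build_ctas(target, source_query, distribution_column=None):
    """Return a CTAS statement: hash-distribute if a column is given, else replicate."""
    if distribution_column:
        distribution = f"HASH({distribution_column})"
    else:
        distribution = "REPLICATE"
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (DISTRIBUTION = {distribution})\n"
        f"AS\n{source_query};"
    )

# Hypothetical table names: a staged fact table redistributed by store key,
# and a small dimension copied to every compute node.
fact_stmt = build_ctas("factsales_new", "SELECT * FROM stage_factsales", "store_dim_id")
dim_stmt = build_ctas("dimstore_new", "SELECT * FROM stage_dimstore")

print(fact_stmt)
print(dim_stmt)
```

The generated text would then be submitted through whichever SQL client connects to the appliance.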

Administrative Console
- Dashboard
- Query activity
- Load activity
- Backup and restore
- Active locks
- Active sessions
- Alerts
- Appliance state

Parallel Data Warehouse Configuration Manager
- Appliance topology
- Services status
- Network configuration
- Privileges

PDW BI Connectivity
- Departmental Reporting
- Regional Reporting
- Central EDW Hub
- High-Performance Reporting
- Mobile Applications
- Regional Reporting with Business Decision Appliance
- Third-Party Data Integration
- Landing Zone
- ETL Tools
- Third-Party RDBMS

PDW Public References

US Retailer (10 TB): Business Problem & Challenges / Project Overview / Expected Benefits

Retail Bank (40 TB): Business Problem & Challenges / Project Overview / Expected Benefits

NASDAQ (450 TB): Business Problem & Challenges / Project Overview / Expected Benefits

Direct Edge (300 TB): Business Problem & Challenges / Project Overview / Expected Benefits

Q&A
