SQL Server PDW
Artur Vieira, Premier Field Engineer
Agenda
1. Introduction to MPP and PDW
2. PDW Architecture and Components
3. Data Structures
4. PDW Tools: Data Load / Data Output / Administrative Console
5. References
Why choose MPP?
Why choose MPP: Until today
How do we currently handle large data volumes?
- Scale-up SMP
- Easy development and coding
- Resource contention: shared disk, shared CPUs, shared memory
But data volumes eventually grow until we reach physical limits. Then we have to start thinking about other solutions.
Why choose MPP: Now...
- Shared nothing: memory, CPU, disks
- Linear scaling
- Fault tolerant
- Reduction of bottlenecks
- Distributed architecture
Now we have SQL PDW: Enterprise Data Warehouse Appliance
Tier-1 Enterprise Data Warehouse Appliance
- High scalability, from tens to hundreds of terabytes
- High performance through the MPP system
Flexibility and Choice
- Choice of deployment options through the distributed architecture
- Up to 480 cores, 4 TB RAM, 700 TB of data
- Data loads of 1.5 TB/hour in a single rack
- Can store 2.3 trillion rows in a single table with 700 TB of data
- Tested database backups running at up to 5 TB/hour
PDW Architecture
PDW Architecture and components
- Components: Control Rack, Data Rack (up to 4 racks)
- Redundancy
- Connectivity
[Figure: Control Rack and Data Rack]
PDW Architecture and components: Management Servers
- Domain controller running Active Directory
- Commands the entire appliance
- Handles the coordination between all the servers
PDW Architecture and components: Control Node
- Handles the SQL requests
PDW Architecture and components: Landing Zone
- Stores staging data
- Runs the loader process for loading tables
- Dedicated storage
PDW Architecture and components: Backup Node
- Coordinates database backups across all nodes
- Hosts third-party software to facilitate backup copies to external devices
PDW Architecture and components: Compute Nodes
- Highly tuned SQL Server nodes with standard interfaces
- N+1 cluster
Development / Test PDW Appliance
[Figure: Dev/test rack with Control Node, Management Servers, Database Servers (SQL), Storage Nodes, Landing Zone, and Backup Node]
- Each user accessing the appliance requires a unique Developer License.
- Developer Licenses include full software functionality for development, test, or demo use only, on as many appliances as necessary.
Distributed Data Warehouse Architecture
[Figure labels: Central EDW Hub; Departmental Reporting; Regional Reporting; High-Performance Reporting; Mobile Applications; Regional Reporting with Business Decision Appliance; Third-Party Data Integration; Landing Zone; ETL Tools; Third-Party RDBMS]
Data Structures
PDW Data Structures
Replicated
- Duplicate copy of the entire table on all compute nodes
- Smaller lookup tables
- Generally 5 GB or smaller
Distributed
- Distributed among all compute nodes
- Distribution based on the distribution key
- All compute nodes hold a portion of the table
- Even distribution depends on the choice of distribution key
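The dependence of even distribution on the distribution key can be illustrated with a small simulation. This is only a sketch under stated assumptions: the node count of 8, the MD5-based hash, and the two example key columns are all hypothetical and do not reflect the hash function or topology PDW actually uses.

```python
import hashlib
from collections import Counter

NUM_COMPUTE_NODES = 8  # assumed appliance size, for illustration only


def node_for_key(key, num_nodes=NUM_COMPUTE_NODES):
    """Map a distribution-key value to a compute node with a stable hash.

    MD5 is a stand-in here; it is not the hash function PDW uses.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rows_per_node(keys, num_nodes=NUM_COMPUTE_NODES):
    """Count how many rows land on each node for a column of key values."""
    counts = Counter(node_for_key(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


# High-cardinality key (a hypothetical order id): rows spread across nodes.
even = rows_per_node(range(100_000))

# Low-cardinality key (a hypothetical code with only 3 distinct values):
# at most 3 nodes can receive rows, so the remaining nodes hold nothing.
skewed = rows_per_node([i % 3 for i in range(100_000)])

print("high-cardinality key:", even)
print("low-cardinality key: ", skewed)
```

A key with many distinct values gives every node a similar share of the rows, while a key with few distinct values concentrates the table on a handful of nodes.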
Replicating Tables
Smaller dimension tables are replicated on every compute node.
[Figure: star schema across the SQL compute nodes]
- dimtime: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
- dimproduct: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
- dimstore: Store Dim ID, Store Name, Store Mgr, Store Size
- dimmktcampaign: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
- factsales: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
Distributing Tables
The larger fact table is hash distributed across all compute nodes.
[Figure: star schema across the SQL compute nodes]
- dimtime: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
- dimproduct: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
- dimstore: Store Dim ID, Store Name, Store Mgr, Store Size
- dimmktcampaign: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
- factsales: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
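One way to see why a replicated dimension pairs well with a distributed fact table: each node can join its own slice of factsales to its local copy of dimstore without moving rows between nodes, and only the small partial results need to be combined. The sketch below assumes 4 nodes, made-up rows, and a modulo on the date key as a stand-in for the real hash function; it illustrates the idea and is not PDW's execution engine.

```python
from collections import defaultdict

NUM_NODES = 4  # assumed number of compute nodes

# Replicated dimension: every node holds the full table (store id -> name).
dimstore = {1: "North", 2: "South", 3: "East"}
replicas = [dict(dimstore) for _ in range(NUM_NODES)]

# Hypothetical fact rows: (date_dim_id, store_dim_id, dollars_sold).
factsales = [(20240101 + i % 5, 1 + i % 3, 10.0 + i) for i in range(12)]

# Distribute the fact rows on date_dim_id (modulo stands in for the hash).
shards = [[] for _ in range(NUM_NODES)]
for row in factsales:
    shards[row[0] % NUM_NODES].append(row)


def local_sales_by_store(shard, stores):
    """Join one node's fact slice to its local dimension copy and aggregate."""
    totals = defaultdict(float)
    for _date_id, store_id, dollars in shard:
        totals[stores[store_id]] += dollars
    return totals


# Each node works only on its own data.
partials = [local_sales_by_store(s, r) for s, r in zip(shards, replicas)]

# Combine the per-node partial aggregates into the final answer.
final = defaultdict(float)
for partial in partials:
    for name, dollars in partial.items():
        final[name] += dollars
result = dict(final)
print(result)
```

The combined totals match what a single-node join would produce, because every fact row lives on exactly one node and every node can resolve any store id locally.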
PDW Tools
PDW Tools
Loading data from the landing zone or an external computer:
- DWLoader utility
- SSIS: PDW Destination Adapter
- DML: INSERT-SELECT or CTAS
Administrative Console
- Dashboard
- Query activity
- Load activity
- Backup and restore
- Active locks
- Active sessions
- Alerts
- Appliance state
Parallel Data Warehouse Configuration Manager
- Appliance topology
- Services status
- Network configuration
- Privileges
PDW BI Connectivity
[Figure labels: Central EDW Hub; Departmental Reporting; Regional Reporting; High-Performance Reporting; Mobile Applications; Regional Reporting with Business Decision Appliance; Third-Party Data Integration; Landing Zone; ETL Tools; Third-Party RDBMS]
PDW Public References
US Retailer - 10 TB: Business Problem & Challenges / Project Overview / Expected Benefits
Retail Bank - 40 TB: Business Problem & Challenges / Project Overview / Expected Benefits
NASDAQ - 450 TB: Business Problem & Challenges / Project Overview / Expected Benefits
Direct Edge - 300 TB: Business Problem & Challenges / Project Overview / Expected Benefits
Q&A