Polybase for SQL Server 2016



Similar documents
Bringing Big Data to People

The Inside Scoop on Hadoop

Modernizing Your Data Warehouse for Hadoop

The Role Polybase in the MDW. Brian Mitchell Microsoft Big Data Center of Expertise

HDP Hadoop From concept to deployment.

HDP Enabling the Modern Data Architecture

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

Please give me your feedback

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,

Modern Data Warehousing

Parallel Data Warehouse

Microsoft technológie pre BigData. Ľubomír Goryl Solution Professional

Comprehensive Analytics on the Hortonworks Data Platform

WHAT S NEW IN SAS 9.4

Upcoming Announcements

Deeper Insights across Data

SQL Server 2016 New Features!

Agenda. Modern Data Warehouse Big Data Application examples. Analytic Platform Systems. Integration of Hadoop and APS. Architecture Hadoop

The Future of Data Management

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Native Connectivity to Big Data Sources in MSTR 10

#TalendSandbox for Big Data

BIG DATA TRENDS AND TECHNOLOGIES

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Big Data Processing: Past, Present and Future

Testing Big data is one of the biggest

How To Create A Fact Table On Hadoop (Hadoop) On A Microsoft Powerbook (Powerbook) On An Ipa 2.2 (Powerpoint) On Microsoft Microsoft 2.3

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM

Constructing a Data Lake: Hadoop and Oracle Database United!

Luncheon Webinar Series May 13, 2013

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Azure Data Lake Analytics

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

The Future of Data Management with Hadoop and the Enterprise Data Hub

Tap into Hadoop and Other No SQL Sources

Microsoft Analytics Platform System. Solution Brief

Microsoft Data Platform Evolution

SAP and Hortonworks Reference Architecture

Big Data Too Big To Ignore

Big Data: Making Sense of it all!

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Microsoft SQL Server 2012 with Hadoop

The Digital Enterprise Demands a Modern Integration Approach. Nada daveiga, Sr. Dir. of Technical Sales Tony LaVasseur, Territory Leader

A Modern Data Architecture with Apache Hadoop

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Big Data Explained. An introduction to Big Data Science.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Big Data Technologies Compared June 2014

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Data Integration Checklist

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

Deploying Hadoop with Manager

Oracle Database 12c Plug In. Switch On. Get SMART.

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Big Data Management and Security

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, Viswa Sharma Solutions Architect Tata Consultancy Services

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Data processing goes big

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Oracle Big Data SQL Technical Update

Building a BI Solution in the Cloud

docs.hortonworks.com

Creating a universe on Hive with Hortonworks HDP 2.0

IBM InfoSphere BigInsights Enterprise Edition

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING

Ganzheitliches Datenmanagement

and Hadoop Technology

Workshop on Hadoop with Big Data

Microsoft Big Data. Solution Brief

SAP Database Strategy Overview. Uwe Grigoleit September 2013

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Open Source Technologies on Microsoft Azure

Virtualizing Apache Hadoop. June, 2012

Implement Hadoop jobs to extract business value from large and varied data sets

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Moving From Hadoop to Spark

Lofan Abrams Data Services for Big Data Session # 2987

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

SQL Server What s New? Christopher Speer. Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft.

SQLSaturday #399 Sacramento 25 July, Big Data Analytics with Excel

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Dominik Wagenknecht Accenture

Transcription:

Polybase for SQL Server 2016 Lukasz Grala Architect Data Platform & BI Solutions MVP SQL Server Łukasz Grala MVP SQL Server MCT MCSE Architekt i trener - Data Platform & Business Intelligence Solutions Prelegent na licznych konferencjach Wykładowca i autor publikacji Posiada liczne certyfikaty Od 2010 roku wyróżniony nagrodą MVP w kategorii SQL Server Lider PLSSUG Poznań (lukasz.grala@plssug.org.pl) Doktorant na Politechnice Poznańskiej Łukasz Grala - Architekt i Trener - MVP SQL Server - Kontakt: lukasz@grala.it lub lukasz@sqlexpert.pl

The explosion of data sources... 1.3B 4.0B 25B There s an opportunity to drive smarter decisions with data 2010 2013 2020 drives an explosion of data 2013-2020 CAGR = 41% which drives businesses to learn more Source: Forecast: Internet of Things, Endpoints and Associated Services, Worldwide, 2014. Gartner. Oct 20 2014 Source: IDC Digital Universe, Dec. 2012 Microsoft platform leads the way on-premises and cloud Leader in 2014 for Gartner Magic Quadrants Operational Database Management Systems Data Warehouse Database Management Systems Business Intelligence and Analytics Platforms x86 Server Virtualization Cloud Infrastructure as a Service Enterprise Application Platform as a Service Public Cloud Storage

The Microsoft data platform Apps Reports Dashboards Ask Mobile Orchestration Extract, transform, load Information management Prediction Relational Non-relational Analytical Streaming Internal & external Types of Analytics

Evolving Approaches to Analytics Extract Transform Load Original Data ETL Tool (SSIS, etc) Transformed Data EDW (SQL Svr, Teradata, etc) BI Tools Data Marts Data Lake(s) Dashboards Apps Evolving Approaches to Analytics Extract Transform Load Original Data ETL Tool (SSIS, etc) Transformed Data EDW (SQL Svr, Teradata, etc) BI Tools Data Marts Data Lake(s) Dashboards Ingest (EL) Original Data Apps

Evolving Approaches to Analytics Extract Transform Load Original Data ETL Tool (SSIS, etc) Transformed Data EDW (SQL Svr, Teradata, etc) BI Tools Data Marts Data Lake(s) Ingest (EL) Original Data Streaming data Scale-out Storage & Compute (HDFS, Blob Storage, etc) Dashboards Apps Transform & Load Evolving Approaches to Analytics Extract Transform Load Original Data ETL Tool (SSIS, etc) Transformed Data EDW (SQL Svr, Teradata, etc) BI Tools Data Marts Data Lake(s) Ingest (EL) Original Data Streaming data Scale-out Storage & Compute (HDFS, Blob Storage, etc) Dashboards Apps Transform & Load

9/30/2015 SQL Server and Azure DocumentDB Premium relational DB capable to exchange data with modern apps & services Derives unified insights from structured/unstructured data JSON PolyBase for SQL Server 2016 JS JS JSON Schema-free NoSQL document store Scalable transactional processing for rapidly changing apps

The Hadoop Ecosystem Management & Monitoring (Ambari) Coordination (ZooKeeper) Workflow &Scheduling (Oozie) Scripting (Pig) Machine Query Learning (Hive) (Mahout) Distributed Processing (MapReduce) Distributed Storage (HDFS) NoSQL Database (HBase) Data Integration (Sqoop/REST/ODBC) PolyBase Query relational and non-relational data with T-SQL Query relational and non-relational data, on-premises and in Azure T-SQL query SQL Server Hadoop Apps

Prerequisites for installing PolyBase 64-bit SQL Server Evaluation edition Microsoft.NET Framework 4.0. Oracle Java SE RunTime Environment (JRE) version 7.51 or higher NOTE: Java JRE version 8 does not work. Minimum memory: 4GB Minimum hard disk space: 2GB Using the installation wizard for PolyBase Run SQL Server Installation Center. (Insert SQL Server installation media and double-click Setup.exe.) Click Installation, then click New Standalone SQL Server installation or add features On the feature selection page, select PolyBase Query Service for External Data. On the Server Configuration Page, configure the PolyBase Engine Service and PolyBase Data Movement Service to run under the same account.

Choose Hadoop data source with sp_configure -- Run sp_configure hadoop connectivity -- and set an appropriate value sp_configure @configname = 'hadoop connectivity', @configvalue = 7; GO RECONFIGURE GO -- List the configuration settings for -- one configuration name sp_configure @configname='hadoop connectivity'; GO Option values 0: Disable Hadoop connectivity 1: Hortonworks HDP 1.3 on Windows Server Azure blob storage (WASB[S]) 2: Hortonworks HDP 1.3 on Linux 3: Cloudera CDH 4.3 on Linux 4: Hortonworks HDP 2.0 on Windows Server Azure blob storage (WASB[S]) 5: Hortonworks HDP 2.0 on Linux 6: Cloudera 5.1 on Linux 7: Hortonworks 2.1 and 2.2 on Linux Hortonworks 2.2 on Windows Server Azure blob storage (WASB[S]) Start the PolyBase services After running for sp_configure, you must stop and restart the SQL Server engine service Run services.msc Find the services shown below and stop each one Restart the services

Configure PolyBase for Azure blob storage -- Using credentials on database requires enabling -- traceflag DBCC TRACEON(4631,-1) -- Create a master key CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'; CREATE CREDENTIAL WASBSecret ON DATABASE WITH IDENTITY = 'pdw_user', Secret = 'mykey=='; -- Create an external data source (Azure Blob Storage) -- with the credential CREATE EXTERNAL DATA SOURCE Azure_Storage WITH ( TYPE = HADOOP, LOCATION ='wasb[s]://mycontainer@test.blob.core.windows.net/pat h, CREDENTIAL = WASBSecret ) Type methods for providing credentials Core-site.xml in installation path of SQL Server - <SqlBinRoot>\Polybase\Hadoop\Conf Credential object in SQL Server for higher security NOTE: The syntax for a database-scoped credential (CREATE CREDENTIAL ON DATABASE) is temporary and will change in the next release. This new feature is documented only in the examples in the CTP2 content, and will be fully documented in the next release. Create a reference to a Hadoop cluster -- Create an external data source (Hadoop) CREATE EXTERNAL DATA SOURCE hdp2 with ( TYPE = HADOOP, LOCATION ='hdfs://10.xxx.xx.xxx:xxxx', RESOURCE_MANAGER_LOCATION='10.xxx.xx.xxx:xxxx') CTP2 supports the following Hadoop distributions Hortonworks HDP 1.3, 2.0, 2.1, 2.2 for both Windows and Linux Cloudera CDH 4.3, 5.1 on Linux

Define the external file format -- Create an external file format -- (delimited text file) CREATE EXTERNAL FILE FORMAT ff2 WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR =' ', USE_TYPE_DEFAULT = TRUE)) CTP2 supports the following file formats Delimited text Hive RCFile Hive ORC Create an external table to the data source -- Create an external table pointing to file stored in Hadoop CREATE EXTERNAL TABLE [dbo].[carsensor_data] ( [SensorKey] int NOT NULL, [CustomerKey] int NOT NULL, [GeographyKey] int NULL, [Speed] float NOT NULL, [YearMeasured] int NOT NULL ) WITH (LOCATION='/Demo/car_sensordata.tbl', DATA_SOURCE = hdp2, FILE_FORMAT = ff2, REJECT_TYPE = VALUE, REJECT_VALUE = 0 The external table provides a T- SQL reference to the data source used to: Query Hadoop or Azure blob storage data with Transact-SQL statements Import and store data from Hadoop or Azure blob storage into your SQL Server database

Optimize queries by adding statistics -- Create statistics on an external table. CREATE STATISTICS StatsForSensors ON CarSensor_Data(CustomerKey, Speed) In CTP2, you can optimize query execution against the external tables using statistics Query Capabilities (1) Joining relational and external data SELECT DISTINCT C.FirstName, C.LastName, C.MaritalStatus FROM Insurance_Customer_SQL INNER JOIN ( SELECT * FROM SensorData_ExternalHDP WHERE Speed > 35 UNION ALL SELECT * FROM SensorData_ExternalHDP2 WHERE Speed > 35 ) AS SensorD ON C.CustomerKey = SensorD.CustomerKey SQL Server table External tables referring to data in 2 HDP Hadoop clusters

Query Capabilities (2) Push-Down Computation SELECT DISTINCT C.FirstName, C.LastName, C.MaritalStatus FROM Insurance_Customer_SQL -- table in SQL Server OPTION (FORCE EXTERNALPUSHDOWN) push-down computation CREATE EXTERNAL DATA SOURCES ds-hdp WITH.( TYPE = Hadoop, LOCATION = hdfs://10.193.27.52:8020, Resources_Manager_Location = 10.193.27.52:8032 ); Typical PolyBase Setup Engine Service DMS Control Node Namenode (HDFS) DB DMS Compute Nodes DB DMS Hortonworks or Cloudera Hadoop Hadoop cluster distributions can be either: Text RCFile ORCFile File Format File Format File Format File System System System System DB DMS Windows Linux Cluster Parquet Format (future) File System PDW Appliance Many Hadoop popular cluster HDFS can be file either formats supported on premise (text, or in the RC, cloud ORC, ) File System Hadoop Cluster

9/30/2015 Attach a Hadoop Cluster CREATE EXTERNAL DATA SOURCE GSL_HDFS_CLUSTER AV_DFS_CLUSTER WITH (TYPE= HADOOP, ) Azure Azure Storage Hadoop Volume Cluster #2 A Better Query Plan Select C.Name from Customers C, Orders O where C.id = O.CustId and P.price > $1000 1) Use MR job to find Orders > $1000 2) Import result into SQL vnext instances on Compute nodes 3) Redistribute Customers table from Head Node to Compute Nodes 4) Perform join in parallel on compute nodes Azure Storage Volume Azure Storage Volume

9/30/2015 e.g., cleansing data before loading it e.g., joining relational tables w/ streams of tweets e.g., mine sensor data for predictive analytics Example #1: (Non-Relational Sensor data Structured Customer data Price policies (based on driver behavior) Sensor data from cars (kept in Hadoop) Relational data (kept in SQL Server PDW/APS)

Example #2: (Non-Relational Social Media (kept in Hadoop) Structured Product data Product data (kept in SQL Server PDW/APS) Basket Analysis of online shoppers (based on social media behavior) Example #3: Rig old sensor data Rig new ( hot ) data Monitoring & functioning of rig (kept in Hadoop) More recent data (kept in SQL Server PDW/APS) Drilling rig analysis

9/30/2015 Mobile BI apps for SQL Server formally Datazen For on-premises implementations - optimized for SQL Server. Rich, interactive data visualization on all major mobile platforms No additional cost for SQL Server Enterprise Edition customers 2008 or later and Software Assurance Data visualization and publishing Create beautiful visualizations and KPIs with a touch-based designer Connect to the Mobile BI for SQL Server server to access SQL Server data Publish for access by others

Datazen architecture overview Server, Publisher and Mobile apps Enterprise environment Authentication Internet boundary Publisher App Data sources SQL Server Datazen Enterprise Server Viewer Apps Analysis Services File Web browser Mobile BI for SQL Server summary Beautiful visualizations KPI repository Create once - publish to any device Native apps for all platforms Perfect scaling to any form factor Custom branding Collaborate on the go

2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.