The Role of Polybase in the MDW. Brian Mitchell, Microsoft Big Data Center of Expertise


1 The Role of Polybase in the MDW
Brian Mitchell, Microsoft Big Data Center of Expertise

2 Program
  o Polybase Basics
  o Polybase Scenarios
      o Hadoop for Staging
      o Ambient data from Hadoop
      o Export Dimensions to Hadoop
      o Hadoop as a Data Archive
  o Demos Throughout

3 The traditional data warehouse
"Data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing."
Gartner, "The State of Data Warehousing in 2012"
[Diagram label: Data sources]

4 The traditional data warehouse
  1 Increasing data volumes
  2 Real-time data
  3 New data sources and types
  4 Cloud-born data
[Diagram labels: Data sources; Non-relational data]

5 Data sources Non-relational data

6 Introducing the Microsoft Analytics Platform System
The turnkey modern data warehouse appliance
  o Relational and non-relational data in a single appliance
  o Enterprise-ready Hadoop
  o Integrated querying across Hadoop and PDW using T-SQL
  o Direct integration with Microsoft BI tools such as Microsoft Excel
  o Near real-time performance with In-Memory Columnstore
  o Ability to scale out to accommodate growing data
  o Removal of data warehouse bottlenecks with MPP SQL Server
  o Concurrency that fuels rapid adoption
  o Industry's lowest data warehouse appliance price per terabyte
  o Value through a single appliance solution
  o Value with flexible hardware options using commodity hardware

7 Hardware and software engineered together
The ease of an appliance
  o Pre-built hardware + software appliance
  o Co-engineered with Dell, HP, and Quanta
  o Pre-built hardware
  o Pre-installed software
  o Plug and play
  o Built-in best practices
  o Time savings
  o Built for Big Data
[Diagram labels: Analytics Platform System; SQL Server Parallel Data Warehouse; PolyBase; Microsoft HDInsight]

8 HDInsight Region

9 APS delivers enterprise-ready Hadoop with HDInsight
Manageable, secured, and highly available Hadoop integrated into the appliance
  o High performance and tuned within the appliance
  o End-user authentication with Active Directory
  o 100-percent Apache Hadoop
  o Managed and monitored using System Center
  o Accessible insights for everyone with Microsoft BI tools
[Diagram labels: SQL Server Parallel Data Warehouse; PolyBase; Microsoft HDInsight]

10 APS appliance overview
  o A region is a logical container within an appliance
  o Each workload contains the following boundaries: Security, Metering, Servicing
[Diagram labels: Appliance; Parallel Data Warehouse workload; Fabric; HDInsight workload; Hardware]

11 HDInsight Overview
  o It's HDI running on an appliance as a workload
  o HDInsight is the Microsoft-branded Hortonworks distro (HDP 1.3)
  o For AU1
  o An integrated appliance for running a PDW region and an HDI region
  o PDW is offered as a stand-alone workload on the appliance
  o HDI is offered only as an add-on to PDW
  o Only supported on V2 hardware
  o H/A for the Head Node is Failover Clustering
  o Data Node H/A is HDFS

12 What's included?

13 Hardware Topology
  o Uses PDW HW and topology
  o No new SKUs for the HDI region
  o 2 additional servers on rack 1 for the HDI Head Node (1 active / 1 failover)
[Rack diagram, u1 to u42: IB switches; Ethernet switches; PDW Control Node; HDI Head Node; PDW failover/spare; Hadoop failover/spare; HDI Data Nodes (1 scale unit); PDW Compute Nodes (1 scale unit); passive scale unit for PDW; DL360G8 servers with D6000 JBODs]

14 Connecting islands of data with PolyBase
Bringing Hadoop point solutions and the data warehouse together for users and IT
  o Provides a single T-SQL query model for PDW and Hadoop with rich features of T-SQL, including joins without ETL
  o Uses the power of MPP to enhance query execution performance
  o Supports Windows Azure HDInsight to enable new hybrid cloud scenarios
  o Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera
[Diagram labels: Select; Result set; Microsoft Azure HDInsight; Hortonworks for Windows and Linux; Cloudera; SQL Server Parallel Data Warehouse; PolyBase; Microsoft HDInsight]

15 Polybase APS AU1
  o New versions of Hadoop
  o New file types
  o Multiple Hadoop Connections
  o Predicate Pushdown

16 How to query any data, in any location, in any format?
  o External Tables
  o External Data Sources
  o External File Format

17 Concept of External Tables, Data Sources & File Formats

18 Polybase: Enhancing the PDW query engine
[Diagram labels: Data Scientists, BI Users, DB Admins; Your Apps; Power BI; Microsoft APS; Polybase; External Table; External Data source; External File Format; APS control & data nodes; Polybase/APS query engine; Web Apps; Social Apps; Mobile Apps; Sensor & RFID]

19 External tables
  o Internal representation of data residing outside of the appliance
  o Introducing modified syntax (compared to PolyBase v1)
      o Seamless upgrade of existing v1 external tables
  o SQL permissions required for creating external tables
      o ADMINISTER BULK OPERATIONS, CREATE TABLE, and ALTER ON SCHEMA permissions
      o ALTER ANY EXTERNAL DATA SOURCE and ALTER ANY EXTERNAL FILE FORMAT permissions

    CREATE EXTERNAL TABLE table_name ({<column_definition>} [,...n])
    {WITH (
        DATA_SOURCE = <data_source>,     -- Referencing external data source
        FILE_FORMAT = <file_format>,     -- Referencing external file format
        LOCATION = <file_path>,          -- Path of the Hadoop file/folder
        [REJECT_VALUE = <value>]         -- (Optional) Reject parameters
    )}
    [;]
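As a minimal sketch of the syntax above, the statement below defines an external table over a folder of delimited files. The table name, folder path, data source name, and file format name are all hypothetical, and the data source and file format are assumed to exist already. REJECT_TYPE does not appear on the slide; it is included on the assumption that the reject threshold should be read as a row count rather than a percentage.

```sql
-- Hypothetical names throughout: WebLogs, MyHadoopCluster, MyDelimitedText.
-- Assumes the external data source and file format were created beforehand.
CREATE EXTERNAL TABLE WebLogs (
    url        varchar(50),
    event_date date,
    user_ip    varchar(50)
)
WITH (
    LOCATION     = '/logs/web/',      -- hypothetical HDFS folder
    DATA_SOURCE  = MyHadoopCluster,   -- references an external data source
    FILE_FORMAT  = MyDelimitedText,   -- references an external file format
    REJECT_TYPE  = VALUE,             -- interpret REJECT_VALUE as a row count
    REJECT_VALUE = 10                 -- the query fails once more than 10 rows are rejected
);
```

Only the table definition is stored in the appliance; the rows stay in Hadoop until a query reads them.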

20 External data sources
  o Internal representation of an external data source
  o Enabling and disabling of split-based query processing
  o ALTER ANY EXTERNAL DATA SOURCE permission required
  o Support of Hadoop as a data source and Windows Azure Blob Storage (WASB, formerly known as ASV)
  o Generation of MapReduce jobs on-the-fly (fully transparent for the end user)

    CREATE EXTERNAL DATA SOURCE datasource_name
    {WITH (
        TYPE = <data_source>,                     -- Type of external data source
        LOCATION = <location>,                    -- Location of external data source
        [JOB_TRACKER_LOCATION = <jb_location>]    -- Enabling or disabling of MapReduce job generation
    )}
    [;]
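The two statements below sketch how the optional JOB_TRACKER_LOCATION setting might be used to switch MapReduce job generation on or off per data source. The data source names, IP address, and port numbers are placeholders chosen for illustration, so check them against the actual cluster before use.

```sql
-- Hypothetical data source with MapReduce job generation (pushdown) enabled.
-- The address and ports are assumptions; use the values of your own cluster.
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.10.10.10:8020',        -- name node (assumed address)
    JOB_TRACKER_LOCATION = '10.10.10.10:50300'   -- job tracker (assumed address)
);

-- Same cluster without JOB_TRACKER_LOCATION: no MapReduce jobs are generated,
-- so data is always imported and processed on the PDW side.
CREATE EXTERNAL DATA SOURCE MyHadoopClusterNoPushdown
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.10.10.10:8020'
);
```

Defining both variants lets external tables over the same files opt in or out of pushdown by referencing one data source or the other.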

21 External file format
  o Internal representation of an external file format
  o Enabling and disabling of split-based query processing
  o ALTER ANY EXTERNAL FILE FORMAT permission required
  o Support of delimited text files and Hive RCFiles
  o Generation of MapReduce jobs on-the-fly

    CREATE EXTERNAL FILE FORMAT fileformat_name
    {WITH (
        FORMAT_TYPE = <type>,                     -- Type of external file format
        [SERDE_METHOD = <serde_method>]           -- (De)Serialization method [Hive RCFile]
        [DATA_COMPRESSION = <compr_method>]       -- Compression method
        [FORMAT_OPTIONS (<format_options>)]       -- (Optional) Format Options [Text Files]
    )}
    [;]

22 Support of additional HDFS file formats: Hive RCFiles
  o Hadoop/Hive users prefer RCFile due to better compression and performance benefits
  o Record Columnar File consisting of binary key/value pairs
  o RCFile stores columns of a table in a record columnar way
  o User has to specify the serialization/deserialization method (SERDE_METHOD)

    CREATE EXTERNAL FILE FORMAT MyRCFile
    WITH (
        FORMAT_TYPE = RCFile,
        [SERDE_METHOD = LazyBinarySerDe]
    )

Some performance observations in-house
  o LazyBinaryColumnarSerDe is significantly faster and more efficient than ColumnarSerDe
  o Data compression is not very beneficial in the case of IB connectivity between Hadoop and PDW (if low-speed networking is used, compression is expected to help)

23 Format options for delimited text files

    <Format Options> ::= [FIELD_TERMINATOR = Value]
                         [, STRING_DELIMITER = Value]
                         [, DATE_FORMAT = Value]
                         [, USE_TYPE_DEFAULT = Value]

    Option              Purpose
    FIELD_TERMINATOR    To indicate a column delimiter
    STRING_DELIMITER    To specify the delimiter for string data type fields
    DATE_FORMAT         To specify a particular date format
    USE_TYPE_DEFAULT    To specify how missing entries in text files are treated
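To make the four options concrete, here is a hedged sketch of a file format for pipe-delimited text. The format name and the chosen delimiter, quote character, and date pattern are illustrative assumptions, not values taken from the deck.

```sql
-- Hypothetical file format for pipe-delimited text files.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '|',          -- column delimiter
        STRING_DELIMITER = '"',          -- quote character around string fields
        DATE_FORMAT      = 'yyyy-MM-dd', -- how date fields are written in the file
        USE_TYPE_DEFAULT = TRUE          -- missing entries become the column type's default
                                         -- value; FALSE stores them as NULL instead
    )
);
```

With USE_TYPE_DEFAULT = TRUE, an empty numeric field is read as 0 and an empty string field as an empty string, which matters when downstream queries distinguish NULL from a default.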

24 (HDFS) Bridge
Direct and parallelized HDFS access
  o Enhancing the Data Movement Service (DMS) of APS to allow direct communication between HDFS data nodes and PDW compute nodes
[Diagram labels: Non-relational data (social apps, mobile apps, sensor and RFID, web apps); Hadoop; Relational data (traditional schema-based data warehouse applications); PDW; Regular T-SQL; Results; External table; External data source; External file format; Enhanced PDW query engine; HDFS bridge]

25 Querying external Hadoop data via T-SQL

26 Predicate Pushdown
  o Reduce data movement
      o Reduce the number of rows moved
      o Reduce the number of columns moved
  o Subset of expressions and operators
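The contrast below is a rough illustration of the two reductions, using the ClickStream external table that the deck defines later as its running example. The date literal is made up, and whether this particular predicate is pushed down is an assumption: that depends on the supported subset of expressions and on the engine's cost-based decision.

```sql
-- Moves every column of every row from Hadoop to PDW.
SELECT *
FROM ClickStream;

-- Candidate for pushdown: a single column (fewer columns moved)
-- and a filter (fewer rows moved). The date literal is hypothetical.
SELECT url
FROM ClickStream
WHERE event_date >= '2014-01-01';
```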

27 Querying Hadoop data via T-SQL
  I. Query data in HDFS and display results in table form (via external tables)
  II. Join data from HDFS with relational APS/PDW data

Running example: creating the external table ClickStream
(1. DATA_SOURCE and FILE_FORMAT reference the external data source and file format)

    CREATE EXTERNAL TABLE ClickStream (
        url varchar(50),
        event_date date,
        user_ip varchar(50)
    )
    WITH (
        LOCATION = '//Hadoop_files/clickstream.tbl',
        DATA_SOURCE = MY_HDP2.0,
        FILE_FORMAT = MyDelimitedText
    )

Polybase query examples

  Filter query against data in HDFS:

    SELECT TOP 10 (url)
    FROM ClickStream
    WHERE user_ip = '...';

  Join data from various files in HDFS (Url_Descr is a second text file):

    SELECT url.description
    FROM ClickStream cs, Url_Descr url
    WHERE cs.url = url.name AND cs.url = '...';

  Join data from HDFS with data in PDW (User is a distributed PDW table):

    SELECT user_name
    FROM ClickStream cs, User u
    WHERE cs.user_ip = u.user_ip AND cs.url = '...';

28 Split-based query execution through Polybase
  (HDFS/WASB) Bridge Component
      o Connecting and retrieving/writing data from/to Hadoop's distributed file system or Azure's storage (containers)
  Job Submitter Component
      o Generating map/reduce jobs on-the-fly for in-situ processing
      o Transparent for the end user: no need to learn map/reduce
      o M/R jobs executed by Hadoop's job tracker
      o Cost-based decision when to push computation vs. direct import of data (based on statistics)
  Optimized Storage Layer
      o PPAX: hybrid columnar-row storage
      o All HDFS file formats transformed into optimized PPAX
[Diagram labels: Your App; PowerBI; External Table; External Data source; External File Format; APS/Polybase Query Engine; (HDFS/WASB) Bridge; M-R Job Submitter; Polybase Storage Layer (PPAX)]

29 Cost-based Decision I (for split-based query execution)
  o Distributed query plan
  o SQL Server on control node
      o Leveraging SQL Server as query compilation aid
  o User can create statistics on external table
      o Full scan vs. sampling
  o Cost-based decision on push-down
      o APS/Polybase query engine uses stats to determine the data volume to be transferred
      o Cost factors: IO and data transfer cost
      o Assuming high-speed networking (>10G Ethernet)

Polybase create statistics example:

    CREATE STATISTICS UserIP_Stats ON ClickStream(user_IP) WITH FULLSCAN

[Diagram labels: Your App; PowerBI; External Table; External Data source; External File Format; APS/Polybase Query Engine; (HDFS/WASB) Bridge; M-R Job Submitter; Polybase Storage Layer (PPAX)]
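The slide contrasts a full scan with sampling but only shows the FULLSCAN form. A sampled variant might look like the sketch below; the statistics name and the 20 percent sample rate are illustrative assumptions, and the sampling support available for external tables should be confirmed for the APS release in use.

```sql
-- Hypothetical sampled statistics on the running-example external table.
-- Trades some accuracy in the cardinality estimate for a cheaper build.
CREATE STATISTICS EventDate_Stats
ON ClickStream (event_date)
WITH SAMPLE 20 PERCENT;
```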

30 Cost-based Decision II (for split-based query execution)
  o Major factor for the decision is data volume reduction
  o Spin-up time for map-reduce is around seconds
      o Spin-up time varies depending on Hadoop distribution and underlying OS
  o Cardinality of predicate matters
      o Creating statistics is crucial for the quality of Polybase query plans
      o No push-down for scenarios where APS can execute under seconds w/o push-down
  o Rough rule of thumb
      o Don't consider pushdown for inputs that result in less than 1 GB per *PDW distribution*
        Example: for 2 compute nodes, file size > 16 GB
      o Transferring, writing, and processing 1 GB per distribution is faster than spinning up an M/R job
[Diagram labels: Your App; PowerBI; External Table; External Data source; External File Format; APS/Polybase Query Engine; (HDFS/WASB) Bridge; M-R Job Submitter; Polybase Storage Layer (PPAX)]
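The 16 GB figure in the rule of thumb follows from PDW having 8 distributions per compute node. The arithmetic can be written as a one-line query; the column alias is hypothetical.

```sql
-- 2 compute nodes x 8 distributions per node x 1 GB per distribution
SELECT 2 * 8 * 1 AS pushdown_threshold_gb;   -- returns 16
```

Scaling the first factor gives the threshold for larger appliances, for example 8 compute nodes put the break-even point at roughly 64 GB of input.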

31 Cost-based Decision III (for split-based query execution)
  o Queries can have push-able and non-push-able expressions
      o Push-able ones will be evaluated on the Hadoop side (if possible)
      o Processing of non-push-able ones will be done on the PDW side
  o Joins will, in general, always be executed on APS
      o Predicates may be pushed down (if possible)
  o Aggregations (partial or full) will be performed in PDW
      o Partial aggregation on Hadoop envisioned for future APS releases
[Diagram labels: Your App; PowerBI; External Table; External Data source; External File Format; APS/Polybase Query Engine; (HDFS/WASB) Bridge; M-R Job Submitter; Polybase Storage Layer (PPAX)]

32 Supported Configurations for AU1
  o HDInsight on Analytics Platform System
  o HDInsight's Windows Azure blob storage (WASB[S])
  o Hortonworks on Windows Server (HDP 1.3, 2.0)
  o Hortonworks on Linux (HDP 1.3, 2.0)
  o Cloudera on Linux (CDH 4.3)

33 A Traditional Approach Under Pressure
[Diagram labels: Applications; Business; RDBMS; EDW; Repositories; Data Sources; Existing Sources (CRM, ERP, ClickStream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Logs, Unstr.)]

34 Why Polybase? PDW PDW with Polybase

35 An Emerging Data Architecture

36 Integrating Big Data with Microsoft Data Warehousing and Business Intelligence ETL Processing

37 Using Hadoop for Staging

38 Traditional ETL Data Warehousing and Business Intelligence ETL Processing (SSIS, etc)

39 Long Term Raw Data Archiving

40 Long Term Raw Data Archiving

41 Transforming Data

42 New Data Types

43 Let's get Technical

44 Create External Table

45 CTAS: Create Table AS Select

    CREATE TABLE mytable
    WITH (
        CLUSTERED COLUMNSTORE INDEX,
        DISTRIBUTION = HASH (CustomerKey)
    )
    AS SELECT * FROM ClickStream;

46 Demo

47 Using Polybase to export from PDW to Hadoop

48 Exporting Conformed Dimensions to Hadoop

49 Export your Conformed Dimensions

50 Data Archiving

51 Hadoop as a Data Archive ETL Processing

52

53 CETAS: Create External Table AS Select

    CREATE EXTERNAL TABLE hdfsfactalldataarchive
    WITH (
        LOCATION = 'user/administrator/passbac/all_data/',
        DATA_SOURCE = f14790hdp,
        FILE_FORMAT = pipedelimited
    )
    AS SELECT * FROM FactAllData
    WHERE transaction_year < 2000;

54 Demo

55 Join Data on the Fly

56 Joining Data Store your Dimensional data on PDW and your Fact data on Hadoop

57 Join PDW & External Tables
No different from any other join you do today

    SELECT c.name, d.year, SUM(sales)
    FROM FactSales s                    -- External Table
    JOIN dimcustomer c                  -- Internal Table
      ON c.customerid = s.customerid
    JOIN dimdate d                      -- Internal Table
      ON s.dateid = d.dateid
    WHERE d.year = 2008
      AND c.name = 'Albertson & Brothers'
    GROUP BY c.name, d.year

58 Demo

59 Wrap-up


More information

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Protecting Big Data Data Protection Solutions for the Business Data Lake

Protecting Big Data Data Protection Solutions for the Business Data Lake White Paper Protecting Big Data Data Protection Solutions for the Business Data Lake Abstract Big Data use cases are maturing and customers are using Big Data to improve top and bottom line revenues. With

More information

Deeper Insights across Data

Deeper Insights across Data Deeper Insights across Data Technical White Paper Published: June 2015 Applies to: SQL Server 2016 Summary: Data warehousing, analytics, and business intelligence must adapt to a whole new scope, scale,

More information

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc. Beyond Web Application Log Analysis using Apache TM Hadoop A Whitepaper by Orzota, Inc. 1 Web Applications As more and more software moves to a Software as a Service (SaaS) model, the web application has

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc.

Oracle BI EE Implementation on Netezza. Prepared by SureShot Strategies, Inc. Oracle BI EE Implementation on Netezza Prepared by SureShot Strategies, Inc. The goal of this paper is to give an insight to Netezza architecture and implementation experience to strategize Oracle BI EE

More information

Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

More information

SQL Server to SQL Server PDW. Migration Guide (AU3)

SQL Server to SQL Server PDW. Migration Guide (AU3) SQL Server to SQL Server PDW Migration Guide (AU3) Contents 4 Summary Statement 4 Introduction 4 SQL Server Family of Products 6 Differences between SMP and MPP 8 PDW Software Architecture 10 PDW Community

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

Next-Generation Cloud Analytics with Amazon Redshift

Next-Generation Cloud Analytics with Amazon Redshift Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,[email protected]

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,[email protected] Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

SQL Server 2014. Point of View. Overview on Key Enhancements and Updates

SQL Server 2014. Point of View. Overview on Key Enhancements and Updates Point of View Overview on Key Enhancements and Updates Executive Overview Data and analytics strategies and solutions must be flexible and scalable to meet future needs. They must give users real-time

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

<Insert Picture Here> Oracle and/or Hadoop And what you need to know

<Insert Picture Here> Oracle and/or Hadoop And what you need to know Oracle and/or Hadoop And what you need to know Jean-Pierre Dijcks Data Warehouse Product Management Agenda Business Context An overview of Hadoop and/or MapReduce Choices, choices,

More information

Teradata s Big Data Technology Strategy & Roadmap

Teradata s Big Data Technology Strategy & Roadmap Teradata s Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any

More information

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies

More information

SAP and Hortonworks Reference Architecture

SAP and Hortonworks Reference Architecture SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical

More information

Traditional BI vs. Business Data Lake A comparison

Traditional BI vs. Business Data Lake A comparison Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Big Data Can Drive the Business and IT to Evolve and Adapt

Big Data Can Drive the Business and IT to Evolve and Adapt Big Data Can Drive the Business and IT to Evolve and Adapt Ralph Kimball Associates 2013 Ralph Kimball Brussels 2013 Big Data Itself is Being Monetized Executives see the short path from data insights

More information

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya

Oracle Database - Engineered for Innovation. Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database - Engineered for Innovation Sedat Zencirci Teknoloji Satış Danışmanlığı Direktörü Türkiye ve Orta Asya Oracle Database 11g Release 2 Shipping since September 2009 11.2.0.3 Patch Set now

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information

The Enterprise Data Hub and The Modern Information Architecture

The Enterprise Data Hub and The Modern Information Architecture The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader

More information

Il mondo dei DB Cambia : Tecnologie e opportunita`

Il mondo dei DB Cambia : Tecnologie e opportunita` Il mondo dei DB Cambia : Tecnologie e opportunita` Giorgio Raico Pre-Sales Consultant Hewlett-Packard Italiana 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject

More information

Cloudera Certified Developer for Apache Hadoop

Cloudera Certified Developer for Apache Hadoop Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number

More information

SQL Server and MicroStrategy: Functional Overview Including Recommendations for Performance Optimization. MicroStrategy World 2016

SQL Server and MicroStrategy: Functional Overview Including Recommendations for Performance Optimization. MicroStrategy World 2016 SQL Server and MicroStrategy: Functional Overview Including Recommendations for Performance Optimization MicroStrategy World 2016 Technical Integration with Microsoft SQL Server Microsoft SQL Server is

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

2009 Oracle Corporation 1

2009 Oracle Corporation 1 The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions

More information

Leveraging SAP HANA & Hortonworks Data Platform to analyze Wikipedia Page Hit Data

Leveraging SAP HANA & Hortonworks Data Platform to analyze Wikipedia Page Hit Data Leveraging SAP HANA & Hortonworks Data Platform to analyze Wikipedia Page Hit Data 1 Introduction SAP HANA is the leading OLTP and OLAP platform delivering instant access and critical business insight

More information

An Oracle White Paper June 2013. Oracle: Big Data for the Enterprise

An Oracle White Paper June 2013. Oracle: Big Data for the Enterprise An Oracle White Paper June 2013 Oracle: Big Data for the Enterprise Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure

More information