Data Warehousing Reinvented for the Cloud World. Benoit Dageville

Similar documents
Big Data & Cloud Computing. Faysal Shaarani

QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering

Oracle Architecture, Concepts & Facilities

BIRT in the Cloud: Deployment Options for ActuateOne

The IBM Cognos Platform

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

AtScale Intelligence Platform

Configuration and Development

Hadoop & Spark Using Amazon EMR

Big Data for the Rest of Us Technical White Paper

TA 15-21: Oracle Database Subscription Licenses

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Native Connectivity to Big Data Sources in MSTR 10

ORACLE DATABASE 10G ENTERPRISE EDITION

SQL Server 2012 Parallel Data Warehouse. Solution Brief

CitusDB Architecture for Real-Time Big Data

FIFTH EDITION. Oracle Essentials. Rick Greenwald, Robert Stackowiak, and. Jonathan Stern O'REILLY" Tokyo. Koln Sebastopol. Cambridge Farnham.

Transforming the Economics of Data Warehousing with Cloud Computing

Einsatzfelder von IBM PureData Systems und Ihre Vorteile.

GoodData. Platform Overview

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Oracle Database 11g: New Features for Administrators DBA Release 2

Big Data at Cloud Scale

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

How to Enhance Traditional BI Architecture to Leverage Big Data

D12CBR Oracle Database 12c: Backup and Recovery Workshop NEW

How to Migrate From Existing BusinessObjects or Cognos Environments to MicroStrategy. Ani Jain January 29, 2014

Emerging Technologies Shaping the Future of Data Warehouses & Business Intelligence

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Overview: X5 Generation Database Machines

Advanced In-Database Analytics

Introduction to Oracle Business Intelligence Standard Edition One. Mike Donohue Senior Manager, Product Management Oracle Business Intelligence

70-467: Designing Business Intelligence Solutions with Microsoft SQL Server

Luncheon Webinar Series May 13, 2013

RRF Reply Reporting Framework

Elastic Data Warehousing in the Cloud Is the sky really the limit?

High Availability with Windows Server 2012 Release Candidate

SQL Server 2005 Features Comparison

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

OPTIMIZING PRIMARY STORAGE WHITE PAPER FILE ARCHIVING SOLUTIONS FROM QSTAR AND CLOUDIAN

Your Data, Any Place, Any Time.

Enterprise and Standard Feature Compare

Using Oracle Real Application Clusters (RAC)

Virtualizing Apache Hadoop. June, 2012

Your Data, Any Place, Any Time. Microsoft SQL Server 2008 provides a trusted, productive, and intelligent data platform that enables you to:

Objectif. Participant. Prérequis. Pédagogie. Oracle Database 11g - New Features for Administrators Release 2. 5 Jours [35 Heures]

SQL on NoSQL (and all of the data) With Apache Drill

NoSQL for SQL Professionals William McKnight

Oracle Database In-Memory The Next Big Thing

PLATFORA SOLUTION ARCHITECTURE

Webinar: Modern Data Protection For Next-Gen Apps and Databases

Oracle Database 12c Plug In. Switch On. Get SMART.

Course Outline. Upgrading Your Skills to SQL Server 2016 Course 10986A: 5 days Instructor Led

Upgrading Your SQL Server 2000 Database Administration (DBA) Skills to SQL Server 2008 DBA Skills Course 6317A: Three days; Instructor-Led

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

Online Courses. Version 9 Comprehensive Series. What's New Series

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Information Architecture

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.

Real Time Big Data Processing

Integrating Netezza into your existing IT landscape

Hitachi Content Platform. Andrej Gursky, Solutions Consultant May 2015

The Vertica Analytic Database Technical Overview White Paper. A DBMS Architecture Optimized for Next-Generation Data Warehousing

Migration Scenario: Migrating Batch Processes to the AWS Cloud

HPE Vertica & Hadoop. Tapping Innovation to Turbocharge Your Big Data. #SeizeTheData

SQL Server 2012 Performance White Paper

TRANSFORMING DATA PROTECTION

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

SQL Server 2012 End-to-End Business Intelligence Workshop

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM

<Insert Picture Here> Oracle BI Standard Edition One The Right BI Foundation for the Emerging Enterprise

Next-Generation Cloud Analytics with Amazon Redshift

CÓDIGO DESCRIPCIÓN LUGAR HORAS Ene Feb Mar Abr. CAS-6001 Administración Oracle 11g OCP (SQL+Admon I+Admon II)

Using DataDirect Connect for JDBC with Oracle Real Application Clusters (RAC)

Jitterbit Technical Overview : Salesforce

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

How To Use Hp Vertica Ondemand

Microsoft Analytics Platform System. Solution Brief

How to Leverage Cloud to Quickly Build Scalable Applications

INDIA September 2011 virtual techdays

Self-service BI for big data applications using Apache Drill

Scala Storage Scale-Out Clustered Storage White Paper

<Insert Picture Here> Oracle In-Memory Database Cache Overview

10231B: Designing a Microsoft SharePoint 2010 Infrastructure

Evaluation Checklist Data Warehouse Automation

Scaling Your Data to the Cloud

Updating Your Skills to SQL Server 2016

East Asia Network Sdn Bhd

Microsoft SQL Server 2012: What to Expect

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

More Data in Less Time

Preview of Oracle Database 12c In-Memory Option. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Welcome to The Future of Analytics In Action IBM Corporation

Power BI Dashboarding. Alberto Ferrari SQLBI.

An Oracle White Paper May Oracle Database Cloud Service

How To Manage A Database Server 2012

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Transcription:

Data Warehousing Reinvented for the Cloud World Benoit Dageville

Snowflake? Startup founded in August 2012 with the ambition to build a data warehouse for the cloud Located downtown San Mateo 90+ employees, about 40 engineers, 25 core developers (and hiring ) GA in June 2015 Snowflake service currently hosted on Amazon public cloud Stores multiple petabyte of raw data Runs million SQL statements every day 2

Why Cloud? Cloud is an amazing platform for building distributed systems Access to unlimited compute power and storage capacity Provision a fleet of servers in few minutes Blob storage service allows to cheaply store petabyte data Pay what you use model Provide multi-datacenter availability Efficient access from anywhere Data democratization Enables Software as a Service Self-service, no need for complex IT organization and infrastructure Virtuous circle measured in days, not years 3

Our Vision for a Cloud Data Warehouse Data warehouse as a service No infrastructure to manage, no knobs to tune Multidimensional elasticity On-demand scalability data, queries, users All business data Native support for structured + semi-structured data 4

So Many Challenges Multi-tenant service one single database for the world No scalability bottleneck Resource isolation True Elasticity Online resize without any negative impact Provision hundred servers and use them for only an hour Shutoff compute when done Extreme availability Resilient to infrastructure failures (node, cluster, full data center) Protect against any type of data loss No downtime for software/hardware upgrades Efficient support of schemaless semi-structured data Data pruning, columnar storage, vectorized execution Petabyte volume 5

Shared-nothing Architecture? Shared-nothing architecture is not a good fit for cloud Not elastic: resizing compute cluster requires redistributing data Cannot pay as you go: shutting off compute cluster requires unloading data Limited scalability: poor multi-user scalability Limited availability: simultaneous node failures will cause downtime and data loss è We need a new architecture for cloud 6

Snowflake Multi-cluster Shared-data Architecture Authentication & access control Rest (JDBC/ODBC/Python) Cloud services Infrastructure manager Optimizer Transaction Manager Security All data in one place Metadata Dynamically combine storage and compute Virtual warehouse Cache Virtual warehouse Cache Virtual warehouse Cache Virtual warehouse Cache Independently size storage and compute Database storage No unload / reload to shut off compute Every compute cluster can access any data 7

Multi-cluster Shared-data Architecture Query performance is isolated to each Virtual Warehouse Finance Sales Loading & ETL Cluster configuration customized for workload Unlimited scalability Shared Storage Eliminates need to physically move and copy data Data Marts, Cubes, and Test/Dev Enables billing by department Bus. Dev. Marketing Test Dedicated Compute 8

Extreme Availability Snowflake UI - BI Tool - ETL Tool - JDBC - ODBC - Command Line Load Balancer SQL > Cloud Services SQL > Cloud Services SQL > Cloud Services Always On Compute Compute Compute On-Demand Storage as a Service Storage Storage Storage Infinite Data Center Data Center Data Center 9

Time travel data recovery Previous versions of data automatically retained Retention period selected by customer Accessed via SQL extensions UNDROP recovers from accidental deletion SELECT AT for point-in-time selection CLONE AT to recreate past versions T0 T1 T2 New data Modified data > SELECT * FROM mytable AT T0 10

Relational database extended to semistructured data > SELECT FROM Semi-structured data (e.g. JSON, Avro, XML) Structured data (e.g. CSV, TSV, ) Storage optimization Transparent discovery and storage optimization of repeated elements Query optimization Full database optimization for queries on semi-structured data 11

Q & A