Making big data simple with Databricks

Similar documents
Databricks. A Primer

Databricks. A Primer

Customer Case Study. Sharethrough

Customer Case Study. Celtra

Customer Case Study. Automatic Labs

BIG DATA ANALYTICS For REAL TIME SYSTEM

Ali Ghodsi Head of PM and Engineering Databricks

Cloudera Enterprise Data Hub in Telecom:

Unlocking the True Value of Hadoop with Open Data Science

Integrating a Big Data Platform into Government:

The Future of Data Management

Moving From Hadoop to Spark

From Spark to Ignition:

UNIFY YOUR (BIG) DATA

Three Open Blueprints For Big Data Success

Copyright 2013 Splunk Inc. Introducing Splunk 6

GROW WITH BIG DATA Third Eye Consulting Services & Solutions LLC.

locuz.com Big Data Services

Unified Big Data Processing with Apache Spark. Matei

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook

HDP Hadoop From concept to deployment.

How Companies are! Using Spark

More Data in Less Time

Real Time Data Processing using Spark Streaming

Analance Data Integration Technical Whitepaper

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir

Why Big Data Analytics?

Unified Batch & Stream Processing Platform

Agile Business Intelligence Data Lake Architecture

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Big Data for Investment Research Management

Oracle Cloud: Line of Business PaaS Services. Balaji Yelamanchili Senior Vice President Product Development

Embracing Spark as the Scalable Data Analytics Platform for the Enterprise

BIG DATA & DATA SCIENCE

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Using Tableau Software with Hortonworks Data Platform

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Big Data: Overview and Roadmap eglobaltech. All rights reserved.

MapR: Best Solution for Customer Success

WHITE PAPER. Five Steps to Better Application Monitoring and Troubleshooting

Unleash your intuition

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Safe Harbor Statement

Microsoft Big Data Solutions. Anar Taghiyev P-TSP

Interactive data analytics drive insights

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

Cisco Data Preparation

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Salesforce.com and MicroStrategy. A functional overview and recommendation for analysis and application development

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

ANALYTICS CENTER LEARNING PROGRAM

Qlik Sense Enterprise

Empowering the Masses with Analytics

1. The orange button 2. Audio Type 3. Close apps 4. Enlarge my screen 5. Headphones 6. Questions Pane. SparkR 2

Izenda & SQL Server Reporting Services

Tagetik Extends Customer Value with SQL Server 2012

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

HDP Enabling the Modern Data Architecture

Unified Big Data Analytics Pipeline. 连 城

Big Data Analytics Nokia

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Business Intelligence and Big Data Analytics: Speeding the Cycle from Insights to Action Four Steps to More Profitable Customer Engagement

Information Architecture

Whitepaper: Solution Overview - Breakthrough Insight. Published: March 7, Applies to: Microsoft SQL Server Summary:

Big Data at Spotify. Anders Arpteg, Ph D Analytics Machine Learning, Spotify

Cisco IT Hadoop Journey

Native Connectivity to Big Data Sources in MSTR 10

Data virtualization: Delivering on-demand access to information throughout the enterprise

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

Distributed DataFrame on Spark: Simplifying Big Data For The Rest Of Us

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Cloud Big Data Architectures

The big data revolution

TECHNOLOGY TRANSFER PRESENTS INTERNATIONAL. Rome, December Residenza di Ripetta Via di Ripetta, 231 CONFERENCE BIG DATA

Welcome. Host: Eric Kavanagh. The Briefing Room. Twitter Tag: #briefr

Analance Data Integration Technical Whitepaper

Making Business Intelligence Easy. White Paper Agile Business Intelligence

SAP and Hortonworks Reference Architecture

Upcoming Announcements

DATA MANAGEMENT FOR THE INTERNET OF THINGS

Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic

Executive Summary WHO SHOULD READ THIS PAPER?

Harnessing the power of advanced analytics with IBM Netezza

Big Analytics in the Cloud. Matt Winkler PM, Big

Microsoft Big Data. Solution Brief

Azure Data Lake Analytics

Transcription:

Making big data simple with Databricks

We are Databricks, the company behind Spark Founded by the creators of Apache Spark in 2013 Data 75% Share of Spark code contributed by Databricks in 2014 Value Created Databricks on top of Spark to make big data simple. 2

WORKING WITH BIG DATA IS DIFFICULT Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation and will be abandoned. GARTNER 3

PROBLEM Building infrastructure and data pipelines is complex 4

Your difficult journey to finding value in data Building a cluster Import and explore data with disparate tools Build and deploy data applications Data Exploration Advanced Analytics Production Deployment ETL Data Warehousing Dashboards & Reports Long delays $$ High costs 5

3 main causes of this problem: Infrastructure is complex to build and maintain Tools are slow, clunky, and disparate Re-engineering of prototypes for deployment Expensive upfront investment Months to build Dedicated DevOps to operate Not user-friendly Long time to compute answers Lots of integration required Duplicated effort Complexity to achieve production quality 6

SOLUTION Data Value Build Databricks on top of Spark to make big data simple 7

A complete solution, from ingest to production Instant & secure infrastructure Fast and easy-to-use tools in a single platform Seamless transition to production Easy connection to diverse data sources Notebooks with oneclick visualization Data Exploration Built-in ML and graph libraries Advanced Analytics Job scheduler Production Deployment ETL Data Warehousing Real-time query engine Short time to value $$ Dashboards & Reports Customizable dashboards & 3 rd party apps Lower costs 8

Four components of Databricks Make Big Data simple Notebooks & Dashboards Jobs Cluster Manager 3 rd -Party Apps Managed Spark clusters Production pipeline scheduler Interactive workspace with notebooks 3rd party applications Easily provision clusters Harness the power of Spark Import data seamlessly Schedule production workflows Implement complete pipelines Monitor progress and results Explore data and develop code in Java, Python, Scala, or SQL Collaborate with the entire team Point and click visualization Publish customized dashboards Connect powerful BI tools Leverage a growing ecosystem of applications 9

Databricks benefits Higher productivity Faster deployment of data pipelines Data democratization Maintenance-free infrastructure Real time processing Easy to use tools Zero management Spark clusters Instant transition from prototype to production One shared repository Seamless collaboration Easy to build sophisticated dashboards and notebooks 10

A few examples of Databricks in action Prepare data Import data using APIs or connectors Cleanse mal-formed data Aggregate data to create a data warehouse Perform analytics Explore large data sets in real-time Find hidden patterns with regression analysis Publish customized dashboards Build data products Rapid prototyping Implement advanced analytics algorithms Create and monitor robust production pipelines 11

CUSTOMER CASE STUDIES 12

Customer testimonials Without Databricks and the real-time insights from Spark, we wouldn't be able to maintain our database at the pace needed for our customers Darian Shirazi, CEO, Radius Intelligence We condensed the 6 months we had planned for the initial prototype to production process to just about a couple of weeks with Databricks. Rob Ferguson, Director of Engineering, Automatic Labs Databricks is used by over a third of our staff; After implementation, the amount of analysis performed has increased sixfold, meaning more questions are being asked, more hypotheses tested. Jaka Jančar, CTO, Celtra 13

Radius Intelligence Gathering customer insights for marketers CHALLENGE: Complex data integration 25 million businesses Over 100 billion points of data RESULT: Speed up the data pipeline Entire data set processed in hours instead of days Deploy weekly updates to customers instead of monthly BENEFIT: Higher productivity 14

Automatic Labs IoT for drivers making car sensor data useful CHALLENGE: Product idea validation Ingest billions of data points Rapidly test hypothesis Iterate on ideas in real-time RESULT: Shorter time from idea to product Location based Driving habits X 3 weeks with Databricks vs. 2 months with previous solution BENEFIT: Faster Deployment of Data Pipelines 15

Celtra Building and serving rich digital ads across platforms CHALLENGE: Analytics specialist bottleneck Billions of data points, operational data of the entire company Huge backlog of analytics projects RESULT: Enable self-service for non-specialists Grew the number of analysts by 4x Increased analytics project completed by 6x in four months BENEFIT: Data Democratization 16

Sharethrough Intelligent ad placement CHALLENGE: Slow performance, costly DevOps Terabyte-scale clickstream data, long delays in new feature prototyping Two full-time engineers to maintain infrastructure $ RESULT: Faster answers, zero management Prototyped new feature in record time Reduced system downtime with faster root-cause analysis Dramatically easier to maintain than Hive BENEFIT: Higher Productivity 17

Yesware Delivering tools and analytics for sales teams CHALLENGE: Infrastructure pain, slow & clunky tools 6 months to setup Pig, very problematic pipeline Too slow to extend reporting history beyond 1 month Need to develop machine-learning algorithms X RESULT: Instant infrastructure, full suite of capabilities 3 weeks to setup robust Spark pipeline w/databricks Double data processed, in fraction of time Built-in machine learning libraries BENEFIT: Faster Deployment of Data Pipelines 18

A few of our customers 19

WHAT S NEW? 20

What s new with Databricks Databricks is now generally available (announced on June 15 th, 2015) Upcoming features during second half of 2015: R-language notebooks: Analyze large-scale data sets using R in the Databricks environment. Access control and private notebooks: Manage permissions to view and execute code at an individual level. Version control: Track changes to source code in the Databricks platform. Spark streaming support: Enabling a fault-tolerant real-time processing 21

What s new with Spark The general availability of Spark 1.4 was announced on June 10 th 2015 Spark 1.4 is largest Spark release: more than 220 contributors and 1,200 commits. Key new features introduced in Spark 1.4: New R language API (SparkR) Expansion of Spark s Dataframe API s: window functions, statistical and mathematical functions, support for missing data. API to build complete machine learning pipelines. UI visualizations for debugging and monitoring programs: interactive event timeline for jobs, DAG visualization, visual monitoring for Spark Streaming. 22

Data science made easy with Apache Spark From ingest to production þ Unified þ Zero management Higher productivity þ Fast at any scale þ Flexible þ Real-time and collaborative þ Instant to production Faster deployment of data pipelines þ No lock-in þ Open and extensible Data democratization 23

Databricks is available today Contact sales@databricks.com Or sign up for a trial at https://databricks.com/registration

Thank you