Customer Case Study. Automatic Labs




Benefits
Validated product in days
Completed complex queries in minutes
Freed up 1 full-time data scientist
Infrastructure savings of $10K/month

Summary

Automatic Labs needed to run large and complex queries against their entire data set to explore and come up with new product ideas. Their prior solution using Postgres impeded the ability of Automatic's team to efficiently explore data because queries took days to run and data could not be easily visualized, preventing Automatic Labs from bringing critical new products to market. Automatic Labs deployed Databricks, a simple yet powerful unified big data processing platform on Amazon Web Services (AWS), and realized these key benefits:

Reduced time to bring product to market. Minimized the time to validate a product idea from months to days by speeding up the interactive exploration over Automatic's entire data set, and completing queries in minutes instead of days.

Eliminated DevOps and non-core activities. Freed up one full-time data scientist from non-core activities such as DevOps and infrastructure maintenance to perform core data science activities.

Infrastructure savings. Realized savings of ten thousand dollars in one month alone on AWS costs due to the ability to instantly set up and tear down Spark clusters.

Business background

Automatic Labs collects a wide array of driving data, such as location and vehicle computer information, from its users through a smartphone and a hardware attachment to the vehicle. The data is then analyzed to create custom driving reports, easy-to-understand explanations of vehicle warning lights, and recommendations for more fuel-efficient driving habits for each user. The company depends heavily on its analytics platform to rapidly build innovative products through insights gleaned from data.

Challenges

Automatic Labs was constantly challenged in trying to find new patterns and insights in the deluge of data generated by users and their vehicles. The engineers and data scientists at Automatic Labs therefore wanted to explore massive data sets interactively, without knowing in advance what might turn out to be the next critical feature in their product. The rapid growth of data soon outstripped the capabilities of their Postgres database, which pushed them to evaluate alternatives.

Because of the size of the data set and the exploratory nature of the work, Automatic Labs needed a big data analytics platform that allowed its engineers and data scientists to query and visualize their data interactively. This platform needed to be fast enough to provide a real-time interactive experience and capable of creating visualizations rich enough to unlock deep insights, while being user friendly and easy to learn to maximize the productivity and output of the product team. Finally, the platform also needed to leverage the existing skill set of Automatic's team: Python for engineers and data scientists, SQL for business analysts.

Solution

Automatic Labs chose Databricks over Hadoop-based alternatives as the big data processing platform to replace Postgres. Databricks is a unified, cloud-based big data processing platform built on 100% open source Spark. It combines the fast performance of Spark (up to 100x faster than Hadoop MapReduce) and the rich capabilities of standard libraries such as Spark SQL and MLlib with a multi-user, graphical, and user-friendly interface.

After being selected, Databricks was deployed in Automatic Labs' Amazon Web Services (AWS) Virtual Private Cloud (VPC) within days. The simple cluster management interface in Databricks allowed engineers and data scientists alike to create, modify, and delete Spark clusters with a few clicks, without help from DevOps or IT. The ability to rapidly manipulate Spark clusters made it easy for Automatic Labs to dynamically provision resources based on processing needs. Once the Spark clusters were up and running, the team was able to easily bring their data from AWS S3 into the interactive workspace and begin analysis with Spark SQL.

The interactive workspace of Databricks provides notebooks, where users can write code in Python or Scala, or run queries in SQL, and visualize results instantly with built-in charts and graphs. Since most people at Automatic Labs are familiar with Python or SQL, Databricks offered a simple and powerful platform for their product teams to write code to process their massive data set and to examine the results through the rich graphing capabilities of matplotlib. For simpler analysis, Databricks also provided a way for non-technical personnel to run SQL queries against the data and quickly visualize the results in a few clicks, using charts and graphs built into notebooks.

The interactive workspace supports multiple users, who have unique logins to ensure security. During the analysis, Automatic Labs engineers and data scientists were able to easily collaborate through notebooks by marking up code and graphs with comments, making Databricks an effective platform for publishing and reporting their business-critical results.
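The cost effect of creating and deleting clusters on demand, as described above, can be illustrated with simple arithmetic. The following Python sketch compares a cluster that runs for the whole month with one that runs only while analyses are in progress. The node count, hourly rate, and busy hours are hypothetical values chosen for illustration; they are not figures from Automatic Labs or from AWS pricing, and the class and function names are invented for this sketch.

```python
from dataclasses import dataclass

# Average number of hours in a month (8760 hours per year / 12).
HOURS_PER_MONTH = 730


@dataclass
class ClusterProfile:
    """Hypothetical cluster description used only for this illustration."""
    nodes: int
    hourly_rate_per_node: float  # USD per node-hour (assumed value)


def always_on_cost(profile: ClusterProfile) -> float:
    """Monthly cost if the cluster is never shut down."""
    return profile.nodes * profile.hourly_rate_per_node * HOURS_PER_MONTH


def on_demand_cost(profile: ClusterProfile, busy_hours: float) -> float:
    """Monthly cost if the cluster exists only while work is running."""
    if not 0 <= busy_hours <= HOURS_PER_MONTH:
        raise ValueError("busy_hours must be between 0 and HOURS_PER_MONTH")
    return profile.nodes * profile.hourly_rate_per_node * busy_hours


def monthly_savings(profile: ClusterProfile, busy_hours: float) -> float:
    """Difference between always-on and on-demand provisioning."""
    return always_on_cost(profile) - on_demand_cost(profile, busy_hours)


if __name__ == "__main__":
    # Assumed example: 20 nodes at $1.00 per node-hour, used 160 hours a month.
    profile = ClusterProfile(nodes=20, hourly_rate_per_node=1.0)
    print(f"Always on: ${always_on_cost(profile):,.2f}")
    print(f"On demand: ${on_demand_cost(profile, 160):,.2f}")
    print(f"Savings:   ${monthly_savings(profile, 160):,.2f}")
```

The savings scale with the idle fraction of the month, so the benefit is largest for exploratory workloads that need a large cluster only for short bursts.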

Benefits

Databricks met all the critical needs of Automatic Labs by speeding up their interactive analysis through a powerful big data platform that was also simple to use. The engineers and data scientists at Automatic Labs were instantly more productive with Databricks. Not only did the platform support real-time interactive analysis with rich data visualization, it was also easy to learn, enabling new personnel to quickly ramp up and contribute to the team.

Databricks enabled Automatic Labs to shrink the time it took to validate a product idea from months to days and sped up the interactive exploration over Automatic's entire data set. Many large and complex queries that took days to run in Postgres could be completed in under an hour using Spark SQL in the Databricks interactive workspace. An analysis that had languished for 3 months without conclusive results before the introduction of Databricks was completed in a single day. The ability to validate product ideas quickly resulted in Automatic Labs creating a new product after months of stalemate in analysis.

"Databricks' simple cluster management interface allows our engineers and data scientists to create, modify, and delete Spark clusters with just a few clicks without any help from DevOps or IT. The ability to rapidly manipulate Spark clusters has made it uniquely simple for us to dynamically provision resources based on our processing needs. Once we had our Spark clusters up and running, we were able to easily bring our data from AWS S3 into our interactive workspace and begin analysis with Spark SQL."
Rob Ferguson, Director of Engineering, Automatic Labs

Automatic Labs was also able to free up at least one full-time data scientist from non-value-added activities such as DevOps and infrastructure maintenance to perform core data science activities. Without Databricks, Automatic Labs spent critical data science personnel time on configuring Spark clusters and setting up infrastructure orchestration scripts and various development environments. The zero-touch, fully hosted Spark platform and the accompanying interactive workspace of Databricks provided everything the team needed to be productive out of the box, avoiding the unnecessary drain on precious resources. According to Automatic's team members, the environment provided by Databricks worked better on day one than their previous in-house alternative, which had been built up incrementally over the preceding months.

In addition to productivity gains, Automatic Labs was also able to save on infrastructure costs through the ability of Databricks to set up and tear down Spark clusters instantly. Being able to precisely match the infrastructure provisioned to the need resulted in significant savings. In one instance, Automatic Labs estimated the AWS cost savings to be between seven and ten thousand dollars per month.

"Databricks meets all our critical needs by speeding up our interactive analysis through a powerful big data platform that is also simple to use. Our engineers and data scientists were instantly more productive once Databricks was up and running. Not only did the platform support real-time interactive analysis with rich data visualization, it was also easy for them to learn, enabling our new personnel to quickly ramp up and contribute to the project with minimal down time."
Rob Ferguson, Director of Engineering, Automatic Labs

Evaluate Databricks with a trial account now. /registration

Customer Case Study Automatic Labs 150417