Customer Case Study. Sharethrough




Benefits
- Faster prototyping of new applications
- Easier debugging of complex pipelines
- Improved overall engineering team productivity

Summary
Sharethrough offers a robust advertising platform; discovering hidden patterns in data is critical to measuring the effectiveness of its products and to improving the overall product suite. Sharethrough's initial attempt to establish a self-hosted Hadoop cluster, with Hive as the ad hoc query tool, required two full-time engineers to manage the infrastructure and still did not provide an effective interactive query platform. Databricks offered Sharethrough significant benefits, including faster prototyping of new applications, easier debugging of complex pipelines, and improved engineering productivity.

Business Background
Sharethrough builds software for delivering ads into the natural flow of content sites and apps (also known as native advertising). Because Sharethrough serves ads on some of the most popular digital properties, such as Forbes and People, the need for a high-performance, big-data-scale processing platform permeates every aspect of the business.

A core engineering function at Sharethrough is revenue management and analytics. The engineering team is responsible for building and maintaining a complex series of algorithms that analyze the terabyte-scale data generated by the ad serving platform. This data, known as the clickstream, contains crucial information about interactions between viewers and the content served by Sharethrough. Discovering hidden patterns in the clickstream is critical to measuring the effectiveness of the platform and to improving the overall product suite.

The clickstream is also valuable to the Support and Product Management teams at Sharethrough, helping them define and improve the features and capabilities of the product. These teams leverage information from the clickstream to make data-driven decisions every day. For example, a product manager might aggregate device data in a highly customized, specific manner to help identify new market opportunities, while the Support team might mine the clickstream to gain deeper insight into the behavior of publisher integrations.
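The kind of customized device rollup described above can be sketched in plain Python. This is a minimal illustration only: the field names and records below are hypothetical, not Sharethrough's actual schema, and at clickstream scale the same aggregation would run in Spark rather than in local memory.

```python
from collections import Counter

# Hypothetical clickstream records. Field names and values are
# illustrative only; they are not Sharethrough's actual schema.
clickstream = [
    {"device": "mobile", "publisher": "forbes.com", "event": "click"},
    {"device": "mobile", "publisher": "people.com", "event": "impression"},
    {"device": "desktop", "publisher": "forbes.com", "event": "click"},
    {"device": "mobile", "publisher": "forbes.com", "event": "click"},
]

# Count clicks by device type -- the sort of customized aggregation a
# product manager might run to identify new market opportunities.
clicks_by_device = Counter(
    r["device"] for r in clickstream if r["event"] == "click"
)
print(dict(clicks_by_device))  # {'mobile': 2, 'desktop': 1}
```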

Challenges
Sharethrough initially turned to a self-hosted Hadoop cluster to meet its big data processing needs. To keep the Hadoop clusters running smoothly, the team had to dedicate two full-time engineers to infrastructure maintenance. As a consequence, the engineering team lost two irreplaceable members from its core mission of building and supporting data-driven products.

As part of its Hadoop stack, Sharethrough used Hive as the query engine. Hive's slow performance meant that many queries took a long time to run, creating contention as well as performance bottlenecks throughout the data pipeline. Instead of being able to explore data freely, engineers often had to wait extended periods for Hive queries to complete, or troubleshoot queries that never completed. These challenges were severely impeding the team's productivity. Sharethrough needed a faster, more reliable query engine that would minimize waiting time and mitigate performance bottlenecks. The desired solution also needed to be more cost-efficient and less labor-intensive from an infrastructure management perspective.

Solution
Sharethrough deployed Databricks to provide the critical data processing components necessary to develop and test its data pipeline more effectively. These components included:
- Fully managed Spark clusters in the cloud, letting the team focus on its data rather than on operations
- An interactive workspace for exploration and visualization, so teams can learn, work, and collaborate in a single, easy-to-use environment
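To make the interactive-workspace idea concrete: in Databricks, exploration of this kind is typically an ad hoc SQL query in a notebook. The sketch below uses SQLite from Python's standard library purely as a stand-in to show the shape of such a query; the table and column names are hypothetical, and in Databricks the equivalent query would run as Spark SQL over clickstream data.

```python
import sqlite3

# SQLite is only a stand-in here; in Databricks the same query shape would
# run as Spark SQL in a notebook. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clickstream (publisher TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO clickstream VALUES (?, ?)",
    [
        ("forbes.com", "click"),
        ("forbes.com", "impression"),
        ("people.com", "click"),
        ("forbes.com", "click"),
    ],
)

# Ad hoc rollup: clicks per publisher, largest first.
rows = conn.execute(
    """
    SELECT publisher, COUNT(*) AS clicks
    FROM clickstream
    WHERE event = 'click'
    GROUP BY publisher
    ORDER BY clicks DESC
    """
).fetchall()
print(rows)  # [('forbes.com', 2), ('people.com', 1)]
```

The point of the interactive workflow is the turnaround: a query like this returns within the session, so an analyst can refine it immediately rather than waiting on a batch Hive job.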

Databricks was deployed in Sharethrough's Virtual Private Cloud (VPC) in Amazon Web Services (AWS) within days. The cluster management interface in Databricks was simple enough for Sharethrough's engineering team to create, scale, and terminate Spark clusters with a few clicks, instead of dedicating full-time engineers to the task as had been the case with the self-hosted Hadoop clusters.

Once the Spark clusters were in place, Sharethrough was able to easily bring its clickstream data from AWS S3 into the interactive workspace of Databricks. The interactive workspace provides notebooks, enabling users to work with the data in their preferred language: SQL, Python, Java, or Scala. Regardless of the language chosen, Spark's memory-optimized execution could compute results within an interactive session, and users could immediately visualize those results with the rich charts and graphs built into the notebooks, with just a few clicks. This ability to visualize and interact with data in real time was a critical new capability for Sharethrough, unattainable with Hive.

"With our previous data solution, despite the complexity and having to dedicate two full-time engineers to infrastructure maintenance, we still struggled with slow performance. In contrast, Databricks offered us the critical data processing components necessary for our team to uncover data-driven insights from our valuable clickstream."
Robert Slifka, Vice President of Engineering, Sharethrough

Benefits
Sharethrough gained significant benefits by adopting Databricks, including faster prototyping of new applications, easier debugging of complex pipelines, and improved overall engineering team productivity.

With Databricks, Sharethrough was able to prototype new applications dramatically faster, enabling its engineers to experiment easily and painlessly with innovative ways to perform data processing and aggregation. For example, Sharethrough's new streaming project with Kinesis progressed rapidly because the log processing semantics and analytics had already been thoroughly validated by prototypes built in Databricks prior to integration.

Because Databricks is a controlled environment where engineers can easily run production code, debugging complex pipelines also became much easier: the team could reproduce failure characteristics more quickly, identify the root cause of production failures faster, and reduce system downtime. This ultimately resulted in higher productivity for the entire team. Specific examples include:
- Freeing the two dedicated full-time engineers from supporting self-hosted Hadoop clusters to focus on their core responsibility of building and supporting data products
- Eliminating time spent maintaining poorly kept Hive schemas by running Spark SQL
- Reducing engineering time spent maintaining separate code bases by replacing custom UDFs in Hive queries with the combination of Spark SQL and libraries in Databricks notebooks
- Collaborating more effectively by sharing notebooks and building a common codebase between teams during investigations of failures
- Giving the product team direct clickstream access, with minimal support from the engineering team, through lightweight custom dashboards built in Databricks

"Thanks to Databricks, our engineers have gone from being burdened with operations, facing long wait times, performance bottlenecks and other hurdles that impeded our progress, to being able to dive right into analytics. As a result, our team is more productive and collaborative with big data than ever."
Robert Slifka, Vice President of Engineering, Sharethrough

Evaluate Databricks with a trial account now. /registration
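As a closing illustration of the UDF consolidation described in the benefits: logic that previously had to live in a separately built and deployed Hive UDF can instead be an ordinary function defined inline in a notebook (and, in Spark, registered for use from SQL). The classification rule below is a simplified, hypothetical example, not Sharethrough's actual code.

```python
# Logic like this would previously have been a custom Hive UDF, compiled
# and deployed separately from the queries that used it. In a notebook it
# can be defined inline and, in Spark, registered for use from SQL.
# The classification rule is a simplified, hypothetical example.
def device_class(user_agent: str) -> str:
    ua = user_agent.lower()
    if "ipad" in ua:
        return "tablet"
    if "iphone" in ua or "android" in ua:
        return "mobile"
    return "desktop"

print(device_class("Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)"))
# -> mobile
```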