Customer Case Study Automatic Labs
Benefits
Validated product in days
Completed complex queries in minutes
Freed up 1 full-time data scientist
Infrastructure savings of $10K/month

Summary
Automatic Labs needed to run large, complex queries against its entire data set to explore the data and come up with new product ideas. Its prior Postgres-based solution kept Automatic's team from exploring data efficiently: queries took days to run and the data could not be easily visualized, which prevented Automatic Labs from bringing critical new products to market.

Automatic Labs deployed Databricks, a simple yet powerful unified big data processing platform, on Amazon Web Services (AWS) and realized these key benefits:

Reduced time to bring product to market. Cut the time to validate a product idea from months to days by speeding up interactive exploration over Automatic's entire data set and completing queries in minutes instead of days.
Eliminated DevOps and non-core activities. Freed up one full-time data scientist from non-core activities such as DevOps and infrastructure maintenance to perform core data science activities.
Infrastructure savings. Realized savings of ten thousand dollars in one month alone on AWS costs due to the ability to instantly set up and tear down Spark clusters.
Business background
Automatic Labs collects a wide array of driving data, such as location and vehicle computer information, from its users through a smartphone and a hardware attachment to the vehicle. The data is then analyzed to create custom driving reports, easy-to-understand explanations of vehicle warning lights, and recommendations for more fuel-efficient driving habits for each user. The company depends heavily on its analytics platform to rapidly build innovative products through insights gleaned from data.

Challenges
Automatic Labs was constantly challenged to find new patterns and insights in the deluge of data generated by users and their vehicles. Its engineers and data scientists wanted to explore massive data sets interactively, without knowing in advance what might turn out to be the next critical feature in their product. The rapid growth of data soon outstripped the capabilities of their Postgres database, which pushed them to evaluate alternatives.

Because of the size of the data set and the exploratory nature of the work, Automatic Labs needed a big data analytics platform that allowed its engineers and data scientists to query and visualize their data interactively. The platform needed to be fast enough to provide a real-time interactive experience and capable of creating visualizations rich enough to unlock deep insights, while remaining user-friendly and easy to learn so as to maximize the productivity and output of the product team. Finally, the platform needed to leverage the existing skill set of Automatic's team: Python for engineers and data scientists, SQL for business analysts.
Solution
Automatic Labs chose Databricks over Hadoop-based alternatives as the big data processing platform to replace Postgres. Databricks is a unified, cloud-based big data processing platform built on 100% open source Spark. It combines the fast performance of Spark (up to 100x faster than Hadoop MapReduce) and the rich capabilities of standard libraries such as Spark SQL and MLlib with a multi-user, graphical, user-friendly interface.

After being selected, Databricks was deployed in Automatic Labs' Amazon Web Services (AWS) Virtual Private Cloud (VPC) within days. The simple cluster management interface in Databricks allowed engineers and data scientists alike to create, modify, and delete Spark clusters with a few clicks, without help from DevOps or IT. The ability to rapidly manipulate Spark clusters made it easy for Automatic Labs to dynamically provision resources based on processing needs. Once the Spark clusters were up and running, the team was able to easily bring its data from AWS S3 into the interactive workspace and begin analysis with Spark SQL.

The interactive workspace of Databricks provides notebooks, where users can write code in Python or Scala, or run queries in SQL, and visualize results instantly with built-in charts and graphs. Since most people at Automatic Labs are familiar with Python or SQL, Databricks offered a simple and powerful platform for their product teams to write code to process their massive data set and to examine the results through the rich graphing capabilities of matplotlib. For simpler analysis, Databricks also provided a way for non-technical personnel to run SQL queries against the data and visualize the results in a few clicks, using the charts and graphs built into notebooks. The interactive workspace supports multiple users, who have unique logins to ensure security.
During the analysis, Automatic Labs engineers and data scientists were able to easily collaborate through notebooks by marking up code and graphs with comments, making Databricks an effective platform for publishing and reporting their business-critical results.
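The workflow described above (load data from S3, then explore it with Spark SQL from a Python notebook) could look roughly like the following. This is a minimal sketch, not Automatic Labs' actual code: the bucket path, the `trips` view, the `user_id` and `distance_km` columns, the Parquet file format, and both function names are hypothetical, and it uses the current PySpark `SparkSession` API rather than whatever was available at the time of the case study.

```python
# Hypothetical names throughout: the bucket path, the "trips" view, and the
# user_id / distance_km columns are illustrative, not Automatic Labs' schema.


def build_summary_query(view_name: str, min_trips: int) -> str:
    """Return a Spark SQL query that aggregates trips per user."""
    return (
        f"SELECT user_id, COUNT(*) AS trip_count, AVG(distance_km) AS avg_km "
        f"FROM {view_name} "
        f"GROUP BY user_id "
        f"HAVING COUNT(*) >= {int(min_trips)} "
        f"ORDER BY trip_count DESC"
    )


def summarize_trips(s3_path: str, min_trips: int = 10):
    """Load Parquet data from S3 and run the aggregate with Spark SQL."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    # In a hosted notebook a session usually already exists; getOrCreate()
    # reuses it instead of starting a new one.
    spark = SparkSession.builder.appName("trip-exploration").getOrCreate()

    # Assumption: the data is stored as Parquet, e.g. "s3a://example-bucket/trips/".
    trips = spark.read.parquet(s3_path)

    # Register the DataFrame under a name so it can be queried with SQL.
    trips.createOrReplaceTempView("trips")
    return spark.sql(build_summary_query("trips", min_trips))
```

In a notebook, `summarize_trips("s3a://example-bucket/trips/").toPandas()` would return a pandas DataFrame that can be plotted with matplotlib, while an analyst could run the same query text directly in a SQL cell.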
Benefits
Databricks met all the critical needs of Automatic Labs by speeding up their interactive analysis through a powerful big data platform that was also simple to use. The engineers and data scientists at Automatic Labs were instantly more productive with Databricks. Not only did the platform support real-time interactive analysis with rich data visualization, it was also easy to learn, enabling new personnel to quickly ramp up and contribute to the team.

Databricks enabled Automatic Labs to shrink the time it took to validate a product idea from months to days and sped up interactive exploration over Automatic's entire data set. Many large and complex queries that took days to run in Postgres can be completed using Spark SQL in the Databricks interactive workspace in under an hour. An analysis that had languished for 3 months without conclusive results before the introduction of Databricks was completed in a single day using Databricks. The ability to validate product ideas quickly with Databricks resulted in Automatic Labs creating a new product after months of stalemate in analysis.

"Databricks' simple cluster management interface allows our engineers and data scientists to create, modify, and delete Spark clusters with just a few clicks without any help from DevOps or IT. The ability to rapidly manipulate Spark clusters has made it uniquely simple for us to dynamically provision resources based on our processing needs. Once we had our Spark clusters up and running, we were able to easily bring our data from AWS S3 into our interactive workspace and begin analysis with Spark SQL."
Rob Ferguson, Director of Engineering, Automatic Labs

Automatic Labs was also able to free up at least one full-time data scientist from non-value-added activities such as DevOps and infrastructure maintenance to perform core data science activities.
Without Databricks, Automatic Labs wasted critical data science personnel time on configuring Spark clusters, setting up infrastructure orchestration scripts, and maintaining various development environments. The zero-touch, fully hosted Spark platform and the accompanying interactive workspace of Databricks provided everything the team needed to be productive out of the box, avoiding the unnecessary drain on precious resources. According to Automatic's team members, the environment provided by Databricks worked better on day one than their previous in-house alternative, which had been built up incrementally over the preceding months.

In addition to the productivity gains, Automatic Labs was able to save on infrastructure costs through the ability of Databricks to set up and tear down Spark clusters instantly. Being able to precisely match the infrastructure provisioned to the need resulted in significant savings. In one instance, Automatic Labs estimated the AWS cost savings to be between seven and ten thousand dollars per month.

"Databricks meets all our critical needs by speeding up our interactive analysis through a powerful big data platform that is also simple to use. Our engineers and data scientists were instantly more productive once Databricks was up and running. Not only did the platform support real-time interactive analysis with rich data visualization, it was also easy for them to learn, enabling our new personnel to quickly ramp up and contribute to the project with minimal down time."
Rob Ferguson, Director of Engineering, Automatic Labs

Evaluate Databricks with a trial account now. /registration

Customer Case Study Automatic Labs 150417