Big Data Web Analytics Platform on AWS for Yottaa




Background

Yottaa is a young, innovative company providing a website acceleration platform that optimizes Web and mobile applications and maximizes user experience, security and profitability. It delivers fast, personalized experiences to users on any device, browser, location, or connection, and makes it possible to discover each user's browsing context and tailor the app's content and optimization profile to meet the specific needs of the moment.

Business Challenge

The client needed to implement an in-house Operation Intelligence platform, but the leading tools on the market lacked the ability to quickly detect issues in system performance and security, so they approached SoftServe, renowned for its expertise in complex Big Data solutions and architecture design methods, for help. An Agile approach was also important: innovation, expansion, continuous improvement and rapid time to market were the key drivers, which kept everyone on the team focused on results.

Big Data Challenges

The project needed to solve several challenges, including:

- High throughput: 1 billion messages per day
- Large volume: an estimated 300 TB
- Near-real-time and batch processing at the same time
- < 1 min event processing latency
- < 3 sec query response time
- Semi-structured data sources: web logs
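To put the throughput figure in perspective, the short calculation below converts the stated 1 billion messages per day into an average per-second rate and into the number of messages that arrive within the 1-minute latency budget. It is a back-of-the-envelope sketch only: it assumes a perfectly uniform arrival rate, which real web traffic does not have, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(messages_per_day: int) -> float:
    """Average message rate, assuming traffic is spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


def messages_within_window(messages_per_day: int, window_seconds: float) -> float:
    """Messages arriving during one latency window at the average rate."""
    return average_rate_per_second(messages_per_day) * window_seconds


if __name__ == "__main__":
    daily_messages = 1_000_000_000  # figure stated in the case study
    print(f"{average_rate_per_second(daily_messages):,.0f} messages/s on average")
    print(f"{messages_within_window(daily_messages, 60):,.0f} messages per 1-minute window")
```

Peak load will be higher than this average, so the numbers are a lower bound for sizing rather than a capacity target.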

Project Description

Importance of PoC (Proof-of-Concept) in Big Data Projects

Technical risks are an integral part of any Big Data project. High complexity and immature technology are the current reality that software architects and engineers face. Building a full-scale prototype is not realistic because of the high cost and time requirements, while architecture analysis alone is insufficient to prove many important system properties such as performance and scalability. Instead, an MVP (minimum viable product) and throwaway and vertical evolutionary prototypes help in areas that architecture analysis cannot sufficiently address. For this project, a throwaway prototype (also known as a rapid prototype or proof-of-concept) was chosen to quickly evaluate the riskiest technology selections, and an MVP to get early feedback from end users and update the product roadmap accordingly.

AWS as an Environment for PoC

With no hardware to procure and no infrastructure to maintain and scale, Amazon Web Services is an extremely effective time-to-market accelerator and was the perfect platform for this project, particularly as, at the early stages, the exact software and hardware requirements were not immediately clear. Particular benefits of the platform were:

1. The opportunity to experiment with software and hardware while looking for the appropriate solution (the environment can be provisioned and unprovisioned in just a few clicks).
2. Tight integration between S3 and EMR, which made it possible to run a Hadoop cluster on demand (Amazon EMR) and resulted in significant cost savings. Hadoop itself has two functions, storing and computing, but with S3 fully covering storage, there was no need to keep the Hadoop cluster up and running when not utilizing the computing function. And as the web logs were stored on S3, it was possible to terminate the Hadoop cluster without data loss and launch the cluster again only when required.

Elasticsearch as a Platform for Dashboarding

During the discovery phase, SoftServe's Big Data experts suggested Elasticsearch as the primary data store for real-time analytics. The goal was to implement near-real-time scenarios with high query performance (< 3 sec) and minimum data latency (< 1 min) in a highly concurrent environment. The PoC phase was important in order to mitigate performance risks when utilizing aggregations, a new functionality in Elasticsearch, and focused primarily on three tasks:

Task 1. Quickly populate Elasticsearch with log data, instead of full integration with existing infrastructure, which typically takes much longer.
Task 2. Discover the optimal hardware and configuration for Elasticsearch and tune it for the required workload.
Task 3. Create interactive visualization for testing and demos.

The SoftServe team utilized EMR and the Elasticsearch-Hadoop driver for Task 1. Pig and Hive were used to parse and load log data from S3 into Elasticsearch (see diagram below), with each Hive external table pointing to an Elasticsearch index. For Task 2 (discover optimal hardware), the team experimented with different EC2 instance types (general purpose, storage optimized, compute optimized, etc.) and block storage options (SSD and HDD drives, instance storage, EBS with provisioned IOPS, etc.). Performance and load tests showed that the CPU was the bottleneck for the required workload, so compute-optimized instances (c4 model) were finally selected as the base EC2 instance type for the Elasticsearch cluster. For Task 3, an initial version of the interactive dashboard was created with Kibana in less than a day.

From PoC to Production

While some of SoftServe's Big Data experts were evaluating Elasticsearch, using EMR as the processing engine to load data from S3 into Elasticsearch (the ETL approach), others were implementing a Flume -> Kafka -> Logs Consumers processing pipeline for the near-real-time scenario. The Elasticsearch mappings (data model) were agreed at the beginning, allowing work on both the Elasticsearch PoC and the near-real-time scenario to proceed in parallel. This enabled the MVP, and the pre-production system, to be created in a short timeframe.

Next Steps

Amazon EMR already supports Apache Spark, a powerful computing engine that can supplement the Operation Intelligence platform by introducing stream processing through Spark Streaming and advanced analytics (Spark MLlib) out of the box (https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/). Now that it fully supports the Kafka Direct Approach, Spark has everything required for seamless integration with the system, so the SoftServe team can leverage all advantages of AWS for the future Lambda Architecture.
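The PoC load path described earlier (parse semi-structured web logs, index them into Elasticsearch, then query them with aggregations) can be illustrated with a small, dependency-free sketch. This is not the team's Pig/Hive implementation; it only shows the data shapes involved. The log layout (Common Log Format), the index name `weblogs`, and the field names are assumptions, since the case study does not specify them. The newline-delimited action/document pairs follow the layout used by the Elasticsearch bulk API, and the query body uses a `terms` bucket with a nested `avg` metric.

```python
import json
import re
from typing import Iterable, Optional

# Assumed log layout (Common Log Format); the case study does not specify one.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one log line into a document, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


def to_bulk_body(lines: Iterable[str], index: str = "weblogs") -> str:
    """Build a newline-delimited bulk payload: one action line per document line."""
    out = []
    for line in lines:
        doc = parse_line(line)
        if doc is None:
            continue  # skip lines that do not fit the assumed layout
        out.append(json.dumps({"index": {"_index": index}}))
        out.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(out) + "\n" if out else ""


def status_aggregation() -> dict:
    """Query body: bucket documents by status code, average bytes per bucket."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "by_status": {
                "terms": {"field": "status"},
                "aggs": {"avg_bytes": {"avg": {"field": "bytes"}}},
            }
        },
    }
```

The payload from `to_bulk_body` would be posted to the cluster's bulk endpoint and the dictionary from `status_aggregation` sent as a search request body. Both steps are omitted here because they depend on the client library and cluster version in use.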

Value Delivered

AWS proved to be an extremely effective time-to-market accelerator for the Operation Intelligence platform because no time was required for infrastructure deployment and configuration. The pay-as-you-go model allows costs to be optimized and provides the flexibility to increase capacity over time depending on need.

SoftServe has continued helping the client with implementation and technology advice. The first production version has been successfully released, and the system is now evolving with new features. SoftServe's Big Data accelerators, experienced team and effective cloud technologies, along with the client's strong product vision, were crucial to the project's success. The prototyping approach, from PoC to MVP to full-featured production, once again showed its effectiveness and allowed the agreed business and technical goals to be achieved.

About SoftServe

SoftServe is a leading technology solutions company specializing in software development and consultancy services. Since 1993 we've been partnering with organizations from start-ups to large enterprises to help them accelerate growth and innovation, transform operational efficiency, and deliver new products to market. To achieve this we've built a strong team of the brightest, most inquiring minds in the industry, and we form close, collaborative relationships with our clients so we can really understand their needs and deliver intuitive software that exceeds their expectations. Our experience stretches from Big Data/Analytics, Cloud, Security and UX Design to the Internet of Things, Digital Health and Digital Transformation. We have offices across the globe and development centers across Eastern Europe. For more information please visit www.softserveinc.com.

USA HQ Toll Free: 866-687-3588, Tel: +1-512-516-8880
Ukraine HQ Tel: +380-32-240-9090
Bulgaria Tel: +359-2-902-3760
Germany Tel: +49-69-2602-5857
Netherlands Tel: +31-20-262-33-23
Poland Tel: +48-71-382-2800
UK Tel: +44-207-544-8414
Email: info@softserveinc.com
Website: www.softserveinc.com