Improve performance and availability of Banking Portal with HADOOP




Our client is a leading U.S. company providing information management services in finance, investment, and banking. The company runs a variety of web services, applications, and databases, which are maintained by different teams in geographically distributed datacenters.

Powering Banking Portal with Hadoop

Business Challenge

For end-client services, client satisfaction is one of the most valuable business metrics. If you've lost your customers' satisfaction, you've lost your customers. Maintaining customer satisfaction is a top priority for any business, but it can often be quite a tall order. There are always factors beyond our control, but luckily, with prudent application of the right technologies, we can improve the following metrics:

- Web service response time (WSRT)
- Time to implement new functionality that is useful to clients
- Value-added services

Our customer experienced problems with WSRT when rolling out new features, which reduced client satisfaction. As a result of our initial assessment, we delivered to the customer a list of business problems, a detailed gap analysis report, and a list of potential solutions.

These were the most important problems reducing product stability:

- A complicated solution architecture caused unpredictable side effects whenever product functionality changed.
- Extreme geographic separation of the development and operations teams led to miscommunication and reduced productivity, which in turn degraded solution performance.
- A 5% performance loss in web services and analytics applications was enough to overwhelm all services, including the data center, doubling WSRT.

Together, these factors led to lost profit due to release delays, both in production and in development.

To ensure stability, the following recommendations were made:

- Control product quality and performance at all stages.
- Develop an automated product and server monitoring system that analyses system and application metrics as well as product health.
- Provide a fully automated solution.

http://www.dtm.io/

Project Description

Based on our consulting recommendations and the characteristics of their products, the client set the following requirements:

- Automated test development environments with real production data
- Automated performance testing for every module during the development phase
- Automated performance monitoring
- Performance history logging with analysis features

[Figure: Hadoop modules data flow. Components shown: Gor; web servers and web apps; Sensu client and log shipper; Ambari; Sensu; Logstash; Flapjack (notification); ElasticSearch (search DB); Kale (anomaly detection engine); Kibana (visualisation); Hadoop cluster with YARN, HDFS, and Hive (query CLI); apps and services in the PROD, DEV, and QA environments.]

The concept for this solution was to provide an easy way to add availability and performance monitoring tools at every phase, from development to production, and for every audience, from the QA team to end users. Because the system is modular, any module can easily be replaced by a better one if need be.
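To make the modular idea concrete, here is a minimal Python sketch of a monitoring pipeline in which the detection and notification stages can be swapped independently. It is an illustration under stated assumptions, not the delivered system: the names Sample, Detector, ThreeSigmaDetector, and run_pipeline are hypothetical, the three-sigma rule is a simple stand-in rather than the algorithm Kale uses, and no Sensu, Flapjack, or ElasticSearch API is called.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Dict, Iterable, List, Protocol


@dataclass
class Sample:
    """One metric reading, e.g. a web service response time in milliseconds."""
    metric: str
    value: float


class Detector(Protocol):
    """Interface for a replaceable anomaly detection module (hypothetical)."""

    def is_anomalous(self, history: List[float], value: float) -> bool: ...


class ThreeSigmaDetector:
    """Stand-in detector: flags values more than k standard deviations
    away from the mean of the previously seen values."""

    def __init__(self, k: float = 3.0, min_history: int = 5) -> None:
        self.k = k
        self.min_history = min_history

    def is_anomalous(self, history: List[float], value: float) -> bool:
        if len(history) < self.min_history:
            return False  # not enough data to judge yet
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            return value != mu
        return abs(value - mu) > self.k * sigma


def run_pipeline(
    samples: Iterable[Sample],
    detector: Detector,
    notify: Callable[[Sample], None],
) -> None:
    """Feed samples through the detector and call notify for each anomaly.

    Every sample, anomalous or not, is appended to its metric's history.
    """
    history: Dict[str, List[float]] = {}
    for sample in samples:
        past = history.setdefault(sample.metric, [])
        if detector.is_anomalous(past, sample.value):
            notify(sample)
        past.append(sample.value)


if __name__ == "__main__":
    # Illustrative WSRT readings in milliseconds (made-up values).
    wsrt = [120, 118, 125, 121, 119, 122, 480, 123]
    alerts: List[Sample] = []
    run_pipeline((Sample("wsrt_ms", v) for v in wsrt), ThreeSigmaDetector(), alerts.append)
    print(alerts)  # only the 480 ms reading is flagged
```

Because run_pipeline depends only on the Detector interface and a notify callable, a different detector or notification channel can be substituted without touching the rest of the code, which mirrors the replaceable-module design described above.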

Scaled power of Hadoop

The system was designed to be a multi-layer, highly scalable platform with the ability to detect anomalies in all modules within production, development, and QA environments. Our solution included the following integrated features:

Traffic forwarding
This component can forward and replay any HTTP traffic in real time across production, staging, and dev environments. It is based on the open-source tool Gor.

Metrics collection and aggregation
This component uses the Ambari API to collect Hadoop cluster metrics, Sensu clients to collect general system metrics such as CPU and memory usage, and Logstash to collect application log files. All these modules are open-source and free of cost. These components support horizontal scaling and can monitor up to 50,000 servers right out of the box.

Anomaly detection
This component is responsible for analysing all the data and detecting anomalies in the collected metrics. Kale reports the detected anomalies to the ElasticSearch database for subsequent visualisation with the Kibana web visualisation framework. This anomaly detection component was created by Datamart LLC and is part of our Datamart Analytics Framework.

Notification and Visualisation
These components were built on top of the open-source tools Flapjack and Kibana. Both frameworks provide sophisticated APIs and can be integrated with almost any external module. Flapjack provides integration with SMS gateways and with the Apple and Google push services.

Delivered Value

Datamart LLC successfully delivered a new, ready-to-use platform monitoring system with all of the required features and capabilities.

Key Benefits:

- Implementation of this solution allowed the size of the operations team to be reduced by half while maintaining the highest level of product availability.

- The overall solution performance increased by 26% in 3 months.
- The development team was able to implement new functionality twice as fast as before.
- New features can be tested in an environment with the same conditions as those in production, yielding realistic and reliable results.
- Total cost of ownership (TCO) was reduced.

The successful results of this project led to a fruitful business partnership.