Topology Aware Analytics for Elastic Cloud Services


athafoud@cs.ucy.ac.cy
Master Thesis Presentation, May 28th, 2015
Department of Computer Science, University of Cyprus

In Brief
A tool providing performance and cost analytics for elastic cloud services, using monitoring and cost data and utilizing the cloud service topology.

Presentation Outline
- Cloud Computing and Cloud Analytics
- The gap in Cloud Analytics Tools
- Our Approach
- Design / Implementation
- Evaluation
- Conclusions & Future Work

Cloud Computing
- Provides a shared pool of physical resources: data storage space, networks and computer processing power
- The pool is utilized to run virtualized resources

Cloud Computing Economics
Biggest advantages of cloud computing:
- Rent resources on demand: reduces bare-metal machine maintenance costs
- Pay as you go: the initial cost is minimal and over-cost is easily manageable

Cloud Popularity
- Cloud computing now provides the facilities that host the larger share of the Internet's computing and storage services.
- [Figures: Cloud Maturity of Respondents; Enterprise Workloads by Type]
- Source: RightScale 2015 State of the Cloud Report

Cloud Service Development Lifecycle
[Figure: lifecycle loop with the stages Idea, Build, Code, Measure, Data and Learn, annotated with Cloud Usage and Cost Analytics Tools]

What is Analytics?
- The process of applying mathematics and statistics to one or more sets of data, and the visual representation of the extracted information.
- Web analytics (e.g. Google): website traffic and visits
- Business analytics (e.g. SAP): sales of a product

Cloud Analytics
- Insights into application behaviour help users improve application performance and resource utilization, and thereby reduce cost
- Demand for cloud analytics is increasing: more than 1,000 customers on the RightScale Cloud Analytics platform one month after it was announced

Analytics Insights
- Utilization / Performance
  - Usage, e.g. RAM or disk used in GB, CPU load in percent
  - State of a resource over time
  - Short- and long-term trends
  - Visualizations relating the data input rate (e.g. requests/sec) to the resource usage (CPU, memory usage)
- Cost
  - Cost per hour/month/year
  - Cost change over time

But Still Lack of Visibility
- 87% of deployments were found underutilized in terms of CPU, memory, I/O and network
- Instances are underutilized, with average utilization rates between 8-9%
- Analytics data is difficult for users to digest as cloud deployments scale out to massive sizes
Sources: The Big Data Group Survey 2014; Cloudyn White Paper: The Great Hope of Cloud Economics

Different Users, Different Needs

Different Users, Different Needs: IT Staff / Maintainers
- Forensic analysis of an incident, primarily using monitoring data
- Analytics for anomaly detection, e.g. in the request rate to identify DoS attacks

Different Users, Different Needs: Application Developers
- Optimize / fine-tune cloud services
- Configure the elastic behavior of a cloud service (e.g. set rules / thresholds)

Different Users, Different Needs: Department Manager
- Expense management for the department's products / services
- Cost of a single deployment or of all deployments

Develop and Deploy

Cloud Service Analytics
- Analytics per instance, e.g. CPU utilization
- Possible aggregation: per tenant, per cloud project, per region

Cloud Service Analytics
- Find which instance under-performs
- Identify the most costly project
- Pinpoint which tenant costs less
- Helpful for IT staff and managers
- Insufficient for developers
  - Cannot easily focus on a single deployment
  - Cannot optimize a single application tier (e.g. the database)

Visualizations Based on Service Topology
- Designing user-oriented or user-centric interfaces can improve the user experience*
* From the Human Computer Interaction (HCI) literature

Advance over State-of-the-Art
- The state of the art focuses on
  - Resource performance visualization
  - Total application cost visualization
- The state of the art is not aware of the application topology
  - Insights are limited to the application level
  - Multi-grain performance evaluation is not available (e.g. how scaling one tier affects the cost of another tier)

Our Proposal
- An analytics tool
  - Aims to help cloud service developers
  - Provides an additional view of the existing information
- Cloud service topology: the keystone of our tool
  - A way to graphically represent the blueprint of a cloud service

Our Proposal
- Extract insights, such as the trend of the data
- Apply statistical calculations to the time-series formatted data

Leverage Service Topology
- Enhance metrics data with the service topology
- Provide fine-grained analytics
  - Per-tier analytics (useful to identify actual or possible bottlenecks)
  - How one tier can affect others
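
As an illustration of per-tier analytics, the sketch below groups per-instance samples by the tier each instance belongs to and averages them. It is not taken from the tool's source code: the topology dictionary, the instance names and the function `per_tier_mean` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical topology: tier name -> instance ids (illustrative names only).
TOPOLOGY = {
    "load_balancer": ["lb-1"],
    "app_server": ["app-1", "app-2"],
    "database": ["db-1", "db-2", "db-3"],
}


def per_tier_mean(topology, samples):
    """Average one metric per tier.

    samples: iterable of (instance_id, value) pairs.
    Returns {tier: mean value}; tiers without samples are omitted.
    """
    # Invert the topology so each instance can be mapped to its tier.
    instance_to_tier = {
        inst: tier for tier, insts in topology.items() for inst in insts
    }
    by_tier = defaultdict(list)
    for instance_id, value in samples:
        tier = instance_to_tier.get(instance_id)
        if tier is not None:  # ignore instances outside the topology
            by_tier[tier].append(value)
    return {tier: mean(values) for tier, values in by_tier.items()}
```

Comparing the per-tier values side by side is one simple way to spot the tier that is closest to becoming a bottleneck.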

Support Elastic Applications
- Exploit cloud elasticity
  - More detailed analytics for the cloud application lifecycle
  - Help understand what may seem to be abnormal behavior
- Use elasticity actions as time bookmarks to navigate through the deployment
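
One way to read "elasticity actions as time bookmarks" is to cut a metric's time series into phases at the timestamps of the resizing actions. The sketch below does that; the function name `split_by_actions` and the tuple layout are assumptions made for illustration, not the tool's actual data model.

```python
from bisect import bisect_right


def split_by_actions(points, action_times):
    """Split (timestamp, value) points into phases delimited by action times.

    A point whose timestamp equals an action time is placed in the phase
    that starts at that action. Returns len(action_times) + 1 phases, some
    of which may be empty; order inside a phase follows the input order.
    """
    boundaries = sorted(action_times)
    phases = [[] for _ in range(len(boundaries) + 1)]
    for ts, value in points:
        # bisect_right counts how many actions happened at or before ts.
        phases[bisect_right(boundaries, ts)].append((ts, value))
    return phases
```

Each phase can then be analyzed on its own, so a jump in a metric right after a scale-out is attributed to the action instead of looking like an anomaly.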

Functionality / Requirements
- Calculate analytics for a cloud service deployment
- Visualize and map analytics on the service topology
- Integrate different data sources (e.g. usage and cost metering data)
- Understand and expose the elastic nature of a service
- Respond in a timely manner

Implementation: Two Components
- Back-end (computation layer)
  - Retrieves data from the configured data sources
  - Combines and analyzes the data
  - Exports the enriched data via a REST API
- Front-end (presentation layer)
  - Helps the user interact with the back-end service
  - Presents the information in a unified and descriptive manner

The Analytics Tool Runs as a Cloud Service

Architecture

Data Collection
- Provides
  - A set of interfaces that ensure the needed abstraction
  - A set of data objects for communication between the internal modules
- Interacts with external data sources to get data
  - Monitoring system
  - Cost calculation module
  - File system
- Transforms the data into the internal structures
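
The abstraction described above could look roughly like the following sketch: one interface that every connector implements, plus a shared data object. All names here (`DataSource`, `MetricPoint`, `CsvDataSource`) and the `timestamp,metric,value` row format are hypothetical and are not the tool's actual classes or file format.

```python
import csv
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, TextIO


@dataclass(frozen=True)
class MetricPoint:
    """Internal data object shared by the modules (hypothetical shape)."""
    timestamp: int
    metric: str
    value: float


class DataSource(ABC):
    """Interface each connector implements, whatever the external system."""

    @abstractmethod
    def fetch(self, metric: str) -> List[MetricPoint]:
        """Return all points of one metric as internal data objects."""


class CsvDataSource(DataSource):
    """File connector reading 'timestamp,metric,value' rows (assumed format)."""

    def __init__(self, stream: TextIO):
        self._rows = list(csv.reader(stream))

    def fetch(self, metric: str) -> List[MetricPoint]:
        # Transform the external rows into the internal structure.
        return [
            MetricPoint(int(ts), name, float(value))
            for ts, name, value in self._rows
            if name == metric
        ]
```

A connector for a monitoring system or a cost module would implement the same `fetch` method, so the analytics code never depends on where the data came from.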

Analytics Calculation
- Statistical analysis over time-series data
  - Common methods like MIN, MAX, MEDIAN, MEAN, SD and COUNT
  - Based on Apache Commons Math
- Additional optimization for performance
  - Incremental MEAN calculation
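
A minimal sketch of an incremental calculation: each new value updates COUNT, MIN, MAX and MEAN in constant time, without re-reading earlier values. The slides only mention an incremental MEAN; extending it to SD with Welford's method is an addition made here for illustration, and the class `RunningStats` is hypothetical (it is plain Python, not the Apache Commons Math API). MEDIAN is left out because it cannot be maintained this way.

```python
import math


class RunningStats:
    """Incrementally tracks COUNT, MIN, MAX, MEAN and SD (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        """Fold one new value into the statistics in O(1)."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def sd(self):
        """Sample standard deviation; 0.0 with fewer than two values."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))
```

When new monitoring samples arrive, only `add` has to be called for them, which is what makes the incremental form cheaper than recomputing over the whole series.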

Analytics Calculation: Trend Calculation
- Implements a Simple Moving Average (SMA)* function
- Process-level parallelization
  - Break the data into small chunks
  - Calculate the trend of each chunk
  - Combine the results
* G. Sanchez, 2010, Moving intervals for nonlinear time series forecasting
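
The break / calculate / combine steps can be sketched as follows. This is an assumption about how the chunking might work, not the tool's implementation: chunks overlap by `window - 1` points so that the combined output is the same as a single pass. The names `sma` and `chunked_sma` are hypothetical.

```python
from functools import partial


def sma(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one
        out.append(total / window)
    return out


def chunked_sma(values, window, chunk_size, executor=None):
    """Compute the SMA chunk by chunk and concatenate the partial results.

    Each chunk carries window - 1 extra points from the following chunk,
    so no averages are lost at the chunk borders. Pass an executor with a
    map() method to run the chunks in parallel; otherwise they run serially.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [
        values[start:start + chunk_size + window - 1]
        for start in range(0, len(values) - window + 1, chunk_size)
    ]
    mapper = executor.map if executor is not None else map
    parts = mapper(partial(sma, window=window), chunks)
    return [point for part in parts for point in part]
```

For process-level parallelism, a `concurrent.futures.ProcessPoolExecutor` can be passed as `executor`, created under an `if __name__ == "__main__":` guard. For floating-point input the chunked and single-pass results can differ in the last digits, because the running sum is restarted in every chunk.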

Visualizing Time-series
- Time-series data
  - A sequence of data points, typically consisting of successive measurements made over a time interval
- Frequently plotted via line charts
  - The most popular choice of visualization, among scatter plots and space-filling approaches

Visualizing Time-series - Issues
- Not all of the data can fit on a screen
- Plotting too much data* results in messy visualizations
  - Reduces clarity and readability
  - The user can digest less data than is actually visualized
[Figure: the same series plotted with 5000 points and with 3000 points]
* Relative to the screen size in pixels

Visualizing Time-series - Solutions
- Reduce the amount of data (data reduction)
  - Removes some strain from the rendering tool
  - Transfers less data over the network
- Data reduction techniques
  - Aggregation, binning and sampling
- The algorithms preserve
  - The visualization accuracy
  - Not the accuracy of the analysis process
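
As a small example of binning with aggregation, the sketch below replaces all points that fall into the same fixed-width time bin with their mean. The function `bin_mean` and the choice of the mean as the aggregate are assumptions for illustration only.

```python
def bin_mean(points, bin_width):
    """Aggregate (timestamp, value) points into fixed-width time bins.

    Each bin is represented by (bin start, mean of the values in the bin).
    Empty bins produce no output point.
    """
    bins = {}
    for ts, value in points:
        start = (ts // bin_width) * bin_width  # left edge of the bin
        total, count = bins.get(start, (0.0, 0))
        bins[start] = (total + value, count + 1)
    return [(start, total / count) for start, (total, count) in sorted(bins.items())]
```

Averaging smooths out short spikes, which is one reason such techniques keep a plot readable but should not feed the statistical analysis itself.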

Down-sampling
- Largest-Triangle-Three-Buckets algorithm
  - Separates the raw data into buckets of almost equal size
  - Selects from each bucket the most representative point, based on the previously selected sample
Source: Sveinn Steinarsson, Downsampling Time Series for Visual Representation
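
A compact sketch of Largest-Triangle-Three-Buckets, written from the published description of the algorithm rather than from the tool's code. For each bucket it keeps the point that forms the largest triangle with the previously selected point and the average of the next bucket. The function name `lttb` and the `(x, y)` tuple layout are choices made here.

```python
def lttb(points, threshold):
    """Downsample a list of (x, y) points to `threshold` points with LTTB.

    The first and last points are always kept.
    """
    n = len(points)
    if threshold >= n:
        return list(points)
    if threshold < 3:
        raise ValueError("threshold must be at least 3")

    bucket_size = (n - 2) / (threshold - 2)
    sampled = [points[0]]
    a = 0  # index of the previously selected point

    for i in range(threshold - 2):
        # Current bucket: candidates for the next selected point.
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1

        # Average of the next bucket (the last point for the final bucket).
        next_end = min(int((i + 2) * bucket_size) + 1, n)
        nxt = points[end:next_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        ax, ay = points[a]
        best, best_area = start, -1.0
        for j in range(start, end):
            # Twice the triangle area; the factor 0.5 does not change the argmax.
            area = abs((ax - avg_x) * (points[j][1] - ay)
                       - (ax - points[j][0]) * (avg_y - ay))
            if area > best_area:
                best, best_area = j, area
        sampled.append(points[best])
        a = best

    sampled.append(points[-1])
    return sampled
```

Because the selection favors points that deviate from their neighbours, peaks and dips tend to survive the reduction, which is what matters for the visual shape of a line chart.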

Additional Optimizations
- Visual feedback
  - While the user waits for the analytics calculations
- Partial loading / pre-loading
  - Quickly create and present an estimation of the visualization, e.g. using a high down-sampling rate
  - Add the details later, seamlessly
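
The partial-loading idea can be sketched as a generator that first yields a heavily reduced preview and then the detailed view. This is only an illustration: the function `progressive_views` is hypothetical, and it uses plain every-nth sampling as a stand-in for whatever reduction the tool applies.

```python
def progressive_views(points, coarse_target, final_target):
    """Yield a quick coarse preview first, then the more detailed view.

    Each view keeps every step-th point, with the step chosen so that the
    view holds at most the requested number of points.
    """
    for target in (coarse_target, final_target):
        step = max(1, -(-len(points) // target))  # ceiling division
        yield points[::step]
```

A front-end could render the first view immediately and replace it with the second one once it arrives.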

Does It Meet Our Expectations?
Expected characteristics / requirements:
- Calculate analytics for a cloud service deployment
- Visualize and map analytics on the service topology
- Integrate different data sources (e.g. usage and cost metering data)
- Understand and expose the elastic nature of a service
- Respond in a timely manner

Evaluation - Functionality
- Define two realistic applications
- Configure the tool to use the collected data
- Use the UI to inspect the analytics

Use Cases: Case 1
A three-tier service*
1. HAProxy load balancer
2. Application server tier
3. Cassandra NoSQL distributed data storage
* D. Trihinas, Managing and monitoring elastic cloud applications
** Service topology as it was designed through the CAMF user interface

Use Cases: Case 2
A data analysis and exploratory game application*
1. Load balancer
2. Application servers (Go)
3. Compute nodes for extracting and transforming data
4. Redis cache
5. PostgreSQL database as the primary storage
6. Cassandra DB storing large binary blobs
* Dataplay was developed by Playgen for the purposes of CELAR (FP7-317790)
** Service topology as it was designed through the CAMF user interface

Evaluation - Results
- View auto-generated analytics reports for cloud service components, including both usage and cost metrics

Evaluation - Results
- Present analytics for various components, mapped onto the cloud service topology

Evaluation - Response Time
- Define what to measure
- Data sets
  - Up to 1 million random metric values
  - Data from running a 3-tier application
- Repeat each measurement several times and extract the mean
- Test whether the response time is below 4 s
References:
D. Trihinas, Managing and monitoring elastic cloud applications
F. F.-H. Nah, A study on tolerable waiting time: how long are web users willing to wait?
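
The measurement procedure (repeat, take the mean, compare with the 4 s threshold) can be sketched as below. The helper `mean_response_time` and the constant name are hypothetical; only the 4-second limit comes from the slides.

```python
import time
from statistics import mean

TOLERABLE_SECONDS = 4.0  # threshold stated in the slides


def mean_response_time(fn, repeats=5):
    """Call fn() `repeats` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)
```

A run would pass the operation under test as `fn`, for example a request against the back-end with a chosen data set, and check that the returned mean stays below `TOLERABLE_SECONDS`.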

Backend Service (B - C)
- Parse data from the data source
- Calculate the trend
- Apply sampling
- Write the response

Frontend Application (E - F)
- Transform to the Google Charts format
- Render the chart

End-to-End Time: Scenario 1
- 3 days of raw data (10 sec interval)
- 60% down-sampling
- ~18000 points for rendering

End-to-End Time: Scenario 2
- 2 months of raw data (10 sec interval)
- Variable down-sampling
- Target of 9000 final points

Evaluation Results
- Successfully delivers analytics in less than 4 seconds
- Fetching data takes 20% - 30% of the total time
  - Abstractions do not come for free
- Google Charts limits
  - Unresponsive after ~9000 points
  - Significant impact on the total response time

Conclusion
In this thesis we:
- Identified the lack of a middle abstraction layer
- To advance the state of the art
  - Utilize the service topology to provide a more structured visualization approach for cloud usage analytics
  - Consider elastic cloud services: use resizing actions as add-on information to the analytics

Conclusion
- We implemented an analytics tool
  - Aims to help developers
  - Connects to different data sources to obtain the data for the analytics
  - Provides topology-aware analytics for elastic cloud services
- We evaluated our tool
  - Provides accurate analytics results using metering data in less than 4 seconds

Future Work / Plans
- Increase the variety of offered analytics
  - In-depth analysis
  - Behavior forecasting
- Explore ways to increase performance
  - Use analytics frameworks (e.g. Storm)
  - Stream processing
  - But not every time-series statistic can be computed in a streaming fashion

Future Work / Plans
- In the current state, we only support accessing
  - The JCatascopia monitoring system API
  - Various user- or system-exported files that follow a specific format
- Expand the supported data sources
  - Build additional connectors based on our abstraction model
  - Target cloud monitoring systems, or any service that can provide us with data to enhance the extracted insights

Acknowledgements
- www.celarcloud.eu, co-funded by the European Commission
- Source code: https://github.com/celar/cloud-is

Laboratory for Internet Computing
Department of Computer Science, University of Cyprus
http://linc.ucy.ac.cy