Monitoring Cloud Applications Amit Pathak 1
Agenda Context Challenges Monitoring-as-a-service Key Highlights Benefits 2
Context Are agreed service levels met? Overall how many applications are healthy vs non-healthy? Is the health getting worse over time? Are the business functions being performed as expected? Do you have capacity within applications? 3
Context Cloud Complexity Scale and diversity of the infrastructure - Servers, network devices, storage, etc. - Hundreds, even thousands of machines Massive number of user applications - Catastrophic consequences of failure / security breach / performance degradation 4
Context Resource utilization is tightly coupled with cost incurred by customers Monitoring is indispensable Availability, failure detection Performance, provisioning Security, anomaly detection Application-level monitoring 5
Challenges - Overview Inherits performance monitoring challenges of virtualized world End user response time a primary metric Mechanism to collect data from various sources Managing agents Monitor, identify & heal bottlenecks 6
Challenges - Overview Detect performance degradation: A single malfunctioning application on a guest has the potential to degrade performance of the host and other resources Resource contention among applications executing on VMs may hamper performance Virtual machines not configured with sufficient resources to handle the workload 7
Challenges A Closer Look Source: Monitis 8
Challenges A Closer Look System Challenges User Challenges Cloud Monitoring Network Challenges 9
Challenges System Level Efficient Scalability: Monitor tens of thousands of tasks Cost effective - minimize resource usage Facilitating service 10
Challenges System Level Efficient Scalability: Massive Scale Monitor inherently large-scale tasks Large number of users - Infrastructure monitoring - Application monitoring Monitor tasks with high cost, e.g. resources with high consumption 11
Challenges System Level Monitoring QoS Assurance: SLA management Application security Federated identity of cloud applications Secured integration of cloud apps with on-premise apps Multi-tenant environment Authorization & access control Monitor contention between monitoring tasks 12
Challenges User Level Continuous violation detection Need for a different detection model - Dynamically add/remove servers based on performance Achieve efficiency at the same time Short-term burst Persistent violation 13
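To illustrate the distinction between a short-term burst and a persistent violation, here is a minimal sketch. It assumes a violation counts as persistent once a chosen number of consecutive samples exceed the threshold. The function name, the threshold and the sample values are hypothetical and do not come from the slides.

```python
from typing import Iterable, List


def classify_violations(samples: Iterable[float],
                        threshold: float,
                        persist_after: int) -> List[str]:
    """Label each sample as 'ok', 'burst', or 'persistent'.

    Assumption: a violation becomes 'persistent' once `persist_after`
    consecutive samples have exceeded `threshold`; shorter runs are
    treated as short-term bursts.
    """
    labels = []
    run = 0  # length of the current run of consecutive violations
    for value in samples:
        if value > threshold:
            run += 1
            labels.append("persistent" if run >= persist_after else "burst")
        else:
            run = 0
            labels.append("ok")
    return labels


# Hypothetical end-user response times in milliseconds.
response_ms = [120, 480, 130, 510, 520, 530, 540, 125]
print(classify_violations(response_ms, threshold=400, persist_after=3))
```

A scaling policy built on this could ignore the "burst" labels and add servers only on "persistent", which is one way to avoid reacting to transient spikes.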
Challenges Network Level Resource-aware monitoring fabric Monitoring the functioning of both systems and applications running on large-scale distributed systems Continuously collecting detailed attribute values - A large number of nodes - A large number of attributes Overhead increases quickly as the system, application and monitoring tasks scale up 14
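A rough back-of-the-envelope sketch shows why overhead grows quickly with node and attribute counts. It assumes every node reports every attribute once per collection interval at a fixed payload size per sample. All figures below are illustrative assumptions, not measurements.

```python
def samples_per_second(nodes: int, attributes_per_node: int,
                       interval_seconds: float) -> float:
    """Samples arriving per second if every attribute is collected each interval."""
    return nodes * attributes_per_node / interval_seconds


def bytes_per_day(nodes: int, attributes_per_node: int,
                  interval_seconds: float, bytes_per_sample: int) -> float:
    """Raw monitoring data volume per day (assumed fixed size per sample)."""
    rate = samples_per_second(nodes, attributes_per_node, interval_seconds)
    return rate * bytes_per_sample * 86400  # 86400 seconds in a day


# Hypothetical deployment: 1000 nodes, 50 attributes each, collected every 10 s,
# 16 bytes per sample.
print(samples_per_second(1000, 50, 10))        # 5000.0
print(bytes_per_day(1000, 50, 10, 16) / 1e9)   # 6.912 (GB per day)
```

Doubling either the node count or the attribute count doubles the volume, and doubling both quadruples it.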
Performance Monitoring Understand performance of virtual infrastructure (outside-in approach) Troubleshoot bottlenecks Plan future needs 15
Key Parameters To Monitor CPU Memory Network Disk 16
CPU CPU saturated? High Ready time Problematic if it is sustained for long periods Possible contention for CPU resources among VMs Workload variability? Resource limits on VMs? Actual overcommitment? High SwapWait time 17
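The "high Ready time, sustained" check can be sketched as follows. It assumes ready time is reported as a millisecond total per sampling interval, as in VMware's summation counters, and converts it to a percentage of the interval. The 5% threshold and the three-sample run length are illustrative assumptions, not recommendations from the slides.

```python
from typing import Iterable


def cpu_ready_percent(ready_ms: float, interval_seconds: float) -> float:
    """Share of the sampling interval a vCPU spent ready but not scheduled."""
    return ready_ms * 100.0 / (interval_seconds * 1000.0)


def sustained_high_ready(ready_ms_samples: Iterable[float],
                         interval_seconds: float,
                         threshold_percent: float,
                         min_consecutive: int) -> bool:
    """True if ready % stays above the threshold for min_consecutive samples in a row."""
    run = 0
    for ready_ms in ready_ms_samples:
        if cpu_ready_percent(ready_ms, interval_seconds) > threshold_percent:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False


# Hypothetical ready-time totals (ms) for 20-second sampling intervals.
samples = [400, 1500, 1600, 1700, 300]
print(cpu_ready_percent(2000, 20))               # 10.0
print(sustained_high_ready(samples, 20, 5.0, 3)) # True
```

Requiring several consecutive samples reflects the point on the slide: a single spike in ready time is less of a concern than a sustained one.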
Memory Swap in rate Swap out rate Swap used 18
Disk What should I look for to figure out if disk is an issue? IOPs? Bandwidth (read/write)? Latencies? 19
Network What should I look for to figure out if network is an issue? Packet rate? Bandwidth (read/write)? NIC status? 20
Monitoring-as-a-Service 21
Monitoring-as-a-Service Similar to other cloud services Database service (e.g. SimpleDB, Datastore) Storage service (e.g. S3) Application service (e.g. AppEngine) 22
High Level Solution Applications, Server CPU, memory, disk IO Events & Alerts Customization Packet rate, bandwidth, NICs Gather data from various resources Trend analysis 23
Monitoring-as-a-Service External monitoring: web server, file server, mail server, VoIP; Server monitoring: CPU, memory, processes, storage; Network monitoring: HTTP, SSH, SNMP, discovery; Transaction monitoring: multi-step apps, workflows; Cloud monitoring: track running instances, auto-deploy, usage; Web traffic monitoring: visitors, page views 24
Key Highlights Scale dynamically Have minimum (or no) impact on the monitored infrastructure Should be portable and lightweight Easy feature customization. Not all metrics will need to be monitored in the cloud for everyone Heavy network-based monitoring tools may not be a good fit 25
Key Highlights Comprehensive monitoring of resource performance and availability Applications, databases, middleware and web servers Provide innovative ways to fetch data as business needs grow Dashboards, views, reports Correlate information from different sources Trend analysis Predict bottlenecks 26
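One simple way to approach trend analysis and bottleneck prediction is a least-squares line fitted to a resource's usage history, then extrapolated to the capacity limit. The sketch below assumes roughly linear growth, which real workloads may not follow. The function names and the disk-usage figures are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept).

    Requires at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(days: Sequence[float], usage: Sequence[float],
                     limit: float) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches `limit`.

    Returns None when usage is flat or shrinking (no projected crossing).
    """
    slope, intercept = linear_fit(days, usage)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical disk usage in percent, one sample per day.
print(days_until_limit([0, 1, 2, 3, 4], [50, 52, 54, 56, 58], limit=90))  # 16.0
```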
Benefits End-to-end support Easy to use & maintain Reliable service Feature customization Cost effective 27
Summary Cloud is complex; monitoring is indispensable End user response time is the primary focus Cloud services must be treated differently from on-premise software when it comes to systems monitoring Do not rely on vendors completely. If SLAs are serious, maintain your own logs Existing tools are good, but use programmatic APIs for specific needs 28
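As a minimal sketch of maintaining your own logs for SLA verification, the example below computes availability from independently recorded probe results. It assumes each log entry is a simple pass/fail check taken at a fixed interval. The 99.9% target and the check counts are illustrative only.

```python
from typing import Iterable


def availability_percent(checks: Iterable[bool]) -> float:
    """Percentage of successful checks in your own probe log."""
    results = list(checks)
    if not results:
        raise ValueError("no checks recorded")
    return 100.0 * sum(1 for ok in results if ok) / len(results)


def meets_sla(checks: Iterable[bool], target_percent: float) -> bool:
    """Compare measured availability with the agreed service level."""
    return availability_percent(checks) >= target_percent


# Hypothetical day of one-minute probes: 1440 checks, 3 of them failed.
day_log = [True] * 1437 + [False] * 3
print(round(availability_percent(day_log), 3))  # 99.792
print(meets_sla(day_log, target_percent=99.9))  # False
```

With such a log in hand, a disputed SLA can be checked against your own measurements as well as the vendor's reports.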
Thank You 29
References http://developer.vmware.com http://www.cc.gatech.edu/ http://portal.monitis.com/index.php/resources http://www.hyperic.com/ http://mypublicstrangeworld.posterous.com/cloud-monitoring-services-a-resource-guide http://www.itpro.co.uk/630655/dont-leave-cloud-monitoring-to-vendors-expert-warns http://www.virtualizationpractice.com http://virtualization.sys-con.com/ http://blog.newrelic.com/ 30