Monitoring a cloud? Why, what, how. Rodrigue Chakode, Toulouse 2015, http://realopinsight.com/
Why monitor? It's like riding: you watch where you are going => you adjust as you go along and make sure you're on the right track.
What is monitoring? Regular observation (and recording) of activities taking place in a system => a routine process gathering information on various aspects of the system. Credit: Phil Bartle
Monitoring in a nutshell: analysing the situation in a system; determining whether and how the system is operating; identifying problems facing the system; determining whether the way the system is planned is the most appropriate (e.g. performance monitoring); ensuring all activities in the system are carried out properly, by the right people and at the right time (e.g. IDS); using lessons from the past to forecast the future (e.g. capacity planning).
How to monitor? Watch, record, analyze, adjust.
Monitoring system landscape
Monitoring panorama
Choose a suitable monitoring system => the choice should be driven by needs
Performance monitoring (real-time): e.g. Graphite, Grafana, New Relic, AppDynamics...
Availability monitoring: e.g. Nagios, Zabbix, Centreon...
Business service monitoring: advanced aggregation capabilities built on top of existing systems, e.g. RealOpInsight
SLA monitoring: advanced consolidation capabilities built on top of existing systems
What to monitor? The golden question!
Almost everything can be monitored: hardware (CPU, memory, disk, NIC...), software (processes, file system...), services (socket, port, protocol...).
Not everything should be monitored: too much information can drown the useful information, and you may waste time considering useless information. [Figure: fact vs. impression]
What's the best monitoring? The killer question!
Use case: performance monitoring fed with time-series data. Data collector => time-series database (management policies: storage, aggregation, retention) => visualization (+ aggregation)
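The storage, aggregation and retention policies of this pipeline can be illustrated with a minimal Python sketch. It assumes (timestamp, value) points in seconds and averaging as the aggregation function, which is similar in spirit to, but not, Graphite's own implementation; `downsample`, `apply_retention` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, step):
    """Aggregate (timestamp, value) points into buckets of `step` seconds.

    Hypothetical policy: each bucket is labelled by its start time and
    holds the average of the raw values falling into it.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % step].append(value)
    return sorted((start, mean(values)) for start, values in buckets.items())


def apply_retention(points, now, max_age):
    """Hypothetical retention rule: drop points older than `max_age` seconds."""
    return [(ts, v) for ts, v in points if now - ts <= max_age]


# Hypothetical raw CPU load samples every 10 s, downsampled to 60 s resolution.
raw = [(0, 1.0), (10, 3.0), (20, 2.0), (60, 4.0), (70, 6.0)]
print(downsample(raw, 60))  # [(0, 2.0), (60, 5.0)]
```

Keeping fine-grained points for a short time and coarse averages for longer is the usual trade-off between storage cost and history depth.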
Demo: Graphite + Grafana
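On the Graphite side, a data collector can push points to Carbon using its plaintext protocol, one `<metric path> <value> <timestamp>` line per point. The sketch below assumes a Carbon plaintext listener on its usual port 2003; the metric path `servers.web01.cpu.load`, the host and the helper names are hypothetical.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one point as '<path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send one point to a Carbon plaintext listener (host/port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(path, value, timestamp).encode("ascii"))


# Hypothetical metric path; formatting only, no network call here.
print(graphite_line("servers.web01.cpu.load", 0.42, 1420070400), end="")
```

Grafana then only needs Graphite as a data source to query and graph the stored series.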
Use case: availability monitoring
Demo: Nagios
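Nagios delegates each check to a plugin that prints a status line and returns 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal disk-usage plugin could look like the sketch below; the 80%/90% thresholds and the function names are hypothetical choices.

```python
import shutil

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_disk(used_pct, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a Nagios state and a status line.

    The thresholds are hypothetical defaults.
    """
    if used_pct >= crit:
        state = CRITICAL
    elif used_pct >= warn:
        state = WARNING
    else:
        state = OK
    # Text before '|' is for humans; after it comes performance data:
    # label=value[UOM];warn;crit;min;max
    text = (f"DISK {LABELS[state]} - {used_pct:.1f}% used"
            f" | used={used_pct:.1f}%;{warn};{crit};0;100")
    return state, text


def main(path="/"):
    """Check the file system holding `path`; return the Nagios exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    state, text = check_disk(100.0 * usage.used / usage.total)
    print(text)
    return state
```

A real plugin script would end with `sys.exit(main())` so that Nagios reads the return code.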
Availability & performance monitoring fail to link IT to business
Don't be confused: ops, managers, CEO... need clear information!
Data is good, but information is better. Too many alarms kill the alarm. (S. Bortzmeye)
Cloud implies new ways of thinking about IT: service/application-centric, SLA constraints, leased resources, multi-tenancy.
Prioritize and orchestrate work based on business needs
Business Service Monitoring (BSM): add intelligence to monitoring; quickly assess how each incident impacts your business (business impact vs. low-level incident); prioritize incidents; delegate management, multi-tenancy...
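Prioritizing by business impact rather than by raw severity can be sketched as follows. The severity scale, the per-service business weights and the scoring formula are all hypothetical; the point is only that a major incident on a key service can outrank a critical one that impacts nothing.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    item: str             # low-level IT item that raised the alarm
    severity: int         # hypothetical scale: 0=normal, 1=minor, 2=major, 3=critical
    impacted: list[str]   # business services depending on the item


# Hypothetical business weights: how much each service matters to the business.
BUSINESS_WEIGHT = {"merchant-site": 10, "intranet": 2, "reporting": 1}


def business_priority(incident):
    """Score = low-level severity x weight of the most important impacted service."""
    weight = max((BUSINESS_WEIGHT.get(s, 0) for s in incident.impacted), default=0)
    return incident.severity * weight


incidents = [
    Incident("db02:disk", 3, ["reporting"]),
    Incident("web01:http", 2, ["merchant-site", "intranet"]),
    Incident("printer42:toner", 3, []),
]
for inc in sorted(incidents, key=business_priority, reverse=True):
    print(inc.item, business_priority(inc))
```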
How does a failure actually impact your business?
Business impact: RAID 0 (striping) vs. RAID 1 (mirroring)
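The same low-level incident, one failed disk, has a different impact depending on the layout. A minimal sketch, assuming a hypothetical four-level severity scale: RAID 0 takes the worst disk state, while RAID 1 takes the best one, raised to MINOR (degraded) when a mirror is down. That degraded rule is an illustrative choice, not a standard.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale for illustration.
    NORMAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def raid0_status(disks):
    """Striping: data is lost if any disk fails, so the array takes the worst state."""
    return max(disks)


def raid1_status(disks):
    """Mirroring: the array works while one disk survives.

    It is only as bad as its healthiest disk, but a failed mirror still
    deserves attention, so the result is at least MINOR (degraded).
    """
    best = min(disks)
    if max(disks) > Severity.NORMAL:
        return max(best, Severity.MINOR)
    return best


disks = [Severity.NORMAL, Severity.CRITICAL]
print(raid0_status(disks).name, raid1_status(disks).name)  # CRITICAL MINOR
```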
More about business impact: redundant databases, merchant site
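The merchant-site case combines both behaviours in one dependency tree: required components propagate their worst state, redundant ones their best. The sketch below assumes a hypothetical site made of a web front end, two redundant databases and a payment gateway; the node encoding and names are illustrative only.

```python
NORMAL, MINOR, MAJOR, CRITICAL = 0, 1, 2, 3  # hypothetical severity scale


def evaluate(node, states):
    """Recursively compute a node's severity from its children.

    A node is either a leaf name (looked up in `states`) or a tuple
    (rule, children): "all" means every child is required (worst child
    wins); "any" means children are redundant (best child wins).
    """
    if isinstance(node, str):
        return states[node]
    rule, children = node
    child_states = [evaluate(child, states) for child in children]
    return max(child_states) if rule == "all" else min(child_states)


# Hypothetical merchant site: web AND (db1 OR db2) AND payment gateway.
merchant_site = ("all", ["web", ("any", ["db1", "db2"]), "payment"])

one_db_down = {"web": NORMAL, "db1": CRITICAL, "db2": NORMAL, "payment": NORMAL}
both_db_down = {"web": NORMAL, "db1": CRITICAL, "db2": CRITICAL, "payment": NORMAL}
print(evaluate(merchant_site, one_db_down))   # 0
print(evaluate(merchant_site, both_db_down))  # 3
```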
Benefits of business service monitoring? Reduces downtime by up to 75%; delivers services up to 30% more efficiently. Credit: http://www.bmc.com/solutions/bsm/
BSM is a Discipline, and Every IT Environment is Unique
BSM => deal with dependencies
About dependencies: a business service may depend on IT items (disks, processes, ping...) and/or other business services. E.g. streaming: web server, databases, network, operating system, hardware devices...
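Once dependencies are recorded, the question "which business services does this failure hit?" becomes a graph traversal over reverse edges. The sketch below assumes a hypothetical dependency map loosely based on the streaming example; node names and the `impacted_by` helper are illustrative.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: each node -> what it depends on
# (IT items or other business services).
DEPENDS_ON = {
    "streaming": ["web-server", "databases", "network"],
    "web-server": ["os:web01", "process:nginx"],
    "databases": ["os:db01", "disk:db01"],
    "os:web01": ["hw:web01"],
    "os:db01": ["hw:db01"],
    "billing": ["databases"],
}


def impacted_by(failed_item, depends_on):
    """Return every node that directly or transitively depends on `failed_item`."""
    dependents = defaultdict(set)  # reverse edges: dependency -> dependents
    for node, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(node)
    # Breadth-first search upwards from the failed item.
    seen, queue = set(), deque([failed_item])
    while queue:
        for parent in dependents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(impacted_by("disk:db01", DEPENDS_ON)))  # ['billing', 'databases', 'streaming']
```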
BSM => flexible incident management: data collection (only what is useful); delegated management, multi-tenancy, operator-centric views...; specific algorithms for aggregating and propagating severities; high-level notifications.
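Aggregation (combining the children's severities) and propagation (adjusting a node's severity before passing it to its parent) can be kept as separate, configurable rules. The rule names and behaviour below are a hypothetical sketch, not any particular tool's API.

```python
from statistics import mean

NORMAL, MINOR, MAJOR, CRITICAL = 0, 1, 2, 3  # hypothetical severity scale


def aggregate(child_severities, rule="worst"):
    """Combine the children's severities into the parent's severity."""
    if rule == "worst":
        return max(child_severities)
    if rule == "average":
        # Round to a severity level (Python's round() sends halves to even).
        return round(mean(child_severities))
    raise ValueError(f"unknown aggregation rule: {rule}")


def propagate(severity, rule="unchanged"):
    """Adjust a node's severity before it is passed up to its parent."""
    if rule == "unchanged":
        return severity
    if rule == "decreased":   # e.g. a non-essential component
        return max(severity - 1, NORMAL)
    if rule == "increased":   # e.g. a component on a critical path
        return min(severity + 1, CRITICAL)
    raise ValueError(f"unknown propagation rule: {rule}")


children = [NORMAL, CRITICAL, MINOR]
print(aggregate(children), aggregate(children, "average"))  # 3 1
print(propagate(CRITICAL, "decreased"))                     # 2
```

Notifying only when a business service's computed severity changes is one way to turn these rules into high-level notifications.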
Takes the IT you already have and adds to it the visibility and control of a unified platform
Use case: RealOpInsight. Versatile open source BSM toolkit; supports Nagios, Zabbix, Zenoss, Shinken, Centreon, Icinga, Naemon, GroundWork, op5...; multi-source, multi-backend; easy to deploy, easy to use (binaries, repositories, VM images); cross-platform (OS X, Windows, Linux), with GUI and web-based user interfaces. http://realopinsight.com
Flexible architecture thanks to API-based integration
At-a-glance dashboard: global overview of business service health; summary via thumbnails, tab status, tooltips and feeds of the latest open events; web UI fully compatible with mobile devices.
Detailed view: interactive tree view, map, event console and chart; custom severity aggregation and propagation rules...
Business views as descriptive and interactive dashboards: XML-based description, native WYSIWYG editor, dynamic dashboard generation.
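To show how an XML description can drive a dynamically generated view, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML schema (`view`, `service`, `item`, `calc`, `check`) is hypothetical and is not RealOpInsight's actual file format; the "dashboard" is simply an indented text outline.

```python
import xml.etree.ElementTree as ET

# Hypothetical business view description (not RealOpInsight's actual schema).
VIEW_XML = """
<view name="merchant-site">
  <service id="site" name="Merchant site" calc="worst">
    <service id="db" name="Databases" calc="best">
      <item id="db1" check="db01/mysql"/>
      <item id="db2" check="db02/mysql"/>
    </service>
    <item id="web" check="web01/http"/>
  </service>
</view>
"""


def outline(element, depth=0):
    """Yield an indented text outline of services and monitored items."""
    if element.tag == "service":
        yield "  " * depth + f"{element.get('name')} [{element.get('calc')}]"
    elif element.tag == "item":
        yield "  " * depth + f"{element.get('id')} <- {element.get('check')}"
    # The <view> root is not displayed, so its children stay at depth 0.
    child_depth = depth if element.tag == "view" else depth + 1
    for child in element:
        yield from outline(child, child_depth)


root = ET.fromstring(VIEW_XML.strip())
print("\n".join(outline(root)))
```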
Demo?
Rodrigue Chakode @ About.Me: PhD in HPC/Cloud; former HPC/Cloud engineer; presales in the broadcast area; DevOps, cloud & monitoring. Contributor: OpenNebula (addons: oneinsight, SVMSched). Author & leader: RealOpInsight Labs (http://realopinsight.com). Links: http://linkedin.com/in/rodriguechakode, http://slideshare.net/rodriguechakode, http://realopinsight.com, https://github.com/opennebula/addon-oneinsight, http://wiki.opennebula.org/ecosystem:svmsched
Thanks! Q&A? Monitor at scale with a business focus. @realopinsight