EM12c Monitoring Best Practices
Author: Rob Zoeteweij
Date: 13 October 2012
http://oemgc.wordpress.com

Some weeks ago I posted an article on my blog after attending Ana McCollum's presentation "Beyond the Basics: Making the Most of Oracle Enterprise Manager Monitoring" at OOW 2012. In this document I have elaborated my notes further to give a good overview of all topics discussed during the presentation. All credit goes to Ana! In my opinion this document could very well be the basis for your own EM12c Best Practices document.

I included some snippets of pictures I took of the slides during the presentation. They are a bit blurry (sorry for that), but I hope they will give a bit more understanding.

Creating the Administration Group Hierarchy
- Specify multiple values for the target property criteria
  o Example: with Database, Listener and ASM as values of one Target Type criterion, these targets belong to the same group instead of three separate groups
- Set the time zone when you define the group
  o The time zone is used for group operations and charts
  o All subgroups default to the same time zone
- After the hierarchy is created, you can:
  o Add or remove values for a target property (expand/shrink the hierarchy horizontally)
  o Add or remove target property criteria (add/remove a level)
    - The hierarchy will be deleted and re-created
    - Template collections will remain but will need to be re-associated
  o Rename any group (EMCLI rename_target)

How do I set Target Properties so Targets join Administration Groups?
- Set properties during the target addition/promotion workflow
  o Target Properties page in the console: Target menu > Target Setup > Properties
  o Possible property values are based on the Administration Group Hierarchy (new in Rel2)
  o Use EMCLI set_target_property_value to set the property values for multiple targets at once
- Aggregate targets
  o Cluster targets
    - A target property set on the cluster automatically applies to all members
  o Non-cluster aggregate targets
    - A target property set on the aggregate does not automatically apply to its members
    - Members can be part of different aggregate targets, so their properties need to be set explicitly
    - Templates are auto-applied only to members whose target properties match the group criteria (aka Direct Members)
    - To set a target property on an aggregate and its current members, use EMCLI set_target_property_value with -propagate_to_members

Example: set the Location property of a database system, including its members:

    emcli set_target_property_value -property_records="dbrac_sys:oracle_dbsys:location:Bangalore" -propagate_to_members

What Monitoring Settings will be applied to the Administration Group?
- Enhanced Group Management Settings (new in Rel2)
  o Use on LEAF groups
  o Shows parent groups/template collections
  o Review specific monitoring templates
  o Review combined monitoring settings from multiple templates
  o Verify whether management settings have been applied to the group

Are my Targets monitored using our Standards for Monitoring?
- Check the Synchronization Status region of the TOPMOST administration group
  o Shows the sync status of all targets in the hierarchy
- What to do, per value of the Sync Status column:
  o Synchronized Targets: nothing. The targets are in sync with their monitoring templates.
  o Pending Targets: ensure you have a Global Sync Schedule defined, as indicated by the Next Synchronization date; if it shows N/A, set a schedule.
  o Running Targets: nothing. Check later to see if they are all synchronized.
  o Failed Targets: drill down to get details and fix where possible. EM will attempt to re-sync on the next sync schedule, or on demand by the user.
  o N/A Targets: the targets have no associated monitoring template. Drill down to get the target type, then add a monitoring template for it to the template collection.

Privileges required for Monitoring Setup
You need to use super administrator to perform these actions.
Monitoring setup action: required privilege(s)
- Create Administration Group Hierarchy: FULL Any Target; Create Privilege Propagating Group
- Use Monitoring Templates: none to create; View on the specific Monitoring Template
- Use Template Collections: Create Template Collection; View/Full on specific Template Collections, or View Any Template Collection
- Associate Template Collection with Administration Group: Operator on the group; View on the Template Collection

Incident Management
- Manage by incidents
  o Significant events
  o Combination of events related to the same issue (e.g. events raised from database, host and storage, all indicating lack of space)
- Centralized incident management console
  o View, manage, diagnose and resolve incidents from one location
- Support for incident lifecycle operations
  o Assign, acknowledge, prioritize, track status, escalate, suppress
  o Notify and open helpdesk tickets
- Integrated Oracle expertise
  o Access to the My Oracle Support knowledge base
  o Accelerates incident and problem diagnosis and resolution

What Targets should be used in Rule Sets?
- Specify group(s)/systems
  o Specify administration group(s) if applicable
  o Rules keep up with changes in group membership
  o Example: all database targets whose Lifecycle Status = Mission Critical or Production
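To make the group mechanics from "Creating the Administration Group Hierarchy" and the rule targeting from "What Targets should be used in Rule Sets?" a bit more concrete, here is a small Python model. It is a hypothetical sketch, not an EM API: HIERARCHY, leaf_group_for and rule_applies are invented names, and it assumes group membership depends only on target property values, with one hierarchy level per property criterion. It shows why putting several values into one criterion keeps the number of groups down, and why a rule scoped to a group picks up new targets without being edited.

```python
from itertools import product

# Hypothetical two-level hierarchy: each level is one target property
# criterion, and each tuple of values forms ONE group at that level.
HIERARCHY = [
    ("Lifecycle Status", [("Mission Critical", "Production"), ("Test", "Development")]),
    ("Target Type", [("Database", "Listener", "ASM"), ("Host",)]),
]


def leaf_groups(hierarchy):
    """Every leaf group: one value tuple per level (Cartesian product)."""
    return list(product(*(value_sets for _, value_sets in hierarchy)))


def group_name(leaf):
    """Readable path such as 'Mission Critical+Production/Database+Listener+ASM'."""
    return "/".join("+".join(values) for values in leaf)


def leaf_group_for(properties, hierarchy):
    """Leaf group a target joins, or None if any property value is unmatched."""
    leaf = []
    for prop, value_sets in hierarchy:
        match = next((vs for vs in value_sets if properties.get(prop) in vs), None)
        if match is None:
            return None
        leaf.append(match)
    return tuple(leaf)


def rule_applies(rule_group, properties, hierarchy):
    """A rule scoped to a group (a path prefix) covers every target whose leaf
    group lies below it. Membership is re-evaluated on each call, so a target
    added later with matching properties is covered without editing the rule."""
    leaf = leaf_group_for(properties, hierarchy)
    return leaf is not None and leaf[: len(rule_group)] == tuple(rule_group)


if __name__ == "__main__":
    print(len(leaf_groups(HIERARCHY)))  # 4 (would be 8 with one group per target type)
    production = (("Mission Critical", "Production"),)
    new_listener = {"Lifecycle Status": "Production", "Target Type": "Listener"}
    print(rule_applies(production, new_listener, HIERARCHY))  # True
```

A target with a property value that is not part of the hierarchy (for example Lifecycle Status = Staging here) joins no group, which is exactly the kind of target that shows up as a gap when you check synchronization status.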
How do I organize my Rule Sets / Rules?
- Combine all rules applying to the same group in one rule set
- Leverage the order of rules within a rule set and group similar rules together:
  o Rules to create incidents
  o Rules to manage incidents (email, ticketing, escalation)
  o Put duration-based rules last
- Duplicate actions across rule sets
  o Create Incident: first rule wins (you can't create multiple incidents for the same event)
  o Incident workflow (assign, set priority, ...): last rule wins (the final value comes from the last rule)
  o Notifications: all actions are executed

What Type of Rules should I choose?
- Event Rule, best used to:
  o Create incidents based on events
  o Create helpdesk tickets for incidents
  o Send events to third-party management systems
  o Send email for specific events of interest (e.g. send email to business users if a target is down)
- Incident Rule, best used to:
  o Automate incident workflow operations (e.g. assign incident)
  o Send notifications on incidents
  o Create helpdesk tickets for incidents (e.g. create a ticket if the incident is escalated to level 2)
- Problem Rule, best used to:
  o Automate problem workflow operations (e.g. assignment, prioritization, etc.)
  o Send notifications on problems

What Conditions should I specify in Event Rules?
- Use broad criteria that span multiple target types
- Metric Alert event rule
  o Use broad criteria (e.g. all critical events, or critical events on specified target types) instead of individual metrics
    - Requires controlling metric alert thresholds at the source
    - Simplifies rule maintenance: no need to change the rule for new metrics
- Target Availability event rule
  o Based on the Status metric
  o Choose Agent Unreachable only for host and/or agent targets
  o Choose Down for all other targets

Target Availability Event Severities
(per scenario: target type, resulting target status, and event severity)
- Target is down
  o All target types except host and agent: status Down, severity Fatal
- Agent is down or unreachable
  o Agent: status Agent Unreachable, severity Critical
  o All non-agent targets, including the host: status Agent Unreachable, severity Warning
- Host is down or unreachable
  o Agent on the host: status Agent Unreachable, severity Critical
  o All targets on the host, including the host: status Agent Unreachable, severity Warning
- Blackout started on target
  o All target types: status Blackout, severity Advisory
- Target is up (from any of the other states)
  o All target types: status Up, severity Clear
- Target is in status pending for more than 5 minutes
  o All target types: status Unknown, severity Warning

What Conditions should I specify in Event Rules? (continued)
- Job Status event rule
  o No job events are raised unless you set this up
  o Setup > Incidents > Job Events
  o Choose the job statuses that raise events (Action Required and Problems are the defaults)
  o Select the targets on which job events are raised (tip: use groups)

Who gets notified for Events / Incidents?
- Checklist for email notifications
  o The recipient must have at least View on the source of the event
  o The recipient must have an email address and a notification schedule
  o You can also specify direct email addresses, including distribution lists
- Leverage TO: vs CC: email notification
  o TO: recipients: best used to enforce mandatory recipients of the email. Only the rule creator can add these
  o CC: recipients: best used for interested parties; users who self-subscribe to a rule are added to the CC line
- Take advantage of the page vs email classification
  o Enables easier setup for "page me for critical, email me for warning"
- Use variables as notification recipients: $INCIDENT_OWNER$, $TGT_OWNER$, $PROBLEM_OWNER$, $SOURCE_OBJ_OWNER$
  o Example: set up a single rule that sends email notifications to $INCIDENT_OWNER$ whenever that administrator is assigned an incident
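The "Duplicate actions across rule sets" behaviour from "How do I organize my Rule Sets / Rules?" is easier to reason about as code. The Python sketch below is a hypothetical model, not EM's rule engine: RuleActions and combine are invented names, and it assumes the matching rules are passed in the order EM evaluates them and that "last rule wins" applies per workflow attribute (owner, priority, ...).

```python
from dataclasses import dataclass, field


@dataclass
class RuleActions:
    """Actions one matching rule wants to take for the same event (hypothetical model)."""
    rule: str
    create_incident: bool = False
    workflow: dict = field(default_factory=dict)  # e.g. {"owner": ..., "priority": ...}
    notify: list = field(default_factory=list)    # recipients to notify


def combine(matching_rules):
    """Merge the actions of all matching rules, given in evaluation order."""
    incident_rule = None
    workflow = {}
    notifications = []
    for r in matching_rules:
        if r.create_incident and incident_rule is None:
            incident_rule = r.rule       # Create Incident: first rule wins
        workflow.update(r.workflow)      # workflow attributes: last rule wins
        notifications.extend(r.notify)   # notifications: all are executed
    return incident_rule, workflow, notifications


if __name__ == "__main__":
    rules = [
        RuleActions("DB rule set / rule 1", True, {"owner": "SCOTT", "priority": "High"}, ["dba@example.com"]),
        RuleActions("Ops rule set / rule 1", True, {"owner": "JANE"}, ["ops@example.com"]),
    ]
    print(combine(rules))
    # ('DB rule set / rule 1', {'owner': 'JANE', 'priority': 'High'}, ['dba@example.com', 'ops@example.com'])
```

The output shows why overlapping rule sets can surprise you: the incident comes from the first rule set, but the owner silently ends up being whatever the last rule set assigned, while both teams still get their email.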
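The notification checklist and the recipient variables in "Who gets notified for Events / Incidents?" can be summarised as a small filter. This is a hypothetical Python sketch of the logic as described above, not how EM resolves recipients internally: Admin, resolve, passes_checklist and recipients are invented names, and the mapping from a variable such as $INCIDENT_OWNER$ to a user is assumed to be supplied by the caller.

```python
from dataclasses import dataclass

# Recipient variables listed above; resolving them to users is modelled here
# with a plain dict supplied by the caller (hypothetical).
RECIPIENT_VARIABLES = {"$INCIDENT_OWNER$", "$TGT_OWNER$", "$PROBLEM_OWNER$", "$SOURCE_OBJ_OWNER$"}


@dataclass(frozen=True)
class Admin:
    name: str
    email: str = ""
    has_schedule: bool = False
    viewable_targets: frozenset = frozenset()


def resolve(recipient, context):
    """Replace a recipient variable by the EM user it refers to (None if unknown)."""
    return context.get(recipient) if recipient in RECIPIENT_VARIABLES else recipient


def passes_checklist(recipient, admins, source_target):
    """Direct addresses (including distribution lists) are sent as-is; an EM user
    needs View on the event source, an email address and a notification schedule."""
    if "@" in recipient:
        return True
    admin = admins.get(recipient)
    return bool(admin and admin.email and admin.has_schedule
                and source_target in admin.viewable_targets)


def recipients(to, cc, admins, context, source_target):
    """Return (TO, CC) after resolving variables and dropping failed recipients."""
    def clean(names):
        resolved = (resolve(n, context) for n in names)
        return [n for n in resolved if n and passes_checklist(n, admins, source_target)]
    return clean(to), clean(cc)


if __name__ == "__main__":
    admins = {
        "SCOTT": Admin("SCOTT", "scott@example.com", True, frozenset({"db1"})),
        "JANE": Admin("JANE", "jane@example.com", False, frozenset({"db1"})),  # no schedule
    }
    context = {"$INCIDENT_OWNER$": "SCOTT"}
    print(recipients(["$INCIDENT_OWNER$"], ["JANE", "dba-list@example.com"], admins, context, "db1"))
    # (['SCOTT'], ['dba-list@example.com'])
```

In the example JANE is silently dropped because she has no notification schedule, which is the most common reason someone "never gets the email".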
- Tailor the email message (Subject, Body) formats to your requirements
  o Setup > Notifications > Customize Email Format
  o Customize per event type, incident or problem email messages

Using Incident Rule Sets
- For rules with duration conditions, put more specific criteria in the rule
  o Rule: after 7 days, for all critical events, clear event
  o Better rule: after 7 days, for all critical Generic Alert Log Error events, clear event
- Leverage the failover feature for the SMTP gateway by specifying multiple gateways
- Set up repeat notifications for important incidents
  o Notifications repeat until the incident is cleared or acknowledged
  o Acknowledge the incident via the Console or Enterprise Manager Mobile

How to clear these Events / Incidents?
- Incidents auto-clear when all their underlying events are cleared
  o Most events auto-clear when the underlying condition is resolved
- Exception: manually-clearable events
  o emcli clear_stateless_alerts (bulk clear for metric alerts)
  o emcli get_metrics_for_stateless_alerts lists the manually clearable metric alerts
  o An event rule that clears events after a specified duration
    - Tip: put the specific metric alerts in the rule
  o Incident Manager: Clear (appears if applicable)
    - Clear multiple incidents (new in R2)
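The advice to put specific criteria into duration-based rules, and the fact that an incident only clears once all of its events are cleared, fit together in one small model. This Python sketch is hypothetical (Event, Incident and clear_after_duration are invented names, not EM objects); "Tablespace Space Used (%)" is used only as an illustrative second metric. It shows that the "better rule" clears the stale log error but leaves a real space problem alone, so an incident containing both stays open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    target: str
    metric: str
    severity: str
    raised: datetime
    cleared: bool = False


@dataclass
class Incident:
    events: list

    @property
    def cleared(self):
        # An incident auto-clears once ALL of its underlying events are cleared.
        return all(e.cleared for e in self.events)


def clear_after_duration(events, now, days=7, severity="Critical",
                         metric="Generic Alert Log Error"):
    """Duration-based rule with specific criteria: clear only open events of the
    given severity AND metric that are at least `days` old. Returns how many."""
    cutoff = now - timedelta(days=days)
    cleared = 0
    for e in events:
        if not e.cleared and e.severity == severity and e.metric == metric and e.raised <= cutoff:
            e.cleared = True
            cleared += 1
    return cleared


if __name__ == "__main__":
    now = datetime(2012, 10, 13)
    log_err = Event("db1", "Generic Alert Log Error", "Critical", datetime(2012, 10, 1))
    space = Event("db1", "Tablespace Space Used (%)", "Critical", datetime(2012, 10, 1))
    print(clear_after_duration([log_err, space], now))                # 1
    print(Incident([log_err]).cleared, Incident([log_err, space]).cleared)  # True False
```

With the broad version of the rule (all critical events) the space event would have been cleared too, and the incident would have closed while the problem was still there.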
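EM handles SMTP gateway failover itself once you list several gateways; the Python sketch below only illustrates the behaviour you get from that setting, using the standard smtplib module. send_with_failover and the gateway host names are placeholders invented for this example, and the in-order, first-success-wins strategy is an assumption about how such failover typically works.

```python
import smtplib
from email.message import EmailMessage


def smtp_send(host, port, message):
    """Deliver one message through one gateway using the standard library."""
    with smtplib.SMTP(host, port, timeout=10) as server:
        server.send_message(message)


def send_with_failover(message, gateways, send=smtp_send):
    """Try each (host, port) gateway in the listed order and return the host
    that accepted the message; raise only if every gateway fails."""
    errors = []
    for host, port in gateways:
        try:
            send(host, port, message)
            return host
        except OSError as exc:  # smtplib.SMTPException is a subclass of OSError
            errors.append(f"{host}:{port}: {exc}")
    raise RuntimeError("all SMTP gateways failed: " + "; ".join(errors))


if __name__ == "__main__":
    def flaky(host, port, message):
        # Simulated outage of the first gateway, so no network access is needed.
        if host == "smtp1.example.com":
            raise ConnectionRefusedError("gateway down")

    msg = EmailMessage()
    msg["Subject"] = "EM notification (test)"
    print(send_with_failover(msg, [("smtp1.example.com", 25), ("smtp2.example.com", 25)], send=flaky))
    # smtp2.example.com
```

The injectable `send` parameter is only there so the logic can be exercised without a mail server.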
Leveraging Incident Manager
- To filter on incidents of interest, create custom views on groups or by lifecycle status (new in R2)
- To enable more granular tracking of incident status, add new resolution status values (e.g. Waiting on SME)
  o emcli create_resolution_state
- Leverage the Resolved incident status as a soft close
  o Set this when the fix has been implemented
  o Enterprise Manager will set the status to Closed when the underlying event/incident is cleared

Maintain Priority processing of important Targets
- Set the Lifecycle Status target property, especially for important targets
  o Mission Critical, Production, Staging, Test, Development (from highest to lowest priority)
  o Used to prioritize the loading of data and metric alerts, and the processing of events for notifications, incident creation, etc.
  o Enables priority processing of important targets even as the number of managed targets increases
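The "Resolved as soft close" practice can be pictured as a tiny state machine. This Python sketch is hypothetical (TrackedIncident is an invented name, not an EM API); it assumes the built-in states are New, Work in Progress, Resolved and Closed, and that custom states such as "Waiting on SME" sit before Resolved. Only on_underlying_cleared reaches Closed, to mirror the practice described above; that is a modelling choice, not a statement about EM's permissions.

```python
class TrackedIncident:
    """Hypothetical model of incident resolution states (not an EM API)."""

    def __init__(self, custom_states=()):
        # Built-in states plus custom ones (e.g. "Waiting on SME") placed before Resolved.
        self.states = ["New", "Work in Progress", *custom_states, "Resolved", "Closed"]
        self.status = "New"

    def set_status(self, label):
        # Modelling choice: only the cleared event/incident moves this incident to Closed.
        if label not in self.states or label == "Closed":
            raise ValueError(f"status {label!r} cannot be set by hand in this sketch")
        self.status = label

    @property
    def is_open(self):
        # "Resolved" is a soft close: the incident stays open until it clears.
        return self.status != "Closed"

    def on_underlying_cleared(self):
        # Mirrors the behaviour described above: EM sets Closed once the issue clears.
        self.status = "Closed"


if __name__ == "__main__":
    inc = TrackedIncident(custom_states=("Waiting on SME",))
    inc.set_status("Waiting on SME")
    inc.set_status("Resolved")
    print(inc.status, inc.is_open)  # Resolved True
    inc.on_underlying_cleared()
    print(inc.status, inc.is_open)  # Closed False
```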
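How EM actually schedules its internal work is not described here, but the effect of the Lifecycle Status ordering can be pictured as a priority queue. The Python sketch below is an assumption-based illustration (EventQueue and LIFECYCLE_PRIORITY are invented names): events for Mission Critical targets are handled before Production ones and so on down to Development, arrival order is kept within the same status, and targets without a Lifecycle Status are assumed to come last.

```python
import heapq
from itertools import count

# Highest to lowest priority, as listed above.
LIFECYCLE_PRIORITY = ["Mission Critical", "Production", "Staging", "Test", "Development"]
RANK = {status: i for i, status in enumerate(LIFECYCLE_PRIORITY)}


class EventQueue:
    """Hypothetical queue that hands out events by target Lifecycle Status."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps arrival order within one status

    def push(self, target, lifecycle_status, event):
        # Assumption: targets without a (known) Lifecycle Status rank lowest.
        rank = RANK.get(lifecycle_status, len(LIFECYCLE_PRIORITY))
        heapq.heappush(self._heap, (rank, next(self._seq), target, event))

    def pop(self):
        _, _, target, event = heapq.heappop(self._heap)
        return target, event

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    q = EventQueue()
    q.push("dev1", "Development", "metric alert")
    q.push("prod1", "Production", "target down")
    q.push("mc1", "Mission Critical", "metric alert")
    print([q.pop()[0] for _ in range(len(q))])  # ['mc1', 'prod1', 'dev1']
```

The point of the sketch is the one made above: the more targets you manage, the more it matters that the important ones carry a Lifecycle Status, because untagged targets simply wait at the back.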