Best Practices for Monitoring Exadata
Enterprise Manager Cloud Control 12c

Farouk Abushaban
Senior Principal Technical Analyst
Oracle USA, Engineered Systems Support
September 2014

Copyright 2014, Oracle and/or its affiliates. All rights reserved.
Oracle Confidential Internal/Restricted/Highly Restricted

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Objectives
- Understand EM 12c Topology
- Agent Deployment Best Practices
- Component Level Monitoring
- Discovery Deep-Dive

Program Agenda
1. Quick Overview of EM 12cR4
2. Tour of Exadata Monitoring
3. Deep-Dive Into Database Machine Discovery
4. Challenges and Troubleshooting
5. Q&A Session

Section 1: EM OVERVIEW

Enterprise Manager Cloud Control 12c: Concepts
- System Management Software
- Lights-Out Monitoring and Notification
- Management and Administration
- Single GUI
- Centralized Management
- Target Administration
- Life-Cycle Management
- Automation
- Complete IT Monitoring
- Oracle Products
- Non-Oracle technologies
- Out-of-the-box Metrics and Alerting
- Real-time + Historical Perf. Trending
- Reports Publishing

Enterprise Manager Cloud Control 12c: Major Components
- Oracle Management Agent (EM Agent)
- Oracle Management Server (OMS)
- Oracle Management Repository (EMREP)
- Oracle Management Plug-Ins

[Architecture diagram: EM Console and EMCLI, Management Server, Repository Database, Agents]

Management Via Plug-Ins
- Provide specific management capabilities per target type
- Standard (default installed) Plug-Ins
- Install Plug-Ins for each product as needed
- Automatic Plug-In deployment during target discovery*
- Automated Plug-In updates via Plug-In Manager
  - Online or Offline
- Quarterly bundled updates since 12cR3 (12.1.0.3)

Exadata Plug-In

Exadata Plug-In 12.1.0.5 / 12.1.0.6
New Features and Enhancements
- Supported HW and SW
  - SPARC SuperCluster T5-8 server
  - 1/8 Rack and Multi-Rack
  - Expansion Rack, X4-2, 11.2.3.3, 12c GI
- IB performance and on-demand schematic refresh
- IORM active by default: CDB I/O with PDB breakdown
- SNMP support for non-public community strings
- Detailed summary of Flash and Spindle disk performance side-by-side
- Cell HW fine-grained performance monitoring
- Guided resolution for cell alerts
- And more... (see References slide for docs link)

Section 2: Exadata Component Monitoring Tour

Exadata Monitoring
- Install: Deploy Agents, Introduces Host Targets
- Discover: Discover and Configure Exadata DBM, Promote Monitored Targets
- Monitor: Customize Monitoring, Automate Tasks

Exadata Rack Components: What's Monitored?
- Database Servers
- Storage Servers
- InfiniBand Switches
- PDUs
- Cisco Switch
- KVM Switch

Agent Deployment
- Install on each compute node (Database Servers)

Exadata Monitoring: Deploy Agents
- Agents run on compute nodes only
  - Compute nodes are RAC host targets
- Monitor Exadata targets remotely
  - No additional software on Cells, IB switches, KVM, PDUs, Cisco, and ILOM
- Built-in failover monitoring via OMS mediation
  - Assign 2 agents per target
  - Only the master agent actively monitors the target
  - OMS switches to the backup agent when the current master agent is down

Add Host Target

Agent Installation Properties

Section 3: Exadata Discovery Deep-Dive

Exadata Guided Discovery
More than just discovery
- Even better with 12.1.0.6 exa*
- Specify Schematic
- Active pre-requisite check
- Sets up SSH user equivalence
- Subscribes to SNMP
- Supports re-discovery of newly added hardware components
- Assigns Primary and Backup agents to each component

Discover Exadata DBM

Storage Cell Discovery (From Compute Node)
- Runs: $ /usr/sbin/ibnetdiscover
  - Reads the cell hostnames and IP addresses from the output
- Pre-12.1.0.6 runs: $ORACLE_HOME/bin/kfod op=cellconfig
  - Reads /etc/oracle/cell/network-config/cellip.ora
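
To preview which cells the pre-12.1.0.6 path would see, the cell addresses can be listed straight from cellip.ora. This is a minimal sketch, not what the plug-in itself runs: it assumes each entry has the form cell="ip" or cell="ip1;ip2", and the function name list_cell_ips is hypothetical.

```shell
#!/usr/bin/env bash
# List the cell IP addresses found in a cellip.ora-style file, one per line.
# Assumption: entries look like cell="192.168.10.3" or cell="ip1;ip2".
list_cell_ips() {
  local file="${1:-/etc/oracle/cell/network-config/cellip.ora}"
  # Keep only the quoted value of each cell= line, then split on ';'.
  sed -n 's/^[[:space:]]*cell="\([^"]*\)".*$/\1/p' "$file" | tr ';' '\n'
}
```

Run `list_cell_ips` with no argument on a compute node to read the default path, or pass a copy of the file to check it elsewhere.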

InfiniBand Network Discovery

InfiniBand Network Discovery (From Compute Node)
- Runs: ssh nm2user@<ibswitch> ibnetdiscover
- Reads the IB switch names connected to the compute node
- Matches up the compute node vs. agent hostnames:
  - Agent URL: https://exa01db01.acme.com:3872/emd/main
  - ibnetdiscover entry: ca 2 H-00212800.. # exa01db01 S 192.168.HCA-3
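
The hostname match-up can be checked by hand before starting discovery. The sketch below assumes the short agent hostname should appear as a whole word in the ibnetdiscover node description, as in the entry on this slide (which is truncated on the slide itself). Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the host part from an agent URL such as https://host.domain:3872/emd/main
agent_host_from_url() {
  printf '%s\n' "$1" | sed -E 's#^[a-z]+://([^:/]+).*$#\1#'
}

# Succeed when the short form of the agent hostname appears as a whole word
# in the ibnetdiscover output supplied on stdin.
agent_in_ibnet() {
  local short="${1%%.*}"   # strip the domain, keep the short hostname
  grep -qwF -- "$short"
}
```

For example, `ssh nm2user@<ibswitch> ibnetdiscover | agent_in_ibnet "$(agent_host_from_url "$agent_url")"` returns success only when the node description and the agent hostname line up.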

Prerequisite Checks
You can manually run this pre-requisite check ahead of time from the compute node:
  $ORACLE_HOME/perl/bin/perl exadatadiscoveryprecheck.pl

Guided Discovery Wizard Summary
- Select agent (provide the RDBMS home path for pre-12.1.0.6)
- Exadata cells: runs ibnetdiscover (or kfod and cellip.ora for pre-12.1.0.6)
- InfiniBand switches and compute nodes: ibnetdiscover
- KVM, PDU, Cisco, and ILOM: through the schematic file on the compute node
- Automatically subscribes to SNMP (cells and IB switches)
- Agent mediation and target promotion

Monitoring Storage Cells
- EM Agent runs cellcli via ssh to collect Storage Cell metrics
- MS sends SNMP traps to the EM Agent for subscribed alert conditions
- Requires ssh equivalence for the cellmonitor user, set up with the Agent user
- Associates ASM targets and disk groups
- Collects rich storage data on the home page, plus:
  - Aggregate storage metrics
  - Cell alerts via SNMP (PUSH)
  - Capacities
  - IORM consumer and DB level metrics
  - And much more

Monitoring InfiniBand Switches
- EM Agent runs remote ssh calls to the switch to collect metrics
- IB switch sends SNMP traps (PUSH) for some alerts
- Requires ssh equivalence for nm2user for metric collections such as:
  - Response
  - Various sensor status: Fan, Voltage, Temperature
  - Port performance data
  - Port administration

Monitoring Cisco Switch
EM Agent runs remote SNMP get and push to collect metric data against the Cisco switch.
- Status / Availability
- Port status
- Vital signs: CPU, Memory, Power, Temperature
- Various network interface data
  - Incoming traffic errors, traffic kb/s and %
  - Outgoing traffic errors, traffic kb/s and %
  - Admin and Operational bandwidth Mb/s

Monitoring ILOM Targets
- EM Agent runs remote ipmitool and SNMP calls to each Compute Node ILOM target
- Requires nm2user credentials to run ipmitool
- Runs collections via perl script wrappers and collects:
  - Response availability
  - Sensor alerts: Temperature, Voltage, Fan speeds
  - Configuration data: Firmware version, Serial number, etc.

Monitoring Power Distribution Units
EM Agent runs remote SNMP get calls and receives SNMP traps (PUSH) from each PDU
- Response / Ping status
- Phase values

Monitoring KVM Switch
EM Agent runs remote SNMP get calls and receives SNMP traps (PUSH) from the KVM switch
- Status / Response
- Reboot events
- Temperature
- Fan status
- Power state
- Factory settings

Section 4: Challenges and Troubleshooting

Challenges
- Redeployment of a rack: DEV to UAT to PROD, etc.
- Partitioning a full rack into smaller independent racks:
  - Full rack >> One ½ rack + two ¼ racks
- Combining partitioned racks into a larger rack:
  - Two ½ racks >> Full rack
  - Two ¼ racks >> One ½ rack, etc.

- Update existing OneCommand configurations
- Generate new schematic files for each partitioned rack
- Generate new OneCommand configuration to consolidate racks

Discover Exadata DBM

Challenges: Discovery
Adding new hardware
- Expanding ½ or ¼
- Adding storage cells
- Attaching additional rack
- Attaching Storage Expansion rack
- Adding spine switch, etc.

Challenges: Networking
Network configuration changes
- Re-IP some or all components
- Domain name changes
- Hostname changes
- Subnet changes
- Additional backup network / NICs
- Additional listeners (IB listeners or TNS)
- Firewall rules
- Etc.

Troubleshooting: Discovery Issues
Compute node not managed by EM
Check
- Agent hostname is different from compute node hostname
- Wrong agent used for discovery
- Reset the compute node name from client to management or vice-versa
- Short hostname used for agents?
Fix
- Run # ibnetdiscover and match up to the agent hostname
- Select the compute node agent for discovery
- Run # /usr/sbin/set_nodedesc.sh
- Re-install agents using the fully-qualified hostname <hostname.domain>
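
The hostname checks above reduce to comparing the name the agent was installed with against the compute node's fully qualified name. A minimal sketch follows; the function name and the three result labels are hypothetical, and the comparison is case-sensitive for simplicity.

```shell
#!/usr/bin/env bash
# Classify an agent hostname against the compute node's fully qualified name.
# Prints one of: ok, short-name, mismatch
classify_agent_host() {
  local agent_host="$1" node_fqdn="$2"
  if [ "$agent_host" = "$node_fqdn" ]; then
    echo ok            # agent was installed with the fully qualified name
  elif [ "$agent_host" = "${node_fqdn%%.*}" ]; then
    echo short-name    # agent uses the short hostname: re-install with the FQDN
  else
    echo mismatch      # different name, e.g. client vs. management hostname
  fi
}
```

On a compute node this could be called as `classify_agent_host "$agent_host" "$(hostname -f)"`, where `$agent_host` is the host part of the agent URL.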

Troubleshooting: Discovery Issues
Extra or missing components in a new DBM
Check
- Examine extra components for DBM membership
- Which schematic file was used for discovery?
- Missing components
- Need to generate a new schematic file
Fix
- De-select extra components manually from the discovered list
- Ensure that EM can read the latest xml file on the compute node
- Check schematic file content
- Log an SR and provide details

Troubleshooting: Discovery Issues
Discovery just hangs
Check
- Examine network
- OMS reported errors
- Repository issues
- Agent logs
Review / Fix
- Hostname resolution
- Accessibility from OMS to Agent(s)
- Execute a simple job from the console
- MW_HOME/gc_inst/sysman/log/emoms.log
- Repository database alert.log
- $AGENT/agent_inst/sysman/log/gcagent.log
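
When discovery hangs, a quick pass over the logs listed above often narrows things down. The sketch below assumes error entries in these logs contain the literal string ERROR. The function name recent_errors is hypothetical, and the log paths are passed in by the caller rather than hard-coded.

```shell
#!/usr/bin/env bash
# Print the last N lines containing ERROR from each readable log file given.
# Assumption: error entries carry the literal string "ERROR".
recent_errors() {
  local n="$1"; shift
  local f
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "skip (not readable): $f"
      continue
    fi
    echo "== $f"
    grep -F 'ERROR' -- "$f" | tail -n "$n"
  done
}
```

A typical call would be `recent_errors 20 /path/to/emoms.log /path/to/gcagent.log`, using the log locations for your own OMS and agent installs.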

Troubleshooting: Schematic Issues
Schematic page blank
- Check for browser support and EM 12c
- Run through discovery again and watch for messages
- Check emoms.log for exceptions at the same time
Components missing
- Add manually to the schematic page (Edit button)
- Check for component presence in EM (is it monitored?)

Troubleshooting: Target Status Issues
Target status shows DOWN inaccurately
- Cell: Check ssh equivalence (cellmonitor user)
    ssh -i /home/oracle/.ssh/id_dsa -l cellmonitor <cell name> cellcli -e list cell
  - Output should be: <cell name>
- PDU: Check for access to the browser console of the PDU
    http://<pdu name>
  - Is it connected to the LAN?
- Cisco: Check for proper SNMP subscriptions
  - See Exadata Management Doc, Post Discovery
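
The cell check above can be wrapped so that it fails fast instead of prompting for a password when ssh equivalence is broken. This is a sketch under assumptions: the key path is the one shown on the slide, BatchMode is a standard OpenSSH client option, the first field of the `list cell` output is taken to be the cell name, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Run "cellcli -e list cell" on one cell as cellmonitor, without password prompts.
cell_list() {
  local cell="$1" key="${2:-/home/oracle/.ssh/id_dsa}"
  ssh -o BatchMode=yes -i "$key" -l cellmonitor "$cell" cellcli -e list cell
}

# Succeed when the first field of the first output line equals the expected cell name.
cell_output_ok() {
  local expected="$1" output="$2"
  [ "$(printf '%s\n' "$output" | awk 'NR==1 {print $1}')" = "$expected" ]
}
```

For example: `out=$(cell_list exa01cel01 2>&1); cell_output_ok exa01cel01 "$out" || echo "ssh equivalence problem: $out"`.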

Troubleshooting: Metric Collections
Target status shows Metric Collection Error
- Hover over the icon or navigate to Incident Manager
- Read the full text of the error
- Visit the Target Setup >> Monitoring Configuration page and examine
- Trigger a new collection: Target menu > Configuration > Last Collected > Actions > Refresh
- Access the monitoring Agent Metric Browser: https://<agent URL>/emd/browser/main
- Click the target >> click Response and evaluate the results / log an SR
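
The Metric Browser address can be built from the agent's host and port. A minimal sketch, assuming the agent listens on port 3872 as in the discovery example (https://exa01db01.acme.com:3872/emd/main); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Build the Agent Metric Browser URL from the agent host and an optional port.
# Assumption: 3872 is used when no port is given, as in the deck's example URL.
metric_browser_url() {
  local host="$1" port="${2:-3872}"
  printf 'https://%s:%s/emd/browser/main\n' "$host" "$port"
}
```

Open the printed URL in a browser and log in with the agent credentials to reach the per-target Response metric.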

Troubleshooting: Pending Status
Cellsys target in Pending status forever
- Must have Cluster ASM, Database, and Storage Cell association
- Check / fix the status of the associated target database
- Check / fix the status of the associated target ASM cluster
- Ensure UP status for all cell server targets
- Delete unassociated cellsys targets
- Check for problematic DBMS_JOBS in the repository database

Troubleshooting: Pending Status
Database Machine target or any associated components in Pending status
- Check for duplicate or pending delete targets: Setup >> Manage Cloud Control >> Health Overview
- Check target configuration: Target Setup >> Monitoring Configurations
- Search for the target name in the agent or OMS logs:
    $ grep <target name> gcagent.log or emoms.log
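
The log search can be run across both logs in one pass to see where the target is mentioned at all. A minimal sketch with a hypothetical function name; the target name is matched as a fixed string, so dots in hostnames are not treated as wildcards.

```shell
#!/usr/bin/env bash
# Report how many lines in each readable log mention the given target name.
find_target_in_logs() {
  local target="$1"; shift
  local f
  for f in "$@"; do
    [ -r "$f" ] || continue
    printf '%s: %s matching lines\n' "$f" "$(grep -cF -- "$target" "$f")"
  done
}
```

A log that reports zero matches can be skipped. For the others, follow up with `grep -F "<target name>" <log>` to read the entries themselves.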

Troubleshooting: Maintenance
EMDiag
- Download and install the latest version
- Always check for the latest repvfy drop (Note 1426773.1)
- Run: repvfy verify exadata -level 9 -details
- Run: repvfy verify
  - This summarizes all critical / fatal issues in the repository
- Share the output with Support and explain the symptoms

Summary
What we covered today
- EM 12c Topology and Design
- Agent Deployment Best Practices
- Component Monitoring / Discovery
- First-aid Troubleshooting tips

References
Documentation Libraries, Notes, etc.
- Exadata Management Online Docs
- Plug-In Manager - EM Cloud Control Admin Guide
- Exadata Plug-In BP: Note 1613177.1
- Database Plug-In BP: Note 1580350.1
- Exadata Monitoring Patch Requirement: Note 1323298.1

Learn More
Available References and Resources to Get Proactive
- About Oracle Support Best Practices: www.oracle.com/goto/proactivesupport
- Get Proactive in My Oracle Support: https://support.oracle.com, Doc ID: 432.1
- Get Proactive Blog: https://blogs.oracle.com/getproactive/
- Ask the Get Proactive Team: get-proactive_ww@oracle.com

Questions & Answers

Drinks. Food. Fun.
My Oracle Support Monday Mix
Tonight! Monday, September 29, 6:00 to 8:00 p.m.
ThirstyBear Brewing Company (only ½ block from Moscone Center)
Join us for a relaxing Happy Hour after a busy day at Oracle OpenWorld!
- Take a break and unwind with your peers
- Get to know the Oracle support engineers you depend on
- Meet My Oracle Support executives and developers
- Enjoy drinks and hors d'oeuvres
Admission is free with your Oracle OpenWorld badge
Event details at: www.oracle.com/goto/mondaymix

THANK YOU