Linux Managing your server with service and productivity tools
Note Before using this information and the product it supports, read the information in Notices on page 33. Fifth Edition (October 2014) Copyright IBM Corporation 2010, 2014. US Government Users Restricted Rights Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

Managing your server with service and productivity tools
  What's new in Managing your server with service and productivity tools
  Installing service and productivity tools
    Installing tools by using a server package manager
    Installing specific tools by using the rpm command
    Installing tools by using the IBM Installation Toolkit
  Displaying package man pages
  Middleware and infrastructure
    DynamicRM
    ServiceRM
  Service and productivity tools
    Hardware inventory
    Inventory scout
    Platform diagnostics (ppc64-diag)
    Extended error handling
    servicelog
    Environmental and power management features
    Service aids
    Handling errors in guests with FWNMI
    IBM Performance Management for Power Systems
    Large page analysis
    IBM Power RAID adapter utilities (iprutils)
    IBM Electronic Service Agent
  Diagnosing RTAS events
    Displaying RTAS events in the kernel ring buffer
    Displaying RTAS events in the servicelog database
    Displaying RTAS events logged in the /var/log/platform file
  Collecting support data
  Running stand-alone diagnostics on a PowerKVM system
  Getting support
  Online version
Notices
  Trademarks
  Code license and disclaimer information
Managing your server with service and productivity tools

Linux on Power service and productivity tools are also known as RAS (reliability, availability, and serviceability) tools. This topic collection provides information about hardware service diagnostic aids, productivity tools, and installation aids for Linux operating systems on IBM servers based on POWER8, POWER7, POWER6, and POWER5 technology.

Note: By using the code examples, you agree to the terms of the Code license and disclaimer information on page 31.

What's new in Managing your server with service and productivity tools

This edition introduces the following new topics.

IBM Serviceable Event Provider for PowerKVM systems
    The IBM Serviceable Event Provider identifies serviceable problems and sends the corresponding SNMP traps. It also provides a registration mechanism for receiving the SNMP traps. See IBM Serviceable Event Provider for IBM PowerKVM Systems for more information.

Extended error handling capabilities
    On systems running IBM PowerKVM with OPAL firmware, you can take advantage of extended error handling (EEH) capabilities for detecting and reporting a variety of PCI bus error conditions. See Extended error handling on page 12 for more information.

Firmware assisted non-maskable interrupts
    On systems running IBM PowerKVM virtualization, you can use firmware assisted non-maskable interrupts (FWNMI) for error handling and recovery of machine checks in guests. See Handling errors in guests with FWNMI on page 19 for more information.

Environmental and power management features
    POWER8 systems with OPAL firmware provide new features for safeguarding your system's environment and power supply. See Environmental and power management features on page 14 for details.
Reliability, availability, and service tools for PowerKVM environments
    Statements of support and new tools for PowerKVM environments were added to the following topics:
    - Hardware inventory on page 7
    - Inventory scout on page 8
    - IBM Power RAID adapter utilities (iprutils) on page 20
    - Platform diagnostics (ppc64-diag) on page 9
    - Service aids on page 15
Stand-alone diagnostics
    Use these instructions to diagnose hardware problems on a system using IBM PowerKVM virtualization. The stand-alone diagnostics tool cannot be run while the system is in PowerKVM mode, so the instructions guide you through changing the mode from PowerKVM to PowerVM, running the diagnostics tool, and changing the mode back to PowerKVM. See Running stand-alone diagnostics on a PowerKVM system on page 28 to learn more.

Installing service and productivity tools

You can install service and productivity tools by using your server package manager, the rpm command, or the IBM Installation Toolkit.

Installing tools by using a server package manager

You can install service and productivity tools from the IBM Linux on Power tools repository by using your server package manager.

Note: On servers connected to the internet but without graphics support, you might find that using a line-mode web browser like w3m or lynx simplifies downloading files directly to your server.

See the instructions for installing packages using the IBM Linux on Power tools repository in the IBM Linux on Power tools repository (http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaae/liaaetoolsrepository.htm) topic. You can use the repository to install all service tools, or a subset of the tools.

For information about service and productivity tools package support, see Getting support on page 30.

Installing specific tools by using the rpm command

You can install specific service and productivity tools from the IBM Tools Repository by using the rpm command. Tool availability can vary by server type and Linux distribution.

Note: On servers connected to the internet but without graphics support, you might find that using a line-mode web browser like w3m or lynx simplifies downloading files directly to your server.

Complete the following steps:

1. Review this package prerequisite diagram to understand which packages to obtain: https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/flowchart.pdf.

2. Obtain the RPM package files for specific tools and their prerequisites as described at one of the following sites corresponding to your server type and Linux distribution. Tool packages must be installed in the order listed in the site table.

   Blade server
       Red Hat Enterprise Linux 6
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/redhat/bladecenter/rhel6.html
       Red Hat Enterprise Linux 5
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/redhat/bladecenter/rhel5.html
       SUSE Linux Enterprise Server 11
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/suselinux/bladecenter/sles11.html
       SUSE Linux Enterprise Server 10
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/suselinux/bladecenter/sles10.html

   HMC-managed or IVM-managed server
       Red Hat Enterprise Linux 6
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/redhat/hmcmanaged/rhel6.html
       Red Hat Enterprise Linux 5
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/redhat/hmcmanaged/rhel5.html
       SUSE Linux Enterprise Server 11
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/suselinux/hmcmanaged/sles11.html
       SUSE Linux Enterprise Server 10
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/suselinux/hmcmanaged/sles10.html

   Stand-alone server
       Red Hat Enterprise Linux 6
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/redhat/other/rhel6.html
       Red Hat Enterprise Linux 5
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/redhat/other/rhel5.html
       SUSE Linux Enterprise Server 11
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/suselinux/other/sles11.html
       SUSE Linux Enterprise Server 10
           https://www14.software.ibm.com/webapp/set2/sas/f/lopdiags/suselinux/other/sles10.html

   Note: For Ubuntu packages, see the Ubuntu Package Search page. Ubuntu packages are provided by the distribution in .deb format. RSCT packages are not currently available for the Ubuntu distribution.

3. If the package files are in compressed gzip format (<filename>.gz), uncompress them by entering the following command for each file:

       gunzip <filename>.gz

   Otherwise, continue to the next step.

4. Install the RPM files by entering the following command for each file:

       rpm -Uvh <filename>.rpm

For information about service and productivity tools package support, see Getting support on page 30.

Installing tools by using the IBM Installation Toolkit

You can install service and productivity tools by using the IBM Installation Toolkit.
For information about installing service and productivity tools by using the IBM Installation Toolkit while installing Linux on your server, see Installing a Linux distribution (http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaan/ppwelcomeinstalllinux.htm).
For information about installing service and productivity tools by using the IBM Installation Toolkit when Linux is already installed on your server, see Updating an installed system (http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaan/ppcreaetenetwork.htm).

For an overview of the IBM Installation Toolkit, see Introducing IBM Installation Toolkit (http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaan/ppintroduction.htm).

Displaying package man pages

This topic explains how to display man pages provided by packages installed on your system.

The version of a service and productivity tools package can vary between Linux distributions. You might also want to install the latest version of an open source package that is available online. Given this variance and flexibility, the best source of accurate and relevant information for the commands in a package is the set of man pages installed with the commands.

To determine the man pages provided by a package installed on your system, list the file content of the package by running rpm -ql package-name. Man page path names typically contain /man/:

    # rpm -ql servicelog
    /usr/bin/log_repair_action
    /usr/bin/servicelog
    /usr/bin/servicelog_manage
    /usr/bin/servicelog_notify
    /usr/bin/v1_servicelog
    /usr/bin/v29_servicelog
    /usr/sbin/slog_common_event
    /usr/share/doc/packages/servicelog
    /usr/share/doc/packages/servicelog/copying
    /usr/share/man/man8/log_repair_action.8.gz
    /usr/share/man/man8/servicelog.8.gz
    /usr/share/man/man8/servicelog_notify.8.gz

You can display the man pages you discover with the man command:
    # man log_repair_action
    LOG_REPAIR_ACTION(8)        POWER Diagnostic Tools        LOG_REPAIR_ACTION(8)

    NAME
           log_repair_action - create a log entry to indicate that a device was repaired

    SYNOPSIS
           /usr/sbin/log_repair_action -l location-code [-q]
           /usr/sbin/log_repair_action -l location-code -d date [-q]

    DESCRIPTION
           The log_repair_action command creates an entry in the error log to
           indicate that the device at the specified location code has been
           repaired. When viewing a list of platform errors, all errors on the
           device at the specified location code prior to the specified date
           will be considered closed (fixed).

    OPTIONS
           -l location-code or --location="location-code"
                  Specify the lcoation code of the device which was repaired.

           -d date or --date="date"
                  Specify the date and time on which the device was repaired.
                  If not specified, defaults to the current date/time.

           -q or --quiet
                  Do not prompt for confirmation or print error messages.

    AUTHOR
           Written by Michael Strosaker (strosake@austin.ibm.com)

    SEE ALSO
           servicelog(8) sysdiag(8)

    Linux                          February 2005          LOG_REPAIR_ACTION(8)
    :

The following example script simplifies discovering and displaying the man pages provided by a package. It takes a required package-name argument, displays a selection list of the man pages that the package provides, and then displays the man page that you select.

Note: By using the code examples, you agree to the terms of the Code license and disclaimer information on page 31.

    #!/bin/bash
    if [ -z "$1" ]; then
        echo "missing package-name argument"
        exit 1
    fi

    # Discover the man pages in the package. "sort -u" to keep only unique
    # instances in case there are duplicate pages in different languages.
    # Each page is listed as name[section], for example servicelog[8].
    man_pages=$(rpm -ql "$1" | sed -n 's/^.*\/\(.*\)\.\(\w\)\.gz$/\1[\2]/p' | sort -u)

    # Disable globbing so that the [section] suffixes are not treated as
    # file name patterns when the list is expanded.
    set -f

    # Show a selection list. Pass a selected man page to the man command
    # as "section name".
    PS3="Select man page: "
    select man_page in $man_pages; do
        man $(echo "$man_page" | sed 's/^\(.*\)\[\(.*\)]$/\2 \1/')
    done

The following example shows running the script to list the man pages in the servicelog package, and then displaying the log_repair_action man page. It assumes that the script was copied and pasted into a file named pkg-man in the current directory, and that the execute permissions of the pkg-man file are set.
    # ./pkg-man servicelog
    1) log_repair_action[8]
    2) servicelog[8]
    3) servicelog_notify[8]
    Select man page: 1          (<enter> is pressed here)
    LOG_REPAIR_ACTION(8)        POWER Diagnostic Tools        LOG_REPAIR_ACTION(8)

    NAME
           log_repair_action - create a log entry to indicate that a device was repaired

    SYNOPSIS
           /usr/sbin/log_repair_action -l location-code [-q]
           /usr/sbin/log_repair_action -l location-code -d date [-q]

    DESCRIPTION
           The log_repair_action command creates an entry in the error log to
           indicate that the device at the specified location code has been
           repaired. When viewing a list of platform errors, all errors on the
           device at the specified location code prior to the specified date
           will be considered closed (fixed).

    OPTIONS
           -l location-code or --location="location-code"
                  Specify the lcoation code of the device which was repaired.

           -d date or --date="date"
                  Specify the date and time on which the device was repaired.
                  If not specified, defaults to the current date/time.

           -q or --quiet
                  Do not prompt for confirmation or print error messages.

    AUTHOR
           Written by Michael Strosaker (strosake@austin.ibm.com)

    SEE ALSO
           servicelog(8) sysdiag(8)

    Linux                          February 2005          LOG_REPAIR_ACTION(8)
    :

Middleware and infrastructure

Some service and productivity tools use middleware that provides an infrastructure for accessing system data, logs, and events.

DynamicRM

Dynamic Resource Manager (DynamicRM) is a Reliable, Scalable, Cluster Technology (RSCT) resource manager.

Note: DynamicRM cannot be used on systems using IBM PowerKVM virtualization.

DynamicRM allows a Hardware Management Console (HMC) to do the following tasks:
- Dynamically add or remove processors, memory, or I/O slots from a running partition.
- Concurrently update system firmware.
- Perform certain shutdown operations on a partition.
- Migrate a partition from POWER6 to POWER7 processor-based servers.
- Enable end-to-end virtual device view.

The end-to-end virtual device view shows how virtual disks map to disk names in Linux, and how virtual Ethernet devices map to Linux Ethernet interfaces.

The DynamicRM package does not ship any user commands. The package depends upon RSCT.
ServiceRM

Service Resource Manager (ServiceRM) is a Reliable, Scalable, Cluster Technology (RSCT) resource manager. ServiceRM creates serviceable events from the output of platform diagnostics (ppc64-diag). ServiceRM sends these events to the Service Focal Point on the Hardware Management Console (HMC) or the Integrated Virtualization Manager (IVM).

Note: ServiceRM cannot be used on systems using IBM PowerKVM virtualization.

The ServiceRM package does not ship any user commands. The package depends upon RSCT.

Service and productivity tools

Several service and productivity tools are available.

Hardware inventory

Hardware inventory provides a simple way of finding basic information about your installed hardware. Hardware includes processors, memory, serial ports, parallel ports, power supplies, fans, graphics adapters, network adapters, and SCSI and IDE devices such as disks.

Hardware inventory consists of several commands that you can use to gather data about your hardware. You can view this data yourself, or higher-level serviceability tools can access it.

The commands access the Vital Product Data (VPD) database. The default VPD database is the /var/lib/lsvpd/vpd.db file. You can direct the commands to access other VPD files that contain previous hardware inventory databases that the system has replaced.

Some systems feature dynamic VPD. The commands access the dynamic VPD only when started by the root user.

The utilities described here are supported in the following Linux distributions and virtualized environments:

Table 1. Support for hardware inventory utilities
Utility libvpd Library lscfg lsmcode PowerVM partition on any level of Power processor POWER8 PowerKVM support Supported on host, not applicable on guests Supported on host, not applicable on guests Supported on host, not applicable on guests
Table 1. Support for hardware inventory utilities (continued)
Utility lsvio lsvpd vpdupdate PowerVM partition on any level of Power processor v Ubuntu POWER8 PowerKVM support Not supported Supported on host, not applicable on guests Supported on host, not applicable on guests

For Linux distributions currently supported on Power systems, see Linux on Power overview.

Hardware inventory commands are provided by the lsvpd package. The commands that are typically included are:

vpdupdate
    Update the VPD database.
    Note: The information that the other commands in the lsvpd package provide is correct only if the vpdupdate command is run after any changes are made to the system configuration. If you are unsure whether any changes were made, run the vpdupdate command.
lscfg
    List hardware configuration information for the system and its components.
lsmcode
    List hardware microcode and firmware levels.
lsvio
    List virtual I/O adapters and devices.
lsvpd
    List VPD for the system and its components.

The commands that are provided by this package, and their features and usage, might vary by distribution and release. Consult the man pages on your system for the most accurate description of their features and usage. For more information about how to list and display the man pages for commands that are provided by this package, see Displaying package man pages on page 4.

For more information about the lsvpd package, see lsvpd: Utility to List Device Vital Product Data (VPD) (http://linux-diag.sourceforge.net/lsvpd.html).

Inventory scout

Inventory scout surveys the system for Vital Product Data (VPD).

Inventory scout commands are provided by the IBMinvscout package. The commands that are typically included are:

invscout
    Write the current system VPD to a VPD survey file.
The utilities described here are supported in the following Linux distributions and virtualized environments:

Table 2. Support for inventory scout utilities
Utility invscout Linux distributions running on a PowerVM partition on any level of Power processor POWER8 PowerKVM support Not applicable

For Linux distributions currently supported on Power systems, see Linux on Power overview.

The commands that are provided by this package, and their features and usage, might vary by distribution and release. Consult the man pages on your system for the most accurate description of their features and usage. For more information about how to list and display the man pages for commands that are provided by this package, see Displaying package man pages on page 4.

Platform diagnostics (ppc64-diag)

Platform diagnostics report firmware events, provide an automated response mechanism to urgent events, and provide event notifications to system administrators and service frameworks.

The utilities described here are supported in the following Linux distributions and virtualized environments:

Table 3. Support for ppc64-diag utilities
Utility rtas_errd PowerVM partition on any level of Power processor POWER8 PowerKVM support distributions on guests: v Ubuntu opal_errd Not applicable Host only. Not applicable on guests. opal-elog-parse Not applicable Host only. Not applicable on guests. opal-dump-parse Not applicable Host only. Not applicable on guests. diag_encl Not applicable encl_led usysident Not applicable Not supported
Table 3. Support for ppc64-diag utilities (continued)
Utility usysattn ppc64-diag Error Log Analyzer (ELA) PowerVM partition on any level of Power processor POWER8 PowerKVM support Not supported Not supported

For Linux distributions currently supported on Power systems, see Linux on Power overview.

Platform diagnostics for systems using PowerVM virtualization

The platform diagnostics rtas_errd daemon logs platform events that are detected by firmware to servicelog. Platform events are also known as RTAS events. The rtas_errd daemon might also take additional action on certain types of events, such as failures of fans or power supplies. It is configured to start automatically when Linux boots.

Platform diagnostics commands and the rtas_errd daemon are provided by the ppc64-diag package. The commands that are typically included are:

explain_syslog
    Read a file (or stdin) that is in the format that is produced by the syslogd daemon, and print an explanation for each line that matches a message in the /etc/ppc64-diag/message_catalog message catalog. The explanations include probable cause and recommended action. If run with the -M flag, the command reads from the /var/log/messages file. For example:

        explain_syslog -M

syslog_to_svclog
    Read a file (or stdin) that is in the format that is produced by the syslogd daemon, and log an event to the servicelog database for each line that matches a message in the /etc/ppc64-diag/message_catalog message catalog. It is not automatically started when Linux boots. If run in the background with the -M flag, it continuously monitors the /var/log/messages file. For example:

        syslog_to_svclog -M &

usysident
    Use this utility to operate device identification, or to view and modify system identification indicators. This utility was previously in the powerpc-utils package, and now resides in the ppc64-diag package as of SUSE Linux Enterprise Server 11 SP3.
usysattn
    If you run the usysattn utility without arguments, the system prints a list of all of the attention indicators on the system along with their current status (on or off). This utility was previously in the powerpc-utils package, and now resides in the ppc64-diag package as of SUSE Linux Enterprise Server 11 SP3.

Enclosure diagnostics (diag_encl)

As of SUSE Linux Enterprise Server 11 SP3, you can use additional options to diagnose problems on the 5888 PCIe storage enclosure. The diag_encl utility is contained in the ppc64-diag package.

The diag_encl utility can be run as part of a Linux CRON job (recommended), or run independently. For more information on setting up a CRON job, including the diag_encl utility, see Connecting and configuring the disk drive enclosure in a system running Linux (http://www.ibm.com/support/knowledgecenter/power7/p7ham/scsidiskdriveenclosurelinux.htm).

Run the following command to access enclosure diagnostics as part of a CRON job:

    /usr/sbin/diag_encl -scl

Options for the diag_encl utility include the following:
- -h: Print the help message.
- -s: Generate serviceable events for any failures and write events to the service log.
- -c: Compare with previous status and report only new failures.
- -l: Turn on fault LEDs for serviceable events.
- -v: Verbose output.
- -V: Print the version of the command and exit.
- -f: For testing, read SCSI enclosure services (SES) data from path.pg2 and VPD from path.vpd.
- <scsi_enclosure>: The SCSI generic (sg) device on which to operate, such as sg7. If you do not specify a device, all such devices are diagnosed.

For more information, see the 5888 PCIe storage enclosure topic (http://www.ibm.com/support/knowledgecenter/power7/p7ham/p7ham_5888_kickoff.htm).

Note: You can also use the diag_encl utility on the IBM TotalStorage EXP24 Ultra320 SCSI Expandable Storage Disk Enclosure (7031).

Light path diagnostics

Light path diagnostics is a system of light emitting diodes (LEDs) on various external and internal components of the server. When an error occurs, LEDs are lit throughout the server.

Use the following utilities to gather information about light path diagnostics:

usysident
    Use this utility to view and turn on or off the indicators that identify devices on Power systems. This utility was previously in the powerpc-utils package, and now resides in the ppc64-diag package as of SUSE Linux Enterprise Server 11 SP3.
usysattn
    If you run the usysattn utility without arguments, the system prints a list of all of the attention indicators on the system along with their current status (on or off). This utility was previously in the powerpc-utils package, and now resides in the ppc64-diag package as of SUSE Linux Enterprise Server 11 SP3.

Example: Locating a faulty Ethernet card

1. The service log notifier alerts the light path diagnostics subsystem, lp_diag, that the Ethernet card is not functioning. Typically, the lp_diag utility runs automatically through a script that is registered when the ppc64-diag package is installed.
2. The lp_diag utility enables an indicator LED.
3. You notice that one of the LEDs on your system is lit and not flashing. You run the usysattn utility from the command line to get the location code of the LED indicator.
4. To gather more information about the card, you run the lscfg utility.
5. You replace the faulty card, and use the log_repair_action utility to reset the LED.
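Steps 3 to 5 of the example above could be wrapped in a small helper, sketched here under a few assumptions: the function name light_path is hypothetical, the three utilities are assumed to be installed and on the PATH, and lscfg output is assumed to include location codes so that it can be filtered with grep. Only invocations described in this document are used (usysattn without arguments, lscfg, and log_repair_action -l).

```shell
#!/bin/bash
# Hypothetical helper for steps 3 to 5 of the example above.
# Assumes usysattn, lscfg, and log_repair_action are installed.
light_path() {
    case "$1" in
        list)
            # Step 3: list the attention indicators and their status.
            usysattn
            ;;
        info)
            # Step 4: show inventory lines that mention the location code.
            # Assumption: lscfg output contains location codes.
            lscfg | grep -F -- "$2"
            ;;
        repaired)
            # Step 5: record the repair for the device at the location code.
            log_repair_action -l "$2"
            ;;
        *)
            echo "usage: light_path list | info LOCATION | repaired LOCATION" >&2
            return 2
            ;;
    esac
}
```

A session might run light_path list first, then light_path info <location-code>, and finally light_path repaired <location-code> once the card has been replaced. Consult the man pages on your system before relying on any of these invocations.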
For more information, see the Light path diagnostics topic (http://www.ibm.com/support/knowledgecenter/power7/p7eal/p7eal_lightpathdiagnostics.htm).

The commands that are provided by this package, and their features and usage, might vary by distribution and release. Consult the man pages on your system for the most accurate description of their features and usage. For more information about how to list and display the man pages for commands that are provided by this package, see Displaying package man pages on page 4.

For more information about the ppc64-diag package, see ppc64 Platform Diagnostics (http://linux-diag.sourceforge.net/ppc64-diag/).

Platform diagnostics for systems using PowerKVM virtualization and OPAL firmware

The platform diagnostics opal_errd daemon logs platform events that are detected by the PowerKVM host. The daemon stores logs in the /var/log/opal-elog directory. One file is saved for each log, and one message is written to syslog. For example:

    May 20 10:44:16 llmjuno03b ELOG[34914]: LID[5034a000]::SRC[11007201]::External Environment:: Predictive Error::Service action required

The events logged in syslog can be one of the following three types:
- Service action and call home are required
- Service action is required
- No service action is required

Internally, the opal_errd daemon calls the extract_opal_dump command, which extracts platform dump data and stores it in /var/log/dump.

opal-elog-parse
    Use this tool to parse error logs on the PowerKVM host system. This tool parses logs from the /var/log/opal-elog directory and gives detailed information about each log.
opal-dump-parse
    When Power systems running OPAL firmware crash, the FSP generates a system dump (SYSDUMP). When the PowerKVM host reboots, it stores the SYSDUMP in the /var/log/dump directory, along with other platform dumps. Use the opal-dump-parse utility to extract OPAL logs and kernel raw buffer information from SYSDUMP.
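The syslog message that opal_errd writes, as shown in the example above, is a sequence of fields separated by "::". The following sketch pulls out the log ID, the SRC, and the final service-action field from one such line. It assumes that every message follows the layout of that single example; the function name parse_elog_line and the output format are hypothetical, and this is not a replacement for opal-elog-parse.

```shell
#!/bin/bash
# Hypothetical parser for one opal_errd syslog line, assuming the
# "ELOG[pid]: LID[..]::SRC[..]::...::action" layout shown above.
parse_elog_line() {
    local msg lid src action
    # Keep only the text after "ELOG[pid]: "; empty if the line is not an ELOG entry.
    msg=$(printf '%s\n' "$1" | sed -n 's/^.*ELOG\[[0-9]*]: //p')
    [ -n "$msg" ] || return 1
    # Value inside LID[...].
    lid=$(printf '%s\n' "$msg" | sed -n 's/^LID\[\([^]]*\)].*$/\1/p')
    # Value inside SRC[...].
    src=$(printf '%s\n' "$msg" | sed -n 's/^.*::SRC\[\([^]]*\)].*$/\1/p')
    # Everything after the last "::" separator.
    action=$(printf '%s\n' "$msg" | sed 's/^.*:://')
    printf 'lid=%s src=%s action=%s\n' "$lid" "$src" "$action"
}
```

For the example message above, the function prints lid=5034a000 src=11007201 action=Service action required. Lines that do not contain an ELOG entry make the function return a nonzero status, so it can be used to filter a syslog file line by line.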
Extended error handling

On systems running IBM PowerKVM with OPAL firmware, you can take advantage of extended error handling (EEH) capabilities for detecting and reporting a variety of PCI bus error conditions.

The EEH hardware features allow PCI bus errors to be cleared and a PCI card to be "rebooted" and recovered to an operative state automatically, without having to reboot the operating system. This feature is enabled on real PCI cards, which are passed through to the guest from the host. This feature is not available on emulated PCI cards.

servicelog

Use the servicelog and related utilities to manage events that require service.

The utilities described here are supported in the following Linux distributions and virtualized environments:

Table 4. Support for servicelog utilities
Utility servicelog library and all servicelog tools Linux distributions running on a PowerVM partition on any level of Power processor POWER8 PowerKVM support Not applicable

For Linux distributions currently supported on Power systems, see Linux on Power overview.

The servicelog packages provide a library for logging service-related events to the service log database, and commands for viewing the contents of the database. This database allows for the logging of serviceable and informational events, and service procedures that are performed upon the system.

If an event occurs that requires a service action to repair, the event is logged in the servicelog database. Example events include hardware failures that require the replacement of a Field Replaceable Unit (FRU), or issues that require a firmware update to fix.

After you repair a hardware device, run the log_repair_action command to mark all of the associated events as closed. An example of a repair action is the replacement of an FRU.

To use servicelog, run the servicelog command. The statistics that are stored in the servicelog database are displayed, similar to the following:

    # servicelog
    Servicelog Statistics:

    There are 3 open events requiring action.

    Summary of Logged Events:

    Type    Total   Open   Closed   Info
    RTAS    5       3      2        0
    -------------------------------
            5       3      2        0

    Logged Repair Actions: 2
    Registered Notification Tools: 3

Platform diagnostics (ppc64-diag) is an example of a service and productivity tool that writes to servicelog.

The servicelog library is provided by the libservicelog package. The servicelog commands are provided by the servicelog package. The commands that are typically included are:

log_repair_action
    Create a log entry to indicate that a device was repaired.
servicelog
    Query and display the contents of the servicelog database.
servicelog_manage
    Perform management or maintenance operations on the servicelog database.
servicelog_notify
    Add, modify, view, or remove tools to be notified when events are logged in the servicelog database.
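The statistics that the servicelog command prints include a line reporting the number of open events, as in the sample output earlier in this topic. A monitoring script could extract that number with a filter like the following sketch. It assumes that the line reads "There are N open events requiring action." exactly as in the sample; the wording might differ on your distribution, and the function name open_event_count is hypothetical.

```shell
#!/bin/bash
# Hypothetical filter: read servicelog statistics on stdin and print the
# number of open events. Assumes the wording of the sample output above.
open_event_count() {
    sed -n 's/^.*There are \([0-9][0-9]*\) open events.*$/\1/p'
}
```

A typical invocation would be servicelog | open_event_count. If the expected line is not present, the filter prints nothing, which a calling script should treat as "unknown" rather than as zero.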
The commands that are provided by this package, and their features and usage, might vary by distribution and release. Consult the man pages on your system for the most accurate description of their features and usage. For more information about how to list and display the man pages for commands that are provided by this package, see Displaying package man pages on page 4. For more information about the servicelog packages, see servicelog: System Service Database (http://linux-diag.sourceforge.net/servicelog/). Environmental and power management features POWER8 systems with OPAL firmware provide features for safeguarding your system's environment and power supply. These features are currently provided as a limited technical preview offering. The following features are included on POWER8 PowerKVM systems running OPAL firmware as a limited technical preview. Environmental and power warnings (EPOW) The EPOW feature monitors the environmental and power status of the system. EPOW events are communicated from the system to the firmware and the hypervisor. When potentially dangerous events are detected, the hypervisor takes corrective action to minimize damage to the system. The following events are monitored by EPOW: v Power supply: On uninterruptible power supply (UPS) Power configuration changes Impending power failures Incomplete power v Temperature: Ambient temperature issues Internal temperature issues Ambient humidity issues v Cooling: Insufficient cooling If any of these conditions are detected, the hypervisor notifies the host. The host system can then take corrective action. Delayed power off (DPO) On Power systems, you can initiate machine shutdown from either the front operations panel or via the Advanced Systems Management (ASM) interface. Because machines may be running critical workloads that need to be shut down gracefully, DPO provides a safeguard against abrupt shutdowns. 
When you initiate a shutdown, the Flexible Service Processor (FSP) sends a DPO initiation command to the OPAL firmware. The OPAL firmware acknowledges receipt of the DPO command, then informs the hypervisor of the impending system shutdown. The hypervisor starts the process of gracefully bringing down the system, ultimately resulting in the Central Electronic Complex (CEC) power down.

After receiving the acknowledgement from the OPAL firmware for the initial DPO command, the FSP waits for a maximum of 45 minutes to receive a power down command from the hypervisor. If the FSP does not receive a power down command within these 45 minutes, it assumes that nothing critical is running and powers down the system.

Similar environmental and power management features are available on systems running PowerVM with traditional POWER firmware as fully supported features.
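The DPO timing rule described above can be sketched as a small decision function. This is an illustrative model only, not firmware code: the function name `dpo_outcome`, its parameter, and the returned strings are hypothetical, and treating a command that arrives at exactly 45 minutes as "in time" is an assumption. Only the 45-minute limit comes from the text.

```python
from typing import Optional

# Maximum time the FSP waits for the hypervisor after OPAL acknowledges the
# DPO command (value taken from the text above).
DPO_TIMEOUT_MINUTES = 45


def dpo_outcome(power_down_after_minutes: Optional[float]) -> str:
    """Model which side triggers the final power down.

    power_down_after_minutes is the time between the OPAL acknowledgement and
    the hypervisor's power down command, or None if no command is ever sent.
    Hypothetical helper for illustration; not part of any IBM tool.
    """
    if power_down_after_minutes is not None and power_down_after_minutes < 0:
        raise ValueError("elapsed time cannot be negative")
    if (power_down_after_minutes is not None
            and power_down_after_minutes <= DPO_TIMEOUT_MINUTES):
        # The hypervisor finished its graceful shutdown in time.
        return "hypervisor-initiated power down"
    # No command arrived within the window, so the FSP powers down on its own.
    return "FSP timeout power down"
```

For example, `dpo_outcome(10)` models a hypervisor that finishes its graceful shutdown after 10 minutes, while `dpo_outcome(None)` models the case where the FSP never hears back and powers down the system itself.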
Service aids

Service Aids includes a wide variety of commands to help you manage your system.

The utilities described here are supported in the following Linux distributions and virtualized environments:

Table 5. Support for powerpc-utils utilities

Columns: Utility; PowerVM partition on any level of Power processor; POWER8 PowerKVM support

Utilities listed: activate_firmware, amsstat, apport-collect, bootlist, drmgr, hvcsadmin, lparstat, lsdevinfo, lsprop, lsslot, ls-vscsi, ls-veth, ls-vdev, nvram, ofpathname, ppc64_cpu, rtas_dump, rtas_event_decode, rtas_ibm_get_vpd, serv_config, set_poweron_time, snap, sys_ident, uesensor, update_flash, vscsisadmin

Entries in the POWER8 PowerKVM support column take one of these forms: Not applicable; Not supported; supported on specific distributions in guests (Ubuntu is named), in some cases also on the host system; or supported on the host only. One entry notes that some functions, including PCI hot plug, may not be fully supported.

Notes for the snap command:

- Red Hat Enterprise Linux: The snap command is deprecated in Red Hat Enterprise Linux 7.0. Use the sosreport command instead.
- SUSE Linux Enterprise Server: The snap command is deprecated in SUSE Linux Enterprise Server 12. Use the supportconfig command instead.
- Ubuntu (PowerKVM guests): The snap command is deprecated in all current versions of Ubuntu. Use the apport-collect command instead.

For Linux distributions currently supported on Power systems, see Linux on Power overview.

Service aids commands are provided by the powerpc-utils package. The commands that are typically included are:

activate_firmware
    Activate a firmware image that was updated concurrently.
amsstat
    Display Active Memory Sharing (AMS) statistics.
bootlist
    View or update the system bootlist stored in NVRAM.
drmgr
    Perform DLPAR operations on a client LPAR.
hvcsadmin
    The hypervisor virtual console server administration utility.
lparstat
    Display current LPAR-related parameters, LPAR utilization statistics, and hypervisor information.
lsdevinfo
    Display information about virtual devices.
lsslot
    List DLPAR and hotplug capable slots.
ls-vdev
    Display information about virtual SCSI adapters and devices.
ls-veth
    Display information about virtual Ethernet devices.
ls-vscsi
    Display information about virtual SCSI devices.
nvram
    Display or modify data that is stored in the non-volatile RAM (NVRAM).
nvsetenv
    A wrapper to call various forms of the nvram command.
ofpathname
    Translate between Open Firmware and logical device names.
ppc64_cpu
    Display or set the processor SMT, cores, DSCR, smt-snooze-delay, run mode and frequency settings.
rtas_dump
    Display the contents of RTAS events in the /var/log/messages, /var/log/platform, and /var/log/boot.msg files in a human-readable form.
rtas_event_decode
    Display the contents of one RTAS event in a human-readable form.
rtas_ibm_get_vpd
    Display dynamically changing vital product data.
serv_config
    Display and configure system service policies and settings.
set_poweron_time
    Set a time in the future for the system to be powered on.
snap
    Generate a configuration snapshot for service.
sys_ident
    Generate unique identification numbers.
uesensor
    Display the state of system environmental sensors.
update_flash
    Update, manage, or validate firmware.

    On certain POWER8 systems, you cannot update your firmware if your system entitlement has expired. You will see the following output when you attempt to run the update_flash command:
    The selected firmware image cannot be applied.
    The Build Date of the firmware image selected is <date>.
    The System's Update Access Key Expiration Date is <date>.
    Please go to http://www.ibm.com/servers/eserver/ess to obtain a replacement update access key.

    Note: This information applies only to systems running PowerVM virtualization. This information does not apply to Linux-only POWER8 systems.

    Follow the link to the Entitled software support website to update your system.

    On systems that have Petitboot installed, you can run the update_flash command to update your firmware from the Petitboot shell. For more information about Petitboot, see Using Petitboot.

The commands that are provided by this package, and their features and usage, might vary by distribution and release. Consult the man pages on your system for the most accurate description of their features and usage. For more information about how to list and display the man pages for commands that are provided by this package, see Displaying package man pages on page 4.

For more information about the powerpc-utils package, see Powerpc-utils (http://powerpcutils.sourceforge.net/).

Handling errors in guests with FWNMI

On systems running IBM PowerKVM virtualization, you can use firmware assisted non-maskable interrupts (FWNMI) for error handling and recovery of machine checks in guests.

The FWNMI feature provides firmware support for platform-dependent error recovery for recoverable non-maskable machine check interrupts. Using FWNMI, analysis of and information about interrupts is passed to the guest operating system through QEMU. PowerVM already provides this support for better error recovery in Linux on Power logical partitions. Equivalent support is now also available for Linux on Power PowerKVM guests.

The FWNMI feature is automatically enabled on all guests, regardless of the Linux distribution running on the guest.
To activate the FWNMI feature on a guest, issue the following Run-Time Abstraction Services (RTAS) call:

    ibm,nmi-register

This call registers the machine check (MC) handler with QEMU. Whenever a guest receives a machine check interrupt, control goes to QEMU first. QEMU then builds memory error information and passes it to the guest operating system.

IBM Performance Management for Power Systems

IBM Performance Management for Power Systems provides you with critical information about current and long-term system utilization trends. It also provides insight into what extra capability your system has, and what upgrades you might need for future applications.

IBM Performance Management for Power Systems enables automated performance analysis and capacity planning for PowerLinux servers. With IBM Performance Management for Power Systems, utilization information can be automatically collected from the servers with logical partitions (LPARs) that you elect to monitor. You can choose to transmit the daily collected data to IBM on a routine basis with IBM Electronic Service Agent for PowerLinux. With ongoing, interactive access, you can re-examine your utilization and capacity environment from up to 24 months prior. With this information, you can visualize system workloads, and server consolidation and virtualization possibilities.

IBM Performance Management for Power Systems is provided by the ibmpmlinux package.
For more information about IBM Performance Management for Power Systems, see IBM Performance Management for Power Systems (http://www-03.ibm.com/systems/power/support/perfmgmt/).

Large page analysis

The IBM Large Page Analysis tool records runtime memory usage and generates a translation lookaside buffer (TLB) miss rate report.

The TLB miss rate report that is generated by the IBM Large Page Analysis tool can show you where using a larger page size might benefit application performance. The INTERPRETING REPORTS section of the lpa overview man page describes the significance of the information in the TLB miss rate report.

You can use a larger page size with libhugetlbfs. For more information about libhugetlbfs, see libhugetlbfs (http://libhugetlbfs.sourceforge.net/).

The IBM Large Page Analysis tool is provided by the lpa package. The commands that are typically included are:

lpa_record
    Run an application and record memory statistics by memory region. The recorded data is stored as a collection of trace files in the ~/.lpa_logs directory for later processing by the lpa_report command. Memory regions include heap, stack, text, data, and bss (optional).
lpa_report
    Compute the predicted cost in TLB miss rates from trace files that are created by the lpa_record command for various page size mappings. Page size mappings include 4K, 64K, 16M, and 16G. Run the lpa_record command before you run the lpa_report command.

Note: There is also an overview man page that is called lpa for which there is no command.

The commands that are provided by this package, and their features and usage, might vary by distribution and release. Consult the man pages on your system for the most accurate description of their features and usage. For more information about how to list and display the man pages for commands that are provided by this package, see Displaying package man pages on page 4.
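To see why a larger page size can reduce TLB pressure, it helps to count how many pages are needed to map the same memory region at each of the page sizes named above. The following Python sketch does only that arithmetic. It does not reproduce the cost model used by lpa_report, and the function name `pages_needed` is a hypothetical helper introduced here for illustration.

```python
# Page sizes named in the lpa_report description, in bytes.
PAGE_SIZES = {
    "4K": 4 * 1024,
    "64K": 64 * 1024,
    "16M": 16 * 1024 ** 2,
    "16G": 16 * 1024 ** 3,
}


def pages_needed(region_bytes: int) -> dict:
    """Return the number of pages of each size needed to map a region.

    Hypothetical helper for illustration; it is not part of the lpa package.
    """
    if region_bytes < 0:
        raise ValueError("region size cannot be negative")
    # Integer ceiling division: a partly used page still occupies a mapping.
    return {name: -(-region_bytes // size) for name, size in PAGE_SIZES.items()}


if __name__ == "__main__":
    one_gib = 1024 ** 3
    for name, count in pages_needed(one_gib).items():
        print(f"{name:>4}: {count} pages")
```

For a 1 GiB region, the count drops from 262144 pages at 4K to 64 pages at 16M. Each page needs its own address translation, so fewer pages generally means fewer distinct entries competing for the TLB. Whether that helps a particular application is what the TLB miss rate report is meant to show.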
IBM Power RAID adapter utilities (iprutils)

IBM Power RAID (IPR) adapter utilities provide tools that are required by the IBM Power RAID adapter device driver.

With IBM Power RAID adapter utilities, you can configure, update, and query the adapter. The information that you can query includes disk status, disk array status, and Serial-attached SCSI (SAS) path status. You can also gather adapter failure information.

IBM Power RAID adapter utilities are provided by the iprutils package. The commands that are typically included are:

iprconfig
    Configure IBM Power RAID storage adapters, display information about them, and recover adapters and disk units.

    Note: The iprconfig command can be run from the Petitboot shell to configure RAID on systems using PowerKVM virtualization. For more information, see Using Petitboot.

    The options of this tool include:

    Display hardware status
        Display information about all IPR SCSI disks that are attached to your system.
    Work with SCSI Bus Configuration
        Set some SCSI bus parameters such as Max Bus Throughput and host SCSI ID.
    Work with Driver Configuration
        Adjust device driver log levels.
    Work with disk configuration
        Set device attributes, such as queue depth.
    Download microcode
        Download microcode to adapters and SCSI disks.
    Create a disk array
        Create a disk array.
    Delete a disk array
        Delete existing disk arrays. Data that is stored on the devices is not preserved after you run this command.
    Add a device to a disk array
        Add devices of similar capacity to an existing RAID 5 disk array.
    Format Device for advanced function/format Device for JBOD (Just a Bunch Of DASD) function
        Disk devices can be formatted to either 512 bytes/sector or 522 bytes/sector. Devices must be formatted to 522 bytes/sector, also known as advanced function format, to be used in a disk array or as a hot spare. Devices not in a disk array can be formatted to 512 bytes/sector so that they can be used directly by the operating system.
    Create a hot spare
        Configure a disk to be a hot spare, which can be used by an adapter to automatically replace a failed device.
    Concurrent add device
        Concurrently add a SCSI disk to a running system.
    Concurrent remove device
        Concurrently remove a SCSI disk from a running system.
    Initialize and format disk
        Send a SCSI format command to SCSI disks. Use this option with caution.
    Reclaim IOA cache storage
        Note: IOA stands for I/O Adapter.
        This option is for IBM hardware service personnel. This option is potentially dangerous and might delete data from the non-volatile write cache on an adapter.
    Rebuild disk unit data
        Reconstruct a device as an active array member.
        Note: This option is generally used following concurrent maintenance after a failing array member device was replaced.
    Work with Resources Containing Cache Battery Packs
        View the status of the Cache Battery on resources that contain battery packs.
        Use this option during maintenance actions on resources that contain battery packs.
    Analyze Log
        View the error messages that are logged by the IPR device driver.

    Information message reported by iprconfig

    You might receive an error message in the following situations:

    - You ran the iprconfig utility from a console while concurrently adding a disk to the system.
    - You ran the iprconfig utility from a console while concurrently removing a physical device from the system.

    The error message is similar to this example:

        EPOW <0x6240040000000b8 0x0 0x0>
        RTAS: event: 5, Type: EPOW, Severity: 1

    This message is for your information only. No action is necessary.

iprdump
    Gather and dump information in the event of an adapter failure. By default, the dump data is saved in /var/log/iprdump.DUMP_ID, where DUMP_ID is the ID of the dump. The iprdump command writes an entry to the system error log when it creates a dump. The iprutils package provides an /etc/init.d/iprdump script to start the iprdump command as a service during boot time.
iprinit
    Initialize IBM Power RAID adapters and devices for optimal performance, and load any configurations that are saved by the iprconfig command. The iprutils package provides an /etc/init.d/iprinit script to start the iprinit command as a service during boot time.
iprupdate
    Attention: Deprecated. Use the iprconfig command to update and manage device microcode.

The commands that are provided by this package, and their features and usage, might vary by distribution and release. Consult the man pages on your system for the most accurate description of their features and usage. For more information about how to list and display the man pages for commands that are provided by this package, see Displaying package man pages on page 4.

For more information about the iprutils package, see IPR Linux device driver (http://iprdd.sourceforge.net/).

Note: This site includes information for both the IPR Linux device driver (iprdd) and the iprutils package.

For more information about SAS RAID controllers for Linux on your Power Systems, see the SAS RAID controllers for Linux topic corresponding to your system in the Systems Hardware Information Center. SAS RAID controllers for Linux contains subtopics that illustrate various options of the iprconfig command.
POWER6 systems
    http://www.ibm.com/support/knowledgecenter/power6/arebk/sascontroller_kickoff.htm
POWER7 Systems
    http://www.ibm.com/support/knowledgecenter/power7/p7ebk/p7ebkkickoff.htm
PowerLinux systems
    http://www.ibm.com/support/knowledgecenter/powerlinux/p7ebkl/p7ebkkickoff.htm
POWER8
    http://www.ibm.com/support/knowledgecenter/8247-22l/p8ebk/commontasks.htm

IBM Electronic Service Agent

IBM Electronic Service Agent and the IBM Electronic Support website together make up IBM Electronic Services. This topic collection provides information about installing, activating, configuring, using, and troubleshooting IBM Electronic Service Agent on PowerLinux servers.

The most current version of this information is in the IBM Knowledge Center for Linux. To access this information, use the following web address:
    http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaao/liaaokickoff.htm

Diagnosing RTAS events

You can install and use the ppc64-diag, libservicelog, servicelog, and powerpc-utils packages to diagnose RTAS (Run Time Abstraction Services) events.

RTAS events (also known as platform events) might occur while your system is booting or running. RTAS events might be logged in the following places:

- The console and the kernel ring buffer
- The servicelog database
- The /var/log/platform file

Displaying RTAS events in the kernel ring buffer

The kernel displays RTAS events in the console, and captures them in the kernel ring buffer.

To display the kernel ring buffer, run the following command:

    # dmesg

The following example shows RTAS events that are reported in the console and kernel ring buffer:

    RTAS: event: 2340, Type: Platform Error, Severity: 2
    RTAS: event: 2341, Type: Dump Notification Event, Severity: 1

The event numbers displayed as 2340 and 2341 in the example identify events on your system. They have no diagnostic meaning.

Displaying RTAS events in the servicelog database

The ppc64-diag package includes the rtas_errd daemon, which logs RTAS events to the servicelog database with the libservicelog package. The servicelog package provides commands for viewing the database.

Install each package. See ppc64-diag and libservicelog/servicelog for information about these packages.

You can display a summary of servicelog events to determine whether any RTAS events are saved to the servicelog database. To display the summary, run the following command:

    # servicelog

The following sample output shows that there are 5 RTAS events:

    Servicelog Statistics:
    There are 3 open events requiring action.

    Summary of Logged Events:
    Type     Total   Open   Closed   Info
    RTAS         5      3        2      0
    -------------------------------
                 5      3        2      0

    Logged Repair Actions: 2
    Registered Notification Tools: 3

To display the servicelog database, run the following command:

    # servicelog --dump

The following sample output shows an example of an RTAS event in the servicelog database:
    Servicelog ID: 4
    Log Timestamp: Fri Jul 27 04:51:31 2012
    Event Timestamp: Fri Jul 27 08:51:29 2012
    Update Timestamp: Fri Jul 27 04:51:31 2012
    Type: Power Platform (RTAS) Event
    Severity: 3 (EVENT)
    Platform: ppc64
    Model/Serial: 8406-71Y/108B7AA
    Node Name: abc.ibm.com
    Reference Code: B7004400
    Serviceable Event: No
    Predictive Event: No
    Disposition: 0 (Recoverable)
    Call Home Status: 0 (None Needed)
    Status: Closed
    Action Flags: 2000
    Event Type: 228 - Dump Notification Event
    Kernel ID: 2
    Platform ID: 50142eee
    Creator ID: H - Hypervisor
    Subsystem ID: 82 - Platform firmware
    RTAS Severity: 00 - Informational or non-error event
    Event Subtype: 08 - Dump Notification
    Extended Reference Codes:
    2: 00000000 3: 00000000 4: 00000000 5: 00000000
    6: 00000000 7: 00000000 8: 00000000 9: 00000000
    Description:
    Platform Firmware Informational (non-error) Event. Refer to the system service documentation for more information.
    A platform dump was generated and downloaded to the filesystem (1739921 bytes):
    /var/log/dump/fspdump.108b7aa.aa000000.20120727044812

For information associated with the reference code in an RTAS event logged in servicelog, search for the reference code in Knowledge Center.

Displaying RTAS events logged in the /var/log/platform file

Install the powerpc-utils package, which provides the rtas_dump command. See powerpc-utils for information about the package.

You can display the RTAS events logged in the /var/log/platform file in two ways:

- View the file in a text editor. The events display without formatting.
- Run the rtas_dump command to display the events with formatting.
For example:

    rtas_dump -f /var/log/platform

The following sample output shows several RTAS events logged in the /var/log/platform file:

    ==== RTAS Event Dump (1) Begin ======================================
    Version: 00000006
    Severity: 00000001 (Event)
    Type 000000e4 (Dump Notification Event)
    Status: new
    ==== Private Header =================================================
    Date: 27 Jul 2012
    Time: 4:50:52:5
    Creator ID: PHyp (H).
    Creator Subsystem Version: 0000000000000000.
    Platform Log ID: 50142eee
    Log Entry ID: 820001fc
    ==== User Header ====================================================
    Subsystem ID: 00000082 (Platform Firmware)
    Event Data 00000003
    Event Type: 00000008 Dump notification.
    Event Severity: 00000000 Informational or non-error event,
    Action Flag: 00002000 Report Externally, (HMC and Hypervisor).
    ==== Primary SRC Section ============================================
    Platform Data:
    0x0000: 00000100 000048 [...H. ]
    Extended Reference Codes:
    2: 00000000 3: 00000000 4: 00000000 5: 00000000
    6: 00000000 7: 00000000 8: 00000000 9: 00000000
    Primary Reference Code: "B7004400 "
    ==== Unknown Section ================================================
    Section ID: EH
    Section Length: 0000004c
    Version: 00000001
    Sub_type: 00000000
    Component ID: 00004552
    Raw Section Data:
    0x0000: 38343036 2d373159 31303842 37414120 [8406-71Y108B7AA ]
    0x0010: 00000000 41583731 305f3131 39000000 [...AX710_119...]
    0x0020: 00000000 30303034 30373031 30343036 [...000407010406]
    0x0030: 30313836 00000000 00000000 00000000 [0186...]
    0x0040: 00000000 [... ]
    ==== Machine Type ===================================================
    Model/Type: 8406-71Y (tttt-mmm)
    Serial Number: 108B7AA
    ==== Dump Locator section ===========================================
    Dump ID: 00000001
    Dump Field Format: ascii
    Dump Location: Partition
    Dump Size: 0000000012b4d790
    ==== Unknown Section ================================================
    Section ID: LP
    Section Length: 00000018
    Version: 00000001
    Sub_type: 00000000
    Component ID: 00004552
    Raw Section Data:
    0x0000: 00000401 00000000 00000000 00010000 [...]
    ==== RTAS Event Dump (1) End ========================================

    ==== RTAS Event Dump (2) Begin ======================================
    Version: 00000006
    Severity: 00000001 (Event)
    Type 000000e4 (Dump Notification Event)
    Status: new
    ==== Private Header =================================================
    Date: 27 Jul 2012
    Time: 4:51:29:71
    Creator ID: PHyp (H).
    Creator Subsystem Version: 0000000000000000.
    Platform Log ID: 50142eee
    Log Entry ID: 820001ff
    ==== User Header ====================================================
    Subsystem ID: 00000082 (Platform Firmware)
    Event Data 00000003
    Event Type: 00000008 Dump notification.
    Event Severity: 00000000 Informational or non-error event,
    Action Flag: 00002000 Report Externally, (HMC and Hypervisor).
    ==== Primary SRC Section ============================================
    Platform Data:
    0x0000: 00000100 000048 [...H. ]
    Extended Reference Codes:
    2: 00000000 3: 00000000 4: 00000000 5: 00000000
    6: 00000000 7: 00000000 8: 00000000 9: 00000000
    Primary Reference Code: "B7004400 "
    ==== Unknown Section ================================================
    Section ID: EH
    Section Length: 0000004c
    Version: 00000001
    Sub_type: 00000000
    Component ID: 00004552
    Raw Section Data:
    0x0000: 38343036 2d373159 31303842 37414120 [8406-71Y108B7AA ]
    0x0010: 00000000 41583731 305f3131 39000000 [...AX710_119...]
    0x0020: 00000000 30303034 30373031 30343036 [...000407010406]
    0x0030: 30313836 00000000 00000000 00000000 [0186...]
    0x0040: 00000000 [... ]
    ==== Machine Type ===================================================
    Model/Type: 8406-71Y (tttt-mmm)
    Serial Number: 108B7AA
    ==== Dump Locator section ===========================================
    Dump ID: aa000000
    Dump Field Format: ascii
    Dump Location: Partition
    Dump Size: 00000000001a8c91
    ==== Unknown Section ================================================
    Section ID: LP
    Section Length: 00000018
    Version: 00000001
    Sub_type: 00000000
    Component ID: 00004552
    Raw Section Data:
    0x0000: 00000401 00000000 00000000 00010000 [...]
    ==== RTAS Event Dump (2) End ========================================

    ==== RTAS Event Dump (3) Begin ======================================
    Version: 00000006
    Severity: 00000002 (Warning)
    Type 000000e0 (Platform Error)
    Status: unrecoverable new
    ==== Private Header =================================================
    Date: 27 Jul 2012
    Time: 4:48:6:8
    Creator ID: Service Processor (E).
    Creator Subsystem Name:.
    Platform Log ID: 50142eee
    Log Entry ID: 50142eee
    ==== User Header ====================================================
    Subsystem ID: 00000000
    Event Data 00000003
    Event Type: 00000000 Unknown event type (0).
    Event Severity: 00000040 Unrecoverable error, general.
    Action Flag: 0000a902 Unknown action flag (0x0000a902).
    ==== Primary SRC Section ============================================
    Platform Data:
    0x0000: 04000900 000048 [...H. ]
    Extended Reference Codes:
    2: 000000f0 3: 00000c00 4: 00000000 5: 20000000
    6: 070c001c 7: 00000000 8: 00000000 9: 00000000
    Primary Reference Code: "A1003000 "
    ==== Unknown Section ================================================
    Section ID: EH
    Section Length: 00000060
    Version: 00000001
    Sub_type: 00000000
    Component ID: 00003100
    Raw Section Data:
    0x0000: 38343036 2d373159 31303842 37414100 [8406-71Y108B7AA.]
    0x0010: 00000000 41583731 305f3131 39000000 [...AX710_119...]
    0x0020: 00000000 62313130 39625f31 3139302e [...b1109b_1190.]
    0x0030: 37313200 00000000 00000000 00000000 [712...]
    0x0040: 00000014 41313030 33303030 5f303030 [...A1003000_000]
    0x0050: 30304330 30000000 [00C00... ]
    ==== Unknown Section ================================================
    Section ID: UD
    Section Length: 0000009c
    Version: 00000002
    Sub_type: 00000004
    Component ID: 00003100
    Raw Section Data:
    0x0000: 00003300 2f6f7074 2f666970 732f6269 [..3./opt/fips/bi]
    0x0010: 6e2f6475 6d707379 7374656d 00000000 [n/dumpsystem...]
    0x0020: 00000000 00000000 00000000 00000000 [...]
    0x0030: 00000000 00000000 00000000 00000000 [...]
    0x0040: 00000000 00000000 00000000 00000000 [...]
    0x0050: 00000000 00000000 66697073 3731322f [...fips712/]
    0x0060: 62313130 39625f31 3139302e 37313200 [b1109b_1190.712.]
    0x0070: 00000000 00000000 00000001 00000002 [...]
    0x0080: 00000804 00000005 0000000f 000131cc [...1.]
    0x0090: 00000000 [... ]
    ==== Machine Type ===================================================
    Model/Type: 8406-71Y (tttt-mmm)
    Serial Number: 108B7AA
    ==== Unknown Section ================================================
    Section ID: SW
    Section Length: 00000014
    Version: 00000002
    Sub_type: 00000001
    Component ID: 0000f000
    Raw Section Data:
    0x0000: 00000b00 00030008 00000001 [... ]
    ==== RTAS Event Dump (3) End ========================================

For information associated with the primary reference codes in each RTAS event logged in the /var/log/platform file, search for the reference code in Knowledge Center.

Collecting support data

You can collect support data to facilitate addressing hardware or system issues.

You can collect Linux operating system support data by running the snap command:

    snap -h
Note: The snap command is deprecated in Red Hat Enterprise Linux 7.0 and later versions. Use the sosreport command instead.

Command-line parameters for the snap command include the following:

    -a        Collect all data and detailed information. This option results in more files and output.
    -d dir    Specify the directory where files and output are collected. The default value is /tmp/ibmsupt.
    -o file   Specify the output file. The .tar file type is required, and the .tar.gz file type is optional. The default value is snap.tar.gz.
    -t        Add hostname and timestamp to output file name.
    -v        Verbose output.
    -h        Print this help message.

To view the exit code of the command, enter echo $? immediately after running the snap command. Possible exit codes are:

- 0: snap data was successfully captured
- 1: invalid command line
- 2: other fatal error

The data collected by the snap command is placed in the snap.tar.gz file in the current directory, unless you specify another directory. The snap command is described in Service Aids.

Additional utilities for collecting data for support include the following:

- sosreport: Generates a compressed .tar file containing information about the hardware and software on the system. You can then send this file to your support contact.
- supportconfig: Gathers system troubleshooting information on SUSE Linux Enterprise Server systems. It captures the current system environment and generates a tar archive.

Additional support data for HMC- and IVM-managed servers is available through tasks under Service Management in the HMC and IVM web interfaces. These interfaces provide detailed help for Service Management tasks.

For more information about setting up the call-home mechanism to send support data to IBM service and support, see IBM Electronic Service Agent.

Running stand-alone diagnostics on a PowerKVM system

Use these instructions to diagnose hardware problems on a system using PowerKVM virtualization.
Use these diagnostics only if you are directed to do so by your next level of support or your hardware service provider. If there is a problem, you will receive a service request number (SRN) that can help you identify the problem and determine a corrective action.

Before starting this procedure, have these prerequisite items:

- The IBM Standalone Diagnostics CD-ROM. You can download and burn the CD from the Standalone Diagnostics CD site (http://www14.software.ibm.com/webapp/set2/sas/f/diags/home.html) on ibm.com.
- A connecting PC or laptop.
- A serial (9-pin D-shell connector) to RJ-45 cable. If your connecting system does not have a serial port, you also need a serial to USB adapter and any drivers that your connecting system requires to use the adapter.
- A terminal emulator program, such as Minicom on a Linux system or PuTTY on a Windows system. Set the connection to 19200 baud, 8 data bits, no parity, and 1 stop bit.
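The serial settings in the last prerequisite are often abbreviated as "19200 8N1". As a minimal sketch, the following Python snippet records those settings and builds a Minicom command line from them. The class name `SerialSettings`, its methods, and the device path `/dev/ttyUSB0` are hypothetical and chosen only for illustration; `-D` (device) and `-b` (baud rate) are standard Minicom options. Data bits, parity, and stop bits are not set on this command line, so check them in the Minicom serial port setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SerialSettings:
    """Console settings from the prerequisites list (hypothetical helper)."""

    baud: int = 19200
    data_bits: int = 8
    parity: str = "N"  # N = no parity
    stop_bits: int = 1

    def summary(self) -> str:
        """Return the conventional short form, for example '19200 8N1'."""
        return f"{self.baud} {self.data_bits}{self.parity}{self.stop_bits}"

    def minicom_args(self, device: str) -> List[str]:
        """Build a Minicom command line for the given serial device."""
        return ["minicom", "-D", device, "-b", str(self.baud)]


if __name__ == "__main__":
    settings = SerialSettings()
    print(settings.summary())
    # /dev/ttyUSB0 is an assumed device name for a serial to USB adapter.
    print(" ".join(settings.minicom_args("/dev/ttyUSB0")))
```

The device name depends on your connecting system and adapter, so substitute the serial device that your system actually reports.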
PowerKVM runs on top of the Open Power Abstraction Layer, or OPAL. The stand-alone diagnostics tool cannot be run on a system while the system is in OPAL mode. These instructions guide you through changing the mode from OPAL to PowerVM, running the diagnostics tool, and changing the mode back to OPAL. Your existing PowerKVM configuration is not affected by this procedure.

Note: These instructions assume that you are using the ASMI web interface. If you choose to connect to the ASMI interface through a serial console session, see the notes throughout this procedure to find the alternate menu options that are available when changing the firmware mode.

1. Change the firmware type from OPAL to PowerVM:
   a. Log into the ASMI web interface using your administrator user ID and password. If you are using a serial console session, connect to your server using a serial cable and log into the ASMI menu.
   b. If your system is still running, power it off by selecting Power/Restart Control > Power On/Off System. Verify your settings and click Save settings and power off. When using a serial console, select Power off.
   c. After the system has powered off, select System Configuration > Firmware Configuration.
      Note: When using a serial console, these options may be System Configuration > Hypervisor Configuration > Hypervisor Mode.
   d. Select PowerVM as your Firmware Type (or Hypervisor Mode) and click Continue.
2. Run the Standalone Diagnostics CD:
   a. From the ASMI main menu, select Power/Restart Control > Power On/Off System.
   b. For the AIX/Linux partition mode boot, select Boot to SMS menu.
      Note: Using a serial console connection, set the AIX/Linux partition boot mode to Boot to SMS menu and return to the Power On/Off System menu.
   c. Select Save settings and power on.
      Note: Using a serial console connection, select Power on. Press Enter to continue.
      Important: If you have been using the ASMI web interface, you now need to connect to your system using a serial console connection. Follow the instructions in the prerequisites to connect. The console connection activates automatically when the server powers on. If the console connection is not found, the system SRC display, available through the control panel, stops at AA00E1B0.
      Note: If you have been using a serial console connection, you may have to select your console as the active console by pressing 0.
   d. If prompted, select your language and accept the license agreement.
   e. Enter the password for your user profile.
   f. Insert the Standalone Diagnostics CD into the Power System DVD disk drive.
   g. From the Main Menu, select option 5: Select Boot Options.
   h. From the Multiboot menu, select option 1: Select Install/Boot Device.
   i. From the Select Device Type menu, select option 5: List all Devices.
   j. Select the device that contains the Diagnostics CD. If the device is not listed, press N to continue to the next page.
   k. On the Select Task menu, select 2: Normal Mode/Boot.
   l. On the confirmation window, select 1: Yes to exit the SMS menus and boot your system. The system reboots and displays a message: Welcome to AIX.
   m. Follow the steps detailed in Selecting testing options (http://www-01.ibm.com/support/knowledgecenter/api/content/8247-22l/p8ha5/standalone_procedure_nohmc.htm#selections). If the system has difficulty recognizing the terminal type, try type vt320.
   n. After the diagnostics are complete, exit and use the options to "Eject the CD" and "H" to Halt the server. The system powers off.
      Note: At this time, if you want to go back to using the ASMI web interface, you can disconnect your serial console.
3. Change the system back to PowerKVM mode:
   a. Return to the ASMI web interface.
      Note: If you are continuing to use your serial console connection, press Enter if your console does not automatically connect when the system powers down.
   b. If your system is still running, power it off by selecting Power/Restart Control > Power On/Off System. Verify your settings and click Save settings and power off. When using a serial console, select Power off.
   c. After your system has powered off, select System Configuration > Firmware Configuration.
      Note: When using a serial console, these options may be System Configuration > Hypervisor Configuration > Hypervisor Mode.
   d. Select OPAL as your Firmware Type (or Hypervisor Mode) and click Continue.
   e. To power on the system, from the ASMI main menu select Power/Restart Control > Power On/Off System.
   f. For the AIX/Linux partition mode boot, select Normal.
      Note: When using a serial console connection, set the AIX/Linux partition boot mode to Continue to operating system and return to the Power On/Off System menu.
   g. Select Save settings and power on.
      Note: When using a serial console connection, select Power on. Press Enter to continue.
   The Petitboot menu appears and PowerKVM boots as normal.

Getting support
You can get support for service and productivity tools packages.
Support for packages that are open source and shipped by Red Hat is provided by the vendor from which you purchased your Linux contract: IBM Linux Support Line or Red Hat Support.

Support for packages that are open source and shipped by SUSE is provided by the vendor from which you purchased your Linux contract: IBM Linux Support Line or SUSE Support.

If you purchased Linux operating system support from IBM, support for the packages downloaded from the Service and productivity tools site or installed from the IBM Tools Repository is provided by IBM Linux Support Line.

If you did not purchase Linux operating system support from IBM, you can post questions about these packages in the Support for the IBM Service and Productivity Tools group on the Think Power Linux developerWorks community website at https://www.ibm.com/developerworks/mydeveloperworks/groups/service/forum/topicthread?topicuuid=59ae7602-b00b-4cb0-8393-2f2f03fbc0eecommunityuuid=fe313521-2e95-46f2-817d-44a4f27eba32page=ps=.
Online version
You can view the online version of this document in the IBM Information Center for Linux. To view the online version, click Managing your server with service and productivity tools.

Code license and disclaimer information
IBM grants you a nonexclusive copyright license to use all programming code examples from which you can generate similar function tailored to your own specific needs.

SUBJECT TO ANY STATUTORY WARRANTIES WHICH CANNOT BE EXCLUDED, IBM, ITS PROGRAM DEVELOPERS AND SUPPLIERS MAKE NO WARRANTIES OR CONDITIONS EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT, REGARDING THE PROGRAM OR TECHNICAL SUPPORT, IF ANY.

UNDER NO CIRCUMSTANCES IS IBM, ITS PROGRAM DEVELOPERS OR SUPPLIERS LIABLE FOR ANY OF THE FOLLOWING, EVEN IF INFORMED OF THEIR POSSIBILITY:
1. LOSS OF, OR DAMAGE TO, DATA;
2. DIRECT, SPECIAL, INCIDENTAL, OR INDIRECT DAMAGES, OR FOR ANY ECONOMIC CONSEQUENTIAL DAMAGES; OR
3. LOST PROFITS, BUSINESS, REVENUE, GOODWILL, OR ANTICIPATED SAVINGS.

SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF DIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, SO SOME OR ALL OF THE ABOVE LIMITATIONS OR EXCLUSIONS MAY NOT APPLY TO YOU.
Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
Dept. LRAS/Bldg. 903
11501 Burnet Road
Austin, TX 78758-3400
U.S.A.

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at Copyright and trademark information at www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Other product and service names might be trademarks of IBM or other companies.
Printed in USA