Memory Troubleshooting Best Practices for HP ProLiant Servers

Accurately troubleshooting memory issues in ProLiant server configurations is an important process that can help prevent unnecessary replacement of hardware components. In addition, accurate problem diagnosis prevents customers from experiencing unnecessary downtime while waiting for hardware that may not need to be replaced. Following standard troubleshooting guidelines each time a memory issue is suspected helps to make accurate diagnosis routine.

HP has developed several methods for troubleshooting memory problems in ProLiant servers. The purpose of this white paper is to assist HP customers in troubleshooting memory problems by successfully isolating the specific DIMMs causing the problem. This helps to prevent nonessential replacement of unaffected DIMMs or, in some cases, entire banks of memory. In addition, effective troubleshooting can help determine if a firmware or other software download can resolve a problem without replacing hardware.

This white paper covers the following topics:
- Why Should I Troubleshoot Every Memory Problem?
- How Can I Tell if a Memory Problem has Occurred?
- What Tools are Available from HP to Help Identify a Failing DIMM?
- Troubleshooting Using HP Insight Diagnostics Offline Edition
- Troubleshooting Using HP Insight Diagnostics Online Edition
- Troubleshooting Flowchart for Bootable Systems
- Troubleshooting Flowchart for Non-Bootable Systems
- What Role does Firmware Play in Solving Memory Problems?
- Why Buy HP Memory?
- Other Troubleshooting Resources

Why Should I Troubleshoot Every Memory Problem?

Accurate diagnosis of system memory problems in ProLiant servers has many advantages, including:
- Prevents unnecessary hardware replacement.
- Prevents the return of parts that test NFF (No Fault Found).
- Prevents server downtime.

Best Practice: Many product issues that result in hardware replacement are preventable or correctable with a firmware update. HP recommends checking for a firmware update before sending a part back to HP for replacement. Based on the HP ProLiant product return rates, a significant percentage of all returned hardware products were functioning properly and only needed a firmware update. Although not all products fall into this category, server downtime and time spent removing, returning, and ultimately replacing hardware may have been avoided if an attempt had been made to flash the firmware during the troubleshooting process.

How Can I Tell if a Memory Problem has Occurred?

There are many indicators that a problem has occurred within the memory subsystem. HP has several tools used to identify the status of hardware and software within a system. Using these tools is a good first step in the process. When a memory problem is suspected, check one or all of these common places to find information:
- The HP System Management Homepage
- HP Systems Insight Manager (HP SIM)
- Server Logs
- DIMM Slot LEDs

IMPORTANT: When a memory error is detected, the firmware illuminates the fault LEDs located near each DIMM slot on the system board. If the system isolates an error to a specific slot, that LED illuminates. However, if the system can only identify an error within a bank, but cannot isolate a specific DIMM, all the LEDs in the bank will illuminate. In addition, if the system cannot identify the bank in which the error has occurred, all the LEDs in all banks illuminate, making the task of isolating the failing DIMM more difficult, and the chance of replacing functioning banks of memory more likely.

Therefore, further troubleshooting is necessary to determine which specific DIMM is failing. Use the LEDs as a tool in identifying that a memory problem may exist, but don't rely solely on the status of the LEDs to determine if hardware should be replaced.

What Tools are Available from HP to Help Identify a Failing DIMM?

Refer to the following HP system tools when a memory problem is suspected.

HP System Management Homepage

The HP System Management Homepage supplies a consolidated view of system hardware health, configuration, performance and status information for individual HP servers. Details are provided on total system health, including system memory. Information on memory can be found under the Performance section on the main page (see Figure 1).

Figure 1: Overview of the HP System Management Homepage

HP Systems Insight Manager (HP SIM)

HP Systems Insight Manager monitors the health of the hardware in the system and polls installed hardware for its status every few minutes. Figure 2 shows an example of events displayed on the System page.

Figure 2: HP Systems Insight Manager

System Logs

Server system logs record the status of hardware events, including memory issues.

For servers running Microsoft Windows operating systems, either of the following logs can be a valuable resource:
- Integrated Management Log (IML)
- Event Viewer

For servers running Linux operating systems, refer to either of the following:
- Integrated Management Log (IML)
- The /var/log/messages file

Microsoft Windows Operating Systems: Using the IML

The IML Viewer is a software tool created by HP and is available as a downloadable component pack from HP.COM. It can also be accessed via the HP System Management Homepage (SMH). Navigate to this tool through SMH by clicking on the Logs tab, or through the operating system from HP System Tools. Figure 3 below shows the IML accessed via SMH. System memory issues, if present, will be recorded and will be visible in the IML.

Figure 3: Integrated Management Log

Microsoft Windows Operating Systems: Using Event Viewer

The Event Viewer is a software tool available as part of Microsoft Windows operating systems. It can be accessed from Administrative Tools via the Start menu. Figure 4 shows an example of server events that are logged in this tool.

Figure 4: Event Viewer (Microsoft Windows Operating Systems)

Linux Operating Systems: Using the IML

The IML Viewer is a software tool created by HP and is available as a downloadable component pack from HP.COM. On systems running Linux, type hplog -v at the command prompt to display the entire IML and check for system memory errors. Any detected system memory error will be logged, including any pre-failure warranty memory events on systems where applicable. For example:

    [root@dhcp57-150 ~]# hplog -v

    ID   Severity     Initial Time      Update Time       Count
    -------------------------------------------------------------
    0011 Information  22:50 09/16/2004  22:50 09/16/2004  0001
    LOG: IML Cleared (Administrator)

    0012 Repaired     00:32 09/21/2004  00:32 09/21/2004  0001
    LOG: Memory Cartridge Not Redundant (Slot 5)

    0033 Caution      04:02 10/06/2006  04:02 10/06/2006  0001
    LOG: Corrected Memory Error threshold exceeded (Slot 3, Memory Module 6)

Linux Operating Systems: Using the /var/log/messages File

The Linux system log (/var/log/messages) can be viewed using the cat, more, or less commands. The following type of message may be logged here if a memory problem has occurred:

    Oct 11 01:51:18 dhcp57-150 hpasmd[12039]: WARNING: hpasmd: Corrected Memory Error threshold exceeded (Slot 3, Memory Module 6)
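When a log is long, it can help to pull out only the memory events and the slot and module they name. The following is a minimal sketch in Python, assuming the message text always matches the "Corrected Memory Error threshold exceeded" sample shown above. The function name find_memory_events and the regular expression are illustrative and are not part of any HP tool. Other memory messages would need their own patterns.

```python
import re

# Pattern for the sample message text from this paper. The named groups
# "slot" and "module" are labels chosen for this sketch.
MEMORY_EVENT = re.compile(
    r"Corrected Memory Error threshold exceeded "
    r"\(Slot (?P<slot>\d+), Memory Module (?P<module>\d+)\)"
)


def find_memory_events(lines):
    """Return a (slot, module) tuple for every matching line, in order."""
    events = []
    for line in lines:
        match = MEMORY_EVENT.search(line)
        if match:
            events.append((int(match.group("slot")), int(match.group("module"))))
    return events


# Sample lines copied from the log excerpts in this paper.
sample = [
    "Oct 11 01:51:18 dhcp57-150 hpasmd[12039]: WARNING: hpasmd: "
    "Corrected Memory Error threshold exceeded (Slot 3, Memory Module 6)",
    "LOG: Memory Cartridge Not Redundant (Slot 5)",
    "LOG: Corrected Memory Error threshold exceeded (Slot 3, Memory Module 6)",
]

print(find_memory_events(sample))  # [(3, 6), (3, 6)]
```

The same function works on lines read from /var/log/messages or on saved hplog -v output, because it matches only the message text and ignores the surrounding timestamp and severity fields.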

DIMM Slot LEDs as Memory Problem Indicators

DIMM slot LEDs can be helpful indicators of issues with system memory. Figure 5 below gives an example of the location of LEDs that will illuminate when certain DIMM slots indicate a failure. The example below is from a ProLiant DL380 G4 system board; however, similar graphics for specific systems can be found in server User Guides and other system documentation.

    Item  Description            Status
    5     DIMM failure, slot 6C  Amber = Memory failed; Off = Normal
    6     DIMM failure, slot 5C  Amber = Memory failed; Off = Normal
    7     DIMM failure, slot 4B  Amber = Memory failed; Off = Normal
    8     DIMM failure, slot 3B  Amber = Memory failed; Off = Normal
    9     DIMM failure, slot 2A  Amber = Memory failed; Off = Normal
    10    DIMM failure, slot 1A  Amber = Memory failed; Off = Normal

Figure 5: DIMM Slot LEDs

[Image: illuminated DIMM slot LED on a ProLiant DL380 G7 server]
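The LED behavior described earlier (one LED when a slot is isolated, a whole bank when only the bank is known, every LED when nothing can be isolated) can be sketched as a small classifier. This is an illustration only, written in Python. The BANKS layout is an assumption inferred from the slot names in the DL380 G4 example (1A, 2A, 3B, 4B, 5C, 6C), and the function name classify_led_pattern is hypothetical. The server User Guide gives the actual bank layout for a specific system.

```python
# Slot-to-bank layout ASSUMED from the slot names in the DL380 G4 example;
# it is not taken from HP documentation for any particular server.
BANKS = {
    "A": {"1A", "2A"},
    "B": {"3B", "4B"},
    "C": {"5C", "6C"},
}


def classify_led_pattern(lit_slots):
    """Describe how far the firmware isolated a fault, given the amber LEDs."""
    lit = set(lit_slots)
    all_slots = set().union(*BANKS.values())
    if not lit:
        return "no fault indicated"
    if lit == all_slots:
        return "bank not identified: every DIMM is suspect"
    if len(lit) == 1:
        return "fault isolated to slot " + next(iter(lit))
    for bank, slots in BANKS.items():
        if lit == slots:
            return "fault isolated to bank " + bank + " only"
    return "unexpected pattern: run diagnostics"


print(classify_led_pattern(["3B"]))
print(classify_led_pattern(["5C", "6C"]))
```

Only the single-slot result points at one DIMM. Every other result means further testing is needed before any hardware is replaced, which is why the LEDs alone should not drive a replacement decision.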

Once a memory problem is suspected based on one of the methods outlined above, the first step is to schedule downtime with the customer for troubleshooting. HP recommends using HP Insight Diagnostics, available on the SmartStart CD and as a standalone download on HP.COM, to begin the troubleshooting process.

HP Insight Diagnostics is a proactive server management tool that is available in offline and online editions. It provides diagnostics and troubleshooting capabilities to help locate system and component problems.

Features of HP Insight Diagnostics:
- Works on multiple operating systems
- Can be used online or offline
- Can leverage HP Insight Management Agents and the System Management driver
- Complies with the Common Diagnostic Model (CDM)

For more information on HP Insight Diagnostics, refer to the following URL:
http://h18013.www1.hp.com/products/servers/management/hpid/index.html

Troubleshooting Using HP Insight Diagnostics Online Edition

HP Insight Diagnostics Online edition is available in HP ProLiant Support Packs and from HP.COM. The Insight Diagnostics Online edition performs various non-intrusive, in-depth system and component diagnostics while the operating system is running. The survey feature can identify and resolve problems without taking the system down.

Troubleshooting Using HP Insight Diagnostics Offline Edition

Insight Diagnostics Offline edition is available in HP SmartStart by choosing Launch server diagnostics. Some benefits of HP Insight Diagnostics are:
- Performs extensive in-depth system and component testing while in a controlled operating environment.
- Survey feature enables IT administrators to track hardware and software changes in order to form a complete and thorough auditing process for the system.
- Test results can be analyzed by IT administrators to diagnose system and component problems in order to repair servers and return them to the production environment.

Online vs. Offline Testing

Offline is the preferred testing method because it is the most accurate. When testing offline, there is minimal interference from the operating system, and address space testing is maximized because a very small Linux kernel is used.

Helpful Links:
HP Insight Diagnostics User Guide: www.hp.com/servers/diags

Troubleshooting Flowcharts

The following troubleshooting flowcharts can be used as another tool for diagnosing specific DIMMs that have failed. They can be used as a general guideline and should be used in conjunction with other tools and methods outlined in this white paper.

Because many system issues can be solved by upgrading firmware, download the latest version of the server firmware before proceeding with the troubleshooting methods outlined in the following flowcharts. The latest firmware is available as follows:

1. Go to www.hp.com
2. Select "Software and Driver Downloads."
3. Enter the Server model (for example, "DL160 G5").
4. Select the specific server model from the "Product Search Results" page, if this page appears.
5. Select the appropriate operating system.
6. Select BIOS - System ROM.
7. Select the latest version of the System ROM firmware and click "Download."

General Troubleshooting Flowchart for a Bootable System Using HP Insight Diagnostics

Troubleshooting Flowchart for a Non-Bootable System

Use this flowchart to help diagnose a memory problem in any of the following conditions:
- The system stops responding and displays a parity error message on the screen during boot.
- The system beeps and all of the DIMM LEDs illuminate during boot.
- The system is unable to boot far enough to run Offline Diagnostics, but no messages are displayed that can identify the failing DIMM.

What Role does Firmware Play in Troubleshooting Memory Problems?

Many product issues that result in hardware replacement, including issues in which memory is suspected, are preventable or correctable with a firmware update. HP recommends checking for a firmware update before sending a part back to HP for replacement. Based on the HP ProLiant product return rates, a significant percentage of all returned hardware products were functioning properly and only needed a firmware update. Although not all products fall into this category, server downtime and time spent removing, returning, and ultimately replacing hardware may have been avoided if an attempt had been made to flash the firmware during the troubleshooting process.

The following paragraphs contain information on each of HP's methods for updating firmware and provide information on how to perform the updates. Currently, there are two different methods for updating firmware on HP servers and options: the Online ROM Flash and the Offline ROM Flash. Note that the Online ROM Flash is not currently available for all products. If an Online ROM Flash is unavailable for a particular server or option, an Offline upgrade will need to be performed.

TIP: If a server is deployed more than three months after purchase, use the HP Support and Drivers page on HP.COM, rather than the Firmware Maintenance CD that shipped with the server. In addition, check for firmware updates for any options that may have been in stock but were not deployed until later. This ensures that all server components are running the latest firmware versions.

To download the latest firmware upgrades:
1. Go to www.hp.com
2. Select "Software and Driver Downloads."
3. Enter the Server model (for example, "DL160 G5").
4. Select the specific server model from the "Product Search Results" page, if this page appears.
5. Select the appropriate operating system.
6. Select BIOS - System ROM.
7. Select the latest version of the System ROM firmware and click "Download."

Updating Firmware Using the Online ROM Flash Method

The Online ROM Flash is a technology developed by HP that allows the firmware to be upgraded either locally or remotely via a downloadable file called a Smart Component. These Smart Components enable the update to be performed while the server is operational, thereby avoiding costly server downtime.

Benefits of the Online ROM Flash include:
- The server does not have to be taken offline to perform the upgrade.
- The upgrade process takes less than a minute to complete.
- The server can be scheduled for a reboot at a later time to deploy the new firmware after the upgrade process.
- The server administrator can remotely perform the upgrade to multiple servers at one time using the ProLiant Remote Deployment Utility, the ProLiant Remote Deployment Console Utility, and other HP server management technologies, such as HP Systems Insight Manager (HP SIM).

The Smart Component updates the firmware and configures the system so that the new settings will take effect on the next reboot. This feature allows the update to be performed but gives the administrator control of when the new settings are deployed.

For more information on how to deploy firmware updates, refer to the README or TXT files that are packaged with the firmware or refer to the following links:

Firmware Management Best Practices for HP BladeSystem:
http://h20000.www2.hp.com/bc/docs/support/supportmanual/c02049593/c02049593.pdf?jumpid=reg_r1002_usen

HP Smart Update Manager help:
http://downloads.linux.hp.com/sdr/downloads/proliantsupportpack/suse/10/i386/8.50/hpsum_help_en.htm?jumpid=reg_r1002_usen

HP Smart Update Manager User Guide:
http://h20000.www2.hp.com/bc/docs/support/supportmanual/c02161675/c02161675.pdf?jumpid=reg_r1002_usen

ProLiant Support Pack User Guide:
http://h20000.www2.hp.com/bc/docs/support/supportmanual/c01868185/c01868185.pdf?jumpid=reg_r1002_usen

Subscriber's Choice for receiving firmware updates:
http://docs.hp.com/en/5992-5814/pr01s04.html

Updating Firmware Using the Offline ROM Flash Method

The Offline ROM Flash, as its name implies, is performed when the server is taken down for regular maintenance. Although the results will be the same, the Offline ROM Flash does not provide the same benefits as the newer Online ROM Flash method. In addition, when upgrading remotely, the server administrator can only update one server at a time.

There are two methods of performing an Offline ROM Flash. The firmware can be updated using a ROMPaq Diskette or using the ROM Update Utility.
- A ROMPaq is a floppy-disk based method of upgrade. The firmware is downloaded onto a floppy diskette and then the system is booted to the floppy drive.
- The ROM Update Utility is located on the Firmware Maintenance CD, or can be downloaded to a USB Drive Key using the HP Drive Key Boot Utility.

Note: Hard Drive components can only be updated using the Offline method. For more information:
http://h20000.www2.hp.com/bizsupport/techsupport/softwaredescription.jsp?lang=en&cc=us&switem=mTX-44b49e9d4d0f4f4cb244e3f0ce&jumpid=reg_R1002_USEN

The HP Drive Key Boot Utility

The HP Drive Key Boot Utility can format an HP Drive Key so that it can be used as a bootable device. The utility also provides the ability to load the ROM Update Utility on an HP Drive Key. After the ROM Update Utility has been installed, the Offline ROM Flash Smart Components can be downloaded to the drive key from the following URL and deployed using the ROM Update Utility.

For more information or to download the HP Drive Key Boot Utility, refer to the following link:
http://h20000.www2.hp.com/bizsupport/techsupport/softwaredescription.jsp?lang=en&cc=us&switem=mTX-UNITY-I23839&jumpid=reg_R1002_USEN

For additional information on the HP Drive Key Boot Utility, refer to the following URL:
http://h20000.www2.hp.com/bc/docs/support/supportmanual/c00218060/c00218060.pdf

The HP Drive Key Utility can be downloaded from the following URL:
http://www.compaq.com/support/files/server/us/locate/8641.html

Other Troubleshooting Resources

The HP ProLiant Troubleshooting Guide is located at the following URL and may be helpful in diagnosing memory or other system problems:
http://h20000.www2.hp.com/bc/docs/support/supportmanual/c00300504/c00300504.pdf

User Guides for each individual ProLiant server, available on the hp.com Support and Drivers page, contain specific guidelines for populating memory for that system. For example, memory configuration guidelines can be found on page 49 of the User Guide for the ProLiant DL580 G7 server, available here:
http://h20000.www2.hp.com/bc/docs/support/supportmanual/c02267159/c02267159.pdf?jumpid=reg_r1002_usen

For the latest memory configuration information, search the QuickSpecs for all ProLiant servers here:
http://www.hp.com/go/proliant

Before replacing memory in ProLiant servers, ensure that accurate troubleshooting methods are used. Taking the time to isolate specific DIMMs that are failing can prevent unnecessary hardware replacement and customer dissatisfaction. Follow the standard troubleshooting guidelines highlighted in this white paper and use them each time a memory issue is suspected. Accurate problem diagnosis reduces warranty costs for HP and prevents customers from experiencing unnecessary downtime.


© 2007, 2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

446969-002 Rev B, 7/2010