Troubleshooting Guide for BIOS POST on 13th Generation of Dell PowerEdge Servers

Wei Liu
Dell Server BIOS Development
September 2014
Revisions

| Date | Description |
|---|---|
| August 2014 | Initial draft |

THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

© 2014 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.

Dell, the DELL logo, and the DELL badge are trademarks of Dell Inc. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Microsoft, Windows, and Windows Server are registered trademarks of Microsoft Corporation in the United States and/or other countries. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims any proprietary interest in the marks and names of others.
Table of contents

Revisions
Executive summary
1. BIOS Splash Screen Display
2. POST Error and Warning Messages
3. Post Code in iDRAC Web GUI
4. Driver Health Status Report
5. Dell Diagnostics (ePSA)
6. Red Screen of Death (RSOD)
7. Yellow Screen of Death (YSOD)
Executive summary

The Unified Extensible Firmware Interface (UEFI) is a set of industry-standard firmware interfaces designed to replace the legacy BIOS and to support modern operating systems and hardware architectures. Dell has shipped UEFI support in the BIOS since the 11th generation of PowerEdge servers through a UEFI-over-Legacy model, in which the legacy BIOS initializes the whole system and loads the UEFI layer at the end of Power-On Self-Test (POST) if needed. The Dell Lifecycle Controller technology is built upon UEFI as well.

The BIOS on the 13th generation of Dell PowerEdge servers is now a native UEFI implementation, with a Compatibility Support Module (CSM) that provides legacy BIOS interfaces for operating systems that are not UEFI-aware. The look and feel of the boot process is dramatically different from previous generations. This guide provides troubleshooting guidance for issues that may arise during POST and in the pre-boot environment on the 13th generation of PowerEdge servers.

1. BIOS Splash Screen Display

After the system is powered on, the Dell server BIOS may reach video display almost instantly. Fig. 1 is a sample snapshot of the POST splash screen. The text next to the progress bar at the bottom of the screen indicates the current phase of POST and can aid in troubleshooting issues that occur during the system boot process. The following table lists the progress texts currently supported in the BIOS:

| Text Display | Phase of the Boot Process |
|---|---|
| Initializing Intel QuickPath Interconnect... | BIOS performs an early initialization of the chipset, processors, and QPI interfaces. |
| Configuring Memory | BIOS initializes the system memory. |
| Loading BIOS Drivers | BIOS starts the Driver Execution Environment (DXE) phase, then loads and executes DXE drivers to perform additional chipset, processor, and hardware initialization. |
| Initializing iDRAC | BIOS waits for iDRAC to become ready. This phase may take more than a few seconds on the first AC power-on of the system. |
| Initializing iDRAC Done | iDRAC initialization has completed. |
| Initializing PCIe, USB and Video | Start of PCI enumeration and detection of USB keyboard devices. |
| Initializing PCIe, USB and Video Done | PCI and USB enumeration has completed. |
| Legacy PCI option ROM initialization (BIOS boot mode only) | Applies to the BIOS boot mode only. The onscreen display varies, depending on the type of PCIe cards that are installed in the system. |
| Testing Memory (X% Complete) | Software-based memory test phase; X is the percentage completed. Note: The memory test is disabled in the BIOS setup by default. |
| Testing Memory Done [No Errors] | Memory test completed without any issue. |
| Testing Memory Done [Errors Encountered] | Memory test has found error(s). |
| Testing Memory Aborted | Memory test was aborted by pressing <ESC> or spacebar. |
| Loading Lifecycle Controller Drivers | BIOS loads the Lifecycle Controller drivers. |
| Loading Lifecycle Controller Drivers Done | BIOS has finished loading the Lifecycle Controller drivers. |
| Initializing Firmware Interfaces | BIOS connects the UEFI drivers to the device handles. The UEFI drivers from add-in PCIe cards are expected to be installed in this phase. |
| Running In-System Characterization... | In-System Characterization (ISC) is in progress. |
| Connecting iSCSI device(s) | The UEFI iSCSI device drivers are connected. This text applies to UEFI boot mode only and is displayed when one or more iSCSI boot devices have been configured. |
| Enumerating Boot options | BIOS starts to enumerate Boot Options in the system. |
| Enumerating Boot options Done | The enumeration of Boot Options has completed. |
| Entering Lifecycle Controller | The system is booting into the Lifecycle Controller. |
| Lifecycle Controller: Applying Updates or Setting System Configuration | An Automated Task Application is being scheduled in the Lifecycle Controller. |
| Lifecycle Controller: Collecting System Inventory | Lifecycle Controller is collecting system inventory for this boot. |
| Lifecycle Controller: Done | Lifecycle Controller has finished execution. |
| Booting | BIOS has finished POST and is giving control to the operating system. |
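When a system stalls during POST, the last progress text seen on the screen indicates the phase that was running. The following is a minimal sketch of how that lookup could be automated for text copied from a console capture. It is not a Dell tool: the dictionary holds only a subset of the rows from the table above, the descriptions are abbreviated, and the names `POST_PHASES` and `describe_progress` are hypothetical.

```python
import re

# Subset of the progress texts from the table above; descriptions abbreviated.
POST_PHASES = {
    "Initializing Intel QuickPath Interconnect": "Early chipset, processor, and QPI initialization",
    "Configuring Memory": "System memory initialization",
    "Loading BIOS Drivers": "DXE phase: loading and executing DXE drivers",
    "Initializing iDRAC": "Waiting for iDRAC to become ready",
    "Initializing PCIe, USB and Video": "PCI enumeration and USB keyboard detection",
    "Loading Lifecycle Controller Drivers": "Loading the Lifecycle Controller drivers",
    "Initializing Firmware Interfaces": "Connecting UEFI drivers to device handles",
    "Enumerating Boot options": "Enumerating Boot Options",
    "Booting": "POST finished; handing control to the operating system",
}

# Matches the memory test text, where X is a percentage.
_MEMTEST = re.compile(r"^Testing Memory \((\d+)% Complete\)$")


def describe_progress(text):
    """Return a short description of the POST phase for a progress text."""
    # Drop surrounding whitespace and any trailing "..." from the display text.
    cleaned = text.strip().rstrip(".").strip()
    match = _MEMTEST.match(cleaned)
    if match:
        return f"Software-based memory test, {match.group(1)}% complete"
    # A trailing "Done" marks the end of the same phase.
    if cleaned.endswith(" Done"):
        base = cleaned[: -len(" Done")]
        if base in POST_PHASES:
            return POST_PHASES[base] + " (completed)"
    return POST_PHASES.get(cleaned, "Unknown progress text")


print(describe_progress("Initializing Intel QuickPath Interconnect..."))
print(describe_progress("Testing Memory (42% Complete)"))
```

Texts that are not in the subset, such as the Lifecycle Controller rows, fall through to "Unknown progress text" until they are added to the dictionary.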
Fig. 1 POST splash screen and progress bar

2. POST Error and Warning Messages

The BIOS on the 13th generation of PowerEdge servers can display informational, warning, and error messages during POST to help you troubleshoot various issues. If an error occurs early in POST, such as during memory initialization, a pop-up message box with a detailed description of the issue (for example, Fig. 2) may be displayed on the screen.
Fig. 2 An error message box in early POST

If the issue is detected later in POST, the corresponding error and warning messages are displayed on the screen with a UEFIxxxx prefix. An event entry is logged in the Lifecycle Controller log (LC log) as well. Depending on the severity of the error or warning, the system may continue to boot, prompt for user input with F1/F2/F10/F11, reset, or halt. Each message comprises two parts: the error or warning message itself and a recommended response action. Follow the recommended response action to address the issue. For a complete list of POST error and warning messages, see the Event and Error Message Reference Guide for 13th Generation Dell PowerEdge Servers.

In the following example, the UEFI driver for the Integrated Network card is not signed, and the user has just turned on Secure Boot in the BIOS setup utility. On the next boot, several messages are displayed on the screen during POST.

- The first error message (UEFI0072) reports that the UEFI driver from the Integrated NIC 1 Port 1 Partition 1 was not loaded because it failed the Secure Boot authentication. You may address this issue by updating the NIC firmware to a version that supports UEFI driver signing.
- The second error message (UEFI0071) reports that the previously configured UEFI network boot interface is no longer available. This is a result of the corresponding UEFI driver not being loaded.
- The third message, a warning (UEFI0074), reports that the Secure Boot policy has been modified since the last time the system was booted. In this particular example, the user enabled Secure Boot on purpose, so no action needs to be taken.

Fig. 3 An example of POST error messages

The corresponding entries for the error and warning messages are recorded in the Lifecycle Log (Fig. 4).
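Because each later-POST message carries a UEFIxxxx identifier, the identifiers can be pulled out of saved console text or an exported log and looked up in the reference guide. The sketch below illustrates one way to do that. It assumes the identifier is always "UEFI" followed by four digits, as in the three examples above; the sample input lines are placeholders rather than real message text, and `find_message_ids` and `KNOWN_IDS` are hypothetical names.

```python
import re

# Assumption: message IDs are "UEFI" followed by exactly four digits,
# matching the UEFI0071, UEFI0072, and UEFI0074 examples in this guide.
MESSAGE_ID = re.compile(r"\bUEFI\d{4}\b")

# Meanings summarized from the Secure Boot example; not official message text.
KNOWN_IDS = {
    "UEFI0072": "UEFI driver not loaded: failed Secure Boot authentication",
    "UEFI0071": "Configured UEFI network boot interface no longer available",
    "UEFI0074": "Secure Boot policy modified since the last boot",
}


def find_message_ids(text):
    """Return the distinct UEFIxxxx IDs in order of first appearance."""
    seen = {}
    for match in MESSAGE_ID.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


# Placeholder capture: only the IDs matter, the rest stands in for message text.
sample = "UEFI0072: <message>\nUEFI0071: <message>\nUEFI0074: <message>\nUEFI0072: <message>"
for message_id in find_message_ids(sample):
    summary = KNOWN_IDS.get(message_id, "see the Event and Error Message Reference Guide")
    print(message_id, "-", summary)
```

Any identifier missing from `KNOWN_IDS` is simply pointed at the reference guide, which remains the authoritative list.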
Fig. 4 Screenshot of the Lifecycle Log

3. Post Code in iDRAC Web GUI

If you cannot get to the screen display, the Post Code feature in the iDRAC web GUI may come in handy. This page displays the last system POST code with descriptive text. The POST code helps to detect pre-video hangs, report fatal errors, and analyze system failures during POST.
Fig. 5 An example of the Post Code in the iDRAC Web GUI

4. Driver Health Status Report

The UEFI specification defines a Driver Health Protocol (DHP). The DHP provides services that allow a UEFI driver to express the health status of a controller, return status messages associated with the health status, perform repair operations if necessary, and request configuration changes to place the controller back in a usable state.

The Dell server BIOS checks the driver health status of each UEFI driver in the system and displays the status messages. The BIOS may invoke the repair and configuration utility if a repair or reconfiguration operation is required. In most cases, you can follow the instructions on the screen to proceed.

Fig. 6 is an example display where the BIOS halts on errors returned from the DHP. In this particular example, the iDRAC DHP detected that the backplane 2 power cable has been disconnected, and the LSI SAS controller requires configuration changes, possibly due to a catastrophic issue.
Fig. 6 Example of errors detected by UEFI Driver Health Protocol

Fig. 7 is a snapshot of the Driver Health Manager when a driver requires a configuration change. The Driver Health Manager lists all the device instances that require reconfiguration. You can select each one and follow the instructions on the screen to configure the device.
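The driver health flow described in this section can be pictured as a small state-to-action mapping. The sketch below is an illustrative model only, not the Dell BIOS implementation: the state names follow the EFI_DRIVER_HEALTH_STATUS values in the UEFI specification, while the `NEXT_STEP` policy, the `needs_attention` helper, and the example report are assumptions made for illustration.

```python
from enum import Enum


class HealthStatus(Enum):
    """States named after EFI_DRIVER_HEALTH_STATUS in the UEFI specification."""
    HEALTHY = "Healthy"
    REPAIR_REQUIRED = "RepairRequired"
    CONFIGURATION_REQUIRED = "ConfigurationRequired"
    FAILED = "Failed"
    RECONNECT_REQUIRED = "ReconnectRequired"
    REBOOT_REQUIRED = "RebootRequired"


# Hypothetical policy: the next step a firmware front end might take per state.
NEXT_STEP = {
    HealthStatus.HEALTHY: "continue boot",
    HealthStatus.REPAIR_REQUIRED: "run the driver's repair operation",
    HealthStatus.CONFIGURATION_REQUIRED: "open the configuration utility",
    HealthStatus.FAILED: "report the failure",
    HealthStatus.RECONNECT_REQUIRED: "reconnect the controller",
    HealthStatus.REBOOT_REQUIRED: "reboot the system",
}


def needs_attention(report):
    """Return (controller, next step) for every controller that is not healthy.

    `report` is a list of (controller name, HealthStatus) pairs.
    """
    return [
        (name, NEXT_STEP[status])
        for name, status in report
        if status is not HealthStatus.HEALTHY
    ]


# Example input: the SAS controller state mirrors the Fig. 6 description;
# the NIC entry is made up for contrast.
report = [
    ("Integrated NIC", HealthStatus.HEALTHY),
    ("LSI SAS controller", HealthStatus.CONFIGURATION_REQUIRED),
]
print(needs_attention(report))
```

In this model, only controllers that report a state other than healthy are listed, which parallels how the Driver Health Manager lists just the device instances that require reconfiguration.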
Fig. 7 Driver Health Manager

5. Dell Diagnostics (ePSA)

Dell Enhanced Pre-Boot System Diagnostics (ePSA) is a set of diagnostic tests embedded in the system (Fig. 8). These tests allow you to check the hardware health status outside the operating system environment. The findings can assist you in troubleshooting a fault and working toward a resolution. ePSA can be launched from Boot Manager -> System Utilities -> Launch Diagnostics (Fig. 9).
Fig. 8 Sample screenshot of ePSA
Fig. 9 Launching diagnostics from Boot Manager

6. Red Screen of Death (RSOD)

The Dell server BIOS implements an enhanced CPU exception handler (RSOD) that helps users and technical support analyze the software exception when the system crashes in the pre-boot UEFI environment. Debug information is displayed on the screen, and additional information and stack traces can be retrieved through the serial port (if available). You can save the dump and use it for offline debugging.
A sample RSOD display is depicted in Fig. 10.

Fig. 10 An example of the RSOD screenshot

When the processor raises an exception, the BIOS displays the RSOD screen with the following information related to the exception:

- The exception type, such as Page Fault, General Protection Fault, Divide by Zero, Breakpoint, and so on.
- A Dell-defined error value, prefixed with UEFIxxxx. Note that a corresponding error is logged to the LC log as well.
- Partial register set (x86 64-bit).
- Last-Branch records and associated module names, if available.
- Current RIP and faulting driver module name.
- Stack trace back from the faulted module.

Additional information is available from the serial port dump. To retrieve the serial dump, connect the server to a client system with a null modem cable, open any terminal program (for example, PuTTY or HyperTerminal) with the baud rate set to 115200 bps, and then press <ENTER>. The serial dump can also be retrieved through Serial over LAN (SOL).
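The serial retrieval step can also be scripted instead of using an interactive terminal program. The following is a minimal sketch under stated assumptions: `capture_dump` is a hypothetical helper that works with any object exposing `write` and `read`, a carriage return is assumed to stand in for <ENTER>, and `FakePort` exists only so the example runs without hardware. With a real port, an object such as `serial.Serial(port, 115200, timeout=5)` from the third-party pyserial package could be passed instead; the port name and timeout are for you to choose.

```python
import io

BAUD_RATE = 115200  # from this guide; other line settings are not stated there


def capture_dump(port, sink, chunk_size=4096):
    """Send <ENTER> on an open serial-like stream, then copy everything
    received into `sink` until a read returns no data (EOF or timeout).

    Returns the number of bytes captured.
    """
    port.write(b"\r")  # assumption: a carriage return acts as <ENTER>
    total = 0
    while True:
        chunk = port.read(chunk_size)
        if not chunk:  # empty read: timeout expired or stream ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total


class FakePort:
    """Stand-in for a serial port: records writes and replays canned bytes."""

    def __init__(self, payload):
        self.sent = b""
        self._rx = io.BytesIO(payload)

    def write(self, data):
        self.sent += data
        return len(data)

    def read(self, size):
        return self._rx.read(size)


# Demonstration with placeholder bytes instead of a real RSOD dump.
payload = b"placeholder dump bytes"
port = FakePort(payload)
sink = io.BytesIO()
print(capture_dump(port, sink), "bytes captured")
```

Writing to a generic sink keeps the helper easy to test; in practice the sink would be a file opened in binary mode so the dump can be saved and sent for analysis.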
Note: The RSOD serial dump can be obtained at the point of failure. The serial session does not have to be started prior to the RSOD.

RSODs are usually caused by software issues and may be resolved by updating the BIOS, Lifecycle Controller, or the UEFI firmware for PCIe cards. If you encounter an RSOD even after all the firmware updates, you may send the screenshot and serial dump to Dell support for further analysis.

7. Yellow Screen of Death (YSOD)

When a hardware error occurs in the UEFI pre-boot environment (excluding the CSM phase in BIOS boot mode), the Dell server BIOS may display a Yellow Screen of Death (YSOD) with some of the software context at the time the issue was detected. The hardware errors include Non-Maskable Interrupts (NMI) and Machine Check Errors (MCE). Check the System Event Log (SEL) to identify the source and type of the error. Update the corresponding device firmware if the error originated from a PCIe device.

Note: The stack trace displayed on the YSOD screen only provides some context information before the failure, and not the source of the problem.

A sample YSOD is depicted in Fig. 11.
Fig. 11 An example of the YSOD screenshot