Service on ThunderLT with AMI MegaRAID Controller

Background on the hardware and software involved in troubleshooting and repairing this array can be found at the end of this document, under the heading "ThunderLT with AMI MegaRAID Points of Interest". Screenshots and drawings are shown in that section.

Brief Description:
ThunderLT uses RAID-10 technology, which combines RAID 0 and RAID 1. RAID 0 is a stripeset with no fault tolerance, and RAID 1 is a mirror set (one drive carries an identical copy of all data on the other). Put together, RAID 10 is a mirrored stripeset: two drives striped together, with an identical copy of all data on two other drives striped together. ThunderLT uses the minimum RAID-10 configuration of four drives. In general, RAID-10 can be scaled up two drives at a time (for example, an 8-drive RAID 10 is a 4-drive RAID-0 mirrored by another 4-drive RAID-0).

The AMI MegaRAID controller used in the ThunderLT has two important features. First, on the positive side, it provides operator notification: a system tray icon visually indicates the array's condition. Second, on the negative side, the MegaRAID isn't very serviceable when configured as RAID 10: it isn't possible to introduce a new drive as a spare and rebuild the array, as with other arrays found on Thunder. Fortunately, even though the damaged configuration must be destroyed and replaced with a new, identical configuration, full data recovery is likely (but not certain).

Failure Symptoms to Expect With This Configuration:
- Tray icon is not green.
- Error dialog stating that a drive is down, after logging on to Windows.
- MegaRAID BIOS stuck before booting Windows, usually with a "rebuilding" message.
- Long wait times, typically in Thunder when accessing certain areas of the database window or playing certain clips. Sometimes the tray icon will turn red, then later (or after rebooting) back to green. There is no other sign of database damage (purging flash_st.log doesn't help). Other problems accessing the drive in general are also possible (such as opening Properties on the E: drive, or Disk Management reporting the partition as "At Risk" instead of "Healthy").
- Events in the Windows System Event log related to Physical Drive 1.
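The fault-tolerance rules in the Brief Description can be modelled in a few lines. The Python sketch below is a hypothetical illustration, not MegaRAID software: it assumes the mirrored-stripeset layout described above, numbers the drives 0 to n-1 (a numbering invented for this example that does not correspond to IDE positions), and maps a set of failed drives onto the tray icon colors described under Points of Interest. Real controller behavior may differ; as noted in the symptoms above, the icon can turn red and later return to green.

```python
from enum import Enum
from typing import Set


class ArrayState(Enum):
    """Tray icon colors, as described for the MegaRAID tray icon."""
    GREEN = "fully fault-tolerant and online"
    YELLOW = "critical: a drive is down, one complete copy remains"
    RED = "failed: no complete copy of the data remains"


def raid10_state(drive_count: int, failed: Set[int]) -> ArrayState:
    """Classify a mirrored stripeset of `drive_count` drives.

    Hypothetical numbering: drives 0 .. n/2-1 form the first RAID-0
    stripeset, and drives n/2 .. n-1 form its mirror copy.
    """
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID-10 needs an even number of drives, at least four")
    if not failed <= set(range(drive_count)):
        raise ValueError("failed drive numbers must be in range(drive_count)")
    if not failed:
        return ArrayState.GREEN
    half = drive_count // 2
    first_copy = set(range(half))
    second_copy = set(range(half, drive_count))
    # A RAID-0 stripeset is readable only while every member is online,
    # so the data survives if at least one stripeset has no failed drive.
    if first_copy.isdisjoint(failed) or second_copy.isdisjoint(failed):
        return ArrayState.YELLOW
    return ArrayState.RED


def usable_capacity(drive_count: int, drive_size: float) -> float:
    """Half of the drives hold the mirror copy, so half the raw space is usable."""
    return drive_count // 2 * drive_size
```

For example, raid10_state(4, {1}) returns ArrayState.YELLOW, and losing one drive from each stripeset, raid10_state(4, {1, 2}), returns ArrayState.RED. Under this model, losing both drives of the same stripeset still leaves one complete copy.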
Repairing the MegaRAID Array:

1. If the array is working well enough, back up all data immediately. Corrective actions on the hardware have historically carried a 10% risk of losing all data.

2. Identify the defective drive. Depending on the symptoms, choose one of the following paths:

a. YELLOW TRAY ICON
Double-click the tray icon to launch the utility and select the Physical Drives display. The defective drive will either be unlisted, or marked as missing or offline. Note its bus and member position (such as Primary Slave or Secondary Master).

b. WON'T BOOT, STUCK ON DRIVE SCAN
The controller scans for drives in the following order: Primary Master, Primary Slave, Secondary Master, Secondary Slave. The defective drive is likely to be the first unit not shown. For example, if the BIOS scan reports the Primary Master and then freezes, the Primary Slave is probably hanging the scan and needs to be replaced. If no drives appear to be responding, the Primary Master is the likely problem.
CAUTION: with this symptom, if a Master unit is identified as the problem, it is possible that its Slave unit is actually hanging the entire bus. If the later step of removing the drive doesn't solve the problem and a Master was removed, try reconnecting the Master unit and disconnecting the Slave unit to maximize the chances of data recovery!

c. WON'T BOOT, REBUILDING MESSAGE
The BIOS will explicitly identify the problem drive in its message.

d. GREEN TRAY ICON BUT PROBLEMS USING THE E: DRIVE IN THUNDER OR WINDOWS
Double-click the tray icon to launch the utility. Select the Physical Drives display and examine the ERRORS field for all four drives. The problem drive is the one showing a non-zero error count. If all four drives show zero errors, try to reproduce the problem using Thunder or Windows Explorer, then re-check.

3. If the defective drive is causing stability or boot problems, remove it before the replacement parts arrive, if necessary. At this point, revisit whether the data can be backed up, if that was not possible earlier. This applies to all symptoms except a yellow or red tray icon. If the tray icon is yellow, it is safe to wait until the replacement is available before proceeding (unless problems using the E: drive are also present). If the tray icon is red, all data is already lost, and the unit is down for service until replacements are available anyway.

Removing the defective drive:
a. Physically identify the bus members: follow the data cable from the appropriate bus connector on the card (Primary or Secondary) to the two connected drives.

b. Physically identify the drive: if possible, read the jumper map on the top drive's label, then examine the jumpers on the two bus members to determine which one is the bad drive.

c. If the bad drive is causing boot or stability problems, simply disconnect its data cable, reboot the machine with three drives present, and attempt to back up the data if possible.
CAUTION: if the symptom was a frozen drive scan and a Master unit was identified, reconnect the Master and disconnect the Slave if the drive scan is still frozen. DO NOT disconnect both drives unless the controller won't complete the scan with either one present. In that case all data may already be lost, but make the attempt anyway: if Windows boots with no E: drive present, all data is lost; otherwise, attempt to back up the content.

4. The replacement drive must be jumpered correctly: make sure it is configured to replace the bad drive before proceeding! Then physically replace the defective drive. The 4-drive cage is fastened to the bottom of the Thunder cabinet with 4 captive screws. Loosen all four screws, pull the drive cage out, and replace the bad drive.

5. Boot the Thunder and enter the MegaRAID BIOS Setup by pressing CTRL-M when prompted.

6. Regardless of the current configuration, use F5 to delete all array information, then use the F3: Auto-Configure RAID10 function. Historically, an array with one failed drive will at this point appear as two arrays: one offline with only one drive, and another, which may or may not be offline, with two drives. Ignore this inconsistency and simply delete both arrays. Then press F3 and accept all default settings. Press F10 to save and exit.

7. If data has not been lost so far, at this point there is a 90% probability that it will be retained and the unit can be put back into service.
If all data was lost, follow the regular installation procedure, beginning with partitioning and formatting the array. IMPORTANT: use the Partition and Format Wizard in Disk Management, and make sure to change the Allocation Unit Size from the default to 64K. Select Quick Format and let Windows determine whether a quick format is possible; if it is, it will save a great deal of time. Share the new E: drive as "E" (not "E$"), and either restore the backup of the data or create a new Thunder database.
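The drive-scan reasoning in step 2b, together with its CAUTION about Slave units, can be sketched as a small Python function. This is an illustration only: the function name and the idea of passing in a hand-transcribed list of the positions the BIOS reported before freezing are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Order in which the MegaRAID BIOS scans the IDE positions (step 2b).
SCAN_ORDER = ["Primary Master", "Primary Slave", "Secondary Master", "Secondary Slave"]


def suspect_from_frozen_scan(found: List[str]) -> Tuple[str, Optional[str]]:
    """Return (likely defective drive, alternate suspect) for a frozen drive scan.

    `found` lists the positions the BIOS reported before it froze; this input
    format is a hypothetical choice for illustration.
    """
    unknown = set(found) - set(SCAN_ORDER)
    if unknown:
        raise ValueError(f"unrecognised positions: {sorted(unknown)}")
    # The first position in scan order that was not reported is the likely
    # culprit; if nothing was reported at all, that is the Primary Master.
    missing = [position for position in SCAN_ORDER if position not in found]
    if not missing:
        raise ValueError("all four drives were reported; the scan did not freeze on a drive")
    suspect = missing[0]
    # CAUTION from step 2b: when a Master is blamed, the Slave on the same
    # bus may be the unit actually hanging it.
    alternate = suspect.replace("Master", "Slave") if suspect.endswith("Master") else None
    return suspect, alternate
```

A non-empty alternate corresponds to the fallback in step 3c of removing the defective drive: if the scan is still frozen after disconnecting the Master, reconnect it and disconnect the Slave instead.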
ThunderLT with AMI MegaRAID Points of Interest:

1. At boot time, the following BIOS display appears after the Adaptec SCSI BIOS completes initialization:

    MegaRAID IDE BIOS Version x.x.x
    (c) Copyright American Megatrends Inc., USA
    MegaRAID IDE Adapter Card found at PCI Bus No:xx Dev No:xx
    Scanning for Primary Master.... found xxxx xx MB
    Scanning for Primary Slave.... found xxxx xx MB
    Scanning for Secondary Master.... found xxxx xx MB
    Scanning for Secondary Slave.... found xxxx xx MB
    Press Ctrl-M to run Configuration Utility

On ThunderLT systems, each of the four drives will show a Seagate or IBM/Hitachi model number. Normally, information about the array also appears here (screenshot not available).

2. The System Tray icon for this controller: this icon gives the operator visibility into the array's condition and appears next to the Windows system clock on the Taskbar. Its color is the indicator. When green, the array is fully fault-tolerant and online. When yellow (as shown), the array is in a critical state: not all four drives are online. When red, the array has completely failed (all data is lost).
3. ThunderLT uses a RAID-10 array, configured using the defaults offered in the MegaRAID IDE Setup Utility. The screenshot below shows the BIOS configuration utility itself; it is not an actual screenshot of a ThunderLT. The array configuration in the largest rectangle (upper-left corner) is correct for ThunderLT except for its size (Thunder's array should be approximately 120GB). Also note the Help rectangle (upper-right corner): the F3 key is used to auto-configure RAID-10 for Thunder, accepting all default values. The drive list in the bottom rectangle does not match ThunderLT's drives at all; however, the array #0 membership and status shown there are important during troubleshooting.
4. The actual controller card. AMI's diagram of the hardware is as follows:
Note the position of the two IDE connectors: the Primary bus is towards the rear and the Secondary bus is towards the front. This is useful to know when removing or replacing a failed drive.
5. The MegaRAID Console looks like this:
The console utility can be accessed by double-clicking or right-clicking the tray icon. In the screenshot, the Physical Drives display is selected. When troubleshooting broken arrays that appear intact (green tray icon), this display can help determine which drive actually has problems, if this is not otherwise obvious.
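Reading the Physical Drives display, as described in item 5 and in steps 2a and 2d of the repair procedure, amounts to a simple filter. The Python sketch below is hypothetical: it assumes the technician transcribes the ERRORS field for each position into a dictionary, and it treats a position missing from the display as suspect (step 2a). It does not capture the "missing" or "offline" status markings, which still have to be read from the console directly.

```python
from typing import Dict, List

# IDE positions in the order the controller scans them.
SCAN_ORDER = ["Primary Master", "Primary Slave", "Secondary Master", "Secondary Slave"]


def drives_with_errors(error_counts: Dict[str, int]) -> List[str]:
    """List suspect positions from a hand-transcribed Physical Drives display.

    A position is suspect if it is missing from the display (step 2a) or if
    its ERRORS field is non-zero (step 2d). Results follow scan order.
    """
    unknown = set(error_counts) - set(SCAN_ORDER)
    if unknown:
        raise ValueError(f"unrecognised positions: {sorted(unknown)}")
    return [
        position
        for position in SCAN_ORDER
        if position not in error_counts or error_counts[position] != 0
    ]
```

An empty result matches the last case in step 2d: reproduce the problem in Thunder or Windows Explorer, then re-check the display.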