
HP 124708-001 - ProLiant Cluster - 1850 Troubleshooting Manual

System memory troubleshooting best practices for hp proliant servers


System Memory Troubleshooting Best Practices for HP ProLiant Servers
Accurately troubleshooting system memory issues in ProLiant server configurations is an important process that
can help prevent unnecessary replacement of hardware components. In addition, accurate problem diagnosis
prevents customers from experiencing unnecessary downtime while waiting for hardware that may not need
to be replaced. Following standard troubleshooting guidelines each time a memory issue is
suspected helps to ensure an accurate diagnosis.
HP has developed several methods for troubleshooting memory problems in ProLiant servers.
The purpose of this white paper is to assist HP customers in troubleshooting system memory problems by
successfully isolating the specific DIMMs causing the problem. This helps to prevent nonessential replacement
of unaffected DIMMs or, in some cases, entire banks of memory. In addition, effective troubleshooting can
help determine if a firmware or other software download can resolve a problem without replacing hardware.
This white paper covers the following topics:
Why should I troubleshoot every system memory problem?
How can I tell if a memory problem has occurred?
What tools are available from HP to help identify a failing DIMM?
Troubleshooting using HP Insight Diagnostics Online Edition
Troubleshooting using HP Insight Diagnostics Offline Edition
Troubleshooting flowchart for bootable systems
Troubleshooting flowchart for non-bootable systems
What role does firmware play in solving system memory problems?
Why Buy HP Memory?
Other troubleshooting resources


Summary of Contents for HP 124708-001 - ProLiant Cluster - 1850

  • Page 2 Why Should I Troubleshoot Every Memory Problem? Accurate diagnosis of system memory problems in ProLiant servers has many advantages, including: • Prevents unnecessary hardware replacement. Prevents the return of parts that test NFF (No Fault Found). • Prevents server downtime. •...
  • Page 3 banks illuminate, making the task of isolating the failing DIMM more difficult and the chance of replacing functioning banks of memory more likely. Therefore, further troubleshooting is necessary to determine which specific DIMM is failing. Use the LEDs as a tool in identifying that a memory problem may exist, but do not rely solely on the status of the LEDs to determine if hardware should be replaced.
  • Page 4 Figure 1: Overview of the HP System Management Homepage HP Systems Insight Manager (HP SIM) HP Systems Insight Manager monitors the health of the hardware in the system and polls installed hardware for its status every few minutes. Refer to Figure 2 below for an example of events displayed on the System page.
  • Page 5 Figure 2: HP Systems Insight Manager...
  • Page 6 System Logs Server system logs record the status of hardware events, including memory issues. For servers running Microsoft Windows operating systems, either of the following logs can be a valuable resource: • Integrated Management Log (IML) • Event Viewer For servers running Linux operating systems, refer to either of the following: • Integrated Management Log (IML) •...
  • Page 7 Microsoft Windows Operating Systems: Using Event Viewer The Event Viewer is a software tool available as part of Microsoft Windows operating systems. It can be accessed by navigating to HP System Tools via the Start menu. Figure 4 shows an example of server events that are logged in this tool.
  • Page 8 Linux Operating Systems: Using /var/log/messages File The Linux system log (/var/log/messages) can be viewed using the cat, more and less commands. The following types of messages may be logged here if a memory problem has occurred: Oct 11 01:51:18 dhcp57-150 hpasmd[12039]: WARNING: hpasmd: Corrected Memory Error threshold exceeded (Slot 3, Memory Module 6) [root@dhcp57-150 ~]# hplog -v ID Severity...
  • Page 9 Once a system memory problem is suspected based on one of the methods outlined above, the first step is to schedule downtime for troubleshooting. HP recommends using HP Insight Diagnostics, available on the HP SmartStart CD, to begin the troubleshooting process. HP Insight Diagnostics is a proactive server management tool that is available in offline and online editions.
  • Page 10: Helpful Links

    Online Vs. Offline Testing Offline is the preferred testing method because it is the most accurate. When testing offline, there is minimal interference from the operating system and address space testing is maximized because a very small Linux kernel is used. Helpful Links: HP Insight Diagnostics User Guide: www.hp.com/servers/diags...
  • Page 11 General Troubleshooting Flowchart for a Bootable System Using HP Insight Diagnostics...
  • Page 12 General Troubleshooting Flowchart for a Non-Bootable System Use this flowchart to help diagnose a memory problem in any of the following conditions: • The system stops responding and displays a “parity error” message on the screen during boot. • The system beeps and all of the DIMM LEDs illuminate during boot. • The system is unable to boot far enough to run Offline Diagnostics, but no messages are displayed...
  • Page 13 What Role Does Firmware Play in Troubleshooting Memory Problems? Many product issues that result in hardware replacement, including issues in which memory problems are suspected, are preventable or correctable with a firmware update. HP recommends checking for a firmware update before sending a part back to HP for replacement. Based on the HP ProLiant product return rates, a significant percentage of all returned hardware products were functioning properly and only needed a firmware update.
  • Page 14 For detailed information on the Online ROM Flash process, refer to the Online ROM Flash User Guide at the following URL: http://www.compaq.com/support/files/server/us/webdoc/rom/OnlineROMFlashUserGuide.pdf Smart Components for HP ProLiant servers and storage can be obtained from the following links: Microsoft Windows Operating Systems: http://h18023.www1.hp.com/support/files/server/us/winroms.html...
  • Page 15 For more information on how to deploy firmware updates remotely, refer to the following URL: http://h18004.www1.hp.com/products/servers/management/im/index.html Customer Advisory c00683436, Online ROM Flash Tool Available For Updating Firmware On ProLiant Servers And Options, is available at: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?locale=en_US&objectID=c00683436 Updating Firmware Using the OFFLINE ROM Flash Method The Offline ROM Flash, as its name implies, is performed when the server is taken down for regular maintenance.
  • Page 16 The HP Drive Key Utility can be downloaded from the following URL: http://www.compaq.com/support/files/server/us/locate/8641.html Why Buy HP Memory? The white paper Why Buy HP Memory provides important information for customers who are buying memory for HP servers. It addresses important questions from HP customers and explains the extensive qualification and testing procedures and memory warranty issues.
  • Page 17 ProLiant Servers With Hot Add Memory Capability May Be Unable To Access Memory Above 4 GB Because PAE Support Is Not Enabled In Microsoft Windows Server 2003: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=PSD_EM040802_CW01&jumpid=reg_R1002_USEN First Edition of the HP ProLiant DL380 Generation 5 User Guide Incorrectly States that Advanced ECC Memory Single Fully Buffered DIMM (FBDIMM) Mode is Supported: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?locale=en_US&objectID=c00712235 SYSTEM ROM UPGRADE RECOMMENDED to Ensure Proper Support for all Fully Buffered DIMMs...
  • Page 18 © 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty.
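
The Page 8 summary above quotes an hpasmd warning that may appear in /var/log/messages when a corrected memory error threshold is exceeded. As a rough illustration only, the Python sketch below scans log lines for that message and tallies hits per slot and memory module, which can help narrow down the specific DIMM before scheduling diagnostics. The pattern is modeled on the single sample line quoted in this manual; other hpasmd message variants are not covered, and the names MemoryEvent and find_memory_events are hypothetical, not part of any HP tool.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class MemoryEvent(NamedTuple):
    """One corrected-memory-error warning (hypothetical helper type)."""
    timestamp: str
    host: str
    slot: int
    module: int


# Assumption: the format matches the one sample line quoted in the manual.
_PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+hpasmd\[\d+\]:\s+"
    r"WARNING: hpasmd: Corrected Memory Error threshold exceeded\s+"
    r"\(Slot (?P<slot>\d+), Memory Module (?P<module>\d+)\)"
)


def find_memory_events(lines: Iterable[str]) -> Iterator[MemoryEvent]:
    """Yield an event for every line that matches the hpasmd warning."""
    for line in lines:
        match = _PATTERN.match(line)
        if match:
            yield MemoryEvent(
                match["ts"],
                match["host"],
                int(match["slot"]),
                int(match["module"]),
            )


if __name__ == "__main__":
    # Sample input: the first line is the message quoted in the manual;
    # the second is an unrelated line added here for illustration.
    sample = [
        "Oct 11 01:51:18 dhcp57-150 hpasmd[12039]: WARNING: hpasmd: "
        "Corrected Memory Error threshold exceeded (Slot 3, Memory Module 6)",
        "Oct 11 01:52:00 dhcp57-150 sshd[2201]: session opened",
    ]
    # Count warnings per (slot, module) pair.
    counts = Counter((e.slot, e.module) for e in find_memory_events(sample))
    for (slot, module), n in counts.items():
        print(f"Slot {slot}, Memory Module {module}: {n} warning(s)")
```

On a live system, the same function could be given an open file handle for /var/log/messages in place of the sample list. A match only indicates which DIMM to test first; it does not by itself show that the part needs replacing.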
