
NEC 1000 Series Brochure & Specs page 6

Express5800/1000 series


Mainframe-class RAS Features
RAS Design Philosophy
Realizing mainframe-class continuous operation through the pursuit of
reliability and availability in a single-server design
Generally, reliability and availability on an open server are
achieved by implementing clustering. However, clustering comes
with a price tag. To keep costs to a minimum, the
Express5800/1000 series servers were designed to achieve a
high level of reliability and availability within a single server.
The Express5800/1000 series server's powerful RAS features
were developed through the pursuit of dependable server
technology.
Dependable Server Technology
Continuous operations through failures
Redundant components, error prediction and error
correction allow for continuous operation
Minimized spread of failures
Technology to minimize the effects of hardware failures on
the system. Reduction of performance degradation and
multi-node shutdown
Smooth recovery after failures
Ability to replace failed components without
shutting down operations
The Dual-Core Intel® Itanium® processor MCA
(Machine Check Architecture)
The framework for hardware, firmware and OS error handling
The Dual-Core Intel® Itanium® processor, designed for high-end
enterprise servers, not only excels in performance but is also
rich in RAS features. At the core of the processor's RAS
feature set is the error handling framework called MCA.
MCA provides a three-stage error handling mechanism: hardware,
firmware, and operating system. In the first stage, the CPU and
chipset attempt to handle errors through ECC (Error Correcting
Code) and parity protection. If the error cannot be handled by
the hardware, it is passed to the second stage, where the
firmware attempts to resolve the issue. In the third stage, if the
error cannot be handled by the first two stages, the operating
system runs recovery procedures based on the error report
and error log that it received. In the event of a critical error,
the system will automatically reset, to significantly reduce the
possibility of a system failure.
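The three-stage escalation described for MCA can be sketched as a simple chain of handlers. This is only an illustrative model, not Intel's or NEC's implementation: the `Severity` levels, the `ErrorEvent` type and the stage functions are hypothetical names introduced here, and each stage is reduced to a check on an assumed severity value.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    # Hypothetical classification, used only for this sketch.
    CORRECTABLE = 1      # hardware (ECC/parity) can fix it
    FIRMWARE = 2         # firmware can resolve it
    OS_RECOVERABLE = 3   # the OS must run recovery procedures
    CRITICAL = 4         # no stage can recover; the system resets


@dataclass
class ErrorEvent:
    source: str
    severity: Severity
    log: list = field(default_factory=list)  # stands in for the error log


def hardware_stage(event):
    # Stage 1: CPU and chipset try ECC and parity protection.
    event.log.append("hardware: ECC/parity check")
    return event.severity is Severity.CORRECTABLE


def firmware_stage(event):
    # Stage 2: firmware attempts to resolve what hardware could not.
    event.log.append("firmware: attempted resolution")
    return event.severity is Severity.FIRMWARE


def os_stage(event):
    # Stage 3: OS recovery based on the error report and error log.
    event.log.append("os: recovery from error report and log")
    return event.severity is Severity.OS_RECOVERABLE


def handle(event):
    """Pass the event through each stage in order; reset if all fail."""
    stages = (("hardware", hardware_stage),
              ("firmware", firmware_stage),
              ("os", os_stage))
    for name, stage in stages:
        if stage(event):
            return name
    event.log.append("system: automatic reset")
    return "reset"
```

For example, `handle(ErrorEvent("memory", Severity.CORRECTABLE))` stops at the hardware stage, while a `Severity.CRITICAL` event falls through all three stages and ends in a reset, with every attempt recorded in `event.log`.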
Continuous operation through failures, minimizing the
spread of failures, and smooth recovery after failures were the
goals that led to the implementation of technologies
such as memory mirroring, increased redundancy of intricate
components, and modularization. Through these technologies,
a mainframe level of continuous operation was achieved.
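As a rough illustration of how memory mirroring supports continuous operation, the sketch below keeps two copies of every write and falls back to the second copy when the first is marked as failed. It is a toy software model under stated assumptions, not a description of the server's hardware: the `MirroredMemory` class, its `inject_failure` method and the bank names are hypothetical.

```python
class MirroredMemory:
    """Toy model: every write goes to two banks; reads survive one failure."""

    def __init__(self, size):
        self.primary = [0] * size
        self.mirror = [0] * size
        # (bank name, address) pairs marked as failed, injected for the demo.
        self.failed = set()

    def write(self, addr, value):
        # Mirroring: the same value is stored in both banks.
        self.primary[addr] = value
        self.mirror[addr] = value

    def inject_failure(self, bank, addr):
        # Simulate an uncorrectable error at one address of one bank.
        self.failed.add((bank, addr))

    def read(self, addr):
        # Prefer the primary copy; fall back to the mirror if it has failed.
        if ("primary", addr) not in self.failed:
            return self.primary[addr]
        if ("mirror", addr) not in self.failed:
            return self.mirror[addr]
        raise RuntimeError(f"both copies of address {addr} have failed")
```

A read keeps working after a single failure (`write(2, 99)`, then `inject_failure("primary", 2)`, then `read(2)` still returns the stored value); only when both copies of the same address fail does the read raise an error.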
[Figure: RAS features of the Express5800/1000 series by component (center plane, chipset, clock, core I/O, PCI card, memory, CPU, L3 cache, power, HDD), with levels labeled Mainframe, Conventional open server and PC server]
Reliability: no chipset on the center plane; ECC protection of main data paths; intricate error detection of the high-speed interconnects; ECC protection; SDDC memory; Intel® Cache Safe Technology*3
Availability: partial chipset degradation/dynamic recovery; duplexed*1; 16 processor domain segmentation*2; core I/O relief; memory mirroring*1; N+1 redundant; two independent power sources; software RAID; hardware RAID
Serviceability: hot pluggable*4
*1 Available only on the 1320Xf/1160Xf
*2 Available only on the 1320Xf
*3 Intel® technology designed to avoid cache based failures
*4 Replacement of failed component without shutting down other partitions.
[Figure: MCA error handling layers]
Application Layer
Operating System: The OS logs the error, and then starts the recovery process. The firmware and OS aid in the correction of complex platform errors to restore the system.
Firmware: Seamlessly handles the error. Error details are logged, and then a report flow is defined for the OS.
Hardware: CPU and chipset ECC and parity protection. Detects and corrects a wide range of hardware errors for main data structures.
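To illustrate what error correction by ECC means at the hardware stage, here is a minimal sketch of a textbook Hamming(7,4) code, which stores 4 data bits with 3 parity bits and can correct any single flipped bit. The brochure does not say which codes the CPU, chipset or memory actually use, so this is a stand-in chosen for brevity; the function names `encode` and `decode` are introduced only for this example.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data bits, syndrome); a single-bit error is corrected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if none.
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome
```

Flipping any one of the seven bits of an encoded word and calling `decode` returns the original four data bits together with the position of the corrected bit. A plain parity bit, by contrast, can only detect such an error, not locate and repair it.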
