Express5800/1000 series

VLC Architecture
High-speed / low latency Intra-Cell cache-to-cache data transfer
The Express5800/1000 series server implements the VLC architecture, which allows low-latency cache-to-cache data transfer between multiple CPUs within a Cell.

In a split BUS architecture, a cache-to-cache data transfer must pass through the chipset. In the VLC architecture, however, each CPU can access the cache memory of the other CPUs directly, bypassing the chipset. This reduces the latency between cache memories and results in faster data transfers.
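To make the path difference concrete, the minimal Python sketch below sums per-hop latencies along the two transfer paths. The hop names and nanosecond values are illustrative assumptions for the example, not NEC specifications.

# Illustrative model of intra-Cell cache-to-cache transfer latency.
# All hop latencies are placeholder values, not measured figures.
SPLIT_BUS_HOPS = ["source cache", "FSB", "chipset", "FSB", "requesting CPU"]
VLC_HOPS = ["source cache", "requesting CPU"]  # direct cache-to-cache path

HOP_LATENCY_NS = {          # assumed costs, for illustration only
    "source cache": 20,
    "FSB": 40,
    "chipset": 60,
    "requesting CPU": 0,
}

def transfer_latency(hops):
    """Sum the assumed latency of every hop on the transfer path."""
    return sum(HOP_LATENCY_NS[h] for h in hops)

print("split BUS:", transfer_latency(SPLIT_BUS_HOPS), "ns")  # passes through the chipset
print("VLC      :", transfer_latency(VLC_HOPS), "ns")        # bypasses the chipset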
Dedicated Cache Coherency Interface (CCI)
High-speed / low latency Inter-Cell cache-to-cache data transfer
Another technology implemented in the Express5800/1000 series server to improve cache-to-cache data transfer is the Cache Coherency Interface (CCI). The CCI, the inter-Cell counterpart of the VLC architecture, allows lower-latency cache-to-cache data transfer between Cells.

To access data stored in a cache memory, a CPU needs information about the location and state of the cached data. By consulting this information, the CPU can retrieve the desired data from the appropriate cache.

Two main mechanisms exist for cache-to-cache data transfer between Cells: directory-based and TAG-based cache coherency. The cache information described above is stored in external memory (DIR memory) in the directory-based mechanism and within the chipset in the TAG-based mechanism.

In a directory-based system, the requesting CPU first accesses the external memory to confirm the location of the cached data and then accesses the appropriate cache memory. In a TAG-based system, by contrast, the requesting CPU broadcasts a request to all other caches simultaneously via the TAG.
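As a rough illustration of the two lookup flows, the following Python sketch models a hypothetical two-Cell system. The Cell names, data structures, and line addresses are assumptions made for the example, not details of the actual hardware.

# Conceptual sketch of the two inter-Cell coherency lookups described above.
def directory_lookup(line, dir_memory, cells):
    """Directory based: consult DIR memory first, then fetch from the one
    cache that the directory says holds the line."""
    owner = dir_memory.get(line)             # step 1: external DIR memory access
    if owner is None:
        return None                          # line is not cached on any Cell
    return cells[owner]["cache"].get(line)   # step 2: targeted cache access

def tag_broadcast_lookup(line, cells, requester):
    """TAG based: broadcast the request to every other Cell at once and
    take the reply from whichever cache holds the line."""
    for name, cell in cells.items():
        if name == requester:
            continue
        if line in cell["tag"]:              # TAG hit identifies the cache to read
            return cell["cache"][line]
    return None

# Hypothetical two-Cell system for illustration.
cells = {
    "cell0": {"tag": {"0x40"}, "cache": {"0x40": "data-A"}},
    "cell1": {"tag": set(),    "cache": {}},
}
dir_memory = {"0x40": "cell0"}

print(directory_lookup("0x40", dir_memory, cells))
print(tag_broadcast_lookup("0x40", cells, requester="cell1"))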
Crossbar-less configuration
Improved data transfer latency through a direct-attached Cell configuration
Within the Express5800/1000 series server lineup, the 1080Rf lowers data transfer latency by removing the crossbar and directly connecting Cell to Cell and Cell to PCI box.
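A minimal sketch of the idea, assuming a single fixed cost per link traversed; the topologies and the per-hop cost are illustrative assumptions, not 1080Rf measurements.

# Illustrative hop counting for Cell-to-Cell transfers with and without a crossbar.
def path_latency(path, hop_cost_ns=50):
    """Latency grows with the number of links traversed (assumed uniform cost)."""
    return (len(path) - 1) * hop_cost_ns

crossbar_path = ["cell0", "crossbar", "cell1"]  # traditional configuration
direct_path   = ["cell0", "cell1"]              # crossbar-less, direct attach

print("via crossbar :", path_latency(crossbar_path), "ns")
print("direct attach:", path_latency(direct_path), "ns")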
[Figure: Very Large Cache (VLC) Architecture. Increased enterprise application performance through reduced cache memory access latency, with direct CPU-to-CPU cache-to-cache transfers within a Cell. Latency vs. data size charts for the Intel® Itanium® 2 processor (Madison: L3 9MB) and the Dual-Core Intel® Itanium® processor (Montvale: L3 24MB).]
The benefit of the TAG-based mechanism, which is the one implemented in the Express5800/1000 series server, is that by consulting the TAG, unnecessary inquiries to the cache memories are filtered out, allowing a smoother transfer of data. Furthermore, the Express5800/1000 series server includes a dedicated high-speed Cache Coherency Interface (CCI) that connects the Cells directly to one another without using a crossbar. This interface carries broadcasts and other cache coherency transactions, allowing even faster cache-to-cache data transfer.
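The filtering role of the TAG can be sketched as a simple presence-tracking structure. The class and method names below are hypothetical; this is only a conceptual model of how a TAG lookup avoids probing caches that cannot hold the requested line.

# Minimal sketch of TAG-style filtering of cache inquiries (assumed structures).
from collections import defaultdict

class TagFilter:
    def __init__(self):
        self._presence = defaultdict(set)  # line address -> Cells that cached it

    def record_fill(self, line, cell):
        """Note that a Cell has brought a line into its cache."""
        self._presence[line].add(cell)

    def cells_to_probe(self, line, requester):
        """Return only the Cells whose caches need to be asked; all other
        inquiries are filtered out."""
        return {c for c in self._presence[line] if c != requester}

tag = TagFilter()
tag.record_fill("0x80", "cell0")
print(tag.cells_to_probe("0x80", requester="cell1"))  # {'cell0'}
print(tag.cells_to_probe("0xC0", requester="cell1"))  # set(): no probe needed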
[Figure: TAG-based cache coherency: the request is broadcast to all CPUs simultaneously via the TAG; the Express5800/1000 series server implements a dedicated connection (CCI) for snooping. Directory-based cache coherency: the directory is accessed first to confirm the location of the data, then the appropriate cache memory is accessed.]
Even with the crossbar-less configuration, virtualization of the Cell card and I/O box has been retained so as not to diminish computing and I/O resources.
[Figure: Split BUS Architecture. Cache-to-cache transfers pass through the chipset over the FSB, resulting in higher cache memory access latency, non-uniform cache-to-cache data transfer, inconsistent performance, and overhead from transferring data through the chipset; latency to the L3 of a CPU on a different FSB degrades by approximately 3x, an effect that grows as cache size increases. Latency vs. data size charts for the Intel® Itanium® 2 processor (Madison: L3 9MB) and the Dual-Core Intel® Itanium® processor (Montvale: L3 24MB); the charts do not depict actual numbers. Additional diagrams compare TAG-based and directory-based cache coherency and show the performance increase with the A3 chipset. Legend: CPU requesting the information; CPU storing the newest information; memory storing the location information; TAG memory (manages cache line information for all of the CPUs loaded on a Cell card); DIR memory (manages cache line information for all of the memory loaded on a Cell card).]