A Detailed Look Inside the Intel® NetBurst™ Micro-Architecture of the Intel Pentium® 4 Processor
November, 2000...
Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Revision History
Revision date: 11/2000
Major changes: Release
SIMD TECHNOLOGY AND STREAMING SIMD EXTENSIONS 2
  Summary of SIMD Technologies
INTEL® NETBURST™ MICRO-ARCHITECTURE
  The Design Considerations of the Intel NetBurst Micro-architecture
  Overview of the Intel NetBurst Micro-architecture Pipeline
  The Front End
  The Out-of-order Core
  Retirement
  Front End Pipeline Detail...
4 processor. It includes several important new features and innovations that will allow the Intel Pentium 4 processor and future IA-32 processors to deliver industry-leading performance for the next several years. This paper provides an in-depth examination of the features and functions of the Intel NetBurst micro-architecture.
This paper begins with a brief introduction to three generations of single-instruction, multiple-data (SIMD) technology. The rest of this paper describes the principle of operation of the innovations of the Intel Pentium 4 processor with respect to the Intel NetBurst micro-architecture and the implementation characteristics of the Pentium 4 processor.
16 packed byte, 8 packed word, 4 doubleword, and 2 quadword integers. The full set of IA-32 SIMD technologies (the Intel MMX technology, the SSE extensions, and the SSE2 extensions) gives the programmer the ability to develop algorithms that combine operations on packed 64- and 128-bit integer operands and on single- and double-precision floating-point operands.
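The packed-integer layouts listed above can be illustrated with a minimal sketch. It models a 128-bit register as 16 raw bytes and reinterprets them at each width. The names `xmm`, `packed_views`, and `packed_add` are hypothetical helpers introduced for this illustration, and the lane-wise wraparound add is only an analogy for packed integer addition, not a description of the hardware.

```python
import struct

# A 128-bit register modeled as 16 raw bytes (values 0..15).
xmm = bytes(range(16))

# Reinterpret the same 128 bits at each packed-integer width
# (little-endian, unsigned). Lane counts are 128 / element width.
packed_views = {
    "byte": struct.unpack("<16B", xmm),        # 16 x 8-bit
    "word": struct.unpack("<8H", xmm),         # 8 x 16-bit
    "doubleword": struct.unpack("<4I", xmm),   # 4 x 32-bit
    "quadword": struct.unpack("<2Q", xmm),     # 2 x 64-bit
}


def packed_add(a: bytes, b: bytes, fmt: str, bits: int) -> bytes:
    """Lane-wise wraparound add of two 128-bit values.

    fmt is a struct format such as "<8H"; bits is the lane width.
    Carries never cross a lane boundary.
    """
    mask = (1 << bits) - 1
    lanes_a = struct.unpack(fmt, a)
    lanes_b = struct.unpack(fmt, b)
    return struct.pack(fmt, *((x + y) & mask for x, y in zip(lanes_a, lanes_b)))
```

The same 16 bytes give a different result depending on the lane width chosen, which is the point of having several packed formats over one register.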
The SSE instructions are useful for 3D geometry, 3D rendering, speech recognition, and video encoding and decoding. For more information on the Streaming SIMD Extensions, refer to the IA-32 Intel Developer’s Manual, Volume 1, available at http://developer.intel.com/design/pentium4/manuals/.
(SIMD) technology at high processing rates; (b) to operate at high clock rates, and to scale to higher performance and clock rates in the future. To accomplish these design goals, the Intel NetBurst micro-architecture has many advanced features and improvements over the Pentium Pro processor micro-architecture.
§ predicts branches using a highly advanced algorithm. The front end of the Intel NetBurst micro-architecture is designed to address some of the common problems in high-speed, pipelined microprocessors. Two of these problems contribute to major sources of delays: §...
The retirement section also keeps track of branches and sends updated branch target information to the Branch Target Buffer (BTB) to update branch history. Figure 3 illustrates the paths that are most frequently executed inside the Intel NetBurst micro-architecture: an execution loop that interacts with the multi-level cache hierarchy and the system bus.
Data Prefetch. The front end of the Intel NetBurst micro-architecture has a single decoder that can decode instructions at a maximum rate of one instruction per clock. Complex instructions must enlist the help of the microcode ROM. The decoder operation is connected to the execution trace cache discussed in the section that follows.
The Static Predictor. Once the branch instruction is decoded, the direction of the branch (forward or backward) is known. If there is no valid entry in the BTB for the branch, the static predictor makes a prediction based on the direction of the branch.
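The fallback from BTB to static prediction can be sketched roughly as follows. This is an illustration under stated assumptions, not the processor's actual logic: the BTB is modeled as a plain dictionary from branch address to a taken/not-taken flag, the function name `predict_taken` is hypothetical, and the policy of predicting backward branches taken and forward branches not taken is an assumption that this excerpt does not spell out.

```python
from typing import Dict, Optional


def predict_taken(branch_addr: int, target_addr: int,
                  btb: Dict[int, bool]) -> bool:
    """Return the predicted direction of a branch (True = taken)."""
    entry: Optional[bool] = btb.get(branch_addr)
    if entry is not None:
        # Valid BTB entry: use the recorded branch history.
        return entry
    # No BTB entry: fall back to static prediction by direction.
    # Assumption: backward branches (typically loop back-edges) are
    # predicted taken, forward branches are predicted not taken.
    return target_addr <= branch_addr
```

For example, a loop back-edge at a higher address than its target is predicted taken on first encounter, while a forward skip over an error path is predicted not taken until the BTB holds history for it.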
Therefore, it is no longer necessary to use a detailed list of exact µop counts for IA-32 instructions. Commonly used IA-32 instructions, which consist of four or fewer µops, are provided in the Intel Processor Optimization Reference Manual to aid instruction selection.
(adder, multiplier, and divider) share a pipeline. The Intel NetBurst micro-architecture can support up to three levels of on-chip cache. Only two levels of on-chip caches are implemented in the Pentium 4 processor, which is a product for the desktop environment. The level nearest to the execution core of the processor, the first level, contains separate caches for instructions and data: a first-level data cache and the trace cache, which is an advanced first-level instruction cache.
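As a rough illustration of a data load walking a two-level on-chip hierarchy, the sketch below checks each level in order and, on a miss, fills every level with the returned data. Everything here is a simplifying assumption: caches are unbounded dictionaries with no lines, sets, or eviction, and the names `load`, `levels`, and `memory` are hypothetical.

```python
from typing import Dict, List, Tuple


def load(addr: int, levels: List[Dict[int, int]],
         memory: Dict[int, int]) -> Tuple[int, str]:
    """Look up addr in each cache level, nearest first.

    Returns (value, where_it_hit). On a miss at a level, the data
    is copied into that level once it is found further out.
    """
    for i, cache in enumerate(levels):
        if addr in cache:
            value = cache[addr]
            # Fill the levels nearer to the core that missed.
            for nearer in levels[:i]:
                nearer[addr] = value
            return value, f"L{i + 1}"
    # Missed every on-chip level: fetch from memory, fill all levels.
    value = memory[addr]
    for cache in levels:
        cache[addr] = value
    return value, "memory"
```

Calling `load` twice on the same address with two empty levels returns a memory hit first and a first-level hit second, which is the behavior the hierarchy is meant to provide.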
b) avoiding the need to access off-chip caches, which can increase the realized bandwidth compared to a normal load-miss, which returns data to all cache levels. The situations that are less likely to benefit from software-controlled data prefetch are the following: §...
branches are resolved. However, speculative loads cannot cause page faults. Reordering loads with respect to each other can prevent a load miss from stalling later loads. Reordering loads with respect to other loads and stores to different addresses can enable more parallelism, allowing the machine to execute more operations as soon as their inputs are ready.
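The address condition for moving a load ahead of earlier stores can be sketched as below. This is a deliberately simplified model: it assumes every address is known up front and ignores store forwarding, speculation, and memory-ordering rules. The `Op` tuple and the function `loads_free_to_hoist` are hypothetical names introduced for this illustration.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # ("load" or "store", address)


def loads_free_to_hoist(ops: List[Op]) -> List[int]:
    """Return indices of loads that no earlier store conflicts with.

    A load conflicts with an earlier store only when both use the
    same address; loads to other addresses may run ahead of it.
    """
    stored_addrs = set()
    free = []
    for i, (kind, addr) in enumerate(ops):
        if kind == "store":
            stored_addrs.add(addr)
        elif kind == "load" and addr not in stored_addrs:
            free.append(i)
    return free
```

In the sequence store 0x10, load 0x20, load 0x10, load 0x30, the loads at positions 1 and 3 are independent of the store and could proceed early, while the load at position 2 must observe the stored value.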