Page 2
This document is intended only to assist the reader in the use of the product. ARM shall not be liable for any loss or damage arising from the use of any information in this document, or any error or omission in such information, or any incorrect use of the product.
The Cortex-R4F processor is a Cortex-R4 processor that includes the optional Floating Point Unit (FPU) extension. In this book, references to the Cortex-R4 processor also apply to the Cortex-R4F processor, unless the context makes it clear that this is not the case. Product revision status The rnpn identifier indicates the revision status of the product described in this book, where: rn Identifies the major revision of the product.
http://onarm.com, for embedded software development resources including the Cortex Microcontroller Software Interface Standard (CMSIS). ARM publications This book contains information that is specific to the Cortex-R4 processor. See the following documents for other relevant information: • AMBA AXI Protocol Specification (ARM IHI 0022) •...
Preface Feedback ARM welcomes feedback on this product and its documentation. Feedback on this product If you have any comments or suggestions about this product, contact your supplier and give: • The product name. • The product revision or version.
RAM, in addition to caches for higher performance to general memory. Error Checking and Correction (ECC) is used on the Cortex-R4 processor ports and in Level 1 (L1) memories to provide improved reliability and address safety-critical applications.
The Cortex-R4 processor implements the ETM v3.3 architecture profile. See the CoreSight ETM-R4 Technical Reference Manual. 1.2.3 Advanced Microcontroller Bus Architecture This Cortex-R4 processor complies with the AMBA 3 protocol. See AMBA AXI Protocol Specification and AMBA 3 APB Protocol Specification. 1.2.4 Debug architecture The Cortex-R4 processor implements the ARMv7 Debug architecture that includes support for CoreSight.
• Low interrupt latency. • Non-maskable interrupt. • Optional Floating Point Unit (FPU). The Cortex-R4F processor is a Cortex-R4 processor that includes the FPU. • A Harvard L1 memory system with: — optional Tightly-Coupled Memory (TCM) interfaces with support for error correction or parity checking memories —...
Page 21
DCCMOUT[7:0] and DCCMOUT2[7:0] enable the comparators to communicate with the rest of the SoC. Contact your system integrator for more information about these signals. ARM provides example comparison logic, but you can change this during implementation. If you are implementing a redundant processor configuration, contact ARM for more information.
Introduction Product documentation, architecture and design flow This section describes the Cortex-R4 processor books, how they relate to the design flow, and the relevant architectural standards and protocols. It contains the following sections: • Documentation • Design flow on page 1-12.
Page 24
Introduction 1.7.2 Design flow The Cortex-R4 processor is delivered as synthesizable RTL. Before it can be used in a product, it must go through the following processes: Implementation The implementer configures and synthesizes the RTL to produce a hard macrocell. This might include integrating RAMs into the design.
It decodes and executes instructions, operating on data held in the registers in accordance with the ARM architecture. Instructions are fed to the DPU from the PFU through a buffer. The DPU performs instructions that require data to be transferred to or from the memory system by interfacing to the LSU.
Functional Description VIC port The core has a dedicated port that enables an external interrupt controller, such as the ARM PrimeCell Vectored Interrupt Controller (VIC), to supply a vector address along with an Interrupt Request (IRQ) signal. This provides faster interrupt entry, but you can disable it for compatibility with earlier interrupt controllers.
2.2.7 APB Debug interface AMBA APBv3 is used for debugging purposes. CoreSight is the ARM architecture for multi-processor trace and debug. CoreSight defines what debug and trace components are required and how they are connected. See the CoreSight Architecture Specification for more information.
Figure 2-2 Power-on reset (signals shown: CLKIN, nRESET, nSYSPORESET) ARM recommends that you assert the nRESET signal for at least four CLKIN cycles to ensure correct reset behavior. It is not necessary to assert PRESETDBGn on power-up. Processor reset A processor or warm reset initializes the majority of the processor, excluding the CoreSight logic.
Page 38
All clocks can be stopped indefinitely without loss of state. Three additional clock inputs, CLKIN2, DUALCLKIN, and DUALCLKIN2, are related to the dual-redundant core functionality, if included. If you are integrating a Cortex-R4 macrocell with dual-redundant core, contact the implementer of that macrocell for information about how to connect the clock inputs.
PC-relative load instruction. The Current Program Status Register (CPSR) is given a known value on reset. See the ARM Architecture Reference Manual for more information. The reset values for the CP15 registers...
Page 41
The processor does not initialize the TCM RAMs. It is not essential to initialize all the memory attached to the TCM interface but ARM recommends that you do. In addition, the main application might require you to preload instructions or data into the TCM. This section describes various ways that you can perform data preloading.
Reference Manual. This chapter describes some of the main features of the architecture but, for a complete description, see the ARM Architecture Reference Manual. This chapter also makes reference to older versions of the ARM architecture that the processor does not implement. These references are included to contrast the behavior of the Cortex-R4 processor with other processors you might have used that implement an older version of the architecture.
Operating modes In each state there are seven modes of operation: • User (USR) mode is the usual mode for the execution of ARM or Thumb programs. It is used for executing most application programs. • Fast interrupt (FIQ) mode is entered on taking a fast interrupt.
3.5.1 The register set In the processor the same register set is used in both the ARM and Thumb states. Sixteen general registers and one or two status registers are accessible at any time. In Privileged modes, alternative mode-specific banked registers become available.
Page 50
Programmers Model For more information, see the ARM Architecture Reference Manual. In Privileged modes, another register, the Saved Program Status Register (SPSR), is accessible. This contains the condition code flags, status bits, and current mode bits saved as a result of the exception that caused entry to the current mode.
In ARM state, most instructions can execute conditionally on the state of the N, Z, C, and V bits. The exceptions are: •...
Page 53
In Thumb state, the processor can only execute the Branch instruction conditionally. Other instructions can be made conditional by placing them in the If-Then (IT) block. For more information about conditional execution in Thumb state, see the ARM Architecture Reference Manual.
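Conditional execution on the N, Z, C, and V bits can be illustrated with a small host-side model. The sketch below assumes the architectural CPSR flag positions (N, Z, C, V in bits [31:28]) and the standard 4-bit condition encoding; the function and macro names are hypothetical and are not part of any ARM-supplied code.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR condition flag positions (architectural). */
#define CPSR_N (1u << 31)
#define CPSR_Z (1u << 30)
#define CPSR_C (1u << 29)
#define CPSR_V (1u << 28)

/* Returns true if an instruction whose 4-bit condition field is `cond`
 * would execute, given the flag values held in `cpsr`. */
bool condition_passed(uint32_t cond, uint32_t cpsr)
{
    bool n = (cpsr & CPSR_N) != 0;
    bool z = (cpsr & CPSR_Z) != 0;
    bool c = (cpsr & CPSR_C) != 0;
    bool v = (cpsr & CPSR_V) != 0;
    bool result;

    switch ((cond >> 1) & 7u) {
    case 0: result = z;               break; /* EQ / NE */
    case 1: result = c;               break; /* CS / CC */
    case 2: result = n;               break; /* MI / PL */
    case 3: result = v;               break; /* VS / VC */
    case 4: result = c && !z;         break; /* HI / LS */
    case 5: result = (n == v);        break; /* GE / LT */
    case 6: result = (n == v) && !z;  break; /* GT / LE */
    default: return true;                    /* AL and the unconditional space */
    }
    /* An odd condition code inverts the sense of the even code below it. */
    return (cond & 1u) ? !result : result;
}
```

For example, `condition_passed(0x0, CPSR_Z)` models an EQ-conditional instruction executing after a compare that found its operands equal.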
Page 54
Programmers Model For more information on the operation of the IT execution state bits, see the ARM Architecture Reference Manual. 3.6.4 The J bit The J bit in the CPSR returns 0 when read. Note You cannot use an MSR to change the J bit in the CPSR.
Page 55
3.6.7 The E bit ARM and Thumb instructions are provided to set and clear the E bit. The E bit controls load/store endianness. See the ARM Architecture Reference Manual for information on where the E bit is used. 3.6.8 The A bit The A bit is set automatically by certain exceptions and is written by privileged software.
3.7.1 Exception entry and exit summary Table 3-4 summarizes the PC value preserved in the relevant R14 on exception entry, and the instruction that ARM recommends for exiting the exception handler. Table 3-4 Exception entry and exit Previous state Exception...
Page 58
For more information about the IT instruction and Undefined instruction, and an example of the exception handler code, see the ARM Architecture Reference Manual.
Page 59
Forces the PC to fetch the next instruction from the reset vector address. Reverts to ARM state or Thumb state depending on the state of the TEINIT pin, and resumes execution. After reset, all register values except the PC and CPSR are indeterminate.
Page 60
LOW until the processor acknowledges the interrupt request from the software handler. Irrespective of whether exception entry is from ARM state or Thumb state, an FIQ handler returns from the interrupt by executing: SUBS PC, R14_fiq, #4 If Non-Maskable Fast Interrupts (NMFIs) are not enabled, you can mask FIQ exceptions by setting the CPSR.F bit to b1.
SoC peripheral that is developed, tested, and licensed by ARM. You can use the VIC port to connect a PL192 VIC to the processor. See the ARM PrimeCell Vectored Interrupt Controller (PL192) Technical Reference Manual for more information about the PL192 VIC.
Page 66
SPSR are correct for the instruction following the SVC. This means that the SVC handler does not have to perform any special action to accommodate the IT instruction. For more information on the IT instruction, see the ARM Architecture Reference Manual.
It can generate interrupts when the number of events reaches a given value. For more information on the programmers model of the performance counters see the ARM Architecture Reference Manual. See...
CRn, Opcode_1, CRm, Opcode_2. For more information on using the system control coprocessor and the general method of how to access CP15 registers, see the ARM Architecture Reference Manual. 4.3.1 Register allocation...
Page 87
TCM interfaces, can be considered equivalent to those defined in the ARM architecture. • The ARM architecture requires only the ITCM to be accessible from both instruction and data sides. In the Cortex-R4 processor, both ATCM and BTCM are accessible from both instruction and data sides. 4.3.5 c0, MPU Type Register...
Page 96
• Invalidate instruction cache by address • Invalidate all instruction cache entries. a. Both of these operations are NOP on Cortex-R4. To access the ID_MMFR3 read CP15 with: MRC p15, 0, <Rd>, c0, c1, 7 ; Read ID_MMFR3. 4.3.11 Instruction Set Attributes Registers The processor has eight Instruction Set Attributes Registers, ISAR0 to ISAR7, but three of these are unused.
Page 99
, and T bit in PSRs • , and PC loads have BX behavior • data-processing instructions in the ARM instruction set with the PC as the destination and the S bit clear have BX-like behavior. [23:20] Immediate Indicates support for immediate instructions:...
Page 109
1 = MPU enabled. If no MPU is implemented, this bit is SBZ. To use the SCTLR, ARM recommends that you use a read-modify-write technique. To access the SCTLR, read or write CP15 with: MRC p15, 0, <Rd>, c1, c0, 0 ; Read SCTLR MCR p15, 0, <Rd>, c1, c0, 0 ;...
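The read-modify-write technique recommended for the SCTLR can be sketched as follows. This is a minimal illustration, not ARM-supplied code: the helper names are hypothetical, the MPU enable bit is assumed to be SCTLR bit [0] (the M bit in the ARMv7-R architecture), and the inline-assembly accessors assume a GCC-compatible compiler targeting a 32-bit ARM processor. The pure helper changes only the requested bit so that all other SCTLR fields are written back unchanged.

```c
#include <stdint.h>

#define SCTLR_M (1u << 0) /* MPU enable; bit position assumed from ARMv7-R */

/* Pure helper: returns `sctlr` with only the M bit changed. */
uint32_t sctlr_with_mpu(uint32_t sctlr, int enable)
{
    return enable ? (sctlr | SCTLR_M) : (sctlr & ~SCTLR_M);
}

#if defined(__arm__) && defined(__GNUC__)
/* Target-only accessors using the CP15 encodings given in the text. */
static inline uint32_t read_sctlr(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(value));
    return value;
}

static inline void write_sctlr(uint32_t value)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(value) : "memory");
}
#endif
```

On the target, privileged code would call `write_sctlr(sctlr_with_mpu(read_sctlr(), 1))`; any synchronization required after the write is outside the scope of this sketch.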
Page 110
• accessible in Privileged mode only • ARM recommends that any instruction that changes bits [31:28] or [7] is followed by an instruction to ensure that the changes have taken effect before any dependent instructions are executed.
Page 114
System Control • ARM recommends that any instruction that changes bits [20:16] is followed by an instruction to ensure that the changes have taken effect before any dependent instructions are executed. Configurations Available in all processor configurations. Attributes Table 4-25.
This processor does not contain an address-based branch predictor array. Invalidate and clean operations The terms that describe the invalidate, clean, and prefetch operations are defined in the ARM Architecture Reference Manual. You can perform invalidate and clean operations on: •...
Page 131
To access the Data Synchronization Barrier operation, write CP15 with: MCR p15, 0, <Rd>, c7, c10, 4 ; Data Synchronization Barrier operation For more information about memory barriers, see the ARM Architecture Reference Manual. Data Memory Barrier operation The purpose of the Data Memory Barrier operation is to ensure that all outstanding explicit memory transactions complete before any following explicit memory transactions begin.
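A debugger or test harness sometimes needs the numeric opcode of a CP15 access such as the barrier operation above. The following sketch builds the ARM-state encoding of an unconditional MCR or MRC from its operands, using the standard ARM coprocessor register transfer layout; the function name is hypothetical. For `MCR p15, 0, r0, c7, c10, 4` it yields 0xEE070F9A, which is the constant used for the barrier in the debug-entry pseudocode elsewhere in this document.

```c
#include <stdint.h>

/* Encodes an unconditional (AL) ARM-state coprocessor register transfer.
 * to_arm = 0 encodes MCR (write to the coprocessor register),
 * to_arm = 1 encodes MRC (read from the coprocessor register). */
uint32_t encode_cp_transfer(int to_arm, unsigned cp, unsigned opc1,
                            unsigned rt, unsigned crn, unsigned crm,
                            unsigned opc2)
{
    return 0xEE000010u                      /* cond = AL, bits[27:24] = 1110, bit[4] = 1 */
         | ((opc1 & 7u) << 21)              /* Opcode_1 */
         | ((to_arm ? 1u : 0u) << 20)       /* direction: 1 = MRC */
         | ((crn & 15u) << 16)              /* CRn */
         | ((rt & 15u) << 12)               /* ARM register Rd */
         | ((cp & 15u) << 8)                /* coprocessor number */
         | ((opc2 & 7u) << 5)               /* Opcode_2 */
         | (crm & 15u);                     /* CRm */
}
```

The same helper covers reads: `encode_cp_transfer(1, 15, 0, 0, 0, 1, 7)` encodes `MRC p15, 0, r0, c0, c1, 7`, the ID_MMFR3 read shown earlier.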
Page 152
The PFU fetches instructions from the memory system under the control of the DPU, and the internal coprocessors CP14 and CP15. In ARM state the memory system can supply up to two instructions per cycle. In Thumb state the memory system can supply up to four instructions per cycle.
Page 155
The return stack consists of a 4-entry circular buffer. When the PFU detects a taken procedure call instruction, the PFU pushes the return address onto the return stack. The instructions that the PFU recognizes as procedure calls are, in both the ARM and Thumb instruction sets: •...
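The behavior of a 4-entry circular return stack can be sketched with a small software model. This is an illustration only: the type and function names are hypothetical, and the handling of overflow (the oldest entry is overwritten) and of an empty stack (no prediction) are assumptions of the model, not statements about the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_ENTRIES 4u

typedef struct {
    uint32_t entry[RS_ENTRIES];
    unsigned top;   /* index of the next slot to write */
    unsigned count; /* number of valid entries, saturates at RS_ENTRIES */
} return_stack;

void rs_init(return_stack *rs)
{
    rs->top = 0;
    rs->count = 0;
}

/* Called when a taken procedure call is detected. */
void rs_push(return_stack *rs, uint32_t return_addr)
{
    rs->entry[rs->top] = return_addr;          /* overwrites the oldest entry when full */
    rs->top = (rs->top + 1u) % RS_ENTRIES;
    if (rs->count < RS_ENTRIES) {
        rs->count++;
    }
}

/* Called when a procedure return is detected. Returns false if the
 * model has no valid entry to predict from. */
bool rs_pop(return_stack *rs, uint32_t *predicted)
{
    if (rs->count == 0) {
        return false;
    }
    rs->top = (rs->top + RS_ENTRIES - 1u) % RS_ENTRIES;
    *predicted = rs->entry[rs->top];
    rs->count--;
    return true;
}
```

With five nested calls, the model keeps only the four most recent return addresses, so the outermost return cannot be predicted from the stack.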
Prefetch Unit Controlling instruction prefetch and program flow prediction In the Cortex-R4 processor, the Z-bit, bit [11] of the SCTLR, does not control the program flow prediction. The Z-bit is read-as-one, writes-ignored and instead a number of control bits in the ACTLR control the program flow and prefetch features.
Page 173
This is the only mechanism that signals this interrupt to the processor. Note ARM expects that the Performance Monitor interrupt request signal, nPMUIRQ, connects to a system interrupt controller. 6.3.12...
The processor also has two event input pins, ETMEXTOUT[1:0]. This bus is normally intended for connection to the ETM, and enables the Cortex-R4 performance monitor to count events generated by the ETM. These inputs can alternatively be used for composite events generated external to the processor.
About the MPU The MPU works with the L1 memory system to control accesses to and from L1 and external memory. For a full architectural description of the MPU, see the ARM Architecture Reference Manual. The MPU enables you to partition memory into regions and set individual protection attributes for each region.
Page 179
To ensure correct operation, only a memory region that has permission for data read access can execute instructions. For more information, see the ARM Architecture Reference Manual. For information about how to program access permissions, see Table 4-36 on page 4-56.
Memory Protection Unit Memory types The ARM architecture defines a set of memory types with characteristics that are suited to particular devices. There are three mutually exclusive memory type attributes: • Strongly-ordered • Device • Normal. MPU memory regions can each be assigned a memory type attribute.
Note In earlier versions of the ARM architecture, the TEX, C, and B bits were known as the Type Extension, Cacheable and Bufferable bits. These names no longer adequately describe the function of the B, C, and TEX bits.
LDREX and STREX can be used with the appropriate memory monitoring to provide inter-process or inter-processor synchronization and semaphores. See the ARM Architecture Reference Manual for more information. The internal monitor can handle some exclusive monitoring internally to the processor; see Internal exclusive monitor on page 8-34 for more information.
In silicon devices, stray radiation and other effects can cause the data stored in a RAM to be corrupted. The TCMs and caches on the Cortex-R4 processor can be configured to detect and correct errors that can occur in the RAMs. Extra, redundant data is computed by the processor and stored in the RAMs alongside the real data.
Page 191
The processor contains features that enable it to recover from some hard errors. If you are implementing the processor and require these features, contact ARM to discuss the features and your requirements.
Page 195
Note When a prefetch abort has occurred, ARM recommends that you do not use the link register value for determining the aborting address, because 32-bit Thumb instructions do not have to be word aligned and can cause an abort on either halfword.
Page 203
TCMs. Although the Cortex-R4 processor can use semaphores in the TCMs for inter-process synchronization, you must not use the AXI slave interface to write to TCM semaphores. The processor has no logic to preserve its own exclusivity against such writes.
See Cache error detection and correction on page 8-20 for more information. For more information on the general rules about memory attributes and behavior, see the ARM Architecture Reference Manual. 8.5.1 Store buffer The cache controller includes a store buffer to hold data before it is written to the cache RAMs or passed to the AXI master interface.
Page 207
Level One Memory System detection schemes. The processor includes features that enable it to detect some address decoder faults. If you are implementing the processor and require these features, contact ARM to discuss the features and your requirements. Handling cache parity errors...
Page 218
ACTLR must only be changed when both caches are turned off and the entire cache must be invalidated after the change. ARM recommends the following code sequence to perform the change: MRC p15, 0, r0, c1, c0, 0 ; Read SCTLR BIC r0, r0, #0x1 <<...
The internal exclusive monitor holds exclusivity state for the Cortex-R4 processor only. It does not record the address of the memory that a load-exclusive access was performed to. Any store exclusive access performed when the state is open fails. If the state is exclusive, the access passes if it is to non-shared memory but, if it is to shared memory, the access must be performed as an exclusive using the L2 memory interface.
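The two-state behavior described above can be captured in a short model. The names are hypothetical, and one point is an assumption taken from the ARM architecture rather than from this excerpt: the model returns the monitor to the open state after every store-exclusive.

```c
#include <stdbool.h>

typedef enum { MON_OPEN, MON_EXCLUSIVE } mon_state;

typedef enum {
    STREX_FAIL,               /* state was open: the store does not happen */
    STREX_PASS_LOCAL,         /* non-shared memory: the store passes */
    STREX_NEEDS_L2_EXCLUSIVE  /* shared memory: perform as an exclusive on the L2 interface */
} strex_result;

/* A load-exclusive moves the monitor to the exclusive state.
 * No address is recorded, matching the description above. */
void monitor_ldrex(mon_state *m)
{
    *m = MON_EXCLUSIVE;
}

/* A store-exclusive is resolved from the state and the shared attribute only. */
strex_result monitor_strex(mon_state *m, bool shared)
{
    if (*m == MON_OPEN) {
        return STREX_FAIL;
    }
    *m = MON_OPEN; /* assumption: any store-exclusive clears the local monitor */
    return shared ? STREX_NEEDS_L2_EXCLUSIVE : STREX_PASS_LOCAL;
}
```

Because no address is tracked, a store-exclusive to a different non-shared location than the preceding load-exclusive still passes in this model.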
In this section, AXI slave describes the AXI slave in the external system that is connected to the Cortex-R4 AXI master port. This might not be the Cortex-R4 AXI slave port. The following sections describe the attributes of the AXI master interface, and provide information about the types of burst generated: •...
The EVB is removed if cache RAMs are not implemented for the processor. 9.2.5 Memory attributes The Cortex-R4 AXI master interface uses the ARCACHEM, AWCACHEM, ARUSERM, and AWUSERM signals to indicate the memory attributes of the transfer, as returned by the MPU.
This section describes the types of AXI transaction that the Cortex-R4 AXI master does not generate. If you are designing an AXI slave to work only with the Cortex-R4 processor, and there are no other AXI masters in your system, you can take...
Page 231
Level Two Interface 9.3.1 Restrictions on AXI transfers The Cortex-R4 AXI master interface applies the following restrictions to the AXI transactions it generates: • A burst never transfers more than 32 bytes. • The burst length is never more than 8 transfers.
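A testbench for an AXI slave that is connected only to the Cortex-R4 AXI master could assert the two restrictions listed above. The check below is a minimal sketch with a hypothetical function name; it covers only those two restrictions and no other AXI rules.

```c
#include <stdbool.h>

/* Returns true if a burst of `beats` transfers, each `bytes_per_beat`
 * bytes wide, stays within the two limits quoted above:
 * at most 8 transfers and at most 32 bytes in total. */
bool axi_burst_within_limits(unsigned beats, unsigned bytes_per_beat)
{
    if (beats == 0 || bytes_per_beat == 0) {
        return false; /* not a meaningful burst */
    }
    return beats <= 8u && beats * bytes_per_beat <= 32u;
}
```

Note that the byte limit is the tighter one for wide transfers: four 8-byte transfers are accepted, but eight 8-byte transfers are not.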
Page 244
9.4.5 AXI slave control By default, both privileged and non-privileged accesses can be made to the Cortex-R4 TCM RAMs through the AXI slave port. To disable non-privileged accesses, you can set bit [1] in the Slave Port Control Register. You can disable all slave accesses by setting bit [0] of the register.
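The effect of the two control bits can be expressed as a simple predicate. The macro and function names below are hypothetical; only the bit positions come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPC_DISABLE_ALL     (1u << 0) /* bit [0]: disable all slave accesses */
#define SPC_DISABLE_NONPRIV (1u << 1) /* bit [1]: disable non-privileged accesses */

/* Returns true if an access through the AXI slave port would be allowed
 * for the given Slave Port Control Register value. */
bool slave_access_allowed(uint32_t ctrl, bool privileged)
{
    if (ctrl & SPC_DISABLE_ALL) {
        return false;
    }
    if (!privileged && (ctrl & SPC_DISABLE_NONPRIV)) {
        return false;
    }
    return true;
}
```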
RAMs, is powered down. In dormant mode, the processor state, apart from the cache and TCM state, is stored to memory before entry into this mode, and restored after exit. For more information on how to implement and use dormant mode in your design, contact ARM. 10.2.4...
Page 261
FPU Programmers Model This chapter describes the programmers model of the Floating Point Unit (FPU). The Cortex-R4F processor is a Cortex-R4 processor that includes the optional FPU. In this chapter, the generic term processor means only the Cortex-R4F processor. This chapter contains the following sections: •...
Manual for information on the VFPv3 instruction set. 11.1.1 FPU functionality The FPU is an implementation of the ARM Vector Floating Point v3 architecture, with 16 double-precision registers (VFPv3-D16). It provides floating-point computation functionality that is compliant with the ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, referred to as the IEEE 754 standard.
FPU Programmers Model 11.2 General-purpose registers The FPU implements a VFP register bank. This bank is distinct from the ARM register bank. You can reference the VFP register bank using two explicitly aliased views. Figure 11-1 shows the two views of the register bank and the way the word and doubleword registers overlap.
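The overlap between the word and doubleword views can be modeled on a host as an array of 32 words. The sketch assumes the VFPv3-D16 mapping in which doubleword register Dn overlaps S(2n) in its least significant word and S(2n+1) in its most significant word; the type and function names are hypothetical.

```c
#include <stdint.h>

/* 32 single-precision (word) registers S0-S31, which are also
 * viewed as 16 doubleword registers D0-D15. */
typedef struct {
    uint32_t s[32];
} vfp_bank;

/* Reads doubleword register Dn (n must be 0..15). */
uint64_t vfp_read_d(const vfp_bank *bank, unsigned n)
{
    return ((uint64_t)bank->s[2u * n + 1u] << 32) | bank->s[2u * n];
}

/* Writes doubleword register Dn (n must be 0..15). */
void vfp_write_d(vfp_bank *bank, unsigned n, uint64_t value)
{
    bank->s[2u * n]      = (uint32_t)value;         /* S(2n): low word */
    bank->s[2u * n + 1u] = (uint32_t)(value >> 32); /* S(2n+1): high word */
}
```

Writing D3 in this model changes S6 and S7 and nothing else, which is the aliasing a programmer has to keep in mind when mixing single-precision and double-precision code.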
Page 266
= r1p0 = r1p1 = r1p2 = r1p3 = r1p4 a. For information about the Common VFP subarchitecture see the ARM Architecture Reference Manual. 11.3.2 Floating-Point Status and Control Register, FPSCR The FPSCR Register characteristics are: Purpose Provides all necessary User level control of the floating-point system.
Page 268
• are accessible in Privileged modes only. • ARM recommends that any software attempting to determine the presence or absence of double-precision floating point hardware support uses the MVFR1 register. Configurations Use this register if the device is configured as a Cortex-R4F processor.
[51] = 1 bits [21:0] are all zeros bits [50:0] are all zeros Processing of input NaNs for ARM floating-point functionality and libraries is defined as follows: • In full-compliance mode, NaNs are handled as described in the ARM Architecture Reference Manual.
Page 273
IEEE 754 standard to generate Underflow exceptions. In flush-to-zero mode, results that are tiny before rounding, as described in the IEEE 754 standard, are flushed to a zero, and the UFC flag, FPSCR[3], is set. See the ARM Architecture Reference Manual for information on flush-to-zero mode.
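The flush-to-zero rule can be approximated on a host with ordinary C floats. This is a simplified sketch under stated assumptions: it tests an already computed single-precision value against the smallest normalized number rather than examining the result before rounding, it returns a zero with the same sign as the input, and the names are hypothetical. Only the flag position, FPSCR[3], is taken from the text.

```c
#include <float.h>
#include <stdint.h>

#define FPSCR_UFC (1u << 3) /* Underflow cumulative flag, FPSCR[3] */

/* Models flush-to-zero for a single-precision result: a nonzero value
 * smaller in magnitude than FLT_MIN is replaced by zero and UFC is set. */
float flush_result_to_zero(float result, uint32_t *fpscr)
{
    if (result != 0.0f && result > -FLT_MIN && result < FLT_MIN) {
        *fpscr |= FPSCR_UFC;
        return (result < 0.0f) ? -0.0f : 0.0f; /* assumption: sign is kept */
    }
    return result;
}
```

Normalized values and exact zeros pass through unchanged and leave the flag untouched.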
The debug target is the lowest level of the system. An example of a debug target is a development system with a Cortex-R4 test chip or a silicon part with a Cortex-R4 macrocell. The debug target must implement some system support for the protocol converter to access the processor debug unit using the Advanced Peripheral Bus (APB) slave port.
The state of the processor is preserved in the same manner as all ARM exceptions. The monitor target communicates with the debugger to access processor and coprocessor state, and to access memory contents and peripherals. Monitor debug-mode requires a debug monitor program to interface between the debug hardware and the software debugger.
You can access the processor debug register map using the APB slave port. This is the only way to get full access to the processor debug capability. ARM recommends that if your system requires the processor to access its own debug registers, you choose a system interconnect structure that enables the processor to access the APB slave port by executing load and stores to an appropriate area of physical memory.
Page 282
The Vector Catch Register (DBGVCR) sets breakpoints on exception vectors as instruction addresses. The Watchpoint Fault Address Register (DBGWFAR) reads an address and a processor state dependent offset, +8 for ARM and +4 for Thumb. 12.3.6 Power domains The processor has a single power domain. Therefore, it does not support the Event Catch Register, the OS Lock, or the OS Save and Restore functionality.
Page 283
Debug Locks permission You can lock the APB slave port so that access to some debug registers is restricted. ARM architecture v7 defines two locks: Software lock The external debugger can set this lock to prevent software from modifying the debug registers settings.
Page 294
Function [31:1] Address This is the address of the watchpointed instruction. When a watchpoint occurs in ARM state, the DBGWFAR contains the address of the instruction causing it plus an offset of 8. When a watchpoint occurs in Thumb state, the offset is plus 4. RAZ.
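A debugger can recover the address of the watchpointed instruction by removing the state-dependent offset, which this document gives for the DBGWFAR as +8 in ARM state and +4 in Thumb state. The helper below is a hypothetical sketch; it treats bit [0] as not part of the address because the address field is bits [31:1].

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the address of the instruction that caused the watchpoint,
 * given the DBGWFAR value and the processor state at the time. */
uint32_t watchpoint_instr_addr(uint32_t dbgwfar, bool thumb)
{
    uint32_t reported = dbgwfar & ~1u;   /* address field is bits [31:1] */
    return reported - (thumb ? 4u : 8u); /* remove the state-dependent offset */
}
```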
Page 296
The DBGITR is a write-only register. Reads from the DBGITR return an Unpredictable value. The Instruction Transfer Register, bits [31:0] contain the ARM instruction for the processor to execute while in debug state. The reset value of this register is Unpredictable.
Page 303
Unpredictable. • To watch for a write to any byte in an 8-byte aligned object of size 8 bytes, ARM recommends that a debugger sets DBGWCR[28:24] to b00111, and DBGWCR[12:5] to b11111111. This is compatible with both ARMv7 debug compliant implementations that have an 8-bit DBGWCR[12:5] and with those that have a 4-bit DBGWCR[8:5] byte address select field.
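The recommended field values can be applied to a DBGWCR value with two masks. The function name is hypothetical; the field positions and values are those quoted in the recommendation above, and all other bits are left as the caller supplied them.

```c
#include <stdint.h>

#define DBGWCR_MASK_FIELD (0x1Fu << 24) /* DBGWCR[28:24] */
#define DBGWCR_BAS_FIELD  (0xFFu << 5)  /* DBGWCR[12:5]  */

/* Returns `wcr` with DBGWCR[28:24] = b00111 and DBGWCR[12:5] = b11111111,
 * the settings recommended for watching an 8-byte aligned, 8-byte object. */
uint32_t dbgwcr_watch_8byte_object(uint32_t wcr)
{
    wcr &= ~(DBGWCR_MASK_FIELD | DBGWCR_BAS_FIELD);
    wcr |= (0x07u << 24) | (0xFFu << 5);
    return wcr;
}
```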
Page 306
Invasive debug enable field features enabled [3:0] Not implemented Non-secure debug features a. Cortex-R4 does not implement the Security Extensions, so all the debug features are considered secure. 12.4.18 Device Power-down and Reset Control Register The DBGPRCR Register characteristics are: Purpose Controls reset and power-down related functionality.
Page 314
4 bits Indicates the manufacturer revision number. This number starts at and increments by the integrated circuit manufacturer on metal fixes. For the Cortex-R4 processor, the initial value is but this value can be changed by the manufacturer. Customer 4 bits Indicates an endorsed modification to the device.
Page 322
• The PC is frozen on entry to debug state. That is, it does not increment on the execution of ARM instructions. However, the processor still updates the PC as a response to instructions that explicitly modify the PC. •...
Page 323
While the processor is in debug state, it always decodes instructions from the DBGITR as per the ARM instruction set, regardless of the value of the T and J bits of the CPSR. The following restrictions apply to instructions executed through the DBGITR while in debug state: •...
This section provides some examples of using the processor debug functionality, both from the point of view of a software engineer writing code to run on an ARM processor and of a developer creating debug tools for the processor. In the former case, examples are given in ARM assembly language.
Page 332
• The mechanism for forcing the processor to execute ARM instructions, when the processor is in debug state. For more information, see Executing instructions in debug state on page 12-49.
Page 334
Programming simple breakpoints and the byte address select When programming a simple breakpoint, you must set the byte address select bits in the control register appropriately. For a breakpoint in ARM state, this is simple. For Thumb state, you must calculate the value based on the address.
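The byte address select calculation can be sketched as follows. It assumes the ARMv7 Debug convention of a 4-bit field with one bit per byte of the addressed word: an ARM instruction occupies all four bytes, and a Thumb instruction occupies the lower or upper halfword depending on bit [1] of its address. The function name is hypothetical, and placing the result in the breakpoint control register is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the 4-bit byte address select value for a simple breakpoint
 * on the instruction at `addr`. */
uint32_t breakpoint_byte_address_select(uint32_t addr, bool thumb)
{
    if (!thumb) {
        return 0xFu;               /* ARM state: all four bytes of the word */
    }
    return 0x3u << (addr & 2u);    /* Thumb: b0011 or b1100 by halfword */
}
```

For a Thumb instruction at 0x8002 the helper returns b1100, and the word-aligned address 0x8000 would go in the corresponding value register.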
Page 338
// Step 1. Read the DBGDSCR to determine the cause of debug entry. state->dscr := ReadDebugRegister(34); // Step 2. Issue a DataSynchronizationBarrier instruction if required; // this is not required by Cortex-R4 but is required for ARMv7 // debug. if ((state->dscr & (1<<19)) == 0) ExecuteARMInstruction(0xEE070F9A) // Step 3.
Page 339
// set the T bit to Thumb state state->pc := state->pc - 4; elseif (state->cpsr & (1<<24)) // Set the J bit to Jazelle state. Note: ARM Cortex-R4 does not support // Jazelle state but ARMv7 debug does. state->pc := state->pc - IMPLEMENTATION DEFINED value;...
Page 340
// Step 2. Read the register value through DBGDTRTX. reg_val := ReadDCC(); return reg_val; Example 12-14 shows a similar sequence for writing an ARM register. Example 12-14 Writing an ARM register WriteRegister(int Rd, uint32 reg_val) // Step 1. Write the register value to DBGDTRRX.
Page 346
Accessing coprocessor registers The sequence for accessing coprocessor registers is the same as for the PC and CPSR. That is, you must first execute an instruction to transfer the register to an ARM register, then read the value back through the DTR.
This section describes the behavior and use of the Integration Test Registers that are in the processor. It also describes the Integration Mode Control Register that controls the use of the Integration Test Registers. For more information about the DBGITCTRL, see the ARM Architecture Reference Manual.
Page 354
• ARM strongly recommends that the processor is halted while in debug state, because toggling input and output pins might have an unwanted effect on the operation of the processor. You must not set the DBGITCTRL Register until the processor has halted.
The first instruction must not use the PC as a destination register. • Both instructions must belong to the same instruction set, ARM or Thumb. • There must be no data dependency between the two instructions. That is, the second instruction must not have any source registers that are destination registers of the first instruction.
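The three conditions visible above can be checked mechanically, for example in a scheduling or analysis tool. The sketch below is hypothetical in every name and in its representation of an instruction as register bitmasks; it checks only these three conditions, so a true result does not mean that the processor will dual-issue the pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal description of an instruction for this check.
 * Bit n of each mask stands for register Rn; bit 15 is the PC. */
typedef struct {
    bool     thumb;    /* true for Thumb, false for ARM */
    uint16_t src_regs; /* registers read */
    uint16_t dst_regs; /* registers written */
} instr_info;

/* Returns false if the pair violates one of the three listed conditions. */
bool passes_listed_dual_issue_checks(const instr_info *first,
                                     const instr_info *second)
{
    if (first->dst_regs & (1u << 15)) {
        return false; /* first instruction writes the PC */
    }
    if (first->thumb != second->thumb) {
        return false; /* different instruction sets */
    }
    if (second->src_regs & first->dst_regs) {
        return false; /* second reads a result of the first */
    }
    return true;
}
```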
Page 432
ECC Schemes ECC scheme selection guidelines When deciding to implement a Cortex-R4 processor with an ECC scheme on one or both of the TCM interfaces, consider carefully whether to use 32-bit or 64-bit ECC. To calculate or check the ECC code for data, the processor must know the value of all bytes in the data chunk protected by the scheme.
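One consequence of this requirement is that a write covering only part of a protected chunk cannot have its ECC code computed from the written data alone. The sketch below, with hypothetical names, reports whether a write covers whole chunks only; treating the partial case as needing a read of the existing data first is an inference from the sentence above, not a description of the processor's exact mechanism.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a write of `size_bytes` at `addr` leaves at least one
 * ECC chunk of `chunk_bytes` (4 for 32-bit ECC, 8 for 64-bit ECC) only
 * partly written, so the remaining bytes of that chunk must be known
 * before the ECC code can be calculated. size_bytes must be nonzero. */
bool write_covers_partial_chunk(uint32_t addr, unsigned size_bytes,
                                unsigned chunk_bytes)
{
    return (addr % chunk_bytes) != 0u || (size_bytes % chunk_bytes) != 0u;
}
```

A 4-byte aligned word store is a whole-chunk write under 32-bit ECC but a partial-chunk write under 64-bit ECC, which is one of the trade-offs between the two schemes.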
Page 435
12-60. PollDCC() Updated reset value of MVFR1 Table 11-1 on page 11-4. Updated instruction descriptions to comply with the ARM Appendix C Cycle Timings and Interlock Architecture Reference Manual. Behavior. Clarified configuration signal descriptions and added references Table A-2 on page A-4.