Intel XScale Core User manual

Intel XScale® Core
Developer’s Manual
January, 2004
Order Number: 273473-002

2 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Information in this document is provided in connection with Intel®products. No license, express or implied, by estoppel or otherwise, to any intellectual
property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability
whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to
fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not
intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for
future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
Intel®internal code names are subject to change.
THIS SPECIFICATION, THE Intel XScale® Core Developer’s Manual IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING
ANY WARRANTY OF MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY
OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.
Intel disclaims all liability, including liability for infringement of any proprietary rights, relating to use of information in this specification. No license,
express or implied, by estoppel or otherwise, to any intellectual property rights is granted herein.
Copyright © Intel Corporation, 2004
AlertVIEW, i960, AnyPoint, AppChoice, BoardWatch, BunnyPeople, CablePort, Celeron, Chips, Commerce Cart, CT Connect, CT Media, Dialogic,
DM3, EtherExpress, ETOX, FlashFile, GatherRound, i386, i486, iCat, iCOMP, Insight960, InstantIP, Intel, Intel logo, Intel386, Intel486, Intel740,
IntelDX2, IntelDX4, IntelSX2, Intel ChatPad, Intel Create&Share, Intel Dot.Station, Intel GigaBlade, Intel InBusiness, Intel Inside, Intel Inside logo, Intel
NetBurst, Intel NetStructure, Intel Play, Intel Play logo, Intel Pocket Concert, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel TeamStation,
Intel WebOutfitter, Intel Xeon, Intel XScale, Itanium, JobAnalyst, LANDesk, LanRover, MCS, MMX, MMX logo, NetPort, NetportExpress, Optimizer
logo, OverDrive, Paragon, PC Dads, PC Parents, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, ProShare,
RemoteExpress, Screamline, Shiva, SmartDie, Solutions960, Sound Mark, StorageExpress, The Computer Inside, The Journey Inside, This Way In,
TokenExpress, Trillium, Vivonic, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and
other countries.
The ARM* and ARM Powered logo marks (the ARM marks) are trademarks of ARM, Ltd., and Intel uses these marks under license from ARM, Ltd.
*Other names and brands may be claimed as the property of others.

Developer’s Manual January, 2004 3
Intel XScale® Core Developer’s Manual
Contents
Contents
1 Introduction....................................................................................................................................13
1.1 About This Document .........................................................................................................13
1.1.1 How to Read This Document.................................................................................13
1.1.2 Other Relevant Documents ...................................................................................14
1.2 High-Level Overview of the Intel XScale®Core..................................................................15
1.2.1 ARM Compatibility .................................................................................................15
1.2.2 Features.................................................................................................................16
1.2.2.1 Multiply/Accumulate (MAC)....................................................................16
1.2.2.2 Memory Management............................................................................17
1.2.2.3 Instruction Cache...................................................................................17
1.2.2.4 Branch Target Buffer..............................................................................17
1.2.2.5 Data Cache............................................................................................17
1.2.2.6 Performance Monitoring.........................................................................18
1.2.2.7 Power Management...............................................................................18
1.2.2.8 Debug ....................................................................................................18
1.2.2.9 JTAG......................................................................................................18
1.3 Terminology and Conventions............................................................................................19
1.3.1 Number Representation.........................................................................................19
1.3.2 Terminology and Acronyms...................................................................................19
2 Programming Model ......................................................................................................................21
2.1 ARM Architecture Compatibility..........................................................................................21
2.2 ARM Architecture Implementation Options.........................................................................21
2.2.1 Big Endian versus Little Endian.............................................................................21
2.2.2 26-Bit Architecture .................................................................................................21
2.2.3 Thumb....................................................................................................................21
2.2.4 ARM DSP-Enhanced Instruction Set.....................................................................22
2.2.5 Base Register Update............................................................................................22
2.3 Extensions to ARM Architecture.........................................................................................23
2.3.1 DSP Coprocessor 0 (CP0).....................................................................................23
2.3.1.1 Multiply With Internal Accumulate Format .............................................24
2.3.1.2 Internal Accumulator Access Format.....................................................27
2.3.2 New Page Attributes..............................................................................................29
2.3.3 Additions to CP15 Functionality.............................................................................31
2.3.4 Event Architecture .................................................................................................32
2.3.4.1 Exception Summary...............................................................................32
2.3.4.2 Event Priority..........................................................................................32
2.3.4.3 Prefetch Aborts ......................................................................................33
2.3.4.4 Data Aborts............................................................................................34
2.3.4.5 Events from Preload Instructions...........................................................35
2.3.4.6 Debug Events ........................................................................................36
3 Memory Management....................................................................................................................37
3.1 Overview.............................................................................................................................37
3.2 Architecture Model..............................................................................................................38
3.2.1 Version 4 vs. Version 5..........................................................................................38
3.2.2 Memory Attributes..................................................................................................38
3.2.2.1 Page (P) Attribute Bit.............................................................................38

4 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Contents
3.2.2.2 Cacheable (C), Bufferable (B), and eXtension (X) Bits..........................38
3.2.2.3 Instruction Cache...................................................................................38
3.2.2.4 Data Cache and Write Buffer.................................................................39
3.2.2.5 Details on Data Cache and Write Buffer Behavior.................................40
3.2.2.6 Memory Operation Ordering..................................................................40
3.2.3 Exceptions.............................................................................................................40
3.3 Interaction of the MMU, Instruction Cache, and Data Cache.............................................41
3.4 Control................................................................................................................................42
3.4.1 Invalidate (Flush) Operation ..................................................................................42
3.4.2 Enabling/Disabling.................................................................................................42
3.4.3 Locking Entries......................................................................................................43
3.4.4 Round-Robin Replacement Algorithm...................................................................45
4 Instruction Cache...........................................................................................................................47
4.1 Overview.............................................................................................................................47
4.2 Operation............................................................................................................................48
4.2.1 Operation When Instruction Cache is Enabled......................................................48
4.2.2 Operation When The Instruction Cache Is Disabled..............................................48
4.2.3 Fetch Policy...........................................................................................................49
4.2.4 Round-Robin Replacement Algorithm...................................................................49
4.2.5 Parity Protection ....................................................................................................50
4.2.6 Instruction Fetch Latency.......................................................................................51
4.2.7 Instruction Cache Coherency ................................................................................51
4.3 Instruction Cache Control...................................................................................................52
4.3.1 Instruction Cache State at RESET........................................................................52
4.3.2 Enabling/Disabling.................................................................................................52
4.3.3 Invalidating the Instruction Cache..........................................................................53
4.3.4 Locking Instructions in the Instruction Cache........................................................54
4.3.5 Unlocking Instructions in the Instruction Cache.....................................................55
5 Branch Target Buffer.....................................................................................................................57
5.1 Branch Target Buffer (BTB) Operation...............................................................................57
5.1.1 Reset.....................................................................................................................58
5.1.2 Update Policy.........................................................................................................58
5.2 BTB Control........................................................................................................................59
5.2.1 Disabling/Enabling.................................................................................................59
5.2.2 Invalidation.............................................................................................................59
6 Data Cache....................................................................................................................................61
6.1 Overviews...........................................................................................................................61
6.1.1 Data Cache Overview............................................................................................61
6.1.2 Mini-Data Cache Overview....................................................................................63
6.1.3 Write Buffer and Fill Buffer Overview.....................................................................64
6.2 Data Cache and Mini-Data Cache Operation.....................................................................65
6.2.1 Operation When Caching is Enabled.....................................................................65
6.2.2 Operation When Data Caching is Disabled...........................................................65
6.2.3 Cache Policies.......................................................................................................65
6.2.3.1 Cacheability ...........................................................................................65
6.2.3.2 Read Miss Policy ...................................................................................66
6.2.3.3 Write Miss Policy....................................................................................67
6.2.3.4 Write-Back Versus Write-Through.........................................................67

Developer’s Manual January, 2004 5
Intel XScale® Core Developer’s Manual
Contents
6.2.4 Round-Robin Replacement Algorithm ...................................................................68
6.2.5 Parity Protection ....................................................................................................68
6.2.6 Atomic Accesses ...................................................................................................68
6.3 Data Cache and Mini-Data Cache Control .........................................................................69
6.3.1 Data Memory State After Reset.............................................................................69
6.3.2 Enabling/Disabling.................................................................................................69
6.3.3 Invalidate and Clean Operations ...........................................................................69
6.3.3.1 Global Clean and Invalidate Operation..................................................70
6.4 Re-configuring the Data Cache as Data RAM....................................................................71
6.5 Write Buffer/Fill Buffer Operation and Control ....................................................................75
7 Configuration .................................................................................................................................77
7.1 Overview.............................................................................................................................77
7.2 CP15 Registers...................................................................................................................80
7.2.1 Register 0: ID & Cache Type Registers.................................................................81
7.2.2 Register 1: Control & Auxiliary Control Registers..................................................83
7.2.3 Register 2: Translation Table Base Register.........................................................85
7.2.4 Register 3: Domain Access Control Register.........................................................85
7.2.5 Register 4: Reserved.............................................................................................85
7.2.6 Register 5: Fault Status Register...........................................................................86
7.2.7 Register 6: Fault address Register ........................................................................86
7.2.8 Register 7: Cache Functions .................................................................................87
7.2.9 Register 8: TLB Operations ...................................................................................89
7.2.10 Register 9: Cache Lock Down ...............................................................................90
7.2.11 Register 10: TLB Lock Down.................................................................................91
7.2.12 Register 11-12: Reserved......................................................................................91
7.2.13 Register 13: Process ID.........................................................................................91
7.2.13.1 The PID Register Affect On Addresses .................................................92
7.2.14 Register 14: Breakpoint Registers.........................................................................93
7.2.15 Register 15: Coprocessor Access Register...........................................................94
7.3 CP14 Registers...................................................................................................................96
7.3.1 Performance Monitoring Registers ........................................................................96
7.3.1.1 XSC1 Performance Monitoring Registers..............................................96
7.3.1.2 XSC2 Performance Monitoring Registers..............................................97
7.3.2 Clock and Power Management Registers..............................................................98
7.3.3 Software Debug Registers.....................................................................................99
8 Performance Monitoring..............................................................................................................101
8.1 Overview...........................................................................................................................101
8.2 XSC1 Register Description (2 counter variant).................................................................102
8.2.1 Clock Counter (CCNT; CP14 - Register 1)..........................................................102
8.2.2 Performance Count Registers (PMN0 - PMN1;
CP14 - Register 2 and 3, Respectively)...............................................................103
8.2.3 Extending Count Duration Beyond 32 Bits ..........................................................103
8.2.4 Performance Monitor Control Register (PMNC) ..................................................103
8.2.4.1 Managing PMNC..................................................................................105
8.3 XSC2 Register Description (4 counter variant).................................................................106
8.3.1 Clock Counter (CCNT).........................................................................................106
8.3.2 Performance Count Registers (PMN0 - PMN3)...................................................107
8.3.3 Performance Monitor Control Register (PMNC) ..................................................108
8.3.4 Interrupt Enable Register (INTEN).......................................................................109

6 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Contents
8.3.5 Overflow Flag Status Register (FLAG)................................................................110
8.3.6 Event Select Register (EVTSEL).........................................................................111
8.3.7 Managing the Performance Monitor ....................................................................112
8.4 Performance Monitoring Events .......................................................................................113
8.4.1 Instruction Cache Efficiency Mode......................................................................115
8.4.2 Data Cache Efficiency Mode ...............................................................................115
8.4.3 Instruction Fetch Latency Mode...........................................................................115
8.4.4 Data/Bus Request Buffer Full Mode....................................................................116
8.4.5 Stall/Writeback Statistics .....................................................................................116
8.4.6 Instruction TLB Efficiency Mode..........................................................................117
8.4.7 Data TLB Efficiency Mode...................................................................................117
8.5 Multiple Performance Monitoring Run Statistics...............................................................118
8.6 Examples..........................................................................................................................119
8.6.1 XSC1 Example (2 counter variant)......................................................................119
8.6.2 XSC2 Example (4 counter variant)......................................................................120
9 Software Debug...........................................................................................................................121
9.1 Definitions.........................................................................................................................121
9.2 Debug Registers...............................................................................................................121
9.3 Introduction.......................................................................................................................122
9.3.1 Halt Mode............................................................................................................122
9.3.2 Monitor Mode.......................................................................................................122
9.4 Debug Control and Status Register (DCSR) ....................................................................123
9.4.1 Global Enable Bit (GE) ........................................................................................124
9.4.2 Halt Mode Bit (H).................................................................................................124
9.4.3 SOC Break (B).....................................................................................................124
9.4.4 Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR) ..........................................................125
9.4.5 Sticky Abort Bit (SA)............................................................................................125
9.4.6 Method of Entry Bits (MOE).................................................................................125
9.4.7 Trace Buffer Mode Bit (M) ...................................................................................125
9.4.8 Trace Buffer Enable Bit (E)..................................................................................125
9.5 Debug Exceptions.............................................................................................................126
9.5.1 Halt Mode............................................................................................................127
9.5.2 Monitor Mode.......................................................................................................129
9.6 HW Breakpoint Resources ...............................................................................................130
9.6.1 Instruction Breakpoints........................................................................................130
9.6.2 Data Breakpoints.................................................................................................131
9.7 Software Breakpoints........................................................................................................133
9.8 Transmit/Receive Control Register (TXRXCTRL) ............................................................134
9.8.1 RX Register Ready Bit (RR)................................................................................135
9.8.2 Overflow Flag (OV)..............................................................................................136
9.8.3 Download Flag (D)...............................................................................................136
9.8.4 TX Register Ready Bit (TR).................................................................................137
9.8.5 Conditional Execution Using TXRXCTRL............................................................137
9.9 Transmit Register (TX).....................................................................................................138
9.10 Receive Register (RX)......................................................................................................138
9.11 Debug JTAG Access ........................................................................................................139
9.11.1 SELDCSR JTAG Register...................................................................................139
9.11.1.1 hold_reset............................................................................................140
9.11.1.2 ext_dbg_break.....................................................................................140

Developer’s Manual January, 2004 7
Intel XScale® Core Developer’s Manual
Contents
9.11.1.3 DCSR (DBG_SR[34:3])........................................................................140
9.11.2 DBGTX JTAG Register........................................................................................141
9.11.2.1 DBG_SR[0] ..........................................................................................141
9.11.2.2 TX (DBG_SR[34:3]) .............................................................................141
9.11.3 DBGRX JTAG Register .......................................................................................142
9.11.3.1 RX Write Logic.....................................................................................143
9.11.3.2 DBG_SR[0] ..........................................................................................143
9.11.3.3 flush_rr.................................................................................................143
9.11.3.4 hs_download........................................................................................143
9.11.3.5 RX (DBG_SR[34:3]).............................................................................143
9.11.3.6 rx_valid.................................................................................................144
9.12 Trace Buffer......................................................................................................................145
9.12.1 Trace Buffer Registers.........................................................................................145
9.12.1.1 Checkpoint Registers...........................................................................146
9.12.1.2 Trace Buffer Register (TBREG)...........................................................147
9.13 Trace Buffer Entries..........................................................................................................148
9.13.1 Message Byte......................................................................................................148
9.13.1.1 Exception Message Byte .....................................................................149
9.13.1.2 Non-exception Message Byte..............................................................150
9.13.1.3 Address Bytes......................................................................................151
9.13.2 Trace Buffer Usage..............................................................................................152
9.14 Downloading Code in the Instruction Cache.....................................................................154
9.14.1 Mini Instruction Cache Overview.........................................................................154
9.14.2 LDIC JTAG Command.........................................................................................155
9.14.3 LDIC JTAG Data Register ...................................................................................155
9.14.4 LDIC Cache Functions.........................................................................................156
9.14.5 Loading Instruction Cache During Reset.............................................................158
9.14.6 Dynamically Loading Instruction Cache After Reset............................................160
9.14.6.1 Dynamic Download Synchronization Code..........................................162
10 Performance Considerations.......................................................................................................163
10.1 Interrupt Latency...............................................................................................................163
10.2 Branch Prediction .............................................................................................................164
10.3 Addressing Modes............................................................................................................164
10.4 Instruction Latencies.........................................................................................................165
10.4.1 Performance Terms.............................................................................................165
10.4.2 Branch Instruction Timings ..................................................................................167
10.4.3 Data Processing Instruction Timings ...................................................................167
10.4.4 Multiply Instruction Timings..................................................................................168
10.4.5 Saturated Arithmetic Instructions.........................................................................170
10.4.6 Status Register Access Instructions ....................................................................170
10.4.7 Load/Store Instructions........................................................................................171
10.4.8 Semaphore Instructions.......................................................................................171
10.4.9 Coprocessor Instructions.....................................................................................172
10.4.10 Miscellaneous Instruction Timing.........................................................................172
10.4.11 Thumb Instructions ..............................................................................................173
A Optimization Guide......................................................................................................................175
A.1 Introduction.......................................................................................................................175
A.1.1 About This Guide.................................................................................................175
A.2 The Intel XScale®Core Pipeline.......................................................................................176

8 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Contents
A.2.1 General Pipeline Characteristics .........................................................................176
A.2.1.1. Number of Pipeline Stages..................................................................176
A.2.1.2. The Intel XScale®Core Pipeline Organization ....................................177
A.2.1.3. Out Of Order Completion.....................................................................178
A.2.1.4. Register Scoreboarding.......................................................................178
A.2.1.5. Use of Bypassing.................................................................................178
A.2.2 Instruction Flow Through the Pipeline.................................................................179
A.2.2.1. ARM* V5TE Instruction Execution.......................................................179
A.2.2.2. Pipeline Stalls ......................................................................................179
A.2.3 Main Execution Pipeline......................................................................................180
A.2.3.1. F1 / F2 (Instruction Fetch) Pipestages.................................................180
A.2.3.2. ID (Instruction Decode) Pipestage.......................................................180
A.2.3.3. RF (Register File / Shifter) Pipestage..................................................181
A.2.3.4. X1 (Execute) Pipestages .....................................................................181
A.2.3.5. X2 (Execute 2) Pipestage....................................................................181
A.2.3.6. WB (write-back) ...................................................................................181
A.2.4 Memory Pipeline..................................................................................................182
A.2.4.1. D1 and D2 Pipestage...........................................................................182
A.2.5 Multiply/Multiply Accumulate (MAC) Pipeline ......................................................182
A.2.5.1. Behavioral Description.........................................................................182
A.3 Basic Optimizations..........................................................................................................183
A.3.1 Conditional Instructions .......................................................................................183
A.3.1.1. Optimizing Condition Checks...............................................................183
A.3.1.2. Optimizing Branches............................................................................184
A.3.1.3. Optimizing Complex Expressions........................................................186
A.3.2 Bit Field Manipulation ..........................................................................................187
A.3.3 Optimizing the Use of Immediate Values.............................................................188
A.3.4 Optimizing Integer Multiply and Divide ................................................................189
A.3.5 Effective Use of Addressing Modes.....................................................................190
A.4 Cache and Prefetch Optimizations...................................................................................191
A.4.1 Instruction Cache.................................................................................................191
A.4.1.1. Cache Miss Cost..................................................................................191
A.4.1.2. Round-Robin Replacement Cache Policy............................................191
A.4.1.3. Code Placement to Reduce Cache Misses .........................................191
A.4.1.4. Locking Code into the Instruction Cache.............................................192
A.4.2 Data and Mini Cache...........................................................................................193
A.4.2.1. Non Cacheable Regions......................................................................193
A.4.2.2. Write-through and Write-back Cached Memory Regions ....................193
A.4.2.3. Read Allocate and Read-write Allocate Memory Regions...................194
A.4.2.4. Creating On-chip RAM.........................................................................194
A.4.2.5. Mini-data Cache...................................................................................195
A.4.2.6. Data Alignment ....................................................................................196
A.4.2.7. Literal Pools.........................................................................................197
A.4.3 Cache Considerations .........................................................................................198
A.4.3.1. Cache Conflicts, Pollution and Pressure..............................................198
A.4.3.2. Memory Page Thrashing......................................................................198
A.4.4 Prefetch Considerations ......................................................................................199
A.4.4.1. Prefetch Distances...............................................................................199
A.4.4.2. Prefetch Loop Scheduling....................................................................199
A.4.4.3. Prefetch Loop Limitations ....................................................................199
A.4.4.4. Compute vs. Data Bus Bound..............................................................199
A.4.4.5. Low Number of Iterations.....................................................................200

Developer’s Manual January, 2004 9
Intel XScale® Core Developer’s Manual
Contents
A.4.4.6. Bandwidth Limitations..........................................................................200
A.4.4.7. Cache Memory Considerations............................................................201
A.4.4.8. Cache Blocking....................................................................................203
A.4.4.9. Prefetch Unrolling ................................................................................203
A.4.4.10. Pointer Prefetch...................................................................................204
A.4.4.11. Loop Interchange.................................................................................205
A.4.4.12. Loop Fusion .........................................................................................205
A.4.4.13. Prefetch to Reduce Register Pressure ................................................206
A.5 Instruction Scheduling ......................................................................................................207
A.5.1 Scheduling Loads ................................................................................................207
A.5.1.1. Scheduling Load and Store Double (LDRD/STRD) .............................210
A.5.1.2. Scheduling Load and Store Multiple (LDM/STM).................................211
A.5.2 Scheduling Data Processing Instructions ............................................................212
A.5.3 Scheduling Multiply Instructions ..........................................................................213
A.5.4 Scheduling SWP and SWPB Instructions............................................................214
A.5.5 Scheduling the MRA and MAR Instructions (MRRC/MCRR)...............................215
A.5.6 Scheduling the MIA and MIAPH Instructions.......................................................216
A.5.7 Scheduling MRS and MSR Instructions...............................................................217
A.5.8 Scheduling CP15 Coprocessor Instructions ........................................................217
A.6 Optimizing C Libraries ......................................................................................................218
A.7 Optimizations for Size.......................................................................................................218
A.7.1 Space/Performance Trade Off.............................................................................218
A.7.1.1. Multiple Word Load and Store .............................................................218
A.7.1.2. Use of Conditional Instructions ............................................................218
A.7.1.3. Use of PLD Instructions .......................................................................218
B Test Features ..............................................................................................................................219
B.1 Overview...........................................................................................................................219

10 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Contents
Figures
1-1 Architecture Features .................................................................................................................16
3-1 Example of Locked Entries in TLB .............................................................................................45
4-1 Instruction Cache Organization..................................................................................................47
4-2 Locked Line Effect on Round Robin Replacement.....................................................................54
5-1 BTB Entry...................................................................................................................................57
5-2 Branch History............................................................................................................................58
6-1 Data Cache Organization ...........................................................................................................62
6-2 Mini-Data Cache Organization ...................................................................................................63
6-3 Locked Line Effect on Round Robin Replacement.....................................................................74
9-1 SELDCSR.................................................................................................................................139
9-2 DBGTX .....................................................................................................................................141
9-3 DBGRX.....................................................................................................................................142
9-4 Message Byte Formats.............................................................................................................148
9-5 Indirect Branch Entry Address Byte Organization ....................................................................151
9-6 High Level View of Trace Buffer...............................................................................................152
9-7 LDIC JTAG Data Register Hardware........................................................................................155
9-8 Format of LDIC Cache Functions.............................................................................................157
9-9 Code Download During a Cold Reset For Debug.....................................................................158
9-10Downloading Code in IC During Program Execution................................................................160
A-1 The Intel XScale®Core RISC Superpipeline............................................................................177

Developer’s Manual January, 2004 11
Intel XScale® Core Developer’s Manual
Contents
Tables
2-1 Multiply with Internal Accumulate Format...................................................................................24
2-2 MIA{<cond>} acc0, Rm, Rs ........................................................................................................25
2-3 MIAPH{<cond>} acc0, Rm, Rs ...................................................................................................25
2-4 MIAxy{<cond>} acc0, Rm, Rs.....................................................................................................26
2-5 Internal Accumulator Access Format..........................................................................................27
2-6 MAR{<cond>} acc0, RdLo, RdHi................................................................................................28
2-7 MRA{<cond>} RdLo, RdHi, acc0................................................................................................28
2-9 Second-level Descriptors for Coarse Page Table.......................................................................30
2-10Second-level Descriptors for Fine Page Table...........................................................................30
2-8 First-level Descriptors.................................................................................................................30
2-11Exception Summary....................................................................................................................32
2-12Event Priority ..............................................................................................................................32
2-13Encoding of Fault Status for Prefetch Aborts..............................................................................33
2-14Encoding of Fault Status for Data Aborts ...................................................................................34
3-1 Data Cache and Buffer Behavior when X = 0.............................................................................39
3-2 Data Cache and Buffer Behavior when X = 1.............................................................................39
3-3 Memory Operations that Impose a Fence ..................................................................................40
3-4 Valid MMU & Data/mini-data Cache Combinations....................................................................41
7-1 MRC/MCR Format......................................................................................................................78
7-2 LDC/STC Format when Accessing CP14...................................................................................79
7-3 CP15 Registers...........................................................................................................................80
7-4 ID Register..................................................................................................................................81
7-5 Cache Type Register..................................................................................................................82
7-6 ARM* Control Register ...............................................................................................................83
7-7 Auxiliary Control Register...........................................................................................................84
7-8 Translation Table Base Register ................................................................................................85
7-9 Domain Access Control Register................................................................................................85
7-10Fault Status Register..................................................................................................................86
7-11Fault Address Register...............................................................................................................86
7-12Cache Functions.........................................................................................................................87
7-13TLB Functions.............................................................................................................................89
7-14Cache Lockdown Functions........................................................................................................90
7-15Data Cache Lock Register..........................................................................................................90
7-16TLB Lockdown Functions ...........................................................................................................91
7-17Accessing Process ID.................................................................................................................91
7-18Process ID Register....................................................................................................................91
7-19Accessing the Debug Registers..................................................................................................93
7-20Coprocessor Access Register ....................................................................................................95
7-21Accessing the XSC1 Performance Monitoring Registers ...........................................................96
7-22Accessing the XSC2 Performance Monitoring Registers ...........................................................97
7-23PWRMODE Register..................................................................................................................98
7-24Clock and Power Management...................................................................................................98
7-25CCLKCFG Register....................................................................................................................98
7-26Accessing the Debug Registers..................................................................................................99
8-1 XSC1 Performance Monitoring Registers.................................................................................102
8-2 Clock Count Register (CCNT) ..................................................................................................102
8-3 Performance Monitor Count Register (PMN0 and PMN1)........................................................103
8-4 Performance Monitor Control Register (CP14, register 0)........................................................104
8-5 Performance Monitoring Registers...........................................................................................106

12 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Contents
8-6 Clock Count Register (CCNT) ..................................................................................................106
8-7 Performance Monitor Count Register (PMN0 - PMN3) ............................................................107
8-8 Performance Monitor Control Register.....................................................................................108
8-9 Interrupt Enable Register..........................................................................................................109
8-10Overflow Flag Status Register..................................................................................................110
8-11Event Select Register...............................................................................................................111
8-12Performance Monitoring Events ...............................................................................................113
8-13Some Common Uses of the PMU ............................................................................................114
9-1 Debug Control and Status Register (DCSR)............................................................................123
9-2 Event Priority............................................................................................................................126
9-3 Halt Mode R14_DBG Updating ................................................................................................127
9-4 Monitor Mode R14_DBG Updating...........................................................................................129
9-5 Instruction Breakpoint Address and Control Register (IBCRx).................................................130
9-6 Data Breakpoint Register (DBRx).............................................................................................131
9-7 Data Breakpoint Controls Register (DBCON)...........................................................................131
9-8 TX RX Control Register (TXRXCTRL)......................................................................................134
9-9 Normal RX Handshaking..........................................................................................................135
9-10High-Speed Download Handshaking States ............................................................................135
9-11TX Handshaking.......................................................................................................................137
9-12TXRXCTRL Mnemonic Extensions ..........................................................................................137
9-13TX Register...............................................................................................................................138
9-14RX Register ..............................................................................................................................138
9-15CP 14 Trace Buffer Register Summary....................................................................................145
9-16Checkpoint Register (CHKPTx)................................................................................................146
9-17TBREG Format.........................................................................................................................147
9-18Message Byte Formats.............................................................................................................148
9-19LDIC Cache Functions .............................................................................................................156
9-20Steps For Loading Mini Instruction Cache During Reset..........................................................159
9-21Steps For Dynamically Loading the Mini Instruction Cache .....................................................161
10-1Branch Latency Penalty............................................................................................................164
10-2Latency Example......................................................................................................................166
10-3Branch Instruction Timings (Those predicted by the BTB).......................................................167
10-4Branch Instruction Timings (Those not predicted by the BTB).................................................167
10-5Data Processing Instruction Timings........................................................................................167
10-6Multiply Instruction Timings ......................................................................................................168
10-7Multiply Implicit Accumulate Instruction Timings ......................................................................169
10-8Implicit Accumulator Access Instruction Timings......................................................................169
10-9Saturated Data Processing Instruction Timings .......................................................................170
10-10Status Register Access Instruction Timings............................................................................170
10-11Load and Store Instruction Timings ........................................................................................171
10-12Load and Store Multiple Instruction Timings...........................................................................171
10-13Semaphore Instruction Timings..............................................................................................171
10-14CP15 Register Access Instruction Timings.............................................................................172
10-15CP14 Register Access Instruction Timings.............................................................................172
10-16Exception-Generating Instruction Timings..............................................................................172
10-17Count Leading Zeros Instruction Timings...............................................................................172
A-1 Pipelines and Pipe stages ........................................................................................................177

Developer’s Manual January, 2004 13
Intel XScale® Core Developer’s Manual
Introduction
Introduction 1
1.1 About This Document
This document is the authoritative and definitive reference for the external architecture of the Intel
XScale®core1.
This document describes two variants of the Intel XScale®core that differ only in the performance
monitoring and the size of the JTAG instruction register. Software can detect which variant it is
running on by examining the CoreGen field of Coprocessor 15, ID Register (bits 15:13). (See
Table 7-4, “ID Register” on page 7-81 for more details.) A CoreGen value of 0x1 is referred to as
XSC1 and a value of 0x2 is referred to as XSC2.
Intel Corporation assumes no responsibility for any errors which may appear in this document nor
does it make a commitment to update the information contained herein.
Intel retains the right to make changes to these specifications at any time, without notice. In
particular, descriptions of features, timings, and pin-outs does not imply a commitment to
implement them.
1.1.1 How to Read This Document
It is necessary to be familiar with the ARM Version 5TE Architecture in order to understand some
aspects of this document.
Each chapter in this document focuses on a specific architectural feature of the Intel XScale®core.
•Chapter 2, “Programming Model”
•Chapter 3, “Memory Management”
•Chapter 4, “Instruction Cache”
•Chapter 5, “Branch Target Buffer”
•Chapter 6, “Data Cache”
•Chapter 7, “Configuration”
•Chapter 8, “Performance Monitoring”
•Chapter 9, “Software Debug”
•Chapter 10, “Performance Considerations”
Several appendices are also present:
•Appendix A, “Optimization Guide” covers instruction scheduling techniques.
•Appendix B, “Test Features” describes the JTAG unit.
Note: All the “buzz words” and acronyms found throughout this document are captured in Section 1.3.2,
“Terminology and Acronyms” on page 1-19, located at the end of this chapter.
1. ARM* architecture compliant.

14 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Introduction
1.1.2 Other Relevant Documents
•ARM Architecture Version 5TE Specification Document Number: ARM DDI 0100E
This document describes Version 5TE of the ARM Architecture which includes Thumb ISA
and ARM DSP-Enhanced ISA. (ISBN 0 201 737191)
•StrongARM SA-1100 Microprocessor Developer’s Manual, Intel Order # 278105
•StrongARM SA-110 Microprocessor Technical Reference Manual, Intel Order #278104

Developer’s Manual January, 2004 15
Intel XScale® Core Developer’s Manual
Introduction
1.2 High-Level Overview of the Intel XScale®Core
The Intel XScale®core is an ARM V5TE compliant microprocessor. It has been designed for high
performance and low-power; leading the industry in mW/MIPs. The core is not intended to be
delivered as a stand alone product but as a building block for an ASSP (Application Specific
Standard Product) with embedded markets such as handheld devices, networking, storage, remote
access servers, etc.
The Intel XScale®core incorporates an extensive list of architecture features that allows it to
achieve high performance. This rich feature set allows programmers to select the appropriate
features that obtains the best performance for their application. Many of the architectural features
added to the Intel XScale®core help hide memory latency which often is a serious impediment to
high performance processors. This includes:
•the ability to continue instruction execution even while the data cache is retrieving data from
external memory.
•a write buffer.
•write-back caching.
•various data cache allocation policies which can be configured different for each application.
•and cache locking.
All these features improve the efficiency of the memory bus external to the core.
The Intel XScale®core has been equipped to efficiently handle audio processing through the
support of 16-bit data types and 16-bit operations. These audio coding enhancements center around
multiply and accumulate operations which accelerate many of the audio filter operations.
1.2.1 ARM Compatibility
ARM Version 5 (V5) Architecture added floating point instructions to ARM Version 4. The Intel
XScale®core implements the integer instruction set architecture of ARM V5, but does not provide
hardware support of the floating point instructions.
The Intel XScale®core provides the Thumb instruction set (ARM V5T) and the ARM V5E DSP
extensions.
Backward compatibility with StrongARM* products is maintained for user-mode applications.
Operating systems may require modifications to match the specific hardware features of the Intel
XScale®core and to take advantage of the performance enhancements added.

16 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Introduction
1.2.2 Features
Figure 1-1 shows the major functional blocks of the Intel XScale®core. The following sections
give a brief, high-level overview of these blocks.
1.2.2.1 Multiply/Accumulate (MAC)
The MAC unit supports early termination of multiplies/accumulates in two cycles and can sustain a
throughput of a MAC operation every cycle. Several architectural enhancements were made to the
MAC to support audio coding algorithms, which include a 40-bit accumulator and support for
16-bit packed data.
See Section 2.3, “Extensions to ARM Architecture” on page 2-23 for more details.
Figure 1-1. Architecture Features
Write Buffer
• 8 entries
• Full coalescing
Fill
Buffer
• 4 - 8 entries
Instruction Cache
• 32K or 16K bytes
• 32 ways
• Lockable by line
IMMU
• 32 entry TLB
• Fully associative
• Lockable by entry
DMMU
• 32 entry TLB
• Fully Associative
• Lockable by entry
JTAG
Debug
• Hardware Breakpoints
• Branch History Table
Branch Target
Buffer
• 128 entries
MAC
• Single Cycle
Throughput (16*32)
• 16-bit SIMD
• 40 bit Accumulator
Data Cache
• 32K or 16K bytes
• 32 ways
• wr-back or
wr-through
• Hit under
miss
Data RAM
•28K or 12K
bytes
• Re-map of data
cache
Power
Mgnt
Ctrl
Mini-
Data
Cache
• 2K or 1K
bytes
• 2 ways
Performance
Monitoring

Developer’s Manual January, 2004 17
Intel XScale® Core Developer’s Manual
Introduction
1.2.2.2 Memory Management
The Intel XScale®core implements the Memory Management Unit (MMU) Architecture specified
in the ARM Architecture Reference Manual. The MMU provides access protection and virtual to
physical address translation.
The MMU Architecture also specifies the caching policies for the instruction cache and data
memory. These policies are specified as page attributes and include:
•identifying code as cacheable or non-cacheable
•selecting between the mini-data cache or data cache
•write-back or write-through data caching
•enabling data write allocation policy
•and enabling the write buffer to coalesce stores to external memory
Chapter 3, “Memory Management” discusses this in more detail.
1.2.2.3 Instruction Cache
The Intel XScale®core comes with either a 16 K or 32 K byte instruction cache. The size is
determined by the ASSP. The instruction cache is 32-way set associative and has a line size of
32 bytes. All requests that “miss” the instruction cache generate a 32-byte read request to external
memory. A mechanism to lock critical code within the cache is also provided.
Chapter 4, “Instruction Cache” discusses this in more detail.
1.2.2.4 Branch Target Buffer
The Intel XScale®core provides a Branch Target Buffer (BTB) to predict the outcome of branch
type instructions. It provides storage for the target address of branch type instructions and predicts
the next address to present to the instruction cache when the current instruction address is that of a
branch.
The BTB holds 128 entries. See Chapter 5, “Branch Target Buffer” for more details.
1.2.2.5 Data Cache
The Intel XScale®core comes with either a 16 K or 32 K byte data cache. The size is determined
by the ASSP. Besides the main data cache, a mini-data cache is provided whose size is 1/16th the
main data cache. So a 32 K, 16 K byte main data cache would have a 2 K, 1 K byte mini-data cache
respectively. The main data cache is 32-way set associative and the mini-data cache is 2-way set
associative. Each cache has a line size of 32 bytes, supports write-through or write-back caching.
The data/mini-data cache is controlled by page attributes defined in the MMU Architecture and by
coprocessor 15.
Chapter 6, “Data Cache” discusses all this in more detail.
The Intel XScale®core allows applications to re-configure a portion of the data cache as data
RAM. Software may place special tables or frequently used variables in this RAM. See
Section 6.4, “Re-configuring the Data Cache as Data RAM” on page 6-71 for more information on
this.

18 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Introduction
1.2.2.6 Performance Monitoring
Performance monitoring counters have been added to the Intel XScale®core that can be configured
to monitor various events in the core. These events allow a software developer to measure cache
efficiency, detect system bottlenecks and reduce the overall latency of programs.
Chapter 8, “Performance Monitoring” discusses this in more detail.
1.2.2.7 Power Management
The Intel XScale®core incorporates a power and clock management unit that can assist ASSPs in
controlling their clocking and managing their power. These features are described in Section 7.3,
“CP14 Registers” on page 7-96.
1.2.2.8 Debug
The Intel XScale®core supports software debugging through two instruction address breakpoint
registers, one data-address breakpoint register, one data-address/mask breakpoint register, and a
trace buffer.
Chapter 9, “Software Debug” discusses this in more detail.
1.2.2.9 JTAG
Testability is supported on the Intel XScale®core through the Test Access Port (TAP) Controller
implementation, which is based on IEEE 1149.1 (JTAG) Standard Test Access Port and
Boundary-Scan Architecture. The purpose of the TAP controller is to support test logic internal and
external to the core such as built-in self-test and boundary-scan.
Appendix B discusses this in more detail.

Developer’s Manual January, 2004 19
Intel XScale® Core Developer’s Manual
Introduction
1.3 Terminology and Conventions
1.3.1 Number Representation
All numbers in this document can be assumed to be base 10 unless designated otherwise. In text and
pseudo code descriptions, hexadecimal numbers have a prefix of 0x and binary numbers have a prefix
of 0b. For example, 107 would be represented as 0x6B in hexadecimal and 0b1101011 in binary.
1.3.2 Terminology and Acronyms
ASSP Application Specific Standard Product
Assert This term refers to the logically active value of a signal or bit.
BTB Branch Target Buffer
Clean A clean operation updates external memory with the contents of the specified line in
the data/mini-data cache if any of the dirty bits are set and the lineis valid. There are
two dirty bits associated with each line in the cache so only the portion that is dirty
will get written back to external memory.
After this operation, the line is still valid and both dirty bits are deasserted.
Coalescing Coalescing means bringing together a new store operation with an existing store
operation already resident in the write buffer. The new store is placed in the same
write buffer entry as an existing store when the address of the new store falls in the
4 word aligned addressof the existing entry. This includes, in PCI terminology, write
merging, write collapsing, and write combining.
Deassert This term refers to the logically inactive value of a signal or bit.
Flush A flushoperation invalidatesthe location(s) in the cache by deasserting the valid bit.
Individual entries (lines) may be flushed or the entire cache may be flushed with one
command. Once an entry is flushed in the cache it can no longer be used by the
program.
XSC1 XSC1 refers to a variant of the Intel XScale®core denoted by a CoreGen
(Coprocessor 15,ID Register) valueof 0x1. This varianthas a 2counterperformance
monitor and a 5-bit JTAG instruction register. See Table 7-4, “ID Register” on
page 7-81 for more details.
XSC2 XSC2 refers to a variant of the Intel XScale®core denoted by a CoreGen
(Coprocessor 15,ID Register) valueof 0x2. This varianthas a 4counterperformance
monitor and a 7-bit JTAG instruction register. See Table 7-4, “ID Register” on
page 7-81 for more details.
Reserved A reserved field is a field that may be used by an implementation. If the initialvalue
of a reserved field is supplied by software, this value must be zero. Software should
not modify reserved fields or depend on any values in reserved fields.

20 January, 2004 Developer’s Manual
Intel XScale® Core Developer’s Manual
Introduction
This Page Intentionally Left Blank
Table of contents
Other Intel Processor manuals

Intel
Intel Quad-Core Xeon 3300 Series User manual

Intel
Intel i960 Series User manual

Intel
Intel Xeon User manual

Intel
Intel 80C186EB User manual

Intel
Intel i486 Quick user guide

Intel
Intel Xeon Guide

Intel
Intel core i9 X series User manual

Intel
Intel iAPX 86/88 User manual

Intel
Intel ARCHITECTURE IA-32 User manual

Intel
Intel Q77M vPro Guide

Intel
Intel core i7 X series User manual

Intel
Intel 8008 User manual

Intel
Intel 80386 Quick user guide

Intel
Intel Pentium II Xeon Quick start guide

Intel
Intel Centrino Pro User manual

Intel
Intel ARM Cortex-A9 Reference manual

Intel
Intel Core 2 Duo Processor User manual

Intel
Intel Pentium 4 Quick start guide