AMD ATI CTM Use and care manual

ATI CTM Guide
Technical Reference Manual
Version 1.01

ATI CTM Guide v. 1.01 © 2006 Advanced Micro Devices, Inc.
© 2006 Advanced Micro Devices, Inc. All rights reserved.
The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties
with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions
at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication.
Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty,
relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual
property right.
AMD’s products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other
applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury,
death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.
Reproduction of this manual, or parts thereof, in any form, without the express written permission of Advanced Micro Devices, Inc. is strictly prohibited.
Trademarks
AMD, the AMD Arrow logo, AMD Athlon, AMD Opteron and combinations thereof, AMD-XXXX, ATI and ATI product and product-feature names
are trademarks of Advanced Micro Devices, Inc.
HyperTransport is a licensed trademark of the HyperTransport Technology Consortium.
Microsoft is a registered trademark of Microsoft Corporation.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Disclaimer
While every precaution has been taken in the preparation of this document, Advanced Micro Devices, Inc. assumes no liability with respect to
the operation or use of AMD hardware, software, or other products and documentation described herein, for any act or omission of AMD
concerning such products or this documentation, for any interruption of service, loss or interruption of business, loss of anticipatory profits, or for
punitive, incidental or consequential damages in connection with the furnishing, performance, or use of the AMD hardware, software, or other
products and documentation provided herein.
Advanced Micro Devices, Inc. reserves the right to make changes without further notice to a product or system described herein to improve
reliability, function or design. With respect to AMD products which this document relates, AMD disclaims all express or implied warranties
regarding such products, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, and non-
infringement.
Documentation Updates
AMD is constantly improving its product and associated documentation. To maximize the value of your AMD product, you should ensure that
you have the latest documentation. AMD’s documentation contains helpful installation/configuration tips and other valuable feature information.

Table of Contents
Chapter 1: Introduction 1
1.1 About this Document 1
1.2 Audience 1
1.3 Related Documents 1
Chapter 2: Specifications 3
2.1 CTM Units 3
2.1.1 The ATI Data Parallel Processor Array 4
2.1.2 Processor Execution Unit 4
2.1.3 Conditional Operation Unit 5
2.1.4 Memory Controller Unit 5
2.2 CTM Commands 8
2.2.1 Processor Execution Unit Commands 10
2.2.2 Memory Controller Unit Commands 13
2.2.3 Conditional Output Unit Commands 21
Chapter 3: DPP Array Instruction Set Architecture 23
3.1 Instructions 23
3.2 Instruction Words 23
3.2.1 Synchronization of instruction streams 24
3.3 ALU Instructions 25
3.3.1 Sources 25
3.3.2 Presubtract 26
3.3.3 Inputs 26
3.3.4 The Operation 27
3.3.5 Instruction Modifiers 29
3.3.6 Writemasks 30
3.3.7 Destination 30
3.3.8 Output 30
3.3.9 Setting Predicate Bits 31
3.3.10 ALU Result 32
3.4 Texture Instructions 32
3.4.1 Operations 33
3.4.2 Semaphore 33

3.5 Flow Control 34
3.5.1 Dynamic Flow Control 34
3.5.2 Stacks and Branch Counters 35
3.5.3 Fields 36
3.5.4 Common Flow Control Statements 38
3.5.5 Optimizations 40
3.6 Note on Floating Point 40
3.6.1 Pervasive Deviations from IEEE 40
3.6.2 ALU Non-Transcendental Floating Point 41
3.6.3 ALU Transcendental Floating Point 42
3.6.4 Texture Floating Point 43
3.7 Errata 43
Chapter 4: DPP Application Binary Interface 45
4.1 Executable Files 45
4.1.1 File Format 45
4.1.2 ELF Header 45
4.1.3 Program Code Sections 46
4.1.4 Program Loading 46
Chapter 5: Device Interface 51
5.1 Functions 51
5.1.1 amCloseManagedConnection 51
5.1.2 amCommandBufferConsumed 51
5.1.3 amOpenManagedConnection 51
5.1.4 amSubmitCommandBuffer 52

Chapter 1
Introduction
1.1 About this Document
ATI’s “Close To the Metal” (CTM) Device is designed to open up the high-performance, floating-point, parallel
processor array found in ATI graphics hardware. CTM consists of this processor array plus a handful of supporting
components that control and feed the array.
This manual provides a programmatic overview of the CTM.
1.2 Audience
This manual is intended for experienced design engineers.
1.3 Related Documents
• ATI CTM Device Interface included with CTM distribution.
• Assembler/Disassembler Guide included with CTM distribution.

Chapter 2
Specifications
CTM is designed to expose the parallel array of floating-point processors found in ATI graphics hardware. It is
controlled with a few commands to set parameters, invalidate and flush caches, and start the processors in the
processing array. These commands reside in memory (see ATI CTM Device Interface for further information). This
chapter specifies how CTM reads these commands and its behavior upon processing each.
A block diagram of CTM is presented in Figure 2-1. In addition to the ATI Data Parallel Processor Array (DPP), CTM
comprises three major components: the Processor Execution Unit (PE), the Conditional Operation Unit (CO), and the
Memory Controller Unit (MC).
Figure 2-1: CTM Block Diagram
The PE reads commands sequentially from a specified area of memory. Besides redirecting commands to other units
within CTM, the PE distributes processing work to the DPP. The computation on an individual processor is subject
to a condition returned by the CO. Program output results are written to memory, also based upon a condition returned
by the CO. Memory for instructions, constants, program inputs, program outputs, and a buffer used by the CO is
accessed through the MC. In addition to satisfying read and write requests, the MC computes memory address offsets,
based on a description of the format of the requested data in memory.
2.1 CTM Units
The following sections describe the CTM units in detail.

2.1.1 The ATI Data Parallel Processor Array
The ATI Data Parallel Processor (DPP) Array comprises one or more processors, each a programmable unit that can
execute a series of instructions.
Each processor in the array is directed by the Processor Execution Unit (see Section 2.1.2). If a processor is idle, the
PE may request that it execute a program. It does so by passing to the processor an identifier pair (i, j), where i and j
are non-negative (range-limited) integers, as well as its conditional value. Upon receiving the identifier pair and
conditional value, the processor informs the PE that it is busy, resets its program counter to zero, and begins program
execution. The processor remains busy until its internal program counter reaches the end of the program, as specified
in the Application Binary Interface. After the processor executes the instruction at the end-of-program address, the
processor halts and informs the PE that it is again idle.
Instructions for a program, as well as constants, inputs, and outputs to which the program may refer, are stored in
memory, and read or written through the MC (see Section 2.1.4). Conceptually, each processor maintains a separate
interface to the MC. This interface consists of two non-negative (range-limited) integer indices (x, y) and a field
identifying the type of memory access the processor is requesting (program instruction, floating-point constant,
integer constant, boolean constant, or input read; or output write).
The (x, y) pair is different for each of the types of memory that a processor may request. For instructions, (x, y) is
equal to (pc, 0), where pc is the current program counter. The index pair for each of the constants is (c, 0), where c is
the index specified by the program instruction requesting the constant. The index pair and identifier for inputs are
specified by the program instruction requesting an input value. The index pair for an output is always the pair assigned
to the processor by the PE (i, j), while the identifier for the output is specified in the requesting program instruction.
If conditional output is enabled, output write requests by a processor are conditionally generated, based on a value
returned by the Conditional Operation Unit (see Section 2.1.3). The processor sends a conditional value (v) and its (i, j)
index pair to the CO, and the CO then performs a conditional test based on the value and index pair. If the test passes,
the processor dispatches its output write requests to the MC; otherwise no output write requests are generated. The
conditional value, v, depends on the program that is currently being executed. The value may be specified directly in an
instruction in the program, or it may equal the conditional value sent to the processor by the PE. If conditional output
is disabled, the processor behaves as if the conditional output test always passes. Conditional output is enabled by
setting the condition location to the processors with the set_cond_loc command (see page 22).
All processors refer to the same instructions and constants, but may index different input, output, and conditional data.
Thus, if multiple processors are working simultaneously, CTM exports a SIMD programming model. Individual
processors, however, may or may not execute in SIMD lock-step in a particular CTM implementation; the behavior
of individual processors relative to other processors is unspecified.
2.1.2 Processor Execution Unit
The Processor Execution Unit interprets commands from a command buffer, forwarding them to other units in CTM
if necessary. Under normal operation, the PE consumes commands as fast as it can process them or pass them along.
If, however, the PE receives a wait_for_idle command (see page 11), it stops reading commands until all processors
in the processor array are idle. Once the processors are idle, the PE again starts to read commands, beginning with the
one following the wait_for_idle command.
In addition to parsing the command buffer, the PE is responsible for distributing work to the processors in the
processor array. The PE's distribution of work is based on the rectangular domain D \subset Z^2, with D = { (i,j) | i0
<= i <= i1, j0 <= j <= j1 }. The parameters i0, j0, i1, and j1 are specified to the PE through the set_domain command
(see page 10).
When the PE receives a start_program command (see page 11), it begins allocating work for the processors. If
conditional program execution is disabled, the PE schedules the processors to run the current program once for each
index pair (i, j) in the current domain. The specific partitioning of work among the processors and the order in which
the index pairs are scheduled is unspecified. As described in Section 2.1.1, the PE sends a corresponding index pair
and its conditional value to an idle processor in order to execute the program for that index pair. The result of the
entire computation is as if the program were executed in SIMD across all index pairs.

If conditional program execution is enabled, however, program execution on a given pair may be skipped. Prior to
scheduling an index pair to a processor, the PE sends it to the Conditional Operation Unit (CO), along with its
conditional value. The CO performs a conditional test (as described in Section 2.1.3) based on the index pair and
conditional value, and returns its result to the PE. If the test fails, the index pair is skipped; otherwise a processor in
the array is scheduled to run the current program on the pair. The conditional value is set with the set_cond_val
command (see page 10). Conditional program execution is enabled by setting the conditional location to the PE with
the set_cond_loc command (see page 22).
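The scheduling behavior described above can be sketched in a few lines of Python. This is only a behavioral model, not the hardware's algorithm: the function name schedule, the callback signatures, and the row-major visiting order are assumptions of this sketch (the manual leaves partitioning and ordering unspecified), and the CO's write-back of the conditional value on a passing test is not modeled here.

```python
from typing import Callable, Optional, Tuple


def schedule(domain: Tuple[int, int, int, int],
             program: Callable[[int, int, int], None],
             cond_val: int,
             cond_test: Optional[Callable[[int, int, int], bool]] = None) -> int:
    """Run `program` once per index pair of the inclusive rectangle.

    `domain` is (i0, j0, i1, j1), as supplied by set_domain. Passing
    cond_test=None models conditional program execution being disabled.
    Returns the number of index pairs that were actually executed.
    """
    i0, j0, i1, j1 = domain
    executed = 0
    # Row-major order is this sketch's choice; the real order is unspecified.
    for j in range(j0, j1 + 1):
        for i in range(i0, i1 + 1):
            if cond_test is not None and not cond_test(i, j, cond_val):
                continue  # test failed: the index pair is skipped
            program(i, j, cond_val)
            executed += 1
    return executed


# Example: a 3 x 2 domain with conditional execution disabled.
visited = []
count = schedule((0, 0, 2, 1), lambda i, j, v: visited.append((i, j)), cond_val=0)
```

Because both bounds are inclusive, a domain of (0, 0, 2, 1) covers six index pairs, not two.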
The PE maintains counters that may be useful in analyzing CTM's performance. These are total clocks since last reset
and total clocks during which at least one processor was active since last reset. The init_perf_counters command (see
page 12 for performance counter commands) sets up the counters for initial use. The start_perf_counters command resets
the counters and starts them counting; stop_perf_counters stops the counters. The counters may be read into an array in
GPU-addressable system memory using read_perf_counters, which takes one parameter: the GPU address of the first
element of the array. Total clocks is written to the first element; clocks active is written to the second.
2.1.3 Conditional Operation Unit
The Conditional Operation Unit (CO) performs a conditional test for clients within CTM. Clients include the PE (for
conditional program execution) and the DPP (for conditional program output). The CO evaluates a condition based
on an index pair (i, j) and a conditional value v, which are both sent to the CO by a requesting client.
The CO test is one of three possible cases: the test always passes, the test always fails, or the test is a comparison
between the conditional value v from a client to a value b read from a conditional output buffer residing in memory:
result = v op b, where op is one of <, <=, =, >=, or >
The conditional output buffer value b is obtained by the CO issuing a read request to the Memory Controller Unit with
index pair (i, j) and the conditional output buffer identifier (see page 7). The CO test is set with the set_cond_test
command (see page 21).
In addition, if the conditional test passes, the CO will write the client conditional value v to the conditional output
buffer by issuing a write request to the MC with index pair (i, j) and the conditional output identifier.
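As an illustration of the test and write-back just described, the following Python sketch models the conditional output buffer as a dictionary keyed by index pair. The function name co_test and the strings used to select a test ("always", "never", and the five comparison symbols) are hypothetical; the actual encoding is set with set_cond_test and is not reproduced here. The write mask controlled by set_cond_out_mask is also not modeled.

```python
import operator
from typing import Dict, Tuple

# The five comparisons named in the text, keyed by hypothetical labels.
COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def co_test(test: str, v: float, index: Tuple[int, int],
            cond_buffer: Dict[Tuple[int, int], float]) -> bool:
    """Evaluate the CO test for client value v at index pair (i, j)."""
    if test == "always":
        passed = True
    elif test == "never":
        passed = False
    else:
        b = cond_buffer[index]            # value read through the MC
        passed = COMPARISONS[test](v, b)  # result = v op b
    if passed:
        cond_buffer[index] = v            # a passing test writes v back
    return passed
```

With the "<" test this behaves like a depth-style comparison: a smaller v replaces the stored value, and a larger v leaves the buffer unchanged.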
2.1.4 Memory Controller Unit
The Memory Controller Unit (MC) translates addresses and satisfies requests to read and write memory for clients
within CTM. Clients include the PE (for command buffers), the CO (for the conditional buffer), and the DPP (for
program instructions, floating-point, integer, and boolean constants, inputs, and outputs). The MC can read or write
two kinds of memory: private memory that is accessible only by the MC (local memory), and memory that is
accessible both by the MC and a host processor (remote or system memory).
An MC memory address is a 32-bit unsigned integer. The MC distinguishes between local and remote memory by
maintaining distinct address ranges for each, within its 32-bit address range. The address mapping is system-
dependent and is described in the ATI CTM Device Interface.
The MC computes the memory address as a function of an index pair, (x, y), the number of elements in each row of
data (pitch), a base address offset (offset), a tiling format (linear or tiled plus an optional 2x2 superfine tiling on single-
channel input data reads), and the bytes per element (bpe) derived from the data format (format). The amount of data
read from or written to memory at this address is given by the bytes per element.
The address translation for the different data formats is summarized in the following two tables. If the tiling format
of the memory is linear, the address is provided in Table 1. If, on the other hand, the memory is tiled, the address is
given in Table 2.

Table 1: Address Translation for Linear Memory Format
Table 2: Address Translation for Tiled Memory Formats
The 2x2 superfine tiling option augments the linear and tiled format addresses described above. When applied to
single-channel inputs, it operates as if four independent data elements are requested with index pairs given by (x+1,
y), (x, y+1), (x+1, y+1), and (x, y). These four values are packed into the four channels (c0, c1, c2, c3), respectively,
of the register specified by the program making the memory request. Without the 2x2 superfine tiling option, a
program would need to make four independent input memory requests, across four independent instructions, to
achieve the same result. The 2x2 superfine tiling is ignored for all memory clients besides inputs, and its behavior is
undefined if the input memory format has more than one channel.
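The channel packing performed by the 2x2 superfine tiling option can be illustrated with a small Python sketch. The function name fetch_2x2 and the row-major data[y][x] layout are assumptions made for illustration; the sketch does not model address translation or any behavior at the edges of the input.

```python
from typing import Sequence, Tuple


def fetch_2x2(data: Sequence[Sequence[float]], x: int, y: int
              ) -> Tuple[float, float, float, float]:
    """Return (c0, c1, c2, c3) for a single-channel input, read as data[y][x]."""
    return (
        data[y][x + 1],      # c0 <- (x+1, y)
        data[y + 1][x],      # c1 <- (x, y+1)
        data[y + 1][x + 1],  # c2 <- (x+1, y+1)
        data[y][x],          # c3 <- (x, y)
    )
```

Note that the element at (x, y) itself lands in the last channel, c3, rather than the first.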
The index pair (x, y) and a unique identifier are sent to the MC by the client requesting the memory read or write. The
pitch, offset, tiling, and format parameters associated with the client identifier are maintained in the MC (commands
to set these parameters are summarized below), and they are accessed when a client requests a memory transaction.
The parameters passed to the memory control unit are different for each of the clients making a memory request. They
are detailed, for each of the possible identifiers, in the following subsections.
Input Parameters
The MC supports clients (processors in the data parallel processor array) requesting a memory read for up to 16
distinct program inputs. The (x, y) index pair for a given request is specified in an instruction being executed on one
of the processors. The index pair is sent to the MC along with the program input identifier, also specified in the
requesting instruction. The pitch, offset, tiling, and format for each input identifier are shared among all processors
in the processor array. These values are provided to the MC with the set_inp_fmt command (see page 15).
The MC may service any number of requests from the processors during program execution. If the data can be found
in the MC input read cache, then the MC will satisfy the request from the cache. Otherwise, it will pull data into the
cache (either from GPU or system memory, as appropriate), in the process of servicing the request. The input read
cache is shared among all 16 program inputs, and must be invalidated to guarantee correct reading of data that has
changed in memory. The input read cache is invalidated with the inv_inp_cache command (see page 20).
Table 1 (linear memory format):
Bytes  Bits [31:5]                                Bits [4:0]
1      y[11:0]*pitch[13:5]+x[11:5]+offset[31:5]   x[4:0]
2      y[11:0]*pitch[13:4]+x[11:4]+offset[31:5]   x[3:0],0
4      y[11:0]*pitch[13:3]+x[11:3]+offset[31:5]   x[2:0],00
8      y[11:0]*pitch[13:2]+x[11:2]+offset[31:5]   x[1:0],000
16     y[11:0]*pitch[13:1]+x[11:1]+offset[31:5]   x[0],0000
Table 2 (tiled memory formats):
Bytes  Bits [31:11]                                Bits [10:9]          Bits [8:7]           Bits [6:5]  Bits [4:0]
1      y[11:5]*pitch[13:6]+x[11:6]+offset[31:11]   y[4]^x[6],x[5]^y[5]  y[3]^x[5],x[4]^y[4]  y[2],x[3]   y[1:0],x[2:0]
2      y[11:5]*pitch[13:5]+x[11:5]+offset[31:11]   y[4]^x[5],x[4]^y[5]  y[3]^x[4],x[3]^y[4]  y[2],x[2]   y[1:0],x[1:0],0
4      y[11:4]*pitch[13:5]+x[11:5]+offset[31:11]   y[3]^x[5],x[4]^y[4]  y[2]^x[4],x[3]^y[3]  y[1],x[2]   y[0],x[1:0],00
8      y[11:4]*pitch[13:4]+x[11:4]+offset[31:11]   y[3]^x[4],x[3]^y[4]  y[2]^x[3],x[2]^y[3]  y[1],x[1]   y[0],x[0],000
16     y[11:3]*pitch[13:4]+x[11:4]+offset[31:11]   y[2]^x[4],x[3]^y[3]  y[1]^x[3],x[2]^y[2]  y[0],x[1]   x[0],0000
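The linear-format rows of Table 1 follow a single pattern, which the Python sketch below makes explicit. It is one reading of the table rather than a statement of how the hardware computes addresses: the function name linear_address is hypothetical, and the treatment of the low-order pitch and offset bits (simply discarded) is taken directly from the bit ranges shown in the table. The tiled formats of Table 2 are not covered.

```python
def linear_address(x: int, y: int, pitch: int, offset: int, bpe: int) -> int:
    """Compute the 32-bit linear-format address per Table 1.

    bpe is the bytes per element: 1, 2, 4, 8, or 16.
    """
    if bpe not in (1, 2, 4, 8, 16):
        raise ValueError("bpe must be 1, 2, 4, 8, or 16")
    log2_bpe = bpe.bit_length() - 1        # 0..4
    shift = 5 - log2_bpe                   # number of x bits kept in bits [4:0]
    x &= 0xFFF                             # x[11:0]
    y &= 0xFFF                             # y[11:0]
    pitch &= 0x3FFF                        # pitch[13:0]
    offset &= 0xFFFFFFFF                   # 32-bit MC address
    # Bits [31:5]: y * pitch[13:shift] + x[11:shift] + offset[31:5]
    upper = y * (pitch >> shift) + (x >> shift) + (offset >> 5)
    # Bits [4:0]: x[shift-1:0] followed by log2(bpe) zero bits
    lower = (x & ((1 << shift) - 1)) << log2_bpe
    return ((upper << 5) | lower) & 0xFFFFFFFF
```

When the pitch is a multiple of 32/bpe elements and the offset is 32-byte aligned, the result equals the familiar offset + (y * pitch + x) * bpe. For example, linear_address(9, 2, 64, 0, 4) gives (2 * 64 + 9) * 4 = 548.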

Output Parameters
The MC supports clients (processors in the data parallel processor array) requesting memory be written for up to 4
distinct program outputs. The (x, y) pair for a given output memory request is equal to the (i, j) output domain pair
assigned to a processor or conditional output unit by the PE as described in Section 2.1.2. This index is sent to the MC
along with the program output identifier that is specified in the program instruction being executed. The pitch, offset,
tiling, and format for each output identifier are shared among all processors in the processor array and the conditional
output block. These values are provided to the MC with the set_out_fmt command (see page 16).
The MC may service any number of requests from the processors during program execution. If the data can be placed
in the MC output write cache, then the MC will place the incoming data into the cache. Otherwise, it will push out a
portion of the cache (either to GPU or system memory, as appropriate), if necessary, to free space in the cache for the
new request. The output write cache is shared among all 4 program outputs, and must be flushed to guarantee that any
data written by the processors are placed in memory. The output write cache is flushed with the flush_out_cache
command (see page 20).
A write request to the output buffers can be masked with the set_out_mask command (see page 21). This command
specifies, per output channel, whether the output buffer is updated. If the writemask for a given output channel is 0,
then its values are not updated when the MC services a write request.
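The effect of the output writemask can be sketched as follows. The mapping of mask bits to channels (bit k enables channel k) is an assumption of this sketch, since the bit layout of the set_out_mask parameter is not given in this section; the function name masked_write is likewise hypothetical.

```python
from typing import Sequence, Tuple


def masked_write(stored: Sequence[float], incoming: Sequence[float],
                 mask: int) -> Tuple[float, ...]:
    """Merge an incoming output value into the stored one, channel by channel.

    Assumption: bit k of `mask` enables the write to channel k.
    """
    return tuple(
        new if (mask >> k) & 1 else old  # a 0 bit leaves the channel untouched
        for k, (old, new) in enumerate(zip(stored, incoming))
    )
```

Under that assumption, a mask of 0b0101 updates channels 0 and 2 and preserves channels 1 and 3.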
Conditional Operation Parameters
The MC supports clients (the CO) requesting memory be read and written for a single conditional output buffer. The
(x, y) pair is equal to the (i, j) output domain pair assigned to the CO by the PE as described in Section 2.1.2. This
pair is sent to the MC along with the conditional output identifier. The pitch, offset, tiling, and format for the output
are provided to the MC with the set_cond_out_fmt command (see page 18).
The MC may service any number of requests from the CO during program execution. These may be either read
requests or write requests.
If the data for a read request can be found in the MC conditional output cache, then the MC will satisfy the request
from the cache. Otherwise, it will pull data into the cache (either from local or remote memory, as appropriate), in the
process of servicing the request. The conditional output cache must be invalidated to guarantee that data that has
changed in memory is properly read. The cache can be invalidated with the inv_cond_out_cache command (see
page 20).
If the data for a write request can be placed in the MC conditional output cache, then the MC will place the incoming
data into the cache. Otherwise, it will push out a portion of the cache (either to GPU or system memory, as
appropriate), if necessary, to free space in the cache for the new request. The conditional output cache must be flushed
to guarantee that any data written by the conditional output block is placed in memory. The cache can be flushed with
the flush_cond_out_cache command (see page 20).
A write request to the conditional output buffer can be suppressed with the set_cond_out_mask command (see page
21). If the writemask set by this command is 0, then no values are written to the conditional output cache or memory.
Floating Point, Integer, and Boolean Constant Parameters
The Memory Controller Unit supports clients (processors in the data parallel processor array) requesting a memory
read for floating point, integer, and boolean program constants. The (x, y) index pair for a given constant memory
request is obtained from the constant index referenced in the program instruction being executed. The index pair is
sent to the MC along with the constant type identifier, which also is specified in the program instruction. The pitch,
offset, tiling, and format for each type of constant (three types) are shared among all processors in the processor array.
These values are provided to the MC with the set_constf_fmt, set_consti_fmt, and set_constb_fmt commands.
The MC may service any number of requests from the processors during program execution. If the data can be found
in the corresponding MC constant read cache, then the MC will satisfy the request from the cache. Otherwise, it will
pull data into the cache (either from GPU or system memory, as appropriate), in the process of servicing the request.
The constant read caches must be invalidated to guarantee that data that has changed in memory is properly read. The
constant read caches can be invalidated with the inv_constf_cache, inv_consti_cache, and inv_constb_cache
commands.

Instruction Parameters
The Memory Controller Unit supports clients (processors in the data parallel processor array) requesting a memory
read for program instructions. The (x, y) index pair for a given memory request is obtained from an internal program
counter in each processor that is incremented as the program is executed. The index pair is then sent to the MC. The
pitch, offset, tiling, and format for the instructions are shared among all processors in the processor array. These
values are provided to the MC with the set_inst_fmt command.
The MC may service any number of requests from the processors during program execution. If the data can be found
in the corresponding MC instruction read cache, then the MC will satisfy the request from the cache. Otherwise, it
will pull data into the cache (either from GPU or system memory, as appropriate), in the process of servicing the
request. The instruction read cache must be invalidated to guarantee that data that has changed in memory is properly
read. The instruction read cache can be invalidated with the inv_inst_cache command.
2.2 CTM Commands
CTM commands are packed into CTM Command Buffers. A CTM Command Buffer is a tightly packed region in
memory interpreted as a sequence of commands and their parameters, starting at the base address of the command
buffer. Each CTM command is a 32-bit unsigned integer. It is followed immediately in memory by one or more
parameters, each of which is also a 32-bit unsigned integer. No commands take a variable number of parameters and
all commands have at least one parameter.
All CTM Commands can be found in Table 3. The table contains the command name, unique opcode, parameters,
and a brief description. More complete descriptions are given in the following subsections. Some commands will
result in undefined behavior if a program is currently executing on the processor array. Such a command must follow
a wait_for_idle command and precede any subsequent start_program command in the command buffer to be
predictable. Other commands are pipelined within CTM, and can be issued at any time.
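The packing rule above (a 32-bit opcode followed immediately by its 32-bit parameters) can be sketched in Python as shown below. The opcodes are those listed in Table 3; the helper name pack_commands, the little-endian byte order, and the particular command sequence are assumptions made for illustration and are not specified in this manual. Submitting the buffer to the device is outside the scope of this sketch.

```python
import struct
from typing import Iterable, List, Tuple

# Opcodes as listed in Table 3.
SET_DOMAIN = 0xC0030700
START_PROGRAM = 0xC0000800
WAIT_FOR_IDLE = 0xC0000900
FLUSH_OUT_CACHE = 0xC0001700


def pack_commands(commands: Iterable[Tuple[int, List[int]]]) -> bytes:
    """Pack (opcode, parameters) pairs into a tightly packed command buffer."""
    words: List[int] = []
    for opcode, params in commands:
        if not params:
            raise ValueError("every CTM command takes at least one parameter")
        words.append(opcode)
        words.extend(params)
    # "<" (little-endian) is an assumption of this sketch; "I" is a 32-bit
    # unsigned integer and raises struct.error for out-of-range values.
    return struct.pack("<%dI" % len(words), *words)


# Example: run the current program over a 64 x 64 domain, wait for the
# processors to go idle, then flush the output cache.
buf = pack_commands([
    (SET_DOMAIN, [0, 0, 63, 63]),  # i0, j0, i1, j1
    (START_PROGRAM, [0]),          # reserved parameter
    (WAIT_FOR_IDLE, [0]),          # reserved parameter
    (FLUSH_OUT_CACHE, [0]),        # reserved parameter
])
```

The example buffer holds eleven 32-bit words (five for set_domain and two for each of the other commands), or 44 bytes.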
Table 3: CTM Unit Commands
Command               Opcode       Pipelined  Parameters       Description
Processor Execution Unit Commands
init_perf_counters    x'C0010200'  yes        0: flags         Initialize performance counters.
start_perf_counters   x'C0000300'  yes        0: Reserved      Set performance counters to zero and start them counting.
stop_perf_counters    x'C0000400'  yes        0: Reserved      Stop the performance counters counting.
read_perf_counters    x'C0010500'  no         0: GPU address   Write the values of the performance counters, starting at the supplied GPU address.
set_cond_val          x'C0000600'  yes        0: Value         Set the value sent by the PEU to the Conditional Operation Unit when conditional execution is enabled.
set_domain            x'C0030700'  yes        0: i0            Set the domain for program execution to be the rectangle (i0, j0) - (i1, j1) inclusive.
                                              1: j0
                                              2: i1
                                              3: j1
start_program         x'C0000800'  yes        0: Reserved      Instruct the Processor Execution Unit to start the program.

wait_for_idle x'C0000900 'yes 0: Reserved Block reading of the command
buffer until all processing in the
GPU has completed. Immediately
after processing has completed,
continue to consume the command
buffer.
Memory Controller Unit Commands
Command               Opcode       Pipelined  Parameters                                              Description
set_inst_fmt          x'C0010A00'  no         0: Base Address, 1: Format                              Set the base address offset, pitch, tiling, and data format for the program instructions.
set_inp_fmt           x'C0030B00'  no         0: Input Index, 1: Base Address, 2: Format, 3: Height   Set the base address offset, pitch, tiling, and data format for the given input.
set_out_fmt           x'C0030C00'  no         0: Output Index, 1: Base Address, 2: Format, 3: Height  Set the base address offset, pitch, tiling, and data format for the given output.
set_cond_out_fmt      x'C0020D00'  no         0: Base Address, 1: Format, 2: Height                   Set the base address offset, pitch, tiling, and data format for the conditional output.
set_constf_fmt        x'C0010E00'  no         0: Base Address, 1: Format                              Set the base address offset, pitch, tiling, and data format for the floating point constants.
set_consti_fmt        x'C0010F00'  no         0: Base Address, 1: Format                              Set the base address offset, pitch, tiling, and data format for the integer constants.
set_constb_fmt        x'C0011000'  no         0: Base Address, 1: Format                              Set the base address offset, pitch, tiling, and data format for the boolean constants.
inv_inst_cache        x'C0001100'  yes        0: Reserved                                             Invalidate the instruction cache.
inv_constf_cache      x'C0001200'  yes        0: Reserved                                             Invalidate the floating point constant cache.
inv_consti_cache      x'C0001300'  yes        0: Reserved                                             Invalidate the integer constant cache.
inv_constb_cache      x'C0001400'  yes        0: Reserved                                             Invalidate the boolean constant cache.
inv_cond_out_cache    x'C0001500'  yes        0: Reserved                                             Invalidate the conditional output cache.
inv_inp_cache         x'C0001600'  yes        0: Reserved                                             Invalidate the input cache.
flush_out_cache       x'C0001700'  yes        0: Reserved                                             Flush the output cache.
flush_cond_out_cache  x'C0001800'  yes        0: Reserved                                             Flush the conditional output cache.
set_out_mask          x'C0001900'  yes        0: Mask                                                 Set the write mask for the output channels.
set_cond_out_mask     x'C0001A00'  yes        0: Mask                                                 Set the write mask for the conditional output buffer.
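The opcode values listed for the Memory Controller Unit commands appear to follow a regular pattern, although this guide does not spell out an opcode word layout: the top byte is always x'C0', the next byte equals the number of parameters minus one, the third byte differs per command, and the low byte is zero. The Python sketch below checks that reading against a few rows copied from the table. The names split_opcode and matches_pattern, and the field names they use, are illustrative assumptions and not part of CTM.

```python
# Rows copied from the Memory Controller Unit table: (command, opcode, parameter count).
COMMANDS = [
    ("set_inst_fmt", 0xC0010A00, 2),
    ("set_inp_fmt", 0xC0030B00, 4),
    ("set_out_fmt", 0xC0030C00, 4),
    ("set_cond_out_fmt", 0xC0020D00, 3),
    ("set_constb_fmt", 0xC0011000, 2),
    ("inv_inst_cache", 0xC0001100, 1),
    ("set_cond_out_mask", 0xC0001A00, 1),
]


def split_opcode(opcode):
    """Split a 32-bit opcode into four bytes (assumed layout, inferred from the table)."""
    return {
        "prefix": (opcode >> 24) & 0xFF,
        "param_count_minus_1": (opcode >> 16) & 0xFF,
        "command_id": (opcode >> 8) & 0xFF,
        "low_byte": opcode & 0xFF,
    }


def matches_pattern(opcode, num_params):
    """True if the opcode fits the inferred pattern for a command with num_params parameters."""
    fields = split_opcode(opcode)
    return (
        fields["prefix"] == 0xC0
        and fields["param_count_minus_1"] == num_params - 1
        and fields["low_byte"] == 0
    )


for name, opcode, num_params in COMMANDS:
    command_id = split_opcode(opcode)["command_id"]
    print(f"{name:18s} {opcode:#010x} id={command_id:#04x} fits={matches_pattern(opcode, num_params)}")
```

If the pattern holds in general, a command buffer parser could read the parameter count from the opcode itself, but that is an inference from the listed values and should be confirmed against the hardware documentation.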

Conditional Output Unit Commands
Command        Opcode       Pipelined  Parameters    Description
set_cond_test  x'C0001B00'  yes        0: Condition  Set the test the conditional output unit will evaluate.
set_cond_loc   x'C0001C00'  yes        0: Location   Set the location for the conditional test (PE or DPP).

2.2.1 Processor Execution Unit Commands
This section contains all CTM commands for the Processor Execution Unit.
Set Conditional Execution Value Command
The set_cond_val command takes a single parameter, a 32-bit value to send to the Conditional Operation Unit when
the conditional execution flag is set.
Parameter 0: value
Bits   Field Name  Description
31:0   val         The conditional execution value for the PEU to pass to the Conditional Operation Unit.
Set Domain Command
The set_domain command takes four parameters, which specify the i0, j0, i1, and j1 values for the current domain, as
described in Section 3.2. Each value is an unsigned integer whose range is set by the width of its field.
Parameter 0: i0
Bits   Field Name  Description
11:0   i0          The i0 domain parameter.
31:12  Reserved    Reserved
Parameter 1: j0
Bits   Field Name  Description
11:0   j0          The j0 domain parameter.
31:12  Reserved    Reserved
Parameter 2: i1
Bits   Field Name  Description
11:0   i1          The i1 domain parameter.
31:12  Reserved    Reserved
Parameter 3: j1
Bits   Field Name  Description
11:0   j1          The j1 domain parameter.
31:12  Reserved    Reserved
Start Program Command
The start_program command takes one reserved parameter. Upon receiving this command, the PE distributes work
to the processors as described in Section 3.2.
Parameter 0:
Bits   Field Name  Description
31:0   Reserved    Reserved
Wait For Idle Command
The wait_for_idle command takes one reserved parameter. After receiving this command, the PE blocks all further
command processing until all processors in the DPP are idle. Once all processors are idle, processing of subsequent
commands resumes.
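As a rough illustration of how the domain, start, and wait commands fit together, the Python sketch below builds the four set_domain parameter words and lists the three commands in submission order. It rests on several assumptions: commands are represented by name rather than by opcode, the reserved parameters are written as zero (this guide does not state a required value), and the helper names domain_word and run_domain are hypothetical.

```python
DOMAIN_FIELD_BITS = 12  # i0, j0, i1 and j1 each occupy bits 11:0 of their parameter


def domain_word(value):
    """Return a 32-bit set_domain parameter word; bits 31:12 stay zero (reserved)."""
    if not 0 <= value < (1 << DOMAIN_FIELD_BITS):
        raise ValueError(f"domain value {value} does not fit in {DOMAIN_FIELD_BITS} bits")
    return value


def run_domain(i0, j0, i1, j1):
    """List the commands, in order, as (name, [parameter words]) pairs."""
    return [
        ("set_domain", [domain_word(v) for v in (i0, j0, i1, j1)]),
        ("start_program", [0]),  # one reserved parameter, written as 0 (assumption)
        ("wait_for_idle", [0]),  # later commands wait until all processors are idle
    ]


commands = run_domain(0, 0, 511, 511)
for name, words in commands:
    print(name, [f"{w:#010x}" for w in words])
```

Placing wait_for_idle directly after start_program is only one possible ordering; it is used here because it makes the blocking behavior described above easy to see.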
Parameter 0:
Bits   Field Name  Description
31:0   Reserved    Reserved
Initialize Performance Counters Command
The init_perf_counters command takes one parameter. This command executes any setup required for performance
counter use. If bit 0 of the parameter is 1, performance counters are enabled. If it is 0, performance counters are
disabled, and the other performance counter commands have no effect.
Parameter 0: enable
Bits   Field Name  Description
31:1   Reserved    Reserved
0      enable      Turn performance counters on (1) or off (0).
Start Performance Counters Command
The start_perf_counters command takes one reserved parameter. This command resets the enabled performance
counters to zero and then starts them counting.
Parameter 0:
Bits   Field Name  Description
31:0   Reserved    Reserved
Stop Performance Counters Command
The stop_perf_counters command takes one reserved parameter. This command stops all enabled counters.
Parameter 0:
Bits   Field Name  Description
31:0   Reserved    Reserved
Read Performance Counters Command
The read_perf_counters command takes one parameter. This command transfers the performance counters to an
area in memory beginning at the GPU address supplied in the parameter. The counters are written as 32-bit unsigned
integers in the order given in Section 2.1.2. If performance counters are disabled, this command has no effect.
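The four performance counter commands are naturally used as a bracket around the work being measured. The Python sketch below lists them in that order and decodes a result buffer into 32-bit unsigned integers. It is a sketch under assumptions: commands are represented by name rather than opcode, reserved parameters are written as zero, the byte order of the results is taken to be little-endian (this guide does not state it), the number of counters is passed in by the caller, and the helper names are hypothetical.

```python
import struct


def perf_counter_commands(results_base):
    """Commands, in order, that bracket a measured region (names only, no opcodes)."""
    if results_base & 0x7FF:
        raise ValueError("results base address must be 2K-aligned (bits 10:0 are reserved)")
    return [
        ("init_perf_counters", [1]),   # bit 0 = 1 enables the counters
        ("start_perf_counters", [0]),  # reserved parameter, written as 0 (assumption)
        # ... the commands being measured would be placed here ...
        ("stop_perf_counters", [0]),   # reserved parameter, written as 0 (assumption)
        ("read_perf_counters", [results_base]),
    ]


def decode_counters(raw, count):
    """Interpret raw bytes as `count` 32-bit unsigned integers (little-endian assumed)."""
    return list(struct.unpack_from(f"<{count}I", raw))


cmds = perf_counter_commands(0x10000)
sample = struct.pack("<3I", 7, 0, 4294967295)  # stand-in for memory read back from the GPU
values = decode_counters(sample, 3)
print(cmds[-1], values)
```

The meaning of each decoded value depends on the counter order given in Section 2.1.2, which this sketch does not attempt to reproduce.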
Parameter 0: base address
Bits   Field Name    Description
10:0   Reserved      Reserved
31:11  base address  The 2K-aligned address where the performance counter results will be placed.

2.2.2 Memory Controller Unit Commands
This section contains all of the commands for the Memory Controller Unit.
All address values are provided in the same format: a 2K-aligned base address. CTM ignores the lower eleven bits
and treats the address as if those bits were zero. These base addresses are used in the address calculation
of the MC, as described in Section 2.1.4. The base address parameter is:
Bits   Field Name    Description
10:0   Reserved      Reserved
31:11  base address  The 2K-aligned address.
Similarly, all format parameters are expressed the same way. The format contains the pitch, tiling format, and data
format of the information in memory to which it refers. The pitch is given in multiples of 4 (the lowest two bits of the
pitch must be zero). The pitch, tiling, and data formats are used by the MC to calculate an address offset from an
(x, y) index pair, as described in Section 2.1.4. The format parameter is:
Bits   Field Name  Description
1:0    Reserved    Reserved
12:2   pitch       The pitch in multiples of 4
15:13  Reserved    Reserved
17:16  tiling      Tiling format (possible values):
                   • 0 - LINEAR
                   • 1 - TILED
                   • 2 - LINEAR_INP_2X2
                   • 3 - TILED_INP_2X2
23:18  Reserved    Reserved
26:24  format      Data format (possible values):
                   • 0 - UINT16_1
                   • 1 - UINT8_4
                   • 2 - FLOAT32_1
                   • 3 - FLOAT32_2
                   • 4 - FLOAT32_4
                   • > 5 - Reserved
31:27  Reserved    Reserved
Set Instruction Format Command
The set_inst_fmt command takes two parameters. The first parameter is the base address of the first program
instruction in memory. The second contains pitch, tiling, and format information for the program data.
Parameter 0: base address
Bits   Field Name    Description
10:0   Reserved      Reserved
31:11  base address  The 2K-aligned address at which the first program instruction is located.
Parameter 1: format
Bits   Field Name  Description
1:0    Reserved    Reserved
12:2   pitch       The pitch in multiples of 4
15:13  Reserved    Reserved
17:16  tiling      Tiling format (possible values):
                   • 0 - LINEAR
                   • 1 - TILED
                   • 2 - LINEAR_INP_2X2
                   • 3 - TILED_INP_2X2
23:18  Reserved    Reserved
26:24  format      Data format (possible values):
                   • 0 - UINT16_1
                   • 1 - UINT8_4
                   • 2 - FLOAT32_1
                   • 3 - FLOAT32_2
                   • 4 - FLOAT32_4
                   • > 5 - Reserved
31:27  Reserved    Reserved
Set Input Format Command
The set_inp_fmt command takes four parameters. The first parameter is the index of the input to which the following
parameters apply; the second parameter is the base address of the given input in memory; the third contains its pitch,
tiling, and format information; and the fourth is the height of the input.
Parameter 0: input
Bits   Field Name  Description
3:0    input       The input to which this command applies.
31:4   Reserved    Reserved
Parameter 1: base address
Bits   Field Name    Description
10:0   Reserved      Reserved
31:11  base address  The 2K-aligned address at which the given input is located.
Parameter 2: format
Bits   Field Name  Description
1:0    Reserved    Reserved
12:2   pitch       The pitch in multiples of 4
15:13  Reserved    Reserved
17:16  tiling      Tiling format (possible values):
                   • 0 - LINEAR
                   • 1 - TILED
                   • 2 - LINEAR_INP_2X2
                   • 3 - TILED_INP_2X2
23:18  Reserved    Reserved
26:24  format      Data format (possible values):
                   • 0 - UINT16_1
                   • 1 - UINT8_4
                   • 2 - FLOAT32_1
                   • 3 - FLOAT32_2
                   • 4 - FLOAT32_4
                   • > 5 - Reserved
31:27  Reserved    Reserved
Parameter 3: height
Bits   Field Name  Description
12:0   height      The height of the input.
31:13  Reserved    Reserved
Set Output Format Command
The set_out_fmt command takes four parameters. The first parameter is the index of the output to which the
following parameters apply; the second parameter is the base address of the given output in memory; the third
contains its pitch, tiling, and format information; and the fourth is the height of the given output.
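The format commands in this section build their parameters from the same base address and format layouts. As a minimal sketch, using set_inp_fmt as the example, the Python code below packs the four parameter words from the bit fields listed for that command. The helper names are hypothetical, and the pitch handling reflects one reading of the layout: because the pitch is a multiple of 4 and bits 1:0 are reserved, the pitch value is placed directly in bits 12:0 of the format word. The set_out_fmt parameter layouts are not covered here.

```python
TILING = {"LINEAR": 0, "TILED": 1, "LINEAR_INP_2X2": 2, "TILED_INP_2X2": 3}
DATA_FORMAT = {"UINT16_1": 0, "UINT8_4": 1, "FLOAT32_1": 2, "FLOAT32_2": 3, "FLOAT32_4": 4}


def base_address_word(address):
    """Bits 31:11 carry the address; bits 10:0 are reserved, so require 2K alignment."""
    if not 0 <= address < (1 << 32) or address & 0x7FF:
        raise ValueError("base address must be a 2K-aligned 32-bit value")
    return address


def format_word(pitch, tiling, data_format):
    """Pack pitch (bits 12:2), tiling (bits 17:16) and data format (bits 26:24)."""
    if not 0 <= pitch < (1 << 13) or pitch % 4:
        raise ValueError("pitch must be a multiple of 4 below 8192")
    # Low two bits of the pitch are zero, so the value lands in bits 12:2 unchanged.
    return pitch | (TILING[tiling] << 16) | (DATA_FORMAT[data_format] << 24)


def set_inp_fmt_params(index, address, pitch, tiling, data_format, height):
    """Return the four set_inp_fmt parameter words in order."""
    if not 0 <= index < (1 << 4):
        raise ValueError("input index must fit in bits 3:0")
    if not 0 <= height < (1 << 13):
        raise ValueError("height must fit in bits 12:0")
    return [index, base_address_word(address), format_word(pitch, tiling, data_format), height]


params = set_inp_fmt_params(1, 0x00200000, 1024, "TILED", "FLOAT32_4", 512)
print([f"{word:#010x}" for word in params])
```

Rejecting misaligned addresses is a design choice of this sketch; the hardware is described as ignoring the lower eleven bits, so masking them off would be an equally valid approach.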