This chapter describes the LeoFloat ASIC. For information on the microcode that controls LeoFloat, see the LeoFloat Instruction Set Manual.
The LeoFloat ASIC (U0501, U0601, U0701, and U0801 on the lower board) is a special floating-point chip designed specifically for 3D graphics. LeoFloat converts model space triangles, vectors, and dots into screen space fixed-point rendering parameters. LeoFloat supports specialized floating point instructions required by vector and triangle transform, lighting, and set-up algorithms.
The primary features of the LeoFloat ASIC are:
Figure 6-1 shows an example of a primitive object, a triangle, rendered through the LeoFloat pipeline. Since the actual process is under microprogram control, and the microprogram is subject to change, this is merely a possible example of the transformation process performed within LeoFloat. The transformation process for a triangle is as follows:
Figure 6-1 LeoFloat Graphics Pipeline
The single-plane clipping routine clips the triangle if one vertex is outside of the clipping plane. This creates two triangles. The second triangle is placed in the vertex list and the clip_pending bit is set in the State bits register so that the setup routine can process the second triangle.
The complex clipping routine clips all the vertices of the triangle that are out to the closest clipping plane. The triangles usually have more than one vertex out that are in different clipping planes. This causes more than one triangle to be generated when clipped. These triangles are put into the vertex list and the clip_pending bit is set. The setup routine traverses the vertex list until all the triangles are processed.
If the clip_pending bit is set in the State bits register, the microprogram branches to a routine that traverses the clip list to get the three vertices for the next triangle. After that, the routine branches back to the triangle code that called the setup routine and performs the respective calculations and conversion to screen space coordinates. This loop continues until all the triangles have been processed from the vertex list.
Figure 6-2 shows the LeoFloat external interface and typical SRAM connection. The internal detail is only functional.
Figure 6-2 LeoFloat ASIC Simplified Block Diagram
LeoFloat has four major interfaces: CF Bus (input), CD Bus (output), SRAM, and external.
The CF Bus interface contains those signals that interface between the LeoCommand and LeoFloat ASICs. Table 6-1 summarizes the CF Bus interface signals. See Chapter 3, "CF Bus," for more information on this interface.
Table 6-1 LeoFloat CF Bus Interface Signals
------------------------------------------------------------------------------------------
Signal Name       No. Pins  I/O  Type      Description
------------------------------------------------------------------------------------------
CF_DAT<15:0>      16        I    Bi-state  LeoCommand to LeoFloat data bus.
CF_CTL<2:0>       3         I    Bi-state  LeoCommand to LeoFloat write control.
CF_BUF_AVL<3:0>   1         O    Bi-state  Buffer available for LeoCommand in LeoFloat.
CF_LOAD<3:0>      1         I    Bi-state  LeoCommand to LeoFloat write enable.
------------------------------------------------------------------------------------------
The CF_DAT<15:0> lines are a 16-bit data input bus. There are two types of data on the bus:
The encoded CF_CTL<2:0> bits describe the word format of the data on the bus. The end of the packet is also encoded on these bits. The code for the last word also indicates whether the packet should be read from or written to SRAM. One of the codes covers the start of execution and interruption of the LeoFloat microprocessor. Table 6-2 lists the encodings.
Table 6-2 CF_CTL<2:0> Encoding
---------------------------------------------------------------------------------------------
CF_CTL<2:0>  Format         LeoFloat Operation
---------------------------------------------------------------------------------------------
000          Packed         Load 16-bit fraction, convert to floating point.
001          Packed         Load 16-bit fraction, convert to floating point. End of packet.
010          Unpacked       Load 16-bit upper half of 32-bit word.
011          Unpacked       Load 16-bit upper half of 32-bit word. End of packet.
100          Unpacked       Load 16-bit lower half of 32-bit word, zero extend.
101          Unpacked       Load 16-bit upper half of 32-bit word. SRAM read. End of packet.
110          CF_DAT<0> = 1  Reset all LeoFloats (CF_LOAD = 0).
             CF_DAT<0> = 0  Interrupt/run (push PC onto the stack, jump to 0; run if halted).
111          Unpacked       Load 16-bit upper half of 32-bit word. SRAM write. End of packet.
---------------------------------------------------------------------------------------------
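The encodings in Table 6-2 can be sketched as a small decoder. This is only an illustration of the table, not LeoFloat logic: the names `CfCtl` and `decode_cf_ctl` are hypothetical, and treating code 110 as "not end of packet" is an assumption based on its low bit being zero.

```python
from typing import NamedTuple, Optional


class CfCtl(NamedTuple):
    fmt: str                 # "packed", "unpacked", or "control"
    operation: str           # short description taken from Table 6-2
    end_of_packet: bool
    sram: Optional[str]      # "read", "write", or None


_UPPER = "load 16-bit upper half of 32-bit word"
_TABLE = {
    0b000: CfCtl("packed", "load 16-bit fraction, convert to floating point", False, None),
    0b001: CfCtl("packed", "load 16-bit fraction, convert to floating point", True, None),
    0b010: CfCtl("unpacked", _UPPER, False, None),
    0b011: CfCtl("unpacked", _UPPER, True, None),
    0b100: CfCtl("unpacked", "load 16-bit lower half of 32-bit word, zero extend", False, None),
    0b101: CfCtl("unpacked", _UPPER, True, "read"),
    0b111: CfCtl("unpacked", _UPPER, True, "write"),
}


def decode_cf_ctl(ctl: int, dat0: int = 0) -> CfCtl:
    """Decode a 3-bit CF_CTL value; dat0 is CF_DAT<0>, used only for code 110."""
    if not 0 <= ctl <= 7:
        raise ValueError("CF_CTL is a 3-bit field")
    if ctl == 0b110:
        # Code 110 is a control word: CF_DAT<0> selects reset or interrupt/run.
        op = "reset all LeoFloats" if dat0 & 1 else "interrupt/run"
        return CfCtl("control", op, False, None)  # assumption: not an end-of-packet code
    return _TABLE[ctl]
```

For example, `decode_cf_ctl(0b101)` reports an unpacked upper-half load that ends the packet and requests an SRAM read.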
Each LeoFloat sends a "buffer available" signal (CF_BUF_AVL) to LeoCommand. This signal tells LeoCommand that the input buffer can accept data in three phases (half clock cycles), which is the latency from the "buffer available" signal going active to data appearing on the CF_DAT lines. To hide this latency, LeoFloat should try to predict in advance when its input buffer will become available.
Each LeoFloat receives a "load" signal (CF_LOAD) from LeoCommand. This signal indicates to LeoFloat that it should receive the data on the CF_DAT<15:0> lines according to the encoding on the control bits.
The CD Bus interface contains those signals that interface between LeoFloat, LeoCommand, and LeoDraw. Table 6-3 summarizes the output interface signals. See Chapter 4, "CD Bus," for more information on this interface.
Table 6-3 LeoFloat CD Bus Interface Signals
---------------------------------------------------------------------------------
Signal Name     No. Pins  I/O  Type       Description
---------------------------------------------------------------------------------
CD_DAT<15:0>    16        I/O  Tri-state  LeoFloat data out bus.
FLTn_ST<1:0>    2         O    Bi-state   LeoFloat output section status.
FLT_ENn         1         I    Bi-state   Okay to output control.
---------------------------------------------------------------------------------
CD_DAT<15:0> is the LeoFloat data out bus, a 16-bit output data bus. This bus outputs a 32-bit value over two clock cycles.
FLTn_ST<1:0> is a two-bit status output for each LeoFloat, where n identifies the LeoFloat ASIC: 0, 1, 2, or 3. It is encoded as follows:
----------------------------------------------
FLT_ST<1:0>  Meaning
----------------------------------------------
00           Idle
01           No data to output
10           Request to output to LeoDraw
11           Request to output to LeoCommand
----------------------------------------------
The FLT_ENn signal tells the specific LeoFloat to go ahead and output its data, where n identifies the LeoFloat ASIC: 0, 1, 2, or 3.
The SRAM interface contains those signals that interface between LeoFloat and the SRAM. Table 6-4 summarizes the SRAM interface signals.
Table 6-4 LeoFloat SRAM Interface Signals
---------------------------------------------------------------------------
Signal Name         No. Pins  I/O  Type       Description
---------------------------------------------------------------------------
LFn_SR_DAT<31:0>    32        I/O  Tri-state  SRAM data bus
LFn_SR_ADR<16:0>    17        O    Bi-state   SRAM address bus
LFn_SR_WE_L         1         O    Bi-state   SRAM read/write enable
LFn_SR_OE_L         1         O    Bi-state   SRAM output enable
---------------------------------------------------------------------------
LFn_SR_DAT<31:0> is a 32-bit bidirectional SRAM data bus that carries data to or from SRAM, where n identifies the LeoFloat ASIC: 0, 1, 2, or 3.
LFn_SR_ADR<16:0> is a 17-bit output bus that carries the SRAM address, where n identifies the LeoFloat ASIC: 0, 1, 2, or 3.
LFn_SR_WE_L is the SRAM write-enable signal, where n identifies the LeoFloat ASIC: 0, 1, 2, or 3. When it is low, the SRAM is performing a write cycle. When it is high, the SRAM is performing a read cycle.
LFn_SR_OE_L is the SRAM output enable control signal, where n identifies the LeoFloat ASIC: 0, 1, 2, or 3.
The external interface contains the signals required to keep the chips synchronized and to interface with JTAG for test and diagnostics. Table 6-5 summarizes the external interface signals.
Table 6-5 LeoFloat External Interface
-------------------------------------------------------------------------------------------------
Signal Name    No. Pins  I/O  Type      Description
-------------------------------------------------------------------------------------------------
CLK_25M_LFn    1         I    Bi-state  System clock.
LEO_RST_L      1         I    Bi-state  Reset.
TCK            1         I    Bi-state  JTAG test clock. Tied to the system clock.
SCAN_TMS       1         I    Bi-state  JTAG test mode select.
TDI            1         I    Bi-state  JTAG test data in.
FLOATn_TDO     1         O    Bi-state  JTAG test data out.
TEST_OE        1         I    Bi-state  Global test pin to tri-state all output pins. Normally pulled up.
HRD_INT_L      1         I    Bi-state  Hard interrupt. Normally pulled up.
BURN_IN_L      1         I    Bi-state  Burn in.
-------------------------------------------------------------------------------------------------
The CLK_25M_LFn input signal drives the on-chip clock generator, where n identifies the LeoFloat ASIC: 0, 1, 2, or 3. It is a symmetrical clock with a nominal 40 nanosecond period. It is used to derive the chip internal clocks, PHASE_A and PHASE_B.
Figure 6-3 shows the system clock (CLK_25_L) and the two chip internal clocks. The PHASE_A and PHASE_B clocks are non-overlapping, with time Td1 or Td2 between the falling edge of one and the rising edge of the other. LeoFloat is designed with transparent latches that are open during the phase and closed at the falling edge of the phase.
Figure 6-3 PHASE_A and PHASE_B Clocks
LEO_RST_L is a reset input used to initialize some of the chip internal nodes.
TCK is a free-running test clock signal imposed by JTAG. It is tied to the system clock, CLK_25M_LFn.
SCAN_TMS is the JTAG test mode select signal.
TEST_OE is a global test pin driven by the board tester to tri-state all output pins. It is normally pulled up.
TDI is the JTAG test data input signal. For a description of the scan test in and out signals, see "The JTAG Scan In and Out Signals" on page 2-19.
FLOATn_TDO is the JTAG test data out signal.
HRD_INT_L is the hard interrupt signal. It is normally pulled up.
BURN_IN_L is the burn in signal.
Table 6-6 summarizes the power and ground pins.
Table 6-6 LeoFloat Power and Ground Signals
-------------------------------------------
Signal Name       No. Pins  Description
-------------------------------------------
IF_ORING_VSS      19        Output ring Vss.
IF_ORING_VDD      19        Output ring Vdd.
IF_CORE_VSS       6         Core Vss.
IF_CORE_VDD       6         Core Vdd.
IF_CLK_25_L_VSS   1         Clock Vss.
IF_CLK_25_L_VDD   1         Clock Vdd.
IF_IRING_VSS      2         Input ring Vss.
IF_IRING_VDD      2         Input ring Vdd.
IF_TCK_VSS        1         JTAG test Vss.
IF_TCK_VDD        1         JTAG test Vdd.
-------------------------------------------
The LeoFloat ASIC is basically divided into the following ten functional blocks:
Figure 6-4 shows the LeoFloat ASIC block diagram. The functional blocks in the figure are described in more detail following the block diagram.
Figure 6-4 LeoFloat ASIC Functional Block Diagram
The primary group of microprogram instructions is arithmetic. These instructions share the same source and destination form: three sources (A Bus, B Bus, and C Bus) and one destination (D Bus).
The A Bus serves as one input leg to the floating point multiplier, as one input leg to the integer ALU, and as one of two possible inputs to one leg of the floating point ALU.
The B Bus serves as the other input leg to the floating point multiplier, the sole input to the floating point reciprocal unit, and as the other input leg to the integer ALU.
The C Bus serves solely as the other input leg to the floating point ALU.
The D Bus is the output. The D Bus also has a bypass path, which bypasses the register files for direct output to the other buses.
The floating point multiplier module multiplies two floating-point operands together. The result is normalized and rounded.
The floating point multiplier operates on data that conforms to the IEEE single precision floating point format. The multiply operations take two cycles, but because the module is pipelined, a new multiply can be initiated every cycle.
The floating point multiplier module consists of a multiplier block, adders, and normalization and rounding circuits.
The mantissa multiplier block is a 26 by 26 two's-complement multiplier array. The multiplier array outputs (sums, carries, and least-significant bits) are latched before the final accumulate. A circuit in parallel with the multiplier array adds the exponents.
If the most-significant bit of the mantissa accumulate is one, the result is normalized by shifting it right one bit and incrementing the exponent. If the most-significant bit is not one, the result is passed through unshifted. The result of the mantissa accumulate is also rounded.
LeoFloat uses unbiased round-to-nearest mode. If the result is exactly halfway between two representable values, it is rounded to the one with an even fraction. This mode of rounding depends on the sticky bit, the guard bit, and the least-significant bit of the significand before rounding.
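The rounding rule above can be illustrated with a short sketch. It assumes the usual definitions, which the text does not spell out: the guard bit is the first discarded bit and the sticky bit is the OR of all remaining discarded bits. The function names are hypothetical, and a carry out of the significand (which would require renormalization) is not handled here.

```python
from typing import Tuple


def split_for_rounding(value: int, drop: int) -> Tuple[int, int, int]:
    """Split an unsigned value into (kept bits, guard bit, sticky bit).

    `drop` is the number of low-order bits to discard and must be >= 1.
    """
    if drop < 1:
        raise ValueError("drop must be at least 1")
    kept = value >> drop
    guard = (value >> (drop - 1)) & 1                    # first discarded bit
    sticky = int(value & ((1 << (drop - 1)) - 1) != 0)   # OR of the remaining bits
    return kept, guard, sticky


def round_nearest_even(significand: int, guard: int, sticky: int) -> int:
    """Round to nearest; on an exact tie, round to the even significand."""
    lsb = significand & 1
    # Above one half (guard and sticky) always rounds up.
    # Exactly one half (guard only) rounds up only if the kept value is odd.
    round_up = bool(guard) and (bool(sticky) or bool(lsb))
    return significand + 1 if round_up else significand
```

With these definitions, a tie such as kept bits `1010` with guard 1 and sticky 0 stays at `1010`, while `1011` with the same guard and sticky rounds up to `1100`.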
Floating point data elements are represented according to the ANSI/IEEE 754-1985 standard. The IEEE standard 32-bit single-precision floating point word, shown below, contains three fields: a sign bit (s), an eight-bit exponent, and a 23-bit fraction field.
------------------------------------------------------------------------------
Bits     Name      Content
------------------------------------------------------------------------------
31       Sign      1 if number is negative
23 - 30  Exponent  Eight-bit exponent, biased by 127. Values of all zeros,
                   and all ones, reserved.
0 - 22   Fraction  23-bit fraction component of normalized significand.
                   The "one" bit is "hidden."
------------------------------------------------------------------------------
The floating point number is represented by the form:
$$(-1)^{\mathrm{Sign}} \times 2^{(\mathrm{exponent} - \mathrm{bias})} \times 1.f$$
where
1.f is the significand
f is the bits in the significand fraction
bias is 127
Note that the significand contains a hidden bit, a binary point, and a 23-bit fraction. The hidden bit, inserted during arithmetic processing, has a value of one for all normalized numbers and zero for a value of zero or infinity. The fraction is taken from the 23-bit fraction field for normalized numbers. For values of zero and infinity, the fraction is zero. Thus, the value of an IEEE floating point number is determined by the following:
-----------------------------------------------------------------------------
Exponent  Fraction  Value                               Description
-----------------------------------------------------------------------------
1 to 254  Any       (-1)^Sign × 2^(e - 127) × (1.f)     Normalized number
0         Any       (-1)^Sign × 0.0                     Zero
255       Any       (-1)^Sign × 2^(e - 127) × (0.0)     Plus or minus infinity
-----------------------------------------------------------------------------
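The value table above can be expressed as a short decoder. This is a sketch of the interpretation described in the table (any exponent of 0 reads as zero, any exponent of 255 reads as infinity), not a model of the hardware data path; `leofloat_value` is a hypothetical name.

```python
def leofloat_value(word: int) -> float:
    """Interpret a 32-bit word according to the value table above."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if exponent == 0:
        # Zero and denormalized encodings both read as zero.
        return sign * 0.0
    if exponent == 255:
        # Infinity and NaN encodings both read as infinity.
        return sign * float("inf")
    # Normalized number: hidden one bit plus the 23-bit fraction, bias 127.
    return sign * 2.0 ** (exponent - 127) * (1.0 + fraction / 2 ** 23)
```

For instance, `leofloat_value(0x3FC00000)` gives 1.5, while the denormalized encoding `0x00000001` gives 0.0.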
Table 6-7 summarizes how LeoFloat floating point multiplication results differ from the IEEE standard. In general, LeoFloat treats NaN (Not a Number) the same as Infinity, and Denorm the same as Zero.
Table 6-7 Floating Point Multiplication Results
-------------------------------------------------
                       Right Operand
-------------------------------------------------
Left Operand  NaN   Inf   Norm  Denorm  Zero
-------------------------------------------------
NaN           NaN   NaN   NaN   NaN     NaN
Inf           NaN   NaN   NaN   NaN     NaN
Norm          NaN   NaN   Norm  Zero    Zero
Denorm        NaN   NaN   Zero  Zero    Zero
Zero          NaN   NaN   Zero  Zero    Zero
-------------------------------------------------
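Table 6-7 can be read as a simple lookup from operand classes to result class. The sketch below only restates the table; the names `CLASSES` and `mul_result_class` are hypothetical, and operand classes are passed as strings rather than derived from bit patterns.

```python
CLASSES = ("NaN", "Inf", "Norm", "Denorm", "Zero")

# One row per left operand class; columns follow the order in CLASSES.
_MUL_ROWS = {
    "NaN":    ("NaN", "NaN", "NaN", "NaN", "NaN"),
    "Inf":    ("NaN", "NaN", "NaN", "NaN", "NaN"),
    "Norm":   ("NaN", "NaN", "Norm", "Zero", "Zero"),
    "Denorm": ("NaN", "NaN", "Zero", "Zero", "Zero"),
    "Zero":   ("NaN", "NaN", "Zero", "Zero", "Zero"),
}


def mul_result_class(left: str, right: str) -> str:
    """Return the result class of left * right as listed in Table 6-7."""
    return _MUL_ROWS[left][CLASSES.index(right)]
```

As the table shows, only a Norm times Norm product yields a Norm result; any Denorm or Zero operand forces Zero unless the other operand is NaN or Inf.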
The floating point add/subtract module performs sign-magnitude addition and subtraction. It does this with two's-complement arithmetic, and the result is then converted back to sign-magnitude representation.
This module consists of front end circuitry to detect exceptions such as zero for the hidden bit. The front end circuit consists of a shifter to denormalize the smaller of the two operands, an adder, a leading zero detector (priority encoder), a shifter to re-normalize the result, and circuits to round the result to the nearest representable value. If two numbers are equally near the result, the even number is chosen. Add or subtract operations take three cycles.
The floating point add/subtract module aligns two operands by shifting the mantissa of the smaller operand to the right. The shifter receives the shift amount from the exponent difference logic. The larger exponent is the common exponent. The smaller mantissa is also sent to the double word shifter to generate the sticky bit, which is used for rounding.
The leading zero detector is used in post normalization to provide the shifter with the shift amount. The shifter shifts left the result by the shift amount. The same shift amount is subtracted from the common exponent. In the case where the output of the add/subtract module overflows, a one place right shift is done to normalize the result. In this case, the common exponent is incremented by one.
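The alignment and post-normalization steps described above can be sketched on integer significands. This is a simplified illustration under several assumptions not stated in the text: significands are 24 bits wide with the hidden bit set, signs are handled elsewhere, and guard bits and rounding are omitted (the sticky bit is only reported). The name `add_magnitudes` is hypothetical.

```python
from typing import Tuple

WIDTH = 24  # assumed significand width: hidden bit plus 23-bit fraction


def add_magnitudes(exp_a: int, sig_a: int, exp_b: int, sig_b: int,
                   subtract: bool = False) -> Tuple[int, int, int]:
    """Add or subtract two magnitudes; returns (exponent, significand, sticky)."""
    # Make operand A the larger one so that B is the operand shifted right.
    if (exp_a, sig_a) < (exp_b, sig_b):
        exp_a, sig_a, exp_b, sig_b = exp_b, sig_b, exp_a, sig_a
    shift = exp_a - exp_b                              # exponent difference
    aligned = sig_b >> shift                           # denormalize the smaller operand
    sticky = int(sig_b & ((1 << shift) - 1) != 0)      # any bits shifted out
    exp = exp_a                                        # larger exponent is the common exponent
    total = sig_a - aligned if subtract else sig_a + aligned
    if total == 0:
        return 0, 0, sticky
    if total >> WIDTH:
        # Overflow: shift right one place and increment the common exponent.
        sticky |= total & 1
        total >>= 1
        exp += 1
    else:
        # Leading zero count gives the left shift amount, which is also
        # subtracted from the common exponent.
        lead = WIDTH - total.bit_length()
        total <<= lead
        exp -= lead
    return exp, total, sticky
```

For example, adding 1.0 and 1.0 (exponent 127, significand `0x800000` each) overflows by one bit and normalizes to exponent 128 with significand `0x800000`, which is 2.0.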
The floating point add/subtract module generates a two-bit floating point condition code (fcc). These codes follow the SPARC model. The bits are in the state_bits register every cycle, and reflect the activity in the floating point ALU exactly three cycles past. The two-bit field has four possible values, as follows:
Table 6-8 summarizes how the LeoFloat floating point addition and subtraction results differ from the IEEE standard.
-------------------------------------------------
                       Right Operand
-------------------------------------------------
Left Operand  NaN   Inf   Norm  Denorm  Zero
-------------------------------------------------
NaN           NaN   NaN   NaN   NaN     NaN
Inf           NaN   NaN   NaN   NaN     NaN
Norm          NaN   NaN   Norm  Norm    Norm
Denorm        NaN   NaN   Norm  Zero    Zero
Zero          NaN   NaN   Norm  Zero    Zero
-------------------------------------------------
The following table summarizes how the LeoFloat floating point add/subtract infinity results differ from the Sun SPARC standard:
--------------------------------------------------------------------------
Floating ALU   Input            Input           Result       Result
Instruction    A Source         B Source     =  LeoFloat     SPARC
--------------------------------------------------------------------------
FADD_PM         7FFx xxxx    -  FF8x xxxx       FFFF FFFF    7FFF xxxx
                7FAx xxxx    -  7FCx xxxx       FFFF FFFF    7FFF xxxx
FADD_MP        -7F8x xxxx    +  7F8x xxxx       FFFF FFFF    7FFF xxxx
               -7FAx xxxx    +  7FCx xxxx       7FFF FFFF    FFFF xxxx
FADD_MM        -7F8x xxxx    -  FF8x xxxx       FFFF FFFF    7FFF xxxx
FADD_PP         FF8x xxxx    +  7F8x xxxx       FFFF FFFF    7FFF xxxx
--------------------------------------------------------------------------
The integer ALU executes integer ALU instructions. This instruction group includes additions, subtractions, boolean operations, increments, and decrements. The integer ALU operates on the data present on the A bus and B bus. A summary of the integer ALU instructions is given in the LeoFloat Instruction Set Manual.
The integer ALU generates a four-bit integer condition code (icc). These codes follow the SPARC model. The bits are set in the state_bits register every cycle and reflect the activity in the integer ALU exactly three cycles past. These bits are defined as follows:
The LeoFloat input section, shown in Figure 6-5, interfaces with the LeoCommand chip. This module consists of a 16-bit data input bus (CF_DAT<15:0>) from the LeoCommand chip. The data bus can be interpreted three different ways depending on the CF Bus control bits (CF_CTL<2:0>). The CF_CTL bits can indicate the CF Bus to be the following:
The CF_CTL bits are also used to initiate the SRAM block load/store instruction.
The input section contains an input data path that converts the CF Bus data to a 32-bit IEEE floating point format. The input data can be passed to the internal LeoFloat register in two different ways:
The input section also contains two input buffers, a counter, a flip-flop, and a shadow I0 register. The flip-flop keeps track of which of the two input buffers (I buffer) LeoCommand is writing to. The counter counts the number of words in the buffer currently being written to.
The input section is controlled by the input state machine. Depending on the state of the CF_CTL bits from LeoCommand and NEED_I and FREE_I from microcode, the state machine generates CF_BUF_AVL to LeoCommand and I_READY back to microcode. The state machine also generates control signals to all registers and multiplexers in the input section and keeps track of switching the I register bank for LeoCommand use.
After LEO_RST is issued, both input buffers are free. The state machine issues CF_BUF_AVL and informs microcode that there is no buffer ready for use. When LeoFloat receives CF_LOAD, it starts loading the first buffer and continues until it decodes the end of packet (CF_CTL<0> = 1). Then the buffer is switched and the I_READY signal is issued.
Figure 6-5 LeoFloat Input Section Block Diagram
After receiving the next CF_LOAD, the state machine starts loading the other buffer and CF_BUF_AVL is set to inactive. If at any time the microcode sends FREE_I, the currently used buffer is freed for LeoCommand and CF_BUF_AVL is issued.
The interface to the rest of LeoFloat is given below (all signals are active high):
--------------------------------------------------------------------------------
Signal        I/O  Stable    Description
--------------------------------------------------------------------------------
ADDR<5>       In   Stable_B  Input register address given by microcode
I_DATA<32>    Out  Stable_A  Input data
I0_DATA<32>   Out  Stable_B  Copy of data in I0 register
SRAM_WR       Out  Stable_A  Indicates current packet is SRAM write
SRAM_RD       Out  Stable_A  Indicates current packet is SRAM read
I_READY       Out  Stable_A  Indicates that another buffer is ready
NEED_I        In   Stable_A  Microcode requests next buffer
FREE_I        In   Stable_A  Microcode releases current buffer
INTR          Out  Stable_A  Interrupt from LeoCommand
--------------------------------------------------------------------------------
As far as the microcode is concerned, the input buffer has two states: buffer not allocated or buffer allocated. The NEED_I and FREE_I signals should be active for a single cycle. The NEED_I signal is ignored if a buffer is not available to be allocated. The FREE_I signal is ignored if a buffer is not currently allocated.
The I_READY signal indicates that the next buffer is available. If a buffer is currently allocated to microcode, I_READY indicates that the next buffer is ready for use. If a buffer is not currently allocated, I_READY indicates that a buffer can be allocated.
Packet lengths from 1 to 32 words are handled correctly. Any sequence of input data formats is allowed. A low data word followed by a high data word is put into the same register. A high data word followed by a low data word is put into successive registers. If a high data word isn't preceded by a low data word, the lower 16 bits of that register are undefined. If a low data word isn't followed by a high data word, the upper 16 bits of that register are undefined. All other sequences of data word formats are put into different registers.
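The pairing rules for unpacked low and high data words can be sketched as follows. This is an illustrative model only: `pack_words` is a hypothetical name, undefined halves are represented as `None`, and packed-format words (which are converted to floating point) are left out.

```python
from typing import Iterable, List, Optional, Tuple

# (upper 16 bits, lower 16 bits); None marks a half the text calls undefined.
Register = Tuple[Optional[int], Optional[int]]


def pack_words(words: Iterable[Tuple[str, int]]) -> List[Register]:
    """Pair "lo"/"hi" 16-bit words into registers following the rules above."""
    registers: List[Register] = []
    pending_low: Optional[int] = None
    for kind, value in words:
        value &= 0xFFFF
        if kind == "lo":
            if pending_low is not None:
                # The previous low word never received its high half.
                registers.append((None, pending_low))
            pending_low = value
        elif kind == "hi":
            # A high word completes a pending low word, or stands alone
            # with an undefined lower half.
            registers.append((value, pending_low))
            pending_low = None
        else:
            raise ValueError("kind must be 'lo' or 'hi'")
    if pending_low is not None:
        registers.append((None, pending_low))
    return registers
```

A low word followed by a high word lands in one register, while a high word followed by a low word occupies two successive registers, each with one undefined half.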
The currently allocated buffer is tagged with SRAM_WR, SRAM_RD, and I0_DATA<32> values. These signals are valid only when a buffer is allocated to microcode.
The INTR signal indicates that LeoCommand has sent an interrupt. It is active for one cycle. An interrupt condition from LeoCommand is ignored by the rest of the input buffer logic.
LeoFloat contains 256 32-bit floating point (or integer) registers. They are arranged into four register groups: I, O, R, and P. The I and O register groups consist of 64 registers each, configured as double-buffered, 32 registers per buffer. The R and P groups are single-buffered. There are 64 R and 96 P registers.
The register address space is as shown in Figure 6-6.
The I, O, and P register groups each contain one write port (port A) and one read port (port B). The I registers are written only with data coming from LeoCommand, and microcode can only read them through the I read port. The O registers are written only by microcode, and only LeoCommand and LeoDraw can read them through the read port. The P registers are general purpose registers for algorithmic use; both the read and write ports are accessible to microcode.
The R register group contains one write port (port A) and two read ports (ports B and C).
The register files are described in more detail below.
The 64 I registers are for holding input parameters. At any one time, only half of these registers are visible to the programmer, as registers I0 through I31. The other 32 registers are available for use as an input FIFO for the next command.
The I registers are read-only to the microcode. Commands up to 32 parameters in length are placed into one bank of I registers by the input FIFO hardware. When the programmer asks for the next I register bank, the microsequencer waits if the next command is not yet complete in the alternate I register bank.
Figure 6-6 LeoFloat Register Files
The programmer can free an I register bank early, as soon as it has been processed, even if the command processing is not yet complete. This frees the I register bank for FIFO use. In effect, LeoFloat's input is almost triple-buffered.
By convention, the command operational code is placed into the least-significant five to nine bits of I0. A special command crack instruction can automatically dispatch to a microcode jump table based on these bits.
The 64 O registers are for holding output parameters. At any one time, only half of these registers are visible to the programmer, as registers O0 through O31. The other 32 registers are available for use as an output FIFO for the previous command.
The O registers are write-only to the microcode. Output commands up to 32 parameters in length are read out of one bank of O registers by the output FIFO hardware. When the programmer asks for the next O register bank, the microsequencer waits if the previous (two) output commands have not yet drained from the alternate O register bank. The programmer can request an O register bank late, in the middle of processing a command. This leaves the O register bank free for FIFO use until the last minute. In effect, LeoFloat's output is almost triple-buffered.
The 64 R registers are completely general purpose registers for algorithmic use. Under some conditions, only half of the R registers are available for use within commands. By software convention, R0 is the throwaway destination, and should never be used to hold values to be read later.
The 96 P registers are dedicated for use as floating point multiplication constants (viewing matrices, screen coordinate conversions, etc.). The first 32 P registers are usually treated as read-only during most processing, although they are writable when the swap_pp bit in the state_bits register is set to 1. The second and third groups of 32 P registers are used as general temporaries as well as additional multiplication constants.
The clip bits register is a 32-bit clip status and revision register. The first 30 bits of the register are updated two bits at a time by one of two specialized clip-test instructions (one for vectors, one for triangles). Its state is tested by a number of specialized branch instructions. The remaining two bits are for the chip revision number.
The state bits register is a 32-bit condition code and status register. This register contains several specialized flags and state bits. The contents are modified as a side effect of several instructions and are explicitly set by two state update instructions.
The PC register is the 16-bit program counter.
The PC stack is a hardware stack of eight 16-bit program counters for subroutine call/return.
The instruction sequencer consists of the instruction buffer, block load/store execution unit, and branch execution unit to execute instructions.
The instruction buffer is a 32-bit register. The instruction buffer is loaded from the SRAM every cycle except when LeoFloat needs to be halted due to an interlock (during non-microcode instruction execution). The register contents are decoded to generate control signals all over the chip.
The block load/store execution unit is initiated by either soft- or hard-coded instructions. The soft-coded instruction is microcode, while the hard-coded instruction is generated by LeoCommand through the CF Bus. Data can be stored into SRAM from any of the register file groups except the O registers.
Data read from SRAM can be written into any register file group except the I registers. The block load/store execution unit pushes the program counter (PC) onto the stack at the beginning of execution. It pops the PC from the stack at the end of instruction execution. The following summarizes the block load/store instruction execution.
The soft block load instruction loads 1 to 160 registers (R and P and I files) from SRAM. It performs a block load from SRAM location (R[ra] + offset) to registers R[rs] through R[re], where ra is the register file address, rs is the register file start address, and re is the register file end address.
The soft block store instruction stores 1 to 160 registers (R and P and I files) to SRAM. Block store from registers R[rs] - R[re] to SRAM location (R[ra] + offset), where ra is register file address, rs is register file start address, and re is register file end address.
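The soft block load and store addressing can be sketched as a pair of copy loops. This is a behavioral illustration under assumptions the text does not confirm: the register range rs through re is inclusive, SRAM words are transferred in ascending address order, and registers are modeled as one flat list. The names `block_load` and `block_store` are hypothetical.

```python
from typing import Dict, List


def _count(rs: int, re: int) -> int:
    count = re - rs + 1          # assumed inclusive register range
    if not 1 <= count <= 160:
        raise ValueError("a block transfer moves 1 to 160 registers")
    return count


def block_load(regs: List[int], sram: Dict[int, int],
               ra: int, offset: int, rs: int, re: int) -> None:
    """Copy SRAM words starting at regs[ra] + offset into regs[rs..re]."""
    count = _count(rs, re)
    base = regs[ra] + offset     # read the base first, in case ra lies in rs..re
    for i in range(count):
        regs[rs + i] = sram[base + i]


def block_store(regs: List[int], sram: Dict[int, int],
                ra: int, offset: int, rs: int, re: int) -> None:
    """Copy regs[rs..re] to SRAM starting at regs[ra] + offset."""
    count = _count(rs, re)
    base = regs[ra] + offset
    for i in range(count):
        sram[base + i] = regs[rs + i]
```

In this model, a load with `regs[ra] = 100` and `offset = 2` fills the destination registers from SRAM addresses 102 upward.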
The hard block load instruction loads 1 to 160 registers (R, P, and O files) from SRAM. Block load from SRAM location R[SRAM-start-address] to R[start-reg-addr] - R[stop-reg-addr], where SRAM-start-address, start-reg-addr, and stop-reg-addr are fields in the shadow I0 register. The I0 register is shown below.
The hard block store instruction stores 1 to 160 registers (I, R, and P files) to SRAM. Block store from registers R[start-reg-addr] - R[stop-reg-addr] to SRAM location R[SRAM-start-address].
The branch execution unit executes the branch instructions and loads the program counter (PC) with an absolute 16-bit address specified within the instruction. The hardware consists of a PC register, a PC incrementer, and an eight-subroutine stack ring buffer.
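An eight-entry ring-buffer stack of 16-bit program counters can be modeled as shown below. This is a sketch, not the hardware design: the class name `PCStack` is hypothetical, and the behavior on overflow (silently wrapping and overwriting the oldest entry) is an assumption drawn from the phrase "ring buffer".

```python
class PCStack:
    """Eight-entry ring-buffer stack of 16-bit program counters."""

    DEPTH = 8

    def __init__(self) -> None:
        self.slots = [0] * self.DEPTH
        self.top = 0  # index of the next free slot

    def push(self, pc: int) -> None:
        # Store the return address, then advance with wraparound.
        self.slots[self.top] = pc & 0xFFFF
        self.top = (self.top + 1) % self.DEPTH

    def pop(self) -> int:
        # Step back with wraparound, then read the saved address.
        self.top = (self.top - 1) % self.DEPTH
        return self.slots[self.top]
```

Under this assumption, a ninth nested call overwrites the oldest return address, so only the eight most recent addresses can be recovered.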
The instruction decoder consists of instruction decode logic. The instruction decode logic generates all the strobes required to execute the instruction in the instruction buffer. The instruction decode logic detects interlocks based on the instruction in the instruction buffer (IB) and some signals from state machines (I_BUFFER_AVAILABLE and O_BUFFER_AVAILABLE). If an interlock is detected, the next PC and the IB value are not loaded.
The Floating Point Reciprocal Unit, shown in Figure 6-7, computes the inverse of the value present on the B bus only. The reciprocal unit cannot be automatically started every cycle like the other units because it is not pipelined.
To compute a floating point reciprocal, the value to be reciprocated is placed on the B bus at time t and an ALU op-code is chosen that will start off the reciprocal. At no less than time t+9, the results of the reciprocal may be read out of the reciprocal unit onto the D bus by an instruction that allows the D bus source to be the reciprocal unit.
Unlike fully pipelined function units, the reciprocal unit need not have its results taken at a fixed time after it starts. Any time from t+9 on can be used to read the results as long as any additional reciprocal starts have not yet finished reciprocating. Any such additional starts cannot be issued until time t+7 without trashing the previous reciprocal operation. If a new reciprocal is started at time t, then a previous reciprocal result, if present, must be read out by time t+8.
The reciprocal is computed with a shift-and-subtract algorithm. To complete the operation in nine cycles, two-bit division is used instead of one-bit division. In this technique, a two-bit shift and the 4R-3D, 4R-2D, and 4R-D operations are performed in parallel (R is the remainder from the previous two-bit divider and is less than the divisor D). The smallest positive result is selected as the new remainder and a quotient of '11', '10', or '01' is selected accordingly. If the results of all three subtractions are negative, 4R is passed as the new remainder and a quotient of '00' is generated. Two three-bit dividers are cascaded to generate four bits of quotient each cycle.
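The two-bits-per-step selection described above can be sketched as an integer routine. This is an illustration of the selection rule only, not of the cascaded hardware or its timing; the name `radix4_divide` is hypothetical, the loop evaluates the three candidates sequentially rather than in parallel, and a candidate of exactly zero is treated as acceptable (the text says "positive").

```python
from typing import Tuple


def radix4_divide(dividend: int, divisor: int, digits: int) -> Tuple[int, int]:
    """Shift-and-subtract division producing two quotient bits per step.

    Requires 0 <= dividend < divisor. Returns (quotient, remainder) where
    quotient holds 2 * digits bits of dividend / divisor.
    """
    if not 0 <= dividend < divisor:
        raise ValueError("requires 0 <= dividend < divisor")
    remainder = dividend
    quotient = 0
    for _ in range(digits):
        shifted = remainder << 2                 # two-bit shift: 4R
        digit, new_remainder = 0, shifted        # default: all candidates negative
        # Try 4R-3D, 4R-2D, 4R-D; the first non-negative one is the smallest.
        for d in (3, 2, 1):
            candidate = shifted - d * divisor
            if candidate >= 0:
                digit, new_remainder = d, candidate
                break
        quotient = (quotient << 2) | digit
        remainder = new_remainder
    return quotient, remainder
```

For example, `radix4_divide(1, 3, 4)` selects the digit '01' at every step and returns the eight quotient bits `01010101` with a remainder of 1.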
There are some special cases that are treated differently. For those cases, the result is not calculated but is substituted in the last cycle. The special cases and the results are:
The output section, shown in Figure 6-8, interfaces LeoFloat with LeoDraw and LeoCommand via the CD Bus. The output section contains the output register file, an output state machine, a flip-flop, and a counter. The state machine generates control signals for the output section and also generates control signals to the CF Bus (FLT_ST<1:0>) and back to microcode. The state machine checks status such as NEED_O and FREE_O from microcode and FLT_EN from LeoCommand. It switches the output buffer for microcode use. See Table 4-2 on page 4-6.
LeoCommand receives a steady stream of input packets over the SBus and issues them to any available LeoFloat. LeoCommand remembers the order in which the packets were sent to the LeoFloat chips for processing and passes them to LeoDraw in the same order. The LeoFloat status outputs (FLT_ST<1:0>) provide the information LeoCommand needs to output data from LeoFloat.
Figure 6-8 LeoFloat Output Section Block Diagram
After reset, LeoFloat issues `00' (idle). Before each output transmission, the microcode sets, in the state_bits register, the length of the block to be sent (output_len, in words) and the destination (output_dst bit). It also sets the null output bit (mcmb bit 3) and the last output of command bit (mcmb bit 4). When the output state machine receives FREE_O from microcode, it starts transmission.
If LeoFloat completes an output packet for LeoDraw, it sends `10' (`11' for LeoCommand) and loads a counter with twice the number of words in the output packet. LeoFloat decrements this counter for each FLT_EN received from LeoCommand. For every data word sent, it sends `10' for LeoDraw and `11' for LeoCommand. When the count drops to four and FLT_EN is active, LeoFloat sends `01' if this is the last output packet or `11' (`10' for LeoCommand) if it isn't.
If FLT_EN goes inactive on the next cycle, LeoFloat must continue to send this last word code. Whenever FLT_EN goes inactive in the middle of an output sequence to either LeoDraw or LeoCommand, LeoFloat must not change these two status pins. If FLT_EN stays active, LeoFloat can immediately send `10' (`11') for the next output packet. When the output packet counter drops to zero, the new word count can be loaded.
LeoFloat can address 64K words through the SRAM interface. The SRAM uses four 128K by 8 SRAMs with a 20 nanosecond access time. The LeoFloat SRAM interface generates RAM address, data outputs, and output enable signals during read cycles. The interface also generates RAM address, data inputs, and write enable during write cycles. The SRAM chip enable is tied low (enabled) at all times.
The first write takes two cycles. The first cycle is to turn the bus around and the second cycle is the write. The next writes are one cycle each. The last write takes two cycles, one to write and the other to turn the bus around. Reads are one cycle.
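The cycle counts above can be worked through with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, and a burst of a single write is assumed to pay both bus turnarounds (three cycles), which the text does not spell out.

```python
def write_burst_cycles(n_writes):
    """Cycles for a burst of back-to-back writes.

    One turnaround cycle before the first write, one cycle per write,
    and one turnaround cycle after the last write.
    """
    if n_writes < 0:
        raise ValueError("n_writes must be non-negative")
    if n_writes == 0:
        return 0
    return n_writes + 2


def read_burst_cycles(n_reads):
    """Reads take one cycle each."""
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return n_reads
```

Under these assumptions a burst of four writes takes six cycles (2 + 1 + 1 + 2), while four reads take four cycles.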
Each LeoFloat has four SRAMs, each SRAM containing eight bits of the 32-bit data word.
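As an illustration of how a 32-bit word is spread across four byte-wide SRAMs, the sketch below splits a word into four 8-bit lanes and reassembles it. The lane ordering (lane 0 holding bits 7..0) and the function names are assumptions; the text does not say which SRAM holds which byte.

```python
def split_word(word):
    """Split a 32-bit word into four bytes, one per SRAM (lane 0 = bits 7..0, assumed)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def join_word(lanes):
    """Reassemble a 32-bit word from the four byte lanes."""
    if len(lanes) != 4:
        raise ValueError("expected four byte lanes")
    word = 0
    for lane, byte in enumerate(lanes):
        word |= (byte & 0xFF) << (8 * lane)
    return word
```

All four SRAMs see the same address, so one access reads or writes the complete word.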
Figure 6-9 LeoFloat SRAM Interface
Figure 6-10 shows how the SRAM is organized. The SRAM is divided into four areas: dispatch table, microcode, data area, and context block.
The dispatch table takes up the first 512 locations and is the entry point for the transformation of all primitives. The dispatch table is addressed from the dispatch opcode in the CF Bus packet header.
The microcode area holds the actual microprogram instructions.
The data area contains the tables that change the dispatch table to point to the initialization routines for each of the primitives that are not currently in use.
The context block contains context data saved to the host on a "store context" command and restored in LeoFloat by a "load context" command.
Figure 6-10 SRAM Map
Figure 6-11 shows the SRAM interface read cycle timing. Figure 6-12 shows the write cycle timing. For the write cycle, it is difficult to generate the write enable signal correctly when writing to the SRAM back-to-back. Therefore, a dead cycle is added for every write operation.
Figure 6-11 LeoFloat SRAM Interface Read Cycle Timing
The LeoFloat design attains high performance through a simple but rich instruction set and pipelined instruction execution. The design is compact and comprehensible thanks to clearly defined pipelines for each independent unit and comprehensive interface specifications between independent units.
In the following discussion, the following names for LeoFloat pipeline stages are used:
Instructions are fetched every cycle from the SRAM except in the case of an interlock. An interlock is any event that causes a temporary break in the continuous stream of execution of microcode instructions. All interlocks must be detected before or during the OP stage so that the next PC load may be disabled.
The LeoFloat power up sequence is activated when the LEO_RST pin is active low. This resets a reset flip-flop that generates an internal signal. The internal signal resets the program counter (PC) to 0 and halts the processor.
The CF_CTL<2:0> bits (110), along with CD_DAT<0> = 0, can be used as an interrupt/run condition. This condition can be thought of as the interrupt/run signal. The interrupt/run is used for starting the execution of the processor if it is halted (after a reset) or used for vectoring the processor to an interrupt if it was already running.
The CF_CTL<2:0> bits (110), along with CD_DAT<0> = 1, can be used as a soft reset. This does the same thing as LEO_RST.
When the processor is halted (after the reset), the interrupt/run causes the processor to start execution at location 0. When the processor is running, the interrupt/run pushes the PC onto the stack and the processor jumps to location 0 and begins normal execution. In effect, this is a forced jump to subroutine at location 0.
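The reset and interrupt/run behavior described above can be modeled as a small state machine. This is a behavioral sketch only: the class and method names are hypothetical, the stack is modeled as an unbounded list, and which PC value is pushed (current or next) is not specified in the text.

```python
class SequencerModel:
    """Toy model of the LeoFloat PC, halt state, and return stack."""

    def __init__(self):
        self.stack = []
        self.reset()

    def reset(self):
        # LEO_RST or soft reset: PC goes to 0 and the processor halts.
        self.pc = 0
        self.halted = True

    def interrupt_run(self):
        if self.halted:
            # Halted: start execution at location 0.
            self.halted = False
        else:
            # Running: forced jump to subroutine at location 0.
            self.stack.append(self.pc)
        self.pc = 0

    def control(self, cf_ctl, cd_dat0):
        """Decode CF_CTL<2:0> = 110 together with CD_DAT<0>."""
        if cf_ctl != 0b110:
            return
        if cd_dat0 == 0:
            self.interrupt_run()
        else:
            self.reset()
```

In this model, a first `control(0b110, 0)` after reset starts execution at location 0, and a second one while running pushes the PC before jumping to location 0.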
The power up sequence is as follows:
The interrupt/run can be used for debugging microcode. The debug sequence is as follows:
Note that correct working of microcode with an interrupt like this in the middle is not guaranteed because the state of various execution pipelines will be lost during the interrupt routine.
This section briefly describes the LeoFloat execution of primitive routines for dots, vectors, and triangles. For more information, look at the LeoFloat microcode.
If model clipping is enabled, model clipping rejection is done first. This is done so that if the dot is outside of the model clipping planes, the microprogram exits the dot routine and fetches another primitive.
Following model clipping rejection, the microprogram, using the input registers, transforms the vertex to NPC (Normalized Projection Coordinates) space and performs the view clipping on the vertex. Since a dot is either in or out of the view clipping planes, the trivial rejection test is performed next.
If model clipping is enabled, the microprogram performs the model clipping calculations on the vertex. If lighting is required (dots with normals), the microprogram lights the vertex, then converts it to screen space and performs the perspective division. Otherwise, it converts the vertex to screen space, performs the perspective division, and sends the vertex to LeoDraw.
The LeoFloat input and output packets are further described below. See Chapter 3, "CF Bus," for more information on the LeoFloat input packets. See Chapter 4, "CD Bus," for more information on the LeoFloat output packets.
Plain dot input packet. The plain dot input packet consists of a 16-bit header and three 32-bit words, as follows:
------------------------------------------------------------
Type      Description
------------------------------------------------------------
integer   Input plain dot header (16 bits - dispatch = 001)
float     Input plain dot X (32 bits)
float     Input plain dot Y (32 bits)
float     Input plain dot Z (32 bits)
------------------------------------------------------------
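To make the packet layout concrete, the sketch below packs a plain dot input packet as a 16-bit header followed by three 32-bit floats. The byte order (big-endian), the use of IEEE 754 single precision, the function names, and the header value used in the usage note are all assumptions for illustration; the header bit layout is defined in the CF Bus chapter, not here.

```python
import struct

# 16-bit header + three 32-bit floats, no padding (big-endian assumed).
PLAIN_DOT_FORMAT = ">H3f"


def pack_plain_dot(header, x, y, z):
    """Pack a plain dot input packet: header, X, Y, Z."""
    return struct.pack(PLAIN_DOT_FORMAT, header, x, y, z)


def unpack_plain_dot(packet):
    """Unpack a plain dot input packet into (header, x, y, z)."""
    return struct.unpack(PLAIN_DOT_FORMAT, packet)
```

For instance, `pack_plain_dot(0x0001, 1.0, -2.5, 0.25)` produces a 14-byte packet (2 header bytes plus 3 x 4 data bytes); the header value 0x0001 is a placeholder, not a documented encoding.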
Plain dot output packet. The plain dot output packet consists of a 16-bit header and three 32-bit words, as follows:
---------------------------------------------
Type      Description
---------------------------------------------
integer   Output dot header (16 bits - 00FF)
integer   Output dot X (32 bits)
integer   Output dot Y (32 bits)
integer   Output dot Z (32 bits)
---------------------------------------------
RGB dot input packet. The RGB dot input packet consists of a 16-bit header and six words, as follows:
-----------------------------------------------
Type      Description
-----------------------------------------------
integer   Input RGB dot header (dispatch = 002)
float     Input RGB dot X (32 bits)
float     Input RGB dot Y (32 bits)
float     Input RGB dot Z (32 bits)
float     Input RGB dot Red (16 bits)
float     Input RGB dot Green (16 bits)
float     Input RGB dot Blue (16 bits)
-----------------------------------------------
RGB dot output packet. The RGB dot output packet consists of a 16-bit header and six 32-bit words, as follows:
--------------------------------------
Type      Description
--------------------------------------
integer   Output RGB dot header (02FF)
integer   Output RGB dot X
integer   Output RGB dot Y
integer   Output RGB dot Z
integer   Output RGB dot Red
integer   Output RGB dot Green
integer   Output RGB dot Blue
--------------------------------------
Normal dot input packet. The normal dot input packet consists of a 16-bit header and six 32-bit words, as follows:
--------------------------------------------------
Type      Description
--------------------------------------------------
integer   Input normal dot header (dispatch = 003)
float     Input normal dot X
float     Input normal dot Y
float     Input normal dot Z
float     Input normal dot Nx
float     Input normal dot Ny
float     Input normal dot Nz
--------------------------------------------------
Normal dot output packet. The normal dot output packet consists of a 16-bit header and six 32-bit words, as follows:
----------------------------------
Type      Description
----------------------------------
integer   Output normal dot header
integer   Output normal dot X
integer   Output normal dot Y
integer   Output normal dot Z
integer   Output normal dot Red
integer   Output normal dot Green
integer   Output normal dot Blue
----------------------------------
RGB normal dot input packet. The RGB normal dot input packet consists of a 16-bit header and nine words, as follows:
------------------------------------------------------
Type      Description
------------------------------------------------------
integer   Input RGB normal dot header (dispatch = 004)
float     Input RGB normal dot X (32 bits)
float     Input RGB normal dot Y (32 bits)
float     Input RGB normal dot Z (32 bits)
float     Input RGB normal dot Nx (32 bits)
float     Input RGB normal dot Ny (32 bits)
float     Input RGB normal dot Nz (32 bits)
float     Input RGB normal dot Red (16 bits)
float     Input RGB normal dot Green (16 bits)
float     Input RGB normal dot Blue (16 bits)
------------------------------------------------------
RGB normal dot output packet. The RGB normal dot output packet consists of a 16-bit header and six 32-bit words, as follows:
--------------------------------------
Type      Description
--------------------------------------
integer   Output RGB normal dot header
integer   Output RGB normal dot X
integer   Output RGB normal dot Y
integer   Output RGB normal dot Z
integer   Output RGB normal dot Red
integer   Output RGB normal dot Green
integer   Output RGB normal dot Blue
--------------------------------------
If model clipping is enabled, the model clipping rejection algorithm is performed. This is performed in model coordinates (MC) since the model clipping planes are sent to LeoFloat in model coordinates. This speeds up the cases where the vector is outside of the model clip planes and is thrown away.
If the vector has normals, the microcode performs the face determination calculations and sets the "face_we_got" bit in the state_bits register. These calculations are performed in model coordinates. If face culling is enabled and the vertices are facing the wrong direction, the vector is rejected.
The microprogram uses the input registers to transform the two vertices from model coordinates to normalized projection coordinates (NPC). Using the hardware-supported clip_test instruction, the clip bits are calculated. At this point, if the vector is outside the view port clipping planes, it is rejected.
If the primitive has normals, the lighting calculations are performed. Following the lighting calculations (or the generation of the clip codes), a check is made to see if either view clipping or model clipping needs to be done. For view clipping, the hardware clip_bits register is used to determine if one of the vertices needs to be clipped in more than one plane. Next, the microprogram branches to the correct clipping code: single plane clip or multi-plane clip. The vertex or vertices are clipped against each of the six view clipping planes, as necessary.
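The clip-bit test and the choice between single-plane and multi-plane clipping can be sketched as follows. This is a generic outcode sketch, not the behavior of the clip_test instruction: the clip volume (-w <= x, y, z <= w), the bit assignment, and the function names are all assumptions made for illustration.

```python
def clip_code(x, y, z, w):
    """Return a 6-bit code with one bit per view clipping plane (assumed layout)."""
    tests = (x < -w, x > w, y < -w, y > w, z < -w, z > w)
    code = 0
    for bit, outside in enumerate(tests):
        if outside:
            code |= 1 << bit
    return code


def trivially_rejected(code_a, code_b):
    """A vector is rejected if both vertices are outside the same plane."""
    return (code_a & code_b) != 0


def needs_multi_plane_clip(code):
    """True if a vertex is outside more than one plane."""
    return bin(code).count("1") > 1
```

A vertex with code 0 is inside all six planes; a vector whose two codes share a set bit lies entirely outside that plane and can be discarded without clipping.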
If model clipping needs to be done, the "W" values are transformed back to model coordinates and model clipping is performed. Once done, the "W" value is restored back to NPC space for the setup calculations.
Now the microprogram is ready to perform the perspective division and convert the vertices from NPC to screen space. Next, the microprogram calculates the values needed in LeoDraw to draw the vector.
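The perspective division and conversion to screen space can be illustrated with a short sketch. The mapping used here (post-division coordinates in the range -1 to 1, Y increasing downward on screen, and the `width` and `height` parameters) is an assumption for illustration only; the actual NPC range and viewport parameters are set by the microcode and are not given in this section.

```python
def npc_to_screen(x, y, z, w, width, height):
    """Divide by W, then map X and Y to pixel coordinates (assumed mapping)."""
    if w == 0:
        raise ValueError("w must be non-zero")
    xd, yd, zd = x / w, y / w, z / w        # perspective division
    sx = (xd + 1.0) * 0.5 * width           # -1..1 -> 0..width
    sy = (1.0 - yd) * 0.5 * height          # -1..1 -> height..0
    return sx, sy, zd
```

With a 640 by 480 screen, the point (0, 0, 0, 1) lands at the screen center under this assumed mapping.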
Plain vector input packet. The plain vector input packet consists of a 16-bit header and six words, as follows:
----------------------------------------------------
Type      Description
----------------------------------------------------
integer   Input plain vector header (dispatch = 005)
float     Input plain vector X1
float     Input plain vector Y1
float     Input plain vector Z1
float     Input plain vector X2
float     Input plain vector Y2
float     Input plain vector Z2
----------------------------------------------------
Plain vector output packet. The plain vector output packet consists of a 16-bit header and six words, as follows:
-------------------------------------------
Type      Description
-------------------------------------------
integer   Output plain vector header (04FF)
integer   Output plain vector us
integer   Output plain vector vs
integer   Output plain vector zs
integer   Output plain vector ue
integer   Output plain vector dzDu
integer   Output plain vector dvDu
-------------------------------------------
RGB vector input packet. RGB vector input packets consist of a 16-bit header and 12 words, as follows:
--------------------------------------------------
Type      Description
--------------------------------------------------
integer   Input RGB vector header (dispatch = 006)
float     Input RGB vector X1
float     Input RGB vector Y1
float     Input RGB vector Z1
float     Input RGB vector Red1
float     Input RGB vector Green1
float     Input RGB vector Blue1
float     Input RGB vector X2
float     Input RGB vector Y2
float     Input RGB vector Z2
float     Input RGB vector Red2
float     Input RGB vector Green2
float     Input RGB vector Blue2
--------------------------------------------------
RGB vector output packet. RGB vector output packets consist of a 16-bit header and 12 words, as follows:
-----------------------------------------
Type      Description
-----------------------------------------
integer   Output RGB vector header (06FF)
integer   Output RGB vector us
integer   Output RGB vector vs
integer   Output RGB vector zs
integer   Output RGB vector Red
integer   Output RGB vector Green
integer   Output RGB vector Blue
integer   Output RGB vector ue
integer   Output RGB vector dzDu
integer   Output RGB vector drDu
integer   Output RGB vector dgDu
integer   Output RGB vector dbDu
integer   Output RGB vector dvDu
-----------------------------------------
Normal vector input packet. Normal vector input packets consist of a 16-bit header and 12 words, as follows:
-----------------------------------------------------
Type      Description
-----------------------------------------------------
integer   Input normal vector header (dispatch = 007)
float     Input normal vector X1
float     Input normal vector Y1
float     Input normal vector Z1
float     Input normal vector Nx1
float     Input normal vector Ny1
float     Input normal vector Nz1
float     Input normal vector X2
float     Input normal vector Y2
float     Input normal vector Z2
float     Input normal vector Nx2
float     Input normal vector Ny2
float     Input normal vector Nz2
-----------------------------------------------------
Normal vector output packet. Normal vector output packets consist of a 16-bit header and 12 words, as follows:
--------------------------------------------
Type      Description
--------------------------------------------
integer   Output normal vector header (06FF)
integer   Output normal vector us
integer   Output normal vector vs
integer   Output normal vector zs
integer   Output normal vector Red
integer   Output normal vector Green
integer   Output normal vector Blue
integer   Output normal vector ue
integer   Output normal vector dzDu
integer   Output normal vector drDu
integer   Output normal vector dgDu
integer   Output normal vector dbDu
integer   Output normal vector dvDu
--------------------------------------------
Normal RGB vector input packet. Normal RGB vector input packets consist of a 16-bit header and 18 words, as follows:
---------------------------------------------------------
Type      Description
---------------------------------------------------------
integer   Input normal RGB vector header (dispatch = 008)
float     Input normal RGB vector X1
float     Input normal RGB vector Y1
float     Input normal RGB vector Z1
float     Input normal RGB vector Nx1
float     Input normal RGB vector Ny1
float     Input normal RGB vector Nz1
float     Input normal RGB vector Red1
float     Input normal RGB vector Green1
float     Input normal RGB vector Blue1
float     Input normal RGB vector X2
float     Input normal RGB vector Y2
float     Input normal RGB vector Z2
float     Input normal RGB vector Nx2
float     Input normal RGB vector Ny2
float     Input normal RGB vector Nz2
float     Input normal RGB vector Red2
float     Input normal RGB vector Green2
float     Input normal RGB vector Blue2
---------------------------------------------------------
Normal RGB vector output packet. Normal RGB vector output packets consist of a 16-bit header and 12 words, as follows:
------------------------------------------------
Type      Description
------------------------------------------------
integer   Output normal RGB vector header (06FF)
integer   Output normal RGB vector us
integer   Output normal RGB vector vs
integer   Output normal RGB vector zs
integer   Output normal RGB vector Red
integer   Output normal RGB vector Green
integer   Output normal RGB vector Blue
integer   Output normal RGB vector ue
integer   Output normal RGB vector dzDu
integer   Output normal RGB vector drDu
integer   Output normal RGB vector dgDu
integer   Output normal RGB vector dbDu
integer   Output normal RGB vector dvDu
------------------------------------------------
The triangle microprogram routine swaps the triangle vertices, sorting them from lowest to highest Y value. It then calculates the slopes for each vertex and color, the starting point of the triangle, and the distance from the edges.
If the clip_pending bit is set in the state_bits register, the microprogram branches to a routine that traverses the clip list to get the three vertices for the next triangle, then does the perspective calculations and conversion to screen space coordinates. This loop continues until all of the triangles in the vertex list have been processed.
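Sorting three vertices by Y needs at most three compare-and-swap steps. The sketch below shows one such ordering; the function name and the (x, y, z) tuple representation are assumptions, and the actual microcode may order its comparisons differently.

```python
def sort_vertices_by_y(a, b, c):
    """Return the three vertices ordered from lowest to highest Y.

    Each vertex is a tuple whose second element is the Y value.
    """
    if a[1] > b[1]:
        a, b = b, a
    if b[1] > c[1]:
        b, c = c, b
    if a[1] > b[1]:
        a, b = b, a
    return a, b, c
```

After the three steps the first vertex has the lowest Y and the last has the highest, which gives the setup code a fixed starting vertex for the edge calculations.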
Triangle output packets are always the same. Only the input packets differ.
RGB triangle input packet. The RGB triangle input packet consists of a 16-bit header and 18 words, as follows:
----------------------------------------------------
Type      Description
----------------------------------------------------
integer   Input RGB triangle header (dispatch = 010)
float     Input RGB triangle X1
float     Input RGB triangle Y1
float     Input RGB triangle Z1
float     Input RGB triangle Red1
float     Input RGB triangle Green1
float     Input RGB triangle Blue1
float     Input RGB triangle X2
float     Input RGB triangle Y2
float     Input RGB triangle Z2
float     Input RGB triangle Red2
float     Input RGB triangle Green2
float     Input RGB triangle Blue2
float     Input RGB triangle X3
float     Input RGB triangle Y3
float     Input RGB triangle Z3
float     Input RGB triangle Red3
float     Input RGB triangle Green3
float     Input RGB triangle Blue3
----------------------------------------------------
Vnormal triangle input packet. The Vnormal triangle input packet consists of a 16-bit header and 18 words, as follows:
-------------------------------------------------------
Type      Description
-------------------------------------------------------
integer   Input normal triangle header (dispatch = 009)
float     Input normal triangle X1
float     Input normal triangle Y1
float     Input normal triangle Z1
float     Input normal triangle norm X1
float     Input normal triangle norm Y1
float     Input normal triangle norm Z1
float     Input normal triangle X2
float     Input normal triangle Y2
float     Input normal triangle Z2
float     Input normal triangle norm X2
float     Input normal triangle norm Y2
float     Input normal triangle norm Z2
float     Input normal triangle X3
float     Input normal triangle Y3
float     Input normal triangle Z3
float     Input normal triangle norm X3
float     Input normal triangle norm Y3
float     Input normal triangle norm Z3
-------------------------------------------------------
RGB Vnormal triangle input packet. The RGB Vnormal triangle input packet consists of a 16-bit header and 27 words, as follows:
-------------------------------------------------------
Type      Description
-------------------------------------------------------
integer   Input normal triangle header (dispatch = 011)
float     Input normal triangle X1
float     Input normal triangle Y1
float     Input normal triangle Z1
float     Input normal triangle norm X1
float     Input normal triangle norm Y1
float     Input normal triangle norm Z1
float     Input normal triangle Red1
float     Input normal triangle Green1
float     Input normal triangle Blue1
float     Input normal triangle X2
float     Input normal triangle Y2
float     Input normal triangle Z2
float     Input normal triangle norm X2
float     Input normal triangle norm Y2
float     Input normal triangle norm Z2
float     Input normal triangle Red2
float     Input normal triangle Green2
float     Input normal triangle Blue2
float     Input normal triangle X3
float     Input normal triangle Y3
float     Input normal triangle Z3
float     Input normal triangle norm X3
float     Input normal triangle norm Y3
float     Input normal triangle norm Z3
float     Input normal triangle Red3
float     Input normal triangle Green3
float     Input normal triangle Blue3
-------------------------------------------------------
Facet normal triangle input packet. The facet normal triangle input packets consist of a 16-bit header and 21 words, as follows:
-------------------------------------------------------
Type      Description
-------------------------------------------------------
integer   Input normal triangle header (dispatch = 009)
float     Input normal triangle X1
float     Input normal triangle Y1
float     Input normal triangle Z1
float     Input normal triangle norm X1
float     Input normal triangle norm Y1
float     Input normal triangle norm Z1
float     Input normal triangle X2
float     Input normal triangle Y2
float     Input normal triangle Z2
float     Input normal triangle norm X2
float     Input normal triangle norm Y2
float     Input normal triangle norm Z2
float     Input normal triangle X3
float     Input normal triangle Y3
float     Input normal triangle Z3
float     Input normal triangle norm X3
float     Input normal triangle norm Y3
float     Input normal triangle norm Z3
float     Input facet normal X
float     Input facet normal Y
float     Input facet normal Z
-------------------------------------------------------
RGB Fnormal triangle input packet. The RGB Fnormal triangle input packets consist of a 16-bit header and 30 words, as follows:
-------------------------------------------------------
Type      Description
-------------------------------------------------------
integer   Input normal triangle header (dispatch = 010)
float     Input normal triangle X1
float     Input normal triangle Y1
float     Input normal triangle Z1
float     Input normal triangle norm X1
float     Input normal triangle norm Y1
float     Input normal triangle norm Z1
float     Input normal triangle Red1
float     Input normal triangle Green1
float     Input normal triangle Blue1
float     Input normal triangle X2
float     Input normal triangle Y2
float     Input normal triangle Z2
float     Input normal triangle norm X2
float     Input normal triangle norm Y2
float     Input normal triangle norm Z2
float     Input normal triangle Red2
float     Input normal triangle Green2
float     Input normal triangle Blue2
float     Input normal triangle X3
float     Input normal triangle Y3
float     Input normal triangle Z3
float     Input normal triangle norm X3
float     Input normal triangle norm Y3
float     Input normal triangle norm Z3
float     Input normal triangle Red3
float     Input normal triangle Green3
float     Input normal triangle Blue3
float     Input RGB facet normal X
float     Input RGB facet normal Y
float     Input RGB facet normal Z
-------------------------------------------------------
The triangle output packets consist of a 16-bit header and 21 words, as follows:
---------------------------------------
Type      Description
---------------------------------------
integer   Output triangle header (08FF)
integer   Output triangle xs
integer   Output triangle xe2
integer   Output triangle zs
integer   Output triangle rs
integer   Output triangle gs
integer   Output triangle bs
integer   Output triangle xe
integer   Output triangle dzDu
integer   Output triangle drDu
integer   Output triangle dgDu
integer   Output triangle dbDu
integer   Output triangle ys
integer   Output triangle count12
integer   Output triangle count13
integer   Output triangle dxsDv
integer   Output triangle dxeDv
integer   Output triangle dxe2Dv
integer   Output triangle dzDv
integer   Output triangle drDv
integer   Output triangle dgDv
integer   Output triangle dbDv
---------------------------------------