6 LeoFloat ASIC

This chapter describes the LeoFloat ASIC. For information on the microcode that controls LeoFloat, see the LeoFloat Instruction Set Manual.

6.1 Introduction

The LeoFloat ASIC (U0501, U0601, U0701, and U0801 on the lower board) is a floating-point chip designed specifically for 3D graphics. LeoFloat converts model space triangles, vectors, and dots into screen space fixed-point rendering parameters. It supports the specialized floating point instructions required by vector and triangle transform, lighting, and setup algorithms.

The primary features of the LeoFloat ASIC are:

32-bit internal floating point throughout (IEEE 32-bit compatible on I/O)

Large internal multi-ported register file

Input and output FIFO buffers for data, automatically sequenced

Parallel floating point multiplier

Parallel floating point ALU

Parallel floating point iterative reciprocal circuit

Parallel integer ALU

Special clip-test hardware assist

Floating-point to fixed-point and fixed-point to floating-point conversion instructions

Integer bit-field extraction/insertion barrel shift and merge instructions

On-chip simple sequencer, external SRAM microcode

Configurable for multi-processor (parallel task) deployment

25 MHz (50+ megaflop operation)

6.1.1 The LeoFloat Graphics Pipeline

Figure 6-1 shows an example of a primitive object, a triangle, rendered through the LeoFloat pipeline. Because the actual process is under microprogram control and the microprogram is subject to change, this is only one possible example of the transformation process performed within LeoFloat. The transformation process for a triangle is as follows:

Figure 6-1 LeoFloat Graphics Pipeline

1. Start (see Chapter 3, "CF Bus").

2. Initialization.

3. Model clip rejection.

4. Face determination (see "State Bits Register" on page 6-27).

5. View transformation.

6. Lighting.

7. Clipping.

   The single-plane clipping routine clips the triangle if one vertex is outside the clipping plane. This creates two triangles. The second triangle is placed in the vertex list, and the clip_pending bit is set in the State Bits register so that the setup routine can process the second triangle.

   The complex clipping routine clips each of the triangle's outside vertices to the closest clipping plane. Such triangles usually have more than one vertex outside, beyond different clipping planes, so clipping generates more than one triangle. These triangles are put into the vertex list and the clip_pending bit is set. The setup routine traverses the vertex list until all the triangles are processed.

8. Sort vertices.

9. Calculate slopes.

10. LeoFloat sends the values to LeoDraw.

If the clip_pending bit is set in the State Bits register, the microprogram branches to a routine that traverses the clip list to get the three vertices of the next triangle. The routine then branches back to the triangle code that called the setup routine, which performs the corresponding calculations and the conversion to screen space coordinates. This loop continues until all the triangles in the vertex list have been processed.
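The single-plane clipping and clip_pending flow above can be illustrated with a small Python sketch. It is not LeoFloat microcode. It clips one triangle against a single plane using one Sutherland-Hodgman stage, splits the resulting quadrilateral into two triangles, and drains a pending list the way the setup routine loops while clip_pending is set. The function names, the `dist` plane test (non-negative means inside), and the homogeneous (x, y, z, w) vertex tuples are hypothetical.

```python
def lerp(p, q, t):
    """Point a fraction t of the way from vertex p to vertex q."""
    return tuple(a + t * (b - a) for a, b in zip(p, q))


def clip_against_plane(tri, dist):
    """Clip a triangle against one plane (a single Sutherland-Hodgman stage).

    dist(v) >= 0 means v is inside. Returns 0, 1, or 2 triangles: one vertex
    outside yields a quadrilateral, which is split into two triangles.
    """
    poly = []
    for i in range(3):
        cur, nxt = tri[i], tri[(i + 1) % 3]
        dc, dn = dist(cur), dist(nxt)
        if dc >= 0:
            poly.append(cur)                      # keep inside vertex
        if (dc >= 0) != (dn >= 0):                # edge crosses the plane
            poly.append(lerp(cur, nxt, dc / (dc - dn)))
    # Fan-triangulate the clipped polygon around its first vertex.
    return [(poly[0], poly[k], poly[k + 1]) for k in range(1, len(poly) - 1)]


def render_triangle(tri, dist, setup):
    """Hand each clipped triangle to setup; a non-empty pending list plays
    the role of the clip_pending bit (hypothetical model)."""
    pending = clip_against_plane(tri, dist)
    while pending:
        setup(pending.pop(0))
```

For example, clipping the triangle (0,0), (2,0), (0,2) against the plane x <= 1 leaves one vertex outside and produces two triangles, both passed to `setup`.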

6.2 Block Diagram

Figure 6-2 shows the LeoFloat external interface and a typical SRAM connection. The internal detail is a functional representation only.

Figure 6-2 LeoFloat ASIC Simplified Block Diagram

6.3 Pin Descriptions

LeoFloat has four major interfaces: CF Bus (input), CD Bus (output), SRAM, and external.

6.3.1 CF Bus Interface

The CF Bus interface contains those signals that interface between the LeoCommand and LeoFloat ASICs. Table 6-1 summarizes the CF Bus interface signals. See Chapter 3, "CF Bus," for more information on this interface.

Table 6-1 LeoFloat CF Bus Interface Signals

--------------------------------------------------------------------------------
Signal Name        No. Pins  I/O  Type      Description
--------------------------------------------------------------------------------
CF_DAT<15:0>       16        I    Bi-state  LeoCommand to LeoFloat data bus.
CF_CTL<2:0>        3         I    Bi-state  LeoCommand to LeoFloat write control.
CF_BUF_AVL<3:0>    1         O    Bi-state  Buffer available for LeoCommand in LeoFloat.
CF_LOAD<3:0>       1         I    Bi-state  LeoCommand to LeoFloat write enable.
--------------------------------------------------------------------------------

6.3.1.1 CF_DAT<15:0>

The CF_DAT<15:0> lines are a 16-bit data input bus. There are two types of data on the bus:

A signed 16-bit fraction, which needs to be converted to an IEEE 32-bit floating point number.

Two 16-bit numbers present during two consecutive cycles, which are combined into a single 32-bit uninterpreted number.
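Both data types can be sketched in Python. The manual does not give the bit layout of the signed fraction, so this sketch assumes two's complement with the binary point just after the sign bit (15 fraction bits, range [-1.0, 1.0)). In Table 6-2, only upper-half codes mark the end of a packet, which suggests that the lower half is sent first, but the combining function below does not depend on arrival order. The function names are hypothetical.

```python
import struct


def fraction16_to_f32_bits(word):
    """Packed format: signed 16-bit fraction -> IEEE single-precision bits.

    ASSUMPTION: two's complement with 15 fraction bits, covering [-1.0, 1.0).
    """
    word &= 0xFFFF
    signed = word - 0x10000 if word & 0x8000 else word
    value = signed / 32768.0   # exact: 16 significant bits fit in 24
    return struct.unpack("<I", struct.pack("<f", value))[0]


def join_halves(upper, lower):
    """Unpacked format: two 16-bit words -> one uninterpreted 32-bit word."""
    return ((upper & 0xFFFF) << 16) | (lower & 0xFFFF)
```

Under this assumption, 0x4000 converts to 0.5 (bit pattern 0x3F000000) and 0x8000 converts to -1.0.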

6.3.1.2 CF_CTL<2:0>

The encoded CF_CTL<2:0> bits describe the format of the word on the bus. They also mark the end of a packet, and the code for the last word of a packet indicates whether the packet is an SRAM read or an SRAM write. One code (110) starts execution of, interrupts, or resets the LeoFloat microprocessor, depending on CF_DAT<0>. Table 6-2 lists the encodings.

Table 6-2 CF_CTL<2:0> Encoding

---------------------------------------------------------------------------------------------
CF_CTL<2:0>  Format          LeoFloat Operation
---------------------------------------------------------------------------------------------
000          Packed          Load 16-bit fraction, convert to floating point.
001          Packed          Load 16-bit fraction, convert to floating point. End of packet.
010          Unpacked        Load 16-bit upper half of 32-bit word.
011          Unpacked        Load 16-bit upper half of 32-bit word. End of packet.
100          Unpacked        Load 16-bit lower half of 32-bit word, zero extend.
101          Unpacked        Load 16-bit upper half of 32-bit word. SRAM read. End of packet.
110          CF_DAT<0> = 1   Reset all LeoFloats (CF_LOAD = 0).
             CF_DAT<0> = 0   Interrupt/run (push PC onto the stack, jump to 0; run if halted).
111          Unpacked        Load 16-bit upper half of 32-bit word. SRAM write. End of packet.
---------------------------------------------------------------------------------------------
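Table 6-2 can be expressed as a lookup, sketched here in Python. It makes one regularity in the table explicit: every odd code (CF_CTL<0> = 1) ends a packet, and codes 101 and 111 additionally mark SRAM read and SRAM write. The type and function names are hypothetical, and code 110 is decoded from CF_DAT<0> as the table specifies.

```python
from typing import NamedTuple, Optional


class CfCtlAction(NamedTuple):
    fmt: str               # "packed", "unpacked", or "control"
    operation: str
    end_of_packet: bool
    sram: Optional[str]    # "read", "write", or None


_FRAC = "load 16-bit fraction, convert to floating point"
_UPPER = "load upper half of 32-bit word"

_CF_CTL_TABLE = {
    0b000: CfCtlAction("packed", _FRAC, False, None),
    0b001: CfCtlAction("packed", _FRAC, True, None),
    0b010: CfCtlAction("unpacked", _UPPER, False, None),
    0b011: CfCtlAction("unpacked", _UPPER, True, None),
    0b100: CfCtlAction("unpacked", "load lower half of 32-bit word, zero extend",
                       False, None),
    0b101: CfCtlAction("unpacked", _UPPER, True, "read"),
    0b111: CfCtlAction("unpacked", _UPPER, True, "write"),
}


def decode_cf_ctl(ctl, cf_dat0=0):
    """Decode CF_CTL<2:0>; code 110 also depends on CF_DAT<0>."""
    ctl &= 0b111
    if ctl == 0b110:
        op = "reset all LeoFloats" if cf_dat0 & 1 else "interrupt/run"
        return CfCtlAction("control", op, False, None)
    return _CF_CTL_TABLE[ctl]
```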

6.3.1.3 CF_BUF_AVL<3:0>

Each LeoFloat sends a "buffer available" signal to LeoCommand. This signal tells LeoCommand that the input buffer can accept data three phases (half clock cycles) later. In other words, the latency from the "buffer available" signal going active to data appearing on the CF_DAT lines is three phases, or 60 ns at the nominal 40 ns clock period. To hide this latency, LeoFloat should try to predict in advance when its input buffer will become available.

6.3.1.4 CF_LOAD<3:0>

Each LeoFloat receives a "load" signal from LeoCommand. This signal tells LeoFloat to receive the data on the CF_DAT<15:0> lines according to the encoding on the control bits (CF_CTL<2:0>).

6.3.2 CD Bus Interface

The CD Bus interface contains those signals that interface between LeoFloat, LeoCommand, and LeoDraw. Table 6-3 summarizes the output interface signals. See Chapter 4, "CD Bus," for more information on this interface.

Table 6-3 LeoFloat CD Bus Interface Signals

-----------------------------------------------------------------------
Signal Name      No. Pins  I/O  Type       Description
-----------------------------------------------------------------------
CD_DAT<15:0>     16        I/O  Tri-state  LeoFloat data out bus.
FLTn_ST<1:0>     2         O    Bi-state   LeoFloat output section status.
FLT_ENn          1         I    Bi-state   Okay to output control.
-----------------------------------------------------------------------

6.3.2.1 CD_DAT<15:0>

CD_DAT<15:0> is the 16-bit float data output bus. A 32-bit value is output over two clock cycles.

6.3.2.2 FLTn_ST<1:0>

This is a two-bit status output from each LeoFloat, where n identifies the LeoFloat ASIC (0, 1, 2, or 3). It is encoded as follows:

----------------------------------------------
FLTn_ST<1:0>  Meaning
----------------------------------------------
00            Idle
01            No data to output
10            Request to output to LeoDraw
11            Request to output to LeoCommand
----------------------------------------------

6.3.2.3 FLT_ENn

The FLT_ENn signal to each LeoFloat tells that LeoFloat to go ahead and output its data, where n identifies the LeoFloat ASIC (0, 1, 2, or 3).
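The status and enable handshake can be illustrated with a Python sketch. Both request codes (10 and 11) have FLTn_ST<1> set, so a requester can be detected from that bit alone. This section does not say how LeoCommand chooses which FLT_ENn to assert when several LeoFloats request output. The round-robin policy and the function names below are purely hypothetical.

```python
FLT_ST_MEANING = {
    0b00: "idle",
    0b01: "no data to output",
    0b10: "request to output to LeoDraw",
    0b11: "request to output to LeoCommand",
}


def requesting(status):
    """True for codes 10 and 11, i.e. when FLTn_ST<1> is set."""
    return bool(status & 0b10)


def next_grant(statuses, last_granted):
    """HYPOTHETICAL round-robin choice of which FLT_ENn to assert.

    statuses[n] is the 2-bit FLTn_ST of LeoFloat n. Returns the index n
    to enable, or None if no LeoFloat is requesting output.
    """
    count = len(statuses)
    for step in range(1, count + 1):
        n = (last_granted + step) % count
        if requesting(statuses[n]):
            return n
    return None
```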

6.3.3 SRAM Interface

The SRAM interface contains those signals that interface between LeoFloat and the SRAM. Table 6-4 summarizes the SRAM interface signals.

Table 6-4 LeoFloat SRAM Interface Signals

--------------------------------------------------------------------------
Signal Name         No. Pins  I/O  Type       Description
--------------------------------------------------------------------------
LFn_SR_DAT<31:0>    32        I/O  Tri-state  SRAM data bus
Fn_SR_ADR<16:0>     17        O    Bi-state   SRAM address bus
LFn_SR_WE_L         1         O    Bi-state   SRAM read/write enable
LFn_SR_OE_L         1         O    Bi-state   SRAM output enable
--------------------------------------------------------------------------

6.3.3.1 LFn_SR_DAT<31:0>

This is a 32-bit bidirectional SRAM data bus that carries data to or from SRAM, where n identifies the LeoFloat ASIC (0, 1, 2, or 3).

6.3.3.2 Fn_SR_ADR<16:0>

This is a 17-bit output bus that carries the SRAM address, where n identifies the LeoFloat ASIC (0, 1, 2, or 3).

6.3.3.3 LFn_SR_WE_L

This is the SRAM write-enable signal, where n identifies the LeoFloat ASIC (0, 1, 2, or 3). When it is low, the SRAM performs a write cycle; when it is high, the SRAM performs a read cycle.

6.3.3.4 LFn_SR_OE_L

This is the SRAM output enable control signal, where n identifies the LeoFloat ASIC (0, 1, 2, or 3).
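The SRAM bus widths give a quick upper bound on the SRAM each LeoFloat can address: 2^17 = 131,072 words of 32 bits, or 512 KB. This section does not state how much SRAM is actually fitted. The Python sketch below also classifies a cycle from the two active-low strobes. The role assigned to LFn_SR_OE_L follows the usual asynchronous SRAM convention, which is an assumption here. The names are hypothetical.

```python
ADDR_BITS = 17   # Fn_SR_ADR<16:0>
DATA_BITS = 32   # LFn_SR_DAT<31:0>


def sram_capacity_bytes(addr_bits=ADDR_BITS, data_bits=DATA_BITS):
    """Largest SRAM the address and data buses can reach, in bytes."""
    return (1 << addr_bits) * data_bits // 8


def sram_cycle(we_l, oe_l):
    """Classify a cycle from the active-low strobes (typical async SRAM)."""
    if we_l == 0:
        return "write"   # WE_L low: write cycle
    if oe_l == 0:
        return "read"    # WE_L high, OE_L low: SRAM drives the data bus
    return "idle"        # both high: SRAM outputs stay tri-stated
```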

6.3.4 External Interface

The external interface contains the signals required to keep the chips synchronized and to interface with JTAG for test and diagnostics. Table 6-5 summarizes the external interface signals.

Table 6-5 LeoFloat External Interface

-------------------------------------------------------------------------------
Signal Name   No. Pins  I/O  Type      Description
-------------------------------------------------------------------------------
CLK_25M_LFn   1         I    Bi-state  System clock.
LEO_RST_L     1         I    Bi-state  Reset.
TCK           1         I    Bi-state  JTAG test clock. Tied to the system clock.
SCAN_TMS      1         I    Bi-state  JTAG test mode select.
TDI           1         I    Bi-state  JTAG test data in.
FLOATn_TDO    1         O    Bi-state  JTAG test data out.
TEST_OE       1         I    Bi-state  Global test pin to tri-state all output
                                       pins. Normally pulled up.
HRD_INT_L     1         I    Bi-state  Hard interrupt. Normally pulled up.
BURN_IN_L     1         I    Bi-state  Burn in.
-------------------------------------------------------------------------------

6.3.4.1 CLK_25M_LFn

This input signal drives the on-chip clock generator, where n identifies the LeoFloat ASIC (0, 1, 2, or 3). It is a symmetrical clock with a nominal 40 nanosecond period (25 MHz) and is used to derive the chip internal clocks, PHASE_A and PHASE_B.

Figure 6-3 shows the system clock (CLK_25_L) and the two chip internal clocks. The PHASE_A and PHASE_B clocks are non-overlapping, with time Td1 or Td2 between the falling edge of one and the rising edge of the other. LeoFloat is designed with transparent latches that are open during the phase and closed at the falling edge of the phase.
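The clock figures in this chapter can be cross-checked with a short calculation. The 40 ns period and the definition of a phase as a half clock cycle come from the text. Attributing the "50+ megaflop" rating to the floating point multiplier and the floating point ALU each starting one operation per cycle is an assumption, not a breakdown given in this manual.

```latex
\begin{aligned}
f_{\text{clk}} &= \frac{1}{40\ \text{ns}} = 25\ \text{MHz} \\
t_{\text{phase}} &= \frac{40\ \text{ns}}{2} = 20\ \text{ns} \\
3\ \text{phases} &= 3 \times 20\ \text{ns} = 60\ \text{ns} \\
R_{\text{peak}} &\approx 2\ \frac{\text{flop}}{\text{cycle}} \times 25 \times 10^{6}\ \frac{\text{cycles}}{\text{s}} = 50 \times 10^{6}\ \text{flop/s}
\end{aligned}
```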

Figure 6-3 PHASE_A and PHASE_B Clocks

6.3.4.2 LEO_RST_L

This input is a reset signal used to initialize some of the chip internal nodes.

6.3.4.3 TCK_L

A free-running JTAG test clock. It is tied to the system clock, CLK_25M_LFn.

6.3.4.4 SCAN_TMS

A JTAG test mode select signal.

6.3.4.5 TEST_OE

A global test pin driven by the board tester to tri-state all output pins. Normally pulled up.

6.3.4.6 TDI

A JTAG test data input signal. For a description of the scan test in and out signals, see "The JTAG Scan In and Out Signals" on page 2-19.

6.3.4.7 FLOATn_TDO

A JTAG test data out signal.

6.3.4.8 HRD_INT_L

Hard interrupt signal. Normally pulled up.

6.3.4.9 BURN_IN_L

Burn in signal.

6.3.5 Power and Ground

Table 6-6 summarizes the power and ground pins.

Table 6-6 LeoFloat Power and Ground Signals

-------------------------------------------
Signal Name      No. Pins  Description
-------------------------------------------
IF_ORING_VSS     19        Output ring Vss.
IF_ORING_VDD     19        Output ring Vdd.
IF_CORE_VSS      6         Core Vss.
IF_CORE_VDD      6         Core Vdd.
IF_CLK_25_L_VSS  1         Clock Vss.
IF_CLK_25_L_VDD  1         Clock Vdd.
IF_IRING_VSS     2         Input ring Vss.
IF_IRING_VDD     2         Input ring Vdd.
IF_TCK_VSS       1         JTAG test Vss.
IF_TCK_VDD       1         JTAG test Vdd.
-------------------------------------------

6.4 Functional Description

The LeoFloat ASIC is divided into the following functional blocks:

Floating point multiplier

Floating point add/subtract

Integer ALU

Floating point reciprocal

Input section

Output section

Instruction sequencer

Instruction decoder

SRAM interface

6.4.1 Block Diagram

Figure 6-4 shows the LeoFloat ASIC block diagram. The functional blocks in the figure are described in more detail following the block diagram.

Figure 6-4 LeoFloat ASIC Functional Block Diagram

6.4.2 Data Paths

The main group of microprogram instructions is arithmetic. These instructions all have the same source and destination form: three sources (A Bus, B Bus, and C Bus) and one destination (D Bus).

6.4.2.1 A Bus

The A Bus serves as one input leg to the floating point multiplier, as one input leg to the integer ALU, and as one of two possible inputs to one leg of the floating point ALU.

6.4.2.2 B Bus

The B Bus serves as the other input leg to the floating point multiplier, the sole input to the floating point reciprocal unit, and as the other input leg to the integer ALU.

6.4.2.3 C Bus

The C Bus serves solely as the other input leg to the floating point ALU.

6.4.2.4 D Bus

The D Bus carries the result. It also has a bypass path that skips the register files and feeds the result directly to the other buses.

6.4.3 Floating Point Multiplier

The floating point multiplier module multiplies two floating-point operands together. The result is normalized and rounded.

The floating point multiplier operates on data that conforms to the IEEE single precision floating point format. The multiply operations take two cycles, but because the module is pipelined, a new multiply can be initiated every cycle.

The floating point multiplier module consists of a multiplier block, adders, and normalization and rounding circuits.

The mantissa multiplier block is a 26 by 26 two's-complement multiplier array. The multiplier array outputs (sums, carries, and least-significant bits) are latched before the final accumulate. A circuit in parallel with the multiplier array adds the exponents.

The result of the mantissa accumulate is then normalized. If its most-significant bit is one, a one-bit right shifter shifts the result right by one bit and the exponent is incremented; otherwise, the result passes through unshifted. The result of the mantissa accumulate is also rounded.

LeoFloat uses unbiased round-to-nearest mode: if the result is exactly halfway between two representations, it is rounded to the one with the even fraction. This mode of rounding depends on the sticky bit, the guard bit, and the least-significant bit of the significand before rounding.
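A minimal Python sketch of the multiply path just described, under simplifying assumptions. Both operands are normalized float32 bit patterns, overflow and underflow are ignored, and unsigned 24-bit significands stand in for the 26 by 26 two's-complement array. The function names are hypothetical. The sketch restores the hidden bits, adds the exponents, normalizes with at most a one-bit right shift, and rounds to nearest even using the guard bit, the sticky bit, and the least-significant kept bit.

```python
import struct


def f32_bits(x):
    """Bit pattern of x rounded to IEEE single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits):
    """Python float holding the single-precision value of a bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fmul_bits(a, b):
    """Multiply two normalized float32 bit patterns (round to nearest even).

    Simplified sketch: no zero, denorm, Inf, or NaN operands and no
    overflow/underflow handling.
    """
    sign = (a ^ b) >> 31
    ea, eb = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    ma = (a & 0x7FFFFF) | 0x800000      # restore hidden bit: 24-bit significand
    mb = (b & 0x7FFFFF) | 0x800000
    exp = ea + eb - 127                 # add exponents, remove one bias
    prod = ma * mb                      # product lies in [2**46, 2**48)
    if prod >> 47:                      # MSB set: significand is >= 2.0
        shift, exp = 24, exp + 1        # one-bit right normalization
    else:
        shift = 23                      # passed through unshifted
    mant = prod >> shift                # 24 kept bits (hidden bit + fraction)
    guard = (prod >> (shift - 1)) & 1   # first bit below the kept bits
    sticky = (prod & ((1 << (shift - 1)) - 1)) != 0   # OR of all lower bits
    if guard and (sticky or mant & 1):  # nearest; ties go to even
        mant += 1
        if mant >> 24:                  # rounding carried out: renormalize
            mant >>= 1
            exp += 1
    return (sign << 31) | (exp << 23) | (mant & 0x7FFFFF)
```

Because a 24 by 24-bit product fits exactly in a Python float, results can be checked against `struct`'s float32 rounding. For example, `fmul_bits(f32_bits(1.5), f32_bits(2.0)) == f32_bits(3.0)`.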

6.4.3.1 Data Format

Floating point data elements are represented according to the ANSI/IEEE 754-1985 standard. The standard IEEE 32-bit single-precision floating point word, shown below, contains three fields: a sign bit (s), an eight-bit exponent, and a 23-bit fraction field.

------------------------------------------------------------------------------
Bits     Name      Content
------------------------------------------------------------------------------
31       Sign      1 if number is negative.
23-30    Exponent  Eight-bit exponent, biased by 127. Values of all zeros
                   and all ones are reserved.
0-22     Fraction  23-bit fraction component of the normalized significand.
                   The leading "one" bit is hidden.
------------------------------------------------------------------------------

The floating point number is represented by the form:

$$(-1)^{\text{Sign}} \times 2^{(\text{exponent} - \text{bias})} \times 1.f$$

where

1.f is the significand
f is the significand fraction bits
bias is 127

Note that the significand contains a hidden bit, a binary point, and a 23-bit fraction. The hidden bit is inserted during arithmetic processing. It has a value of one for all normalized numbers and zero for a value of zero or infinity. For normalized numbers, the fraction comes from the 23-bit fraction field; for zero and infinity, the fraction is zero. Thus, the value of an IEEE floating point number is determined as follows:

-------------------------------------------------------------------------------------------------
Exponent  Fraction  Value                                                          Description
-------------------------------------------------------------------------------------------------
1 to 254  Any       $(-1)^{\text{Sign}} \times 2^{(\text{exponent} - 127)} \times 1.f$   Normalized number
0         Any       $(-1)^{\text{Sign}} \times 0.0$                                 Zero
255       Any       $(-1)^{\text{Sign}} \times \infty$                              Plus or minus infinity
-------------------------------------------------------------------------------------------------
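The value table above can be read as a small decoder, sketched here in Python. It follows the table literally. An exponent of 0 always reads as a signed zero (so denormalized fractions are ignored), and an exponent of 255 always reads as a signed infinity (so NaN fractions are ignored). The function name is hypothetical.

```python
import math


def decode_leofloat(bits):
    """Value of a 32-bit pattern per the table above (hypothetical helper)."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return sign * 0.0          # zero; a denormal fraction is ignored
    if exponent == 255:
        return sign * math.inf     # infinity; a NaN fraction is ignored
    # Normalized: restore the hidden one bit and remove the bias of 127.
    return sign * 2.0 ** (exponent - 127) * (1 + fraction / 2**23)
```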

Table 6-7 summarizes how LeoFloat floating point multiplication results differ from the IEEE standard. In general, LeoFloat treats an Infinity or a NaN (Not a Number) operand as Infinity, and a Zero or a denormalized (Denorm) operand as Zero.

Table 6-7 Floating Point Multiplication Results

------------------------------------------------------
              Right Operand
Left Operand  NaN   Inf   Norm  Denorm  Zero
------------------------------------------------------
NaN           NaN   NaN   NaN   NaN     NaN
Inf           NaN   NaN   NaN   NaN     NaN
Norm          NaN   NaN   Norm  Zero    Zero
Denorm        NaN   NaN   Zero  Zero    Zero
Zero          NaN   NaN   Zero  Zero    Zero
------------------------------------------------------
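Table 6-7 reduces to two rules: any NaN or Inf operand gives NaN, and only Norm times Norm gives Norm, while every other combination gives Zero. A Python sketch of that lookup follows. It classifies operands from the IEEE bit fields and ignores overflow or underflow of a Norm times Norm product. The function names are hypothetical.

```python
def classify(bits):
    """IEEE single-precision operand class from its exponent and fraction."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 255:
        return "NaN" if fraction else "Inf"
    if exponent == 0:
        return "Denorm" if fraction else "Zero"
    return "Norm"


def mul_result_class(a_bits, b_bits):
    """Result class of a LeoFloat multiply, following Table 6-7."""
    classes = (classify(a_bits), classify(b_bits))
    if "NaN" in classes or "Inf" in classes:
        return "NaN"
    if classes == ("Norm", "Norm"):
        return "Norm"   # overflow/underflow not modeled
    return "Zero"       # any Denorm or Zero operand
```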

6.4.4 Floating Point Add/Subtract

The floating point add/subtract module performs sign-magnitude addition and subtraction. It does so with a two's-complement operation whose result is then converted back to sign-magnitude representation.

The module consists of front-end circuitry that detects exceptional cases such as a zero hidden bit, a shifter that denormalizes the smaller of the two operands, an adder, a leading zero detector (priority encoder), a shifter that renormalizes the result, and circuits that round the result to the nearest representable value. If two representable values are equally near the result, the even one is chosen. Add and subtract operations take three cycles.

The floating point add/subtract module aligns the two operands by shifting the mantissa of the smaller operand to the right. The shifter receives the shift amount from the exponent difference logic, and the larger exponent becomes the common exponent. The smaller mantissa is also sent to the double word shifter to generate the sticky bit, which is used for rounding.

The leading zero detector is used in post-normalization to provide the shifter with a shift amount. The shifter shifts the result left by that amount, and the same amount is subtracted from the common exponent. If the output of the add/subtract module overflows, a one-place right shift normalizes the result and the common exponent is incremented by one.
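The add/subtract steps above (alignment with a sticky bit, add or subtract, leading-zero normalization, one-place right shift on overflow, and round to nearest even) can be followed in a minimal Python sketch. It assumes normalized operands and a normalized result, with no overflow or underflow handling. It keeps three extra bits (guard, round, sticky) below the significand, which is the usual textbook arrangement and not necessarily LeoFloat's exact datapath. The function names are hypothetical.

```python
import struct


def f32_bits(x):
    """Bit pattern of x rounded to IEEE single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits):
    """Python float holding the single-precision value of a bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fadd_bits(a, b):
    """Add two normalized float32 bit patterns (flip b's sign to subtract)."""
    def unpack(x):
        # Hidden bit restored, plus 3 extra low bits: guard, round, sticky.
        return x >> 31, (x >> 23) & 0xFF, ((x & 0x7FFFFF) | 0x800000) << 3

    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    if (ea, ma) < (eb, mb):             # make operand a the larger magnitude
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    d = ea - eb                         # shift amount from exponent difference
    if d:
        sticky = 1 if mb & ((1 << d) - 1) else 0   # any bits shifted out?
        mb = (mb >> d) | sticky         # denormalize smaller operand
    exp = ea                            # larger exponent is the common exponent
    m = ma + mb if sa == sb else ma - mb
    if m == 0:
        return 0                        # exact cancellation gives +0
    if m >> 27:                         # overflow: one-place right shift
        m = (m >> 1) | (m & 1)
        exp += 1
    while not m >> 26:                  # leading-zero normalization
        m <<= 1
        exp -= 1
    grs = m & 0b111                     # guard, round, sticky bits
    m >>= 3
    if grs > 0b100 or (grs == 0b100 and m & 1):   # nearest, ties to even
        m += 1
        if m >> 24:                     # rounding carried out
            m >>= 1
            exp += 1
    return (sa << 31) | (exp << 23) | (m & 0x7FFFFF)
```

For example, 1.0 + 2^-24 is exactly halfway between 1.0 and the next float32, so the tie goes to the even significand and `fadd_bits` returns the bits of 1.0.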

The floating point add/subtract module generates a two-bit floating point condition code (fcc). These codes follow the SPARC model. The bits are set in the state_bits register every cycle and reflect the activity in the floating point ALU exactly three cycles past. The two-bit field has four possible values, as follows:

Equal (e) - indicates whether the floating point add/subtract result was equal.

Less than (l) - indicates whether the floating point add/subtract result was less than.

Greater than (g) - indicates whether the floating point add/subtract result was greater than.

Unordered (u) - indicates whether the floating point add/subtract result was unordered.
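The fcc behavior can be sketched in Python. The code assumes that LeoFloat's two-bit field uses the SPARC V8 encoding (0 = equal, 1 = less, 2 = greater, 3 = unordered). A short delay line models the stated three-cycle lag between the ALU operation and the code seen in state_bits. The names are hypothetical, and because LeoFloat handles NaN operands differently from IEEE, the unordered case is shown only to complete the SPARC encoding.

```python
import math
from collections import deque

# SPARC V8 fcc encoding (assumed to apply to LeoFloat's two-bit field).
FCC_EQUAL, FCC_LESS, FCC_GREATER, FCC_UNORDERED = 0, 1, 2, 3


def fcc(a, b):
    """Two-bit floating point condition code for comparing a with b."""
    if math.isnan(a) or math.isnan(b):
        return FCC_UNORDERED
    if a == b:
        return FCC_EQUAL
    return FCC_LESS if a < b else FCC_GREATER


class DelayedCondition:
    """Condition code seen in state_bits three cycles after the ALU op."""

    def __init__(self, latency=3, initial=0):
        self.pipe = deque([initial] * latency, maxlen=latency)

    def clock(self, new_code):
        visible = self.pipe[0]        # code produced `latency` cycles ago
        self.pipe.append(new_code)    # maxlen discards the oldest entry
        return visible
```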

Table 6-8 summarizes how the LeoFloat floating point addition and subtraction results differ from the IEEE standard.

Table 6-8 Floating Point Addition and Subtraction Results

------------------------------------------------------
              Right Operand
Left Operand  NaN   Inf   Norm  Denorm  Zero
------------------------------------------------------
NaN           NaN   NaN   NaN   NaN     NaN
Inf           NaN   NaN   NaN   NaN     NaN
Norm          NaN   NaN   Norm  Norm    Norm
Denorm        NaN   NaN   Norm  Zero    Zero
Zero          NaN   NaN   Norm  Zero    Zero
------------------------------------------------------

The following table summarizes how the LeoFloat floating point add/subtract infinity results differ from the Sun SPARC standard:

-----------------------------------------------------------------------
Floating ALU  Input        Op  Input        Result       Result
Instruction   A Source         B Source     (LeoFloat)   (SPARC)
-----------------------------------------------------------------------
FADD_PM       7FFx xxxx    -   FF8x xxxx    FFFF FFFF    7FFF xxxx
              7FAx xxxx    -   7FCx xxxx    FFFF FFFF    7FFF xxxx
FADD_MP       -7F8x xxxx   +   7F8x xxxx    FFFF FFFF    7FFF xxxx
              -7FAx xxxx   +   7FCx xxxx    7FFF FFFF    FFFF xxxx
FADD_MM       -7F8x xxxx   -   FF8x xxxx    FFFF FFFF    7FFF xxxx
FADD_PP       FF8x xxxx    +   7F8x xxxx    FFFF FFFF    7FFF xxxx
-----------------------------------------------------------------------

6.4.5 Integer ALU

The integer ALU executes integer ALU instructions. This instruction group includes additions, subtractions, boolean operations, increments, and decrements. The integer ALU operates on the data present on the A bus and B bus. A summary of the integer ALU instructions is given in the LeoFloat Instruction Set Manual.

The integer ALU generates a four-bit integer condition code (icc). These codes follow the SPARC model. The bits are set in the state_bits register every cycle and reflect the activity in the integer ALU exactly three cycles past. These bits are defined as follows:

Negative (n) - indicates whether the integer ALU result was negative (1) or not negative (0).

Zero (z) - indicates whether the integer ALU result was zero (1) or non-zero (0).

Overflow (v) - indicates whether the integer ALU result overflowed (1) or did not overflow (0).

Carry (c) - indicates whether the integer ALU result caused a carry/borrow (1) or not (0).
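The four icc bits can be computed as shown in this Python sketch for add and subtract. It assumes a 32-bit integer ALU (the width is not stated in this section) and standard SPARC V8 semantics, where C on a subtraction means borrow. The function names are hypothetical, and the three-cycle delay before the bits appear in state_bits is not modeled here.

```python
MASK32 = 0xFFFFFFFF


def icc_add(a, b):
    """32-bit add; returns (result, (n, z, v, c)) in SPARC icc style."""
    a, b = a & MASK32, b & MASK32
    total = a + b
    r = total & MASK32
    n = r >> 31
    z = int(r == 0)
    v = (((a ^ r) & (b ^ r)) >> 31) & 1   # operands alike, result sign differs
    c = int(total > MASK32)               # carry out of bit 31
    return r, (n, z, v, c)


def icc_sub(a, b):
    """32-bit subtract a - b; c is the borrow, as in SPARC subcc."""
    a, b = a & MASK32, b & MASK32
    r = (a - b) & MASK32
    n = r >> 31
    z = int(r == 0)
    v = (((a ^ b) & (a ^ r)) >> 31) & 1   # operand signs differ, result sign != a
    c = int(a < b)                        # unsigned borrow
    return r, (n, z, v, c)
```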

6.4.6 Input Section

The LeoFloat input section, shown in Figure 6-5, interfaces with the LeoCommand chip. It receives a 16-bit data input bus (CF_DAT<15:0>) from the LeoCommand chip. The data on the bus can be interpreted in three different ways, depending on the CF Bus control bits (CF_CTL<2:0>), which indicate that the CF Bus carries one of the following:

16-bit fixed-point fraction, which has to be converted to a 32-bit IEEE floating point number

Upper half of a 32-bit floating point number

Lower half of a 32-bit floating point number

The CF_CTL bits are also used to initiate the SRAM block load/store instruction.

The input section contains an input data path that converts the CF Bus data to 32-bit IEEE floating point format. The input data can be passed to the internal LeoFloat registers in two different ways:

Through the 16 × 32 conversion module, for packed format

Through the H register and L register, for unpacked format

The input section also contains two input buffers, a counter, a flip-flop, and a shadow I0 register. The flip-flop keeps track of which of the two input buffers (I buffer) LeoCommand is writing to. The counter counts the number of words in the buffer currently being written to.

The input section is controlled by the input state machine. Depending on the state of the CF_CTL bits from LeoCommand and NEED_I and FREE_I from microcode, the state machine generates CF_BUF_AVL to LeoCommand and I_READY back to microcode. The state machine also generates control signals to all registers and multiplexers in the input section and keeps track of switching the I register bank for LeoCommand use.

After LEO_RST is issued, both input buffers are free. The state machine issues CF_BUF_AVL and informs the microcode that no buffer is ready for use. When LeoFloat receives CF_LOAD, it loads the first buffer until it decodes the end of packet (CF_CTL<0> = 1). The buffer is then switched and the I_READY signal is issued.

Figure 6-5 LeoFloat Input Section Block Diagram

After receiving the next CF_LOAD, the state machine starts loading the other buffer and sets CF_BUF_AVL inactive. If at any time the microcode sends FREE_I, the buffer currently in use is freed for LeoCommand and CF_BUF_AVL is issued.
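The double-buffer sequence just described can be traced with a small behavioral model in Python. This is a hypothetical sketch, not the actual state machine. Only the signal names (CF_LOAD, CF_BUF_AVL, I_READY, NEED_I, FREE_I) come from the manual. The three buffer states, the read pointer, and the error checks are modeling assumptions.

```python
class InputSection:
    """HYPOTHETICAL behavioral model of the double-buffered input section.

    Each of the two I buffers is 'free', 'loading', or 'full'.
    """

    def __init__(self):
        self.reset()

    def reset(self):
        # After reset, both input buffers are free.
        self.state = ["free", "free"]
        self.words = [[], []]
        self.write_buf = 0    # flip-flop: buffer LeoCommand is writing
        self.read_buf = 0     # next buffer to hand to the microcode
        self.count = 0        # words in the buffer being written
        self.current = None   # buffer the microcode is using

    @property
    def cf_buf_avl(self):
        # CF_BUF_AVL: some buffer can accept a new packet.
        return "free" in self.state

    @property
    def i_ready(self):
        # I_READY: the next buffer is full and not yet in use.
        return self.state[self.read_buf] == "full" and self.read_buf != self.current

    def cf_load(self, word, end_of_packet):
        buf = self.write_buf
        if self.state[buf] == "full":
            raise RuntimeError("CF_LOAD with no free input buffer")
        if self.state[buf] == "free":
            self.state[buf] = "loading"
            self.words[buf] = []
            self.count = 0
        self.words[buf].append(word & 0xFFFF)
        self.count += 1
        if end_of_packet:               # CF_CTL<0> = 1
            self.state[buf] = "full"
            self.write_buf ^= 1         # switch buffers

    def need_i(self):
        # NEED_I: the microcode takes the next full buffer.
        if not self.i_ready:
            raise RuntimeError("NEED_I while I_READY is inactive")
        self.current = self.read_buf
        self.read_buf ^= 1
        return list(self.words[self.current])

    def free_i(self):
        # FREE_I: the microcode releases its buffer back to LeoCommand.
        if self.current is not None:
            self.state[self.current] = "free"
            self.current = None
```

In this model, CF_BUF_AVL stays active until both buffers are occupied, which matches the text: it drops when loading of the second buffer begins and returns when FREE_I releases a buffer.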

The interface to the rest of LeoFloat is given below (all signals are active high):

----------------------------------------------------------------------
Signal         I/O  Stable    Description
----------------------------------------------------------------------
ADDR<5'>       In   Stable_B  Input register address given by microcode
I_DATA<32'>    Out  Stable_A  Input data
I0_DATA<32'>   Out  Stable_B  Copy of data in I0 register
SRAM_WR        Out  Stable_A  Indicates current packet is SRAM write
SRAM_RD        Out  Stable_A  Indicates current packet is SRAM read
I_READY        Out  Stable_A  Indicates that another buffer is ready
NEED_I         In   Stable_A  Microcode requests next buffer
FREE_I         In   Stable_A  Microcode releases current buffer
INTR           Out  Stable_A  Interrupt from LeoCommand
----------------------------------------------------------------------

As far as the microcode is concerned, the input buffer has two states: buffer not allocated or buffer allocated. The NEED_I and FREE_I signals should be active for a single cycle. The NEED_I signal is ignored if a buffer is not available to be allocated. The FREE_I signal is ignored if a buffer is not currently allocated.

The I_READY signal indicates that the next buffer is available. If a buffer is currently allocated to microcode, I_READY indicates that the next buffer is ready for use. If a buffer is not currently allocated, I_READY indicates that a buffer can be allocated.
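The handshake described above can be modeled as a small state machine. The following is a minimal sketch, not the hardware's actual logic: the type input_fsm and the functions packet_loaded, need_i and free_i are hypothetical names. It assumes NEED_I is only meaningful when no buffer is currently allocated (the sketch ignores it otherwise), because the text does not say what happens on NEED_I while a buffer is still held.

```c
#include <assert.h>

/* Hypothetical model of the microcode-visible input-buffer handshake.
   Two I buffers exist; LeoCommand fills free buffers, microcode
   allocates a filled one with NEED_I and releases it with FREE_I. */
enum { NUM_I_BUFFERS = 2 };

typedef struct {
    int filled;     /* complete packets waiting, not yet allocated */
    int allocated;  /* 1 while microcode owns a buffer */
} input_fsm;

/* CF_BUF_AVL: a buffer is free for LeoCommand to write. */
int cf_buf_avl(const input_fsm *s)
{
    return s->filled + s->allocated < NUM_I_BUFFERS;
}

/* I_READY: a complete packet is waiting to be allocated. */
int i_ready(const input_fsm *s)
{
    return s->filled > 0;
}

/* LeoCommand finished a packet (end of packet decoded). */
void packet_loaded(input_fsm *s)
{
    assert(cf_buf_avl(s));
    s->filled++;
}

/* Single-cycle NEED_I pulse: ignored when no buffer can be allocated.
   Assumption: also ignored while a buffer is already allocated. */
void need_i(input_fsm *s)
{
    if (!i_ready(s) || s->allocated)
        return;
    s->filled--;
    s->allocated = 1;
}

/* Single-cycle FREE_I pulse: ignored when no buffer is allocated. */
void free_i(input_fsm *s)
{
    if (!s->allocated)
        return;
    s->allocated = 0;
}
```

Starting from the reset state {0, 0}, cf_buf_avl is true and i_ready is false, which matches the state after LEO_RST described earlier.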

Packet lengths from 1 to 32 words are handled correctly, and any sequence of input data formats is allowed. A low data word followed by a high data word is placed in the same register. A high data word followed by a low data word is placed in successive registers. If a high data word is not preceded by a low data word, the lower 16 bits of that register are undefined. If a low data word is not followed by a high data word, the upper 16 bits of that register are undefined. All other sequences of data word formats are placed in different registers.
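The pairing rules can be illustrated with a short C sketch. The word_kind tags, the in_word type and pack_packet are hypothetical; the sketch only reproduces the placement rules stated above and records, in valid[], which bits of each register were actually written (the hardware leaves the other bits undefined; the sketch zeroes them).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tags for the three input word formats. */
typedef enum { WORD_FULL, WORD_LOW, WORD_HIGH } word_kind;

typedef struct {
    word_kind kind;
    uint32_t  value;  /* all 32 bits for WORD_FULL, low 16 bits otherwise */
} in_word;

/* Places one packet into I registers. valid[r] marks the bits of regs[r]
   that were written. Returns the number of registers used. */
int pack_packet(const in_word *w, int n, uint32_t regs[32], uint32_t valid[32])
{
    assert(n >= 1 && n <= 32);
    int r = 0;            /* register currently being filled */
    int low_pending = 0;  /* regs[r] holds a low half awaiting a high half */

    for (int i = 0; i < n; i++) {
        uint32_t half = w[i].value & 0xFFFFu;
        switch (w[i].kind) {
        case WORD_LOW:
            if (low_pending)
                r++;                 /* previous low had no high partner */
            regs[r] = half;
            valid[r] = 0x0000FFFFu;
            low_pending = 1;
            break;
        case WORD_HIGH:
            if (!low_pending) {      /* high not preceded by low */
                regs[r] = 0;
                valid[r] = 0;
            }
            regs[r] |= half << 16;
            valid[r] |= 0xFFFF0000u;
            r++;
            low_pending = 0;
            break;
        case WORD_FULL:
            if (low_pending)
                r++;
            regs[r] = w[i].value;
            valid[r] = 0xFFFFFFFFu;
            r++;
            low_pending = 0;
            break;
        }
    }
    return r + low_pending;
}
```

Because every word occupies at most one new register, a packet of at most 32 words never needs more than the 32 registers of one I bank.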

The currently allocated buffer is tagged with the SRAM_WR, SRAM_RD, and I0_DATA values. These signals are valid only when a buffer is allocated to microcode.

The INTR signal indicates that LeoCommand has sent an interrupt. It is active for one cycle. An interrupt condition from LeoCommand is ignored by the rest of the input buffer logic.

6.4.7 Register Files

LeoFloat contains 256 32-bit floating point (or integer) registers. They are arranged into four register groups: I, O, R, and P. The I and O register groups consist of 64 registers each, configured as double-buffered, 32 registers per buffer. The R and P groups are single-buffered. There are 64 R and 96 P registers.

The register address space is as shown in Figure 6-6.

The I, O, and P register groups each contain one write port (port A) and one read port (port B). The I registers are written only with data coming from LeoCommand, and microcode can only read them through the I read port. The O registers are written only by microcode, and LeoCommand and LeoDraw can only read them through the read port. The P registers are general-purpose registers for algorithmic use; both their read and write ports are accessible to microcode.

The R register group contains one write port (port A) and two read ports (ports B and C).

The register files are described in more detail below.

6.4.7.1 I Registers

The 64 I registers are for holding input parameters. At any one time, only half of these registers are visible to the programmer, as registers I0 through I31. The other 32 registers are available for use as an input FIFO for the next command.

The I registers are read-only to the microcode. Commands up to 32 parameters in length are placed into one bank of I registers by the input FIFO hardware. When the programmer asks for the next I register bank, the microsequencer waits if the next command is not yet complete in the alternate I register bank.

Figure 6-6 LeoFloat Register Files

The programmer can free an I register bank early, as soon as it has been processed, even if command processing is not yet complete. This releases the I register bank for FIFO use. In effect, LeoFloat's input is almost triple-buffered.

By convention, the command operation code is placed in the least-significant five to nine bits of I0. A special command crack instruction can automatically dispatch to a microcode jump table based on these bits.
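The opcode extraction behind a command crack can be sketched in C. This is an illustration only: crack_opcode, dispatch and handler_fn are hypothetical names, and the real crack instruction operates on microcode addresses rather than C function pointers. A 9-bit opcode indexes a table of 512 entries, which matches the size of the dispatch table in the SRAM map.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*handler_fn)(void);

/* Hypothetical command crack: keep the low opcode_bits (5 to 9) of I0. */
unsigned crack_opcode(uint32_t i0, unsigned opcode_bits)
{
    assert(opcode_bits >= 5 && opcode_bits <= 9);
    return i0 & ((1u << opcode_bits) - 1u);
}

/* Jump through a table holding 1 << opcode_bits handlers. */
void dispatch(uint32_t i0, unsigned opcode_bits, handler_fn table[])
{
    table[crack_opcode(i0, opcode_bits)]();
}
```

For example, with a 9-bit opcode, an I0 value of 0x1234 selects table entry 0x034.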

6.4.7.2 O Registers

The 64 O registers are for holding output parameters. At any one time, only half of these registers are visible to the programmer, as registers O0 through O31. The other 32 registers are available for use as an output FIFO for the last command.

The O registers are write-only to the microcode. Output commands up to 32 parameters in length are read out of one bank of O registers by the output FIFO hardware. When the programmer asks for the next O register bank, the microsequencer waits if the previous (two) output commands have not yet drained from the alternate O register bank. The programmer can request an O register bank late, in the middle of processing a command, which leaves the bank free for FIFO use until the last moment. In effect, LeoFloat's output is almost triple-buffered.

6.4.7.3 R Registers

The 64 R registers are completely general purpose registers for algorithmic use. Under some conditions, only half of the R registers are available for use within commands. By software convention, R0 is the throwaway destination, and should never be used to hold values to be read later.

6.4.7.4 P Registers

The 96 P registers are dedicated for use as floating point multiplication constants (viewing matrices, screen coordinate conversions, etc.). The first 32 P registers are usually treated as read-only during most processing, although they are writable when the swap_pp bit in the state_bits register is set to 1. The second and third groups of 32 P registers are used as general temporaries as well as additional multiplication constants.

6.4.7.5 Clip Bits Register

The clip bits register is a 32-bit clip status and revision register. The first 30 bits of the register are updated two bits at a time by one of two specialized clip-test instructions (one for vectors, one for triangles). Its state is tested by a number of specialized branch instructions. The remaining two bits are for the chip revision number.

6.4.7.6 State Bits Register

The state bits register is a 32-bit condition code and status register containing several specialized flags and state bits. Its contents are modified as a side effect of several instructions and are set explicitly by two state update instructions.

6.4.7.7 PC Register

The PC register is the 16-bit program counter.

6.4.7.8 PC Stack

PC stack is a hardware stack of eight 16-bit program counters for subroutine call/return.

6.4.8 Instruction Sequencer

The instruction sequencer consists of the instruction buffer, block load/store execution unit, and branch execution unit to execute instructions.

6.4.8.1 Instruction Buffer

The instruction buffer is a 32-bit register. The instruction buffer is loaded from the SRAM every cycle except when LeoFloat needs to be halted due to an interlock (during non-microcode instruction execution). The register contents are decoded to generate control signals all over the chip.

6.4.8.2 Block Load/Store Execution Unit

The block load/store execution unit is initiated by either soft- or hard-coded instructions. The soft-coded instruction is microcode, while the hard-coded instruction is generated by LeoCommand through the CF Bus. Data can be stored into SRAM from any of the register file groups except the O registers.

Data read from SRAM can be written into any register file groups except the I registers. The block load/store execution unit pushes Program Counter (PC) onto the stack at the beginning of the execution. It pops PC from the stack at the end of instruction execution. The following summarizes the block load/store instruction execution.

Soft Block Load/Store

The soft block load instruction loads 1 to 160 registers (R and P and I files) from SRAM. It loads registers R[rs] through R[re] from SRAM starting at location (R[ra] + offset), where ra is the register file address, rs is the register file start address, and re is the register file end address.

The soft block store instruction stores 1 to 160 registers (R and P and I files) to SRAM. It stores registers R[rs] through R[re] to SRAM starting at location (R[ra] + offset), where ra, rs, and re are as defined for the soft block load.
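The address arithmetic of the soft block load and store can be sketched as follows. This is a simplified model under stated assumptions: the register file is treated as one flat array indexed by register address, block_load and block_store are hypothetical names, the base address R[ra] + offset is assumed to be computed once before any register is written, and the PC push/pop performed by the execution unit is not modeled.

```c
#include <assert.h>
#include <stdint.h>

enum { SRAM_WORDS = 65536 };   /* LeoFloat addresses 64K SRAM words */

/* regs[rs..re] <- sram[regs[ra] + offset ...] */
void block_load(uint32_t *regs, const uint32_t *sram,
                unsigned ra, uint32_t offset, unsigned rs, unsigned re)
{
    assert(rs <= re && re - rs + 1u <= 160u);
    uint32_t base = regs[ra] + offset;      /* assumed: read before any write */
    assert(base + (re - rs) < SRAM_WORDS);
    for (unsigned r = rs; r <= re; r++)
        regs[r] = sram[base + (r - rs)];
}

/* sram[regs[ra] + offset ...] <- regs[rs..re] */
void block_store(const uint32_t *regs, uint32_t *sram,
                 unsigned ra, uint32_t offset, unsigned rs, unsigned re)
{
    assert(rs <= re && re - rs + 1u <= 160u);
    uint32_t base = regs[ra] + offset;
    assert(base + (re - rs) < SRAM_WORDS);
    for (unsigned r = rs; r <= re; r++)
        sram[base + (r - rs)] = regs[r];
}
```

Register rs always maps to the base SRAM address and each following register to the next word, so a block of k registers occupies k consecutive SRAM locations.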

Hard Block Load/Store

The hard block load instruction loads 1 to 160 registers (R, P, and O files) from SRAM. It loads registers R[start-reg-addr] through R[stop-reg-addr] from SRAM location R[SRAM-start-address], where SRAM-start-address, start-reg-addr, and stop-reg-addr are fields in the shadow I0 register. The I0 register is shown below.

The hard block store instruction stores 1 to 160 registers (I, R, and P files) to SRAM. Block store from registers R[start-reg-addr] - R[stop-reg-addr] to SRAM location R[SRAM-start-address].

6.4.8.3 Branch Execution Unit

The branch execution unit executes the branch instructions and loads the program counter (PC) with an absolute 16-bit address specified within the instruction. The hardware consists of a PC register, a PC incrementer, and an eight-entry subroutine stack ring buffer.
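An eight-entry ring-buffer stack for 16-bit return addresses can be sketched as below. The names pc_stack, pc_push and pc_pop are hypothetical, and the overflow behavior is an assumption: a ring buffer without overflow checks silently overwrites its oldest entry, and this sketch does exactly that; the text does not state how LeoFloat handles more than eight nested calls.

```c
#include <assert.h>
#include <stdint.h>

/* Eight-entry ring-buffer PC stack (hypothetical model). */
typedef struct {
    uint16_t entry[8];
    unsigned top;      /* index of the next free slot, kept modulo 8 */
} pc_stack;

/* Call: save a return address; a ninth push overwrites the oldest entry. */
void pc_push(pc_stack *s, uint16_t pc)
{
    assert(s->top < 8u);
    s->entry[s->top] = pc;
    s->top = (s->top + 1u) & 7u;
}

/* Return: retrieve the most recent address; underflow is not detected. */
uint16_t pc_pop(pc_stack *s)
{
    s->top = (s->top - 1u) & 7u;
    return s->entry[s->top];
}
```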

6.4.9 Instruction Decoder

The instruction decoder consists of instruction decode logic. The instruction decode logic generates all the strobes required to execute the instruction in the instruction buffer. The instruction decode logic detects interlocks based on the instruction in the instruction buffer (IB) and some signals from state machines (I_BUFFER_AVAILABLE and O_BUFFER_AVAILABLE). If an interlock is detected, the next PC and the IB value are not loaded.

6.4.10 Floating Point Reciprocal Unit

The Floating Point Reciprocal Unit, shown in Figure 6-7, computes the inverse of the value present on the B bus only. The reciprocal unit cannot be automatically started every cycle like the other units because it is not pipelined.

To compute a floating point reciprocal, the value to be reciprocated is placed on the B bus at time t and an ALU op-code is chosen that will start off the reciprocal. At no less than time t+9, the results of the reciprocal may be read out of the reciprocal unit onto the D bus by an instruction that allows the D bus source to be the reciprocal unit.

Unlike fully pipelined function units, the reciprocal unit need not have its results taken at a fixed time after it starts. Any time from t+9 on can be used to read the results as long as any additional reciprocal starts have not yet finished reciprocating. Any such additional starts cannot be issued until time t+7 without trashing the previous reciprocal operation. If a new reciprocal is started at time t, then a previous reciprocal result, if present, must be read out by time t+8.
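The three timing rules above (results readable from t+9, no new start before t+7, and a pending result read no later than 8 cycles after the next start) can be checked mechanically. The following sketch is a hypothetical helper, not part of any LeoFloat tool: start[i] is the cycle at which reciprocal i starts, read[i] the cycle at which its result is read onto the D bus, and starts are assumed to be in increasing order.

```c
#include <assert.h>
#include <stdbool.h>

enum { RECIP_LATENCY = 9, RECIP_MIN_RESTART = 7, RECIP_READ_DEADLINE = 8 };

/* Returns true if the schedule obeys the reciprocal unit timing rules. */
bool recip_schedule_ok(const long *start, const long *read, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (read[i] < start[i] + RECIP_LATENCY)
            return false;                   /* result not ready yet */
        if (i + 1 < n) {
            if (start[i + 1] < start[i] + RECIP_MIN_RESTART)
                return false;               /* would trash reciprocal i */
            if (read[i] > start[i + 1] + RECIP_READ_DEADLINE)
                return false;               /* overwritten by reciprocal i+1 */
        }
    }
    return true;
}
```

With back-to-back starts at cycles 0 and 7, the first result may be read anywhere in cycles 9 through 15.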

Figure 6-7 LeoFloat Floating Point Reciprocal Unit Block Diagram

The reciprocal is computed with a shift-and-subtract algorithm. To complete a reciprocal in nine cycles, each divider step produces two quotient bits instead of one. In this technique, a two-bit shift and the 4R-3D, 4R-2D, and 4R-D operations are performed in parallel (R is the remainder from the previous two-bit step and is less than the divisor D). The smallest non-negative result is selected as the new remainder, and a quotient digit of `11', `10', or `01' is selected accordingly. If all three subtraction results are negative, 4R is passed on as the new remainder and a quotient digit of `00' is generated. Two three-bit dividers are cascaded to generate four quotient bits each cycle.
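The two-bit-per-step selection rule can be expressed as an integer sketch. radix4_divide is a hypothetical function that performs one two-bit step per loop iteration (the hardware cascades two such steps per cycle); it returns floor(n * 4^steps / d) for n < d, which is the quotient-bit string the divider produces for a mantissa ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Radix-4 shift-and-subtract division: floor(n * 4^steps / d), n < d. */
uint64_t radix4_divide(uint64_t n, uint64_t d, int steps)
{
    assert(d != 0 && n < d && d < (1ull << 40) && steps >= 0 && steps <= 24);
    uint64_t r = n;   /* running remainder, always < d */
    uint64_t q = 0;
    for (int i = 0; i < steps; i++) {
        int64_t r4 = (int64_t)(r << 2);        /* two-bit shift */
        int64_t c1 = r4 - (int64_t)d;          /* 4R - D  */
        int64_t c2 = r4 - 2 * (int64_t)d;      /* 4R - 2D */
        int64_t c3 = r4 - 3 * (int64_t)d;      /* 4R - 3D */
        unsigned digit;
        /* c3 < c2 < c1, so the first non-negative one is the smallest. */
        if (c3 >= 0)      { r = (uint64_t)c3; digit = 3; }
        else if (c2 >= 0) { r = (uint64_t)c2; digit = 2; }
        else if (c1 >= 0) { r = (uint64_t)c1; digit = 1; }
        else              { r = (uint64_t)r4; digit = 0; }  /* all negative */
        q = (q << 2) | digit;
    }
    return q;
}
```

For example, radix4_divide(1, 3, 4) produces the digits 01 01 01 01, that is 85, the first eight fraction bits of 1/3.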

There are some special cases that are treated differently. For those cases, the result is not calculated but is substituted in the last cycle. The special cases and the results are:

Unnormalized numbers (exponent = 0, mantissa ≠ 0) - result is infinity (exponent = 255, mantissa = 0)

Reciprocal of 0 - result is infinity

Reciprocal of infinity (exponent = 255, mantissa = any) - result is 0
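The special-case substitution can be sketched on IEEE single-precision bit patterns. recip_special_case is a hypothetical helper; carrying the sign bit through unchanged is an assumption, since the list above only specifies exponent and mantissa.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 and writes the substituted bit pattern to *out when x is one
   of the listed special cases; returns 0 when the divider result is used.
   Assumption: the sign bit of x is preserved. */
int recip_special_case(uint32_t x, uint32_t *out)
{
    assert(out != 0);
    uint32_t sign = x & 0x80000000u;
    uint32_t exponent = (x >> 23) & 0xFFu;
    if (exponent == 0u) {             /* zero or unnormalized: infinity */
        *out = sign | 0x7F800000u;    /* exponent 255, mantissa 0 */
        return 1;
    }
    if (exponent == 255u) {           /* infinity, any mantissa: zero */
        *out = sign;
        return 1;
    }
    return 0;
}
```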

6.4.11 Output Section

The output section, shown in Figure 6-8, interfaces LeoFloat with LeoDraw and LeoCommand via the CD Bus. The output section contains the output register file, an output state machine, a flip-flop, and a counter. The state machine generates control signals for the output section, control signals to the CF Bus (FLT_ST<1:0>), and signals back to microcode. It monitors NEED_O and FREE_O from microcode and FLT_EN from LeoCommand, and it switches the output buffer for microcode use. See Table 4-2 on page 4-6.

LeoCommand receives a steady stream of input packets over the SBus and issues them to any available LeoFloat. LeoCommand remembers the order in which the packets were sent to the LeoFloat chips for processing and passes them to LeoDraw in the same order. The LeoFloat status outputs (FLT_ST<1:0>) provide the information LeoCommand needs to output data from LeoFloat.

Figure 6-8 LeoFloat Output Section Block Diagram

After reset, LeoFloat issues `00' (idle). Before each output transmission, the microcode sets, in the state_bits register, the length of the block to be sent (output_len, in words) and its destination (the output_dst bit). It also sets the null output bit (mcmb bit 3) and the last-output-of-command bit (mcmb bit 4). When the output state machine receives FREE_O from microcode, it starts transmission.

If LeoFloat completes an output packet for LeoDraw, it sends `10' (`11' for LeoCommand) and loads a counter with twice the number of words in the output packet. LeoFloat decrements this counter for each FLT_EN received from LeoCommand. For every data word sent, it sets `10' for LeoDraw and `11' for LeoCommand. When the count drops to four and FLT_EN is active, LeoFloat sends `01' if this is the last output packet or `11' (`10' for LeoCommand) if it is not.

If FLT_EN goes inactive on the next cycle, LeoFloat must continue to send this last word code. Whenever FLT_EN goes inactive in the middle of an output sequence to either LeoDraw or LeoCommand, LeoFloat must not change these two status pins. If FLT_EN stays active, LeoFloat can immediately send `10' (`11') for the next output packet. When the output packet counter drops to zero, the new word count can be loaded.

6.4.12 SRAM Interface

LeoFloat can address 64K words through the SRAM interface. The SRAM uses four 128K by 8 SRAMs with a 20 nanosecond access time. The LeoFloat SRAM interface generates RAM address, data outputs, and output enable signals during read cycles. The interface also generates RAM address, data inputs, and write enable during write cycles. The SRAM chip enable is tied low (enabled) at all times.

The first write takes two cycles. The first cycle is to turn the bus around and the second cycle is the write. The next writes are one cycle each. The last write takes two cycles, one to write and the other to turn the bus around. Reads are one cycle.

Each LeoFloat has four SRAMs, each SRAM containing eight bits of the 32-bit data word.

Figure 6-9 LeoFloat SRAM Interface

6.4.12.1 SRAM Map

Figure 6-10 shows how the SRAM is organized. The SRAM is divided into four areas: dispatch table, microcode, data area, and context block.

The dispatch table takes up the first 512 locations and is the entry point for the transformation of all primitives. The dispatch table is addressed from the dispatch opcode in the CF Bus packet header.

The microcode area holds the actual microprogram instructions.

The data area contains the tables that change the dispatch table to point to the initialization routines for each of the primitives that are not currently in use.

The context block contains context data saved to the host on a "store context" command and restored in LeoFloat by a "load context" command.

Figure 6-10 SRAM Map

6.4.12.2 SRAM Interface Timing

Figure 6-11 shows the SRAM interface read cycle timing. Figure 6-12 shows the write cycle timing. For the write cycle, it is difficult to generate the write enable signal correctly when writing to the SRAM back-to-back. Therefore, a dead cycle is added for every write operation.

Figure 6-11 LeoFloat SRAM Interface Read Cycle Timing

Figure 6-12 LeoFloat SRAM Interface Write Cycle Timing (Write-Read)

6.5 LeoFloat Instruction Execution

The LeoFloat design attains high performance through a simple but rich instruction set and pipelined instruction execution. The design is compact and comprehensible thanks to clearly defined pipelines for each independent unit and comprehensive interface specifications between independent units.

The following discussion uses these names for the LeoFloat pipeline stages:

Instruction fetch (IF), during which the instruction is fetched

Decode (DC), during which the instruction is decoded

Operand (OP), during which the register file is accessed

Logical unit (LU), during which the operation takes place on the operands

Write (WR), during which the actual write to the register file takes place

Instructions are fetched every cycle from the SRAM except in the case of an interlock. An interlock is any event that causes a temporary break in the continuous stream of execution of microcode instructions. All interlocks must be detected before or during the OP stage so that the next PC load may be disabled.

6.5.1 LeoFloat Power Up/Debug Sequence

The LeoFloat power-up sequence is triggered when the active-low LEO_RST pin is asserted. This resets a flip-flop that generates an internal signal, which resets the program counter (PC) to 0 and halts the processor.

The CF_CTL<2:0> bits set to 110, together with CD_DAT<0> = 0, act as an interrupt/run condition, which can be thought of as the interrupt/run signal. The interrupt/run signal starts execution if the processor is halted (after a reset) or vectors the processor to an interrupt if it is already running.

The CF_CTL<2:0> bits set to 110, together with CD_DAT<0> = 1, act as a soft reset. This has the same effect as LEO_RST.

When the processor is halted (after the reset), the interrupt/run causes the processor to start execution at location 0. When the processor is running, the interrupt/run pushes the PC onto the stack and the processor jumps to location 0 and begins normal execution. In effect, this is a forced jump to subroutine at location 0.
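The effect of the CF_CTL<2:0> = 110 command on the processor can be summarized in a small C model. leo_state, leo_reset and cf_ctl_110 are hypothetical names. The return address pushed while running is taken to be the current PC plus one, following the debug sequence below; the stack pointer is assumed to be left unchanged by a reset, since the text does not say otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical processor-state model for the CF_CTL<2:0> = 110 command. */
typedef struct {
    uint16_t pc;
    int      halted;
    uint16_t stack[8];
    unsigned sp;        /* next free stack slot, modulo 8 */
} leo_state;

/* LEO_RST or soft reset: PC to 0 and halt. */
void leo_reset(leo_state *s)
{
    s->pc = 0;
    s->halted = 1;
}

void cf_ctl_110(leo_state *s, unsigned cd_dat0)
{
    assert(s->sp < 8u);
    if (cd_dat0) {                 /* CD_DAT<0> = 1: soft reset */
        leo_reset(s);
    } else if (s->halted) {        /* interrupt/run while halted */
        s->halted = 0;
        s->pc = 0;
    } else {                       /* interrupt/run while running */
        s->stack[s->sp] = (uint16_t)(s->pc + 1u);   /* assumed return address */
        s->sp = (s->sp + 1u) & 7u;
        s->pc = 0;                 /* forced jump to subroutine at 0 */
    }
}
```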

The power up sequence is as follows:

1. The LEO_RST signal resets PC to 0 and halts the processor.

2. The SRAM is loaded with microcode.

3. Interrupt/run is generated to cause the processor to start execution at location 0.

The interrupt/run can be used for debugging microcode. The debug sequence is as follows:

1. Processor is executing.

2. SRAM is loaded with the microcode debug routine starting at location 0.

3. Interrupt/run is generated to push the current PC plus one onto the stack and jump to location 0 and start executing the debug routine.

4. The debug routine examines the stack and working registers and, when done, can resume execution of the original code by doing a return from subroutine.

Note that correct operation of microcode interrupted in this way is not guaranteed, because the state of the various execution pipelines is lost during the interrupt routine.

6.5.2 Execution of Primitive Routines

This section briefly describes the LeoFloat execution of primitive routines for dots, vectors, and triangles. For more information, look at the LeoFloat microcode.

6.5.2.1 Dot Execution

If model clipping is enabled, model clipping rejection is done first. This is done so that if the dot is outside of the model clipping planes, the microprogram exits the dot routine and fetches another primitive.

Following model clipping rejection, the microprogram, using the input registers, transforms the vertex to NPC (Normalized Projection Coordinates) space and performs the view clipping on the vertex. Since a dot is either in or out of the view clipping planes, the trivial rejection test is performed next.

If model clipping is enabled, the microprogram performs the model clipping calculations on the vertex. If lighting is required (dots with normals), the microprogram lights the vertex, then converts it to screen space and performs the perspective division. Otherwise, it converts the vertex to screen space and performs the perspective division directly. The vertex is then sent to LeoDraw.
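The unlit path for a dot (transform, trivial rejection, perspective division, screen mapping) can be sketched in C. Everything here is an assumption for illustration: dot_to_screen and vec3 are hypothetical, the matrix is row-major, the clip volume is taken as -w <= x, y, z <= w, and the screen mapping is a simple width-by-height viewport. The actual NPC conventions and fixed-point output formats are defined by the LeoFloat microcode.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float x, y, z; } vec3;

/* Returns false if the dot is rejected; otherwise writes screen coordinates. */
bool dot_to_screen(const float m[16], vec3 p, float width, float height,
                   vec3 *out)
{
    assert(m != 0 && out != 0);
    float c[4];
    for (int row = 0; row < 4; row++)          /* model -> clip space */
        c[row] = m[row * 4 + 0] * p.x + m[row * 4 + 1] * p.y
               + m[row * 4 + 2] * p.z + m[row * 4 + 3];
    float w = c[3];
    if (w <= 0.0f)
        return false;                          /* degenerate or behind the eye */
    for (int k = 0; k < 3; k++)
        if (c[k] < -w || c[k] > w)
            return false;                      /* trivial rejection */
    float inv_w = 1.0f / w;                    /* perspective division */
    out->x = (c[0] * inv_w + 1.0f) * 0.5f * width;
    out->y = (c[1] * inv_w + 1.0f) * 0.5f * height;
    out->z = c[2] * inv_w;
    return true;
}
```

Because a dot is a single point, one in/out test decides its fate; there is no partial clipping as there is for vectors and triangles.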

The LeoFloat input and output packets are further described below. See Chapter 3, "CF Bus," for more information on the LeoFloat input packets. See Chapter 4, "CD Bus," for more information on the LeoFloat output packets.

Plain Dot Packets

Plain dot input packet. The plain dot input packet consists of a 16-bit header and three 32-bit words, as follows:

------------------------------------------------------------

         
Type     Description
------------------------------------------------------------

         
integer  Input plain dot header (16 bits  -  dispatch = 001)
         
float    Input plain dot X (32 bits)
         
float    Input plain dot Y (32 bits)
         
float    Input plain dot Z (32 bits)

------------------------------------------------------------

Plain dot output packet. The plain dot output packet consists of a 16-bit header and three 32-bit words, as follows:

---------------------------------------------

         
Type     Description
---------------------------------------------

         
integer  Output dot header (16 bits  -  00FF)
         
integer  Output dot X (32 bits)
         
integer  Output dot Y (32 bits)
         
integer  Output dot Z (32 bits)

---------------------------------------------

RGB Dot Packets

RGB dot input packet. The RGB dot input packet consists of a 16-bit header and six words, as follows:

-----------------------------------------------

         
Type     Description
-----------------------------------------------

         
integer  Input RGB dot header (dispatch = 002)
         
float    Input RGB dot X (32-bits)
         
float    Input RGB dot Y (32-bits)
         
float    Input RGB dot Z (32-bits)
         
float    Input RGB dot Red (16 bits)
         
float    Input RGB dot Green (16 bits)
         
float    Input RGB dot Blue (16 bits)

-----------------------------------------------

RGB dot output packet. The RGB dot output packet consists of a 16-bit header and six 32-bit words, as follows:

--------------------------------------

         
Type     Description
--------------------------------------

         
integer  Output RGB dot header (02FF)
         
integer  Output RGB dot X
         
integer  Output RGB dot Y
         
integer  Output RGB dot Z
         
integer  Output RGB dot Red
         
integer  Output RGB dot Green
         
integer  Output RGB dot Blue

--------------------------------------

Normal Dot Packets

Normal dot input packet. The normal dot input packet consists of a 16-bit header and six 32-bit words, as follows:

--------------------------------------------------

         
Type     Description
--------------------------------------------------

         
integer  Input normal dot header (dispatch = 003)
         
float    Input normal dot X
         
float    Input normal dot Y
         
float    Input normal dot Z
         
float    Input normal dot Nx
         
float    Input normal dot Ny
         
float    Input normal dot Nz

--------------------------------------------------

Normal dot output packet. The normal dot output packet consists of a 16-bit header and six 32-bit words, as follows:

----------------------------------

         
Type     Description
----------------------------------

         
integer  Output normal dot header
         
integer  Output normal dot X
         
integer  Output normal dot Y
         
integer  Output normal dot Z
         
integer  Output normal dot Red
         
integer  Output normal dot Green
         
integer  Output normal dot Blue

----------------------------------

RGB Normal Dot Packets

RGB normal dot input packet. The RGB normal dot input packet consists of a 16-bit header and nine words, as follows:

------------------------------------------------------

         
Type     Description
------------------------------------------------------

         
integer  Input RGB normal dot header (dispatch = 004)
         
float    Input RGB normal dot X (32 bits)
         
float    Input RGB normal dot Y (32 bits)
         
float    Input RGB normal dot Z (32 bits)
         
float    Input RGB normal dot Nx (32 bits)
         
float    Input RGB normal dot Ny (32 bits)
         
float    Input RGB normal dot Nz (32 bits)
         
float    Input RGB normal dot Red (16 bits)
         
float    Input RGB normal dot Green (16 bits)
         
float    Input RGB normal dot Blue (16 bits)

------------------------------------------------------

RGB normal dot output packet. The RGB normal dot output packet consists of a 16-bit header and six 32-bit words, as follows:

--------------------------------------

         
Type     Description
--------------------------------------

         
integer  Output RGB normal dot header
         
integer  Output RGB normal dot X
         
integer  Output RGB normal dot Y
         
integer  Output RGB normal dot Z
         
integer  Output RGB normal dot Red
         
integer  Output RGB normal dot Green
         
integer  Output RGB normal dot Blue

--------------------------------------

6.5.2.2 Vectors Execution

If model clipping is enabled, the model clipping rejection algorithm is performed. This is performed in model coordinates (MC) since the model clipping planes are sent to LeoFloat in model coordinates. This speeds up the cases where the vector is outside of the model clip planes and is thrown away.

If the vector has normals, the microcode performs the face determination calculations and sets the "face_we_got" bit in the state_bits register. These calculations are performed in model coordinates. If face culling is enabled and the vertices are facing the wrong direction, the vector is rejected.

The microprogram uses the input registers to transform the two vertices from model coordinates to normalized projection coordinates (NPC). The clip bits are then calculated with the hardware-supported clip_test instruction. At this point, if the vector is outside the viewport clipping planes, it is rejected.

If the primitive has normals, the lighting calculations are performed. Following the lighting calculations (or the generation of the clip codes), a check is made to see whether view clipping or model clipping needs to be done. For view clipping, the hardware clip_bits register is used to determine whether either vertex needs to be clipped against more than one plane. The microprogram then branches to the appropriate clipping code: single-plane clip or multi-plane clip. If necessary, the vertex or vertices are clipped against all six view clipping planes.
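The clip-code decisions described above follow the familiar outcode technique, sketched here in C. The bit layout, the clip volume -w <= x, y, z <= w, and the names outcode and classify_vector are assumptions for illustration; they do not describe the actual layout of the clip_bits register. Following the text, a vector goes to the multi-plane path when either endpoint lies outside more than one plane.

```c
#include <assert.h>

/* Hypothetical one-bit-per-plane outcode. */
enum {
    CLIP_LEFT = 1 << 0, CLIP_RIGHT = 1 << 1,
    CLIP_BOTTOM = 1 << 2, CLIP_TOP = 1 << 3,
    CLIP_NEAR = 1 << 4, CLIP_FAR = 1 << 5
};

unsigned outcode(float x, float y, float z, float w)
{
    unsigned c = 0;
    if (x < -w) c |= CLIP_LEFT;
    if (x >  w) c |= CLIP_RIGHT;
    if (y < -w) c |= CLIP_BOTTOM;
    if (y >  w) c |= CLIP_TOP;
    if (z < -w) c |= CLIP_NEAR;
    if (z >  w) c |= CLIP_FAR;
    return c;
}

static int bit_count(unsigned v)
{
    int n = 0;
    for (; v; v &= v - 1u) n++;   /* clear the lowest set bit each pass */
    return n;
}

typedef enum { VEC_REJECT, VEC_ACCEPT, VEC_SINGLE_PLANE, VEC_MULTI_PLANE } clip_action;

clip_action classify_vector(unsigned c1, unsigned c2)
{
    assert(c1 < 64u && c2 < 64u);
    if (c1 & c2)
        return VEC_REJECT;        /* both endpoints outside the same plane */
    if ((c1 | c2) == 0)
        return VEC_ACCEPT;        /* entirely inside, no clipping needed */
    if (bit_count(c1) > 1 || bit_count(c2) > 1)
        return VEC_MULTI_PLANE;   /* a vertex is outside more than one plane */
    return VEC_SINGLE_PLANE;
}
```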

If model clipping needs to be done, the "W" values are transformed back to model coordinates and model clipping is performed. Once done, the "W" value is restored back to NPC space for the setup calculations.

Now the microprogram is ready to perform the perspective division and convert the vertices from NPC to screen space. Next, the microprogram calculates the values needed in LeoDraw to draw the vector.
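One plausible reading of the plain vector output fields (us, vs, zs, ue, dzDu, dvDu) is a major-axis setup, where u is the screen axis with the larger extent and v the other axis. The sketch below is built on that assumption only: setup_vector and vec_setup are hypothetical, it works in floating point, and it omits the conversion to LeoDraw's fixed-point formats.

```c
#include <assert.h>

typedef struct {
    float us, vs, zs;   /* start point: major (u), minor (v), depth */
    float ue;           /* end coordinate along the major axis */
    float dzDu, dvDu;   /* depth and minor-axis slopes per unit step in u */
    int   x_major;      /* 1 if u is screen x, 0 if u is screen y */
} vec_setup;

static float abs_f(float v) { return v < 0.0f ? -v : v; }

vec_setup setup_vector(float x1, float y1, float z1,
                       float x2, float y2, float z2)
{
    vec_setup s;
    s.x_major = abs_f(x2 - x1) >= abs_f(y2 - y1);
    float u1 = s.x_major ? x1 : y1, v1 = s.x_major ? y1 : x1;
    float u2 = s.x_major ? x2 : y2, v2 = s.x_major ? y2 : x2;
    float du = u2 - u1;
    s.us = u1;
    s.vs = v1;
    s.zs = z1;
    s.ue = u2;
    /* A zero-length vector has no slope; report 0 rather than divide by 0. */
    s.dvDu = (du != 0.0f) ? (v2 - v1) / du : 0.0f;
    s.dzDu = (du != 0.0f) ? (z2 - z1) / du : 0.0f;
    return s;
}
```

Stepping along the major axis guarantees that |dvDu| <= 1, so the rasterizer advances exactly one pixel in u per step.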

Plain Vector Packets

Plain vector input packet. The plain vector input packet consists of a 16-bit header and six words, as follows:

----------------------------------------------------

         
Type     Description
----------------------------------------------------

         
integer  Input plain vector header (dispatch = 005)
         
float    Input plain vector X1
         
float    Input plain vector Y1
         
float    Input plain vector Z1
         
float    Input plain vector X2
         
float    Input plain vector Y2
         
float    Input plain vector Z2

----------------------------------------------------

Plain vector output packet. The plain vector output packet consists of a 16-bit header and six words, as follows:

-------------------------------------------

         
Type     Description
-------------------------------------------

         
integer  Output plain vector header (04FF)
         
integer  Output plain vector us
         
integer  Output plain vector vs
         
integer  Output plain vector zs
         
integer  Output plain vector ue
         
integer  Output plain vector dzDu
         
integer  Output plain vector dvDu

-------------------------------------------

RGB Vector Packets

RGB vector input packet. RGB vector input packets consist of a 16-bit header and 12 words, as follows:

--------------------------------------------------

         
Type     Description
--------------------------------------------------

         
integer  Input RGB vector header (dispatch = 006)
         
float    Input RGB vector X1
         
float    Input RGB vector Y1
         
float    Input RGB vector Z1
         
float    Input RGB vector Red1
         
float    Input RGB vector Green1
         
float    Input RGB vector Blue1
         
float    Input RGB vector X2
         
float    Input RGB vector Y2
         
float    Input RGB vector Z2
         
float    Input RGB vector Red2
         
float    Input RGB vector Green2
         
float    Input RGB vector Blue2

--------------------------------------------------

RGB vector output packet. RGB vector output packets consist of a 16-bit header and 12 words, as follows:

-----------------------------------------

         
Type     Description
-----------------------------------------

         
integer  Output RGB vector header (06FF)
         
integer  Output RGB vector us
         
integer  Output RGB vector vs
         
integer  Output RGB vector zs
         
integer  Output RGB vector Red
         
integer  Output RGB vector Green
         
integer  Output RGB vector Blue
         
integer  Output RGB vector ue
         
integer  Output RGB vector dzDu
         
integer  Output RGB vector drDu
         
integer  Output RGB vector dgDu
         
integer  Output RGB vector dbDu
         
integer  Output RGB vector dvDu

-----------------------------------------

Normal Vector Packets

Normal vector input packet. Normal vector input packets consist of a 16-bit header and 12 words, as follows:

-----------------------------------------------------

         
Type     Description
-----------------------------------------------------

         
integer  Input normal vector header (dispatch = 007)
         
float    Input normal vector X1
         
float    Input normal vector Y1
         
float    Input normal vector Z1
         
float    Input normal vector Nx1
         
float    Input normal vector Ny1
         
float    Input normal vector Nz1
         
float    Input normal vector X2
         
float    Input normal vector Y2
         
float    Input normal vector Z2
         
float    Input normal vector Nx2
         
float    Input normal vector Ny2
         
float    Input normal vector Nz2

-----------------------------------------------------

Normal vector output packet. Normal vector output packets consist of a 16-bit header and 12 words, as follows:

--------------------------------------------

         
Type     Description
--------------------------------------------

         
integer  Output normal vector header (06FF)
         
integer  Output normal vector us
         
integer  Output normal vector vs
         
integer  Output normal vector zs
         
integer  Output normal vector Red
         
integer  Output normal vector Green
         
integer  Output normal vector Blue
         
integer  Output normal vector ue
         
integer  Output normal vector dzDu
         
integer  Output normal vector drDu
         
integer  Output normal vector dgDu
         
integer  Output normal vector dbDu
         
integer  Output normal vector dvDu

--------------------------------------------

Normal RGB Vector Packets

Normal RGB vector input packet. Normal RGB vector input packets consist of a 16-bit header and 18 words, as follows:

---------------------------------------------------------

         
Type     Description
---------------------------------------------------------

         
integer  Input normal RGB vector header (dispatch = 008)
         
float    Input normal RGB vector X1
         
float    Input normal RGB vector Y1
         
float    Input normal RGB vector Z1
         
float    Input normal RGB vector Nx1
         
float    Input normal RGB vector Ny1
         
float    Input normal RGB vector Nz1
         
float    Input normal RGB vector Red1
         
float    Input normal RGB vector Green1
         
float    Input normal RGB vector Blue1
         
float    Input normal RGB vector X2
         
float    Input normal RGB vector Y2
         
float    Input normal RGB vector Z2
         
float    Input normal RGB vector Nx2
         
float    Input normal RGB vector Ny2
         
float    Input normal RGB vector Nz2
         
float    Input normal RGB vector Red2
         
float    Input normal RGB vector Green2
         
float    Input normal RGB vector Blue2

---------------------------------------------------------

Normal RGB vector output packet. Normal RGB vector output packets consist of a 16-bit header and 12 words, as follows:

------------------------------------------------

         
Type     Description
------------------------------------------------

         
integer  Output normal RGB vector header (06FF)
         
integer  Output normal RGB vector us
         
integer  Output normal RGB vector vs
         
integer  Output normal RGB vector zs
         
integer  Output normal RGB vector Red
         
integer  Output normal RGB vector Green
         
integer  Output normal RGB vector Blue
         
integer  Output normal RGB vector ue
         
integer  Output normal RGB vector dzDu
         
integer  Output normal RGB vector drDu
         
integer  Output normal RGB vector dgDu
         
integer  Output normal RGB vector dbDu
         
integer  Output normal RGB vector dvDu

------------------------------------------------

6.5.2.3 Triangles Execution

The triangles microprogram routine sorts the triangle vertices by their Y values, from lowest to highest, swapping them as needed. It then calculates the slopes for the vertex coordinates and colors, the starting point of the triangle, and the distance from the edges.
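The microcode itself is not reproduced here, but the first two setup steps can be sketched in generic scan-conversion terms: a three-element sort on Y, then the change in X per unit step in Y along an edge. This is an illustrative sketch, not LeoFloat's routine; the names are hypothetical, and color slopes would be computed the same way for each color component.

```c
/* Screen-space vertex reduced to the fields the sort needs (hypothetical). */
typedef struct { float x, y; } vtx2;

static void swap_vtx(vtx2 *a, vtx2 *b) { vtx2 t = *a; *a = *b; *b = t; }

/* Order the vertices so that v[0].y <= v[1].y <= v[2].y using a
   three-element sorting network (at most three swaps). */
static void sort_by_y(vtx2 v[3]) {
    if (v[0].y > v[1].y) swap_vtx(&v[0], &v[1]);
    if (v[1].y > v[2].y) swap_vtx(&v[1], &v[2]);
    if (v[0].y > v[1].y) swap_vtx(&v[0], &v[1]);
}

/* Change in x per unit step in y along edge a->b. A horizontal edge
   spans no scan lines, so its slope is reported as 0. */
static float edge_dx_dy(vtx2 a, vtx2 b) {
    float dy = b.y - a.y;
    return dy != 0.0f ? (b.x - a.x) / dy : 0.0f;
}
```

After sorting, the long edge runs from v[0] to v[2], and the two short edges meet at v[1].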

If the clip_pending bit is set in the state_bits register, the microprogram branches to a routine that traverses the clip list to get the three vertices of the next triangle, then performs the perspective calculations and the conversion to screen-space coordinates. This loop continues until all of the triangles in the vertex list have been processed.
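The loop described above can be modeled on the host as follows. This is a minimal sketch under several assumptions: the clip list is a flat array consumed three vertices per triangle, each vertex carries a homogeneous W used for the perspective divide, and screen mapping is a simple viewport scale. The types, `drain_clip_list`, and the `emit` callback (standing in for the setup and output stages) are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } clip_vtx;   /* assumed clip-space vertex */
typedef struct { float x, y, z; } screen_vtx;

typedef struct {
    const clip_vtx *verts;
    size_t count;   /* vertices in the list */
    size_t next;    /* index of the next unread vertex */
} clip_list;

/* Perspective divide, then map [-1, 1] to [0, 2 * half] (assumed viewport). */
static screen_vtx to_screen(clip_vtx v, float half_w, float half_h) {
    float inv_w = 1.0f / v.w;
    screen_vtx s = { (v.x * inv_w + 1.0f) * half_w,
                     (v.y * inv_w + 1.0f) * half_h,
                     v.z * inv_w };
    return s;
}

/* Process triangles while clip_pending is set; clear it once fewer than
   three vertices remain. Returns the number of triangles produced. */
static size_t drain_clip_list(clip_list *list, bool *clip_pending,
                              float half_w, float half_h,
                              void (*emit)(const screen_vtx tri[3])) {
    size_t n = 0;
    while (*clip_pending && list->count - list->next >= 3) {
        screen_vtx tri[3];
        for (int i = 0; i < 3; i++)
            tri[i] = to_screen(list->verts[list->next++], half_w, half_h);
        if (emit) emit(tri);
        n++;
        if (list->count - list->next < 3)
            *clip_pending = false;   /* list exhausted */
    }
    return n;
}
```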

All triangle types produce the same output packet format; only the input packets differ.

RGB Triangle Packets

RGB triangle input packet. The RGB triangle input packet consists of a 16-bit header and 18 words, as follows:

----------------------------------------------------

         
Type     Description
----------------------------------------------------

         
integer  Input RGB triangle header (dispatch = 010)
         
float    Input RGB triangle X1
         
float    Input RGB triangle Y1
         
float    Input RGB triangle Z1
         
float    Input RGB triangle Red1
         
float    Input RGB triangle Green1
         
float    Input RGB triangle Blue1
         
float    Input RGB triangle X2
         
float    Input RGB triangle Y2
         
float    Input RGB triangle Z2
         
float    Input RGB triangle Red2
         
float    Input RGB triangle Green2
         
float    Input RGB triangle Blue2
         
float    Input RGB triangle X3
         
float    Input RGB triangle Y3
         
float    Input RGB triangle Z3
         
float    Input RGB triangle Red3
         
float    Input RGB triangle Green3
         
float    Input RGB triangle Blue3

----------------------------------------------------

Vnormal Triangle Packets

Vnormal triangle input packet. The Vnormal triangle input packet consists of a 16-bit header and 18 words, as follows:

-------------------------------------------------------

         
Type     Description
-------------------------------------------------------

         
integer  Input normal triangle header (dispatch = 009)
         
float    Input normal triangle X1
         
float    Input normal triangle Y1
         
float    Input normal triangle Z1
         
float    Input normal triangle norm X1
         
float    Input normal triangle norm Y1
         
float    Input normal triangle norm Z1
         
float    Input normal triangle X2
         
float    Input normal triangle Y2
         
float    Input normal triangle Z2
         
float    Input normal triangle norm X2
         
float    Input normal triangle norm Y2
         
float    Input normal triangle norm Z2
         
float    Input normal triangle X3
         
float    Input normal triangle Y3
         
float    Input normal triangle Z3
         
float    Input normal triangle norm X3
         
float    Input normal triangle norm Y3
         
float    Input normal triangle norm Z3

-------------------------------------------------------

RGB Vnormal Triangle Packets

RGB Vnormal triangle input packet. The RGB Vnormal triangle input packet consists of a 16-bit header and 27 words, as follows:

-------------------------------------------------------

         
Type     Description
-------------------------------------------------------

         
integer  Input normal triangle header (dispatch = 011)
         
float    Input normal triangle X1
         
float    Input normal triangle Y1
         
float    Input normal triangle Z1
         
float    Input normal triangle norm X1
         
float    Input normal triangle norm Y1
         
float    Input normal triangle norm Z1
         
float    Input normal triangle Red1
         
float    Input normal triangle Green1
         
float    Input normal triangle Blue1
         
float    Input normal triangle X2
         
float    Input normal triangle Y2
         
float    Input normal triangle Z2
         
float    Input normal triangle norm X2
         
float    Input normal triangle norm Y2
         
float    Input normal triangle norm Z2
         
float    Input normal triangle Red2
         
float    Input normal triangle Green2
         
float    Input normal triangle Blue2
         
float    Input normal triangle X3
         
float    Input normal triangle Y3
         
float    Input normal triangle Z3
         
float    Input normal triangle norm X3
         
float    Input normal triangle norm Y3
         
float    Input normal triangle norm Z3
         
float    Input normal triangle Red3
         
float    Input normal triangle Green3
         
float    Input normal triangle Blue3

-------------------------------------------------------

Facet Normal Triangle Packets

Facet normal triangle input packet. The facet normal triangle input packets consist of a 16-bit header and 21 words, as follows:

-------------------------------------------------------

         
Type     Description
-------------------------------------------------------

         
integer  Input normal triangle header (dispatch = 009)
         
float    Input normal triangle X1
         
float    Input normal triangle Y1
         
float    Input normal triangle Z1
         
float    Input normal triangle norm X1
         
float    Input normal triangle norm Y1
         
float    Input normal triangle norm Z1
         
float    Input normal triangle X2
         
float    Input normal triangle Y2
         
float    Input normal triangle Z2
         
float    Input normal triangle norm X2
         
float    Input normal triangle norm Y2
         
float    Input normal triangle norm Z2
         
float    Input normal triangle X3
         
float    Input normal triangle Y3
         
float    Input normal triangle Z3
         
float    Input normal triangle norm X3
         
float    Input normal triangle norm Y3
         
float    Input normal triangle norm Z3
         
float    Input facet normal X
         
float    Input facet normal Y
         
float    Input facet normal Z

-------------------------------------------------------

RGB Fnormal Triangle Packets

RGB Fnormal triangle input packet. The RGB Fnormal triangle input packets consist of a 16-bit header and 30 words, as follows:

-------------------------------------------------------

         
Type     Description
-------------------------------------------------------

         
integer  Input normal triangle header (dispatch = 010)
         
float    Input normal triangle X1
         
float    Input normal triangle Y1
         
float    Input normal triangle Z1
         
float    Input normal triangle norm X1
         
float    Input normal triangle norm Y1
         
float    Input normal triangle norm Z1
         
float    Input normal triangle Red1
         
float    Input normal triangle Green1
         
float    Input normal triangle Blue1
         
float    Input normal triangle X2
         
float    Input normal triangle Y2
         
float    Input normal triangle Z2
         
float    Input normal triangle norm X2
         
float    Input normal triangle norm Y2
         
float    Input normal triangle norm Z2
         
float    Input normal triangle Red2
         
float    Input normal triangle Green2
         
float    Input normal triangle Blue2
         
float    Input normal triangle X3
         
float    Input normal triangle Y3
         
float    Input normal triangle Z3
         
float    Input normal triangle norm X3
         
float    Input normal triangle norm Y3
         
float    Input normal triangle norm Z3
         
float    Input normal triangle Red3
         
float    Input normal triangle Green3
         
float    Input normal triangle Blue3
         
float    Input RGB facet normal X
         
float    Input RGB facet normal Y
         
float    Input RGB facet normal Z

-------------------------------------------------------
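The payload sizes in the input tables follow one rule. Each vertex carries X, Y, and Z, plus three normal components and/or three color components, and a facet-normal triangle appends one shared three-word normal. The helper below encodes that rule so the table sizes can be checked. The function name is hypothetical, and "word" means one 32-bit IEEE float.

```c
#include <stdbool.h>

/* Payload words that follow the 16-bit header of an input packet. */
static unsigned payload_words(unsigned vertices, bool vertex_normal,
                              bool color, bool facet_normal) {
    unsigned per_vertex = 3u                         /* X, Y, Z */
                        + (vertex_normal ? 3u : 0u)  /* Nx, Ny, Nz */
                        + (color ? 3u : 0u);         /* Red, Green, Blue */
    return vertices * per_vertex + (facet_normal ? 3u : 0u);
}
```

For example, an RGB Fnormal triangle has three vertices with normals and colors plus a facet normal: 3 × 9 + 3 = 30 words, matching the table above.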

Triangle Output Packets

The triangle output packets consist of a 16-bit header and 21 words, as follows:

---------------------------------------

         
Type     Description
---------------------------------------

         
integer  Output triangle header (08FF)
         
integer  Output triangle xs
         
integer  Output triangle xe2
         
integer  Output triangle zs
         
integer  Output triangle rs
         
integer  Output triangle gs
         
integer  Output triangle bs
         
integer  Output triangle xe
         
integer  Output triangle dzDu
         
integer  Output triangle drDu
         
integer  Output triangle dgDu
         
integer  Output triangle dbDu
         
integer  Output triangle ys
         
integer  Output triangle count12
         
integer  Output triangle count13
         
integer  Output triangle dxsDv
         
integer  Output triangle dxeDv
         
integer  Output triangle dxe2Dv
         
integer  Output triangle dzDv
         
integer  Output triangle drDv
         
integer  Output triangle dgDv
         
integer  Output triangle dbDv

---------------------------------------
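Because every triangle input type produces this same output packet, a single host-side struct can describe it. The sketch below lists the 21 fields in table order after the 08FF header. It assumes each "integer" field is one 32-bit word and keeps the values as raw fixed-point integers, since their formats are not specified in this section. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Triangle output fields in table order, after the 08FF header. */
typedef struct {
    int32_t xs, xe2, zs;
    int32_t rs, gs, bs;
    int32_t xe;
    int32_t dzDu, drDu, dgDu, dbDu;
    int32_t ys;
    int32_t count12, count13;
    int32_t dxsDv, dxeDv, dxe2Dv;
    int32_t dzDv, drDv, dgDv, dbDv;
} triangle_output;

static_assert(sizeof(triangle_output) == 21 * sizeof(int32_t),
              "triangle output payload must be 21 words");

/* Unpack 21 words, already read from the output FIFO, into the struct. */
static triangle_output unpack_triangle(const uint32_t words[21]) {
    triangle_output t;
    memcpy(&t, words, sizeof t);
    return t;
}
```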