2 Functional Description

This chapter provides a functional description, from a programmer's point of view, of the main components of the Leo Graphics Accelerator.

2.1 Introduction

As shown in Figure 2-1, Leo consists of the following sections:

LeoCommand, which acts as the system controller between the host SBus and all the Leo devices.

Floating point transform, which converts model space dots, vectors, and triangles into screen space fixed point rendering parameters.

Render, which receives the render parameters for dots, vectors, and triangles and converts them into pixel operations into the frame buffer.

Frame buffer, which consists of 1280 by 1024 pixels by 96 planes.

LeoCross, which provides pixel multiplexing and color look-up tables (LUTs).

RAMDAC, a digital-to-analog converter that generates an analog video signal to drive a CRT monitor.

Boot PROM, which is compliant with the Open Boot PROM specification.

Figure 2-1 Leo Block Diagram

Programming for these components consists of retrieving status from registers located at various addresses, sending commands to registers located at various addresses, and sending data to appropriate memory locations. The Command and Status Registers for each of these components are described in Chapters 5 through 8 in an easy-to-reference format.

2.2 LeoCommand

The LeoCommand chip is the system controller for Leo. All Leo memory and devices are memory-mapped through LeoCommand, so all data passed between Leo and the host passes through LeoCommand. LeoCommand is functionally partitioned into a direct port and an accelerator port. LeoCommand also controls the three internal buses: CFBus, CDBus, and CXBus.

LeoCommand performs data conversion operations, such as converting chained vector and triangle data into isolated vectors or triangles. LeoCommand then converts the numerous vertex data formats into isolated triangle and line commands for transmission to one of the LeoFloat chips in the floating point transform section.

LeoCommand has access to the state of each LeoFloat chip to determine which are busy and which are idle. LeoCommand sends one primitive at a time to an available LeoFloat chip.

2.2.1 Host Interface

Leo is an SBus slave and a DVMA master with the ability to generate interrupts on the SBus. The slave interface provides direct host access to Leo memory and devices, such as control and status registers. The DVMA accesses are used for fetching graphics data from the display list in system virtual memory, along with moving raster data between the frame buffer and system virtual memory.

The host interface includes an interrupt ability that allows Leo to inform the host of various conditions.

The host loads and reads Leo Control and Status Registers (CSR) in programmed I/O mode. The Leo device registers and other storage locations are memory mapped in the Leo address space. In the programmed I/O mode, Leo is an SBus slave.

After the proper control registers are loaded, Leo reads graphics data from the host memory in DMA mode. In the DMA mode, Leo is an SBus master.

2.2.1.1 Leo Input Control in DMA Mode

In DMA Mode, LeoCommand fetches data as long as there is space in the Bucket Buffer to accommodate the data. When enough data has been assembled in the Bucket Buffer to complete a packet (as indicated by the Input Packet Size field) and there is enough room in the Vertex Buffer, LeoCommand converts the data to the proper format and places it in the Vertex Buffer.

2.2.1.2 Leo Input Control in Immediate Mode

Immediate Mode differs from DMA mode only in the way the Bucket Buffer is filled. In this case, the Relative Bucket Buffer is address mapped into the SBus address space occupied by Leo. The SPARC process polls LeoCommand until enough space is available in the Bucket Buffer to accommodate the packet size. It then writes each word of a packet, except the last word, to this buffer. The software writes the last word in the packet to the Launch Relative Bucket Buffer address space, which launches the packet. The Launch Relative Bucket Buffer is just another mapping of the Bucket Buffer. There is also an Absolute Bucket Buffer for diagnostic use.

There are two ways to tell if there is enough room to write another packet to the Bucket Buffer. The first way is to check bit 9 (Bucket Buffer Space Available Status) of the "Accelerator Port Status" register (see page 5-76) to see if there is enough space for the next packet. The size of the next packet is specified in bits 16 through 12 of the "Vertex Mode Control" register. The second way is to check bits 4 through 0 (Bucket Buffer Words Available) of the "Accelerator Port Status" register to see how many words may be written.
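The two checks can be sketched in Python as follows. This is a minimal sketch, assuming the register contents have already been read from the board as 32-bit integers; the function names are hypothetical, and the register addresses (given in Chapter 5) are not shown.

```python
def space_for_next_packet(accel_port_status: int) -> bool:
    """Bit 9 of "Accelerator Port Status": Bucket Buffer Space Available Status."""
    return bool((accel_port_status >> 9) & 0x1)


def bucket_words_available(accel_port_status: int) -> int:
    """Bits 4:0 of "Accelerator Port Status": Bucket Buffer Words Available."""
    return accel_port_status & 0x1F


def next_packet_size(vertex_mode_control: int) -> int:
    """Bits 16:12 of "Vertex Mode Control": size of the next packet (raw field)."""
    return (vertex_mode_control >> 12) & 0x1F
```

A caller would poll one of the first two functions before writing the words of a packet to the Relative Bucket Buffer.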

2.2.1.3 Raster Copy Command

The raster copy command copies data between the SPARC main memory and the frame buffer using DMA mode. The data is actually sent over the direct port.

2.2.1.4 Interrupts

Status conditions within Leo can be enabled to generate SBus interrupts. The conditions that can cause an interrupt are:

Slave rerun time-out. A slave access caused more reruns than prescribed.

Slave illegal address. An attempted access has been made to an unspecified location in the address map, a write has been made to a read-only location, or a read is made to a write-only location.

DMA error acknowledge. An error acknowledgment has been received on an SBus transaction.

Invalid Page Table Entry or Page Table Descriptor. An invalid Page Table Entry (PTE) or Page Table Descriptor (PTD) is found during a table walk (see Figure 2-3 on page 2-13).

Write DMA done. The specified amount of data has been DMA'd.

Read DMA done. The read DMA operation is done.

2.2.1.5 Command Simplification

Leo supports a rich set of XGL commands. Most of these commands are processed by the host SPARC processor so that only a few simple commands are passed to the LeoCommand chip:

Attribute setting commands, such as primitive color, line width, and transformation matrix

Drawing commands, such as dot, vector, and triangle

There are many variations of the three drawing commands. For example, a triangle vertex may contain information on the vertex position, vertex color, vertex normal, and facet normal. This information may be sent to Leo in various orders and with various data formats. This data is stored in the LeoCommand chip Vertex Buffer.

The LeoCommand chip then converts these various types of commands into just three triangle types for the floating point processor. This is done by using the VSC Opcode register to specify the proper order, and conversion of each data format to the desired format.

The proper format for a triangle that is to be sent to the LeoFloat chips is:

First vertex

VPx, VPy, VPz
VNx, VNy, VNz (optional)
VCr, VCg, VCb (optional)

Second vertex

VPx, VPy, VPz
VNx, VNy, VNz (optional)
VCr, VCg, VCb (optional)

Third vertex

VPx, VPy, VPz
VNx, VNy, VNz (optional)
VCr, VCg, VCb (optional)
FNx, FNy, FNz (optional)

where:

VP = Vertex Position
VN = Vertex Normal
VC = Vertex Color
FN = Facet Normal
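The ordering above can be illustrated with a short Python sketch that flattens one triangle into the sequence of values listed. The function names and the tuple layout of each vertex are hypothetical; the sketch shows only the ordering of the fields, not the register-level conversion that LeoCommand performs.

```python
def vertex_words(vp, vn=None, vc=None):
    """Return one vertex as a flat list: position, then optional normal and color."""
    words = list(vp)
    if vn is not None:
        words += list(vn)
    if vc is not None:
        words += list(vc)
    return words


def triangle_words(vertices, facet_normal=None):
    """Flatten three (VP, VN, VC) tuples, then the optional facet normal."""
    if len(vertices) != 3:
        raise ValueError("a triangle needs exactly three vertices")
    words = []
    for vp, vn, vc in vertices:
        words += vertex_words(vp, vn, vc)
    if facet_normal is not None:
        # The facet normal follows the third vertex.
        words += list(facet_normal)
    return words
```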

2.2.2 Data Paths

As shown in Figure 2-2, Leo uses two data paths to process graphics and text data stored in the frame buffer: the direct port and the accelerator port.

Figure 2-2 Leo Data Paths

2.2.2.1 Direct Port

The direct port is the data path for direct access to most Leo devices. This port is mainly used by the window system and as a path for loading complex primitives (derived by the host in pixel form) into the frame buffer. The host can read or write to the frame buffer contents directly using the SBus interface slave mode. There are facilities to improve the performance of the window system operation in this path, such as support for rendering text, area move, and fill.

There are two types of direct port commands:

Memory access commands. These are pixel read and write, stencil, direct port register load, and all memory mapped accesses.

Block commands. These are block fill and move, and vertical scroll. The software must check for completion of a previous command before starting another one. Furthermore, no memory access commands can be executed while block commands are being executed.

The direct port commands are:

Vertical scroll. This command includes the source and destination rectangle addresses. Each LeoDraw chip moves a single pixel in its own memory bank, resulting in a five-pixel vertical scroll except for the beginning and ending conditions.

Block move. This command includes the source and destination rectangle addresses. LeoCommand parses this command to a single pixel read followed immediately by a single pixel write. In effect, LeoCommand raster scans the source rectangle into the destination rectangle a single pixel at a time.

Block fill. This command includes the destination rectangle addresses. LeoCommand parses the rectangle to five pixel write constant commands, which are aligned to the memory interleave boundaries and correspond to contiguous horizontal pixels. Each LeoDraw writes a single pixel in its own memory bank, resulting in a five-pixel constant fill, except for the beginning and ending conditions.

Data and mask write. This command is also referred to as a one-to-N bit ROP. An address and a 32-bit (or 8-bit or 16-bit) mask are sent to LeoCommand. The mask represents 32 contiguous, horizontal pixels starting at the address. For each bit that is a 1, a foreground color is written to the frame buffer. For each bit that is a 0, either a background color is written or the pixel is left as is.

Byte packed access. This command is used on 8-bit pixel color memories to read and write four pixels packed into a 32-bit word. Only the host has access to this capability. For writes, the address corresponding to the data is sent in parallel with the data. The four pixels are written into four consecutive horizontal pixel locations starting at the address sent. For reads, the address is sent causing the data to be read from the frame buffer and packed together into a 32-bit word, which is sent back to the host.
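The effect of the data and mask write command described above can be modeled in a few lines of Python. This is a sketch only: the function name is hypothetical, a scan line is represented as a Python list, and the mapping of the most-significant mask bit to the pixel at the starting address is an assumption that this chapter does not confirm.

```python
def apply_mask_write(row, x, mask, width, fg, bg=None):
    """Expand a 1-bit-per-pixel mask into pixel writes on one scan line.

    row   -- list of pixel values, modified in place
    x     -- starting pixel address within the row
    width -- mask width in bits (8, 16, or 32)
    bg    -- background color, or None to leave 0-bit pixels as they are
    """
    for i in range(width):
        # Assumption: the most-significant mask bit is the leftmost pixel.
        bit = (mask >> (width - 1 - i)) & 1
        if bit:
            row[x + i] = fg
        elif bg is not None:
            row[x + i] = bg
    return row
```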

2.2.2.2 Accelerator Port

The accelerator port is the data path that renders 3D primitives that the Leo accelerator is optimized to accelerate. The primitives accelerated include dots, lines, and triangles. There are several versions of each of these primitives, allowing various data formats for each primitive. For example, chained or isolated data, color present or not, packed or floating point normals and colors, and so on. The primitives are transferred using vertices or pass-through packets containing graphics parameters.

Leo provides reset and mode controls for context switching. Leo also transmits maskable interrupts to the host, which then reads the Leo status flags.

LeoCommand executes accelerator port commands in two modes:

DMA mode

Immediate mode

In DMA mode, LeoCommand fetches its own accelerator port data and places the data in a register, known as the Bucket Buffer. In Immediate Mode, the SPARC process builds a command in the LeoCommand Bucket Buffer and then gives LeoCommand a "go ahead" signal.

LeoCommand then converts the format of the input data to a standard format used by the LeoFloat chips. This data is placed in another register, known as the Vertex Buffer.

2.2.3 Hardware States

There are three classes of hardware states:

State Set Zero. Also called the direct port state. This state configures the render pipeline to render simple raster primitives.

State Set One. Also called the accelerator port state. This state configures the floating point transform pipeline and the render pipeline to render both geometric primitives and raster primitives.

Global State. The kernel writes this state during the boot sequence and the video state. The video state includes the window table, color table, and video table, which are located in the video output section.

2.2.3.1 Context Switching

Both State Set Zero and State Set One have a map to the frame buffer. There is also a map for both the State Set Zero registers and the State Set One registers.

Multiple processes can map the state sets. However, the context switch software uses the page fault feature to schedule just one process for each state set. Thus the context switch software validates the pages of only one process at a time for a particular state set. If the kernel schedules another process that attempts to access that state set, the access page faults. The kernel then saves the board states of the first process, invalidates its pages, restores the board state of the second process, and validates its pages. Thus, only one process owns a state set at a time.
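The ownership rule can be modeled with a small Python sketch. The class and attribute names are hypothetical, and the board state is reduced to a single opaque value; the sketch illustrates only the save, invalidate, restore, and validate sequence, not the actual kernel code.

```python
class StateSet:
    """Toy model of one state set that only one process may own at a time."""

    def __init__(self, name):
        self.name = name
        self.owner = None        # process whose pages are currently valid
        self.board_state = None  # state currently loaded in the hardware
        self.saved = {}          # board state saved for each suspended process

    def access(self, pid):
        """Model an access by pid; return True if it page faulted."""
        if self.owner == pid:
            return False  # pages are valid, so the access proceeds directly
        # Page fault: save the state of the current owner, if any, and
        # invalidate its pages.
        if self.owner is not None:
            self.saved[self.owner] = self.board_state
        # Restore the state of the faulting process and validate its pages.
        self.board_state = self.saved.get(pid)
        self.owner = pid
        return True
```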

The contents of the state sets are not the same. There are registers in State Set Zero that are not in State Set One, and there are registers in State Set One that are not in State Set Zero. There are three classes of pages for each state set:

Pages with features that are common to both state sets. Although there are different pages of these registers for each state set, the addresses of the registers within the page are the same.

Pages with features that are unique to each state set. A process controlling one state set cannot access registers that belong to the other state set.

Pages with hidden pipeline registers. This pipeline state is visible only to the kernel process. The context switch software can access this state. The diagnostic software also can access the pipeline registers.

The Leo philosophy is to keep copies of all graphics attributes on the host SPARC, and to store only one context at a time in Leo. However, some long commands (such as draw dot, vector, or triangle) need to be able to be interrupted, swapped-out, and restarted.

All graphics commands are assembled into host-readable LeoCommand registers before they are sent to the floating point transform or render sections. This allows the context swapping at partial command boundaries. For example, suppose the host must context switch due to a page miss after reading half of a triangle vertex. In this case, the host reads out the LeoCommand context into a storage area, waits for the render pipe to drain, then overwrites LeoCommand with the context data of the next Leo process to be run. When the original Leo process is to be continued, the host rewrites the context into LeoCommand, and Leo continues from where it left off.

2.2.3.2 Software Access

There are four classes of software that can access Leo:

Window process software. The window process maps State Set Zero. All window clients funnel their access to the Leo board through the window process. The window clients do not map the hardware.

XGL process software. The XGL process links with the XGL library. The XGL library must first request access from the window process. Once the window process allocates access to the board state, the XGL library maps either State Set Zero or State Set One. Some libraries map both state sets.

Kernel process software. The kernel process maps State Set Zero, State Set One, and the global state. The global state includes the board state, such as the window table and color table, which the kernel software must synchronize with the video hardware. The context switch software, which resides in the kernel, maps both state sets.

2.2.4 DMA Read Operations

A Memory Management Unit (MMU) in LeoCommand, in combination with the Leo-specific MMU driver and the operating system segment driver, provides memory management for DMA read operations. The DMA read operations transfer vertex data from the host system memory to the LeoCommand Bucket Buffer.

The MMU uses physical DMA on Sun4m and Sun4d platforms. However, it uses programmed I/O on the SPARCstation 1 and 2, because these workstations do not support physical DMA.

2.2.4.1 Sun4m Platforms

When an application mmaps the Leo accelerator, the Leo segment driver creates a tree structure of up to three levels of translation tables in the host system memory and loads the base address of the Level 1 Table into the "Table Walk Root Pointer" register. The translation tables (see Figure 2-3) are used to convert a 32-bit virtual address to a physical address. This allows support for sparse address space.

The physical address consists of a physical page number (also called a Page Frame) and a Page Offset into the page. The physical address size is 36 bits for SPARCstation 10, SPARCserver 1000, and SPARCcenter 2000. It is 31 bits for SPARCstation LX.

Note that the pages have a fixed size of 4,096 (4K) bytes and are aligned to 4K byte boundaries. Thus, the lower 12 bits of the virtual address do not need translation; they are the same as the lower 12 bits of the physical address.

The application first checks the "Read DMA Busy" bit (bit 2) of the "DMA Status" register (see page 5-17) to make sure that the DMA Read hardware is idle. If it is idle, the virtual address is loaded into the "DMA Read Virtual Address" register.

DMA Read Virtual Address Format:

Figure 2-3 Table Walking of Host Translation Tables

The application loads the number of words to be transferred into one of the two DMA Read Word Count registers. One of these registers, "DMA Read Word Count, Start DMA Read" starts the DMA read. If the count is loaded into the other register, "DMA Read Word Count, Do Not Start DMA," the application starts the DMA read by writing 0x0000 0001 to the "DMA Read On/Off" register. This starts the walking of the Translation Tables in main memory.
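The start-up sequence can be sketched in Python as follows. This is a minimal sketch, assuming a hypothetical register file modeled as a dictionary keyed by the register names used in this section; real code would access the memory-mapped addresses given in Chapter 5.

```python
READ_DMA_BUSY = 1 << 2  # "Read DMA Busy" is bit 2 of the "DMA Status" register


def start_dma_read(regs, virtual_address, word_count, start_immediately=True):
    """Program a DMA read; return False if the DMA read hardware is busy."""
    if regs["DMA Status"] & READ_DMA_BUSY:
        return False
    regs["DMA Read Virtual Address"] = virtual_address & 0xFFFFFFFF
    if start_immediately:
        # Writing the count to this register also starts the DMA read.
        regs["DMA Read Word Count, Start DMA Read"] = word_count
    else:
        # Load the count first, then start the read explicitly.
        regs["DMA Read Word Count, Do Not Start DMA"] = word_count
        regs["DMA Read On/Off"] = 0x00000001
    return True
```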

The address of the word selected in the Level 1 Table is equal to the Level 1 Base Address from the "Table Walk Root Pointer" register plus the Level 1 Offset from the "DMA Read Virtual Address" register. The format of the address that LeoCommand places on the SBus data bus is:

The selected table entry may be a Page Table Descriptor (PTD) or a Page Table Entry (PTE). The PTD provides the base address to select the next-level table. The PTE provides the Page Frame for the physical address. In either case, the entry is stored in the "DMA Read PTE/PTD" register. The type of table entry in a Table is defined by bits 1:0 of the word as follows:

--------------------------------------
Bits 1:0  Entry Type
--------------------------------------
00        Invalid
01        Page Table Descriptor (PTD)
10        Page Table Entry (PTE)
11        Reserved
--------------------------------------

Page Table Entry (PTE) Format:

Page Table Descriptor (PTD) Format:

If the Level 1 Page Table contains a PTE, the table walk is done. In this case, the physical address corresponding to the virtual address is created using the Page Frame from the PTE and the Page Offset from the virtual address.

Physical Address Format:

This physical address is the first read address for the DMA read operation.

If the Level 1 Table contains a PTD, the table walk continues with the Level 2 Table. In this case, LeoCommand places the following address on the SBus data bus:

Then, if the Level 2 Table contains a PTE, the table walk is done. Otherwise, the table walk continues with the Level 3 Table, which always contains a PTE. In this case, LeoCommand places the following address on the SBus data bus:

Note that the Page Table Offset is 8 bits wide for Level 1 and 6 bits wide for Level 2 and Level 3.
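The table walk can be modeled with the Python sketch below. It rests on several assumptions that this section does not confirm: the virtual address fields are taken in the order Level 1, Level 2, Level 3, Page Offset from the most-significant bit down; each table is modeled as a dictionary of (entry type, payload) pairs rather than as 32-bit words, because the PTE and PTD bit layouts are not reproduced here; and every PTE is treated as mapping one 4K page regardless of the level at which it is found. All names are hypothetical.

```python
INVALID, PTD, PTE, RESERVED = 0b00, 0b01, 0b10, 0b11  # bits 1:0 of an entry
PAGE_SHIFT = 12  # 4K pages: the low 12 bits are not translated


def split_virtual_address(va):
    """Return (Level 1 offset, Level 2 offset, Level 3 offset, Page Offset)."""
    return ((va >> 24) & 0xFF, (va >> 18) & 0x3F, (va >> 12) & 0x3F, va & 0xFFF)


def table_walk(root_table, va):
    """Translate va to a physical address, or raise LookupError."""
    l1, l2, l3, offset = split_virtual_address(va)
    table = root_table
    for level, index in enumerate((l1, l2, l3), start=1):
        kind, payload = table.get(index, (INVALID, None))
        if kind == PTE:
            # payload is the Page Frame; append the untranslated Page Offset.
            return (payload << PAGE_SHIFT) | offset
        if kind == PTD and level < 3:
            table = payload  # payload is the next-level table
            continue
        # Invalid or reserved entry, or a PTD in the Level 3 Table.
        raise LookupError(f"invalid entry at level {level}")
```

In the hardware, the failure case corresponds to the "Invalid PTE/PTD" interrupt rather than to an exception.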

If LeoCommand detects an error during a table walk, the "Invalid PTE/PTD" bit (bit 2) is set in the "SBus Status" register (see page 5-7) and an interrupt is sent to the segment driver. The "DMA Read PTE/PTD" register contains the invalid PTE/PTD entry. The driver contains an interrupt service routine that loads the valid entries into the corresponding table.

2.2.4.2 Other Platforms

On platforms that do not support the physical DMA, such as SPARCstation 2, the application loads the data directly to Leo.

2.3 CFBus

LeoCommand sends commands to the LeoFloat chips over the CFBus in packets. Each packet consists of a 16-bit header followed by a variable number (1 to 31) of 32-bit data words. The header defines the type of packet. The 32-bit data words are sent 16 bits at a time over the 16-bit CF Data Bus. Data can be sent as a 16-bit fraction. Primitives and passthrough behave exactly the same on the CFBus.

There are three main packet types: graphics primitive packets, pass-through packets, and SRAM packets. The graphics primitive packets include dot, vector, and triangle packets, which are composed of vertices. The pass-through packets pass attribute commands to the LeoFloat chips. SRAM packets are written to or read from LeoFloat SRAM.

2.3.1 Data Format

There are four data types sent across the bus: header, fixed-point fraction, floating-point number, and pass-through packet.

2.3.1.1 Header

The header is the first word in a graphics primitive packet. It is followed by one, two, or three vertices. The header contains three fields, as shown below. The first field, bit 15, is always zero. The other two fields are described below.

The Vertex Header field contains two types of bits:

Highlight edge bits - Rendered with edge color if the corresponding bit is on (and the edge mode is active).

Hollow edge bits - Rendered in hollow triangles (with lighted colors) if the corresponding bit is on.

The vertex header field is coded as follows:

Table 2-1 Vertex Header Field

----------------------------------------------------
Bit  Meaning
----------------------------------------------------
14   Hollow edge between vertices 1 and 2
13   Hollow edge between vertices 3 and 1
12   Hollow edge between vertices 2 and 3
11   Draw highlighted edge between vertices 1 and 2
10   Draw highlighted edge between vertices 3 and 1
9    Draw highlighted edge between vertices 2 and 3
----------------------------------------------------
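The bit assignments of the Vertex Header field can be decoded with a short Python sketch. The function and table names are hypothetical; the sketch looks only at bits 15 through 9 and ignores the remaining header bits.

```python
VERTEX_HEADER_BITS = {
    14: "hollow edge 1-2",
    13: "hollow edge 3-1",
    12: "hollow edge 2-3",
    11: "highlighted edge 1-2",
    10: "highlighted edge 3-1",
    9: "highlighted edge 2-3",
}


def decode_vertex_header(header):
    """Return the names of the edge flags set in a 16-bit packet header."""
    if header & 0x8000:
        raise ValueError("bit 15 of the header must be zero")
    return [name for bit, name in VERTEX_HEADER_BITS.items() if header & (1 << bit)]
```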

The Dispatch Opcode field is the dispatch address to the microcode in the LeoFloat chip. This is known as the dispatch table. The first part of the dispatch table is used for the geometry functions: dots, vectors, and triangles. The second part of the dispatch table is for the attributes.

2.3.1.2 Fixed-point Fraction

The fixed-point fraction, shown below, is a sign magnitude 16-bit fraction, used for colors and normals. The binary point is just to the right of the sign bit. LeoFloat converts this to a 32-bit floating-point word.
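The sign-magnitude format can be illustrated with a pair of Python conversion functions. The names are hypothetical. Treating bit 15 as the sign and the remaining 15 bits as the fraction follows from the description above; clamping values of magnitude 1.0 or more to the largest representable fraction is an assumption made for this sketch.

```python
def fraction_to_float(word):
    """Convert a 16-bit sign-magnitude fraction to a Python float."""
    sign = -1.0 if word & 0x8000 else 1.0
    return sign * (word & 0x7FFF) / 32768.0


def float_to_fraction(value):
    """Convert a float in (-1.0, 1.0) to a 16-bit sign-magnitude fraction."""
    # Assumption: out-of-range magnitudes are clamped to 0x7FFF.
    magnitude = min(int(round(abs(value) * 32768)), 0x7FFF)
    return (0x8000 if value < 0 else 0) | magnitude
```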

2.3.1.3 IEEE 32-bit Floating Point Number

The 32-bit floating-point number, shown below, is the way most vertices are presented to LeoFloat. LeoCommand converts the 32-bit floating-point number to two 16-bit words before sending the data across the bus. LeoFloat reassembles the two 16-bit words into a 32-bit floating-point number. The least-significant 16 bits are sent on the first clock cycle, and the most-significant 16 bits are sent on the second clock cycle.
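The split into two 16-bit words, least-significant half first, can be sketched in Python with the standard struct module. The function names are hypothetical; the sketch models only the word ordering, not the bus timing.

```python
import struct


def float_to_cf_halves(value):
    """Split an IEEE 32-bit float into two 16-bit words, low half first."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return [bits & 0xFFFF, bits >> 16]


def cf_halves_to_float(halves):
    """Reassemble two 16-bit words (low half first) into a float."""
    low, high = halves
    (value,) = struct.unpack(">f", struct.pack(">I", (high << 16) | low))
    return value
```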

2.3.1.4 Pass-Through Packet

The pass-through packet sends non-interpreted words to LeoFloat. The pass-through packets contain from 2 to 32 32-bit words.

2.3.2 Vertex Format

The graphics primitive packets contain vertex data. There are four types of vertex data: plain, RGB, normal, and RGB normal.

2.3.2.1 Plain Vertex Data

Plain vertex data consists of three 32-bit floating point numbers, as follows:

X 32-bit IEEE floating point number
Y 32-bit IEEE floating point number
Z 32-bit IEEE floating point number

2.3.2.2 RGB Vertex Data

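RGB vertex data consist of six floating point numbers or fixed-point fractions, as follows:

X 32-bit IEEE floating point number
Y 32-bit IEEE floating point number
Z 32-bit IEEE floating point number
R 32-bit IEEE floating point number or 16-bit fraction
G 32-bit IEEE floating point number or 16-bit fraction
B 32-bit IEEE floating point number or 16-bit fraction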

2.3.2.3 Normal Vertex Data

Normal vertex data consist of six floating point numbers or fixed-point fractions, as follows:

X 32-bit IEEE floating point number
Y 32-bit IEEE floating point number
Z 32-bit IEEE floating point number
Nx 32-bit IEEE floating point number or 16-bit fraction
Ny 32-bit IEEE floating point number or 16-bit fraction
Nz 32-bit IEEE floating point number or 16-bit fraction

2.3.2.4 RGB Normal Vertex Data

RGB normal vertex data consist of nine floating point numbers or fixed-point fractions, as follows:

X 32-bit IEEE floating point number
Y 32-bit IEEE floating point number
Z 32-bit IEEE floating point number
R 32-bit IEEE floating point number or 16-bit fraction
G 32-bit IEEE floating point number or 16-bit fraction
B 32-bit IEEE floating point number or 16-bit fraction
Nx 32-bit IEEE floating point number or 16-bit fraction
Ny 32-bit IEEE floating point number or 16-bit fraction
Nz 32-bit IEEE floating point number or 16-bit fraction

2.4 Floating Point Transform

The Floating Point Transform section consists of four LeoFloat chips and static RAM (SRAM). Each LeoFloat is a floating point microprocessor with optimized instructions for graphics. The LeoFloat chips execute microcode out of their dedicated SRAM. Each LeoFloat has a 128K by 32-bit SRAM. The LeoFloat chips are connected in a parallel configuration for performance enhancement.

LeoCommand has access to the state of each LeoFloat to determine which are busy and which can take more commands. LeoCommand sends accelerator port primitives to an available LeoFloat, which processes the primitive and signals LeoCommand when it has results available. LeoCommand then enables the results to be sent from LeoFloat to the Render section over the CDBus.

LeoFloat converts the individual dot, vector, and triangle from 3D model coordinates to 3D world coordinates to 3D device coordinates (frame buffer location and Z-buffer values). LeoFloat also performs lighting calculations that result in three floating-point values: the red, green, and blue values for each vertex. These values are a function of the color and surface properties of the triangle, the position of the lights, and the angle at which the light hits the surface. LeoFloat also clips the image to fit the window.

LeoFloat processes drawing commands, attribute commands, and pass-through commands.

2.5 CDBus

The CDBus connects the LeoCommand, LeoFloat, and LeoDraw chips. As with the CFBus, the CDBus transfers are in the form of packets. There are two packet formats: accelerator port packets and direct port packets. Each packet consists of a 16-bit header followed by a variable number of 16- or 32-bit data words. The header defines the type of packet.

2.5.1 Accelerator Port Packets

Accelerator port packets are sent from LeoFloat to LeoDraw. These packets have a header followed by the data in the packet. The header identifies the type of packet and is encoded according to the hexadecimal values in Table 2-2.

Table 2-2 Accelerator Port Header Format

------------------------------------------------
Header  Packet Type
------------------------------------------------
00FF    Dot command
02FF    RGB dot command
04FF    Vector (x_major) command
05FF    Vector (y_major) command
06FF    RGB vector (x_major) command
07FF    RGB vector (y_major) command
08FF    Triangle (dec_x) command
09FF    Triangle (inc_x) command
0CFF    Write accelerator port register command
0EFF    Fast clear command
0FFF    Raster write command
------------------------------------------------

The packet formats for each command are shown below.

2.5.1.1 Dot Command

The dot command packet contains a 16-bit header and three 32-bit words, one each for x, y, and z, as shown below.

2.5.1.2 RGB Dot Command

The RGB dot command contains a 16-bit header and six 32-bit words, one each for x, y, z, r, g, and b, as shown below.

2.5.1.3 Vector Command

The vector command contains a 16-bit header and six 32-bit words, as shown below.

2.5.1.4 RGB Vector Command

The RGB vector command contains a 16-bit header and 12 32-bit words, as shown below.

2.5.1.5 Triangle Command

The triangle command contains a 16-bit header and 21 32-bit words, as shown below.

2.5.1.6 Write Accelerator Port State Register Command

The write accelerator port state register command contains a 16-bit header and two 32-bit data words, as shown below.

2.5.1.7 Fast Clear Command

The fast clear command contains a 16-bit header and three 32-bit data words, as shown below.

2.5.1.8 Raster Write Command

The raster write command contains a 16-bit header and seven 32-bit data words, as shown below.
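The header values in Table 2-2 and the word counts given in Sections 2.5.1.1 through 2.5.1.8 can be gathered into one lookup table, as in the Python sketch below. The names are hypothetical, and the sketch assumes that the x_major and y_major vector variants, and the dec_x and inc_x triangle variants, have the same length as the single size stated for each command.

```python
# header value -> (packet type, number of 32-bit words after the header)
ACCEL_PACKETS = {
    0x00FF: ("dot", 3),
    0x02FF: ("RGB dot", 6),
    0x04FF: ("vector (x_major)", 6),
    0x05FF: ("vector (y_major)", 6),
    0x06FF: ("RGB vector (x_major)", 12),
    0x07FF: ("RGB vector (y_major)", 12),
    0x08FF: ("triangle (dec_x)", 21),
    0x09FF: ("triangle (inc_x)", 21),
    0x0CFF: ("write accelerator port register", 2),
    0x0EFF: ("fast clear", 3),
    0x0FFF: ("raster write", 7),
}


def packet_bits(header):
    """Total packet size in bits: a 16-bit header plus the 32-bit words."""
    _name, words = ACCEL_PACKETS[header]
    return 16 + 32 * words
```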

2.5.2 Direct Port Packets

The direct port packets are sent from LeoCommand to LeoDraw. A two-character hex code in bits 15 through 11 of the header identifies the packet type, as listed in Table 2-3. The header for direct port packets also contains command information.

Table 2-3 Direct Port Header Format

----------------------------------
Header    Packet Type
<15:11>
----------------------------------
00        Stencil write command
01        Pixel read command
02        Pixel write command
03        Byte read command
04        Byte write command
05        LeoDraw read command
06        LeoDraw write command
07        Vertical scroll command
08        Blt read command
09        Blt readx command
0A        Blt writexread command
0B        Block fill command
0C - 1F   Not used
----------------------------------
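Extracting the packet type from a direct port header amounts to reading bits 15 through 11, as the following Python sketch shows. The names are hypothetical, and the command information carried in the lower header bits is ignored.

```python
DIRECT_PACKETS = [
    "stencil write", "pixel read", "pixel write", "byte read",
    "byte write", "LeoDraw read", "LeoDraw write", "vertical scroll",
    "Blt read", "Blt readx", "Blt writexread", "block fill",
]


def direct_packet_type(header):
    """Return the packet type for a 16-bit header, or None for codes 0C-1F."""
    code = (header >> 11) & 0x1F
    return DIRECT_PACKETS[code] if code < len(DIRECT_PACKETS) else None
```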

2.5.2.1 Stencil Write Command

The stencil write command writes the specified data through the specified mask to the specified address of the specified plane. The stencil write command packet contains a 16-bit header and two 16-bit data words, as shown below.

2.5.2.2 Pixel Read Command

The pixel read command reads the pixel data at the specified address of the specified plane. The pixel read command packet contains a 16-bit header and three 16-bit data words, as shown below.

2.5.2.3 Pixel Write Command

The pixel write command contains a 16-bit header and three 16-bit data words, as shown below.

2.5.2.4 Byte Read Command

The byte read command reads the data of the specified Image plane channel at the specified address. The byte read command contains a 16-bit header and two 16-bit data words, as shown below.

2.5.2.5 Byte Write Command

The byte write command writes the data at the specified address in the Image plane. The Image Write Mask in LeoDraw determines which of the four bytes in the Image plane are written. The byte write command contains a 16-bit header and two 16-bit data words, as shown below.

2.5.2.6 LeoDraw Read Command

The LeoDraw read command reads one of the internal 32-bit status or control registers in LeoDraw. The command is a one-word packet that contains only a 16-bit header, as shown below.

2.5.2.7 LeoDraw Write Command

The LeoDraw write command writes a 32-bit data word into any of the internal registers in LeoDraw. Note that writing to the destination register for a block copy or block fill initiates that operation. Block copy and block fill are atomic operations - LeoDraw cannot be stopped in the middle of either operation. The LeoDraw write command contains a 16-bit header and two 16-bit data words, as shown below.

2.5.2.8 Vertical Scroll Command

The vertical scroll command tells the LeoDraws to read a pixel and write it to another location within the same interleave. The vertical scroll command contains a 16-bit header, as shown below.

2.5.2.9 Blt Read Command

The Blt read command tells the LeoDraws to read the first pixel for the beginning of a block move operation. The Blt read command contains a 16-bit header, as shown below.

2.5.2.10 Blt Readx Command

The Blt readx command tells the LeoDraw chips to read the second pixel in a block move operation. This command is always broadcast to all LeoDraw chips. It is the second command of a block move operation and applies to State Set Zero only. The Blt readx command contains a 16-bit header and two 16-bit data words, as shown below.

2.5.2.11 Blt Writexread Command

The Blt writexread command tells the LeoDraw chips to write the enclosed data to the destination address and read the pixel at the source address. This command is available in state set 0 only. The Blt writexread command contains a 16-bit header and two 16-bit data words, as shown below.

2.5.2.12 Block Fill Command

The block fill command tells the LeoDraw chips to write the foreground register to the destination pixel. This command is available in state set 0 only. The block fill command contains a 16-bit header, as shown below.

2.6 Render

The Render section consists of five LeoDraw chips. Each LeoDraw chip controls one-fifth of the frame buffer memory, which is organized with a five-by-one interleave factor. After receiving its input, each LeoDraw operates independently.
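As a rough illustration of a five-by-one interleave, the sketch below assigns each pixel to a LeoDraw chip by its x address modulo five. This section does not specify the exact mapping, so the modulo rule and the function name `leodraw_for_pixel` are assumptions made for illustration only.

```python
NUM_LEODRAW = 5  # from the text: the Render section has five LeoDraw chips


def leodraw_for_pixel(x, y):
    """Return the index (0 to 4) of the chip assumed to own pixel (x, y).

    Assumption: "five-by-one" means five horizontally adjacent pixels map
    to five different chips, and the line number y has no effect.
    """
    return x % NUM_LEODRAW


# Five adjacent pixels on one line land on five different chips.
owners = [leodraw_for_pixel(x, 10) for x in range(5)]
print(owners)
```

Under this assumed mapping, a horizontal span of five or more pixels keeps all five chips busy at once, which is consistent with each chip operating independently on its own fifth of the frame buffer.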

LeoDraw has two paths: the accelerator path and the direct path. The accelerator path contains the drawing hardware for 3D images. The direct path allows free access to the frame buffer for the window system and for 2D applications.

For 3D accelerated operations, LeoDraw converts the dot, vector, or triangle parameters into pixel operations into the frame buffer. For the vertices, LeoDraw receives various values, such as the slopes of the sides of a triangle, and the corresponding increments for the red, green, and blue color values. LeoDraw fills in all the intermediate pixels outlined by the vertices.

The LeoDraw chip contains the following modules:

2.6.1 Accelerator Port Bus Interface Unit

The accelerator port bus interface unit monitors the CDBus for accelerator command packets. Since the type of command is encoded in the packet header, the sequence and format of each word in the packet is known. As successive words are written to LeoDraw, the interface unit saves the words in the appropriate double-buffered register. When a complete command is assembled in the register, the interface unit initiates a handshake sequence that loads each word of the command into its respective current buffer register in the DDA unit.

2.6.2 Direct Port Bus Interface Unit

The direct port bus interface unit monitors the CDBus for direct port commands. This unit manages the execution of all direct port commands and the handshake sequence with LeoCommand.

2.6.3 DDA Unit

The DDA unit performs edge walking and span interpolation functions for triangles, a simple DDA for vectors, and a pass operation for dots. This unit also computes end point correction, anti-aliasing alpha calculations, and depth cue scale factors.

The output from the DDA unit consists of X and Y coordinates, red, green, blue, and depth values, and alpha for every pixel rendered to the frame buffer. The output is put into a set of double-buffered registers. When an output pixel is complete, a handshake sequence is started that loads the command into its respective double buffered register in the memory control unit.

2.6.4 Memory Control Unit

The memory control unit receives requests for frame buffer access from the DDA unit, the direct port bus interface unit, and VRAM and DRAM refresh. The memory control unit arbitrates among these requests and generates the control signals to read or write pixels to the frame buffer.

The memory control unit also performs several address- and data-related functions. Address-related functions include address translation, viewport clipping, and page-mode access detection. Data-related functions include blending and logical operations on data, Z-buffering, window ID checking, screen door transparency, and so on.

2.7 Frame Buffer

The frame buffer consists of 1280 by 1024 pixels by 96 planes, as shown in Figure 2-4. The memory planes are organized as follows:

-------------------------------------------------------------------------------
Type         Number     Description/Normal Configuration
             of Planes
-------------------------------------------------------------------------------
Image        48         Holds the color value for each pixel. Organized as two
                        buffers of 24 planes each.

Overlay      8          The overlay data can be transparent or solid.
                        Organized as two buffers of four planes each.

Depth        24         Holds the depth value for the last pixel written into
                        the current write buffer.

P window ID  6          Stores the window ID code for windows used by the
                        accelerator port processes. Known as the PWID.

Q window ID  4          Stores the window ID code for windows used by the
                        direct port processes. Known as the QWID.

Fast clear   6          Used to implement the fast clear feature for three
                        selected image windows.
-------------------------------------------------------------------------------

Figure 2-4 Frame Buffer Memory Plane Groups
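The plane counts in Figure 2-4 can be cross-checked against the stated total of 96 planes. The short sketch below sums the groups and estimates the storage needed for the visible 1280 by 1024 pixels. The dictionary name and the byte estimate are illustrative; the physical VRAM and DRAM sizes are not given in this section and may differ.

```python
# Plane counts copied from Figure 2-4.
PLANE_GROUPS = {
    "Image": 48,
    "Overlay": 8,
    "Depth": 24,
    "P window ID": 6,
    "Q window ID": 4,
    "Fast clear": 6,
}

WIDTH, HEIGHT = 1280, 1024

total_planes = sum(PLANE_GROUPS.values())
total_bits = WIDTH * HEIGHT * total_planes
total_bytes = total_bits // 8

print(total_planes)  # 96
print(total_bytes)   # 15728640 bytes, which is 15 * 2**20
```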

The frame buffer supports multiple resolutions, as shown in Figure 2-5. Leo can also operate in a non-standard quad-buffered stereo mode of 960 by 680 pixels. A stereo output signal is provided to switch left and right shutters on stereo goggles or a monitor face plate.

Note that Leo does not have a cursor plane in the frame buffer. The cursor is handled in LeoCross.

Figure 2-5 Frame Buffer to Display Mapping

2.7.1 Image Planes

The 48 image planes hold a color value for each pixel to be displayed. These planes can store image data using one of two color models:

24-bit RGB true color

8-bit indexed color.

The color model is selected on a per-pixel basis by the window ID planes.

In the 24-bit true color model, two separate 24-plane buffers are dedicated to displaying the 16.7 million color gamut. Each buffer is divided evenly into eight planes for each of the three primary colors: red, green, and blue. The two buffers are referred to as buffer A and buffer B. As Leo displays the image in buffer A, the next version of the image is being drawn in buffer B. When the contents of buffer B are complete, the display is switched from buffer A to buffer B.

Normally, the image memory is configured as two 1280 by 1024 (double-buffered) arrays, as shown below:

LeoDraw may reconfigure the image memory to appear as four 960 by 680 two-dimensional arrays (quad buffered):

The ability to reconfigure the image memory aspect ratio and the ability to program the screen refresh circuitry allows the frame buffer to support several different screen resolutions.
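A quick pixel count suggests why the quad-buffered stereo configuration fits in the same image memory as the normal double-buffered one. The sketch below only compares pixel totals; it assumes both configurations use 24 planes per pixel and says nothing about how LeoDraw actually lays out addresses.

```python
def pixels(buffers, width, height):
    """Total pixel count for a number of equally sized buffers."""
    return buffers * width * height


double_buffered = pixels(2, 1280, 1024)
quad_buffered = pixels(4, 960, 680)

print(double_buffered)                    # 2621440
print(quad_buffered)                      # 2611200
print(quad_buffered <= double_buffered)   # True
```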

2.7.2 Overlay Planes

The eight overlay planes can be thought of as an extra eight-bit indexed color frame buffer. The overlay data can be transparent or solid. The overlay plane can be made visible or invisible. Changing the contents or visibility of the overlay buffer does not alter the image buffer contents. The overlay is used to run the user's desktop and applications not requiring 3D graphics acceleration.

Overlay planes behave much like slide projector transparencies. They enable an image, a mail tool for example, to be temporarily superimposed over another image. In this manner, the data in the image beneath are not changed or affected; the image need not be redrawn when the overlay image is removed.

2.7.3 Depth Planes

The 24-bit depth (Z-buffer) plane stores the depth value for the last pixel written into the current image write buffer. Z-buffering enables Leo to make the portions of an object visible that are nearer to the viewer and hide the portions that should be concealed by other portions of the object. This process is known as hidden surface removal.

Hidden surface removal is performed in the LeoDraw chips. This process tests to determine which faces are in front and removes those surfaces that should be hidden. First, the Z-buffer value at each pixel is set to maximum so that any value written to a pixel is nearer to the viewer than the initial pixel value.

As the application draws the object, LeoDraw compares the depth of the face with the depth of the last value written to the Z-buffer for that pixel. If the new face is closer or equal, its color is written to the image memory, and LeoDraw stores the new depth value in the Z-buffer, overwriting any previous value for that pixel. If the face is farther away, the face's pixel is discarded, and nothing is changed.
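The hidden surface removal steps described above can be sketched in software as follows. This is a minimal model, not the LeoDraw implementation: the function names are hypothetical, and it assumes that a smaller depth value means nearer to the viewer, which matches initializing the Z-buffer to its maximum value.

```python
Z_MAX = (1 << 24) - 1  # largest value a 24-bit depth plane can hold


def make_buffers(width, height, background=(0, 0, 0)):
    """Create a Z-buffer set to maximum depth and an image buffer."""
    z = [[Z_MAX] * width for _ in range(height)]
    image = [[background] * width for _ in range(height)]
    return z, image


def write_pixel(z, image, x, y, depth, color):
    """Write the pixel only if it is closer than or equal to the stored depth.

    Returns True if the pixel was written, False if it was discarded.
    """
    if depth <= z[y][x]:
        z[y][x] = depth
        image[y][x] = color
        return True
    return False


z, image = make_buffers(4, 4)
write_pixel(z, image, 1, 1, depth=500, color=(255, 0, 0))  # written
write_pixel(z, image, 1, 1, depth=900, color=(0, 255, 0))  # discarded, farther
print(image[1][1])  # (255, 0, 0)
```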

2.7.4 Window ID Planes

The six P window ID planes and the four Q window ID planes store the window identification (WID) code for each pixel in the image buffer and overlay buffer. The WID planes for image and overlay are separate to support uncorrelated overlays.

During writes, the current WID code is compared with the stored image WID code for each pixel; writes are not done if the two codes do not match. During overlay plane writes, the current WID code is compared with the stored overlay WID code for each pixel; writes are not done if the two codes do not match. If the Q WID is zero, the P WID is used; else, the Q WID is used.

During display cycles, the stored WID code is used to determine overlay transparency, the current image display buffer, and the output color model for each pixel on the screen.

The ten-bit window ID acts as an index into a window lookup table (WID LUT) in LeoCross to define the window's display properties:

Color mode selection: 24-bit color or 8-bit indexed color

Double-buffered image plane window control

Overlay plane window control

Selection of color maps

Specification of the fast clear windows

2.7.5 Fast Clear Planes

The six fast clear planes are used to rapidly clear the screen between frames so that animation of objects appears smooth on the screen. They implement the fast clear function for three selected double-buffered image windows. Each fast clear plane pair can be assigned to clear one double-buffered window at hardware speeds. Before the start of a new frame, the appropriate fast clear plane is cleared to all zeros, using a special high-speed clear mode, indicating that the values stored in the image and depth planes are invalid for the specified WID.

As pixels are rendered into the buffer, a 1 is written into the fast clear plane at the pixel location, indicating that the image and depth are now valid. During display refresh, all valid pixels (Fast Clear = 1) are displayed using the color value stored in the image or overlay buffers. Invalid pixels (Fast Clear = 0) are displayed using the color value stored in the fast clear background color assigned to that fast clear set.
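The valid-bit scheme described above can be modeled with a small sketch. The class and method names are hypothetical, and the model covers a single window with one fast clear plane; the point it illustrates is that clearing touches only the one-bit plane, while stale image data stays in memory and is simply not displayed.

```python
class FastClearWindow:
    """Toy model of one window with a single fast clear plane."""

    def __init__(self, width, height, background):
        self.background = background  # fast clear background color
        self.valid = [[0] * width for _ in range(height)]
        self.image = [[None] * width for _ in range(height)]

    def fast_clear(self):
        """Clear only the valid bits; image data is left untouched."""
        for row in self.valid:
            for i in range(len(row)):
                row[i] = 0

    def render(self, x, y, color):
        """Write a pixel and mark it valid (Fast Clear = 1)."""
        self.image[y][x] = color
        self.valid[y][x] = 1

    def display(self, x, y):
        """Return the color shown during display refresh."""
        if self.valid[y][x]:
            return self.image[y][x]
        return self.background


win = FastClearWindow(4, 4, background=(0, 0, 64))
win.render(1, 2, (255, 0, 0))
win.fast_clear()
print(win.display(1, 2))  # (0, 0, 64): the stale red pixel is not shown
```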

2.8 LeoCross

The LeoCross chip contains the window ID look-up tables (WID LUTs) and the color lookup tables (CLUTs), along with the programmable video timing generation and hardware cursor generation logic. LeoCross performs output multiplexing and pseudo-color look-up. The lookup tables are bypassed in the 24-bit bypass mode.

2.8.1 Lookup Tables

As described above, the WID LUT defines the window's display properties of color mode, double-buffering, and so on. The CLUT is used primarily in the eight-bit indexed, or pseudo, color mode. The CLUT is a color map, containing a selection of colors for the particular application.

In the indexed color mode, an eight-bit value from the image plane addresses a location in the CLUT. The CLUT has as many entries as there are pixel values, meaning there are 256 (0 to 255) color map entries for the eight bits. Each of the 256 possible bit combinations, rather than directly dictating the intensity of the CRT electron beam, references an entry in the color map. Figure 2-6 illustrates a sample color map.

Figure 2-6 An Example Color Map (CLUT)

In the above example, the eight-bit color index input from the frame buffer selects an entry in the CLUT containing three eight-bit values; eight bits each for red, green, and blue. The resulting 24-bit output defines the color of the pixel on the screen. Thus, although the application is limited to 256 colors, each of the colors can be selected from a range of 16.7 million colors.
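The indexed color lookup can be sketched as a 256-entry table of red, green, and blue triples. The function name and the packing of the 24-bit result with red in the most significant byte are assumptions for illustration; the actual bit ordering is not stated in this section.

```python
def clut_lookup(clut, index):
    """Map an 8-bit color index to a packed 24-bit RGB value.

    Assumption: red occupies bits 23:16, green 15:8, blue 7:0.
    """
    red, green, blue = clut[index & 0xFF]
    return (red << 16) | (green << 8) | blue


# A 256-entry gray ramp as an example color map.
clut = [(i, i, i) for i in range(256)]
clut[5] = (255, 128, 0)  # the application redefines entry 5 as orange

print(hex(clut_lookup(clut, 5)))    # 0xff8000
print(hex(clut_lookup(clut, 255)))  # 0xffffff
```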

The values in the CLUT are defined by the application developer and are loaded via the direct port.

The RAMDAC contains a third type of lookup table, known as the gamma correction table. Gamma correction is an adjustment to the normal color mapping to make up for non-linearity of the luminescent phosphor in color CRTs. The gamma correction table may be used by all color models.

2.8.2 Programmable Video Timing Generator

The programmable video timing generator provides support for multiple display resolutions. Leo supports the following display resolutions:

1280 by 1024 at 76 Hz, non-interlaced

1280 by 1024 at 67 Hz, non-interlaced

1152 by 900 at 76 Hz, non-interlaced

1152 by 900 at 66 Hz, non-interlaced

1024 by 768 at 76 Hz, non-interlaced

1024 by 768 at 60 Hz, non-interlaced

640 by 480 at 60 Hz NTSC, interlaced

768 by 576 at 50 Hz PAL, interlaced

960 by 680 at 108 Hz stereo, non-interlaced 54 Hz field rate per eye

960 by 680 at 112 Hz stereo, non-interlaced 56 Hz field rate per eye

The programmable video timing generator consists of several programmable registers, which contain information that controls the pixel starting and stopping points of output timing signals such as the horizontal blanking pulse, horizontal sync pulse, equalization interval, serration interval, vertical blanking pulse, and so on.

Each pixel on the screen has an x address, the pixel number on a line, and a y address, the line number the pixel is on. The top left corner of the screen is pixel number 0 on line 0. The pixel number increases from left to right and the line number increases from top to bottom.

To keep track of the pixel or group of pixels being processed, two counters are used: a horizontal counter (also called the pixel counter or x-address counter) and a vertical counter that tracks the line number. The first display pixel of the first line starts at the same rising edge of the clock that disables the vertical blanking signal. At that clock edge, both counters are set to zero, meaning that pixel group 0 at line 0 is being processed. When the counters' values match the values of an event's registers, the event's signals are set or reset, depending on the types of registers.

The programmable video timing generator uses turn-on registers and turn-off registers for many of the video controls. When values of the counters match the contents of the turn-on register, the event's signals are active at the next rising edge of the clock pulse. When values of the counters match the contents of the turn-off register, the event's signals are disabled at the next rising edge of the clock pulse.
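The turn-on and turn-off behavior can be illustrated with a small single-counter simulation. This is a sketch under stated assumptions: the function name, the wrap-around count, and the use of one counter per signal are hypothetical simplifications. It does model the one-clock delay described above, where a match takes effect at the next rising clock edge.

```python
def simulate_signal(turn_on, turn_off, total_counts, clocks):
    """Return the signal level seen at each rising clock edge.

    turn_on / turn_off: counter values held in the two event registers.
    total_counts: assumed counter period (the counter wraps to zero).
    """
    levels = []
    active = False
    pending = None  # level to apply at the next rising edge
    counter = 0
    for _ in range(clocks):
        if pending is not None:  # a match on the previous clock takes effect
            active = pending
            pending = None
        levels.append(active)
        if counter == turn_on:
            pending = True
        elif counter == turn_off:
            pending = False
        counter = (counter + 1) % total_counts
    return levels


print(simulate_signal(turn_on=2, turn_off=5, total_counts=8, clocks=8))
# [False, False, False, True, True, True, False, False]
```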

There are eight events in the programmable video timing generator (see Figure 2-7):

Horizontal blanking pulse. Uses two registers:

Hblank start address. Enables the start of the horizontal blanking pulse (which pixel).

Hblank end address. Disables the horizontal blanking pulse (which pixel).

Figure 2-7 Video Timing

Horizontal synchronizing pulse. The horizontal sync pulse occurs only during the blanking signal. Uses two registers:

Hsync start address. Enables the start of the horizontal sync pulse (which pixel number).

Hsync end address. Disables the horizontal sync pulse (which pixel number).

Vertical blanking pulse. Uses two registers:

Vblank start address. Enables the start of the vertical blanking pulse (which line number). The vertical blanking pulse is asserted at the Hsync turn-on pixel location.

Vblank end address. Disables the vertical blanking pulse (which line number). The vertical blanking pulse is deasserted at the Hsync turn-on pixel location.

Vertical synchronizing pulse. The vertical sync pulse occurs only during the vertical blanking pulse. Uses two registers:

Vsync start address. Enables the start of the vertical sync pulse, during the vertical blanking interval (which line number). The vertical synchronizing pulse is asserted at the Hsync turn-on pixel location.

Vsync end address. Disables the vertical sync pulse (which line number). The vertical synchronizing pulse is deasserted at the Hsync turn-on pixel location.

Video clock generator. The video clock generator register controls the pixel clock frequency for the specific screen resolution. The register defines the type of frequency synthesizer, the synthesizer output frequency, and the LeoCross prescale value. The pixel clock synthesizer is described in more detail in Section 2.8.3, "Video Clock Generator."

Equalization pulse. The equalization pulse occurs only during the equalization interval, when the serration interval is inactive. Uses two registers:

Equalization pulse start address. Enables the start of the equalization pulse, during an equalization interval (which pixel).

Equalization pulse end address. Disables the equalization pulse (which pixel).

Equalization interval pulse. Uses four registers:

Equalization interval 1 start address. Enables the start of the equalization interval (which line).

Equalization interval 1 end address. Disables the equalization interval (which line).

Equalization interval 2 start address. Enables the start of the equalization interval (which line).

Equalization interval 2 end address. Disables the equalization interval (which line).

Serration pulse. The serration pulse occurs only during the serration interval. Uses two registers:

Serration pulse start address. Enables the start of the serration pulse, during a serration interval (which pixel).

Serration pulse end address. Disables the serration pulse (which pixel).

2.8.3 Video Clock Generator

The video clock generator generates the different pixel clock frequencies necessary to support the various display resolutions. The video clock generator uses a frequency synthesizer (shown as the pixel clock synthesizer in Figure 2-8) that is software controllable via the LeoCross chip Video Clock Generator register.

The video clock generator creates pixel information on five-pixel boundaries. The RAMDAC requires that events be placed on two-pixel boundaries. To provide the 5-to-2 synchronization, LeoCross uses two state machines that regulate the data extraction process: EXmach and WMach. The EXmach state machine extracts image or overlay data and generates signals that control the 5:2 serialization multiplexer. The WMach state machine extracts window ID information and regulates the serial output enable controls for the frame buffer VRAM, which contains image and overlay data.

Figure 2-8 LeoCross Video Clock Generator

The two extraction state machines are operated from a clock that is derived from the same clock that controls the loading of pixel data into the RAMDAC (DAC_LD and PIX_CLK_DIV2). The clock prescaler is programmable, via the Video Clock Generator register, to divide the PIX_CLK_DIV2 clock by 1, 2, or 4. The rate at which the state machines operate is determined by the screen resolution. Figure 2-8 elaborates a little more on the clock prescaler.

Note that the extraction state machines operate at a clock rate that is defined by the RAMDAC multiplexing factor (two). The Sync Generator operates at some subset of that clock. For example, for a screen resolution of 1280 by 1024 at 67 Hz, the pixel clock is 135 MHz (f), the output of the RAMDAC into LeoCross (DAC_LD) is 67.5 MHz (f/2), and the clock into the Sync Generator is 16.875 MHz (using a prescale value of 4).
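The clock arithmetic in the example above can be written out as a short calculation. The function name is hypothetical; the divide-by-two RAMDAC multiplexing factor and the allowed prescale values of 1, 2, and 4 come from the text.

```python
def clock_chain(pixel_clock_mhz, prescale):
    """Return (DAC_LD, Sync Generator clock) in MHz for a pixel clock."""
    if prescale not in (1, 2, 4):
        raise ValueError("prescale must be 1, 2, or 4")
    dac_ld = pixel_clock_mhz / 2   # RAMDAC multiplexing factor of two
    sync_clock = dac_ld / prescale
    return dac_ld, sync_clock


# 1280 by 1024 at 67 Hz: 135 MHz pixel clock, prescale value of 4.
print(clock_chain(135, 4))  # (67.5, 16.875)
```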

2.8.4 Hardware Cursor Generation

Rather than using a cursor plane in the frame buffer, Leo provides hardware cursor generation logic in LeoCross. The cursor information is limited to 32 by 32 pixels. A cursor larger than 32 by 32 pixels must be rendered in software.

Two 32 by 32 by 1-bit RAMs are used to store the cursor data. One RAM contains the cursor color, the other contains the cursor enable. The cursor color is one bit of information per pixel, selecting between two cursor colors. The cursor enable is one bit of information per pixel, enabling or disabling the display of the cursor color. A value of 0 disables the display of the cursor, showing the pixel underneath.
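The two cursor RAMs can be modeled as a per-pixel selection, as in the sketch below. The function and parameter names are hypothetical, and the representation of the two cursor colors as a pair of RGB tuples is an assumption; only the roles of the enable bit and the color bit come from the text.

```python
CURSOR_SIZE = 32  # the cursor is limited to 32 by 32 pixels


def cursor_pixel(color_ram, enable_ram, cx, cy, cursor_colors, underlying):
    """Return the color displayed at cursor-relative position (cx, cy).

    enable bit 0: the cursor is not displayed, so the underlying pixel shows.
    enable bit 1: the color bit selects one of the two cursor colors.
    """
    if not enable_ram[cy][cx]:
        return underlying
    return cursor_colors[color_ram[cy][cx]]


color_ram = [[0] * CURSOR_SIZE for _ in range(CURSOR_SIZE)]
enable_ram = [[0] * CURSOR_SIZE for _ in range(CURSOR_SIZE)]
enable_ram[0][0] = 1
color_ram[0][0] = 1

colors = ((0, 0, 0), (255, 255, 255))  # assumed: two programmable colors
print(cursor_pixel(color_ram, enable_ram, 0, 0, colors, (9, 9, 9)))  # (255, 255, 255)
print(cursor_pixel(color_ram, enable_ram, 1, 0, colors, (9, 9, 9)))  # (9, 9, 9)
```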

2.9 RAMDAC

The RAMDAC is an Analog Devices ADV7152 10-bit Video RAMDAC. It interfaces with the LeoCommand chip over the CXBus and with the LeoCross chip, to provide video output for the monitor. It includes three 10-bit by 256 word color look-up tables and three 10-bit video digital-to-analog converters.

Leo uses the two-to-one multiplexing capability of the RAMDAC. The pixel inputs that come from the LeoCross chip consist of two eight-bit RGB channels. These channels correspond to two consecutive pixels on the display. In other words, the red, green, and blue pixel inputs each have both an 8-bit A port and an 8-bit B port for pixel signals from the LeoCross chip.

The software can write and read the red, green, and blue color look-up tables, which are used to provide gamma correction. Gamma correction solves two problems. First, it converts the linear coded color values stored in the LeoCross CLUTs to ratio values for display on the monitor. The ratio values are used because the eye is sensitive to ratios of intensity levels rather than their absolute values. Second, gamma correction compensates for any non-linearity in the monitor.
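As an illustration of what software might load into one of the 256-word by 10-bit look-up tables, the sketch below builds a gamma correction ramp. The power-law formula and the gamma value of 2.2 are common conventions assumed here, not values taken from this document, and the function name is hypothetical.

```python
def build_gamma_table(gamma=2.2, entries=256, out_bits=10):
    """Build a gamma correction ramp for one color channel.

    Maps each 8-bit linear input to a 10-bit output using
    out = max_out * (in / max_in) ** (1 / gamma).
    """
    max_in = entries - 1
    max_out = (1 << out_bits) - 1
    return [round(max_out * (i / max_in) ** (1.0 / gamma))
            for i in range(entries)]


table = build_gamma_table()
print(len(table))           # 256
print(table[0], table[255])  # 0 1023
```

The same table would typically be loaded into each of the red, green, and blue look-up tables unless the monitor needs per-channel correction.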

2.10 CXBus

The CXBus, shown in Figure 2-9, is an eight-bit data bus that connects LeoCommand with LeoCross, the RAMDAC, and the Boot PROM. LeoCommand controls CXBus accesses. The LeoCross and RAMDAC devices contain several registers and memories used to program the video timing generator and to control the cursor, as well as the transformation of pixel data. The CXBus also serves the subsidiary purpose of conveying the data component of Boot PROM transactions.

2.10.1 Bidirectional

The CXBus is bidirectional. LeoCommand serves as the bus master; the other devices are bus slaves. LeoCommand initiates and regulates all transactions. The slaves are only capable of responding.

Figure 2-9 CXBus Block Diagram

2.10.2 Synchronous and Asynchronous Operations

The bus between LeoCommand and LeoCross operates in a synchronous fashion. Information on the bus is not clocked by a strobe signal (chip enable) but rather by the local clock. Data on the bus is guaranteed to be stable prior to the rising clock edge subsequent to the assertion of the data. Similarly, control signals do not cause actions immediately upon assertion. The control signals are sampled by the Leo system clock and the intended actions are produced on subsequent clock boundaries.

The bus between LeoCommand and the RAMDAC is asynchronous. LeoCommand performs the necessary adaptation between synchronous and asynchronous transfers.

2.10.3 Slave Device Response Times

The slave devices on the CXBus have different response times. LeoCross responds more quickly to CXBus transfers than the RAMDAC. The RAMDAC responds more quickly to CXBus transfers than the Boot PROM. Additionally, the RAMDAC and Boot PROM do not produce a signal that indicates the completion of a transaction. Consequently, LeoCommand regulates bus timing, dependent on the slave device being accessed.

2.10.4 Multi-Byte Transfers

The CXBus is limited to transfers of one byte per clock. Bus slave devices, however, have data widths that vary between one and four bytes. Master and slave controllers act as either senders or receivers of data. Senders decompose multi-byte transfers into two or more single-byte CXBus transfers. Receivers re-compose these transfers into a single, multi-byte word.
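The decomposition and recomposition of multi-byte words can be sketched as follows. The function names are hypothetical, and sending the most significant byte first is an assumption; this section does not state the byte order used on the CXBus.

```python
def decompose(word, width_bytes):
    """Sender side: split a word into single-byte transfers.

    Assumption: the most significant byte is transferred first.
    """
    return [(word >> (8 * i)) & 0xFF for i in reversed(range(width_bytes))]


def recompose(byte_list):
    """Receiver side: rebuild one multi-byte word from byte transfers."""
    word = 0
    for b in byte_list:
        word = (word << 8) | (b & 0xFF)
    return word


transfers = decompose(0x12345678, 4)
print([hex(b) for b in transfers])  # ['0x12', '0x34', '0x56', '0x78']
print(hex(recompose(transfers)))    # 0x12345678
```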

The CXBus interface between LeoCross and the RAMDAC uses an address auto-increment mechanism in both the byte and word dimensions. This technique improves the bus bandwidth by obviating the need to transfer word address and datum pairs. LeoCommand transfers a single base address, or index, followed by multiple data transfers that may consist of multiple bytes per word and multiple words. The address increment mechanism automatically adjusts the address in both the byte and word dimensions.

This technique does not preclude the use of the word address and datum pair access mode. In fact, not all devices within LeoCross and the RAMDAC use the incrementing mechanism. Therefore, there are two access strategies: direct and indirect. These two strategies are differentiated by the CXBus CX_C<2:0> signals (which are derived from the SBus address). This addressing is completely hardware controlled; direct and indirect accesses are otherwise transparent to the host software.

2.10.4.1 Direct Accesses

CXBus direct accesses use neither word dimension auto-incrementing nor an index to specify the entry location. The entry location is specified directly by control signals.

2.10.4.2 Indirect Accesses

Indirect byte accesses are subdivided into two types: register and table. Both access types use byte dimension auto-incrementing, but only the table type also uses word dimension auto-incrementing. This distinction is imposed by the RAMDAC to allow read-modify-write operations on registers. The distinction is applied to LeoCross only to maintain interface uniformity.

Both register and table accesses require a preamble phase that consists of a direct write to the address pointer. The data field of this write contains either a register address or an index into a table.

Register Type Accesses

Register type accesses consist of two SBus cycles, the previously-mentioned preamble being the first cycle. The second SBus cycle comprises the data transfer.

The receiving device uses byte dimension auto-incrementing. Upon completion of a word transfer, the receiving device resets the byte dimension address pointer - it does not increment the word dimension pointer.

Table Type Accesses

Table type accesses may consist of a minimum of two SBus cycles or a maximum of n SBus cycles, where n is defined by the depth of the table being accessed. As with register accesses, the first SBus cycle is the preamble and subsequent cycles comprise data transfers.

The receiving device uses byte dimension auto-incrementing. Upon completion of a word transfer, the receiving device resets the byte dimension address pointer and increments the word dimension address pointer.
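The difference between register type and table type accesses can be modeled with a small address pointer sketch. The class, its method names, and the wrap-around at the end of the table are hypothetical; the pointer behavior on word completion follows the two descriptions above.

```python
class IndirectTarget:
    """Toy model of a receiving device with byte and word address pointers."""

    def __init__(self, depth, bytes_per_word, table_type):
        self.mem = [[0] * bytes_per_word for _ in range(depth)]
        self.bytes_per_word = bytes_per_word
        self.table_type = table_type  # True: table access, False: register
        self.word_ptr = 0
        self.byte_ptr = 0

    def preamble(self, address):
        """Direct write to the address pointer (register address or index)."""
        self.word_ptr = address
        self.byte_ptr = 0

    def write_byte(self, value):
        """Accept one single-byte CXBus transfer."""
        self.mem[self.word_ptr][self.byte_ptr] = value & 0xFF
        self.byte_ptr += 1
        if self.byte_ptr == self.bytes_per_word:
            self.byte_ptr = 0  # both types reset the byte pointer
            if self.table_type:
                # Only table accesses advance to the next word.
                # Wrapping at the end of the table is an assumption.
                self.word_ptr = (self.word_ptr + 1) % len(self.mem)


table = IndirectTarget(depth=8, bytes_per_word=3, table_type=True)
table.preamble(2)
for b in (1, 2, 3, 4, 5, 6):
    table.write_byte(b)
print(table.mem[2], table.mem[3], table.word_ptr)  # [1, 2, 3] [4, 5, 6] 4
```

With `table_type=False`, the same six writes would leave the word pointer at 2 and overwrite the register contents with the last three bytes, which is what makes read-modify-write on a single register possible.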

2.11 Boot PROM

The Boot PROM is a read-only memory that receives its address from the CDBus and places the resulting data on the CXBus. The Boot PROM is addressed in both state set zero and state set one.