1 Introduction





This chapter gives an overview of the Leo card.

1.1 Leo Card Overview

The Leo card is Sun's high-end 3D solids graphics accelerator. It is a double-wide, double-board-high SBus card assembly that installs in the workstation and connects the workstation SBus to the monitor, as shown in Figure 1-1.

    Figure 1-1 Leo Card System Level Block Diagram

1.1.1 Host Interface

Leo is an SBus slave and a DVMA master with the ability to generate interrupts on the SBus. The slave interface provides direct host access to Leo memory and devices, such as control and status registers. The DVMA accesses are used for fetching graphics data from the display list in system virtual memory, along with moving raster data between the frame buffer and system virtual memory.

The host interface includes an interrupt ability that allows Leo to inform the host of various conditions.

The host loads and reads Leo Control and Status Registers (CSR) in programmed I/O mode. The Leo device registers and other storage locations are memory mapped in the Leo address space. In the programmed I/O mode, Leo is an SBus slave.

After the proper control registers are loaded, Leo reads graphics data from the host memory in DMA mode. In the DMA mode, Leo is an SBus master.

The Leo SBus port has the following features:

The Leo SBus interface was designed to the B.0 SBus specification. The following information details the SBus interface specifications.

1.1.1.1 Slave Accesses

The supported slave access sizes are:

    32-bit mode access:

      1-byte transfers
      2-byte transfers (half word)
      4-byte transfers (word)

64-bit mode slave accesses are not supported.

1.1.1.2 Master Accesses

Master write accesses are word size.

Master read accesses are:

    32-bit mode access:

      4-byte transfers
      16-byte transfers
      32-byte transfers
      64-byte transfers

64-bit mode master accesses are not supported.

1.1.2 Data Paths

The Leo interface to the host workstation is through the LeoCommand ASIC. LeoCommand acts as the system controller for all Leo devices. There are two paths to access the frame buffer from the host: a direct path and an accelerator path. The direct path is accessed through the direct port and the accelerator path is accessed through the accelerator port. The accelerator port is unidirectional, going from LeoCommand through the Floating Point Transform and Render sections to the frame buffer. The direct port path is bidirectional.

1.1.2.1 Accelerator Port

The accelerator port is used for rendering those primitives that Leo is optimized to accelerate. The accelerated primitives are: dots, antialiased dots, vectors, antialiased vectors, and triangles. Each of these primitives comes in several versions to accommodate various data formats: for example, chained or isolated data, color present or absent, packed or floating point normals and colors, and so on.

The data to be rendered is put into LeoCommand either by programmed I/O or by a DMA engine that fetches the data directly from host memory. LeoCommand converts chained vertex data in a variety of numeric formats into a small set of isolated drawing primitives. This means that the data passed to the Floating Point Transform section consists of isolated triangles, vectors, or dots with floating point coordinates. Consolidating the data formats simplifies the Floating Point Transform section, as only a small set of data formats has to be handled.

    Figure 1-2 Leo Data Paths

The Floating Point Transform section converts model space dots, vectors, and triangles into screen space fixed point rendering parameters. The transform, clip-test and clipping, lighting, and setup of screen space rendering are done using microcoded floating point engines.

The Render section receives the render parameters for dots, vectors, and triangles and converts them into pixel operations on the Frame Buffer. The supported pixel manipulation operations include the various blend and transparency operations, along with Window ID and Z Buffer checks.

The Video Output section provides pixel multiplexing, color look-up tables (LUTs), and digital-to-analog conversion (DAC), resulting in an analog video signal to drive a CRT monitor. The pixel multiplexing operation provides support for various partitions in the frame buffer (Image, Overlay, Cursor, and so on) along with double-buffering of the partitions on a per-pixel basis.

1.1.2.2 Direct Port

The direct port is used to access the frame buffer contents directly along with accessing other Leo devices (other than the frame buffer), such as control and status registers.

The host workstation can read or write to the frame buffer contents directly using the SBus interface slave mode. This path is used mainly for the window system. There are facilities to improve the performance of the window system operation in this path, such as support for rendering text, area move, and fill.

The direct port can also be used to render pixels directly on the frame buffer by the host workstation or the second processor (in multiple-processor configurations). This provides a path to render complex primitives that are not supported through the accelerator port.

The direct port is used to access miscellaneous other Leo devices, including control and status registers, the Boot PROM, and the look-up tables in the Video Output section.

1.2 Leo Card Block Diagram

Figure 1-3 shows the Leo Card block diagram. The individual blocks are described below.

1.2.1 LeoCommand

The LeoCommand ASIC is the system controller for Leo. All Leo memory and devices are memory-mapped through LeoCommand, so all data passed between Leo and the host workstation passes through LeoCommand. LeoCommand is functionally partitioned into a direct port and an accelerator port. LeoCommand also controls the three buses: Command to Float (CF), Command to Draw (CD), and Command to Cross (CX).

The LeoCommand accelerator port contains a DMA engine, input data buffers, floating-to-fixed point and fixed-to-floating point converters, vertex buffers, and bus interface and controls.

Every 3D object that Leo displays is made up of several primitives, such as dots, vectors (line segments), and triangles. For most 3D images, these primitives are combined into triangle strips (meshes of connected triangles).

PHIGS passes commands to XGL which in turn sends some commands and the address of the vertex data to Leo. LeoCommand then acts as a master device on the SBus and reads the data from memory.

LeoCommand performs data conversion operations, such as converting chained vector and triangle data into isolated vectors or triangles. LeoCommand then converts the numerous vertex data formats into a few standard Leo formats, for transmission to one of the LeoFloat ASICs.
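The conversion of chained triangle data into isolated triangles can be illustrated with a short Python sketch. It assumes a conventional triangle strip, in which each new vertex forms a triangle with the two vertices before it. The actual LeoCommand vertex formats and chaining rules are not described in this chapter, so the function name and the strip convention are assumptions made for illustration only.

```python
def strip_to_triangles(vertices):
    """Expand a triangle strip into a list of isolated triangles.

    Assumes the common strip convention: vertex i forms a triangle with
    vertices i-2 and i-1. Every other triangle swaps its first two
    vertices so that all triangles keep the same winding order.
    """
    triangles = []
    for i in range(2, len(vertices)):
        a, b, c = vertices[i - 2], vertices[i - 1], vertices[i]
        if i % 2 == 1:
            a, b = b, a  # odd-numbered triangle: restore winding order
        triangles.append((a, b, c))
    return triangles


strip = ["v0", "v1", "v2", "v3", "v4"]
isolated = strip_to_triangles(strip)
```

A strip of five vertices yields three isolated triangles, each of which could then be handed to a separate LeoFloat.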

LeoCommand has access to the state of each LeoFloat to determine which are busy and which are idle. LeoCommand sends one primitive at a time to an available LeoFloat.

    Figure 1-3 Leo Card Block Diagram

1.2.2 Floating Point Transform

The Floating Point Transform section consists of four LeoFloat ASICs and static RAM (SRAM). Each LeoFloat is a floating point microprocessor with optimized instructions for graphics. The LeoFloat ASICs execute microcode out of their dedicated SRAM. Each LeoFloat has a 128K by 32-bit SRAM. The LeoFloat ASICs are connected in a multiple instruction, multiple data (MIMD) configuration for performance enhancement.

LeoCommand has access to the state of each LeoFloat to determine which are busy and which are idle. LeoCommand sends accelerator port primitives to an available LeoFloat, which processes the primitive and signals LeoCommand when it has finished processing. LeoCommand then enables the results to be sent from LeoFloat to the Render section.
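The busy/idle dispatch described above can be sketched in Python as follows. This is a minimal model, not the hardware arbitration logic: the text does not say how LeoCommand chooses among several idle LeoFloats, so picking the lowest-numbered idle unit is an assumption, as are the names `dispatch`, `busy`, and the sample primitives.

```python
from collections import deque

NUM_LEOFLOATS = 4  # the Floating Point Transform section has four LeoFloat ASICs


def dispatch(primitives, busy):
    """Assign pending primitives to idle units, one primitive per unit.

    `primitives` is a deque of pending primitives and `busy` is a list of
    booleans, one per LeoFloat. Returns a list of (unit, primitive)
    assignments. Primitives that find no idle unit stay in the queue.
    Choosing the lowest-numbered idle unit is an illustrative assumption.
    """
    assignments = []
    for unit in range(len(busy)):
        if not primitives:
            break
        if not busy[unit]:
            assignments.append((unit, primitives.popleft()))
            busy[unit] = True  # the unit stays busy until it signals completion
    return assignments


queue = deque(["tri0", "vec0", "dot0"])
busy = [True, False, False, True]
assigned = dispatch(queue, busy)
```

In this example two units are idle, so two primitives are dispatched and the third waits until a LeoFloat signals that it has finished.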

LeoFloat converts each individual dot, vector, and triangle from 3D model coordinates to 3D world coordinates to 3D device coordinates (frame buffer location and Z-buffer values). LeoFloat also performs lighting calculations that result in three floating-point values: the red, green, and blue values for each vertex. These values are a function of the color and surface properties of the triangle, the position of the lights, and the angle at which the light hits the surface. LeoFloat also clips the image to fit the window.

LeoFloat also "sets up" the triangle, line, and dot primitives for LeoDraw. This involves computing edge slopes, color slopes, and so on.

When LeoFloat has finished its tasks, LeoCommand enables the results to pass from LeoFloat to the Render section (the LeoDraw ASICs). LeoFloat is then ready to begin the transformation of another dot, vector, or triangle.

1.2.3 Render

The Render section consists of five LeoDraw ASICs. Each LeoDraw controls one-fifth of the Frame Buffer memory, which is organized into a five-by-one interleave factor. After receiving its input, each LeoDraw operates independently.
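One plausible reading of the five-by-one interleave is that consecutive pixels along a scan line are assigned to the five LeoDraw ASICs in rotation. The Python sketch below illustrates that reading only. The modulo mapping and the name `owning_leodraw` are assumptions and are not taken from this manual.

```python
NUM_LEODRAWS = 5      # five LeoDraw ASICs
SCREEN_WIDTH = 1280   # pixels per scan line at the 1280 by 1024 resolution


def owning_leodraw(x):
    """Return the index (0 to 4) of the LeoDraw assumed to own pixel column x."""
    return x % NUM_LEODRAWS


# Count how many columns of one scan line fall to each LeoDraw.
columns_per_leodraw = [0] * NUM_LEODRAWS
for x in range(SCREEN_WIDTH):
    columns_per_leodraw[owning_leodraw(x)] += 1
```

Under this assumption each LeoDraw handles 256 of the 1280 columns, so the rendering load for a wide primitive is spread evenly across the five ASICs.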

LeoDraw has two paths: the accelerated path and the direct path. The accelerated geometry path contains the drawing hardware for 3D images. The direct path allows free access to the frame buffer for the window system and for 2D applications.

For 3D accelerated operations, LeoDraw converts the dot, vector, or triangle parameters into pixel operations on the frame buffer. For the vertices, LeoDraw receives various values, such as the slopes of the sides of a triangle, and the corresponding increments for the red, green, and blue color values. LeoDraw fills in all the intermediate pixels outlined by the vertices.

Each side of a triangle has a line that is formed by several points on the pixel sampling grid, and each of the points needs R, G, B, and Z values. LeoDraw calculates these values, taking care to adjust the sampling grid so that there are no cracks or other artifacts in polygon mesh surfaces.

LeoDraw now determines, through interpolation, the R, G, B, and Z values for the pixels that comprise the span. Each pixel of the interpolated span is passed to the frame buffer. It is important to note here that Leo uses subpixel precision arithmetic to eliminate motion artifacts due to endpoint sampling.
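The span interpolation step can be illustrated with a short Python sketch. It performs plain linear interpolation of (R, G, B, Z) between the two ends of a span at whole-pixel positions. It does not model the subpixel correction mentioned above, and the function name and sample values are hypothetical.

```python
def interpolate_span(x_start, x_end, start, end):
    """Linearly interpolate (R, G, B, Z) tuples from x_start to x_end inclusive.

    Returns a list of (x, (r, g, b, z)) pairs, one per pixel in the span.
    """
    count = x_end - x_start
    pixels = []
    for step in range(count + 1):
        t = step / count if count else 0.0  # a single-pixel span uses the start values
        values = tuple(s + (e - s) * t for s, e in zip(start, end))
        pixels.append((x_start + step, values))
    return pixels


# Hypothetical span: five pixels from x = 10 to x = 14.
span = interpolate_span(10, 14, (0.0, 0.0, 0.0, 100.0), (1.0, 0.5, 0.0, 200.0))
```

Hardware would typically add a precomputed increment per pixel rather than multiply, which is why LeoFloat supplies slopes and increments during setup.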

LeoDraw is also responsible for raster operations, vector anti-aliasing, alpha transparency, and Z-buffer algorithms. LeoDraw compares the Z value of each pixel with the previous value in the Z buffer (one of the frame buffers). If the new Z value is greater, meaning the new pixel is further in the background than the previous pixel, the previous value is not changed. This avoids display of hidden surfaces, as the new pixel is part of a surface that is behind what is already showing on the screen. If the new Z value is less than or equal (the new surface is in front of the old), LeoDraw writes the pixel's RGB values into the frame buffer image planes and updates the contents of the Z buffer.

Each pixel has associated with it a window ID (WID) number. If the new WID does not match the current WID, the pixel belongs to a window currently obscured on the screen, and no writes occur to the image planes. If the two WID values match, the image memory is updated with the new RGB values.

1.2.4 Frame Buffer

The Frame Buffer consists of 1280 by 1024 pixels by 96 planes. The memory planes are organized as follows:

-------------------------------------------------------------------------------
Type         Number     Description/Normal Configuration
             of Planes
-------------------------------------------------------------------------------

Image        48         Holds the color value for each pixel. Organized as two
                        buffers of 24 planes each.

Overlay      8          The overlay data can be transparent or solid.
                        Organized as two buffers of four planes each.

Depth        24         Holds the depth value for the last pixel written into
                        the current write buffer.

P window ID  6          Stores the window ID code for windows used by the
                        accelerator port processes. Known as the PWID.

Q window ID  4          Stores the window ID code for windows used by the
                        direct port processes. Known as the QWID.

Fast clear   6          Used to implement the fast clear feature for three
                        selected image windows.

-------------------------------------------------------------------------------

    Figure 1-4 Frame Buffer Memory Plane Groups
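As a quick consistency check, the plane groups listed in the table can be summed and multiplied by the pixel count. The Python sketch below uses only the figures given in this section. The result is the logical storage implied by 1280 by 1024 pixels of 96 planes, which is not necessarily the capacity of the memory devices actually fitted.

```python
# Plane counts taken from the frame buffer plane group table.
PLANES = {
    "Image": 48,
    "Overlay": 8,
    "Depth": 24,
    "P window ID": 6,
    "Q window ID": 4,
    "Fast clear": 6,
}
WIDTH, HEIGHT = 1280, 1024

total_planes = sum(PLANES.values())        # bits stored per pixel
total_bits = WIDTH * HEIGHT * total_planes
total_mib = total_bits / 8 / 2**20         # logical size in mebibytes
```

The plane groups add up to the 96 planes stated above, and the logical frame buffer size works out to 15 MiB.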

The Frame Buffer supports multiple resolutions, as shown in Figure 1-5. Leo can also operate in a quad-buffered stereo mode of 960 by 680 pixels. A stereo output is provided to switch left and right shutters on stereo goggles or a monitor face plate.

    Figure 1-5 Frame Buffer to Display Mapping

1.2.4.1 Image Planes

The 48 image planes hold a color value for each pixel to be displayed. These planes can store image data using one of two color models:

The color model is selected on a per-pixel basis by the window ID planes.

In the 24-bit true color model, two separate 24-plane buffers are dedicated to displaying the 16.7 million color gamut. Each 24-plane buffer is divided evenly into eight bit planes for each of the three primary colors: red, green, and blue. The two buffers are referred to as buffer A and buffer B. While Leo displays the image in buffer A, the next version of the image is drawn in buffer B. When the contents of buffer B are complete, the display is switched from buffer A to buffer B.

Normally, the image memory is configured as two 1280 by 1024 (double-buffered) arrays, as shown below:

LeoDraw may reconfigure the image memory to appear as four 960 by 680 two-dimensional arrays (quad buffered):

The ability to reconfigure the image memory aspect ratio and the ability to program the screen refresh circuitry allows the frame buffer to support several different screen resolutions.

1.2.4.2 Overlay Planes

The eight overlay planes can be thought of as an extra eight-bit indexed color frame buffer. The overlay data can be transparent or solid. The overlay plane can be made visible or invisible. Changing the contents or visibility of the overlay buffer does not alter the image buffer contents. The overlay is used to run the user's desktop and vanilla applications not requiring 3D graphics acceleration.

Overlay planes behave much like slide projector transparencies. They enable an image, a mail tool for example, to be temporarily superimposed over another image. In this manner, the data in the image beneath are not changed or affected; the image need not be redrawn when the overlay image is removed.

1.2.4.3 Depth Planes

The 24 depth (Z-buffer) planes store the depth value for the last pixel written into the current image write buffer. Z-buffering enables Leo to display the portions of an object that are nearer to the viewer and hide the portions that should be concealed by other portions of the object. This process is known as hidden surface removal.

Hidden surface removal is performed in LeoDraw. This process tests to determine which faces are in front and removes those surfaces that should be hidden. First, the Z-buffer value at each pixel is set to infinity so that any value written to a pixel is nearer to the viewer than the initial pixel value.

As the application draws the object, LeoDraw compares the depth of the face with the depth of the last value written to the Z buffer for that pixel. If the new face is closer, its color is written to the image memory, and LeoDraw stores the new depth value in the Z buffer, overwriting any previous value for that pixel. If the face is farther away, the face's pixel is discarded, and nothing is changed.
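The Z-buffer test described above can be sketched in a few lines of Python. This is an illustrative software model, not the LeoDraw implementation: it uses `math.inf` to stand for the "infinity" initial value (the hardware stores 24-bit depth values), and it treats an equal depth as passing the test, following the description of LeoDraw in the Render section. The function and variable names are hypothetical.

```python
import math


def draw_pixel(image, zbuf, x, y, rgb, z):
    """Write rgb at (x, y) only if z is nearer than or equal to the stored depth.

    Smaller z means nearer to the viewer. Returns True if the pixel was written.
    """
    if z <= zbuf[y][x]:
        image[y][x] = rgb   # new face is in front: update the image planes
        zbuf[y][x] = z      # and remember its depth
        return True
    return False            # new face is hidden: leave both buffers alone


WIDTH, HEIGHT = 4, 2
image = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]
zbuf = [[math.inf] * WIDTH for _ in range(HEIGHT)]   # every pixel starts "infinitely far"

first = draw_pixel(image, zbuf, 1, 0, (255, 0, 0), 50)    # drawn: buffer was empty
hidden = draw_pixel(image, zbuf, 1, 0, (0, 255, 0), 80)   # rejected: farther away
nearer = draw_pixel(image, zbuf, 1, 0, (0, 0, 255), 20)   # drawn: closer to the viewer
```

Because the test is applied per pixel, faces can be drawn in any order and the nearest one still ends up visible.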

1.2.4.4 Window ID Planes

The six P window ID planes and the four Q window ID planes store the window identification (WID) code for each pixel in the image buffer and overlay buffer. The WID planes for image and overlay are separate to support uncorrelated overlays.

During writes, the current WID code is compared with the stored image WID code for each pixel; writes are not done if the two codes do not match. During overlay plane writes, the current WID code is compared with the stored overlay WID code for each pixel; writes are not done if the two codes do not match. If the Q WID is zero, the P WID is used; else, the Q WID is used.
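The two rules in the preceding paragraph, the match test that gates writes and the choice between the P WID and the Q WID, can be expressed as a small Python sketch. The function names are hypothetical, and the sketch makes no claim about how the hardware combines these rules internally. It only restates them in executable form.

```python
PWID_BITS = 6   # six P window ID planes
QWID_BITS = 4   # four Q window ID planes


def select_wid(pwid, qwid):
    """Apply the rule: if the Q WID is zero use the P WID, else use the Q WID.

    Returns a (source, value) pair so the caller can tell which WID was chosen.
    """
    if not 0 <= pwid < (1 << PWID_BITS):
        raise ValueError("PWID must fit in six bits")
    if not 0 <= qwid < (1 << QWID_BITS):
        raise ValueError("QWID must fit in four bits")
    return ("Q", qwid) if qwid != 0 else ("P", pwid)


def write_allowed(current_wid, stored_wid):
    """A write proceeds only when the current and stored WID codes match."""
    return current_wid == stored_wid
```

For example, `select_wid(12, 0)` selects the P WID, while `select_wid(12, 3)` selects the Q WID, and a write with current WID 5 to a pixel whose stored WID is 7 is suppressed.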

During display cycles, the stored WID code is used to determine overlay transparency, to specify the current image display buffer, and to select the output color model for each pixel on the screen.

The ten-bit window ID acts as an index into a window lookup table (WID LUT) in LeoCross to define the window's display properties:

1.2.4.5 Fast Clear Planes

The six fast clear planes are used to rapidly clear the screen between frames so that animation of objects appears smooth on the screen. They implement the fast clear function for three selected double-buffered image windows: each fast clear plane pair can be assigned to clear one double-buffered window at hardware speeds. Before the start of a new frame, the appropriate fast clear plane is cleared to all zeros, using a special high-speed clear mode, indicating that the values stored in the image and depth planes are invalid for the specified WID.

As pixels are rendered into the buffer, a 1 is written into the fast clear plane at the pixel location, indicating that the image and depth are now valid. During display refresh, all valid pixels (Fast Clear = 1) are displayed using the color value stored in the image or overlay buffers. Invalid pixels (Fast Clear = 0) are displayed using the color value stored in the fast clear background color assigned to that fast clear set.
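The fast clear mechanism amounts to a per-pixel valid bit. The Python sketch below models one fast clear plane for a small hypothetical window. The function names, window size, and colors are illustrative assumptions, and the loop in `start_frame` merely stands in for the special high-speed clear mode.

```python
def start_frame(fast_clear_plane):
    """Mark every pixel invalid by zeroing the fast clear plane."""
    for row in fast_clear_plane:
        for x in range(len(row)):
            row[x] = 0


def render_pixel(image, fast_clear_plane, x, y, color):
    """Store a color and mark the pixel valid."""
    image[y][x] = color
    fast_clear_plane[y][x] = 1


def refresh_pixel(image, fast_clear_plane, x, y, background):
    """Return the color shown at (x, y) during display refresh."""
    return image[y][x] if fast_clear_plane[y][x] else background


WIDTH, HEIGHT = 4, 2
BACKGROUND = (0, 0, 64)   # hypothetical fast clear background color
image = [[(255, 255, 255)] * WIDTH for _ in range(HEIGHT)]   # stale previous frame
valid = [[1] * WIDTH for _ in range(HEIGHT)]

start_frame(valid)                                   # only the 1-bit plane is touched
render_pixel(image, valid, 2, 1, (200, 10, 10))
```

The stale image data is never rewritten. Clearing one bit per pixel instead of the full image and depth planes is what makes the clear fast.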

1.2.5 Video Output

The Video Output section consists of a LeoCross ASIC and a digital-to-analog converter (RAMDAC). The LeoCross ASIC contains the window ID look-up tables (WID LUTs) and the color lookup tables (CLUTs), along with the programmable video timing generation and hardware cursor generation logic.

1.2.5.1 Lookup Tables

As described above, the WID LUT defines the window's display properties of color mode, double-buffering, and so on. The CLUT is used primarily in the eight-bit indexed, or pseudo, color mode. The CLUT is a color map, containing a selection of colors for the particular application.

In the indexed color mode, an eight-bit value from the image plane addresses a location in the CLUT. The CLUT has as many entries as there are pixel values, meaning there are 256 (0 to 255) color map entries for the eight bits. Each of the 256 possible bit combinations, rather than directly dictating the intensity of the CRT electron beam, references an entry in the color map. Figure 1-6 illustrates a sample color map.

    Figure 1-6 An Example Color Map (CLUT)

In the above example, the eight-bit color index input from the frame buffer selects an entry in the CLUT containing three eight-bit values; eight bits each for red, green, and blue. The resulting 24-bit output defines the color of the pixel on the screen. Thus, although the application is limited to 256 colors, each of the colors can be selected from a range of 16.7 million colors.
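The indexed color lookup can be modeled in Python as a 256-entry table of (red, green, blue) triples. The table contents below are hypothetical and are not the values shown in Figure 1-6. In practice the application loads its own values through the direct port.

```python
CLUT_ENTRIES = 256   # one entry per possible eight-bit pixel value

# Hypothetical color map: a gray ramp with two entries overridden.
clut = [(i, i, i) for i in range(CLUT_ENTRIES)]
clut[1] = (255, 0, 0)
clut[2] = (0, 128, 255)


def lookup_color(index):
    """Return the 24-bit color (0xRRGGBB) selected by an eight-bit color index."""
    red, green, blue = clut[index & 0xFF]
    return (red << 16) | (green << 8) | blue


possible_colors = 2 ** 24   # 16,777,216: the "16.7 million" colors in the text
```

Changing a single CLUT entry changes the displayed color of every pixel that holds that index, without touching the frame buffer.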

The values in the CLUT are defined by the application developer and are loaded via the direct port.

The output RAMDAC contains a third type of lookup table, known as the gamma correction table. Gamma correction is an adjustment to the normal color mapping to make up for non-linearity of the luminescent phosphor in color CRTs. The gamma correction table may be used by all color models.
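A gamma correction table can be sketched in Python as a per-channel lookup. This chapter does not give the table size or the gamma value used by the RAMDAC, so the 256-entry size, the gamma of 2.2, and the power-law formula below are all assumptions chosen for illustration.

```python
GAMMA = 2.2      # assumed display gamma, not a value from this manual
ENTRIES = 256    # assumed table size: one entry per eight-bit channel value

# Each entry maps a linear intensity to the value needed to offset CRT non-linearity.
gamma_table = [round(255 * (i / 255) ** (1 / GAMMA)) for i in range(ENTRIES)]


def gamma_correct(rgb):
    """Apply the same correction table to the red, green, and blue channels."""
    return tuple(gamma_table[channel] for channel in rgb)
```

Because the table sits after the color lookup, the same correction applies whether a pixel came from the true color path or from the CLUT.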

1.2.5.2 Video Timing Generator

The programmable video timing generator provides support for multiple display resolutions. Leo supports the following display resolutions:

The programmable video timing generator consists of several programmable registers, which contain information that controls the pixel starting and stopping points of such output timing signals as horizontal blanking pulse, horizontal sync pulse, equalization interval, serration interval, vertical blanking pulse, and so on.

1.2.5.3 Hardware Cursor Generation

Rather than using a cursor plane in the frame buffer, Leo provides hardware cursor generation logic in LeoCross. The cursor information is limited to 32 by 32 pixels. A cursor larger than 32 by 32 pixels must be rendered in software.

Two 32 by 32 by 1-bit RAMs are used to store the cursor data. One RAM contains the cursor color, the other contains the cursor enable. The cursor color is one bit of information per pixel selecting between two cursor colors. The cursor enable is one bit of information per pixel, enabling or disabling the display of the cursor color.
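The way the two cursor RAMs combine can be shown with a short Python sketch. The cursor shape, the two cursor colors, and the function name are hypothetical. Only the enable and color select behavior is taken from the description above.

```python
CURSOR_SIZE = 32


def cursor_output(enable_ram, color_ram, cursor_colors, cx, cy, underlying):
    """Return the pixel color at cursor-relative position (cx, cy).

    enable_ram and color_ram model the two 32 by 32 by 1-bit RAMs.
    cursor_colors holds the two selectable cursor colors.
    """
    if not enable_ram[cy][cx]:
        return underlying                      # cursor disabled: show the frame buffer pixel
    return cursor_colors[color_ram[cy][cx]]    # color bit selects one of two colors


# Hypothetical cursor: an enabled 8 by 8 block whose top row uses color 1.
enable_ram = [[1 if x < 8 and y < 8 else 0 for x in range(CURSOR_SIZE)]
              for y in range(CURSOR_SIZE)]
color_ram = [[1 if y == 0 else 0 for x in range(CURSOR_SIZE)]
             for y in range(CURSOR_SIZE)]
cursor_colors = [(0, 0, 0), (255, 255, 255)]

total_cursor_bits = 2 * CURSOR_SIZE * CURSOR_SIZE   # storage in the two RAMs
```

Together the two RAMs give each cursor pixel three possible states: transparent, cursor color 0, or cursor color 1.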

1.2.6 Boot PROM

The Leo Boot PROM is compliant with the Open Boot PROM specification. The device used is reprogrammable, allowing for simple upgrade of the PROM contents without having to replace the device.

1.3 Software Description

Figure 1-7 shows the software interfaces to Leo. The four major software elements are:

There are two independent ways to communicate with the hardware: the direct port and the accelerator port. The hardware keeps two separate contexts to allow simultaneous access by the direct and accelerator ports without context switching.

The direct port interface is used by the window server using the Shapes interface along with window clients doing DGA (Direct Graphics Access) style graphics to access the frame buffer. The direct port interface is meant to be used for many off-the-shelf applications that don't require the sophistication and performance provided by the accelerator port.

The accelerator port is used by 3D applications to exploit the full capability of the Leo accelerator. The accelerator port is driven by XGL. The single accelerator port context must be saved and restored when switching between multiple accelerator applications.

The use of a second processor to execute LeoSparc code may provide concurrency to allow the host CPU to do other tasks while graphics are being rendered. In MCAD, for example, the model database can be updated while the graphics are being rendered in parallel by the second CPU.

    Figure 1-7 Leo Software Interfaces

1.4 Board Layout

Leo consists of two boards, known as the upper board and the lower board. Figure 1-8 shows the layout of the upper board, and Figure 1-9 shows the layout of the lower board.

    Figure 1-8 Leo Upper Board Layout

    Figure 1-9 Leo Lower Board Layout

1.5 What the Schematics Contain

This section describes the contents of each page of the schematics. The schematics are divided into two sets: lower board and upper board.

1.5.1 Lower Board (502-1844)

Sheet 1. Schematic cover page and list of contents.

Sheet 2. The main SBus connector (Leo is a double-wide assembly, spanning two SBus connectors).

Sheet 3. The second SBus connector, clock generator and crystal, and scan (JTAG) connector. The clock generator generates the 25 MHz clocks used by the circuits on both boards. The scan connector is used for board test.

Sheet 4. LeoCommand ASIC and Boot PROM. The LeoCommand ASIC is described in Chapter 2. The Boot PROM is described in Chapter 12.

Sheets 5 through 8. LeoFloat ASICs and microcode SRAM. The LeoFloat ASIC and associated SRAM are described in Chapter 6.

Sheet 9. Inter-board connectors. Connects the lower board to the upper board. Signals shown as "IN" are from the lower board to the upper board. Signals shown as "OUT" are from the upper board to the lower board.

Sheet 10. Clock fan-out driver.

Sheet 11. Pull-up and pull-down resistors and source of Boot PROM Vpp for power-up. Also includes the PROM Vpp jumper used for programming the boot PROM (a flash EEPROM).

Sheet 12. Bypass capacitors.

Sheets 13 and 14. Test point connectors for board bring-up only.

1.5.2 Upper Board (502-1843)

Sheet 1. Schematic cover page and list of contents.

Sheet 2. Inter-board connector. Connects the lower board to the upper board. Signals shown as "IN" are from the upper board to the lower board. Signals shown as "OUT" are from the lower board to the upper board.

Sheets 3 through 7. LeoDraw ASICs. The LeoDraw ASICs are described in Chapter 7.

Sheet 8. Z buffer DRAMs for banks 0 through 4. The DRAMs are described in Chapter 8.

Sheets 9 through 13. VRAMs for image, overlay, fast clear, and window ID planes, banks 0 through 4. The VRAMs are described in Chapter 8.

Sheet 14. LeoCross ASIC. The LeoCross ASIC is described in Chapter 9.

Sheet 15. Output RAMDAC and video connector. The Output RAMDAC is described in Chapter 10.

Sheet 16. Pixel clock synthesizer and source of +10 Vdc for the stereo connector. The synthesizer is described in Chapter 11.

Sheet 17. Pull-up and pull-down resistors.

Sheet 18. Termination and damping resistors.

Sheet 19. Bypass capacitors.