The TAYRA 3-D Graphics Raster Processor


Comput. & Graphics, Vol. 21, No. 2, pp. 129-142, 1997. © 1996 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. 0097-8493/97 $17.00+0.00


PII: S0097-8493(96)00076-3

Graphics Hardware

THE TAYRA 3-D GRAPHICS RASTER PROCESSOR

MARTIN WHITE†, MIKE BASSETT, DAIRSIE LATIMER, SHAUN MCCANN, ALEX MAKRIS, MARCUS WALLER, GRAHAM DUNNETT, JOACHIM BINDER and PAUL LISTER
University of Sussex, School of Engineering, Centre for VLSI and Computer Graphics, Falmer, Brighton, BN1 9QT, UK
e-mail: [email protected]

Abstract-This paper describes the architecture of a 3-D Graphics Raster Processor called TAYRA. TAYRA consists of a Graphics Raster Pipeline with five major external interfaces: PCI Master/Target, Depth, Texture, Colour and Video Interfaces. The Graphics Raster Pipeline performs all the major OpenGL style raster functions: scan conversion of lines, spans, triangles and rectangles, perspective correction of texture co-ordinates, mip-map level of detail selection, and many other texture modes, alpha blending, and other functionalities. Further, through TAYRA's fast PCI to buffer access mechanisms it can do advanced stencilling, multi-pass antialiasing, and other algorithms; all accelerated in hardware with a sustained pixel write speed of 29 Mpixels/s (peak of 33 Mpixels/s). This translates to an estimated peak performance of 890 K triangles/s for 25 pixel triangles. © 1997 Elsevier Science Ltd

1. TAYRA FEATURES

TAYRA is a 3-D Graphics Raster Processor aimed at the high end of 3-D raster graphics. It is the latest in a generation of 2-D and 3-D graphics processors developed on European funded projects involving several European companies and universities: IMAGE [1], a 3-D Gouraud shading and antialiasing ASIC; STEP [2], a 3-D texture mapping ASIC; and MEDIA [3], a 2-D graphics and video controller ASIC. TAYRA provides high quality and speed through its extensive functionalities and interfaces, and also provides a low system component count due to its high level of integration. TAYRA's comprehensive selection of features is summarized in Table 1.

2. PERFORMANCE ESTIMATES

Table 2 gives the estimated performance figures for TAYRA. The performance figures are based on a 33 MHz graphics pipeline clock and 66 MHz memory controller clocks. However, we are still PCI input bandwidth limited to a peak 890 Ktriangles/s. A future solution to this bottleneck could be the Accelerated Graphics Port (AGP) architecture [4]. AGP is an extension to the PCI architecture which adds a demultiplexed address bus, pipelined transfers and 133 MHz transfer rates to boost graphics performance. Another solution is to integrate a setup engine and vertex level interface to allow triangle strips to be processed. This effectively decreases primitive vertex data crossing the PCI from three vertices per triangle to about one.

3. INTERFACE SPECIFICATION

TAYRA employs five major interfaces to the system environment: PCI Master/Target, Depth Buffer, Texture Buffer, Colour Buffer and Video Interface. We have implemented a 32 bit Master/Target PCI Local Bus Revision 2.1 Interface [5]. The memory clock frequency is always twice that of the graphics pipeline, which helps to remove the traditional memory access bottleneck. Both the graphics pipeline and the memory systems are clocked independently from the PCI bus.

We have also chosen to implement three memory interfaces: texture, depth and colour, giving complete independence between the access protocol used by each and enabling a dedicated and optimized memory system to be configured for each system's particular requirements. It also means that the latest memory technology can be employed. For example, the advanced block write feature of SGRAM is useful for buffer clear operations in the depth memory, and the advanced BitBLT capabilities and block write functions of the new Window RAM (WRAM) can be employed to best advantage in the colour buffer. The dual port capability of the WRAM can also be used to optimize the display of the colour data.

These memory interfaces have been designed in a generic and programmable way, which allows us to support SGRAM, SDRAM, VRAM, DRAM and WRAM for accessing the depth, texture and colour buffers. These memory controllers all arbitrate between the host (via TAYRA's PCI interface through the on-chip communication FIFOs) and the graphics pipeline. There is also a global memory refresh controller which, discounting any memory refresh accesses, refreshes every row every 17 ms.

† Author for correspondence.

Table 1. TAYRA features

32 bit PCI Local Bus Interface
High Performance Communication FIFOs
Scan Conversion of Lines, Spans, Triangles and Rectangles
Integrated Depth Test and Buffer Controller
OpenGL Fogging
Fast Context Switching Program Register Set
Colour Buffer Controller
True Colour Double Buffer (Up to 16 bit, 1600x1200)
Display Resolution (320x240 to 1600x1200)
Depth Buffer Controller
True Floating Point Perspective Correction
Hardware Mip-Mapped Level of Detail
32 bit RGBA Texture Mip-Mapping
Square, Linear, Point2D Mip-Map Support
Wrap (W), Invert (In), Ignore (Ig), Clamp (C) Texture Tiling
Texture Colour Map Format Modes
OpenGL Texture Decal and Modulate
32 bit RGBA Texture Point, Linear, Bilinear, Trilinear Filtering
OpenGL Blending (Transparency, Antialiasing)
Multi-pass Operations, e.g. Antialiasing, OpenGL Stencilling
Clipping to regions
Host to Depth, Texture, Colour Buffer Access
Fast 8.5 Gbyte/s Depth Buffer Clear Operation
Fast 8.5 Gbyte/s Colour Buffer Clear Operation
Depth and Texture Buffer Support (SGRAM or SDRAM)
Colour Buffer Support (DRAM, VRAM or WRAM)
Global Depth, Texture, Colour Refresh Controller

Table 2. TAYRA peak performance comparison

Performance equivalents | Pixel writes | Peak rate
TAYRA* 3-D triangles: 32 bit RGBA, Gouraud shaded, 32 bit depth buffered, clipped, stencilled, alpha blended, mip-mapped, texture mapped | 25 pixels/triangle | 890 K/s
(as above) | 50 pixels/triangle | 515 K/s
TAYRA† lines: 32 bit RGBA, Gouraud shaded, 24 bit depth buffered, antialiased | 10 pixels/line | 1.6 M/s
TAYRA‡ block moves: 32 bit RGBA and 32 bit depth | | 132 Mbytes/s

*These performance figures are based on TAYRA's PCI bandwidth and Z-Buffer memory controller. For example, if TAYRA's pipeline is clocked at 33 MHz (66 MHz memory controller) then for a triangle whose pixels exhibit no page faults TAYRA can output 33 Mpixels/s. However, TAYRA has to buffer 12% of a triangle's pixels due to Z-Buffer page faults. TAYRA, therefore, outputs 29 Mpixels/s. Thus, TAYRA can stream 29 Mpixels/s ÷ 25 pixels/triangle = 1.16 Mtriangles/s past the Z-Buffer controller. But an average textured and illuminated triangle at the PCI input consists of 148 bytes of data loaded by the host. Thus, with a peak PCI bandwidth of 132 Mbytes/s we can see that for 25 pixel triangles TAYRA is limited by the PCI bandwidth to 132 Mbytes/s ÷ 148 bytes/triangle = 890 Ktriangles/s. Similarly, 50 pixel triangles are limited by the pipeline bandwidth.

†Limited by the peak PCI bandwidth of 132 Mbytes/s. A line on average is composed of about 84 bytes of data. Therefore, 132 Mbytes/s ÷ 84 bytes = 1.6 Mlines/s.

‡TAYRA's block move drawing performance is based on the PCI input bandwidth of 132 Mbytes/s and TAYRA's WRAM colour memory controller output bandwidth of 132 Mbytes/s. The datapath through TAYRA can also sustain 132 Mbytes/s.
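The arithmetic in the footnotes to Table 2 can be replayed with a short Python sketch. It is only a simplified bottleneck model built from the figures quoted there (29 Mpixels/s sustained, 132 Mbytes/s peak PCI, 148 bytes per triangle, 84 bytes per line); the function names are hypothetical, and the model makes no attempt to reproduce the 50 pixel entry of Table 2 exactly, only to show which resource limits each case.

```python
# Simplified bottleneck model; all constants are the figures quoted in the
# Table 2 footnotes, and the function names are illustrative assumptions.
PIXEL_RATE = 29e6        # sustained pixel writes per second
PCI_BANDWIDTH = 132e6    # peak PCI bytes per second
BYTES_PER_TRIANGLE = 148
BYTES_PER_LINE = 84


def triangle_rate(pixels_per_triangle):
    """Return (limiting resource, triangles per second) for a triangle size."""
    pipeline_limit = PIXEL_RATE / pixels_per_triangle
    pci_limit = PCI_BANDWIDTH / BYTES_PER_TRIANGLE
    if pci_limit < pipeline_limit:
        return "PCI", pci_limit
    return "pipeline", pipeline_limit


def line_rate():
    """Lines per second when limited only by the PCI input bandwidth."""
    return PCI_BANDWIDTH / BYTES_PER_LINE


for size in (25, 50):
    limit, rate = triangle_rate(size)
    print(f"{size} pixel triangles: {rate / 1e3:.0f} K/s, limited by {limit}")
print(f"lines: {line_rate() / 1e6:.2f} M/s")
```

Under these assumptions the 25 pixel case is PCI bound (about 890 K triangles/s) while the 50 pixel case is bound by the pixel pipeline, which matches the reasoning in the first footnote.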

4. ARCHITECTURAL OVERVIEW

The objectives for TAYRA were to design a high performance 3-D Graphics Raster Processor which implements all the common 3-D graphics raster functionalities. Further, TAYRA should interface to the latest memory technologies in order to achieve unrivalled performance. All this is to be integrated in a single VLSI package interfaced to a PCI based host. An overview of TAYRA is illustrated in Fig. 1.

4.1. Scan conversion module

The scan conversion module exhibits the following functionality:

(1) Primitive traversal using linear edge functions [6-8]
(2) Incremental interpolation of all parameters
(3) Antialiasing using coverage masks [9]
(4) Floating point texture interpolation
(5) Floating point perspective correction
(6) Integer colour and depth interpolation
(7) Depth test integrated with depth memory controller
(8) Texture mip-map level of detail calculation

Fig. 1. TAYRA architecture overview.

Fig. 2. Scan conversion module.

The organization of these functionalities within the scan conversion module [10] is illustrated in Fig. 2. It is beyond the scope of this paper to describe in detail all the functional implementations of the scan conversion module. Instead we select several which may prove interesting.

4.1.1. Primitive traversal module. An abstract diagram of the primitive traversal module architecture is shown in Fig. 3. Note that the data inputs to the interpolators and the E0, E1, E2 edge function interpolator outputs are not shown to maintain clarity.

Fig. 3. TAYRA scan conversion module architecture.

4.1.1.1. Interface description. The primitive identity code format is shown in Fig. 4. This code not only defines the primitive type but also provides the information required to map the primitive from the default octant to another. The three identity bits, ID_2 to ID_0, define the primitive to be drawn. Bits three and four allow the direction of scan conversion to be reversed, essential for octant re-mapping of lines. The default is to increment x and y. Bit five tells the scan converter that it should swap the x and y co-ordinates, so reflecting the primitive in the line y = x. Bit six defines what should occur when the interpolated midpoint line error is equal to zero. This bit allows the user to be consistent when drawing lines in different octants.

bit     | 6                 | 5        | 4         | 3         | 2    | 1    | 0
purpose | error = 0 rule    | swap x,y | reverse x | reverse y | ID_2 | ID_1 | ID_0

Fig. 4. Primitive ID code.

In addition to the primitive identity, primID, there are three other control words associated with the scan conversion module: exiCTRL, edgeType, and aaCTRL. The exiCTRL control output, short for external interpolator control, controls interpolators external to the scan conversion module. The control signal is identical to the internal edge function control signal. The edgeType control input defines the characteristics of triangle edges. There are two bits per edge: one, thickE_N, indicates whether edge N is thick or thin; the other, AA_N, specifies whether the edge should be antialiased. The various fields of the edgeType control word are defined in Fig. 5.

bit     | 5        | 4    | 3        | 2    | 1        | 0
purpose | thickE_2 | AA_2 | thickE_1 | AA_1 | thickE_0 | AA_0

Fig. 5. Edge type control word (edgeType).

The aaCTRL control output specifies what the antialiasing unit should do with the pixel, see Fig. 6. It comprises the edgeType control word, in bits 5 to 0 of aaCTRL, and bits that specify the nature of the pixel with regard to the three edges. If thickP_N is a logical "1" then the pixel is thick with regard to edge N. Similarly, if inside_N is a logical "1" then the pixel is totally covered by edge N. If a pixel is thick with regard to edge N (thickP_N = "1"), but the edge has been defined as a thin edge (thickE_N = "0"), then the pixel must be marked invalid by the antialiasing module, and is thus not displayed.

bit     | 11       | 10       | 9        | 8        | 7        | 6        | 5 to 0
purpose | thickP_2 | inside_2 | thickP_1 | inside_1 | thickP_0 | inside_0 | edgeType (thickE_2, AA_2, ..., AA_0)

Fig. 6. Antialias control word (aaCTRL).

4.1.1.2. Traversal FSM descriptions. The primitive traversal module implements four traversal algorithms which are used to draw lines, spans, triangles and rectangles in various modes. The scan conversion algorithms use linear edge functions and the concept of look ahead [11]. Look ahead in general involves looking at neighbouring pixels in advance in order to determine the next pixel to move to. In the case of TAYRA we look ahead to two neighbouring pixels, one in the horizontal plane, and the other in the vertical plane. The current pixel parameters have already been calculated and are waiting in interpolation registers, and so can be passed on to the next pipeline stage without further computation. The parameter values for the two neighbouring pixels, however, are calculated and, depending on the current state, stored in interpolation registers at the end of the current cycle. They then become available in the next cycle.

Fig. 7. Scan conversion of triangles and the triangle FSM. (State labels shown in the figure: tri-down, tri-right, tri-right-push, tri-back-to-centre, tri-back-to-centre-push.)

Fig. 8. Pixel coverage module. Propagation delays are for the ES2 ecdm05 library using MAX-IND operating conditions (industrial worst case); total 53.5 ns.

Fig. 9. Architecture of the perspective correction module.

4.1.1.3. Triangle FSM. The eight states of the triangle scan conversion algorithm and its FSM are illustrated in Fig. 7. The current pixel is marked by a thick boundary and the two neighbouring pixels looked ahead to are marked with a dot at their centres. The lightly shaded pixel is the previous pixel, if there was one. The eight states are described below:

(1) tri-start: This is the first state in the scan conversion of any triangle, and so there is no previous pixel. The scan converter must start at the uppermost point of the triangle. After starting, the scan converter will try to go right, and so looks ahead to the pixel immediately to the right of the start pixel.
(2) tri-r: When in this state the pixel to the right of the previous pixel becomes the current pixel, and is output. The scan converter looks ahead to the pixel to the right of the current pixel.
(3) tri-btc: This state returns the rasterizer to the first pixel visited on the scan line, assumed to be near the centre of the scan line.
(4) tri-l: It is the exact opposite to tri-r.
(5) tri-d: It takes the scan converter down to the next scan line.
(6) tri-rp: This state is similar to tri-r but it assumes that a valid pixel has not been found on the next scan line, and so continues to push the pixel beneath the current pixel into the storage register.
(7) tri-btcp: This state is similar to tri-btc but assumes that a valid pixel has not been found on the next scan line, and so continues to push the pixel beneath the current pixel into the storage register.
(8) tri-lp: This state is similar to tri-l but assumes that a valid pixel has not been found on the next scan line, and so continues to push the pixel beneath the current pixel into the storage register.

The algorithm described above is a little inefficient in that the initial pixel in a scan line is always visited twice, and flagged as invalid on the second occasion. This inefficiency can be overcome by adding extra look ahead logic to determine the pixel co-ordinates to the left of the initial pixel in a scan line as well [11]. However, the final choice of algorithm is a cost versus performance trade-off based on the level of look ahead desired.

Our scan conversion algorithm treats all triangle edges as thick. When a thin edge is encountered, some pixels will be produced that are outside that edge. These pixels will be marked invalid in a later pipeline stage. This approach is necessary when using edge functions to define the boundary of a triangle, otherwise pixels will be omitted from very thin triangles.

4.1.1.4. Line, span and rectangle FSMs. The line scan conversion algorithm is a hardware implementation of the midpoint line algorithm described by Foley [12]. The algorithm can only step in two directions from any given pixel, East and Southeast. This only gives lines in one octant but mechanisms are provided to map the line into other octants. There are two types of span, the horizontal and the vertical span, but only the horizontal span is implemented by the scan conversion unit. The vertical span is derived from the horizontal span by swapping the co-ordinates. This is performed by the octant re-mapping mechanisms. The scan conversion of rectangles starts at the corner nearest the origin. The scan converter scans right until the rectangle boundary is met, and then moves down to the next scan line. The FSMs for lines, spans and rectangles can easily be derived, and are thus not shown.

4.1.2. Pixel coverage module. The pixel coverage module implementation is based on the algorithms described in [9], see Fig. 8, which incidentally indicates the propagation delays for the ES2 ECDMOS technology library. Clearly, for the performance figures given above, pipelining is required. The coverage values generated are used in the alpha blending module further down the graphics pipeline for antialiasing.

Fig. 10. Architecture of the floating-point reciprocal unit. (Labels shown in the figure: exponent field, bits 30 to 23; significand field, bits 22 to 0; i-table 256x25; j-table 256x16.)

Fig. 11. Fog factor lookup table (fog factor, in 256ths, plotted against density x depth, d.Z).

4.1.3. Perspective correction module. Without going into the theory of why texture mapping in particular suffers from perspective distortion effects, see [13-15], the function of the perspective correction module is to divide the floating point interpolated texture co-ordinates S and T by Q at pixel rate. Fig. 9 illustrates the architecture of the floating point perspective correction module. Note that because the reciprocal calculation and the multiplication cannot be done in one clock cycle there is a second pipeline register set in between them. A more detailed explanation of how the perspective correction module works and the rationale for its architecture is given in [16].
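As an illustration of the per-pixel divide described for the perspective correction module, the following Python sketch interpolates S, T and Q linearly across a span and then forms S/Q and T/Q with one reciprocal and two multiplications. It is a software analogy only: the function names, the span representation and the use of ordinary Python floats are assumptions and say nothing about the hardware datapath.

```python
def perspective_divide(s, t, q):
    """Return (s/q, t/q) using one reciprocal and two multiplications."""
    r = 1.0 / q              # reciprocal, computed once per pixel
    return s * r, t * r      # multiplier, used twice per pixel


def textured_span(start, end, n):
    """Texture co-ordinates for n >= 2 pixels of a span.

    start and end are (S, T, Q) triples at the two ends of the span;
    S, T and Q are interpolated linearly and divided at every pixel.
    """
    coords = []
    for i in range(n):
        a = i / (n - 1)
        s, t, q = (start[k] + a * (end[k] - start[k]) for k in range(3))
        coords.append(perspective_divide(s, t, q))
    return coords


# A span whose far end is twice as deep as its near end: the texture
# co-ordinate runs from 0 to 1, so S = u/w goes 0 -> 0.5 and Q = 1/w goes 1 -> 0.5.
span = textured_span((0.0, 0.0, 1.0), (0.5, 0.0, 0.5), 3)
print(span)
```

In this example the middle pixel maps to a texture co-ordinate of one third rather than one half, which is the distortion that a purely linear interpolation of the texture co-ordinate would miss.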

The architecture chosen for implementing the floating point reciprocal components was influenced by work done at the IBM Thomas J. Watson Research Center by Narayanaswami [17] and at the University of Sussex by Westmore [18]. For interest's sake we indicate the approach used to compute the floating point reciprocal unit. The floating point format of the reciprocal unit is defined by the IEEE Standard 754 on floating-point arithmetic, which defines single-precision floating-point format to be a sign bit, an 8 bit exponent and a 23 bit significand, see [19, 20].

Because a divider cannot be synthesized automatically by a synthesis tool (like Synopsys or AutoLogic II) it has to be designed by hand. Several alternatives can be used (refer to standard texts on the design of arithmetic logic) to realise a divider:

• By sequential shift-subtract/add division
• By convergence division, for example Newton-Raphson iteration
• By an array divider
• By splitting up the division in a reciprocator and a multiplier [17, 18]

Furthermore, you have the choice of either using fixed-point numbers or floating-point numbers. Note that if you design a floating-point divider the division of the significands is a fixed-point operation. So, a fixed-point division is used in any case. The floating-point division is just expanded by some adjustments and exponent calculations. After studying all the above mentioned alternatives in terms of gate count and calculation time, the final choice was made for splitting up the division into a floating-point reciprocal unit followed by a floating-point multiplier. In the special case of texture mapping this partitioning was chosen because the reciprocal unit is used once for calculating 1/Q, and the multiplier is used twice for calculating S/Q and T/Q for the texture parameters S and T, respectively. The schematic for the division unit is shown in Fig. 10.

4.2. Depth test module

The depth interpolator is implemented early to eliminate unnecessary texture and colour accesses. The depth test takes a variable amount of time due to page misses, etc., so we soak up this latency with FIFOs throughout the graphics pipeline. Incidentally, only about 19 Kbits of on-chip memory is needed to implement FIFOs; however, without DRAM macrocells this translates into about 130 K gate equivalents. The depth test operations are those supported by OpenGL [21]: GL_NEVER, GL_EQUAL, GL_GREATER, GL_GEQUAL, GL_NOTEQUAL, GL_LESS, GL_ALWAYS, GL_LEQUAL.

4.3. Fogging module

We have implemented the OpenGL fogging modes: GL_LINEAR, GL_EXP and GL_EXP2. The exponentials are implemented through optimised lookup tables, see Fig. 11, which illustrates the theory and architecture.

4.4. Texture mapping module

The texture mapping module provides point sampling, linear, bilinear and trilinear filtering, palettized 8 bit colour mode and OpenGL style texture illumination (decal and modulate). Figure 12 provides an overview of the texture mapping module, while Fig. 15 shows in detail the architecture of the filter component. It is beyond the scope of this paper to describe every part of the texture mapping module; however, because texture filtering plays an important part in high quality graphics, we describe this component in more detail.

Fig. 12. Overview of texture mapping module.

4.4.1. Texture filter component. The most basic functional unit of the texture filter component is a linear interpolator. The linear interpolator is arranged in various ways to implement linear, bilinear and trilinear interpolation of texture values. In general, a linear interpolation unit is used for interpolating between two given values, the start point and the end point, respectively. In addition to the start and end point there is a third parameter used in linear interpolation, called the weighting factor, which defines the distance of the resulting value to the two others. Geometrically speaking, the resulting value lies on a line between the two others. The mathematical equation is:

$$ \mathit{value}_{result} = \mathit{value}_{start} + \mathit{weight} \cdot (\mathit{value}_{end} - \mathit{value}_{start}) $$

In the context of texture filtering, the two input values are texture colour values col_0 and col_1 and the weighting factor is called the fraction. So, the above mentioned equation becomes

$$ \mathit{col}_{result} = \mathit{col}_0 + \mathit{frac} \cdot (\mathit{col}_1 - \mathit{col}_0) $$

where frac is an 8 bit fraction value in the range 0 to 255/256, and the two colour components col_0 and col_1 are treated as integer numbers. In contrast to some special applications (for example the illumination equation, see the discussion on alpha blending below, where you want to have FF * 0.FF = FF in order to avoid a decrease in brightness), here in texture filtering you want to have FF * 0.FF = FE. This is because, for example, if col_0 = 0 and col_1 = FF, then as frac goes from 0/256 to 255/256, col_result takes all values between 0 and 254. Note that in Fig. 13, for simplification, col_result is drawn as a line but, to be precise, it is a step function.

Fig. 13. col_result in dependence of frac for col_0 = 0 and col_1 = 255.

It is stressed here again that in texture filtering you do not want col_result to be 255 (you want 254) for the input values col_0 = 0, col_1 = 255, and frac = 255. You will have col_result equal to 255 when you leave that interval (which is, in this context, the distance from one texel value to the next) and frac becomes 0 for the next interval. Then the start point col_0 is 255 and therefore col_result as well, because frac is zero. If you have a texture map consisting of black and white texels in alternating order, the resulting colour value would go from 0 to 255 and back to 0 and so on as long as you move from one texel to the next (shown as the straight line in Fig. 14). If col_result was 255 for frac = 255, then you would have the dotted line.

Fig. 14. Correct and wrong values of col_result on a black/white texture map.

Fig. 15. Texture filter architecture.

Fig. 16. Linear and bilinear interpolators used in the BilinearCalc and LinearCalc units of the texture filter.

Table 3. Supported modes

Mode | filter-mode (1:0) | mode8x4
Point sampling | 00 | 0
Linear filtering | 01 | 0
Bilinear filtering | 10 | 0
Conventional RGBA trilinear filtering | 11 | 0
Mode8x4 trilinear filtering | 11 | 1

Let us now look at the architecture of the texture filter component (see Fig. 15). The basic components are the BilinearCalc component and the LinearCalc component. These components are constructed from bilinear interpolators and linear interpolators respectively (see Fig. 16). Because the complete trilinear interpolation cannot be done in one clock cycle, the filter module was split up into these components and a second pipeline register set has been inserted between them. The BilinearCalc component is constructed from six bilinear interpolators (see Fig. 16). The LinearCalc component includes the registers to store the result of the first phase of a conventional RGBA trilinear filtering operation, and the linear interpolators to calculate the trilinear value in the second phase.
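The behaviour of the 8 bit fraction in the texture filter (Section 4.4.1) can be checked with a small integer model in Python. This is a sketch under stated assumptions: `lerp8` and `sample_1d` are hypothetical names, the product is truncated by a right shift of 8 bits, and the one-dimensional texture with a clamped last texel is an illustrative simplification rather than the hardware addressing scheme.

```python
def lerp8(col0, col1, frac):
    """col0 + (frac/256) * (col1 - col0), with frac an integer in 0..255.

    The product is truncated by an 8 bit right shift (an assumption).
    """
    return col0 + ((frac * (col1 - col0)) >> 8)


def sample_1d(texels, u):
    """Linearly filter a 1-D texture; u is fixed point with 8 fractional bits."""
    i, frac = u >> 8, u & 0xFF
    nxt = min(i + 1, len(texels) - 1)   # clamp at the last texel
    return lerp8(texels[i], texels[nxt], frac)


# Black, white, black texels, sampled at every 1/256 step.
ramp = [sample_1d([0, 255, 0], u) for u in range(0, 513)]
print(lerp8(0, 255, 255), ramp[255], ramp[256], ramp[257])
```

With these assumptions the interpolated value stops at 254 at the end of the first interval, reaches 255 exactly once (when frac wraps to 0 on the white texel), and never changes by more than one level between adjacent samples, which corresponds to the straight line of Fig. 14.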

4.4.2. Modes supported by the texture filtering module. The texture filtering module supports several modes of filtering. All supported modes are listed in Table 3. To select a specific mode, the 2 bit wide control bus filter-mode and the control line mode8x4 are used.

4.5. Alpha blending module

Although it is possible to implement an alpha blending module in a similar style to that given in the texture filter module, i.e. with linear interpolators, two factors suggest a variation. First, the OpenGL alpha blending operation suggests that a blending factor setup stage and a blending calculation stage be used. The architecture shown in Fig. 17 illustrates the blending calculation stage for one colour component. It also depicts a correction factor, which is explained next with respect to the texture filtering described above. The linear interpolation component used for texture filtering interpolates between two 8 bit colour values col_0 and col_1 according to the equation:

$$ \mathit{col}_{result} = \mathit{col}_0 + \mathit{frac} \cdot (\mathit{col}_1 - \mathit{col}_0) $$

As shown for texture filtering, if the equation above is calculated with a certain assignment of input values then col_result will result in a value less than it should be. For example, let col_0 be 0, col_1 be 255, and frac be 255/256. Then the multiplication will give you (in hex notation):

$$ \mathrm{0.FF} \cdot (\mathrm{FF} - 0) = \mathrm{FE.01} $$

Therefore, when col_0 is 0, col_result will be FE.01. Even with rounding it is not possible to get FF because the first binary digit to the right of the binary point is zero. To avoid this, a correction factor is needed which expands the range of frac from 0...255/256 to 0...1. We note that this correction factor appears in the linear interpolators for illumination (OpenGL modulate equations) and alpha blending logic. It can easily be seen that this correction factor has to be 256/255 = 1 + 1/255. Using this correction factor, the above mentioned multiplication results in:

$$ \mathit{frac} \cdot \mathit{correction}_{factor} \cdot (\mathit{col}_1 - \mathit{col}_0) $$

Using a correction factor of 1 + 1/255, we can concatenate as many interpolation units as we want to, and we will have no decrease in the final colour value. But it is difficult to handle 1 + 1/255, which is in hex notation 1.01010101... It is much easier to handle 1 + 1/256, which is in hex notation 1.01000000. The error that we make is just

$$ \mathit{error} = (1 + 1/255) - (1 + 1/256) = 1/255 - 1/256 = 1/65280 $$

If we multiply this approximated correction factor 1 + 1/256 by the fraction frac, we will get the corrected fraction frac' to be:

$$ \mathit{frac}' = \mathit{frac} \cdot (1 + 1/256) = \mathit{frac} \cdot 257/256 $$
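The effect of the approximated correction factor can be checked numerically. The Python sketch below compares an uncorrected 8 bit blend with one that uses frac' = frac * 257/256; the function names, the round-to-nearest step and the 16 bit intermediate fraction are illustrative assumptions, not a description of the circuit in Fig. 17.

```python
def blend_uncorrected(col0, col1, frac):
    """col0 + frac/256 * (col1 - col0), rounded to nearest; frac in 0..255."""
    return col0 + ((frac * (col1 - col0) + 128) >> 8)


def blend_corrected(col0, col1, frac):
    """Same blend using the corrected fraction frac' = frac * 257/256."""
    frac16 = frac * 257                      # frac' in units of 1/65536
    return col0 + ((frac16 * (col1 - col0) + 32768) >> 16)


def chain(blend, passes):
    """Blend full white over black repeatedly with the maximum fraction."""
    col = 255
    for _ in range(passes):
        col = blend(0, col, 255)
    return col


print(hex(0xFF * 0xFF))                       # the FE.01 product from the text
print(chain(blend_uncorrected, 10), chain(blend_corrected, 10))
```

Under these assumptions the uncorrected blend loses one level per pass, whereas the corrected blend keeps full white at 255 however many stages are concatenated, which is the property the correction factor is meant to provide.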

Fig. 17. Block diagram of the blending module.

5. SOFTWARE CONSIDERATIONS

TAYRA was first modelled at the system and algorithmic or functional level in C++. This produced an environment in which rapid (compared to the RTL VHDL) debugging of graphics algorithms could be effected. The software simulation was in fact a mix of high level software descriptions and low level software implemented state machines, for example state machine descriptions of scan conversion algorithms and software modelling of OpenGL style alpha blending. This software environment provided us with a reference model with which to compare our hardware results from the RTL VHDL simulations, see below. The software environment included a detailed model of TAYRA's register set which would allow potential users to start building device drivers.

TAYRA macrocells are currently being revised to map into an FPGA based system architecture. In this new project we are implementing the reference board, device drivers and example applications. The device drivers being implemented include an OpenGL 1.1 MCD for Windows NT 4.0, and support for Microsoft Direct3D when it becomes standard on Windows NT 4.0. Because OpenGL is more mature we have concentrated on providing device drivers and applications for OpenGL. This implies we are targeting OpenGL style markets.

Fig. 18. TAYRA VHDL RTL simulation results. (a) Texture mapped plane tiling and exhibiting linear fog, (b) exponential fog, (c) exponential squared fog, (d) Gouraud shaded teapot, (e) video backdrop with two Gouraud shaded and antialiased triangles, (f) triangles from (e) above with antialiasing magnified.

6. HARDWARE SIMULATION RESULTS

The images depicted in Fig. 18 are the result of hardware simulations which took more than 3 h each on a 133 MHz Pentium using Model Technology's V-System VHDL simulator. The simulation was all RTL VHDL. Figure 18(e) is interesting in that it shows an image which has been manipulated with Adobe Photoshop to produce the lens flare effect; this was then used as a video backdrop (instead of the normal background colour), and then the two triangles were rendered with transparency and antialiasing. This illustrates the use of TAYRA's two channels: the graphics pipeline channel and the PCI to buffers channel.

7. CONCLUSIONS

TAYRA has been fully implemented in C code to provide algorithmic simulation verification, and in VHDL at the register transfer level (RTL). Gate level simulations have been performed on most modules after technology mapping to the ES2 ECDMOS 0.6 micron library, which indicates that a total gate equivalent count of around 270 K gates and about 19 Kbits of on-chip DRAM are required. We are currently mapping TAYRA's VHDL models to an FPGA based system architecture (with off-the-shelf system components) to provide a working prototype. Further, we are developing TAYRA's modules into a library of macrocells for future design reuse.

Acknowledgements-We would like to acknowledge the European Commission for providing part of the funding for this work under an Esprit initiative called the Monograph Project. We would also like to extend our thanks to past and present members of the Monograph consortium, and independent consultants to the project, namely, Michael Malms (IBM Germany), Bengt-Olaf Schneider, Chandrasekhar Narayanaswami, Wolfgang Bultmann (IBM Thomas J. Watson Research Center), Andreas Schilling, Anders Kugler, Wolfgang Strasser (Universität Tübingen), and many other people too numerous to mention, who nevertheless contributed in no small way to the Monograph Project.

REFERENCES

1. Dunnett, G. J., White, M., Lister, P. F., Grimsdale, R. L. and Glemot, F., The IMAGE chip for high performance 3-D rendering. IEEE Computer Graphics and Applications, 1992, 12, 41-52.
2. Centre for VLSI and Computer Graphics, The Sussex Texture Processor (STEP), http://www.susx.ac.uk/engg/research/vlsi/index.html.
3. Pearce, S. F., Bassett, M. C. and Lister, P. F., The Sussex Multimedia Macrocell. ASIC and MCM. The European Design and Test Conference 1996, User Forum, Paris, France, 11-14 March 1996.
4. Intel Corporation, Accelerated Graphics Port Specification, Revision 0.9, 9 May 1996.
5. Lister, P., PCI Bus Toolkit V0.1 Data Sheet. Centre for VLSI and Computer Graphics, University of Sussex.
6. Schilling, A., Some practical aspects of rendering. In Advances in Computer Graphics Hardware V. Springer-Verlag, 1992, pp. 54-66.
7. Pineda, J., A parallel algorithm for polygon rasterization. Computer Graphics, 1988, 22, 17-20.
8. Fuchs, H., Poulton, J., Eyles, J. and Greer, T., Coarse-grain and fine-grain parallelism in the next generation Pixel-planes graphics system. In Parallel Processing for Computer Vision, ed. P. M. Dew, R. A. Earnshaw and T. R. Heywood. Addison-Wesley Publishing Company, 1989, pp. 241-253.
9. Schilling, A. G., A new simple and efficient antialiasing with subpixel masks. Computer Graphics, 1991, 25, 133-141.
10. Waller, M., White, M. and Lister, P., TAYRA Scan Conversion Module. Technical Report Sussex/Monograph/044, Centre for VLSI and Computer Graphics, University of Sussex, 12 October 1995.
11. White, M., Three-dimensional computer graphics rendering. Doctor of Philosophy, The University of Sussex, Centre for VLSI and Computer Graphics, February 1994.
12. Foley, J. D., Van Dam, A., Feiner, S. K. and Hughes, J. F., Computer Graphics Principles and Practice. Addison Wesley, 1990.
13. Wolberg, G., Digital Image Warping. IEEE Computer Society Press, 1990.
14. Heckbert, P. and Moreton, H. P., Interpolation for polygon texture mapping and shading. In State of the Art in Computer Graphics Visualization and Modeling, ed. D. Rogers and R. Earnshaw, Springer Verlag, 1991.
15. Blinn, J. F., Hyperbolic Interpolation. IEEE Computer Graphics and Applications, 1992, 12, 89-94.
16. Binder, J. W., Waller, M. D., White, M. and Lister, P. F., TAYRA architecture: perspective correction module. Monograph Project, Centre for VLSI and Computer Graphics, University of Sussex.
17. Narayanaswami, C. and Luken, W., Efficient evaluation of 1/x for texture mapping. IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, 25 August 1995.
18. Westmore, R. J., Real time texture synthesis in computer generated imagery. Doctor of Philosophy, The University of Sussex, Centre for VLSI and Computer Graphics, November 1983.
19. Omondi, A. R., Computer Arithmetic Systems: Algorithms, Architecture and Implementations. Prentice Hall, 1994.
20. Feldman, J. M. and Retter, C. T., Computer Architecture: A Designer's Text Based on a Generic RISC. McGraw-Hill, 1994.
21. Neider, J., Davis, T. and Woo, M., OpenGL Programming Guide. The Official Guide to Learning OpenGL, Release 1, Addison Wesley, 1993.