A performance analysis of a microprogram-based coprocessor


Journal of Microcomputer Applications (1995) 18, 165-181

A. A. Wardak, G. A. King and K. Walsh

Southampton Institute, Systems Engineering Division, East Park Terrace, Southampton SO14 0YN, UK

A technique for evaluating the performance of an Am29300 microprogrammable computer board relative to its host (an MC68020 single-board computer) is presented. In this application the Am29300 board is used as a coprocessor to the general-purpose MC68020 single-board computer, speeding up the highly repetitive and time-consuming floating-point functions involved in three-dimensional image generation. The performance of the Am29300 microprogrammable coprocessor is expressed in MC68020 processor clock cycles, and a mathematical expression containing floating-point operations is used as the application example. The analysis shows that, for this application, the Am29300 microprogrammable coprocessor is eight times faster than the MC68020, six times faster than the MC68030 and twice as fast as the MC68040 processor.

1. Introduction

Performance is a major factor to be considered when selecting hardware for a particular application. Current research is directed towards real-time execution in many applications, which requires hardware with high performance and fast number-crunching capabilities. For this reason, some highly repetitive and computationally intensive routines are implemented in dedicated hardware or firmware to reduce their execution time [1,2]. In the application described here, the Am29300 microprogrammable coprocessor board (Am29300 MICROCOPB) is used to speed up those image generation functions that are highly repetitive and time-consuming, such as the multiplication of two 4 × 4 matrices, back-face removal and clipping [3,4]. All of these functions require floating-point arithmetic.

Different methods, depending mainly on the nature of the hardware employed, have been used for the performance analysis of various processors [5-11]. However, a literature review has not revealed a direct reference to the performance analysis of a microprogram-based coprocessor with respect to its host. This paper describes a technique that evaluates the performance of the Am29300 MICROCOPB in terms of the host processor's clock cycles, and so offers a solution to the problem of evaluating a microprogrammable coprocessor relative to its host. To make the technique easier to follow, the host (the MC68020 general-purpose single-board computer) and the coprocessor (the Am29300 MICROCOPB) are briefly described in the following sections.

0745-7138/95/020165+17 $08.00/0

© 1995 Academic Press Limited

Figure 1. MC68020 single-board computer (block diagram including the address-decode and interrupt logic, the address bus A/B and the data bus D/B).

Figure 2. Block diagram of the Am29300 MICROCOPB (the host MC68020 single-board computer on data lines D0-D31, the Am29331 16-bit sequencer, the Am29334 register file with output ports YA0-YA31 and YB0-YB31, the microprogram memory and its control signals, and the 32-bit FPU).

1.1 The MC68020 single-board computer

The Motorola MC68020 32-bit microprocessor and its companion MC68881 floating-point coprocessor are used to construct a single-board computer [12,13]. This computer (see Fig. 1), which is treated as a single processing element, acts as the host to the Am29300 MICROCOPB. Its main features are as follows:

(a) an MC68881 floating-point coprocessor;
(b) two on-board SRAM chips of up to 32 kbyte each;
(c) two on-board EPROM chips of up to 32 kbyte each;
(d) two RS232C serial I/O ports, driven through a DUART interface, for communication between the board, a terminal and a development station;
(e) a VMEbus interface with a 16 Mbyte direct addressing range;
(f) a mezzanine 96-way connector giving local bus access to a mezzanine memory-expansion card if required.

Further information about the MC68020 single-board computer is given in [14].

Figure 3. Am29325 FPU functional block-diagram (R-bus and S-bus inputs with registers R and S and their multiplexers, the floating-point ALU, status flags including underflow, invalid and zero, and the 32-bit Y-bus output).

1.2 The Am29300 microprogrammable coprocessor board

The Am29300 MICROCOPB is built from the 32-bit high-performance microprogrammable processors made by Advanced Micro Devices [2]. As shown in Fig. 2, it consists of the Am29331 sequencer, the Am29325 floating-point processor, the Am29332 integer ALU, the Am29334 register files, the host interfacing circuitry, and the associated hardware for the microcode and pipeline registers [2-4]. The four Am29300 family devices that form the core of the Am29300 MICROCOPB are briefly described in the following paragraphs.

Figure 4. Microcode implementation of the Am29325 FPU for this application (operands taken from the YA and YB ports of register files No. 1 and No. 2, 16 bits from each; result bits F0-F15 and F16-F31 written back to register files No. 1 and No. 2; status flags such as INEXACT and INVALID routed to the sequencer test inputs).

The Am29325 is a high-speed, single-precision floating-point processor. It has a three-bus, 32-bit architecture with a user-selectable I/O mode; in this application it is configured in the 32-bit, two-input-bus mode, with all functions and internal registers under microcode control. Its functional block-diagram and its microcode implementation for this application are shown in Figs 3 and 4, respectively.

The Am29332 is a 32-bit ALU that supports 1-, 2-, 3- and 4-byte data for arithmetic and logic operations. It also supports a modified Booth's algorithm that processes the multiplier two bits at a time for high-speed multiplication. Figures 9 and 10 show the functional block-diagram and the microcode implementation of the Am29332 ALU, respectively.

The Am29331 is a 16-bit high-speed sequencer that controls the execution sequence of the microinstructions. Its functional block-diagram is presented in Fig. 5, and the microcode control signals implemented for this application are shown in Fig. 6.

The Am29334 is a high-speed register file that provides fast storage to the other members of the Am29300 family. In this application, two Am29334 register files are connected in parallel to provide 64 32-bit locations. Figures 7 and 8 show the functional block-diagram and the microcode implementation of the Am29334 register file, respectively.
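The Am29332 multiplier is described above only as a modified Booth's algorithm working two bits at a time. As a rough illustration of what that recoding does, the C sketch below implements radix-4 Booth multiplication in software for 32-bit signed operands. The function name `booth_radix4_mul` is hypothetical, and the code models the arithmetic only, not the Am29332's internal datapath or its microcode sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Radix-4 (modified) Booth multiplication of two 32-bit signed integers.
 * Software illustration only: the multiplier is scanned two bits at a
 * time with one bit of overlap, so 16 steps replace 32 add/shift steps. */
int64_t booth_radix4_mul(int32_t multiplicand, int32_t multiplier)
{
    int64_t m = multiplicand;            /* sign-extended multiplicand */
    uint64_t q = (uint32_t)multiplier;   /* raw two's-complement multiplier bits */
    int64_t product = 0;
    int prev = 0;                        /* implicit bit q[-1] = 0 */

    for (int i = 0; i < 32; i += 2) {
        int b0 = (int)((q >> i) & 1);
        int b1 = (int)((q >> (i + 1)) & 1);
        /* Booth digit in {-2,-1,0,1,2} from the bit triple (b1, b0, prev). */
        int digit = -2 * b1 + b0 + prev;
        /* Add digit * multiplicand * 4^(i/2); multiply rather than
         * left-shift, because shifting a negative value is undefined. */
        product += (int64_t)digit * m * ((int64_t)1 << i);
        prev = b1;
    }
    return product;
}
```

For example, `booth_radix4_mul(6, 7)` recodes the multiplier 7 into the digits −1 and +2 (7 = −1 + 2 × 4) and returns 42; sixteen recoded steps replace the 32 shift-and-add steps of the plain binary method.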

Figure 5. Am29331 sequencer functional block-diagram (33 × 16 stack with its stack multiplexer, test inputs T0-T7, select inputs S0-S3, instruction inputs I0-I5, address multiplexers and an equality comparator).

The MC68020 host processor drives the Am29300 MICROCOPB: it supplies the required data and then selects one of the microsubroutines through a control register. A horizontal microinstruction format (see Fig. 11), in which each bit drives one control line, is implemented [4]. The microcode, stored in nine EPROMs, controls the hardware and executes the arithmetic and logical operations. The initialization and synchronization of the host and the coprocessor are shown in the flow-chart of Fig. 12. For detailed information on the hardware design and software implementation of the Am29300 MICROCOPB, the reader is referred to [3,4,21].
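The horizontal format can be pictured as one wide word sliced across the nine EPROMs. Assuming byte-wide EPROMs, that gives 72 bits per microinstruction, i.e. three 24-bit fields as in Fig. 11. The sketch below further assumes that EPROMs No. 1-3 hold the sequencer field, No. 4-6 the floating-point/integer processor field and No. 7-9 the register-file field, most significant byte first. That byte assignment, the `MicroWord` type and both helper functions are hypothetical; they only show how a microword would be split for programming and reassembled at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 72-bit horizontal microinstruction:
 * three 24-bit fields, each spread over three byte-wide EPROMs. */
typedef struct {
    uint32_t sequencer;  /* 24 bits: Am29331 control (assumed EPROMs 1-3) */
    uint32_t processor;  /* 24 bits: Am29325/Am29332 control (assumed EPROMs 4-6) */
    uint32_t regfile;    /* 24 bits: Am29334 control (assumed EPROMs 7-9) */
} MicroWord;

#define FIELD_MASK 0xFFFFFFu

/* Split a microword into the nine bytes held at the same address in
 * EPROM No.1 ... No.9 (index 0 = EPROM No.1). */
void microword_to_eproms(const MicroWord *w, uint8_t eprom[9])
{
    const uint32_t fields[3] = { w->sequencer, w->processor, w->regfile };
    for (int f = 0; f < 3; f++) {
        assert(fields[f] <= FIELD_MASK);      /* each field must fit in 24 bits */
        for (int b = 0; b < 3; b++)           /* most significant byte first */
            eprom[3 * f + b] = (uint8_t)(fields[f] >> (16 - 8 * b));
    }
}

/* Reassemble the microword from the nine EPROM outputs. */
MicroWord eproms_to_microword(const uint8_t eprom[9])
{
    uint32_t fields[3] = { 0, 0, 0 };
    for (int f = 0; f < 3; f++)
        for (int b = 0; b < 3; b++)
            fields[f] = (fields[f] << 8) | eprom[3 * f + b];
    MicroWord w = { fields[0], fields[1], fields[2] };
    return w;
}
```

Because every bit in a horizontal format drives a control line directly, no decoding is needed between the EPROM outputs and the pipeline register; the cost is the width of the microword.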

Figure 6. Microprogrammed implementation of the Am29331 sequencer for this application (instruction, test and data fields of the microword, status inputs from the Am29325 and Am29332, and a control register written by the MC68020).

2. Performance analysis

This section describes the method used to evaluate the performance of the Am29300 MICROCOPB in comparison with its host, the MC68020 single-board computer. In this method, which is based on the interrupt mechanism, the performance of the Am29300 MICROCOPB is expressed in MC68020 processor clock cycles. The Am29300 MICROCOPB signals the completion of its task by sending a higher-priority interrupt request to the MC68020 processor, which then reads the computed result from the register file for further processing. While the Am29300 MICROCOPB is executing its microprogram, the MC68020 host processor is kept in a busy loop by executing the following MC68020 assembly language instruction, where COUNTER is a data register (d3 in the test listing):

    LOOP:   DBRA    COUNTER,LOOP
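The measurement relies only on the semantics of DBRA (the assembler alias of DBF): each execution decrements the low word of the data register and branches back unless the result is −1. The C model below is a sketch under that reading. `busy_loop` stands in for the interrupt by stopping after a given number of executions, and `dbra_executions` recovers the count from the initial and displayed COUNTER values; all three function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One DBRA step: decrement the low 16-bit word of the counter and report
 * whether the branch back to LOOP is taken (it is, unless the word is -1). */
static int dbra_step(uint16_t *counter)
{
    *counter = (uint16_t)(*counter - 1);
    return *counter != 0xFFFF;
}

/* Run the busy loop until an (assumed) interrupt arrives after
 * 'interrupt_after' executions, or until the counter is exhausted.
 * Returns the value a breakpoint in the ISR would display. */
uint16_t busy_loop(uint16_t initial, unsigned interrupt_after)
{
    uint16_t counter = initial;
    for (unsigned n = 0; n < interrupt_after; n++)
        if (!dbra_step(&counter))
            break;                 /* loop fell through before the interrupt */
    return counter;
}

/* Number of DBRA executions: initial minus displayed, modulo 2^16. */
unsigned dbra_executions(uint16_t initial, uint16_t displayed)
{
    return (uint16_t)(initial - displayed);
}
```

With the initial value 0x0032 (50) used in the test listing, an interrupt after 10 executions leaves 40 in the counter, and 50 − 40 = 10. The modulo-2^16 subtraction also gives the right count (initial + 1) if the loop runs out before the interrupt arrives.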

The MC68020 processor is kept in this busy loop only during the performance analysis, not during normal operation. In the text that follows, this instruction is referred to as DBRANCH (decrement and branch). If a label is placed at the beginning of the interrupt service routine (ISR), its address can be obtained from the map file 'filename.map', which is generated by the Motorola SYS1131 System [15] after compiling and assembling the relevant modules of the main program 'filename.c'. When the Am29300 MICROCOPB interrupts the MC68020 processor to signal the end of its task, program execution is suspended by a breakpoint at this label. At that instant the content of COUNTER is displayed on the screen, and the difference between the initial and the displayed contents of COUNTER is the exact number of times the MC68020 host processor executed DBRANCH before being interrupted by the Am29300 MICROCOPB.

Figure 7. Functional block-diagram of the Am29334 register-file (write data bus DB, address latches controlled by LEA and LEB, and read-port outputs).

Figure 8. Microcode implementation of the Am29334 register-file (write data from the MC68020 data bus or the FPU result; outputs YA0-YA15 and YB0-YB15 of each device feeding the R and S inputs).

The expression used for detecting and removing back-faces in three-dimensional scenes is chosen for evaluating the performance factor. It comprises nine floating-point multiplications, two floating-point additions and three floating-point subtractions:

$$\mathrm{BFRTF} = A_x(B_yC_z - B_zC_y) + A_y(B_zC_x - B_xC_z) + A_z(B_xC_y - B_yC_x)$$

where A, B and C are the three vertices involved, so that BFRTF is the scalar triple product A · (B × C).

During the performance analysis, the time taken by the Am29300 MICROCOPB to evaluate this expression was found to equal the time taken by the MC68020 processor to execute DBRANCH 10 times. According to the execution-time tables in the MC68020 and MC68881 user manuals [12,13], the MC68020 processor requires nine clock cycles to execute DBRANCH. Because DBRANCH was executed 10 times, the total is 10 × 9 = 90 clock cycles; that is, the Am29300 MICROCOPB takes 90 MC68020 clock cycles to execute the microcode for BFRTF. By contrast, the MC68020 processor, using the MC68881 coprocessor at 12 MHz, needs 80 clock cycles to multiply two floating-point numbers [12,13]. It therefore requires 1080 clock cycles to compute BFRTF: 9 × 80 = 720 for the nine multiplications, 3 × 72 = 216 for the three subtractions and 2 × 72 = 144 for the two additions, i.e. 72 cycles per floating-point addition or subtraction. The ratio 1080/90 = 12 shows that the Am29300 MICROCOPB is 12 times faster than the MC68020 processor for this expression.
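The operation count and the cycle arithmetic above can be checked with a few lines of C. The sketch below evaluates BFRTF with exactly nine multiplications, three subtractions and two additions, and tallies host cycles from the per-operation figures quoted in the text (80 cycles per MC68881 multiplication, 72 per addition or subtraction as implied by 216/3 and 144/2, and 9 per DBRANCH). The `Vertex` type mirrors the `VRTX` structure of the appendix listing; the function and constant names are hypothetical, and the cycle model ignores operand fetches and instruction overlap.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vertex;   /* same layout as VRTX */

/* BFRTF = A . (B x C): nine multiplications, three subtractions,
 * two additions, in the same grouping as the expression in the text. */
float bfrtf(const Vertex *a, const Vertex *b, const Vertex *c)
{
    float t0 = a->x * (b->y * c->z - b->z * c->y);
    float t1 = a->y * (b->z * c->x - b->x * c->z);
    float t2 = a->z * (b->x * c->y - b->y * c->x);
    return t0 + t1 + t2;
}

/* Per-operation MC68020 cycle counts as quoted in the text. */
enum { MUL_CYCLES = 80, ADDSUB_CYCLES = 72, DBRA_CYCLES = 9 };

/* Host (MC68020 + MC68881) cycles for a given operation mix. */
int host_cycles(int muls, int subs, int adds)
{
    assert(muls >= 0 && subs >= 0 && adds >= 0);
    return muls * MUL_CYCLES + (subs + adds) * ADDSUB_CYCLES;
}

/* Coprocessor time in host cycles: DBRANCH executions x 9 cycles. */
int coprocessor_cycles(int dbra_executions)
{
    return dbra_executions * DBRA_CYCLES;
}
```

With the appendix test vertices A = (1, 2, 3), B = (4, 5, 6) and C = (7, 8, 9), `bfrtf` returns 0 because these three vectors are linearly dependent (A − 2B + C = 0); `host_cycles(9, 3, 2)` returns 1080 and `coprocessor_cycles(10)` returns 90, reproducing the factor of 12.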

Figure 9. Functional block-diagram of the Am29332 integer ALU (instruction decode, operand inputs DA0-DA31 and DB0-DB31, status register, Q register, parity generation and output Y0-Y31).

The performance figure of 12 was obtained with the cache of the MC68020 processor disabled, because the figures given in the execution-time tables [12,13] are valid only when the cache is disabled. It has been reported, however, that the performance of the MC68020 processor improves by a factor of 1.38 when the cache is enabled [10]. With the cache enabled, the performance factor between the Am29300 MICROCOPB and the MC68020 processor should therefore be about 8.7 (12/1.38 ≈ 8.70).

It could be argued that the interrupt-based approach adds timing overhead to the overall execution time of the system software, which is undesirable in many cases, especially in real-time applications. However, this temporary overhead (the relevant portions of the assembly language and microcode) can be removed completely once the exact performance figure has been obtained. COUNTER can then be loaded with that figure to provide the delay required for the Am29300 MICROCOPB to complete its microprogram.

Figure 10. Microcode implementation of the Am29332 integer ALU (operand inputs DA0-DA31 and DB0-DB31 from the register files, result Y0-Y31 to the register files, status flags to the sequencer test inputs, and instruction and control inputs from the microword).

Similarly, because the Am29300 MICROCOPB is used as a coprocessor and its data must be supplied by the host, some timing overhead is involved in transferring the operands into the Am29300 MICROCOPB register file and reading back the results. When the effect of this data transfer is included, the adjusted performance figure is found to be 8. The transfer overhead could be reduced by increasing the capacity of the register file, so that the data is transferred only once at the beginning of the task and all intermediate results remain in the register file for further processing.

The result of the above technique has been checked with two test programs ('MC68020.s' and 'Am29300.s') and an oscilloscope, a more approximate method than the first. In the first program, a bit in the control register is set when the MC68020 processor starts evaluating BFRTF and is cleared at the end of the calculation. This process is executed continuously in a loop, and the oscilloscope is used to measure the interval for which the signal remains HIGH (155 µs), which is the time taken by the MC68020 processor to compute the expression with the cache enabled. In the second program, the bit is set and cleared in the same way when the Am29300 MICROCOPB starts and finishes executing the microcode for the same expression. The interval for which the signal remains HIGH (18 µs) is the time taken by the Am29300 MICROCOPB to evaluate the expression, while the interval for which it remains LOW (2 µs) is the time added by data communication (2/(2 + 18) = 10%). The actual time taken by the Am29300 MICROCOPB is therefore the sum of the two, 20 µs, and the performance factor in this case is 7.75 (155/20 = 7.75).

Figure 11. The microinstruction format (a horizontal microword held in nine EPROMs, No. 1 to No. 9, and divided into a 24-bit sequencer field, a 24-bit floating-point/integer processor field and a 24-bit register-file field).
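The performance factors quoted in this section all come from simple ratios, which the sketch below makes explicit. `speedup` and `transfer_fraction` are hypothetical helper names; the inputs are the figures reported above (1080 and 90 clock cycles, the cache factor 1.38 from [10], and the oscilloscope readings of 155, 18 and 2 µs).

```c
#include <assert.h>

/* Speed-up of the coprocessor over the host, with both times given in
 * the same unit (host clock cycles or microseconds). */
double speedup(double host_time, double coproc_time)
{
    assert(coproc_time > 0.0);
    return host_time / coproc_time;
}

/* Fraction of the coprocessor's total time spent on data transfer. */
double transfer_fraction(double compute_time, double transfer_time)
{
    assert(compute_time + transfer_time > 0.0);
    return transfer_time / (compute_time + transfer_time);
}
```

`speedup(1080.0, 90.0)` gives 12, dividing that by 1.38 gives about 8.70, `speedup(155.0, 18.0 + 2.0)` gives 7.75, and `transfer_fraction(18.0, 2.0)` gives 0.1, matching the 10% communication overhead.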

Figure 12. Synchronization and initialization of the host and Am29300 MICROCOPB. (Flow-chart steps, as far as recoverable: the MICROCOPB is reset and the conditional branching flag CBF, one of the sequencer's external test inputs, is cleared; the host writes the required data into the shared Am29334 register files, and a base address providing 16 unconditional jump targets is set; CBF is set HIGH through the control register; the sequencer picks up the target address from the base-address region and executes the selected microsubroutine, during which a microcode bit keeps the host off the shared register files; at the end of the assigned task the MICROCOPB interrupts the MC68020.)
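The flow-chart of Fig. 12 amounts to an ownership handshake over the shared register file. The single-threaded C model below is a hypothetical sketch of that sequence, not the actual firmware: the `Board` structure, the 16-entry table of function pointers standing in for the microsubroutine entry points, and the routine `example_add` are illustrative assumptions. The host clears CBF, loads the operands, selects an entry and sets CBF; the coprocessor, on seeing CBF, takes the register file, runs the routine, releases the file and raises the completion interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of the Fig. 12 handshake.
 * It illustrates the ordering of events only. */
#define REGFILE_WORDS 64   /* two Am29334s in parallel: 64 x 32-bit words */
#define NUM_ENTRIES   16   /* 16 microsubroutine entry points */

typedef enum { OWNER_HOST, OWNER_COPROC } Owner;

typedef struct {
    float    regfile[REGFILE_WORDS]; /* shared register file */
    Owner    owner;                  /* side currently allowed to access it */
    bool     cbf;                    /* conditional branching flag (test input) */
    unsigned entry;                  /* selected microsubroutine, 0..15 */
    bool     interrupt;              /* completion interrupt to the MC68020 */
} Board;

typedef void (*MicroSub)(float *regfile);

/* Hypothetical example microsubroutine: r[2] = r[0] + r[1]. */
void example_add(float *r) { r[2] = r[0] + r[1]; }

/* Host side: clear CBF, load operands, select a routine, then set CBF. */
void host_start(Board *b, const float *data, unsigned n, unsigned entry)
{
    assert(b->owner == OWNER_HOST);          /* host must own the shared file */
    assert(n <= REGFILE_WORDS && entry < NUM_ENTRIES);
    b->cbf = false;                          /* CBF := LOW */
    for (unsigned i = 0; i < n; i++)
        b->regfile[i] = data[i];             /* operands into the shared file */
    b->entry = entry;
    b->interrupt = false;
    b->cbf = true;                           /* CBF := HIGH via the control register */
}

/* Coprocessor side: when CBF is HIGH, take the file, run, release, interrupt. */
void coprocessor_step(Board *b, const MicroSub table[NUM_ENTRIES])
{
    if (!b->cbf)
        return;                              /* sequencer keeps testing CBF */
    assert(table[b->entry] != NULL);
    b->owner = OWNER_COPROC;                 /* host locked out of the file */
    table[b->entry](b->regfile);
    b->cbf = false;
    b->owner = OWNER_HOST;                   /* file handed back to the host */
    b->interrupt = true;                     /* signal completion */
}
```

In this model the host never touches the register file while `owner` is `OWNER_COPROC`, which is the property the microcode bit in Fig. 12 enforces in hardware.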

The full listings of the test software are given in the Appendix. It has been reported that the MC68040, the latest member of the Motorola MC68000 family, is four times faster than the MC68020 [5] and three times faster than the MC68030 [9]. These figures relate the execution times of the MC68030, the MC68040 and the Am29300 MICROCOPB: since the Am29300 MICROCOPB is eight times faster than the MC68020, it should be about six times faster than the MC68030 (8/(4/3) = 6) and twice as fast as the MC68040 (8/4 = 2). The performance of the MC68020 processor relative to other 32-bit processors has also been reported [10,20]. In those studies, execution times were measured for various cases on a test system in which everything except the processor and its related parts remained the same. The MC68020 processor was shown to be 2.98 times faster than the Intel 80286, 1.29 times faster than the Intel 80386, 4.44 times faster than the NS32032 and 1.74 times faster than the AT&T 32100 processor. Combining these results with the present analysis therefore allows the Am29300 MICROCOPB to be related to the Intel 80386 and the AT&T 32100 processors as well.
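The comparisons with the MC68030 and MC68040 follow from chaining speed ratios measured against a common reference. The sketch below spells out that arithmetic; the macro and function names are hypothetical, and the approach assumes, as the text does, that ratios obtained from different benchmarks can be combined.

```c
#include <assert.h>

/* Relative speeds quoted in the text, with the MC68020 as reference. */
#define AM29300_VS_68020 8.0   /* this paper */
#define MC68040_VS_68020 4.0   /* reported in [5] */
#define MC68040_VS_68030 3.0   /* reported in [9] */

/* If X is r1 times faster than a reference and Y is r2 times faster,
 * then X is r1 / r2 times faster than Y. */
double relative_speed(double x_vs_ref, double y_vs_ref)
{
    assert(y_vs_ref > 0.0);
    return x_vs_ref / y_vs_ref;
}

/* MC68030 speed relative to the MC68020, derived from [5] and [9]. */
double mc68030_vs_68020(void)
{
    return MC68040_VS_68020 / MC68040_VS_68030;
}
```

By the same arithmetic, the MC68020 ratios quoted from [10,20] would place the Am29300 MICROCOPB at roughly 8 × 1.29 ≈ 10.3 times the Intel 80386 and 8 × 1.74 ≈ 13.9 times the AT&T 32100, with the same caveat about mixing benchmarks.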

3. Conclusion

The technique used for the performance analysis of the Am29300 MICROCOPB with respect to the MC68020 processor has been described in detail. The same technique can be adopted for evaluating the performance factor of any microprogrammable coprocessor of this kind with respect to its host processor. A second, more approximate method (oscilloscope timing) has been used to verify the result, and the two performance factors, 8 and 7.75, agree to within about 3%. The technique is accurate because the performances of both the host and the coprocessor are measured in clock cycles of the same processor, so any inaccuracy in the measurement affects both equally. In applications involving floating-point arithmetic, the Am29300 MICROCOPB has been shown to be eight times faster than the MC68020, six times faster than the MC68030 and twice as fast as the MC68040 processor.

References

1. T. Colman and S. Powers Jr 1984. XTAR graphics microprocessors. Byte, November, 179-186.
2. Advanced Micro Devices 1988. Am29C300/Am29300 Data Book. Sunnyvale, CA.
3. A. A. Wardak 1991. A three-dimensional image generation system using a microprogrammable computer. PhD Thesis, Department of Electrical Engineering, University of Bradford, UK, Chapter 5.
4. A. A. Wardak, G. A. King and R. L. Rhodes Jr 1994. A microprogram-based 3-D image generation system using the Am29300 family. Microprocessing and Microprogramming, The Euromicro Journal, 40, 65-75.
5. R. W. Edenfield et al. 1990. The 68040 Processor Part 1, Design and Implementation. IEEE Micro, February, 66-79.
6. J. Fulcher and R. Hatton Jr 1991. Benchmarking fourth. Microprocessors and Microsystems, 15, 42-48.
7. T. L. Johnson Jr 1986. A comparison of MC68000 family processors. Byte, September, 205-218.
8. D. MacGregor and J. Rubinstein Jr 1985. A performance analysis of MC68020-based systems. IEEE Micro, December, 50-69.
9. T. Thompson Jr 1990. 040: Motorola's 68040 microprocessor. Byte, February, 96A-96C.
10. T. C. Cooper et al. 1986. A Benchmark Comparison of 32-bit Microprocessors. IEEE Micro, 6, 53-58.
11. W. E. Ferguson Jr 1991. Selecting math coprocessor. IEEE Spectrum, July, 38-41.
12. MC68020 32-Bit Microprocessor User's Manual, 2nd Edition, 1985. Englewood Cliffs, NJ: Prentice-Hall.
13. MC68881 Floating-Point Coprocessor User's Manual, 1st Edition, 1985. Motorola Inc.
14. P. Shaun 1987. Design of MC68020 single-board computer. Undergraduate final year project, Electrical & Electronic Engineering, University of Bradford.
15. Motorola SYS1131 System V/68 Technical Documentation 1985. Motorola Microsystems, 1-5.
16. MC68020 32-Bit Microprocessor User's Manual, 2nd Edition, 1985. Englewood Cliffs, NJ: Prentice-Hall.
17. MC68881 Floating-Point Coprocessor User's Manual, 1st Edition, 1985. Motorola Inc.
18. MC68030 Enhanced 32-Bit Microprocessor User's Manual, 1987. Motorola Inc.
19. MC68040 32-Bit Microprocessor User's Manual, 1989. Motorola Inc.
20. C. H. Pappas and W. H. Murray 1988. 80386 Microprocessor Handbook. Berkeley, CA: Osborne McGraw-Hill.
21. A. A. Wardak, G. A. King and R. Backhouse Jr 1994. Interfacing high-level and assembly language and microcodes in 3-D image generation. Microprocessors and Microsystems, 18, 205-213.


Appendix

The following is the main program, 'filename.c', which calls the two assembly language subroutines 'Am29300()' and 'MC68020()'.

    #include <stdio.h>
    #include <string.h>

    typedef struct vert_rec {
        float x;
        float y;
        float z;
    } VRTX;

    /* Assembly language subroutines; each takes the addresses of vertices A, B and C. */
    extern void Am29300(VRTX *a, VRTX *b, VRTX *c);
    extern void MC68020(VRTX *a, VRTX *b, VRTX *c);

    int main(void)
    {
        VRTX A, B, C;
        char option[10];

        A.x = 1.0; B.x = 4.0; C.x = 7.0;
        A.y = 2.0; B.y = 5.0; C.y = 8.0;
        A.z = 3.0; B.z = 6.0; C.z = 9.0;

        printf("Enter your option?\n");
        if (scanf("%9s", option) != 1)   /* bounded read into option[10] */
            return 1;

        if (strcmp("AM29300", option) == 0)
            for (;;)                      /* forming an infinite loop */
                Am29300(&A, &B, &C);

        if (strcmp("MC68020", option) == 0)
            for (;;)                      /* forming an infinite loop */
                MC68020(&A, &B, &C);

        return 0;
    }

Listing 1. The software for the two test programs.

            text
            global    MC68020
            global    Am29300
            set       x,0x00
            set       y,0x04
            set       z,0x08

    MC68020:
            mov.l     0x04(%a7),%a1      # a1 = address of vertex A
            mov.l     0x08(%a7),%a2      # a2 = address of vertex B
            mov.l     0x0c(%a7),%a3      # a3 = address of vertex C
            mov.l     &0xFFFF0100,%a4    # a4 = address of control register
            mov.b     &0x08,(%a4)        # set the sequencer bit M0,3
            fmov.s    z(%a3),%fp0        # fp0 = Cz
            fsglmul.s (y,%a2),%fp0       # fp0 = Cz*By
            fmov.s    (y,%a3),%fp1       # fp1 = Cy
            fsglmul.s (z,%a2),%fp1       # fp1 = Cy*Bz
            fsub.x    %fp1,%fp0          # fp0 = By*Cz-Bz*Cy
            fsglmul.s (x,%a1),%fp0       # fp0 = Ax[By*Cz-Bz*Cy]
            fmov.s    x(%a3),%fp1        # fp1 = Cx
            fsglmul.s (z,%a2),%fp1       # fp1 = Cx*Bz
            fmov.s    z(%a3),%fp2        # fp2 = Cz
            fsglmul.s (x,%a2),%fp2       # fp2 = Bx*Cz
            fsub.x    %fp2,%fp1          # fp1 = Bz*Cx-Bx*Cz
            fsglmul.s (y,%a1),%fp1       # fp1 = Ay[Bz*Cx-Bx*Cz]
            fmov.s    y(%a3),%fp2        # fp2 = Cy
            fsglmul.s (x,%a2),%fp2       # fp2 = Bx*Cy
            fmov.s    (x,%a3),%fp3       # fp3 = Cx
            fsglmul.s (y,%a2),%fp3       # fp3 = Cx*By
            fsub.x    %fp3,%fp2          # fp2 = Bx*Cy-By*Cx
            fsglmul.s (z,%a1),%fp2       # fp2 = Az[Bx*Cy-By*Cx]
            fadd.x    %fp2,%fp1          # fp1 = fp2+fp1
            fadd.x    %fp1,%fp0          # fp0 = fp0+fp1+fp2 = BFRTF
            mov.b     &0x00,(%a4)        # clear the bit M0,3
            rts

    Am29300:
            mov.l     0x04(%a7),%a1      # a1 = address of vertex A
            mov.l     0x08(%a7),%a2      # a2 = address of vertex B
            mov.l     0x0c(%a7),%a3      # a3 = address of vertex C
            mov.l     &0xFFFF0000,%a4    # a4 = AMD (register-file) memory address
            mov.w     &0x0032,%d3        # d3 = COUNTER for the delay loop
            mov.l     (%a1)+,(%a4)+      # ax -> FFFF0000
            mov.l     (%a1)+,(%a4)+      # ay -> FFFF0004
            mov.l     (%a1)+,(%a4)+      # az -> FFFF0008
            mov.l     (%a2)+,(%a4)+      # bx -> FFFF000C
            mov.l     (%a2)+,(%a4)+      # by -> FFFF0010
            mov.l     (%a2)+,(%a4)+      # bz -> FFFF0014
            mov.l     (%a3)+,(%a4)+      # cx -> FFFF0018
            mov.l     (%a3)+,(%a4)+      # cy -> FFFF001C
            mov.l     (%a3)+,(%a4)+      # cz -> FFFF0020
            mov.l     &0xFFFF0100,%a5    # a5 = control register
            mov.b     &0x10,0x03(%a5)    # the process is started
    wait:   dbra      %d3,wait           # busy loop (DBRANCH)
            fmov.s    (%a4),%fp1         # fp1 = BFRTF
            mov.b     &0x00,0x03(%a5)    # clear the control register
            rts

Listing 2. The "MC68020()" and "Am29300()" assembly language subroutines.