JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 8, 267-273 (1990)

RESEARCH NOTES

Performance of Multiple-Bus Interconnections for Multiprocessors*

QING YANG
Department of Electrical Engineering, The University of Rhode Island, Kingston, Rhode Island 02881

AND

LAXMI N. BHUYAN¹
The Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, Louisiana 70504

Bus structures, in general, are easily understood and therefore preferred by manufacturers for implementation. Multiple-bus systems can be viewed as an incremental expansion of single-bus architectures that can provide high bandwidth and reliability. This research note provides a brief overview of various analytical techniques suitable for performance evaluation of multiple-bus multiprocessors. Some results and comparisons based on these techniques are also provided. © 1990 Academic Press, Inc.

1. INTRODUCTION

A computer system designer can build a multiprocessor system using several inexpensive microprocessors to match or exceed the performance of a high-cost uniprocessor system. In addition, a multiprocessor offers other advantages such as good modularity, reliability, and expandability.

However, the most important and crucial issue in a multiprocessor design is the design of its interconnection network (IN). The IN must offer a high communication bandwidth between the processors and the memories while keeping the cost low. It should also have alternate paths in the network to allow communication to proceed in case of a fault in the network. A single-bus interconnection is the simplest and least costly among all the INs. At the same time it has the lowest bandwidth and almost no fault tolerance. A crossbar connection [1], on the other hand, has the maximum bandwidth at a prohibitively high cost. Fault tolerance is possible if one allows graceful degradation [2]. However, the O(N²) cost of an N × N crossbar does not allow a big system to be designed around this IN. The cost and performance of a multistage interconnection network (MIN) lie between the above two extremes [3]. As with the crossbar, graceful degradation is possible, and higher reliability than a shared bus can be obtained [4]. Also, it is possible to design MINs that are fault-tolerant to start with [5]. As a result, a lot of industrial and research multiprocessor projects are based on these MINs. Although MINs are very suitable for large-scale systems, they are still complex to design, and there is an O(log N) delay in each communication. For a low- to medium-scale multiprocessor, manufacturers would like to have an IN that has the simplicity of a shared bus but allows more bandwidth and reliability. One type of IN, called the multiple bus, has recently drawn considerable interest from computer scientists and engineers [2, 6-16]. The structure, shown in Fig. 1, consists of a few buses, with each bus connected to all the processors and memory modules. The hardware cost and communication capacity of the system depend on the number of buses, B. The cost of interconnection is O(BN) for an N × N multiprocessor. While B = 1 is clearly unacceptable, having a large number of buses will have a cost similar to that of the crossbar network. The multiple-bus IN is highly fault-tolerant: it is apparent from Fig. 1 that there are B alternate paths between a processor and a memory module that can be used in case of a fault. This IN retains the evolutionary features of a single-bus system, yet provides increased communication capacity and reliability. Choosing the right number of buses for an application environment is a critical design issue that requires a thorough knowledge of performance evaluation techniques. This paper describes analytical techniques that have been developed by us [6] and by others for the performance evaluation of multiple-bus multiprocessor

* This research was supported by NSF Grant DMC-8513041 and a grant from the Louisiana Board of Regents, and was carried out when Yang and Bhuyan were with the University of Southwestern Louisiana.
¹ Present address: Department of Computer Science, Texas A&M University, College Station, TX 77843-3112.


0743-7315/90 $3.00
Copyright © 1990 by Academic Press, Inc.
All rights of reproduction in any form reserved.


systems. Analytical techniques provide a quick evaluation of a system as opposed to simulations. In evaluating a multiple-bus system, several operational characteristics of buses must be considered. These depend on the timing philosophy, arbitration scheme, and switching methodology. The timing of the bus operations can be synchronous or asynchronous. Synchronous schemes are characterized by the existence of a global clock that broadcasts clock signals to all devices so that they work in lock step. Asynchronous operations, on the other hand, involve no global clock. The switching methodology can be either circuit or packet switched. In circuit switching, once a device is granted a bus, it occupies the bus for the entire duration of the data transfer. In packet switching, the data are broken into small packets and a bus is held only during the transfer of a packet. The access to the various buses in the multiple-bus system can be controlled by either a centralized or a decentralized arbiter circuit. Design of a centralized arbiter is simple, but it may create a system bottleneck. A decentralized arbiter, though more difficult to design, is better from the performance and reliability viewpoints. A classification of multiprocessor INs based on these operational characteristics is given in [3]. The analytical technique applicable to a class of multiple-bus multiprocessors depends on which of the eight classes the system belongs to. This research note presents a summary of analytical techniques for the various classes of multiple-bus multiprocessors and is organized as follows. Sections 2 and 3 deal with circuit- and packet-switched systems, respectively. Each section presents results for four classes. Section 4 presents some results, discussions, and comparisons. Finally, Section 5 concludes the paper.

The performance models presented in this paper are based on the following assumptions. Additional assumptions required for each analysis are mentioned later at appropriate places.

1. The system consists of N processors, M memory modules, and B system buses. All processors are considered independent and identical, and so are the memory modules and buses.

2. A uniform reference model is assumed, which means that a request from a processor addresses any one of the M memory modules with equal probability. This is a reasonable assumption for an interleaved memory system.

3. Wherever a queue is constructed in a model, it is assumed that the buffer has infinite capacity. This assumption ensures that any "customer" arriving at the queue will not be lost.

FIG. 1. A multiple-bus multiprocessor system.

2. CIRCUIT-SWITCHED MULTIPLE-BUS SYSTEMS

In a circuit-switched system, a processor holds the bus until the memory operation is finished. Extensive studies have been reported in the literature for circuit-switched multiple-bus multiprocessors. Performance models of these systems follow.

2.1. Synchronous Circuit-Switched System

2.1.1. Centralized Control. In this situation the entire system operates synchronously based on a memory cycle; i.e., memory requests, issued by processors, begin and end simultaneously. There is a central controller that scans these requests and selects memories for service. Then the controller allocates buses for the processor-memory connections. Each processor generates a memory request at the beginning of a cycle with some probability p. The performance of this system is measured by the bandwidth, BW, defined as the mean number of memories remaining busy per cycle. A number of researchers [7-10] have developed analytical models for this circuit-switched synchronous centralized multiple-bus system. In all these papers, simple and fairly accurate analytical models have been developed with an independence assumption, which states that the request generated by a processor in the current cycle is independent of its request in the previous cycle. Applying some simple combinatorial analysis, the following expression for the memory bandwidth of an N × M × B system can be obtained [7]:

BW = \sum_{y=0}^{N} \binom{N}{y} p^{y} (1-p)^{N-y} \left[ M\left(1 - \left(1 - \frac{1}{M}\right)^{y}\right) - \sum_{x=B+1}^{t_y} (x - B)\, x!\, S(y, x)\, \binom{M}{x} \frac{1}{M^{y}} \right],   (1)

where t_y = \min(y, M) and S(y, x) is the Stirling number of the second kind; S(M, x) is defined by

x!\, S(M, x) = \sum_{i=0}^{x} (-1)^{i} \binom{x}{i} (x - i)^{M}.   (2)
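Equations (1) and (2) can be evaluated directly. A minimal sketch (the function names are ours) computes BW by enumerating the number of requests y and using E[min(D, B)] = E[D] - E[(D - B)⁺] for the number of distinct memories D addressed, which is the bracketed term of Eq. (1):

```python
from math import comb, factorial

def stirling2(y, x):
    # Stirling number of the second kind via Eq. (2):
    # x! S(y, x) = sum_i (-1)^i C(x, i) (x - i)^y.
    return sum((-1)**i * comb(x, i) * (x - i)**y for i in range(x + 1)) // factorial(x)

def bandwidth(N, M, B, p):
    # Eq. (1): mean number of busy memories per cycle in an
    # N x M x B synchronous circuit-switched multiple-bus system.
    bw = 0.0
    for y in range(N + 1):                          # y requests issued this cycle
        prob_y = comb(N, y) * p**y * (1 - p)**(N - y)
        distinct = M * (1 - (1 - 1/M)**y)           # E[distinct memories addressed]
        # expected excess of distinct memories over the B available buses
        excess = sum((x - B) * factorial(x) * stirling2(y, x) * comb(M, x) / M**y
                     for x in range(B + 1, min(y, M) + 1))
        bw += prob_y * (distinct - excess)
    return bw
```

As a consistency check, bandwidth(16, 16, 16, 1.0) evaluates to about 10.30, the B = 16 (crossbar-equivalent) entry of Table I, and bandwidth(4, 4, 2, 0.5) gives about 1.51, the B = 2, N = 4 entry.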


It is stated [11, 12], and we have verified, that this approximate analysis gives results similar to those of [10] and is better than the analyses developed in [9, 12]. The independence assumption, however, is unrealistic, because a rejected request will definitely be resubmitted in the next cycle. In order to obtain more accurate results, the "adjusted request rate" technique [8, 6] can be used. Let p' be the adjusted request rate, which takes into account the resubmission of rejected requests. Then

p' = \left[1 + P_A\left(\frac{1}{p} - 1\right)\right]^{-1},   (3)

where P_A = BW/(Np') is the probability that a request is accepted in a cycle.
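Combining Eq. (1) with the adjusted rate gives an implicit equation in BW. A minimal numerical sketch (the function names are ours; a damped fixed-point iteration stands in for the unspecified standard method, using Eq. (3) rearranged as p' = 1 - (1 - p)BW/(Np)):

```python
from math import comb, factorial

def stirling2(y, x):
    # Stirling number of the second kind, computed from Eq. (2).
    return sum((-1)**i * comb(x, i) * (x - i)**y for i in range(x + 1)) // factorial(x)

def bw_eq1(N, M, B, p):
    # Memory bandwidth of Eq. (1) for request probability p.
    total = 0.0
    for y in range(N + 1):
        e_min = sum(min(x, B) * comb(M, x) * factorial(x) * stirling2(y, x) / M**y
                    for x in range(1, min(y, M) + 1))
        total += comb(N, y) * p**y * (1 - p)**(N - y) * e_min
    return total

def bw_adjusted(N, M, B, p, iters=200):
    # Solve BW = Eq.(1) evaluated at p' = 1 - (1 - p) BW / (N p),
    # i.e., Eq. (3) with P_A = BW/(N p') substituted in.
    bw = bw_eq1(N, M, B, p)                  # start from the naive estimate
    for _ in range(iters):
        p_adj = min(1.0, max(p, 1.0 - (1.0 - p) * bw / (N * p)))
        bw = 0.5 * bw + 0.5 * bw_eq1(N, M, B, p_adj)   # damped update
    return bw
```

Since p' ≥ p and Eq. (1) is increasing in the request rate, the adjusted bandwidth is never below the naive estimate.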

Now we can substitute the request rate p in Eq. (1) by p' of Eq. (3). The resulting equation has one unknown variable, BW, and can be solved by using any standard numerical method. The results are very close to those of simulation for a wide range of parameters [6].

2.1.2. Decentralized Control. In the case of a circuit-switched, synchronous, and decentralized system, a bus may be allocated to a request that addresses a busy memory module. As a result the bus bandwidth is wasted and the overall system performance is degraded. In order to study this effect quantitatively, we develop the following simple model. Let y be the number of memory requests generated per cycle. The number of requests that can reach memory modules through the B buses, y_{BM}, is given by

y_{BM} = \min(y, B).   (4)

Each of the y_{BM} requests addresses a particular memory module with probability 1/M. If the same assumptions described in the last section hold, then the memory bandwidth under the condition that y_{BM} requests pass through the buses is given by

BW(y_{BM}) = M\left[1 - \left(1 - \frac{1}{M}\right)^{y_{BM}}\right].   (5)

Thus, the memory bandwidth is given by

BW = \sum_{y=0}^{N} \binom{N}{y} p^{y} (1-p)^{N-y}\, M\left[1 - \left(1 - \frac{1}{M}\right)^{\min(y, B)}\right].   (6)
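Equations (4)-(6) are straightforward to evaluate; a sketch (the function name is ours):

```python
from math import comb

def bw_decentralized(N, M, B, p):
    # Eq. (6): bandwidth of the synchronous, circuit-switched,
    # decentralized system, combining Eq. (4) and Eq. (5).
    total = 0.0
    for y in range(N + 1):
        y_bm = min(y, B)                        # Eq. (4): requests that win a bus
        e_busy = M * (1 - (1 - 1/M)**y_bm)      # Eq. (5): expected busy memories
        total += comb(N, y) * p**y * (1 - p)**(N - y) * e_busy
    return total
```

Because some of the min(y, B) granted requests may collide on the same memory module, this value never exceeds the centralized result of Eq. (1); with B = N the two coincide.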

Clearly, the memory bandwidth calculated from Eq. (6) is always less than or equal to that obtained from Eq. (1). The results have been further improved by applying the rate-adjusted technique [6].

2.2. Asynchronous Circuit-Switched System

Asynchronous operation differs from synchronous operation in that it does not have a global clock. A memory request from a processor can be generated at any instant of time, and a request may be granted a free bus at any instant without the restriction of the beginning of a cycle. Continuous-parameter Markov models, rather than discrete probabilistic analyses, are more appropriate for such systems.

2.2.1. Centralized Case. A great deal of research has gone into the analysis of an asynchronous, circuit-switched multiple-bus system that has a centralized controller [13-16]. Marsan and Gerla [13] developed a set of Markov models for the system. In [15], local balance of the multiple-bus network has been proved. Towsley [16] modeled the system by using flow equivalence and surrogate delay techniques, which turned out to be fairly accurate. The flow equivalence technique [16] works as follows. The bus and memory subsystem, called the aggregate, is replaced by a single flow-equivalent center. Solving the model requires two steps. First, the bus-memory service center (the aggregate) is solved in isolation. Then the entire queuing network model is solved on the basis of the results of the first step. To solve the aggregate, all the processors are "shorted"; i.e., the processors are removed from the original system. In the resulting shorted network, memory requests that complete their memory transfers immediately cycle back to the memories. The throughput is calculated for each number of requests, n, cycling through this network. After the throughput is determined for all possible values of n, the original queuing network can be solved by taking the bus-memory subsystem as a single service center. Techniques such as the mean value analysis (MVA) algorithm [17] exist for solving such queuing networks.

2.2.2. Decentralized Case. In this operation, a station that has a request holds the bus until it gets the addressed memory. This causes a great deal of wasted bus bandwidth, and a circuit-switched, asynchronous, and decentralized multiple-bus system is therefore considered impractical for implementation. However, for completeness, we have done an analysis of the system based on the surrogate delay technique [6].
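The second step of the flow-equivalence procedure of Section 2.2.1 reduces to solving a closed queuing network, for which exact mean value analysis [17] is the standard tool. A minimal sketch for N processors modeled as a delay center with think time Z plus a set of FCFS centers follows; the example parameters at the end are illustrative assumptions of ours, not values from the paper:

```python
def mva(N, Z, centers):
    # Exact MVA for a closed product-form network: N circulating requests,
    # a delay center with think time Z, and FCFS centers given as
    # (visit_ratio, service_time) pairs. Returns system throughput.
    q = [0.0] * len(centers)                    # mean queue length at each center
    X = 0.0
    for n in range(1, N + 1):
        # arrival theorem: residence time seen by an arriving request
        r = [v * s * (1 + q[k]) for k, (v, s) in enumerate(centers)]
        X = n / (Z + sum(r))                    # system throughput with n requests
        q = [X * rk for rk in r]                # Little's law at each center
    return X

# Illustrative use: the bus system as one FCFS center with service time 1/B
# (the "equivalent rate of B buses" simplification), visited twice per memory
# access, plus M memories each visited with ratio 1/M and service time T.
B, M, T = 2, 16, 4.0
centers = [(2.0, 1.0 / B)] + [(1.0 / M, T)] * M
throughput = mva(16, 10.0, centers)
```

With one circulating request the recursion collapses to 1/(Z + total service demand), a useful sanity check.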

3. PACKET-SWITCHED MULTIPLE-BUS SYSTEMS

In packet switching, a memory request packet generated by a processor is first put in the bus access queue to wait for an available bus. When a bus is granted, the processor transmits the packet to the addressed memory module and releases the bus immediately after the packet is sent. Since more than one processor may address the same memory module, the packet has to join the memory input buffer. After the request finishes the memory operation, a ready response packet is placed in the memory output buffer, where the packet tries to get an available bus. Finally, the packet is transmitted back to the requesting processor. With this organization, every time a memory module completes one service it puts the response packet in the output buffer and becomes available to serve the next request in the queue. The efficiency of the system is increased due to the fact that a memory module can now be busy serving different requests continuously. Also, the buses become free as soon as they deliver their packets, so they can be well utilized.

3.1. Synchronous Packet-Switched Systems

In synchronous operation, the system cycle or the bus transfer time is a constant value and constitutes the basic time unit in our analysis. All processors are synchronized at this time unit, called the bus cycle. The cycle time of each memory is equal to T bus cycles for some integer value of T. Arbitration delay is considered to be included in this value. The arbitration policy for resolving bus conflicts can be based on either random selection or cyclic selection. Statistically, these two policies result in the same mean. The bus arbitration process is done within a cycle irrespective of whether the buses are centrally or decentrally controlled. Therefore, we do not distinguish between centralized and decentralized control strategies in the synchronous analysis. The Encore Multimax is an example of a single-bus packet-switched, synchronous, and centralized system [18]. A processor is said to be active when it is busy in computation. The processor issues a bus request for memory access at the beginning of a cycle with some probability p. After it issues a memory request, the processor remains idle until the memory service finishes. The proportion of time that a processor remains active is defined as the processor utilization (P_u). Let W_b be the time from when a processor generates a request until the request arrives at the addressed memory module. Let W_m represent the delay that a packet experiences in a memory input buffer, which includes both the waiting time in the buffer and the time to perform the memory operation.
And W_o denotes the packet delay in a memory output buffer, which is the sum of the queuing delay and the packet transmission time on a bus. The processor utilization, P_u, can be expressed as

P_u = \frac{k}{k + kp(W_b + W_m + W_o)} = \frac{1}{1 + p(W_b + W_m + W_o)}.   (7)

The above equation can be easily derived by observing a processor for k cycles. During this time kp global memory requests would have been generated. Each of these kp memory requests requires W_b + W_m + W_o cycles to finish. Assuming independence between the various queues, simple queuing analyses lead to the following formula [19]:

P_u = \left[1 + p\left(\frac{1}{P_b} + T\,\frac{2 - pP_uN/(MB) - NTpP_u/M}{2\,(1 - NTpP_u/M)} + \frac{M - NpP_u}{P_bM - NpP_u}\right)\right]^{-1},   (8)

where P_b is the probability that a bus request is accepted by the bus system; its derivation can be found in [19]. Equation (8) has one unknown variable, P_u, and can be solved using a standard numerical method. An initial guess for P_u is 1/(3 + T), which is the maximum possible processor utilization. The results obtained from Eq. (8) have been verified through simulations and shown to produce acceptable errors [19, 6].
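Equation (8) can be solved by damped fixed-point iteration starting from the guess 1/(3 + T). The sketch below keeps an Eq. (8)-style memory input-buffer delay W_m but, as a simplifying assumption of ours, takes P_b = 1 so that the bus wait W_b and the output-buffer delay W_o each reduce to one cycle; it illustrates the numerical procedure rather than reproducing the exact model of [19]:

```python
def solve_pu(N, M, B, p, T, iters=2000, damp=0.9):
    # Damped fixed-point iteration for processor utilization P_u,
    # based on Eq. (7): P_u = 1 / (1 + p (W_b + W_m + W_o)).
    # Assumption: bus acceptance probability P_b = 1, so W_b = W_o = 1 cycle.
    pu = 1.0 / (3 + T)                     # initial guess: maximum utilization
    for _ in range(iters):
        lam_T = (N * p * pu / M) * T       # per-memory utilization under this guess
        if lam_T >= 1.0:                   # memories saturated: push the guess down
            new = 0.0
        else:
            w_m = T * (2 - p * pu * N / (M * B) - lam_T) / (2 * (1 - lam_T))
            new = 1.0 / (1 + p * (1.0 + w_m + 1.0))
        pu = damp * pu + (1 - damp) * new  # damped update for stable convergence
    return pu
```

As expected, the computed utilization rises as the request probability p falls.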

3.2. Asynchronous Packet-Switched Multiple-Bus Systems

Since in asynchronous operation a memory request can be generated or granted a bus at any instant in time, the control strategy, centralized or decentralized, directly affects the system behavior, and we consider the two cases separately. The detailed analysis of these systems can be found in [19]. For brevity, we only summarize the techniques here.

3.2.1. Centralized Control. A packet-switched, asynchronous, and centrally controlled system can be modeled by using a simple closed queuing network [19]. Processors in the system are modeled as delay servers with a mean service time of Z, and memory modules are modeled as FCFS servers. Since a central controller allocates the buses in a centrally controlled bus system, the bus system is modeled as an FCFS center with an equivalent service rate of B buses. A request packet generated by a processor is first put in the bus system queue, waiting for an available bus. After it gets access to a bus, the packet joins one of the M memory queues with probability 1/M. The memory module that finishes the service of a packet puts the response packet again in the bus queue. From there, the response packet gets back to the requesting processor through a bus. At this point the packet finishes one rotation through the network, and the processor resumes its background activity. The queuing network is then solved by applying the approximate MVA algorithm [20] to take into account the fixed service times at the bus and memory centers. Simulation results indicate a good match with those obtained from the analysis [19, 6].

3.2.2. Decentralized Control. The actual implementation of a decentralized multiple bus can be either a token bus or a daisy-chain bus, as described in [21]. The bus-granting policy is round-robin, and the bus grant signal on each bus can be seen as a token. The B tokens on the B buses are independent and operate asynchronously.
If a device (processor or memory module) has no packet to transmit, the interface of the device simply passes the arriving token to the next device. The delay incurred is assumed to be a constant r between two successive devices. When a device has a packet to transmit, the interface simultaneously monitors all B buses and captures the earliest arriving token on any one of them. Once it gets the token, it can transmit the packet in its buffer on the corresponding bus. The device then passes the token on as soon as it finishes the transmission [21]. We use hierarchical modeling techniques [17], defining the bus system as the aggregate queue of the system and the rest of the system, the processors and memory modules, as a complement network. The solution of the resulting network involves two steps [19]: (1) determining the throughput of the bus system queue for each possible population n = 1, 2, ..., N to obtain a flow-equivalent service center (FESC); (2) solving the high-level queuing network with the bus system queue as the FESC. Detailed analysis and results of this system are given in [19].

FIG. 2. A comparison between packet-switched and circuit-switched 16 × 16 multiple-bus multiprocessors with two buses.

4. DISCUSSIONS

In the previous sections, we have described eight different categories of multiple-bus interconnections for multiprocessors. They differ from each other in design cost, reliability, and performance. For example, synchronous systems are simpler in circuit design than asynchronous systems but lack flexibility and expandability. A centrally controlled system always suffers from the problems of controller bottleneck and poor reliability. All these aspects are qualitatively true and need no further explanation. Given the input parameters and a configuration, the performance of the system can be readily predicted using our analytical models. A direct quantitative performance comparison of different systems is difficult because of the incompatibility of the system operations. Moreover, choosing wrong input parameters may give rise to wrong conclusions. However, the performance differences between some packet-switching and circuit-switching systems can be shown based on the analytical models presented in the previous sections. Figure 2 shows the processing power (P_uN) as a function of the mean interval time (Z) between two successive memory requests for a 16 × 16 asynchronous centrally controlled two-bus multiprocessor system. Curves for a circuit-switched system and for a packet-switched system are

TABLE I
Bandwidth Comparison of Three Types of INs

                   N = 4          N = 8          N = 12         N = 16
IN type and
No. of buses    p=0.5   p=1    p=0.5   p=1    p=0.5   p=1    p=0.5   p=1

B = 1            0.94   1.00    1.00   1.00    1.00   1.00    1.00   1.00
B = 2            1.51   1.98    1.94   2.00    1.99   2.00    2.00   2.00
B = 4            1.66   2.73    3.09   3.98    3.19   4.00    3.91   4.00
B = 6                           3.23   5.18    4.68   5.98    5.51   6.00
B = 8                           3.23   5.25    4.80   7.41    6.21   7.99
B = 10                                         4.80   7.71    6.31   9.66
B = 12                                         4.80   7.78    6.31  10.26
B = 14                                                        6.37  10.30
B = 16                                                        6.37  10.30
MIN              1.56   2.44    2.81   4.13    4.22   6.20    5.13   7.20
Crossbar         1.66   2.73    3.23   5.25    4.80   7.78    6.37  10.30



shown in the figure. In circuit-switching mode, the only queuing delay results from the time during which the communication path between the requesting processor and the addressed memory module is being established. Once the path is established, it is dedicated to the entire memory operation. It is assumed that a memory operation takes four processor cycles [18]. Hence, in the packet-switched multiple-bus system six cycles are needed for a memory operation, excluding any queuing delay: two cycles for bus transactions and four cycles for memory access. The same assumption is also applied to the circuit-switched case to make a fair comparison. It is shown in the figure that the packet-switched system performs better than the circuit-switched system under this condition. In fact, the same result has been observed for a variety of system parameters [6].

Compared to the multistage interconnection network and the crossbar, the multiple-bus interconnections have a number of advantageous features. Let us first consider Table I, which lists the BW of N × N INs based on a synchronous circuit-switched operation. Results for request probabilities of 0.5 and 1.0 per cycle are presented in the table. The BW increases with the number of buses, but so does the cost. Hence there is a lot of flexibility on the part of the designer to choose the number of buses depending on the requirement. When the number of buses is equal to half the number of processors, the bandwidths of multiple-bus systems are comparable to those of crossbar networks.

Even fewer buses are required to obtain similar performance when the buses are packet-switched. This fact is illustrated in Fig. 3, which shows the processor utilization vs the request rate for the three INs based on a synchronous packet-switched operation. In this figure, the memory access time is assumed to take four system cycles [18]. It is shown that a multiple-bus-based multiprocessor system with only four buses can achieve almost the same performance as that of the crossbar- or MIN-based systems while reducing the hardware cost significantly. In addition to the advantages related to performance and cost discussed above, the multiple-bus structure is also very attractive from a reliability point of view. Figure 4 shows reliability curves vs time for the three types of INs in a 16 × 16 multiprocessor system [4]. The failure rates used to obtain the curves are shown in the figure. Graceful degradation is allowed, and three sets of curves are plotted for a task requiring 8, 12, and 16 processor-memory pairs, respectively. The reliability of the multiple-bus system is consistently better than that of the other two networks in all three cases. Similar results for BW availability were also obtained [4].

FIG. 3. Processor utilization as a function of probability of request for 16 × 16 synchronous packet-switched systems.

5. CONCLUSIONS

Two very important and central properties of any system are its structure and behavior. As a structure for interconnecting shared-memory multiprocessor systems, the multiple-bus system has been considered a very attractive candidate that offers a number of advantages over other structures.


FIG. 4. Reliabilities of a 16 × 16 × 8 multiple-bus-based system and 16 × 16 crossbar- and MIN-based systems for a task requiring I processors and I memories. (m) Multiple bus; (-) crossbar; (+) MIN. λ_p = λ_m = 0.0001; λ_s (crossbar, multiple bus) = 0.000005; λ_s (omega) = 0.00002.


It is particularly suitable for medium-scale systems. We have discussed the performance issues for various interconnection schemes of the multiple-bus system depending on its timing, switching, and control. All the performance models consider realistic situations such as fixed memory access time and the bus arbitration procedures. Approximation techniques such as Newton's iteration, the flow-equivalent service center model, and some heuristic algorithms were used in solving these performance models. The performance models presented here have been carefully compared with simulation results, indicating acceptable error margins [6]. We believe that the multiple-bus system offers a simple, flexible, and reliable IN for future medium-scale multiprocessor systems.

REFERENCES

1. Wulf, W. A., and Bell, C. G. C.mmp-A multi-mini-processor. AFIPS Proc. Fall Joint Comput. Conf., 1972, pp. 765-777.
2. Das, C. R., and Bhuyan, L. N. Bandwidth availability of multiple-bus multiprocessors. IEEE Trans. Comput. C-34 (Oct. 1985), 918-926.
3. Bhuyan, L. N., Yang, Q., and Agrawal, D. P. Performance of multiprocessor interconnection networks. IEEE Comput. (Feb. 1989), 25-37.
4. Das, C. R., and Bhuyan, L. N. Reliability simulation of multiprocessor systems. Proc. Int. Conf. on Parallel Processing, Aug. 1985, pp. 591-598.
5. Adams, G. B., III, Agrawal, D. P., and Siegel, H. J. A survey and comparison of fault-tolerant multistage interconnection networks. IEEE Comput. (June 1987), 14-27.
6. Yang, Q. Analysis of cache based multiple-bus multiprocessors. Ph.D. dissertation, The Center for Advanced Computer Studies, University of Southwestern Louisiana, 1988.
7. Bhuyan, L. N. A combinatorial analysis of multibus multiprocessors. Proc. Int. Conf. on Parallel Processing, Aug. 1984, pp. 225-227.
8. Mudge, T. N., Hayes, J. P., Buzzard, G. D., and Winsor, D. C. Analysis of multiple-bus interconnection networks. J. Parallel Distrib. Comput. 3 (Sept. 1986).
9.
Goyal, A., and Agerwala, T. Performance analysis of future shared storage systems. IBM J. Res. Develop. 28 (Jan. 1984), 95-108.
10. Valero, M., et al. Analysis for multiple-bus interconnection networks. Proc. ACM SIGMETRICS Conf., 1983, pp. 200-206.
11. Holliday, M. A., and Vernon, M. K. Exact performance estimates for multiprocessor memory and bus interference. IEEE Trans. Comput. C-36 (Jan. 1987), 76-85.
12. Mudge, T. N., Hayes, J. P., and Winsor, D. C. Multiple-bus architectures. IEEE Computer, Special Issue on Interconnection Networks, June 1987, pp. 42-48.
13. Marsan, M. A., and Gerla, M. G. Markov models for multiple bus multiprocessor systems. IEEE Trans. Comput. C-31 (Mar. 1982), 239-248.
14. Yang, Q., and Zaky, S. G. Communication performance in multiple-bus systems. IEEE Trans. Comput. 37 (July 1988), 848-853.
15. Irani, K. B., and Onyuksel, I. H. A closed-form solution for the performance analysis of multiple-bus multiprocessor systems. IEEE Trans. Comput. C-33 (Nov. 1984), 1004-1012.
16. Towsley, D. Approximate models of multiple bus multiprocessor systems. IEEE Trans. Comput. C-35 (Mar. 1986), 220-228.
17. Lazowska, E. D., et al. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, Englewood Cliffs, NJ, 1984.
18. Multimax Technical Summary. Encore Computer Corporation, Marlboro, MA, May 1985.
19. Yang, Q., Bhuyan, L. N., and Pavaskar, R. Performance analysis of packet-switched multiple-bus multiprocessor systems. Proc. Eighth Real-Time Systems Symposium, Dec. 1987, pp. 170-178.
20. Reiser, M. A queueing network analysis of computer communication networks with window flow control. IEEE Trans. Comm. COM-27 (Aug. 1979), 1199-1209.
21. Yang, Q., and Bhuyan, L. N. Design and analysis of decentralized multiple-bus multiprocessors. Proc. Int. Conf. on Parallel Processing, Aug. 1987, pp. 889-892.

Received March 3, 1988; revised January 26, 1989

QING YANG is an assistant professor in the Department of Electrical Engineering at the University of Rhode Island. His research interests include parallel and distributed computer systems, design of digital systems, performance evaluation, and local area networks. Yang received a B.Sc. degree in computer science from Huazhong University of Science and Technology in China in 1982 and an M.A.Sc. in electrical engineering from the University of Toronto in 1985. He received a Ph.D. in computer engineering from the Center for Advanced Computer Studies, University of Southwestern Louisiana. Yang is a member of the IEEE Computer Society.

LAXMI N. BHUYAN is an associate professor in the Department of Computer Science at Texas A&M University. His research interests include parallel and distributed computer architecture, performance and reliability evaluation, and local area networks. Bhuyan received B.S. and M.S. degrees in electrical engineering from the Regional Engineering College, Rourkela, under Sambalpur University in India. He received a Ph.D. in computer engineering from Wayne State University in Detroit in 1982. Bhuyan is a senior member of the IEEE and a distinguished visitor of the IEEE Computer Society, and served as guest editor of the Computer special issue on interconnection networks in June 1987.