Computer Communications 20 (1997) 639–648
Design and performance of a multiple parallel shared memory switch

G. Kbar*
Department of Communication, University of New South Wales, Sydney 2052, Australia
Received 29 September 1995; accepted 13 November 1996

* Tel.: +61 2 9385 5152; fax: +61 2 9385 5993; e-mail: g.kbar@unsw.edu.au

0140-3664/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0140-3664(97)00053-4

Abstract

A new ATM switch architecture is presented. The switch is based on multiple parallel shared memory switches (8 × 8) connected together in a Banyan interconnection network. The parallel shared memory switch (PSMS) is capable of achieving 100% throughput under bursty traffic. It has the advantage of requiring a small queue size, as in other types of shared memory switch. In contrast to other shared memory switches, there is no constraint on the switch size implementation, since it does not require a speed-up that depends on the size of the switch. Speed-up is needed in the parallel shared memory switch to maintain the sequence of cells, but only at a small value dependent on the size of the parallel shared memory switch element (8 in this design). The parallel shared memory switch element is based on distributed shared memory, where each memory queue stores cells coming from all inputs. This limits the performance degradation that may occur in a completely shared memory, where long streams from a few inputs may unfairly hog the entire space. Analysis and simulation of the switch demonstrate high performance in terms of end-to-end delay, throughput and cell loss probability. © 1997 Elsevier Science B.V.

Keywords: ATM; BISDN; Switch architecture

1. Introduction

There are many ongoing research efforts to develop ATM switches for BISDN. Researchers describe good switches as having the characteristics of nonblocking operation, low cell loss with small buffers, maintenance of cell sequence, and suitability for small as well as large switch implementations. Blocking can occur at the input, at internal links, and at the output when two or more input cells contend for the same resources within the switching fabric. Banyan networks and their multistage variations are examples of blocking switches. Blocking can be resolved by buffering the cells that contend for the same outputs. Switches are classified according to the location of the buffers used for temporary storage: input queuing, output queuing, internal queuing, shared memory, and mixtures of input and output queuing. Shared memory [1,2], crossbar [3], Batcher-Banyan network [4,5], parallel cross-connection Knockout [6] and Prelude [7] switches are classified as nonblocking switches.

Input queuing suffers from head-of-line blocking and consequently has low throughput performance [8]. Output queuing has higher throughput performance than input queuing, due to the availability of multiple paths [6]. Speed-up has also been used with input queuing to improve the throughput, but it makes output queuing necessary to hold cells waiting for processing [5]. Consequently, a combination of input and output queuing produces efficient switch designs [9,10]. Internal buffers have also been used to store cells that lose contention at internal links [11]. This requires a large number of queues at each switching element in all stages of the fabric, and it also requires an internal speed-up at each stage to reduce the blocking probability at each switch element; the complexity of this internal speed-up is proportional to the number of stages in the switch. Other techniques use shared memory [12] to store all arriving cells in one logical memory. This reduces the size of the required queues and produces high throughput. Unfortunately, a speed-up is required to multiplex all the input ports, which limits the maximum switch size: shared queuing is restricted to small switches because multiplexing N inputs into one time slot requires a speed-up of N. The ATM Distributed Shared Memory Switch (ADSMS), proposed by Kbar and Dewar [13], uses multiple distributed shared memory switches connected together in a Banyan interconnection network to solve the switch size implementation problem at low complexity.


In this paper, a new shared memory switch, the ATM Multiple Parallel Shared Memory Switch, is described. This new switch is capable of achieving 100% throughput under uniform and bursty traffic, and is suitable for all switch sizes using a simple implementation. It is an internally buffered (Banyan) switch that uses a parallel shared memory switch (PSMS) of size 8 × 8 in each switch element, together with a sort network, to achieve 100% throughput, in comparison to a network without sorting [14], which achieves a normalised throughput of 0.55 for the same size of switch element (8 × 8). The PSMS switch element is a shared memory switch consisting of two distributors built from demultiplexers, multiplane parallel intermediate memory queues, and output queues. The cell sequence is maintained using a low value of speed-up, thus eliminating the out-of-order problem. The distribution of memory queues, with cells cycled among all memory queues, balances the queue size among the memory planes and prevents long streams belonging to one input port from hogging the shared memory. Compared with other strategies, the complexity is minimised in terms of queue sizes and hardware implementation.

In Section 2, the switch architecture is presented. In Section 3, the PSMS switch element structure and operation are given. Section 4 describes the performance of the PSMS for bursty traffic, including analysis, simulation of the switch and numerical results. Finally, in Section 5, the conclusion is presented.

2. The ATM multiple parallel shared memory switch

Completely shared queuing has the advantage over other queuing disciplines (input and output queuing) of requiring a small buffer size. However, it suffers from two main problems: unfairness, and the limitation to small switch sizes because of the speed-up required for multiplexing N inputs into one time slot. The switch size issue can be solved by interconnecting multiple shared memory switches to form a larger switch (e.g. a Banyan interconnection). In this case the switch fabric becomes an internally buffered switching type whose element is based on a shared memory switch. However, since the Banyan interconnection is blocking, the throughput of the switch is reduced to half, according to the analysis of Bianchi and Turner [14]. To achieve a high throughput with internal buffering in a Banyan switch, a sort network is needed at the front. Moreover, if the switch element is larger than 2 × 2, the complexity of the switch is also reduced. The unfairness issue arises in completely shared queuing switches because the entire queue is accessible by every input port, which can lead to hogging by one input with bursty traffic. Dedicated queuing switches, on the other hand, require larger buffers for each dedicated queue, since they do not benefit from statistical multiplexing.

In this paper, a solution for both problems (unfairness and switch size scaling) is given. The ATM ''Multiple Parallel Shared Memory Switch'' is a queuing discipline in which PSMS elements are interconnected in a Banyan interconnection to form a large network of any size. A sort network is used at the input of the Banyan switch to achieve 100% throughput, as shown in Fig. 1. This switch can be implemented at any size using an internal speed-up, but at a small value dependent on the size of the switch element. This contrasts with the completely shared memory switch, where arriving cells are multiplexed in TDM, which requires a speed-up of N times the speed of the input line.

2.1. The Banyan shared buffer switch

This is a Banyan interconnection switch whose switch element is based on a PSMS of size 8 × 8. The switch routes cells according to their destination addresses and priorities. In the first stage of the Banyan switch, the cells are routed at the switch elements (PSMS) according to the three most significant bits. For example, for the five destination bits (d4 d3 d2 d1 d0) of a 32 × 32 switch, the first-stage switch elements route the cell according to (d4 d3 d2). In the second stage, the cells are routed at the switch elements according to the next three most significant bits (d3 d2 d1). In the final stage, cells are routed at the switch elements according to the three least significant bits (d2 d1 d0). There is only one Banyan, composed of (N/8) log2(N/4) shared memory Banyan switch elements (PSMS).

The switch element PSMS consists of eight dedicated memories which are shared by the eight inputs, as well as eight dedicated output queue buffers, as shown in Fig. 2. The unfairness problem is solved by circulating all eight input packets over the eight dedicated memories using a distributor, where each packet is stored in the subqueue of a multiplane memory according to its destination address. This switch element (PSMS) does not require a large memory queue size, since arriving cells are distributed among eight different intermediate memories. The main advantage of the PSMS switch element is its balanced operation under bursty traffic. Cells that belong to the same burst and are destined to the same output are not stored in one queue, as they would be in dedicated queue systems. This balances the intermediate memory size and improves the throughput of the system. The PSMS requires a low speed-up to maintain sequence integrity. Output queues are needed when speed-up is used in the PSMS switch element.
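To make the stage-by-stage routing rule of Section 2.1 concrete, the sketch below models the sliding 3-bit destination windows for a 32 × 32 switch built from 8 × 8 elements. It is a minimal illustration in Python under assumed 0-indexed bit positions; the function name and framing are ours, not the paper's hardware design.

```python
# Minimal sketch (assumed model, not the paper's implementation) of the
# bit-window self-routing rule: for a 5-bit destination d4 d3 d2 d1 d0,
# stage 0 routes on (d4 d3 d2), stage 1 on (d3 d2 d1), stage 2 on (d2 d1 d0).

def stage_output_port(dest: int, stage: int) -> int:
    """Local output port (0..7) used by the PSMS element at `stage`."""
    shift = 2 - stage          # stage 0 -> bits 4..2, 1 -> 3..1, 2 -> 2..0
    return (dest >> shift) & 0b111

if __name__ == "__main__":
    dest = 0b10110             # destination 22 of a 32 x 32 switch
    for s in range(3):
        print(f"stage {s}: local output {stage_output_port(dest, s):03b}")
```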

Fig. 1. ATM distributing shared memory switch.


Fig. 2. ATM distributing shared memory switch.

2.2. The sort switch

The N × N Batcher sort network is composed of 2 × 2 sorting elements arranged in log2 N (1 + log2 N)/2 sorting stages, with N/2 sorting elements per stage. The use of a sort network avoids the blocking of the Banyan network and consequently improves the throughput of the switching system, in comparison to the use of a Banyan network alone (such as internal buffering at the Banyan network). A cell is sorted at the sort switch element according to its destination address. If the cells arriving at both inputs of the sort switch element have the same priority bit, the sorting-up switch element (2 × 2) routes the cell with the higher destination address to the bottom and the cell with the lower destination address to the top. If only one of the cells has low priority, it is routed down irrespective of its destination address. If the arriving cells at the two inputs have the same destination address, the cell at the top input line is routed up, and the other cell is given a low priority and routed down. This mechanism of assigning low priority to a cell that has the same destination address as another at the same sort switch element sorts all cells with different destination addresses into different groups at the output. This avoids blocking at the Banyan switch, as shown in Fig. 3. In this figure, all cells at the input are sorted and then routed through the Banyan switch according to their destination addresses. If more than one cell is addressed to the same output (e.g. the two cells 0000 and 0000, both addressed to output 0), one can be routed to the output in the first time slot, while the other is stored in the shared memory switch element for processing in the following time slot. At the end of the first cell time slot, 12 cells have been routed properly to the output, and the other four cells are stored in the switch element queues because of contention for the same output. If no sort network is used under the conditions given in Fig. 3, the maximum number of cells that can be routed to the output is six. This is caused by blocking at the Banyan switch elements, as shown in Fig. 4. Consequently, Banyan blocking halves the switch throughput: six cells in comparison to 12 cells when a sort network is used. Furthermore, the internal queue size has to be larger when no sort network is used.

Fig. 3. Packet routing at the switch fabric using a sort network.

Fig. 4. Packet routing at the switch fabric without using a sort network.
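The 2 × 2 sorting-element rules above can be summarised as a small compare-exchange function. The following Python sketch is an illustrative model of those rules; the encoding (priority 0 taken as high, 1 as low) is our labelling, not the paper's.

```python
# Illustrative model of the 2x2 sorting element described above.
# A cell is (priority, destination); priority 0 = high, 1 = low (assumed encoding).

def sort_element(top, bottom):
    """Return (up, down) cells according to the rules in the text."""
    tp, td = top
    bp, bd = bottom
    if td == bd:                       # same destination: top input wins,
        return top, (1, bd)            # the loser is marked low priority and sent down
    if tp != bp:                       # exactly one low-priority cell: it goes down
        return (top, bottom) if tp < bp else (bottom, top)
    # equal priority: lower destination up, higher destination down
    return (top, bottom) if td < bd else (bottom, top)

if __name__ == "__main__":
    print(sort_element((0, 5), (0, 2)))   # ((0, 2), (0, 5)): lower address goes up
    print(sort_element((0, 3), (0, 3)))   # ((0, 3), (1, 3)): duplicate marked low priority
```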

3. Structure and operation of the PSMS switch element

The 8 × 8 PSMS switch consists of a first distributor, a multiplane memory queue, a second distributor, an output queue and a counter synchroniser, as shown in Fig. 2. Cells arriving at any input are distributed in a circulating cycle among the eight dedicated memory queues, where each cell is stored in one of the memory queues according to its destination address. The second distributor reads the eight cells from all the dedicated memory queues and then distributes them to the eight output queues.

3.1. Distributors

Two identical distributors are used in the 8 × 8 switch. Each is constructed from eight demultiplexers (1 × 8), where cells arriving at the inputs of the distributor are demultiplexed according to a synchronised clock. The Write Counter Synchroniser (WCS) generates the synchronised clock (Write Control Line, WCL) for the first distributor, and the Read Counter Synchroniser (RCS) generates the other synchronised clock (Read Control Line, RCL) for the second distributor. The first demultiplexer of a distributor starts demultiplexing at output 1, the second demultiplexer at output 2, ... and the final demultiplexer at output 8. The WCL and the RCL are then rotated by one cycle, which demultiplexes the cell from input 1 to output 2, and the cell from input 8 to output 1, as shown in Fig. 5. Cells at the output of the first distributor are stored at different memory locations according to their destination addresses.

Fig. 5. Routing at the distributing shared memory switch.


3.2. Memory queues

There are eight intermediate memory queues, each with eight subqueues representing the different destination addresses. In the first cycle, cells are read starting from the first location of memory 1 (destination 1), the second location of memory 2 (destination 2), ... to the eighth location of memory 8 (destination 8). In the second rotation cycle, cells are read from the second location of memory 1 (destination 2), the third location of memory 2 (destination 3), ... to the first location of memory 8 (destination 1). The read cycle and the second distributor are synchronised by the RCS to switch cells to the right output.

3.3. Counter synchroniser

The counter synchroniser consists of two parts: the WCS and the RCS.

3.3.1. WCS
The WCS synchronises the first distributor, generating a control signal for each demultiplexer. The control signal starts at 1 for demultiplexer 1, 2 for demultiplexer 2, ... and 8 for demultiplexer 8. The WCS then rotates these control signals between the demultiplexers, as shown in Fig. 5, so that cells are distributed among the eight intermediate memories. After eight rotations of the WCS, the control signal returns to the value it had in the first cycle. The WCS also synchronises the arriving cells at the different intermediate memories.

3.3.2. RCS
The RCS starts by reading the first location of memory 1 (destination 1), the second location of memory 2 (destination 2), ... to the eighth location of memory 8 (destination 8) in the first time slot. In the second time slot, the RCS rotates the reading cycle by one: it reads the second location of memory 1 (destination 2), the third location of memory 2 (destination 3), ... to the first location of memory 8 (destination 1). The RCS keeps rotating the reading cycle by one at each time slot until it returns to the same cycle after eight rotations. By performing this rotation during the write cycle as well as the read cycle, cells arriving on one input line are distributed over eight different intermediate memory queues and then sorted to the corresponding output queues according to their destination addresses.
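The write and read rotations of Sections 3.1-3.3 amount to two circular schedules. The sketch below models them in Python under an assumed 0-indexed convention (the paper numbers inputs and memories from 1); it only illustrates which memory is written and which destination subqueue is read in each slot.

```python
# Assumed 0-indexed model of the WCS/RCS rotation: in slot t, input i is
# demultiplexed to intermediate memory (i + t) % 8, and the second
# distributor reads the destination-((m + t) % 8) subqueue of memory m.

N = 8

def write_memory(i: int, t: int) -> int:
    """Intermediate memory written by input i in time slot t (WCS rotation)."""
    return (i + t) % N

def read_subqueue(m: int, t: int) -> int:
    """Destination subqueue of memory m read in time slot t (RCS rotation)."""
    return (m + t) % N

if __name__ == "__main__":
    # Cells from input 0 are spread over all eight memories in eight slots ...
    print([write_memory(0, t) for t in range(N)])    # [0, 1, 2, ..., 7]
    # ... and in slot 1 memory 0 serves destination 1, memory 7 serves destination 0.
    print([read_subqueue(m, 1) for m in range(N)])
```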

3.4. Output queues

Eight output queues store the cells arriving from the output distributor. Cells arrive at the output queue eight times faster than the service rate, since the intermediate memories are speeded up by eight to maintain the sequence integrity of the cells. The output queues, which store cells contending for the same output, are FIFO (First-In-First-Out).

3.5. Complexity

Distributing the cells arriving at the PSMS switch element over eight intermediate memories solves the fairness problem at a complexity similar to that of the shared TDM-based memory. To compare the complexity of the PSMS switch element with the shared TDM-based memory, Fig. 6 shows the components required for building the switch element of the shared TDM-based memory, and Table 1 illustrates the difference in complexity between the two approaches. Comparing the complexity of the two techniques, we observe the following:

• Both techniques require a speed-up of 8 to maintain cell sequence integrity.
• The shared TDM-based memory requires eight input buffers, which are equivalent to the eight intermediate memories of the PSMS, since the intermediate memories temporarily hold cells to be switched to the output queues.
• The size of the shared memory used in the shared TDM-based memory approximately equals the total size of the eight output queues (8 × buffer size in cells).
• The shared TDM-based memory requires one multiplexer and one demultiplexer, whereas the PSMS requires 16 demultiplexers.
• The shared TDM-based memory requires one CDAW (Controller Destination Address Write) and one CDAR (Controller Destination Address Read), whereas the PSMS requires eight CDAWs and eight CDARs.
• Both architectures require one write counter synchroniser and one read counter synchroniser. However, the counter synchroniser (1*) used in the shared TDM-based memory is much more complex than the one used in the PSMS. The counter synchroniser of the PSMS is a normal counter used to synchronise the distributor, with its outputs connected to the control signals of the demultiplexers. In contrast, the counter synchroniser used in the shared TDM-based memory must have address queues to store information about the next cells to be processed; the address queue size must equal the size of the address field multiplied by the number of cells stored in the memory buffers.

The 14 extra demultiplexers and 14 extra CDAWs/CDARs required for the PSMS switch element (of size 8 × 8) are of similar complexity to the extra address queues used in the shared TDM-based memory. The minimum total size of the address queues equals the cell address field multiplied by the size of the shared memory queue, i.e. 5 bytes × 8 × buffer size. Therefore, both architectures have similar switch element complexity. However, the PSMS provides fairness between all inputs, achieves maximum throughput and can be built at any size.

Fig. 6. Switch element of shared TDM-based memory.

Table 1
Complexity of the switch element for shared TDM-based memory and PSMS

                        Shared TDM memory    PSMS
Speed-up                8                    8
Input buffer            8                    —
Mux                     1                    —
Write counter           1*                   1
Read counter            1*                   1
CDAW                    1                    8
CDAR                    1                    8
Shared memory           8 × B-size           —
De-mux                  1                    16
Output queue            —                    8 × B-size
Intermediate memory     —                    8

4. Performance of the PSMS

4.1. Switch routing analysis

For a new burst, a cell at input port i of the N × N switch is destined randomly to one of the N output ports; for the same burst, the cell is destined to output port j. Hence, the probability of frame arrivals at input i destined to output j becomes:

$$p_{ij} = \begin{cases} \dfrac{p_r(\text{arrivals})}{N}, & \text{for a new burst} \\[4pt] p_r(\text{arrivals}), & \text{for the same burst} \end{cases}$$

The probability of arrival at the switch is $s_r = a_{bl}/(a_{bl} + a_{sl})$; hence

$$p_{ij} = \begin{cases} \dfrac{s_r}{N}\, p_r(\text{new burst}) \\[4pt] s_r\, p_r(\text{same burst}) \end{cases}$$

For a geometric distribution of average burst length $a_{bl}$ and average silence length $a_{sl}$ we have probability(same burst) = probability(not generating silence) × probability(generating input burst i) = $(1 - x_2) \times 1/N$, where probability(generating input burst i) = $1/N$, since there are N possible bursts coming from different inputs, and the probability of generating silence is $x_2 = (1 - p)^{x-1} p$, where $p = 1/\text{mean} = 1/a_{sl}$ and $x \in \{1, \ldots, \infty\}$ for a geometric distribution. Hence, the distribution of silence is:

$$x_2 = \left(1 - \frac{1}{a_{sl}}\right)^{a_{sl}-1} \frac{1}{a_{sl}}$$

and the probability of generating the same burst is:

$$p_r(\text{same burst}) = \frac{1 - x_2}{N} = \left[1 - \left(1 - \frac{1}{a_{sl}}\right)^{a_{sl}-1} \frac{1}{a_{sl}}\right] \times \frac{1}{N}$$

with $p_r(\text{new burst}) = 1 - p_r(\text{same burst})$. Substituting $p_r(\text{new burst})$ and $p_r(\text{same burst})$ into the $p_{ij}$ equation we get:

$$p_{ij} = \frac{s_r}{N}\left(1 - \frac{1 - x_2}{N}\right) + \frac{s_r (1 - x_2)}{N} = \frac{s_r}{N} - \frac{s_r (1 - x_2)}{N^2} + \frac{s_r (1 - x_2)}{N} \qquad (1)$$

or, written in full:

$$p_{ij} = \left[1 - \left(1 - \frac{1}{a_{sl}}\right)^{a_{sl}-1} \frac{1}{a_{sl}}\right] \times \frac{1}{N} \times \left(s_r - \frac{s_r}{N}\right) + \frac{s_r}{N}$$

If k of the cells arriving at the N inputs are destined to one output port j, and N − k are destined to different outputs, then, using a binomial distribution, the total probability over all inputs destined to output j is given by the probability mass function:

$$p_{sw,j}(k) = \frac{N!}{k!(N-k)!}\, p_{ij}^{k} (1 - p_{ij})^{N-k} \qquad (2)$$
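As a numeric check of Eqs. (1) and (2), the sketch below evaluates p_ij and the binomial pmf in Python. The relation between load, a_bl and a_sl (via s_r = a_bl/(a_bl + a_sl)) is our assumption for choosing example parameters.

```python
# Numeric sketch of Eqs. (1) and (2) in the notation above (illustrative).
from math import comb

def p_ij(N: int, a_bl: float, a_sl: float) -> float:
    """Eq. (1): probability a slot carries a cell at input i destined to output j."""
    s_r = a_bl / (a_bl + a_sl)                        # probability of arrival
    x2 = (1 - 1 / a_sl) ** (a_sl - 1) / a_sl          # probability of generating silence
    pr_same = (1 - x2) / N
    return (s_r / N) * (1 - pr_same) + s_r * pr_same

def p_sw(N: int, k: int, pij: float) -> float:
    """Eq. (2): probability that k of the N inputs target output j."""
    return comb(N, k) * pij ** k * (1 - pij) ** (N - k)

if __name__ == "__main__":
    pij = p_ij(N=8, a_bl=10, a_sl=10)                   # load 0.5 => a_sl = a_bl (assumed)
    print(pij, sum(p_sw(8, k, pij) for k in range(9)))  # the pmf sums to 1
```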

4.2. Geom/D/1/m output queue analysis

Fig. 7 shows the state transition diagram for the output queue, where the probability of i arrivals at the output queue is $b(i) = p_{sw,j}(i)$ from Eq. (2), and L = 8 is the speed-up for an 8 × 8 shared memory switch element (PSMS).

Fig. 7. Geom/D/1/m state diagram for maximum probability of arrival L.

The one-step transition probabilities for the output queue are:

$$p_{0,0} = b_1 + b_0 \quad \text{(one or zero arrivals)}$$
$$p_{i,i} = b_1 \quad \text{(one arrival)}$$
$$p_{i,i+n} = b(n+1), \quad n = 1, \ldots, L \quad \text{(two or more arrivals)}$$
$$p_{i,i-1} = b_0 \quad \text{(zero arrivals)}$$

Note that the term $p_{i,i+n}$ means that a transition can occur from state i to state i + n, with n ranging from 1 to L, where L is the speed-up of the switch.

Applying the local balance boundary between state 0 and state 1, which equates the flows from right and left, we get:

$$p_1 b_0 = p_0 \sum_{i=2}^{L} b(i)$$

Between state 1 and state 2 we get:

$$p_2 b_0 = p_1 \sum_{i=2}^{L} b(i) + p_0 \sum_{i=3}^{L} b(i)$$

and between state n − 1 and state n we get:

$$p_n b_0 = p_{n-1} \sum_{i=2}^{L} b(i) + p_{n-2} \sum_{i=3}^{L} b(i) + \cdots + p_{n-L+1} \sum_{i=L}^{L} b(i) = \sum_{k=2}^{\min(L,\, n+1)} p_{n-k+1} \left( \sum_{i=k}^{L} b(i) \right) \qquad (3)$$

By iteration we obtain the probability of n cells in the queue, $p_{oq,n} = p_n$, with $p_{oq,0}$ fixed by the normalisation

$$\sum_{n=0}^{m} p_{oq,n} = 1$$

for a sufficiently large queue size m. The probability of loss at the output queue for size k can then be computed as:

$$p_{loss,oq}(k) = \sum_{n=k+1}^{m} p_{oq,n} \qquad (4)$$
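The balance equations (3) lend themselves to a direct iteration, after which Eq. (4) is a tail sum. The following Python sketch is our construction, reusing p_sw from the previous block: it computes the unnormalised state probabilities with p_0 = 1 and then normalises over a finite queue of size m.

```python
# Illustrative iteration of Eq. (3) and the loss tail sum of Eq. (4).

def queue_distribution(b, L, m):
    """b[i] = Pr(i arrivals per slot), i = 0..L; returns [p_oq,0 .. p_oq,m]."""
    B = [sum(b[i:L + 1]) for i in range(L + 2)]       # tail sums B[k] = sum_{i>=k} b(i)
    p = [1.0]                                         # unnormalised, p_0 = 1
    for n in range(1, m + 1):
        flow = sum(p[n - k + 1] * B[k] for k in range(2, min(L, n + 1) + 1))
        p.append(flow / b[0])                         # p_n b_0 = sum ...  (Eq. 3)
    total = sum(p)
    return [x / total for x in p]                     # normalise: sum_n p_oq,n = 1

def loss_probability(p, k):
    """Eq. (4): probability of loss for an output queue of size k."""
    return sum(p[k + 1:])
```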

4.3. PSMS switch throughput analysis

The normalised throughput at the output queue, i.e. the average number of output cells per output link per slot, is:

$$\bar{y} = \text{probability(output queue is not empty)} \times \text{probability(service at the output queue)}$$

In one time slot, and for the Geom/D/1 output queue, probability(service at the output queue) equals 1. Probability(output queue is not empty) equals 1 − [probability(0 cells in the queue) × probability(0 cell arrivals at the queue)] = $1 - p_{oq,0} \times b_0$, where $p_{oq,0}$ is obtained from Eq. (3) and $b_0 = p_{sw}(0)$ is obtained from Eq. (2). Hence, the normalised throughput becomes:

$$\bar{y} = 1 - p_{oq,0} \times b_0 \qquad (5)$$
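Putting Eqs. (1)-(5) together, a short driver (reusing the illustrative helpers from the sketches above; all parameter choices are our assumptions for demonstration) computes loss and throughput figures of the kind plotted in the following subsections:

```python
# End-to-end numeric sketch of Eqs. (1)-(5), reusing p_ij, p_sw,
# queue_distribution and loss_probability from the sketches above.

N = L = 8                                # switch element size and speed-up
m = 400                                  # "sufficiently large" finite queue (assumed)
a_bl, load = 10, 0.5
a_sl = a_bl * (1 - load) / load          # assumed from s_r = a_bl / (a_bl + a_sl)

pij = p_ij(N, a_bl, a_sl)
b = [p_sw(N, k, pij) for k in range(L + 1)]
p = queue_distribution(b, L, m)

ybar = 1.0 - p[0] * b[0]                 # Eq. (5)
print(f"throughput ~ {ybar:.4f}")
print(f"loss at queue size 30 ~ {loss_probability(p, 30):.3e}")
```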

4.4. Numerical results

Analytical and simulation results for the cell loss, sequence integrity and end-to-end delay have been developed under the assumption of bursty arrivals. A geometric distribution is used to model the average burst length and average silence length. The probability of loss at the output queue is investigated for different average burst lengths and different loads. Simulation results are obtained for the probability of loss at the output queue, for the sequence of cells with and without speed-up, and for the end-to-end delay. Analysis and simulation are compared to validate the analytical results.

4.4.1. Analytical results

The probability of loss at the output queue is given by Eq. (4). For an average burst length a_bl = 10, the output queue size is 30 for a loss probability of 10^{-12} at load 0.5, increasing to 104 for the same loss probability at load 0.9, as shown in Fig. 8. If the average burst length increases to 20, the queue size becomes 128 under the same conditions, as shown in Fig. 9. The output queue size increases further, to 150, for an average burst length of 30 under the same conditions, as shown in Fig. 10.

Fig. 8. Probability of loss at output queue for N = 8, speed-up = 8 and a_bl = 10.

Fig. 9. Probability of loss at output queue for N = 8, speed-up = 8 and a_bl = 20.

Fig. 10. Probability of loss at output queue for N = 8, speed-up = 8 and a_bl = 30.


4.4.2. Simulation results

The PSMS switch is simulated under bursty cell arrivals with N = 8, a_bl = 10 and load 0.9. In Fig. 11 the output queue size varies between 0 and a maximum of 55 cells with no overflow under these conditions. Fig. 12 shows the effect of speed-up on the sequence integrity of the cells: with no speed-up, cell number 16 arrives before cell number 9, but with a speed-up of 8 the sequence integrity is maintained. The end-to-end delay is 16 × 10^{-5} for a switch of size 16 × 16 at load 0.9, as shown in Fig. 13.

Fig. 11. Simulation: output0 queue size at N = 8, a_bl = 10, speed-up = 8 and load = 0.9.

Fig. 12. Simulation: sequence of cells at output0 for N = 8, a_bl = 10 and load = 0.8.

Fig. 13. Simulation: end-to-end delay at output0 for N = 8, a_bl = 10, speed-up = 8 and load = 0.9.

4.4.3. Comparing analytical and simulation results

For average burst lengths a_bl = 10 and 20, Fig. 14 shows the agreement between the simulation and analytical results at the output queue for no overflow (probability of loss 10^{-12}) and speed-up = 8. The output queue size is 165 for load 0.95 and a_bl = 20, decreasing to 128 for a_bl = 10 at the same load. The output queue size decreases further at load 0.5: to 38 for a_bl = 20 and to 28 for a_bl = 10.

Fig. 14. Output queue size for N = 8, speed-up = 8 and probability of loss = 10^{-12}.
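For reproducibility, the traffic model used in the simulations above (geometric on/off bursts, with each burst aimed at a single output) can be sketched as follows. The generator below is our assumption of the set-up described in Section 4.4.2, not the author's simulator.

```python
# Assumed on/off bursty source with geometric burst and silence lengths.
import random

def geometric(mean: float, rng: random.Random) -> int:
    """Geometrically distributed length with the given mean (mean >= 1)."""
    p = 1.0 / mean
    n = 1
    while rng.random() > p:
        n += 1
    return n

def bursty_source(a_bl, a_sl, n_outputs, slots, seed=1):
    """Yield one destination per slot during bursts and None during silences."""
    rng = random.Random(seed)
    emitted = 0
    while emitted < slots:
        dest = rng.randrange(n_outputs)          # a whole burst targets one output
        for _ in range(geometric(a_bl, rng)):    # burst
            yield dest
            emitted += 1
        for _ in range(geometric(a_sl, rng)):    # silence
            yield None
            emitted += 1

if __name__ == "__main__":
    cells = list(bursty_source(a_bl=10, a_sl=10, n_outputs=8, slots=1000))
    print(sum(c is not None for c in cells) / len(cells))   # empirical load ~ 0.5
```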

5. Conclusion

A new ATM switch architecture based on multiple parallel shared memory has been presented. It is a new type of Banyan shared memory switch that has the advantage of fairness and is much easier to implement at any size. In contrast to conventional shared memory switches, which are based on a TDM multiplexer and require a larger speed-up, this switch requires much less speed-up to maintain the sequence integrity of cells, which makes it suitable for large switch implementations. An analytical model of the performance of the switch architecture has been presented. The speed-up at the intermediate memory reduces the size of these memory queues, at the price of introducing output queues to hold cells waiting for transmission at the speed of the input line. Under bursty traffic, and like single shared memory switches, this switch has the advantage of requiring a smaller queue size than dedicated input or output queue disciplines. The system performance in terms of loss probability, delay and sequence integrity has been analysed and simulated. Numerical results for the system under bursty traffic indicate that the proposed design exhibits good performance and a simple switch implementation at all sizes.

References

[1] A.K. Choudhury, E.L. Hahne, Space priority management in a shared memory ATM switch, IEEE Globecom, Vol. 3, 1993, pp. 1375–1383.
[2] H.S. Kim, Design and performance of multinet switch, IEEE/ACM Transactions on Networking 2 (6) (1994) 571–580.
[3] M.J. Karol, M.G. Hluchyj, S.P. Morgan, Input vs output queueing on a space-division packet switch, IEEE Trans. Commun. COM-35 (1987) 1347–1356.
[4] J. Hui, E. Arthurs, A broadband packet switch for integrated transport, IEEE J. Select. Areas Commun. SAC-5 (1987) 1264–1273.
[5] D.X. Chen, J.W. Mark, SCOQ: A fast packet switch with shared concentration and output queueing, IEEE Infocom, Vol. 1, 1991, pp. 145–153.
[6] Y.-S. Yeh, The knockout switch, IEEE J. Select. Areas Commun. SAC-5 (8) (1987) 1274–1282.
[7] J.P. Coudreuse, M. Servel, Prelude: An asynchronous time-division switched network, Proc. ICC '87, Seattle, WA, 1987, pp. 769–773.
[8] M.G. Hluchyj, M.J. Karol, Queuing in space-division packet switching, IEEE Infocom, Vol. 1, 1988, pp. 334–343.
[9] A. Pattavina, G. Bruzzi, Analysis of input and output queueing for nonblocking ATM switches, IEEE/ACM Transactions on Networking 1 (3) (1993) 314–328.
[10] G. Kbar, W. Dewar, Multicast switch with feedback input queuing, priority sorting network, and output queuing, International Symposium on Information Theory and Its Applications, Sydney, Australia, 1994, Vol. 1, pp. 627–632.
[11] H.S. Kim, A. Leon-Garcia, Performance of buffered Banyan networks under nonuniform traffic patterns, IEEE Infocom, Vol. 1, 1988, pp. 344–353.
[12] T. Kozaki, N. Endo, Y. Sakurai, O. Matsubara, 32 × 32 shared buffer type ATM switch VLSIs for BISDN, IEEE J. Select. Areas Commun. 9 (8) (1991) 1239–1246.
[13] G. Kbar, W.J. Dewar, ATM distributed shared memory switch (ADSMS), ATNAC '95, Sydney, 1995, pp. 71–76.
[14] G. Bianchi, J.S. Turner, Improved queueing analysis of shared buffer switching networks, IEEE/ACM Transactions on Networking 1 (4) (1993) 482–490.

Ghassan Kbar was born in Lebanon in 1962. He received the Bachelor degree in electronic engineering from the University of Damascus, Syria, in 1987, and the Master of Engineering Studies from the University of Sydney, Australia, in 1990. He completed his PhD in electrical engineering at the University of New South Wales, Sydney, Australia, in 1996. He worked at the State Rail Authority in Sydney from 1990 to 1992 as a communication design engineer. Since 1993 he has worked at the University of NSW as an Associate Lecturer. His research interests include ATM switching architectures, LANs, WANs, internetworking and their realisation.