Emulating output queueing with parallel packet switches


Computer Communications 30 (2007) 3403–3415

Chia-Lung Liu a,b,*, Chin-Chi Wu a,c, Woei Lin a

a Department of Computer Science, National Chung-Hsing University, 250, Kuo Kuang Road, Taichung, Taiwan
b Information & Communications Research Labs, Industrial Technology Research Institute, Taiwan
c Nan Kai Institute of Technology, Taiwan

Received 7 March 2007; accepted 13 June 2007; available online 21 June 2007

Abstract

This work employs an approximate Markov chain analysis to show that a parallel packet switch (PPS) can emulate a first-come first-served (FCFS) output-queued (OQ) packet switch while operating slower than the external line rate. The PPS comprises multiple packet switches operating independently and in parallel; the class is characterized by parallel center-stage switches whose memory buffers run slower than the external line rate. Each lower speed packet switch operates at a fraction of the external line rate R; for example, each can operate at internal line rate R/K, where K is the number of center-stage switches. The proposed novel Markov chain model successfully exhibits the performance characteristics of throughput, cell delay and cell drop rate, and extends the model of our previous papers [1,2]. Simulation comparisons demonstrate that the chains are accurate for practical network loads. The major findings obtained with the model are: (1) it is theoretically possible for the throughput and cell drop rate of a PPS to emulate those of an FCFS-OQ packet switch when each lower speed packet switch operates at a rate of approximately R/K; and (2) it is theoretically possible for the cell delay of a PPS to emulate that of an FCFS-OQ packet switch when each lower speed packet switch operates at a rate of approximately (3R/cell delay of the FCFS-OQ switch). Additionally, this work develops and investigates a novel PIAO PPS, which distributes cells or variable-length packets to center-stage switches and uses multiplexers with push-in arbitrary-out (PIAO) queues.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Markov chain; Parallel packet switch; Output-queued packet switch; Push-in arbitrary-out queues

1. Introduction

The speed of memory in conventional packet switches must at minimum equal the external line rate. As external line rates increase from OC-192 (10 Gb/s) to OC-768 (40 Gb/s), or even OC-3072 (160 Gb/s), conventional packet switches cannot process packets as fast as they arrive. In a high-speed network, a PPS is a good choice for efficient data transfer in high-capacity switching systems


[1–12]. This work uses PIAO queues in the multiplexers to support both cell and packet switching [16]. Employing approximate Markov chains to evaluate the performance of the PPS under homogeneous, uniform, random traffic, it shows that a PPS can emulate the performance of an FCFS-OQ switch. The approximate Markov chains extend those of our previous papers [1,2]. The chains, developed for the demultiplexers, center-stage switches and multiplexers, accommodate a PPS of arbitrary switch size, buffers of arbitrary length, any number of center-stage switches, and any speedup. Simulation comparisons verify that these chains are accurate; the simulation results lie significantly close to the analytical results.


If a packet switch is work conserving, its throughput is maximized and the average delay of cells is minimized. Work conserving means that an output of the switch is busy whenever the switch holds a cell destined for it. FCFS-OQ switches are work conserving [3], but they require buffer memories that operate at N times the link speed; FCFS-OQ switches are therefore not feasible to implement at gigabit line speeds. Because FCFS-OQ switches are work conserving, a necessary condition for a switch to emulate the performance of an FCFS-OQ switch is that it be work conserving [3]. Using the pigeonhole principle, previous research [3] proves that if the internal line rate of a PPS is R/K (i.e. speedup = 1), the throughput and drop rate of the PPS can emulate those of an FCFS-OQ switch within a delay bound of 2N internal time slots; that is, with an internal speedup of 1, the throughput and drop rate of the PPS emulate those of an FCFS-OQ switch, but the cell delay of the PPS exceeds that of an FCFS-OQ switch by up to 2N internal time slots. The major findings of the Markov chains in this article go further: the throughput and drop rate of the PPS can emulate those of an FCFS-OQ switch with S = 1 (Eq. (49)), and the cell delay of the PPS can emulate that of an FCFS-OQ switch with S = 3K/(delay of the FCFS-OQ switch) (Eq. (51)), where S is the speedup of the internal line rate (R/K).

Previous work developed approximate Markov chains [1,2,13] and approximate queueing network models [14] to measure switch performance. These analyses share the assumption that all buffers/queues in a switch system are independent. Although this work applies the same assumption, simulation and analytical results demonstrate that the chains in this work are accurate for stable network loads. Furthermore, the derivations and final forms of the equations in this work are simple and general.

2. Background

A packet switch handles data packets or cells. Variable-length packets are divided into fixed-size cells before being switched. The performance of different packet switches varies; in high-speed networks, the PPS provides excellent performance [7]. A PPS, which overcomes memory bandwidth limitations, comprises multiple identical low-speed packet switches operating independently and in parallel. An incoming stream of packets is spread, one packet at a time, across the slow packet switches by a demultiplexer and then rearranged by a multiplexer located at the output of the PPS. The architecture of a PPS resembles that of a Clos network [15], as shown in Fig. 1: the demultiplexers, slower speed packet switches, and multiplexers correspond to the three stages of a Clos network. This article uses PIAO queues in the multiplexers to support cell and packet switching. These PIAO queues are push-in arbitrary-out queues in which arriving cells are placed at an arbitrary location and removed from the queue in an arbitrary order; in other words, the next cell to depart is not necessarily the one currently at the head of the queue [16].

Fig. 1. Architecture of the PPS: N demultiplexers, K parallel N × N output-queued center-stage switches, and N multiplexers; external links run at rate R, internal links at rate SR/K.

The approximate Markov chains, which extend those of our previous papers [1,2], can be applied to a PPS of any switch size, queue size, number of center-stage switches, and speedup, provided that the same independence assumption is made for each network. No analytical study of the "complete" PPS architecture existed prior to this work. This work elucidates and analyzes the PPS for Bernoulli traffic only.

The remainder of the paper is organized as follows. Section 3 introduces terminology and definitions. Section 4 introduces the architecture and algorithm of the PPS. Sections 5 and 6 present the Markov chains for the FCFS-OQ switch and the PPS. Section 7 investigates how a PPS can emulate an FCFS-OQ switch. Section 8 presents results from the Markov chain analyses and compares them with simulation results. Finally, Section 9 presents conclusions.

3. Definitions

Before proceeding it will be useful to define some terms used throughout this paper:

Cell: A fixed-length packet, though not necessarily equal in length to a 53-byte ATM cell. Although packets arriving at the switch may have variable length, for the purposes of this paper we assume that they are segmented and processed internally as fixed-length cells. This is common practice in high-performance routers; variable-length packets are segmented into cells as they arrive and carried across the switch as cells.

External time slot: The time taken to transmit or receive a fixed-length cell at a link rate of R.

Internal time slot: The time taken to transmit or receive a fixed-length cell at a link rate of SR/K, where K is the number of center-stage switches in the PPS and S is the speedup of the internal line rate.

OQ switch: A switch in which arriving packets are placed immediately in queues at the output, where they contend with other packets destined to the same output. The departure order might be FCFS, in which case we call it an FCFS-OQ switch. One characteristic of an OQ switch


is that the buffer memory must be able to accept (write) N new cells per time slot, where N is the number of ports. Hence, the memory must operate at N times the line rate.

Work conserving: A packet switch is said to be work conserving if an output is busy whenever there is a cell in the system for it. If a packet switch is work conserving, its throughput is maximized and the average delay of cells is minimized.

In this paper we compare the performance of a PPS and an OQ switch; the following definition helps us compare the two switches formally.

Emulate: Two different switches are said to emulate each other if, under identical inputs, they have identical performance.

Push-in arbitrary-out (PIAO) queues: A PIAO queue [16] is defined as follows: (1) arriving cells are placed at (pushed in to) an arbitrary location in the queue; (2) the relative ordering of cells in the queue does not change once cells are in the queue, i.e. cells in the queue cannot switch places; (3) cells are removed from the queue in an arbitrary order; the cell selected to depart need not be at the head of the line.
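To make the PIAO discipline concrete, the following minimal Python sketch (ours, not the authors' code; the class and method names are illustrative only) implements the three properties above:

# Minimal sketch of a push-in arbitrary-out (PIAO) queue.
# The names PIAOQueue, push_in and pull_out are illustrative only.
class PIAOQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cells = []                      # ordered list of queued cells

    def push_in(self, cell, position):
        # Property (1): insert at an arbitrary position.
        # Property (2): insertion preserves the relative order of cells
        # already in the queue (they never switch places).
        if len(self.cells) >= self.capacity:
            raise OverflowError("PIAO queue full")
        self.cells.insert(position, cell)

    def pull_out(self, index):
        # Property (3): the departing cell may be taken from any
        # position, not only from the head of the line.
        return self.cells.pop(index)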

4. PPS architecture and algorithm

4.1. Architecture

Fig. 1 shows the overall architecture of the PPS with distributed pipeline control. The PPS can be divided into the following independent stages: (1) the demultiplexers; (2) the slow-speed center-stage packet switches; and (3) the multiplexers. This work focuses on a PPS (Fig. 1) in which the center-stage switches are OQ. Fig. 1 shows an N × N PPS with each port operating at rate R. Each port is connected to all K OQ switches (the center-stage switches are referred to as layers). When a cell arrives at an input port, the demultiplexer selects a layer to which it will send the cell; this selection is based on the policy outlined in Section 4.3. Since cells arriving on an external input at line rate R are spread (demultiplexed) over K links, each internal link must run at a minimum speed of R/K. For example, with an OC-768 external rate of R = 40 Gb/s and K = 64 layers, each internal link needs only R/K = 625 Mb/s. Each layer of the PPS consists of a single OQ or combined input/output-queued (CIOQ) switch whose memory operates slower than the external line rate. Each layer receives cells from the N input ports and switches each cell to its output port. During link congestion, cells are stored in the output queues of the center stage until the line to the multiplexer becomes available. When the line is available, the multiplexer selects a cell from the corresponding K output queues in the layers. Since each multiplexer receives cells from K output queues, each queue must operate at a minimum speed of R/K to keep the external line busy.


4.2. The number of center-stage switches

This work presents switches with memory that runs slower than the external line rate. Assume that the external line rate is R, S is the speedup of the internal line rate, and the number of center-stage switches is K. If the center-stage switches are CIOQ switches [16,17], we require 2(SR/K) < R and, as a result, K > 2S. Similarly, for center-stage OQ switches we require N(SR/K) < R, which gives K > NS.

4.3. PPS switching scheme

The PPS distributes cells or packets to layers using a round-robin scheme at the demultiplexers. At the output of the internal switch planes, a novel algorithm guaranteeing in-order cell (or packet) delivery is employed in the multiplexers.

Step 1: Split every flow in the demultiplexer using a round-robin procedure. The PPS demultiplexer contains K buffers of N-cell depth. Incoming cells are classified on a per-flow basis; flows can be identified by a unique source and destination address pair in an IP packet or by VCI/VPI in an ATM cell. Cells from a flow are distributed to the PPS switch layers using a round-robin scheme. Fig. 2 shows the architecture of a demultiplexer and multiplexer. Fig. 3 shows the algorithm run within each of the N demultiplexers [3,7]. Cells are received and classified, a source port number and a sequence number are appended, and the cell is then queued in the next (round-robin) buffer for its flow. The flow identifies the output port number (variable i in Fig. 3); the layer number is indexed by output port (array pn[ ]). The PPS demultiplexer does not require feedback from the internal switch layers. Demultiplexer i maintains N separate round-robin pointers (one for each output), pn[1]…pn[N]. These pointers contain a value in the range {1…K}. When pointer pn[j] = x, the next cell arriving that is destined for output j is sent to layer x. Before being sent, this cell is written temporarily into buffer[pn[j]], where it waits to be delivered to layer x. When the link from demultiplexer i to layer x is free, the head-of-line cell (if any) in buffer[pn[j]] is sent.

Fig. 2. Demultiplexer and multiplexer: each demultiplexer holds K buffers of N cells each, and each multiplexer holds K PIAO queues of N cells each; external links run at rate R, internal links at rate r = SR/K.


// Algorithm for PPS demultiplexer
//
// Define the following functions:
//  receive ( )                 Receives a cell
//  classify (cell)             Returns destination port
//  append (cell, sn [i], so)   Appends sn and so
//  queue (cell, buffer [ ])    Queues cell
//
// Define the following variables:
//  i       Destination port number index
//  j       Source port number index
//  so      Source port number
//  sn [ ]  Sequence number
//  pn [ ]  Plane number

While (true) {
    cell = receive ( )
    i = classify (cell)
    sn [i]++
    so = j
    append (cell, sn [i], so)
    pn [i] = (pn [i] % K) + 1
    queue (cell, buffer [pn [i]])
}

Fig. 3. Demultiplexer algorithm.

Step 2: Schedule cells in the center-stage switches. When scheduled in the center stage, cells are queued in the buffer; the head-of-line cell in the buffer is delivered to the output link of the center-stage switch.

Step 3: Reorder the cells in the multiplexer. The goal of the multiplexer is that its buffer stores, reorders and then transmits cells in the correct order. A novel architecture and switching method for the PPS multiplexer, the PIAO PPS, attains this goal; the concepts are derived from previous research [7]. The PIAO PPS adopts PIAO queues in the multiplexers: push-in arbitrary-out queues in which cells are stored at an arbitrary location and removed in an arbitrary order, i.e. the next cell to depart need not be at the head of the queue. The PIAO PPS multiplexer contains K buffers of N-cell depth. Fig. 4 shows the algorithm run within each of the N multiplexers.

5. Analysis for FCFS-OQ switch

This section presents an analytical performance model for an FCFS-OQ switch under uniform traffic. As an FCFS-OQ switch is work conserving, a switch emulating FCFS-OQ switch performance must also be work conserving [3]. This requirement engenders the following question: can a PPS emulate the performance of an FCFS-OQ switch? The performance of an FCFS-OQ switch is therefore analyzed first. The FCFS-OQ switch is simplified to an output queue represented by a Markov chain, and three equations are then derived for the performance measures. The following assumptions are employed:

• Cells arrive according to a Bernoulli process.
• Traffic load is uniformly distributed.
• The size of the cell is fixed.
• Each queue has infinite capacity.

// Algorithm for PIAO PPS multiplexer
//
// Define the following functions:
//  transmit (buffer [ ][ ])   Transmits a cell
//
// Define the following constants:
//  MAX_INT   Maximum size integer
//
// Define the following variables:
//  i            Port number index
//  j            Layer number index
//  l            Length number index
//  min_sn       Minimum sn among cells
//  min_sn_j     Layer number of min_sn
//  min_sn_l     Length number of min_sn
//  next_sn [ ]  Expected next sn by port

While (true) {
    for i = 1 to N {
        min_sn = MAX_INT
        for j = 1 to K {
            for l = 1 to N {
                if (buffer [j][l] is occupied and so == i) {
                    if (sn in buffer [j][l] < min_sn) {
                        min_sn = sn of buffer [j][l]
                        min_sn_l = l
                        min_sn_j = j
                    }
                }
            }
        }
        if (min_sn == next_sn [i]) {
            transmit (buffer [min_sn_j][min_sn_l])
            next_sn [i]++
        }
    }
}

Fig. 4. Multiplexer algorithm.
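To illustrate Steps 1–3 end to end, the following Python sketch (our own illustration of the algorithms above, with all names our own) tags the cells of one flow with sequence numbers, spreads them round-robin over K layers that impose arbitrary delays, and has the multiplexer release a cell only when it carries the next expected sequence number; the final assertion checks in-order delivery:

import random

K, CELLS = 4, 20
layers = {k: [] for k in range(1, K + 1)}        # cells queued in each layer
pn = 1                                           # round-robin pointer (Fig. 3)
for sn in range(1, CELLS + 1):                   # demultiplexer: tag and spread
    layers[pn].append((sn, random.randint(0, 5)))  # (sequence no., layer delay)
    pn = (pn % K) + 1                            # advance to the next layer

# Cells reach the multiplexer in delay order, i.e. out of sequence.
arrivals = sorted((d, sn) for k in layers for (sn, d) in layers[k])

buffer, next_sn, out = set(), 1, []              # multiplexer (Fig. 4 logic)
for _, sn in arrivals:
    buffer.add(sn)
    while next_sn in buffer:                     # release only the expected cell
        buffer.remove(next_sn)
        out.append(next_sn)
        next_sn += 1

assert out == list(range(1, CELLS + 1))          # in-order delivery per port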

For clarity, the following notation is used in the development of the performance model and its subsequent analysis:

• N: Size of the FCFS-OQ switch.
• q: Input load.
• L: Size of the buffer.
• P_j(t): Probability that j cells are stored in a buffer at network cycle t.
• P_drop(t): Probability that the buffer overflows at network cycle t.
• g_i: Probability that i cells arrive at the same output buffer.
• r: Probability that a cell in a buffer successfully moves to the output port.

Fig. 5 illustrates the relations among these probabilities.

5.1. Markov chains for FCFS-OQ switch

Because there is no head-of-line (HOL) blocking problem in an FCFS-OQ switch, we assume the probability r = 1 [18,19]. With switch load q and r = 1, g_i and the following equations are obtained:

g_i = \binom{N}{i} \left(\frac{q}{N}\right)^i \left(1 - \frac{q}{N}\right)^{N-i}, \quad 0 \le i \le N \quad (1)

Here q/N is the probability that an arriving cell chooses a given one of the N output buffers, and g_i is the probability that a total of i cells arrive at the same output buffer (Fig. 5). For example, with N = 32 and q = 0.9, g_0 = (1 - 0.9/32)^{32} \approx 0.40. In the equations below, \bar{r} denotes 1 - r. If N \ge L + 1, the following equations are derived.

P_0(t+1) = P_0(t)\,(g_0 + g_1 r) + P_1(t)\,g_0\,r \quad (2)

P_j(t+1) = \sum_{n=0}^{j} P_n(t)\,g_{j-n}\,\bar{r} + \sum_{n=0}^{j+1} P_n(t)\,g_{j+1-n}\,r, \quad 1 \le j \le L-1 \quad (3)

P_L(t+1) = \sum_{n=0}^{L} P_n(t)\,g_{L-n}\,\bar{r} + \sum_{n=0}^{L} P_n(t)\,g_{L+1-n}\,r + P_{drop}(t) \quad (4)

P_{drop}(t+1) = \sum_{n=0}^{L} \sum_{i=L+1-n}^{N} P_n(t)\,g_i\,\bar{r} + \sum_{n=0}^{L} \sum_{i=L+2-n}^{N} P_n(t)\,g_i\,r \quad (5)

If N < L + 1, the following equations are obtained.

P_0(t+1) = P_0(t)\,(g_0 + g_1 r) + P_1(t)\,g_0\,r \quad (6)

P_j(t+1) = \sum_{n=0}^{j} P_n(t)\,g_{j-n}\,\bar{r} + \sum_{n=0}^{j+1} P_n(t)\,g_{j+1-n}\,r, \quad 1 \le j \le N-1 \quad (7)

P_j(t+1) = \sum_{n=j-N}^{j} P_n(t)\,g_{j-n}\,\bar{r} + \sum_{n=j-N+1}^{j+1} P_n(t)\,g_{j+1-n}\,r, \quad N \le j \le L-1 \quad (8)

P_L(t+1) = \sum_{n=L-N}^{L} P_n(t)\,g_{L-n}\,\bar{r} + \sum_{n=L-N+1}^{L} P_n(t)\,g_{L+1-n}\,r + P_{drop}(t) \quad (9)

P_{drop}(t+1) = \sum_{n=L-N+1}^{L} \sum_{i=L+1-n}^{N} P_n(t)\,g_i\,\bar{r} + \sum_{n=L-N+2}^{L} \sum_{i=L+2-n}^{N} P_n(t)\,g_i\,r \quad (10)

Fig. 5. Relationships among probabilities for the FCFS-OQ switch.

5.2. Performance measures

The three primary performance measures for a buffer of the FCFS-OQ switch are given as follows.

FCFS OQ Switch Drop Rate(q, S, t, N, K, L) = P_{drop}(t) (Eq. (10)) \quad (11)

Because a cell that arrives at the switch either passes through or drops out of the system, the throughput of an FCFS-OQ switch buffer is given by

FCFS OQ Switch Throughput(q, S, t, N, K, L) = (Input load) - (FCFS OQ Switch Drop Rate (Eq. (11))) = q - P_{drop}(t) \quad (12)

The cell delay of an FCFS-OQ switch buffer is given by

FCFS OQ Switch Delay(q, S, t, N, K, L) = \frac{\sum_{i=1}^{L} i\,P_i(t) + L\,P_{drop}(t)}{1 - P_0(t)} \quad (13)
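As a concrete companion to Eqs. (1)–(5) and (11)–(13), the following Python sketch (our own, for the N \ge L + 1 case, with r = 1 by default and variable names of our choosing) iterates the chain to a steady state and evaluates the three measures:

from math import comb

def fcfs_oq_measures(N=32, L=16, q=0.9, r=1.0, iters=5000):
    g = [comb(N, i) * (q / N)**i * (1 - q / N)**(N - i)
         for i in range(N + 1)]                          # Eq. (1)
    rb = 1.0 - r                                         # \bar{r}
    P, P_drop = [1.0] + [0.0] * L, 0.0                   # P_0(0) = 1
    for _ in range(iters):
        Q = [0.0] * (L + 1)
        Q[0] = P[0] * (g[0] + g[1] * r) + P[1] * g[0] * r          # Eq. (2)
        for j in range(1, L):                                      # Eq. (3)
            Q[j] = (rb * sum(P[n] * g[j - n] for n in range(j + 1))
                    + r * sum(P[n] * g[j + 1 - n] for n in range(j + 2)))
        Q[L] = (rb * sum(P[n] * g[L - n] for n in range(L + 1))    # Eq. (4)
                + r * sum(P[n] * g[L + 1 - n] for n in range(L + 1))
                + P_drop)
        P_drop = (rb * sum(P[n] * g[i] for n in range(L + 1)       # Eq. (5)
                           for i in range(L + 1 - n, N + 1))
                  + r * sum(P[n] * g[i] for n in range(L + 1)
                            for i in range(L + 2 - n, N + 1)))
        P = Q
    drop = P_drop                                        # Eq. (11)
    throughput = q - P_drop                              # Eq. (12)
    delay = (sum(i * P[i] for i in range(1, L + 1))      # Eq. (13)
             + L * P_drop) / (1 - P[0])
    return throughput, delay, drop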

6. PPS analysis

This section presents an approximate analytical model for the PIAO PPS under uniform random traffic; it is, in effect, a first-moment approximate Markov chain analysis of the "complete" PPS. The model comprises four distinct stages: (1) the demultiplexers; (2) the center-stage OQ switches; (3) the multiplexers; and (4) the complete PPS. The assumptions are the same as those in Section 5, with one addition:

• Center-stage switches are OQ switches.

Based on the proofs of Theorem 6 and Theorem 7 in [3], the cell drop rate and throughput equations follow directly in the Markov chain analyses of the demultiplexer and multiplexer below.

• Theorem 6: With round-robin order and K buffers of N-cell depth, no input buffer in a PPS demultiplexer overflows.
• Theorem 7: With round-robin order and K buffers of N-cell depth, no output buffer in a PPS multiplexer overflows.

6.1. Markov chains for the demultiplexer

The performance of the demultiplexer is analyzed for average delay, throughput and drop rate. The variables are defined as follows.

• N: Size of the PPS switch and size of demultiplexer buffer.


• q: Input load.
• K: Number of center-stage switches.
• P_{d_j}(t): Probability that j cells are stored in a demultiplexer buffer at network cycle t.
• P_{d_drop}(t): Probability that a demultiplexer buffer overflows at network cycle t. By the proof of Theorem 6 in [3], we assume P_{d_drop}(t) = 0.
• g_{d_i}: Probability that i cells arrive at the same demultiplexer buffer.
• r_d: Probability that a cell in a demultiplexer buffer successfully moves to the center stage.
• f_j: Probability that no cells arrive at a given demultiplexer buffer in the next internal time slot when the buffer holds j cells in this time slot.
• S: The speedup of the internal link.

Fig. 6 shows the logical association of these probabilities. The probability r_d equals 1 because demultiplexer buffers are free of the HOL blocking problem [18,19]. Cells arriving at a specific demultiplexer buffer come in external cycle intervals and remain i.i.d. Bernoulli, but the arrival rate is q/K (Fig. 6). Thus, the probability of i (0 \le i \le K) cell arrivals at a specific demultiplexer buffer during an internal cycle is given by

g_{d_i} = \binom{K}{i} \left(\frac{q}{K}\cdot\frac{1}{S}\right)^i \left(1 - \frac{q}{K}\cdot\frac{1}{S}\right)^{K-i}, \quad 0 \le i \le K \quad (14)

Because r_d = 1 in Eq. (14), the model cannot represent a speedup by letting a service probability exceed 1. We therefore treat an internal line rate sped up by S (Fig. 7a) as an external line rate slowed by the factor S (Fig. 7b), i.e. the arrival rate is scaled by 1/S. Simulation comparison demonstrates that this treatment is accurate for practical network loads (Figs. 15 and 16).

Now consider j (1 \le j \le N) cells arriving at the same demultiplexer buffer during internal cycle t. Since the demultiplexer algorithm executes pn[ ] = (pn[ ] % K) + 1 as each cell arrives (Fig. 3; the pointer advances to the next buffer), at most (N - j) cells can arrive at the same buffer during internal cycle t + 1; it is impossible for x cells, (N - j + 1) \le x \le N, to arrive at the same buffer during internal cycle t + 1. Thus the following equation is derived, with the probability mass of those impossible arrival counts folded into the no-arrival probability:

f_j = g_{d_0} + \sum_{i=N-j+1}^{N} g_{d_i} \quad (15)

Fig. 6. Relationships among probabilities for the PPS: arrivals g_{d_i} and service r_d at the demultiplexer buffers, g_{c_i} and r_c at the center-stage buffers, and g_m and r_m at the multiplexer buffers; the stage arrival rates follow the throughputs of Eqs. (22), (34) and (44).

Fig. 7. Relationships among external and internal line rates: (a) external line rate R and internal line rate S(R/K); (b) the equivalent model with external line rate R/S and internal line rate R/K.

Because the demultiplexer splits every flow with a round-robin scheme (Section 4.3), the following equations are obtained.

P_{d_0}(t+1) = P_{d_0}(t)\,(g_{d_0} + g_{d_1}\,r_d) + P_{d_1}(t)\,f_1\,r_d \quad (16)

P_{d_j}(t+1) = \left[\sum_{n=0}^{j} P_{d_n}(t)\,g_{d_{j-n+1}}\right] r_d + P_{d_{j+1}}(t)\,f_{j+1}\,r_d, \quad 1 \le j \le N-1 \quad (17)

P_{d_N}(t+1) = \left[\sum_{n=0}^{N} P_{d_n}(t)\,g_{d_{N-n+1}}\right] r_d + P_{d_drop}(t)\,f_{N+1}\,r_d \quad (18)

PPS buffers run slower than the external line rate, so we relate the external cycle to the internal cycle: if one external cycle is one external time slot, then one internal cycle is K/S external time slots [12]. The average cell delay of a demultiplexer buffer is given by

Demultiplexer Internal Delay(q, S, t, N, K) = \frac{\sum_{i=1}^{N} i\,P_{d_i}(t)}{1 - P_{d_0}(t)} \quad (19)

Demultiplexer External Delay(q, S, t, N, K) = Demultiplexer Internal Delay (Eq. (19)) \cdot \frac{K}{S} = \frac{K \sum_{i=1}^{N} i\,P_{d_i}(t)}{S\,(1 - P_{d_0}(t))} \quad (20)

According to the proof of Theorem 6 in [3], the drop rate and throughput of a demultiplexer buffer are given by

Demultiplexer Drop Rate = P_{d_drop}(t) = 0 \quad (21)

Demultiplexer Throughput = (Input load) - (Demultiplexer Drop Rate (Eq. (21))) = q \quad (22)

Consider the worst case of the demultiplexer algorithm (Section 4.3): K cells, destined to N different output ports, arrive at the demultiplexer during one internal cycle (Fig. 8). At most N cells can arrive at the same demultiplexer buffer of N-cell depth, because the algorithm executes pn[ ] = (pn[ ] % K) + 1 as each cell arrives (the pointer advances to the next buffer). Therefore no demultiplexer buffer overflows, and Demultiplexer Drop Rate = P_{d_drop}(t) = 0 (Eq. (21)).

Fig. 8. Worst case of the demultiplexer algorithm: N = 2, K = 3 and pn[1] = pn[2] = 1 initially; as cells arrive, the pointers advance to pn[1] = 2, pn[2] = 2 and then pn[1] = 3.
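The demultiplexer chain of Eqs. (14)–(18) can be evaluated the same way; below is a minimal Python sketch (ours, assuming K > N, with r_d = 1, so the P_{d_drop} term of Eq. (18) vanishes by Theorem 6):

from math import comb

def demux_measures(N=32, K=64, q=0.9, S=1.0, iters=5000):
    a = q / (K * S)                                  # per-buffer arrival prob.
    gd = [comb(K, i) * a**i * (1 - a)**(K - i)
          for i in range(K + 1)]                     # Eq. (14)
    # Eq. (15): arrivals of more than N - j cells are impossible when j
    # cells are queued, so that mass folds into the no-arrival probability.
    f = [gd[0] + sum(gd[N - j + 1:]) for j in range(N + 1)]
    Pd = [1.0] + [0.0] * N
    for _ in range(iters):                           # Eqs. (16)-(18), r_d = 1
        Qd = [0.0] * (N + 1)
        Qd[0] = Pd[0] * (gd[0] + gd[1]) + Pd[1] * f[1]
        for j in range(1, N):
            Qd[j] = (sum(Pd[n] * gd[j - n + 1] for n in range(j + 1))
                     + Pd[j + 1] * f[j + 1])
        Qd[N] = sum(Pd[n] * gd[N - n + 1] for n in range(N + 1))
        Pd = Qd
    internal = sum(i * Pd[i] for i in range(1, N + 1)) / (1 - Pd[0])  # Eq. (19)
    return internal, internal * K / S                # Eq. (20)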

6.2. Markov chains for the center-stage switch

In this section a Markov chain for the center-stage switch, which is an FCFS-OQ switch, is derived. The following extra variables are defined (see Fig. 6):

• P_{c_j}(t): Probability that j cells are stored in a center-stage switch buffer at network cycle t.
• P_{c_drop}(t): Probability that a center-stage switch buffer overflows at network cycle t.
• g_{c_i}: Probability that i cells arrive at the same center-stage switch buffer.
• r_c: Probability that a cell in a center-stage switch buffer successfully moves to the multiplexer.
• L_c: Size of a center-stage switch buffer.

The arrival rate at the center-stage switch equals (Demultiplexer Throughput (Eq. (22)) \cdot S/K) during an external cycle; hence it equals Demultiplexer Throughput (Eq. (22)) during an internal cycle (Fig. 6). As the center-stage switches are FCFS-OQ switches, the following equations parallel the Markov chain for the FCFS-OQ switch (Section 5).

g_{c_i} = \binom{N}{i} \left(\frac{q}{N}\cdot\frac{1}{S}\right)^i \left(1 - \frac{q}{N}\cdot\frac{1}{S}\right)^{N-i}, \quad 0 \le i \le N \quad (23)

where q/(NS), the Demultiplexer Throughput (Eq. (22)) divided by N and scaled by 1/S, is the probability that a cell arrives at a given center-stage switch buffer, and g_{c_i} is the probability that a total of i cells arrive at the same center-stage switch buffer (Fig. 6). If N \ge L_c + 1, the following equations are derived.

P_{c_0}(t+1) = P_{c_0}(t)\,(g_{c_0} + g_{c_1}\,r_c) + P_{c_1}(t)\,g_{c_0}\,r_c \quad (24)

P_{c_j}(t+1) = \sum_{n=0}^{j} P_{c_n}(t)\,g_{c_{j-n}}\,\bar{r}_c + \sum_{n=0}^{j+1} P_{c_n}(t)\,g_{c_{j+1-n}}\,r_c, \quad 1 \le j \le L_c - 1 \quad (25)

P_{c_{L_c}}(t+1) = \sum_{n=0}^{L_c} P_{c_n}(t)\,g_{c_{L_c-n}}\,\bar{r}_c + \sum_{n=0}^{L_c} P_{c_n}(t)\,g_{c_{L_c+1-n}}\,r_c + P_{c_drop}(t) \quad (26)

P_{c_drop}(t+1) = \sum_{n=0}^{L_c} \sum_{i=L_c+1-n}^{N} P_{c_n}(t)\,g_{c_i}\,\bar{r}_c + \sum_{n=0}^{L_c} \sum_{i=L_c+2-n}^{N} P_{c_n}(t)\,g_{c_i}\,r_c \quad (27)

If N < L_c + 1, the following equations are derived.

P_{c_0}(t+1) = P_{c_0}(t)\,(g_{c_0} + g_{c_1}\,r_c) + P_{c_1}(t)\,g_{c_0}\,r_c \quad (28)

P_{c_j}(t+1) = \sum_{n=0}^{j} P_{c_n}(t)\,g_{c_{j-n}}\,\bar{r}_c + \sum_{n=0}^{j+1} P_{c_n}(t)\,g_{c_{j+1-n}}\,r_c, \quad 1 \le j \le N - 1 \quad (29)

P_{c_j}(t+1) = \sum_{n=j-N}^{j} P_{c_n}(t)\,g_{c_{j-n}}\,\bar{r}_c + \sum_{n=j-N+1}^{j+1} P_{c_n}(t)\,g_{c_{j+1-n}}\,r_c, \quad N \le j \le L_c - 1 \quad (30)

P_{c_{L_c}}(t+1) = \sum_{n=L_c-N}^{L_c} P_{c_n}(t)\,g_{c_{L_c-n}}\,\bar{r}_c + \sum_{n=L_c-N+1}^{L_c} P_{c_n}(t)\,g_{c_{L_c+1-n}}\,r_c + P_{c_drop}(t) \quad (31)

P_{c_drop}(t+1) = \sum_{n=L_c-N+1}^{L_c} \sum_{i=L_c+1-n}^{N} P_{c_n}(t)\,g_{c_i}\,\bar{r}_c + \sum_{n=L_c-N+2}^{L_c} \sum_{i=L_c+2-n}^{N} P_{c_n}(t)\,g_{c_i}\,r_c \quad (32)

The throughput, cell delay and drop rate of a center-stage switch buffer follow:

Center Stage Drop Rate(q, S, t, N, K, L_c) = P_{c_drop}(t) (Eq. (32)) \quad (33)

Center Stage Throughput(q, S, t, N, K, L_c) = Demultiplexer Throughput (Eq. (22)) - Center Stage Drop Rate (Eq. (33)) = q - P_{c_drop}(t) \quad (34)

Center Stage Internal Delay(q, S, t, N, K, L_c) = \frac{\sum_{i=1}^{L_c} i\,P_{c_i}(t) + L_c\,P_{c_drop}(t)}{1 - P_{c_0}(t)} \quad (35)

Center Stage External Delay(q, S, t, N, K, L_c) = Center Stage Internal Delay (Eq. (35)) \cdot \frac{K}{S} = \frac{K\left[\sum_{i=1}^{L_c} i\,P_{c_i}(t) + L_c\,P_{c_drop}(t)\right]}{S\,(1 - P_{c_0}(t))} \quad (36)

6.3. Markov chains for the multiplexer

This section presents an analytical performance model for the PIAO multiplexer. The variables are defined as follows (see Fig. 6):

• P_{m_j}(t): Probability that j cells are stored in a multiplexer buffer at network cycle t.
• g_m: Probability that a cell arrives at the same multiplexer buffer.
• r_m: Probability that a cell in a multiplexer buffer successfully moves to the output port.
• N: Size of the PPS switch and size of the multiplexer buffer.

Because the PIAO PPS uses PIAO queues in the multiplexer, it avoids the HOL blocking problem; therefore we assume r_m = 1. In other words, a multiplexer buffer moves at most S cells to the output port during an internal cycle. The following equations are derived from the multiplexer algorithm (Section 4.3); the cell arrival rate of the multiplexer equals the throughput of the center-stage switch (Fig. 6). In the equations below, \bar{g}_m = 1 - g_m and \bar{r}_m = 1 - r_m.

g_m = Center Stage Throughput (Eq. (34)) = q - P_{c_drop}(t) \quad (37)

P_{m_0}(t+1) = P_{m_0}(t)\,(\bar{g}_m + g_m\,r_m) + P_{m_1}(t)\,\bar{g}_m\,r_m \quad (38)

P_{m_j}(t+1) = P_{m_{j-1}}(t)\,g_m\,\bar{r}_m + P_{m_j}(t)\,(g_m\,r_m + \bar{g}_m\,\bar{r}_m) + P_{m_{j+1}}(t)\,\bar{g}_m\,r_m, \quad 1 \le j \le N - 1 \quad (39)

P_{m_N}(t+1) = P_{m_{N-1}}(t)\,g_m\,\bar{r}_m + P_{m_N}(t)\,(g_m\,r_m + \bar{g}_m\,\bar{r}_m) \quad (40)

The performance measures of a multiplexer buffer are then:

Multiplexer Internal Delay(q, S, t, N, K) = \frac{\sum_{i=1}^{N} i\,P_{m_i}(t)}{1 - P_{m_0}(t)} \quad (41)

Multiplexer External Delay(q, S, t, N, K) = Multiplexer Internal Delay (Eq. (41)) \cdot \frac{K}{S} = \frac{K \sum_{i=1}^{N} i\,P_{m_i}(t)}{S\,(1 - P_{m_0}(t))} \quad (42)

According to the proof of Theorem 7 in [3], the drop rate and throughput of a multiplexer buffer are given by

Multiplexer Drop Rate(q, S, t, N, K) = 0 \quad (43)

Multiplexer Throughput(q, S, t, N, K) = Center Stage Throughput (Eq. (34)) - Multiplexer Drop Rate (Eq. (43)) = Center Stage Throughput (Eq. (34)) = q - P_{c_drop}(t) \quad (44)

Because r_m = 1 and g_m \le r_m, no multiplexer buffer overflows under uniform random traffic; thus Multiplexer Drop Rate = 0 (Eq. (43)).
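The multiplexer chain of Eqs. (38)–(40) is a plain birth–death recursion; the following sketch (ours, with bars written as complements) would be called with gm equal to q - P_{c_drop}(t), per Eq. (37):

def mux_distribution(gm, N=32, rm=1.0, iters=5000):
    gmb, rmb = 1.0 - gm, 1.0 - rm        # \bar{g}_m and \bar{r}_m
    Pm = [1.0] + [0.0] * N
    for _ in range(iters):
        Qm = [0.0] * (N + 1)
        Qm[0] = Pm[0] * (gmb + gm * rm) + Pm[1] * gmb * rm         # Eq. (38)
        for j in range(1, N):                                      # Eq. (39)
            Qm[j] = (Pm[j - 1] * gm * rmb
                     + Pm[j] * (gm * rm + gmb * rmb)
                     + Pm[j + 1] * gmb * rm)
        Qm[N] = (Pm[N - 1] * gm * rmb                              # Eq. (40)
                 + Pm[N] * (gm * rm + gmb * rmb))
        Pm = Qm
    return Pm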

6.4. Markov chains for PPS

Finally, the throughput, cell drop rate and cell delay of the entire PPS are obtained:

PPS Drop Rate(q, S, t, N, K, L_c) = Demultiplexer Drop Rate (Eq. (21)) + Center Stage Drop Rate (Eq. (33)) + Multiplexer Drop Rate (Eq. (43)) = P_{c_drop}(t) \quad (45)

PPS Throughput(q, S, t, N, K, L_c) = Multiplexer Throughput (Eq. (44)) = q - P_{c_drop}(t) \quad (46)

PPS Internal Delay(q, S, t, N, K, L_c) = Demultiplexer Internal Delay (Eq. (19)) + Center Stage Internal Delay (Eq. (35)) + Multiplexer Internal Delay (Eq. (41)) = \frac{\sum_{i=1}^{N} i\,P_{d_i}(t)}{1 - P_{d_0}(t)} + \frac{\sum_{i=1}^{L_c} i\,P_{c_i}(t) + L_c\,P_{c_drop}(t)}{1 - P_{c_0}(t)} + \frac{\sum_{i=1}^{N} i\,P_{m_i}(t)}{1 - P_{m_0}(t)} \quad (47)

PPS External Delay(q, S, t, N, K, L_c) = PPS Internal Delay (Eq. (47)) \cdot \frac{K}{S} = \frac{K\sum_{i=1}^{N} i\,P_{d_i}(t)}{S\,(1 - P_{d_0}(t))} + \frac{K\left[\sum_{i=1}^{L_c} i\,P_{c_i}(t) + L_c\,P_{c_drop}(t)\right]}{S\,(1 - P_{c_0}(t))} + \frac{K\sum_{i=1}^{N} i\,P_{m_i}(t)}{S\,(1 - P_{m_0}(t))} \quad (48)

7. Speedup requirements for FCFS-OQ switch emulation

This section describes how a PPS can emulate the performance of an FCFS-OQ switch. We now extend our results to find the speedup required for the PPS to emulate the performance of the FCFS-OQ switch. The section is divided into two subsections: throughput and drop rate emulation of the PPS, and delay emulation of the PPS.

7.1. Throughput and drop rate emulation of the PPS

Because the FCFS-OQ switch is work conserving, its throughput is maximized and the average delay of cells is minimized; consequently, a necessary condition for a switch to emulate the performance of the FCFS-OQ switch is that it be work conserving [3]. If the FCFS-OQ switch drop rate (Eq. (11)) divided by the PPS drop rate (Eq. (45)) equals 1, the drop rate of the PPS emulates that of the FCFS-OQ switch. Assume the FCFS-OQ switch buffer and the center-stage switch buffer have equal size, L = L_c. From Eq. (45) we obtain

\frac{FCFS OQ Switch Drop Rate (Eq. (11))}{PPS Drop Rate (Eq. (45))} = \frac{FCFS OQ Switch Drop Rate (Eq. (11))}{Center Stage Drop Rate (Eq. (33))} = 1

This means that

\frac{P_{drop} (Eq. (10))}{P_{c_drop} (Eq. (32))} = 1

so that

S = 1 \quad (49)

Applying S = 1 and L = L_c gives g_i = g_{c_i} and hence P_j = P_{c_j}, where 0 \le i \le N and 0 \le j \le L; eventually P_{drop} equals P_{c_drop}. Throughput is the arrival rate minus the drop rate. The PPS can therefore emulate the FCFS-OQ switch throughput and drop rate with a speedup of S = 1 and internal link rate R/K.

7.2. Delay emulation of the PPS

By a similar method, we find that the PPS achieves cell delay emulation of the FCFS-OQ switch if the following condition is satisfied. If

\frac{FCFS OQ Switch Delay (Eq. (13))}{PPS External Delay (Eq. (48))} = 1

we have

\frac{FCFS OQ Switch Delay (Eq. (13))}{PPS Internal Delay (Eq. (47)) \cdot K/S} = 1

and we obtain

S = \frac{PPS Internal Delay (Eq. (47)) \cdot K}{FCFS OQ Switch Delay (Eq. (13))} \quad (50)

The PPS Internal Delay (Eq. (47)) can be reduced effectively by increasing the speedup S: the Demultiplexer Internal Delay (Eq. (19)), Center Stage Internal Delay (Eq. (35)) and Multiplexer Internal Delay (Eq. (41)) can each be reduced to the minimum delay value, which is 1. Accordingly, the PPS Internal Delay(q, S, t, N, K, L_c) (Eq. (47)) can be drastically reduced to 3 internal cycles if we increase the speedup to S \ge Demultiplexer Internal Delay(q, 1, t, N, K), S \ge Center Stage Internal Delay(q, 1, t, N, K, L_c) and S \ge Multiplexer Internal Delay(q, 1, t, N, K). Hence we get

S = \frac{3K}{FCFS OQ Switch Delay(q, t, N, K, L) (Eq. (13))} = \frac{3K}{\left[\sum_{i=1}^{L} i\,P_i(t) + L\,P_{drop}(t)\right] / (1 - P_0(t))} = \frac{3K\,(1 - P_0(t))}{\sum_{i=1}^{L} i\,P_i(t) + L\,P_{drop}(t)} \quad (51)

Fig. 9. Comparison of throughput of OQ switch.

From Eq. (51) we observe that: (1) if the number of center-stage switches (K) decreases, or the buffer size of the FCFS-OQ switch (L) increases, the required internal speedup of the PPS (S) decreases; and (2) if the cell arrival rate (q) increases, the FCFS OQ Switch Delay (Eq. (13)) increases and the required internal speedup of the PPS (S) therefore decreases.

8. Comparison and simulation

8.1. Comparison of analytical results

A time-progression method is adopted to calculate the analytical results for the PPS. The method works with Eqs. (1)–(51) as follows. First, initial values are entered into the probability equations: P_0(0) = P_{d_0}(0) = P_{c_0}(0) = P_{m_0}(0) = 1, and P_j(0) = P_L(0) = P_{drop}(0) = P_{d_j}(0) = P_{d_N}(0) = P_{c_j}(0) = P_{c_L}(0) = P_{c_drop}(0) = P_{m_j}(0) = P_{m_N}(0) = 0. The probabilities at the next time step, P_0(1), P_j(1), ..., P_{m_N}(1), are then calculated, and the computation continues until P_0(t), P_j(t), ..., P_{m_N}(t) attain steady-state values. These values are then plugged into the closed forms of the three performance measures to compute the numerical results. Notably, this method can also compute the throughput, drop rate and delay of the FCFS-OQ switch.

The measures for performance evaluation are average throughput, cell loss ratio, and average cell delay. A 32 × 32 FCFS-OQ switch and PPS are constructed; K = 64 is the number of center-stage FCFS-OQ switches, and the queue size L is 16 or 4. Figs. 9–14 present the mathematical analysis and simulation results, labeled OQ (L, x) and PPS (L, x, speedup), where x = S denotes a simulation result and x = M denotes a Markov chain analysis result (the hollow-point curves of Figs. 9–14). The trends in the analytical results are similar to those obtained by simulation.

Figs. 15 and 16 present results for the cell delay measure with speedup = 2, 4, 8 and L = 16, 4.
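A sketch of this time-progression loop (ours; step stands for one application of whichever chain's update equations, e.g. Eqs. (2)–(5) or (16)–(18), and the tolerance is our choice):

def time_progression(step, P0, tol=1e-12, max_iters=1_000_000):
    # Iterate P(t+1) = step(P(t)) until the distribution stops changing;
    # the steady-state values then feed the closed-form measures
    # (throughput, drop rate, delay).
    P = P0
    for t in range(max_iters):
        Q = step(P)
        if max(abs(a - b) for a, b in zip(Q, P)) < tol:
            return Q, t + 1          # steady state and cycles used
        P = Q
    raise RuntimeError("no convergence within max_iters")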

Fig. 10. Comparison of cell delay of OQ switch.

Fig. 11. Comparison of cell drop rate of OQ switch.

The analytical results are in strong agreement with the simulation results; this comparison confirms the accuracy of the performance analysis.


Fig. 12. Comparison of throughput of PPS.

Fig. 15. Comparison of speedup of PPS (L = 16).

Fig. 13. Comparison of cell delay of PPS.

Fig. 16. Comparison of speedup of PPS (L = 4).


Fig. 14. Comparison of cell drop rate of PPS.

Fig. 17. Emulation of throughput of PPS (L = 16).

8.2. Emulation results of PPS

Figs. 17–20 compare the throughput and drop rate of OQ (L, x) and PPS (L, x, speedup = Eq. (49)) when the speedup satisfies Eq. (49). They show that the

PPS precisely emulates the FCFS-OQ switch. As for cell delay, the performance of PPS (L, x, speedup = Eq. (51)) lies significantly close to that of OQ (L, x) (Figs. 21 and 22) when the speedup of the internal link rate satisfies Eq. (51). Eq. (51) indicates that if the cell delay of the PPS is to emulate that of the FCFS-OQ switch, the speedup varies with the cell arrival rate (q). For example, when q = 1.0, K = 64, and FCFS-OQ Switch Delay (Eq. (13)) = 8, then S = 24 (= 3 · 64/8) and the internal line rate is (24/64) · R, where R is the external line rate. When q = 0.9, K = 64 and FCFS-OQ Switch Delay (Eq. (13)) = 5, then S must equal 39 and the internal line rate is (39/64) · R. Figs. 17–22 show that our analysis of FCFS-OQ switch performance emulation achieves remarkable accuracy.


Fig. 18. Emulation of throughput of PPS (L = 4).

Fig. 21. Emulation of delay of PPS (L = 16).

Fig. 22. Emulation of delay of PPS (L = 4).

Fig. 19. Emulation of drop rate of PPS (L = 16).

Fig. 20. Emulation of drop rate of PPS (L = 4).


9. Conclusion



Line rates may soon exceed the speed of commercially available memory, making it difficult to buffer packets as fast as they arrive using traditional packet switches. The advantage of the PPS is that all memory buffers and internal line rates run slower than the external line rate. This paper presents a novel analytical model for evaluating the PPS and shows that a PPS can emulate an FCFS-OQ packet switch while operating slower than the external line rate. The analytical model, which uses approximate Markov chains extending those of our previous papers [1,2], is general in the sense that it allows arbitrary switch size, buffer size, number of center-stage switches and speedup. The chains are shown to be reasonably accurate for realistic network loads. The important findings of this work are that: (1) in terms of throughput and cell


drop rate, a PPS (which is work conserving) can emulate an FCFS-OQ packet switch if each lower speed packet switch operates at a rate of approximately R/K (Eq. (49)); and (2) in terms of cell delay, a PPS can emulate an FCFS-OQ packet switch when each lower speed packet switch operates at a rate of approximately (3R/cell delay of the FCFS-OQ switch) (Eq. (51)). These findings are more accurate than previous ones [3]. A new architecture for a PPS based on PIAO queues in the multiplexers was also developed and evaluated; the PIAO PPS uses fully decentralized scheduling to implement per-cell or per-packet distribution across the center-stage switches.

Acknowledgments

The author is grateful to Prof. Lin of National Chung-Hsing University, who offered very useful advice; the author thanks Prof. Lin for his help and encouragement.

References

[1] Chia-Lung Liu, Woei Lin, Performance analysis of the sliding-window parallel packet switch, in: IEEE 40th International Conference on Communications, ICC 2005, May 16–20, 2005.
[2] Chia-Lung Liu, Woei Lin, Evaluation and analysis of the sliding-window parallel packet switch, in: IEEE 19th International Conference on Advanced Information Networking and Applications, AINA, vol. 2, 2005, pp. 355–358.
[3] S. Iyer, Analysis of the parallel packet switch architecture, IEEE Transactions on Networking 11 (2) (2003) 314–324.
[4] S. Iyer, On the speedup required for a multicast parallel packet switch, IEEE Communications Letters 5 (6) (2001) 269–271.
[5] D.A. Khotimsky, S. Krishnan, Stability analysis of a parallel packet switch with bufferless input demultiplexers, ICC 2001 1 (2001) 100–111.
[6] S. Iyer, Making parallel packet switches practical, Proceedings of Infocom 2001 3 (2001) 1680–1687.
[7] A. Aslam, K. Christensen, Parallel packet switching using multiplexers with virtual input queues, LCN 2002 (2002) 270–277.
[8] Yuguo Dong, Zupeng Li, Xiaodong Liu, Yunfei Guo, Analysis and designing of the stable parallel packet switch, PDCAT 2003 (2003) 296–300.
[9] Qi Wangdong, Tian Chang, Chen Hua, Xu Bo, Reexamining the stability of a parallel packet switch with bufferless input demultiplexors, ICCT 2003 1 (2003) 514–520.
[10] Yuguo Dong, Xiaodong Liu, Zupeng Li, Yunfei Guo, A novel traffic dispatch algorithm for the parallel packet switch, ICCT 2003 1 (2003) 417–420.
[11] Wenjie Li, Yiping Gong, Bin Liu, Analysis of a QoS-based parallel packet switch for core routers, ICCT 2003 1 (2003) 243–246.
[12] W. Wang, L. Dong, W. Wolf, A distributed switch architecture with dynamic load-balancing and parallel input-queued crossbars for terabit switch fabrics, in: Proceedings of IEEE INFOCOM 2002, vol. 1, 2002, pp. 352–361.


[13] T. Szymanski, S. Shaikh, Markov chain analysis of packet-switched banyans with arbitrary switch sizes, queue sizes, link multiplicities and speedups, in: Infocom 1989, vol. 3, 1989, pp. 960–971.
[14] Chan L. Liao, Woei Lin, Performance analysis of general cut-through switching on buffered MIN switches, Journal of Information Science and Engineering 18 (2002) 745–762.
[15] C. Clos, A study of nonblocking switching networks, Bell System Technical Journal 32 (1953).
[16] S. Chuang, A. Goel, N. McKeown, B. Prabhakar, Matching output queueing with a combined input/output-queued switch, IEEE Journal on Selected Areas in Communications 17 (1999) 1030–1039.
[17] M. Yang, S.Q. Zheng, An efficient scheduling algorithm for CIOQ switches with space-division multiplexing expansion, Infocom 2003 3 (2003) 1643–1650.
[18] J. Hui, E. Arthurs, A broadband packet switch for integrated transport, IEEE Journal on Selected Areas in Communications 5 (8) (1987) 1264–1273.
[19] M. Karol, M. Hluchyj, S. Morgan, Input versus output queueing on a space-division packet switch, IEEE Transactions on Communications 35 (12) (1987) 1347–1356.

Chia-Lung Liu received the B.S. degree in Computer Science from Tamkang University in 2001. He is currently working toward the Ph.D. degree in Computer Science at National Chung-Hsing University. His research interests include computer networks, switching systems and interconnection networks.

Chin-Chi Wu received the B.S. and M.S. degrees from the Department of Computer Science, National Chiao-Tung University, in 1990 and 1992, respectively. He is working toward the Ph.D. degree in the Institute of Computer Science, National Chung-Hsing University. He is currently a lecturer with the Department of Information Management, Nan-Kai Institute of Technology, Nantou, Taiwan. His research interests include computer networks, parallel processing, and high-speed switching.

Woei Lin received the B.S. degree from National Chiao-Tung University, Hsin-Chu, Taiwan, in 1978, and the M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Texas at Austin, USA, in 1982 and 1985, respectively. He is now a Professor in the Institute of Computer Science, National Chung-Hsing University (1992–2005). His research interests include network switching systems, network QoS, system performance evaluation and parallel/distributed systems.