J. Parallel Distrib. Comput. 65 (2005) 191–206
www.elsevier.com/locate/jpdc

TYPHOON: mobile distributed hash tables

Hung-Chang Hsiao (a,*), Chung-Ta King (b), Chia-Wei Wang (b)

a Computer and Communication Research Center, National Tsing-Hua University, Hsinchu 300, Taiwan
b Department of Computer Science, National Tsing-Hua University, Hsinchu 300, Taiwan

Received 10 September 2004; available online 23 November 2004

* Corresponding author. Fax: +886 3 5723694.
E-mail addresses: [email protected] (H.-C. Hsiao), [email protected] (C.-T. King).

0743-7315/$ - see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2004.09.012

Abstract

TYPHOON is a capability-aware peer-to-peer (P2P) system. It exploits the heterogeneity of the nodes in the system based on the concept of virtual homes. Nodes participating in the system are classified as good or inactive. TYPHOON uses the resources provided by good peers and is thus more reliable and agile than a naive structured P2P system. When a good peer is overloaded, it picks a suitable inactive node and migrates some of its load (i.e., virtual homes) to that node. However, the migration of virtual homes may cause instability in the system. TYPHOON therefore incorporates a mechanism for tracking virtual homes. Interested homes can receive the state of a migrated home through an adaptive, logical tree structure that reacts to system heterogeneity, node load and network locality. A home can also proactively discover the state of a home it is interested in, ensuring the correctness of lookups. We evaluate TYPHOON with theoretical analysis and simulation, and benchmark it with a prototype system on 34 desktop PCs. The results all confirm the effectiveness of TYPHOON.
© 2004 Elsevier Inc. All rights reserved.

Keywords: Peer-to-peer system; Lookup; Capability awareness; Location management; Performance evaluation

1. Introduction

A peer-to-peer (P2P) system typically views the participating nodes as equals [20]. This view is especially emphasized in P2P systems that adopt distributed hashing [3,14,15,20,24]. Peers in such systems pick a random ID through a uniform hash function, and all operations in the system are based on that ID. For example, the peer nodes may be interconnected into a logical structure, e.g., a ring or hypercube, using the ID. The resultant system is called a structured P2P system. By using uniform hashing, such systems treat no peer as special. This extra level of abstraction allows P2P systems to better accommodate underlying heterogeneity and dynamics. In the real world, however, heterogeneity and dynamics are facts of life. Explicitly dealing with these issues can bring benefits such as higher performance, stable quality, and robustness.


For example, Tornado [8], a P2P system based on distributed hash tables (DHTs), exploits the heterogeneity and dynamics in node capabilities. The capability of a node refers generally to the amount of resources, the network bandwidth, the reliability, or any combination of these and other factors of the node; a precise definition is application dependent. Tornado tackles capability heterogeneity with two tactics. The first is the virtual home concept. A virtual home is a placeholder for data items and plays the same role as a peer in a traditional P2P system. It adds an extra level of abstraction between data items and physical nodes to accommodate capability heterogeneity. The second tactic is a classification of the physical nodes into good and inactive according to their capabilities. Good peers can host different numbers of virtual homes according to their capabilities. In this way, Tornado can leverage the reliability and performance of the good peers and ensure a certain level of service quality. When a good peer becomes overloaded, Tornado provides a load migration mechanism to migrate load to a suitable inactive peer. The inactive peer is thus promoted to become a good peer.


A potential use of Tornado is to support storage overlays, information retrieval, or media streaming with incentives. In such applications, a node can contribute more resources to increase its reputation in order to serve upcoming operations. For example, before a node can download a video file of size 650 Mbytes, it needs to contribute more than 650 Mbytes of storage space to the P2P community. After the file is downloaded, the node may reduce its contributed resources. This dynamic adjustment of resource contribution is easily handled in Tornado by varying the number of hosted virtual homes. Homes may be migrated among good peers during resource adjustment. Another possible use of Tornado is to support load balancing [13] through the migration of virtual homes, with the number of virtual homes serving as an indicator of node workload.

One problem with Tornado is its use of the soft state technique [10,11,22] to maintain system states. Note that this is common practice in DHT-based overlays [14,15,20,24]. With soft states, the state changes due to the migration of virtual homes may be propagated too slowly. In Tornado, this works as follows. A home normally sends a keep-alive message to each of its neighbors every Trt = 30 s. If there is no acknowledgement from a neighbor, the home issues an additional l = 3 keep-alive messages every Tout = 5 s to that neighbor. If there is still no acknowledgement, the home considers that neighbor failed or migrated and tries to repair the state. This can take Trt + l · Tout = 45 s before the state change is identified and confirmed. Noting further that the state of a home appears in the routing tables of many other virtual homes, the slow update of states may reduce the stability and performance of the system. A careful choice of the parameters Trt, Tout and l is thus crucial (see Section 3). The problem becomes even more complicated if dynamic network traffic and node join/departure rates are taken into account [11].
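As a minimal sketch of this detection latency under the soft-state scheme (the constants follow the text above; the class and method names are ours, not Tornado's):

    // Worst-case time for a home to confirm that a neighbor has failed or
    // migrated under the soft-state keep-alive scheme described above.
    public class SoftStateLatency {
        static final int T_RT = 30;  // seconds between regular keep-alive rounds
        static final int L = 3;      // extra probes after a missed acknowledgement
        static final int T_OUT = 5;  // seconds between the extra probes

        // A migration right after an acknowledged keep-alive goes unnoticed for
        // a full T_RT; then l more probes must time out before repair starts.
        static int worstCaseDetectionSeconds() {
            return T_RT + L * T_OUT; // 30 + 3 * 5 = 45 s
        }

        public static void main(String[] args) {
            System.out.println("Detection takes up to "
                    + worstCaseDetectionSeconds() + " s");
        }
    }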

In light of the design of capability-aware structured P2P systems such as Tornado, we propose in this paper a more robust capability-aware P2P system called TYPHOON. TYPHOON tracks the locations of homes proactively instead of relying on the soft state approach. Since migrated homes are tracked, they remain readily reachable and do not degrade the reliability and performance of the system. TYPHOON is evaluated with analysis and simulation; the results show the effectiveness of its design. We have also prototyped TYPHOON and conducted real measurements on a 34-node cluster. The experiments reaffirm the results obtained from analysis and simulation. TYPHOON has the following features:

• It addresses the state management issue in capability-aware P2P systems that allow virtual homes to migrate. It actively tracks the location of virtual homes to maintain up-to-date states in the system. This improves system reliability and performance.
• The overhead of actively tracking system states is kept low through a number of optimization techniques. A location advertisement structure is used for propagating the state of a home. This structure not only takes advantage of node heterogeneity but also restructures itself according to the load of the participating members. It also exploits network locality to reduce communication delays and excessive state replication.
• The design of TYPHOON is independent of the parameters Trt, Tout and l. TYPHOON is thus not affected by factors such as dynamic traffic and node join/departure rates. In contrast, these factors affect the choice of the optimal Trt, Tout and l in naive DHT-based P2P overlays.

The remainder of the paper is organized as follows. Section 2 gives an overview of Tornado, focusing especially on virtual homes and home migration. The design of TYPHOON is described in Section 3. Section 4 presents performance evaluation results for TYPHOON. Related work is surveyed in Section 5. Section 6 concludes the paper and outlines possible future research directions.

2. Overview of Tornado

In this section, we give an overview of Tornado, focusing on the virtual home concept and the home migration protocol.

2.1. Virtual home

Each data item in Tornado has a virtual home, a logical entity that stores and manages the data item. The virtual home represents a placeholder for the data item, where the data can be found. It plays a role similar to a peer in a traditional structured P2P system and may contain several data items. A physical node in Tornado can host different numbers of virtual homes. If a node does not host any virtual home, it is inactive; otherwise, it is a good peer. The hosting node should provide the physical resources, e.g., CPU cycles and storage, required by its virtual homes. A peer that intends to participate in Tornado is initially inactive.

A recent study by Sen et al. [18] has the following findings. (1) Less than 10% of the nodes contribute 99% of the traffic; those nodes may be regarded as supernodes in KaZaA [9]. (2) Supernodes are rather durable, with 20% of the participating nodes having an on-time of at least 1000 min/day. This is somewhat counter-intuitive, given the conventional notion that P2P systems are very dynamic. (3) Around 60% of the nodes stay alive for less than 10 min. Therefore, to cope with system heterogeneity and to take account of machine capabilities, good nodes should be chosen from among those nodes that have plentiful resources, are reliable, and have access to a reliable, high-speed network. To leverage the reliability and performance of good nodes, Tornado ensures that each active node in the system is a good peer. If a good peer becomes overloaded, it seeks a suitable inactive peer and migrates some virtual homes to that node.


2.2. Naming

Similar to previous works [14,15,20,24], Tornado names objects in the system with hash keys obtained from a uniform hash function. Each data item receives a unique hash key, and each virtual home also has a hash key in the same hash space. A data item is stored and managed by the virtual home with the numerically closest key. If the state of the P2P system is stable, all virtual homes are allocated almost the same number of data items due to the use of the uniform hash function. To access a data item, a request can be sent to the home whose key is numerically closest to the key of the requested data item, where the item can be found.

The uniform hash scheme helps distribute the data items evenly over the virtual homes. Thus, the load of a good node can be estimated from the number of virtual homes it hosts, and the capability of a node in Tornado is defined as the number of virtual homes that it can host. Note, however, that data items may differ in size. In Tornado, a data item is therefore partitioned into several equal-sized segments, and different segments are represented by different hash keys when they are published into the system. The management of segments is left to applications; in this study, a data item consists of a single segment.

2.3. Lookup of virtual homes

Tornado performs message routing at the virtual-home layer. Consider a virtual home with hash key x. It keeps a list of other virtual homes that it knows how to send messages to. The first home in the list, referred to as the level-1 leader, has a hash key l1 that is smaller than or equal to u1 = (x + R/2) mod R, where R is the maximum key value that can be assigned to a virtual home. The second virtual home has a hash key l2 that is smaller than or equal to u2 = (x + R/4) mod R, and so on. This list of virtual homes is kept in a table called the routing table of the virtual home.

Now suppose virtual home x wants to send a message to another virtual home with hash key v, where v > (x + R/2) mod R. Since x does not keep any information about the hash value v in its routing table, it has no idea how to send the message towards the virtual home hosting v. What it can do is forward the message to the leader whose hash key is the closest to but smaller than v, i.e., the level-1 leader l1 in its routing table. Virtual home x knows how to send messages to l1, so the message is sent to l1. Upon receiving the message, l1 calculates v − l1 and checks its own routing table to determine which of its own leaders should forward the message. In this way, the message is routed closer and closer to the virtual home hosting v and finally reaches it. Tornado guarantees that a request message visits around (1/2) · O(log R) homes to reach the destination. If n is the number of nodes in the system, (1/2) · O(log R) = (1/2) · O(log n).
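The following is a minimal sketch of this greedy next-hop selection (the key-space size R, the class names and the distance helper are our own illustrative choices, not code from Tornado):

    // Greedy key-space routing: forward toward target v via the known
    // leader whose hash key is closest to v without overshooting it.
    import java.util.List;

    class RoutingEntry {
        long key;          // hash key of the leader home
        String address;    // network address of the hosting good peer
        RoutingEntry(long key, String address) { this.key = key; this.address = address; }
    }

    class GreedyRouter {
        static final long R = 1L << 32;   // maximum key value (an assumption)
        final long myKey;
        final List<RoutingEntry> leaders; // entry i covers keys up to (myKey + R/2^i) mod R

        GreedyRouter(long myKey, List<RoutingEntry> leaders) {
            this.myKey = myKey;
            this.leaders = leaders;
        }

        // Clockwise distance from a to b in the circular key space.
        static long distance(long a, long b) { return Math.floorMod(b - a, R); }

        // Pick the leader minimizing distance(leader, v): closest to v from
        // below, so a leader just past v is treated as maximally far away.
        RoutingEntry nextHop(long v) {
            RoutingEntry best = null;
            long bestDist = Long.MAX_VALUE;
            for (RoutingEntry e : leaders) {
                long d = distance(e.key, v);
                if (d < bestDist) { bestDist = d; best = e; }
            }
            return best; // each hop roughly halves the remaining distance
        }
    }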


The routing protocol used by Tornado is inspired by the m-ary search tree data structure [8]. In particular, if k = 2, Tornado implements a Chord-like routing algorithm. However, Tornado differs from Chord in (1) the techniques used to exploit network locality, and (2) the consideration of node heterogeneity. Note that the routing algorithm in the virtual layer can be replaced by other routing protocols, such as CAN [14], Chord [20], Pastry [15] or Tapestry [24]; the results remain valid.

2.4. Migration of virtual homes

When a node x joins Tornado, it is initially assumed to be inactive. A virtual home is created for x and is allocated to a good node (say y) that hosts the home with the key numerically closest to the created virtual home. The home created for x can be viewed as a delegate of x. Meanwhile, x registers its capability value in its home stored at y. Now consider the case where y is overloaded with excessive virtual homes. It will activate an inactive peer whose home was created in y to share its load. Since y collects the capability values of those inactive peers whose homes are in y, it can choose the most capable among them to share its load. The selected peer then becomes a good peer that can accept the creation of virtual homes for joining nodes.

When a virtual home is migrated to a new good peer, it can present its up-to-date state, i.e., the location of the good peer where it resides, using a "rejoining" operation. A home created in a good peer is associated with a time-to-live (TTL) value (i.e., the Tout mentioned in Section 1). When the TTL of a home expires, it immediately performs a rejoining operation to refresh its routing table. This also lets the home present its up-to-date location to the system.

3. TYPHOON

As far as virtual home migration is concerned, the most relevant state information of a virtual home in Tornado is the network address of the good peer where the home currently resides. This information is often associated with the hash ID of the home so that it can be retrieved from the routing table. In this paper, we refer to the state of a home as the hash ID of the home together with the network address of the hosting good peer. When a home first joins the P2P system, its state is scattered to several other homes, which store that state information in their routing tables. This allows the virtual homes to be interconnected into a logical structure [8,14,15,20,24]. Now, if a home is migrated to a new good peer, its state changes. This introduces state inconsistency and may degrade system reliability and performance.

TYPHOON is a more robust capability-aware P2P system. It tracks the locations of homes proactively instead of relying on the soft state approach. Since migrated homes are tracked, they remain readily reachable and do not degrade the reliability and performance of the system.


Fig. 1. The message routing and node (re)joining algorithms in TYPHOON, where distance(x, y) denotes the difference of the hash values x and y.

Fig. 2. The data structure of the routing table of a home: each entry RT[i] holds up to k states, each comprising a hash value, IP, port, TTL and coordinate vector.

We use two metrics to study the theoretical performance bounds of TYPHOON: the number of states that a home may publish to the system, and the number of hops required to send a message. These two metrics help us analyze the timing and space overheads of TYPHOON (see Section 4). Before presenting the performance results, we first describe how TYPHOON performs message routing and node joins.

3.1. Basic operations and motivation

Fig. 1 shows the node joining algorithm in TYPHOON; the node rejoining algorithm is similar. As described in Section 2, when a node joins the system, a virtual home x is created to represent it. It consults an active home y to help it join the system. Home y then issues at most ⌈log(R/2)⌉ messages to ⌈log(R/2)⌉ homes. The ith message is sent to the home whose hash key is closest to (x + R/2^i) mod R by invoking Typhoon_Route. Each intermediate home z tries to reduce the distance from z to (x + R/2^i) mod R as much as possible. When each of those homes receives the joining message, it sends its entire routing table to x. In this way, x collects ⌈log(R/2)⌉ routing tables from ⌈log(R/2)⌉ homes. It then determines which states maintained in the collected routing tables should be put into its own routing table.

The ith entry of x's routing table (denoted by RT[i]) maintains a number of states of homes whose hash keys are smaller than or equal to x + R/2^i. The structure of a routing table is depicted in Fig. 2. Note that x collects the states of at most k homes for each RT[i]. These k homes are chosen so that they are geographically close to x.

To estimate whether two nodes are geographically close, a node joining the system needs to determine its coordinate in the global network. A number of Internet map services exist, such as GNP [12]; we assume that such a service is available to TYPHOON. A node can thus determine its virtual coordinate by consulting the map service. If two nodes are close, their Euclidean distance in the Internet map is small. The joining home x can thus measure distances according to the virtual coordinates of the homes in the collected routing tables. We note that the coordinate of a home is the coordinate of the good node that hosts it. The advantages of relying on coordinates to represent the position of a node in the network are twofold. (1) A node needs to consult the Internet map service only when it joins the system. (2) The home representing this node can simply look through the coordinates in the collected routing tables to choose the geographically close homes; no traffic or latency is introduced for measuring the distance between any two homes.

The effects of network locality on the performance of TYPHOON can be examined as follows. If TYPHOON does not consider network locality, the RT[i] entry of a home x maintains the k other homes with hash keys closest to (x + R/2^i) mod R. The parameter k affects the number of states of a home published in the system, the number of hops to deliver a message, and the delay penalty of sending a message. Figs. 3 and 4 show TYPHOON without and with exploiting network locality, respectively; the detailed experimental setting is described in Section 4. Figs. 3(a) and 4(a) illustrate the average number of states published by a home as the system size increases up to 10,000 peers. Since the number of states stored in the routing table of a home is k · log(R/2), the total number of states that the system can accommodate is n · k · log(R/2), where n is the total number of nodes in the system, which can further be represented as n · k · log(n/2). In the worst case, the state of a home is thus published to (n · k · log(n/2))/n = k · log(n/2) nodes. The simulation results confirm that the number of states published by a home is bounded by O(k · log(n/2)). Note that in Fig. 3(a), the average number of states published by a home is 107 when the system size is 10,000 peers and k = 12.
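As a small illustrative sketch of this coordinate-based proximity selection (the types and method names are our own; the paper does not show TYPHOON's actual code):

    // Choose, among the candidate homes for one routing-table entry RT[i],
    // the k whose hosting peers are geographically closest to the joining
    // node, using Euclidean distance between virtual coordinates (e.g., GNP).
    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    class HomeState {
        long hashKey;
        String address;
        double[] coord; // virtual coordinate of the hosting good peer
    }

    class ProximitySelector {
        static double distance(double[] a, double[] b) {
            double sum = 0;
            for (int d = 0; d < a.length; d++) {
                double diff = a[d] - b[d];
                sum += diff * diff;
            }
            return Math.sqrt(sum);
        }

        // No probing traffic is needed: distances are computed purely from
        // the coordinates carried in the collected routing tables.
        static List<HomeState> pickClosest(double[] myCoord,
                                           List<HomeState> candidates, int k) {
            return candidates.stream()
                    .sorted(Comparator.comparingDouble(h -> distance(myCoord, h.coord)))
                    .limit(k)
                    .collect(Collectors.toList());
        }
    }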

Fig. 3. A P2P system without exploiting network locality, where (a) the average number of states published per home with the system size from 1 to 10,000 nodes, (b) the average number of hops required, and (c) the relative delay penalty of sending a message.

Now, when a home is migrated to another good node, these states become invalid, and any message that relies on them will fail. The worst case occurs if all homes in the system are simultaneously moved to new locations. The number of invalid states in the system is then 10,000 · 107 = 1.07 · 10^6. The ratio of invalid states in the system is given by 1.07 · 10^6 / ((12 · log(10,000/2)) · 10,000), which can be up to 73%.

Figs. 3(b) and 4(b) show the average number of hops required for delivering a message. If TYPHOON does not take advantage of network locality (Fig. 3(b)), the average number of hops required is bounded by (1/2) · O(log n). Fig. 4(b) presents similar results. However, when k = 1, the number of hops taken is slightly greater than O(log n). This means that exploiting network locality may lengthen the path of a message, although it improves the routing performance (see Figs. 3(c) and 4(c)). We thus estimate the average number of hops to send a message as O(log n).

Fig. 5 shows the effect of invalid states on the success ratio, number of hops and delay penalty of delivering a message in TYPHOON without exploiting network locality. We assume that a home is migrated with probability 1/2, i.e., around 50% of the homes are migrated, and that there are no periodic updates (we also investigate the effect of the percentage of migrated homes in Section 4). The results show that invalid states degrade system reliability and performance. For example, when the number of nodes is 10,000 and k = 12, the success ratio of sending a message is reduced by 15%. (Tornado itself would not degrade, because the number of paths between two Tornado homes is (log n)! [8].) As for performance, the hop count increases by 18% and the delay penalty is lengthened by 9.6%. This is because the experiments do not consider the recovery time when a node fails and an alternate route must be found for the message. For example, according to [11], identifying an invalid state in a routing table takes two consecutive timeouts Tout and a probe per Trt, where Tout = 3 s and Trt = 120 s.

TYPHOON is an enhanced structured P2P infrastructure based on the virtual home concept. It tracks the location of each home in order to maintain up-to-date state for the entire system (Section 3.2). TYPHOON also incorporates a simple load migration algorithm to share load according to the capability of each peer node (Section 3.4). We defer the discussion of the lookup of virtual homes to Section 3.3.


Fig. 4. A network locality-aware P2P system, where (a) the average number of states published per home with the system size from 1 to 10,000 nodes, (b) the average number of hops required, and (c) the relative delay penalty of sending a message.

3.2. Tracking virtual homes

3.2.1. Overview

TYPHOON supports four operations for the location management of virtual homes: register, update, join and leave. A home x that stores the state of another home y in its routing table registers its interest in y with y. When y migrates to another good peer node, it updates x with its new network address (i.e., the address of the good peer to which y has migrated). TYPHOON organizes the homes interested in a given home y into a location advertisement tree (denoted LAT). The tree supports multicast communication for state advertisement, allowing y to send an address update message to all the homes in the tree. A LAT dynamically changes its structure according to the load of the peers where the participating homes reside (Section 3.2.2).

In addition to disseminating its new network address to the interested peers, y also informs a home in the persistent layer of its change of location. The persistent layer comprises a set of homes that do not change their locations. This ensures that, with high probability, a home can discover the address of a mobile home by consulting a home in the persistent layer. We discuss the manipulation of the persistent layer in Section 3.2.3.

3.2.2. Register and update

A home x sets up and maintains a list of states in two situations: (1) x newly joins TYPHOON (i.e., the peer node represented by x newly joins the system) and collects a set of states from other homes to construct its own state list; (2) x becomes an active home and receives state refreshments advertised by other homes. In either case, x registers itself with the homes whose states are replicated in x. These homes are called registry homes. In the meantime, x also reports the capacity (denoted Cx) of its hosting node to these homes. Note that this hosting node must be a good peer. Scheduling location updates to the registry homes is based on the capacity values collected by a home. Note also that the capacity value presently used by TYPHOON is the maximum number of connections a peer can accommodate.

Fig. 6 illustrates the location update algorithm for a home i to propagate its new network address to the set of homes registered with it (denoted R(i)).

Fig. 5. The effect of invalid states for k = 12, where (a) the success ratio, (b) the number of hops and (c) the relative delay penalty of delivering a message.

The idea is as follows. Home i first estimates the remaining capacity of its hosting peer. If the hosting peer is overloaded, home i updates its new state only to the registry home residing at the geographically closest peer node (estimated by δ(t, i)); that registry home then performs the state advertisement to the other registry homes on behalf of home i. Otherwise, according to the available capacity (denoted Avail_i) of the hosting node, home i advertises its state to the k registry homes residing at the geographically closest peers, such that k · v ≤ Avail_i ≤ (k + 1) · v, where v is the unit cost of sending an update message. The parameter v is usually 2, because two connections are needed for sending the location update and receiving the update acknowledgement. Note that, in addition to the new network address, each of the k homes also receives a disjoint subset of the homes registered with home i.

A LAT constructed by the algorithm shown in Fig. 6 has the following features. (1) It exploits network locality by estimating the geographical closeness of sending and receiving nodes from their coordinates. (2) It is dynamically structured based on the workload of the participating nodes. The workload depends on the consumption of a node's local resources, such as the network bandwidth and the memory used by other processes.

(3) To prevent a LAT from skewing, a tree node tends to advertise states to the registry nodes evenly. This greatly reduces the height of a LAT.

Fig. 6. The state advertisement algorithm, where Used_i denotes the present workload of home i, partition(k) ⊆ list(i) for 1 ≤ k ≤ Avail_i/v, ∪_k partition(k) = list(i), and δ(t, i) denotes the estimated network delay between homes t and i according to their network coordinates.
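A minimal sketch of this capacity-driven fan-out selection (the data types and names are illustrative; Fig. 6 gives the paper's actual algorithm):

    // Fan-out selection for advertising a new address over a LAT: spend the
    // host's available connection capacity on the geographically closest
    // registry homes; each contacted registry would also receive a disjoint
    // partition of the remaining registries to advertise to on our behalf.
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    class Registry {
        String address;
        double[] coord;
    }

    class LatAdvertiser {
        static final int V = 2; // connections per update (send + acknowledgement)

        static List<Registry> chooseFanOut(double[] myCoord, List<Registry> registered,
                                           int availCapacity, boolean overloaded) {
            List<Registry> byDistance = new ArrayList<>(registered);
            byDistance.sort(Comparator.comparingDouble(r -> euclidean(myCoord, r.coord)));
            // Overloaded host: delegate the whole advertisement to the closest registry.
            int fanOut = overloaded ? 1 : Math.max(1, availCapacity / V);
            return byDistance.subList(0, Math.min(fanOut, byDistance.size()));
        }

        static double euclidean(double[] a, double[] b) {
            double s = 0;
            for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
            return Math.sqrt(s);
        }
    }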


3.2.3. State repository

In addition to advertising its state to the registry homes, a home also publishes its state to the persistent layer. Since a home may not receive the state of the home it is interested in through the associated LAT, the persistent layer provides a last resort. The persistent layer consists of a set of homes that provide a unified storage space for states representing migrated homes. The persistent layer must not contain homes that may be migrated; otherwise, queries may not be resolved.

Recall that Tornado initially provides at least one good peer (say a) whose home is created locally. (To operate Tornado and TYPHOON, the system administrator initially needs to set up a robust and capable peer.) When a peer (say b) intends to join Tornado, it contacts a to create b's home in the system. As mentioned in Section 2.1, b is initially assumed to be inactive. The home representing b is thus created in a. When a is overloaded, it chooses a peer (say c) whose home is hosted by a and which has the highest capacity to share the load of a. Node a then migrates some homes to c. Note that the migrated homes must not include the home of a, but must contain the home representing c. Then c immediately joins the persistent layer. Tornado thus evolves incrementally, comprising only good peer nodes. These good peers not only host their own homes locally, but also provide storage space for homes representing peers that have not yet been activated.

Assume that the set of good peers participating in Tornado is A = {a_1, a_2, a_3, ..., a_k}. The set of homes hosted by a_i ∈ A is {h_i^0, h_i^1, h_i^2, ..., h_i^j}, where a_i's home is h_i^0. TYPHOON constructs the persistent layer over the set {h_1^0, h_2^0, ..., h_k^0}. This is accomplished as follows. When a peer x is invoked to share the load of a good peer y, y first migrates at least the home of x to x. The home representing x then issues a joining message to a home in the persistent layer. The message helps x's home join the persistent layer and informs other homes of that home. These operations are similar to the peer joining operations used by P2P routing protocols such as CAN, Chord, Pastry and Tapestry. Presently, the persistent layer in TYPHOON adopts the routing protocol provided by Tornado (see Section 2.3).

3.2.4. State discovery

In TYPHOON, each state stored in a peer node is associated with a time-to-live (TTL) value, which indicates the valid lifetime of the state. Once the lease of a state expires, the state is no longer valid. Based on this lease concept, TYPHOON uses the following two state binding approaches.


• Early binding: Each home periodically publishes its state to its registry homes using the associated LAT. Meanwhile, each registry home also periodically registers itself with the home it is interested in.
• Late binding: It is possible that a registry home does not receive the state periodically advertised by a migrated home, because the registry node is itself migrating or the transmission of the advertised state fails. The registry node can then issue a discovery message to the persistent layer to resolve the network address of the migrated home. Note that each home knows a universal home in the persistent layer, provided by the initial good peer (see Section 3.2.3). A home can thus query the persistent layer by sending its request to the universal home. To prevent the universal home from becoming a performance bottleneck and single point of failure, each home dynamically learns of homes that have joined the persistent layer from the homes visited by its query requests. A home can then send its queries to the persistent layer via any of the homes it knows so far, relieving the load on the universal home.

3.2.5. Join and leave

When a node joins or leaves the system, the overlay structure maintained by the virtual homes changes. Fig. 7 presents the algorithm for a home i to join TYPHOON. Its representative home publishes its state to certain homes, and these homes return their registrations to i. The newly created home i collects the states maintained by each visited home and determines whether a collected state (i.e., r in Fig. 7) should be included in the set of states it maintains. This depends on the closeness of the keys of r and i, i.e., distance(r, i), and the network distance between r and i, i.e., δ(r, i). Maintaining states that represent nearby homes allows the overlay to exploit network locality; consequently, a home can forward a message to a geographically close node in the next hop.

TYPHOON does not explicitly handle the departure of nodes. Instead, each home in a node periodically rejoins the system using its original hash value, through an algorithm similar to the one shown in Fig. 7 (see the sketch below). A home that periodically rejoins the system collects appropriate states reflecting the dynamics of the system. This not only allows a home to link to geographically close homes as neighbors, but also registers its interest with the homes it is interested in. Clearly, if a node hosts several homes locally, the node must be a good node and can thus accommodate the large number of connections required by its locally hosted homes.
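As a minimal sketch of this periodic rejoin (the scheduling interval and names are our own assumptions; the paper does not prescribe a timer implementation):

    // Periodically rejoin the overlay with the home's original hash value so
    // that its routing table, locality links and registrations stay fresh.
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    class RejoinScheduler {
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();

        // 'rejoin' would run the joining algorithm of Fig. 7 for the local
        // home, using the home's original hash value.
        void start(Runnable rejoin, long periodSeconds) {
            timer.scheduleAtFixedRate(rejoin, periodSeconds, periodSeconds,
                    TimeUnit.SECONDS);
        }

        void stop() {
            timer.shutdownNow();
        }
    }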


If a good node fails, several homes may be lost simultaneously, including homes that participate in the persistent layer. Note that the probability of a good node failing should be very small, since a good node does not leave the system arbitrarily [18]. If a good peer nevertheless fails, TYPHOON can adopt a replication mechanism similar to CFS [4] and PAST [16] for high data availability.

Fig. 7. A home joining operation.


Fig. 8. The lookup algorithm, in which home i forwards a packet towards the home whose hash key is closest to j.

3.3. Lookup

As mentioned above, in TYPHOON a home's state is represented as a tuple of its hash ID and its network address. The network address field may be invalid if the address cannot be resolved. Suppose a home x forwards a message towards another home with hash key k and finds the state pair <k, invalid> in its local routing table. It then seeks the help of the persistent layer by sending a discovery message with the hash key k to a home, say z, that can resolve the network address of k. Note that x detects the invalidity of k if k does not acknowledge the message forwarded by x. Once z determines the network address of k, say a, it forwards the message to k. At the same time, z replies with the resolved network address to x, which then updates its local state pair from <k, invalid> to <k, a>.

Fig. 8 shows the lookup operations that forward a message from the home with hash key i towards the home whose hash key is closest to the target. Note that different DHT algorithms have different definitions of "closeness" in the hash space. In addition, a discovery message is only routed within the persistent layer.
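A compact sketch of this lookup-with-discovery fallback (transport and discovery are stubbed; the method names are ours, not TYPHOON's API):

    // Forward a message toward target key k; on a stale routing entry, fall
    // back to the persistent layer to resolve the current address (late
    // binding), then cache the resolved address in the local routing table.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class Lookup {
        // Local routing state: hash key -> network address (null means invalid).
        final Map<Long, String> table = new ConcurrentHashMap<>();

        void forward(long k, byte[] message) {
            String addr = table.get(k);
            if (addr == null || !send(addr, message)) {
                // Early binding failed or the entry is stale: consult the
                // persistent layer, which tracks homes that never migrate.
                String resolved = discoverFromPersistentLayer(k);
                if (resolved != null) {
                    table.put(k, resolved); // refresh <k, invalid> to <k, resolved>
                    send(resolved, message);
                }
            }
        }

        // Stubs standing in for TYPHOON's transport and discovery messages.
        boolean send(String address, byte[] message) { return true; }
        String discoverFromPersistentLayer(long k) { return null; }
    }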

3.4. Home migration

Using the notation defined in Section 3.2.3, the set of good peers in the system is denoted A = {a_1, a_2, a_3, ..., a_k}. The set of homes hosted by a_i ∈ A (i = 1, 2, 3, ..., k) is H_i = {h_i^0, h_i^1, h_i^2, ..., h_i^j}, where a_i's home is h_i^0. The inactive peer represented by home h_i^m (m ≠ 0) is denoted b_i^m, and the set of inactive peers whose homes are hosted by a_i is denoted B_i = {b_i^1, b_i^2, b_i^3, ..., b_i^j}.

Consider an overloaded peer a_i. It chooses an inactive peer x ∈ B_i such that C_x ≥ C_y for every y ∈ B_i − {x}. Node a_i then migrates a random subset S of the homes it hosts such that S ⊂ H_i − {h_i^0} and {x's home} ⊂ S. To share the load of the overloaded peer a_i with the newly invoked peer x, the number of homes migrated to x is determined as

    |S| = |H_i| \cdot \frac{C_x}{C_{a_i} + C_x},    (1)

where |K| denotes the number of elements in the set K. Note that TYPHOON currently adopts this intrinsic solution to share the load of peers; pursuing a full load balancing algorithm is out of the scope of this paper. We believe that a better algorithm such as [13] can be integrated into TYPHOON to balance the load of the peers.
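As a worked instance of Eq. (1), with illustrative numbers of our own choosing (not from the paper): an overloaded peer a_i of capacity C_{a_i} = 8 hosting |H_i| = 12 homes activates an inactive peer x of capacity C_x = 4. Then

    |S| = |H_i| \cdot \frac{C_x}{C_{a_i} + C_x} = 12 \cdot \frac{4}{8 + 4} = 4,

so 4 homes migrate to x and the remaining load is shared in proportion to the two capacities.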


4. Performance evaluation

In this section, we first analyze the overheads of manipulating a LAT and of discovering a requested state. Section 4.2 then presents evaluation results from simulation. We also provide results from real measurements on our prototype system (Section 4.3).

4.1. Theoretical analysis

4.1.1. Time complexity

Given a randomly designated key and a fairly selected requesting home, a lookup for the designated key takes O(log n) network hops on average (see Section 3). However, since homes may be migrated to other peer nodes in TYPHOON, performing a lookup takes

    T = O(log n) · (D + F),    (2)

where D is the cost (in number of hops) to discover the network address of the home receiving the forwarded message in the next hop, and F is the cost to forward the message to that home; F is 1 hop. Consider the following three cases.

• If the forwarding home (say a) has received the up-to-date network address of the home (say b) in the next hop, a can directly forward the message to b. As illustrated in Figs. 3(a) and 4(a), b's state is scattered over at most O(k · log(n/2)) homes, which include a. In the worst case, the LAT used to disseminate b's up-to-date address is a tree in which the in- and out-degree of each tree node is 1 and a is the leaf node; advertising b's address to a then takes O(k · log(n/2)) hops. Consequently, sending the message takes T = O(log n) · (O(k · log(n/2)) + 1) hops.
• If a has not received any network address advertised by b, a simply discovers b's address before relaying the message to b. Discovering b's state published in the persistent layer requires O(log n) hops, i.e., the cost of a normal lookup operation. The overall cost of sending a message is thus T = O(log n) · (O(log n) + 1) hops.
• Ideally, if a has received the up-to-date network address of b through the associated LAT before forwarding the message, then D = 0 and T = O(log n).

Consequently, the time complexity is T = O(k · log n)^2.

4.1.2. Traffic complexity

Clearly, the maximal number of messages required to perform a lookup request is O(k · log n)^2, which occurs when the state of each intermediate home along the lookup routing path needs to be resolved.

4.1.3. Memory complexity

We further analyze the memory overhead of each home. Each home a incurs a memory cost

    M = U + V + W,    (3)

where U is the space overhead of a's routing table, V is the space cost of the pointers that structure the LATs a participates in, and W is the space for states received when a becomes a member of the persistent layer.

Since each home maintains O(k · log n) states, U is O(k · log n). V is O(k · log n)^2 − O(k · log n). This is because each home publishes its state to O(k · log n) homes, i.e., a LAT consists of O(k · log n) members, and the out-degree of an internal tree node can be up to O(k · log n) − 1 if all tree nodes except the root are leaves. The number of states published by all homes is n · O(k · log n); if a uniform hash function is used, each home receives states from O(k · log n) different homes, i.e., a participates in O(k · log n) LATs. Hence a takes at most V = O(k · log n) · (O(k · log n) − 1).

For W, if m of the n homes are migrated (i.e., homes that are not hosted by the peers they represent), each of the n − m homes participating in the persistent layer receives on average m/(n − m) states. Assume that the average capability of each peer is a constant c, so that there are n/c good peers and n − m = n/c. Each home in the persistent layer then stores c' = c + m/(n − m) = c + cm/n states, and since m < n, c' < 2 · c. Consequently,

    M = U + V + W = O(k · log n) + O(k · log n)^2 − O(k · log n) + c',    (4)

and M can thus be roughly estimated as O(k · log n)^2.

4.2. Simulation study

We also evaluate TYPHOON in simulation. By default, each simulated node maintains k = 8 neighbors in each entry of its routing table. The hash key of a peer node is randomly generated. The number of nodes, n, is varied from 1 to 10,000. Each node dynamically joins the system and performs at most 10 periodic updates of its routing tables during the simulation. The n nodes are randomly and uniformly scattered over a network generated by the GT-ITM topology generator [23], where the Transit-Stub topology models a 2-level hierarchy of routing domains: the higher-level transit domains bridge the lower-level stub domains. We assume that the link between any two nodes is symmetric, and simulate each transit and stub domain with 10 and 100 routers, respectively.

Table 1 summarizes the capability and capacity values used for modeling the heterogeneity of nodes. Following the results of Saroiu et al. [17], we classify participating nodes into five types (denoted A, B, C, D and E), with relative percentages of 20%, 45%, 30%, 4% and 1%, respectively. According to SPECint2000 [19], the respective capability values, estimated in terms of the number of hosted homes, are 2 (represented by a 25.0 SPECint2000 base rate, i.e., an Advanced Micro Devices A4800 model), 4, 8, 16 and 32 (represented by a 430 SPECint2000 base rate, i.e., a Sun Microsystems Sun Fire 15K model). The numbers of connections for A, B, C, D and E are 2, 10, 100, 1000 and 10,000, respectively, as in Chawathe et al. [2]. The failure probabilities of forwarding a message for peers of types A, B, C, D and E are 0.81, 0.27, 0.09, 0.03 and 0.01, respectively.

The performance metrics measured are the average relative delay penalty of delivering a message and the average number of hops.

Table 1
The parameters used for modeling the heterogeneity of nodes

                    A      B      C      D      E
    Percentage (%)  20     45     30     4      1
    Capability      2      4      8      16     32
    Capacity        2      10     100    1000   10,000
    Failure         0.81   0.27   0.09   0.03   0.01

If a message j takes k(j) hops to reach its destination, the cost to transmit it is \sum_{i=1}^{k(j)} d_{i,j}, where d_{i,j} denotes the delay of delivering message j in the ith hop. The average relative delay penalty is thus

    \frac{1}{w} \sum_{j=1}^{w} \frac{\sum_{i=1}^{k(j)} d_{i,j}}{\bar{d}_j},

where the total number of source and destination pairs is w = n^2 (i.e., all source and destination pairs) and \bar{d}_j is the shortest end-to-end latency from the source to the destination of message j. Similarly, the average number of hops for delivering a message is calculated as \frac{1}{w} \sum_{j=1}^{w} k(j). Note that all experiments presented below are based on locality-aware TYPHOON, which relies on a global Internet map service to estimate a node's position in the network, as mentioned in Section 3.

4.2.1. Performance of state discovery

Fig. 9 presents the simulation results for delivering a discovery message in the persistent layer. The number of (good) nodes joining the persistent layer is depicted in Fig. 9(a): the number of good peers participating in the persistent layer grows to around 2000 as the system size increases. The persistent layer inflates incrementally by absorbing more good nodes, which accommodate the growing number of states published to it as more nodes join. Figs. 9(b) and (c) show the average number of hops and the relative delay penalty of resolving a state discovery request, respectively. As k increases, the number of hops (bounded by O(log n)) required to discover a designated state is markedly reduced, because more redundant and efficient routing paths can be exploited. Fig. 9(c) shows similar results.

Fig. 9. (a) The number of peers joining the persistent layer versus the number of nodes in the system; (b) the average number of network hops and (c) the relative delay penalty of resolving a state discovery request issued to the persistent layer.

4.2.2. Performance of state advertisement

The average relative delay penalty measured for advertising a state is calculated as

    \frac{1}{w} \sum_{j=1}^{w} \frac{\max_{1 \le i \le l(j)} t_{i,j}}{\max_{1 \le i \le l(j)} \bar{l}_{i,j}},

where there are w = n LATs and LAT j comprises l(j) leaf nodes. t_{i,j} is the delay required to propagate the state from the root to leaf node i in LAT j, and \bar{l}_{i,j} is the shortest end-to-end one-hop delay from the root to leaf node i in LAT j when the root has an infinite number of outgoing connections. Similarly, we define the average maximal number of hops required to deliver a state as \frac{1}{w} \sum_{j=1}^{w} \max_{1 \le i \le l(j)} h_{i,j}, where h_{i,j} is the number of hops from the root to leaf node i in LAT j.


We first study a worst case in which each participating peer has only one free outgoing connection, so that each LAT is structured as a linear list. Fig. 10(a) shows the simulation results for varying k. The average maximal number of hops required to propagate a state via a LAT is bounded by k · O(log(n/2)). The states are not widely replicated in the system because (1) states are only replicated at good peers, and the number of good peers is smaller than the number of participating peers; (2) the number of states published is limited by the number of good peers exploited; and (3) TYPHOON is constructed over good peers so as to match the actual network locality: a good peer only maintains geographically close good peers in its routing table, which reduces the replication of states representing a good peer. Under the realistic setting shown in Table 1, the average depth of a LAT is smaller than 2 (see Fig. 10(b)). Fig. 10(c) shows the relative delay penalty measured. When k is enlarged, the number of states that a peer can maintain increases, meaning that the number of members assembling a LAT also increases. When a member helps advertise a state upon receiving an up-to-date one, more geographically close members can be exploited to help advertise it. The simulation results show that when k ≥ 8, the relative delay penalty is reduced to around 1.25.

Fig. 10. The average maximal delay required for a peer to receive an up-to-date address from a migrated home, where (a) is the worst case in which each peer has one outgoing connection, (b) is for the default parameters shown in Table 1, and (c) is the relative delay penalty of the average maximal delay.

4.2.3. Performance of lookup

The average number of hops for resolving a lookup request is shown in Fig. 11. As shown in Eq. (2), a lookup request takes T = O(log n) · (D + F) hops, where D and F are the costs to discover a state from the persistent layer and to receive an up-to-date state from the associated LAT, respectively. Instead of simulating different node arrival and departure rates (and thus the optimal estimated Trt, Tout and l), we study several cases in which x% of the states are discovered from the persistent layer and y% are received from LATs, since any home migration rate can be represented as a pair (x, y) with x + y ≈ 100. The (x, y) pairs studied are (100, 0), (50, 50), (20, 80), (10, 90) and (0, 100). The setting (100, 0) means that the network address of each intermediate home along the lookup routing path must be resolved by consulting the persistent layer. On the contrary, (0, 100) means that the network address of each intermediate home is rapidly updated via the associated LAT, so no address resolution is needed during a lookup. The results confirm our theoretical analysis in Section 4.1; note that k is 8 in this experiment. The number of hops required to perform a lookup is bounded by O(log n)^2.

Fig. 11. The number of hops for performing a lookup request.


4.3. Prototyping and real measurement

We have prototyped TYPHOON using Java JDK 1.4.1 and evaluated the prototype in an environment of 34 desktop machines interconnected by 100 Mbit fast Ethernet. Each desktop machine is equipped with a 1.33 GHz AMD Athlon microprocessor and 512 Mbytes of main memory. All machines run Redhat Linux with kernel version 2.4.19, and their file systems are mounted together. We configure the prototype with k = 3 and collect the results over 6 rounds. The experiments use the parameters shown in Table 1, except that each home allocates 4 TCP/IP connections for manipulating a LAT.

4.3.1. The delay of sending a message between two homes

Fig. 12(a) depicts the delay measured for sending a 64-byte message between all pairs of homes created in the system. We can estimate the average delay of sending a message as 7.0 · h1, where 7.0 ms is the average delay of sending a message between two homes and h1 is the theoretical average number of hops of sending a message, i.e., (1/2) · log(number of nodes). Similarly, the theoretical maximal delay of sending a message is 7.0 · h2, where h2 is log(number of nodes). Fig. 12(a) shows that the measured delays are comparable to the estimated values.
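As a worked instance of these estimates (our own arithmetic, using the 34-node testbed):

    h_1 = \tfrac{1}{2}\log_2 34 \approx 2.54, \qquad 7.0 \cdot h_1 \approx 17.8\ \text{ms (average)},
    h_2 = \log_2 34 \approx 5.09, \qquad 7.0 \cdot h_2 \approx 35.6\ \text{ms (maximum)}.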

4.3.2. The delay of advertising a state

Fig. 12(b) illustrates the results for the average delay of advertising a state. Notably, due to the costly thread scheduling in our communication implementation, scheduling a communication request takes roughly 25.16 ms when multiple communication requests must be scheduled (we are considering using Manta [21], an enhanced Java, to reduce the runtime overhead of thread scheduling). This leads to a total of 25.16 + 7.0 = 32.16 ms to commit a communication message. Consequently, the delay of advertising a state can be estimated as log4(n) · 32.16 ms, since each node allocates at most 4 connections for advertising a received state. As Fig. 12(b) shows, the measured results are close to those estimated.
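Again as a worked instance (our own arithmetic) for n = 34 nodes:

    \log_4 34 = \frac{\ln 34}{\ln 4} \approx 2.54, \qquad 2.54 \cdot 32.16 \approx 81.8\ \text{ms},

which is consistent with the estimated curve in Fig. 12(b).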

H.-C. Hsiao et al. / J. Parallel Distrib. Comput. 65 (2005) 191 – 206

160

160 log(n/2) 4log(n/2) 8log(n/2) 12log(n/2) k = 12 k=8 k=4 k=1

Number of Hops

120 100 80 60

log(n/2) 4log(n/2) 8log(n/2) 12log(n/2) k = 12 k=8 k=4 k=1

140 120 Number of Hops

140

100 80 60

40

40

20

20

0

0 0

2000

4000 6000 8000 Number of Nodes Relative Delay Penalty of Advertising States

(a)

203

(c)

10000

0

(b)

2000

4000 6000 8000 Number of Nodes

10000

2.00 1.75 1.50 1.25 1.00 0.75 k = 12 k=8 k=4 k=1

0.50 0.25 0.00 0

2000

4000 6000 8000 Number of Nodes

10000

Fig. 10. The averagely maximal delay required for a peer to receive an up-to-date address from a migrated home, where (a) is the worst case when each peer is with 1 outgoing connection; (b) is for the default parameters shown in Table 1, and (c) the relative delay penalty of the averaged maximal delay.

Fig. 12. Measurements from the TYPHOON prototype, where (a) shows the average and maximum delay of sending a message and (b) the average delay of advertising a state.

5. Related work

The concept of virtual homes is similar to the concept of virtual servers [4]. Based on the virtual server concept, a peer can carry multiple virtual nodes when joining the system. When its load is light, it can adaptively increase the number of virtual nodes it accommodates; if the load becomes heavy, it deletes some virtual nodes to relieve the load. The data items hosted by the deleted virtual nodes are also removed, which can significantly reduce the availability of the published data items. In contrast, both [8] and [13] migrate load from heavy peers to light peers. However, they rely on peers periodically refreshing the states they maintain. As mentioned in Section 3, the ratio of invalid states representing homes participating in the system can be up to 73% for a system of 10,000 homes and k = 12. Tracking the states of a node that are scattered over the system thus becomes an issue. Systems such as Chord/CFS [4] and CAN [14] that support the virtual server concept can adopt the location management mechanism of TYPHOON to track a virtual server.

Instead of tracking states, Mahajan et al. suggest dynamically varying the rejoining period of a node [11]. When the system is relatively stable, this can significantly reduce the control traffic for repairing routing tables. However, the authors show that when the failure rate of nodes increases, the control traffic can increase drastically.

Weatherspoon et al. also propose a node failure recovery mechanism that further reduces the probing frequency of nodes to cut wide-area network traffic [22]. An asymptotic analysis of node joins and departures can be found in [10]. The major differences between these studies and TYPHOON are twofold. First, these studies are based on the fail-stop model, i.e., a node that leaves the system does not join the system again; if a node rejoins, it is treated as a new one. In TYPHOON, by contrast, a home can migrate among a set of good peers, which would otherwise be treated as a sequence of failure and join operations; TYPHOON tracks the migrated homes. (A home in TYPHOON still periodically rejoins the system, but mainly to dynamically discover geographically close homes and maintain network proximity.) Second, periodic probing needs a mechanism to dynamically calibrate the probing rate, and probing introduces additional network traffic. In TYPHOON, the use of a LAT requires only p − 1 messages to update the p members of the LAT.

The design of TYPHOON is inspired by our earlier system Bristle [7], which allows a node in a structured P2P overlay to change its network attachment point (i.e., IP address and port number).


By utilizing the good nodes exploited by Tornado and managing the location of virtual homes, TYPHOON is a well-conditioned lookup overlay in the virtual home layer [8]. Hauswirth et al. also propose a design [6], based on P-Grid [1], that allows participating nodes to dynamically change their network attachment points. Their design does not rely on a persistent layer as a last resort and does not exploit good peers for persistent resources. Consequently, there is no guaranteed upper bound on the performance of resolving a query in terms of the hops and messages required. TYPHOON, in contrast, ensures that a query is resolved in O(k · log n)^2 hops and messages.

Datta et al. propose a hybrid push/pull algorithm for advertising up-to-date replicas to the system [5]. When a node does not receive an up-to-date replica propagated by gossip broadcasting (push), it inquires of the system (pull). Although the proposed algorithm is generic, it may not be suitable for maintaining states in a structured P2P overlay. First, as mentioned in Section 3, a state is published to at most O(k · log(n/2)) distinct locations; for updating states at such a small number of locations, gossip broadcasting may overwhelm the system with excessive messages. Second, the algorithm assumes that the nodes interested in the same update have formed an overlay, but nodes collecting identical states do not form a group; their solution thus cannot be directly applied to a structured P2P overlay for handling state updates. Third, replica management is typically implemented on top of a structured P2P overlay and should be application-specific. TYPHOON, in contrast, concentrates on a relatively lower layer: it first estimates an upper bound on the number of states that will be published into the system and then provides a group management mechanism, the LAT, in the virtual home layer. From an application perspective, TYPHOON is identical to other DHT-based overlays [14,15,20,24], allowing replica management services to be installed on top of it.

Fig. 12. The measurement from the TYPHOON prototype, where (a) the average and maximum delay of sending a message and (b) the average delay of advertising a state. (Plots omitted; x-axis: number of nodes, 5–35; y-axes: delay in milliseconds; reference curves: log(n) × 7.0 and (1/2) log(n) × 7.0 in (a), log4(n) × 32.16 in (b).)

6. Conclusions

TYPHOON is a capability-aware P2P system that takes advantage of the heterogeneity of nodes. It is constructed over good peers. When a good peer in TYPHOON is overloaded, it migrates some of the homes it hosts to another good peer to share the load. However, the state of a home becomes invalid when the home is migrated, which can reduce system performance and reliability. TYPHOON therefore associates a location advertisement tree (LAT) with each home. A home relies on its LAT to advertise its state to those homes that are interested in its migration. Moreover, a home also publishes its state to the persistent layer assembled by the good peers in the system. If a home does not receive an up-to-date state via the corresponding LAT, it can discover the state from the persistent layer.

We evaluate TYPHOON through theoretical analysis, simulation and measurement on a real prototype. The results confirm the effectiveness of TYPHOON. We show that the number of hops and the number of lookup messages are O(k · log² n), where k is a constant. The memory space required by a peer to maintain the states is also O(k · log² n).

We are investigating the incentive issue, i.e., encouraging a peer that is capable of being a good peer to act as one. If a peer participating in TYPHOON is willing to be a good peer, TYPHOON can exploit more reliable resources from that peer. We are also studying load balancing algorithms for the case in which a peer dynamically changes the number of homes it intends to host.

Acknowledgements

This work was supported in part by the National Science Council, Taiwan, under Grant NSC 90-2213-E-007-076 and by the Ministry of Education, Taiwan, under Grant MOE 89-EFA04-1-4.

References

[1] K. Aberer, M. Hauswirth, M. Punceva, R. Schmidt, Improving data access in P2P systems, IEEE Internet Comput. 6 (1) (January/February 2002) 58–67.
[2] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, S. Shenker, Making Gnutella-like P2P systems scalable, in: Proceedings of the International Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM Press, New York, August 2003, pp. 407–418.
[3] I. Clarke, S.G. Miller, T.W. Hong, O. Sandberg, B. Wiley, Protecting free expression online with Freenet, IEEE Internet Comput. 6 (1) (January/February 2002) 40–49.
[4] F. Dabek, M.F. Kaashoek, D. Karger, R. Morris, I. Stoica, Wide-area cooperative storage with CFS, in: Proceedings of the Symposium on Operating Systems Principles, ACM Press, New York, October 2001, pp. 202–215.
[5] A. Datta, M. Hauswirth, K. Aberer, Updates in highly unreliable, replicated peer-to-peer systems, in: Proceedings of the International Conference on Distributed Computing Systems, IEEE Computer Society, Silver Spring, MD, May 2003, pp. 76–85.
[6] M. Hauswirth, A. Datta, K. Aberer, Handling identity in peer-to-peer systems, in: Proceedings of the International Workshop on Database and Expert Systems Applications, IEEE Computer Society, Silver Spring, MD, September 2003.
[7] H.-C. Hsiao, C.-T. King, Bristle: a mobile structured peer-to-peer architecture, in: Proceedings of the International Parallel and Distributed Processing Symposium, IEEE Computer Society, Silver Spring, MD, April 2003.
[8] H.-C. Hsiao, C.-T. King, Tornado: a capability-aware peer-to-peer storage overlay, J. Parallel Distrib. Comput. 64 (6) (June 2004) 747–758.
[9] KaZaA, http://www.kazaa.com/.
[10] D. Liben-Nowell, H. Balakrishnan, D. Karger, Analysis of the evolution of peer-to-peer systems, in: Proceedings of the Symposium on Principles of Distributed Computing, ACM Press, New York, July 2002, pp. 233–242.
[11] R. Mahajan, M. Castro, A. Rowstron, Controlling the cost of reliability in peer-to-peer overlays, in: Proceedings of the International Workshop on Peer-to-Peer Systems, Springer, Berlin, February 2003.
[12] T.S.E. Ng, H. Zhang, Predicting internet network distance with coordinates-based approaches, in: Proceedings of IEEE INFOCOM, June 2002, pp. 170–179.
[13] A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, I. Stoica, Load balancing in structured P2P systems, in: Proceedings of the International Workshop on Peer-to-Peer Systems, Springer, Berlin, February 2003.

[14] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, A scalable content-addressable network, in: Proceedings of the International Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM Press, New York, August 2001, pp. 161–172.
[15] A. Rowstron, P. Druschel, Pastry: scalable, distributed object location and routing for large-scale peer-to-peer systems, Lecture Notes in Computer Science, vol. 2218, November 2001, pp. 161–172.
[16] A. Rowstron, P. Druschel, Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility, in: Proceedings of the Symposium on Operating Systems Principles, ACM Press, New York, October 2001, pp. 188–201.
[17] S. Saroiu, P.K. Gummadi, S.D. Gribble, Measurement study of peer-to-peer file sharing systems, in: Proceedings of Multimedia Computing and Networking, January 2002.
[18] S. Sen, J. Wang, Analyzing peer-to-peer traffic across large networks, ACM/IEEE Trans. Networking 12 (2) (April 2004) 219–232.
[19] SPEC, http://www.spec.org/cpu2000/results/res2003q2/a4.
[20] I. Stoica, R. Morris, D. Karger, M.F. Kaashoek, H. Balakrishnan, Chord: a scalable peer-to-peer lookup service for internet applications, in: Proceedings of the International Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, ACM Press, New York, August 2001, pp. 149–160.
[21] R. Veldema, H.E. Bal, Optimizing Java-specific overheads: Java at the speed of C?, in: Proceedings of the International Conference on High Performance Computing and Networking, June 2001.
[22] H. Weatherspoon, J. Kubiatowicz, Efficient heartbeats and repair of softstate in decentralized object location and routing systems, in: SIGOPS European Workshop, September 2002.
[23] E.W. Zegura, K. Calvert, S. Bhattacharjee, How to model an internetwork?, in: Proceedings of IEEE INFOCOM, March 1996, pp. 594–602.
[24] B.Y. Zhao, L. Huang, J. Stribling, S.C. Rhea, A.D. Joseph, J.D. Kubiatowicz, Tapestry: a resilient global-scale overlay for service deployment, IEEE J. Selected Areas Commun. 22 (1) (January 2004) 41–53.

Hung-Chang Hsiao is currently a postdoc in the Computer & Communication Research Center of National Tsing-Hua University, Taiwan. He received the B.S. degree in computer information science from Soochow University, Taiwan, in 1995, and the Ph.D. degree in computer science from National Tsing-Hua University, Taiwan, in 2000. His research interests include overlay networking, peer-to-peer computing, grid computing, parallel computer architecture, and wireless communication. He is a member of the IEEE.

Chung-Ta King received the B.S. degree in electrical engineering from National Taiwan University, Taiwan, R.O.C., in 1980, and the M.S. and Ph.D. degrees in computer science from Michigan State University, East Lansing, Michigan, in 1985 and 1988, respectively. From 1988 to 1990, he was an assistant professor of computer and information science at New Jersey Institute of Technology, New Jersey. In 1990 he joined the faculty of the Department of Computer Science, National Tsing Hua University, Taiwan, R.O.C., where he is currently a professor. His research interests include distributed processing, cluster computing, and embedded systems. He is a member of the IEEE.

Chia-Wei Wang received the B.S. degree in computer science from National Tsing-Hua University, Taiwan, in 2002, and the M.S. degree in computer science from National Tsing-Hua University, Taiwan, in 2004. His research interests include overlay networking, peer-to-peer computing, multiplayer games, and distributed shared memory.