Computer Communications 29 (2006) 1730–1743 www.elsevier.com/locate/comcom
A self-organizable topology maintenance protocol for mobile group communications in mobile next-generation networks
Guojun Wang a,b, Jiannong Cao a,*, Keith C.C. Chan a
a Department of Computing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
b School of Information Science and Engineering, Central South University, Changsha, People’s Republic of China, 410083
Received 3 January 2005; received in revised form 20 September 2005; accepted 21 September 2005 Available online 24 October 2005
Abstract
The explosive growth of mobile and wireless communications has attracted interest in the integration of mobile and wireless networks with wired ones, and with the wired Internet in particular. In order to deal with the scalability and reliability issues for group communication services in such a network environment, many existing protocols divide the whole group into subgroups and organize them into a tree-based hierarchy. A special node in each subgroup is responsible for collecting acknowledgement messages and locally retransmitting lost messages within the subgroup. However, the tree-based hierarchy suffers from the single point of failure problem, which may seriously affect the performance of group communications. We propose a RingNet hierarchy of proxies that is a combination of logical trees and logical rings. The proposed hierarchy has the self-organization property because it can heal itself as quickly as possible in the presence of failures. Therefore, it has no single point of failure problem. We formally prove that, with a high probability of 99.899%, the proposed hierarchy with up to 10 000 proxies, to which a large number of mobile hosts are directly attached, only needs simple and efficient procedures to repair broken logical rings when the node failure probability is bounded by 0.1%. We also validate the proposed protocol by extensive simulations, which show that the proposed protocol scales very well when the size of the network becomes large, and that it is highly resilient to failures when the node failure probability becomes large.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Group communications; Group membership; Multicast dissemination; Fault tolerance; Self-organization
* Corresponding author. Tel.: +852 27667275; fax: +852 27740842. E-mail addresses: [email protected] (G. Wang), csgjwang@mail.csu.edu.cn (G. Wang), [email protected] (J. Cao), cskcchan@comp.polyu.edu.hk (K.C.C. Chan). URL: http://www.comp.polyu.edu.hk/wcsjcao/.
0140-3664/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comcom.2005.09.014
1. Introduction
The explosive growth of mobile and wireless communications has attracted interest in the integration of mobile and wireless networks with wired ones, and with the wired Internet in particular. The integrated networks are called mobile next-generation networks (mobile NGNs) [1]. Many information delivery applications that disseminate messages from one or multiple sources to a large number of recipients, such as data dissemination of news feeds, stock quotes, weather forecasts, traffic reports, and multimedia streaming of live audio and video, will be deployed in mobile NGNs in the near future. Furthermore, there is an increasing demand for enhanced services to help users do mobile collaborations such as
computer-supported cooperative work, which requires support for mobile group communications.
Group communication systems provide communication services among groups of processes. A group consists of a set of processes called members of the group. A process may voluntarily join or leave a group, or cease to be a member due to failure. The membership of a group is the list of currently operational processes in the group. One major task of a group communication system is group membership management, which maintains the membership of a group with regard to Member-Join, Member-Leave, Member-Failure, and Member-Handoff events. Another major task is group multicast dissemination, which involves efficiently disseminating information from one or multiple sources to all the operational processes in the group. For large-scale information delivery in mobile NGNs, the membership information is necessary for the following purposes. First, for reliable multicast communications, it can be used to reconstruct the multicast infrastructure, e.g. a multicast tree, in response to membership changes and member handoffs. Second, it can be used to bill the users when a usage- or time-based billing approach is adopted. Third, it can be
statistically collected and then used by data mining programs to find useful user-usage patterns. For example, in order to maximize the profit of both the advertisers and the advertisement service providers, it is important to answer questions like “when and where to insert advertisement programs into the normal data/multimedia flows?”
Many existing group communication systems are designed for generic WAN environments and do not explicitly consider mobile hosts (MHs), e.g. laptop computers, PDAs, mobile phones and mobile video phones, as group members. Therefore, there is no guarantee that they work well in the presence of MHs. Our work deals with MHs as group members in mobile NGNs. However, designing a group communication protocol for mobile NGNs is a challenging task. The intrinsic issues in WANs, such as high message latency, frequent connectivity changes, and instability due to link failures or congestion [2], still exist in mobile NGNs. Furthermore, introducing MHs raises more difficult issues, such as frequent disconnection of the MHs from their attached wireless networks, frequent handoff of the MHs from one wireless network to another, and frequent failures of the MHs and wireless communication links. It has therefore been shown that designing group communication protocols in mobile NGNs is more difficult than in the wired Internet [3–5].
In this paper, we are motivated by several problems concerning very large and highly dynamic groups in mobile NGNs. First, dynamic membership, in which a very large number of MHs dynamically join or leave a group, should be handled in a scalable manner. Second, dynamic locations of the MHs add complexity to existing group communication protocols that only deal with dynamic membership. Third, dynamic networks due to node/link failures make the group membership information difficult to maintain in a reliable manner.
We solve these problems by proposing a RingNet hierarchy of proxies for mobile group communications. Based on the proposed hierarchy, we have designed the membership propagation algorithm [24] and the multicast dissemination algorithm [25], which are important components for mobile group communications. Although the RingNet hierarchy changes due to dynamic membership, dynamic locations of the MHs, and dynamic networks, these algorithms assume that the hierarchy is always ‘stable’. That is, these algorithms are ‘unaware’ of the topological changes of the proposed hierarchy, which are handled transparently by the proposed topology maintenance protocol discussed in Sections 3 and 4. The proposed hierarchy is a combination of logical rings and logical trees, in the sense that the hierarchy becomes a tree if each logical ring in the hierarchy is considered as one node. Therefore, it takes advantage of the simplicity of logical rings and the efficiency and scalability of logical trees. More importantly, such a combination makes it easy to design a self-organizable topology maintenance protocol that maintains the hierarchy under dynamic membership, dynamic locations, and dynamic networks. We argue that the proposed protocol is scalable, because each proxy only records local information about its neighboring proxies, both candidate and
current neighbors, in order to maintain the hierarchy. We also argue that the proposed protocol is more reliable than existing tree-based protocols. In a tree-based protocol, when a node within the tree becomes faulty, the tree is broken into several parts, which is called the single point of failure problem. In contrast, when failures occur within the RingNet hierarchy, it can heal itself as quickly as possible by running the topology maintenance protocol proposed in this paper. Therefore, the group communications may only experience temporary service disruption while the faulty nodes are being excluded from the hierarchy.
The rest of the paper is organized as follows. In Section 2, we review related work. In Section 3, we describe the design of a basic topology maintenance protocol without considering failures in the RingNet hierarchy. In Section 4, we add fault tolerance support to the basic protocol, and analyze the Function-Well probability of the proposed hierarchy. In Section 5, we report the results of performance evaluation by extensive simulations. Section 6 concludes the paper.
2. Related works
For large-scale information delivery applications, scalability and reliability are two of the most important issues. To tackle the scalability issue, many existing protocols use some form of tree-based hierarchy, e.g. spanning trees. To tackle the reliability issue, many existing protocols use some form of subgroups. That is, they divide the whole group into multiple subgroups, with a special node in each subgroup responsible for collecting acknowledgement messages and locally retransmitting lost messages within the subgroup. To tackle the two issues at the same time, many tree-based hierarchies of subgroups have been proposed in the literature. In different schemes, the special nodes in the subgroups may or may not be group members, and they can be either individual hosts or co-located network entities such as routers. Moreover, the special nodes have different names in different schemes, e.g. the logging servers in [6], the local group controllers in [7], the designated receivers in [8], the proxies in [9], and the Reliable Multicast proXies (RMXs) in [10]. However, the basic idea in these schemes is similar. In our paper, we follow the idea of using a hierarchy of subgroups to tackle the two issues by proposing a RingNet hierarchy of proxies, which is a combination of logical trees and logical rings.
The aforementioned protocols consider only the message loss problem and do not consider node failures within the hierarchy. In fact, the single point of failure problem of a tree-based hierarchy may seriously degrade protocol performance, especially when failures occur frequently. For reliable group communications, it is very important to maintain a reliable (fault tolerant) communication infrastructure. The proposed RingNet hierarchy is such an infrastructure, which can heal itself as quickly as possible in the presence of failures. Therefore, it has no single point of failure problem.
As is well known, group communications in the context of mobile and wireless networks is affected not only by the status of processes (operational or crashed) and the status of links
(connected or disconnected), but also by the dynamic locations of the MHs. However, only a few works address scalable and reliable group communications in such a network environment, as follows.
In [11], a two-tier host-view group communication protocol is proposed. A host-view is a set of mobile support stations (MSSs) representing the aggregate location information of a group. By tracking a set of MSSs rather than individual MHs, group membership management is greatly simplified. Furthermore, in order to deliver a multicast message to a group of MHs, the protocol only needs to send a copy of the message to the MSSs in the group’s host-view. However, this protocol does not allow dynamic joins or leaves, and does not specify a method for the creation or deletion of multicast groups. In particular, the global updates that are necessary with every ‘significant move’ make it inefficient and could cause lengthy breaks in service to the MHs [12].
In [12], a three-tier group communication protocol is proposed to deal with the problems of the above two-tier protocol. The bottom tier consists of MHs that roam between cells. The middle tier consists of MSSs that provide the MHs with connectivity to the underlying network. The top tier consists of groups of MSSs, each of which is controlled by a Supervisor Host (SH). Since the SH is part of the wired network, it can handle most of the protocol details, such as maintaining connections for the MHs and collecting acknowledgement messages for reliable communications. Another three-tier protocol is proposed in [13]. In this scheme, movements of the MHs do not imply the exchange of any message in the wired network.
3. The basic topology maintenance protocol
It is necessary to provide a reliable (fault tolerant) infrastructure for efficient and reliable group communications. In this section, we propose an infrastructure called a RingNet hierarchy of proxies for mobile group communications. The proposed hierarchy changes dynamically over time, which can be summarized into two cases. One is when failures occur within the hierarchy, which is discussed in Section 4. The other is when MHs dynamically join, leave, or fail in the group, or move from one proxy coverage area to another, which is the focus of this section.
3.1. The RingNet hierarchy
Researchers have envisioned that, in the near future, the wired Internet will integrate with a diversity of wireless networks, such as WLANs, 2.5G/3G/4G cellular networks, and GEO/MEO/LEO satellite networks, to form so-called mobile NGNs. Many mobile NGN architectures have been proposed, such as the wireless overlay networking architecture [14], the all-IP wireless/mobile network architecture [15], and the always best connected (ABC) architecture [16]. In these architectures, heterogeneity is one of the key problems. Heterogeneous MHs need to seamlessly access
different kinds of application services through heterogeneous wireless access networks. A natural solution is to place intermediate systems among wireless networks between the service providers’ side (e.g. the traditional application servers) and the mobile users’ side (e.g. the MHs); these systems are responsible for hiding the heterogeneity from both the service providers and the mobile users. In the literature, the proxy approach [14] and the communication gateway approach [17] are typical examples of such a solution.
To address the heterogeneity problem in mobile NGNs, we propose to place multiple tiers of proxies among different networks, resulting in the multi-tier proxy-based mobile next-generation networks architecture shown in Fig. 1. We differentiate two kinds of proxies: (1) a variety of direct proxies (DPs) that directly serve their attached MHs, e.g. access points in WLANs, base stations in cellular networks, and satellites in satellite networks; and (2) some intermediate proxies (IPs) that are placed between the DPs and some multicast senders (MSs). In this paper, we are concerned with two kinds of MSs. An MS placed within the wired Internet is called a global sender, while an MS placed within a wireless network is called a local sender. A global sender provides multicast services to the mobile users in the global mobile NGNs, while a local sender provides services within a limited area covered by the wireless network.
Different wireless networks connect to the wired Internet in different ways. Radio access networks (RANs) of the cellular networks connect to the wired Internet through the core networks (CNs) of the cellular networks [18]. WLANs either connect to the wired Internet directly through gateways, or indirectly, first through gateways and then through the CNs of the cellular networks [19]. Satellite networks connect to the wired Internet through fixed earth stations (FESs) in the satellite networks [15].
In our architecture, the proxies can be either individual hosts deployed in the network, or co-located network entities such as gateways in WLANs, the radio network controller (RNC), serving GPRS support node (SGSN) and gateway GPRS support node (GGSN) in the cellular networks [18], FESs in the satellite networks, and even border routers in the wired Internet.
Based on the proposed architecture, we propose a RingNet hierarchy of proxies, which can be used to simplify the tasks of group communications. Fig. 2 shows the hierarchy used in this paper. There are four tiers, namely, intermediate proxy tier 2 (IPT2), intermediate proxy tier 1 (IPT1), direct proxy tier (DPT), and mobile host tier (MHT), with the higher two tiers consisting of logically organized rings and with one leader in each logical ring. Besides functioning as a normal proxy in a logical ring, the leader is also responsible for communicating with its parent node in the upper-tier logical ring within the hierarchy (if one exists), and the leader is thus one of the child nodes of its parent. For each proxy in a logical ring, there exists one proxy that is its previous node, and there exists one proxy that is its next node. If the logical ring contains only one node, then the leader, previous and next nodes are simply the node
Fig. 1. The multi-tier proxy-based mobile next-generation networks architecture.
itself. In a real system, the RingNet hierarchy may consist of fewer than four tiers for small-scale applications, or more than four tiers for large-scale applications.
In order to maintain such a hierarchy in a self-organizable manner, we require each proxy to have some knowledge of its candidate neighbors: either some candidate siblings through which it can join a logical ring, or some candidate parents through which it can attach to an existing hierarchy. In the former case, the candidate sibling becomes one of the proxy’s current neighbors, i.e. the proxy’s previous or next node. In the latter case, the candidate parent becomes one of the proxy’s current neighbors, called the proxy’s parent node. Accordingly, the proxy becomes one of the parent proxy’s current neighbors, called a child node.
3.2. The data structures of MHs, proxies and tokens
3.2.1. The data structure of an MH
Each MH maintains the following data structure.
• GID: GroupID. Group identity, available from some group addressing scheme, e.g. Class D address in IP multicast [20].
• DP: NodeID. Node identity of the attached DP, e.g. its IP address.
• GUID: GloballyUniqueID. Globally unique identity of the MH, available from some globally unique identity scheme, e.g. home address (HA) [21] in Mobile IP networks and universal personal identification (UPI) [22] in next-generation personal communication systems.
• LUID: LocallyUniqueID. Locally unique identity of the MH, available from some locally unique identity scheme, e.g. Mobile IP care-of address (CoA) [21].
• Status: Integer. Typical status values such as operational, disconnected, and faulty.
Fig. 2. The RingNet hierarchy.
3.2.2. The data structure of a proxy
Each proxy maintains the following data structure.
• GID: GroupID. See the data structure of an MH.
• Current, Leader, Previous, Next, Parent, Children[ ]: NodeID. Node identities of the current, leader, previous, next, parent, and children nodes in the hierarchy, respectively, e.g. their IP addresses. Notice that Children[ ] consists of a list of node identities, each of which stands for one child node.
• PreviousOK, NextOK, ParentOK, ChildrenOK[ ]: Boolean. Status of the current node’s previous, next, parent, and children nodes in the hierarchy, respectively, with TRUE for non-faulty and FALSE for faulty or non-existent. Notice that ChildrenOK[ ] consists of a list of sub-items, each of which describes the status of one child node.
• CandidateSiblings[ ], CandidateParents[ ]: NodeID. List of node identities of candidate siblings and candidate parents, respectively, e.g. their IP addresses.
• CandidateSiblingsOK[ ], CandidateParentsOK[ ]: Boolean. List of status of the candidate siblings and candidate parents, respectively, with TRUE for non-faulty and FALSE for faulty or non-existent.
• PreviousTimeout, NextTimeout, ParentTimeout, ChildrenTimeout: Real. Timeout intervals for detecting whether the previous, next, parent and children nodes are faulty or not, respectively.
• CandidateParentTimeout, CandidateSiblingTimeout: Real. Timeout intervals for detecting whether the candidate parent and candidate sibling nodes are faulty or not, respectively.
• ListOfMembers[ ]: MemberInfo. List of operational members. For any DP proxy, all the operational members directly attached to the DP are maintained in the list. For any IP proxy, all the operational members within the coverage area of a sub-RingNet hierarchy are maintained in the list. The sub-RingNet hierarchy consists of the logical ring where the IP proxy resides and downward to all the covered DPs/MHs.
• MQ: MessageQueue. Message queue to buffer membership change messages.
• MaxVersion: Integer. The maximal version number of the Token that has been generated by this node. Only the leader proxy maintains this information.
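The MH and proxy records listed above can be sketched as plain Python data classes. This is only an illustrative sketch: the Python names (`MobileHost`, `Proxy`, `next_node`, `singleton_ring`), the use of strings for identities, the folding of each `X[ ]`/`XOK[ ]` pair into one dictionary, and the timeout values are all assumptions made here, not part of the protocol description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    OPERATIONAL = "operational"
    DISCONNECTED = "disconnected"
    FAULTY = "faulty"


@dataclass
class MobileHost:
    gid: str                   # GID: group identity
    dp: Optional[str]          # DP: NodeID of the attached direct proxy
    guid: str                  # GUID: globally unique identity
    luid: str                  # LUID: locally unique identity
    status: Status = Status.OPERATIONAL


@dataclass
class Proxy:
    gid: str
    current: str               # NodeID of this proxy
    leader: Optional[str] = None
    previous: Optional[str] = None
    next_node: Optional[str] = None
    parent: Optional[str] = None
    previous_ok: bool = False  # FALSE also covers "non-existent"
    next_ok: bool = False
    parent_ok: bool = False
    # Assumption: each NodeID list and its OK list are folded into one
    # NodeID -> OK map (Children[]/ChildrenOK[], and so on).
    children: Dict[str, bool] = field(default_factory=dict)
    candidate_siblings: Dict[str, bool] = field(default_factory=dict)
    candidate_parents: Dict[str, bool] = field(default_factory=dict)
    # Timeout intervals in seconds; the values are placeholders.
    timeouts: Dict[str, float] = field(default_factory=lambda: {
        "previous": 3.0, "next": 3.0, "parent": 3.0, "children": 3.0,
        "candidate_parent": 6.0, "candidate_sibling": 6.0,
    })
    list_of_members: List[MobileHost] = field(default_factory=list)
    mq: List[str] = field(default_factory=list)  # buffered membership changes
    max_version: int = 0       # meaningful only on a leader proxy

    @classmethod
    def singleton_ring(cls, gid: str, node_id: str) -> "Proxy":
        """A ring with one node: leader, previous and next are the node itself."""
        return cls(gid=gid, current=node_id, leader=node_id,
                   previous=node_id, next_node=node_id,
                   previous_ok=True, next_ok=True)
```

For example, `Proxy.singleton_ring("g1", "ip1")` yields a proxy whose leader, previous and next identities all equal `"ip1"`, matching the single-node ring described in Section 3.1.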
3.2.3. The data structure of a token
Each proxy independently collects/generates membership change/update messages, which are propagated along a logical ring by using a Token with the following data structure.
• GID: GroupID. See the data structure of an MH.
• Leader: NodeID. Node identity of the leader proxy that generates the Token, e.g. its IP address.
• Version: Integer. The version number of the Token.
• MQ: MessageQueue. Message queue to buffer membership change/update messages. Notice that each membership change/update message also records the proxy that generated the message and the time when the message was generated.
3.3. The basic topology maintenance algorithm
There are two types of topology change events due to dynamic membership and dynamic locations of the MHs. One is the Proxy-Join/Leave event, indicating that a proxy joins or leaves a logical ring at the same tier of the hierarchy. The other is the Proxy-Attach/Detach event, indicating that a parent–child relationship is established or terminated between nodes at neighboring tiers of the hierarchy. We now illustrate some example scenarios that trigger topology maintenance of the proposed hierarchy.
Scenario 1. When an MH hands off to a new DP (or joins a DP), if it happens to be the first member in the new DP (or in the current DP), then the DP starts to attach to the hierarchy. If the MH is the last member to leave a DP, then the DP may start to detach from the hierarchy. The Proxy-Attach/Detach event at the DP tier may further trigger the DP’s parent IP node to join, leave, attach to, or detach from the hierarchy accordingly.
Furthermore, the Proxy-Join/Leave/Attach/Detach event at the IP tier may trigger the IP’s parent IP node to join or leave the hierarchy.
Scenario 2. Each proxy at each tier of the proposed hierarchy maintains the membership information in its coverage area. If a proxy that is currently not in the hierarchy becomes aware that the number of operational members it maintains changes from 0 to non-zero, a Proxy-Join or Proxy-Attach event may be triggered. Similarly, if a proxy that is currently in the hierarchy becomes aware that the number changes from non-zero to 0, a Proxy-Leave or Proxy-Detach event may be triggered. Since the Proxy-Leave and Proxy-Detach procedures (including the DP’s detachment from its parent IP node) are not mandatory, such an event is not triggered immediately, in anticipation that the proxy may again contain operational members after a while. We call such a mechanism the proxies’ Lazy-Leave/Detach from the hierarchy.
We now informally describe the Topology-Maintenance algorithm for the Proxy-Join/Leave and Proxy-Attach/Detach procedures. Topology maintenance is treated as a transaction by adopting a two-phase commit technique. In each Proxy-Join/Leave transaction, three proxies are involved, namely, INITIATOR, which issues the Proxy-Join/Leave message; PREVIOUS, which is either a candidate neighbor for the Proxy-Join procedure or the previous node of INITIATOR for the Proxy-Leave procedure; and NEXT, which is either the next node of PREVIOUS for the Proxy-Join procedure or the next node of INITIATOR for the Proxy-Leave procedure. In the first phase, INITIATOR sends a Proxy-Join/Leave request to PREVIOUS, and PREVIOUS responds either positively or negatively; then, possibly, INITIATOR sends a Proxy-Join/Leave request to NEXT, and NEXT responds either positively or negatively. In the second phase, INITIATOR commits or rolls back, and notifies PREVIOUS and NEXT to commit or roll back accordingly.
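The two-phase Proxy-Join transaction described above can be illustrated with a small in-memory sketch. It is not the authors' implementation: message passing is replaced by direct method calls, and the class `RingNode`, its `busy` reservation flag, and the function `proxy_join` are hypothetical names introduced here to show the vote-then-commit structure.

```python
from typing import List


class RingNode:
    """A proxy in a logical ring; a fresh node forms a one-node ring."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.previous: "RingNode" = self
        self.next: "RingNode" = self
        self.busy = False  # True while reserved by a pending transaction

    def prepare(self) -> bool:
        """Phase 1: respond positively and reserve, or respond negatively."""
        if self.busy:
            return False
        self.busy = True
        return True

    def release(self) -> None:
        self.busy = False


def proxy_join(initiator: RingNode, previous: RingNode) -> bool:
    """INITIATOR joins the ring between PREVIOUS and NEXT = PREVIOUS.next."""
    nxt = previous.next
    participants: List[RingNode] = [previous] if nxt is previous else [previous, nxt]

    # Phase 1: ask PREVIOUS, then NEXT; any negative answer aborts.
    reserved: List[RingNode] = []
    for node in participants:
        if node.prepare():
            reserved.append(node)
        else:
            for r in reserved:      # roll back: nothing was relinked yet
                r.release()
            return False

    # Phase 2: commit the new previous-next relationships on all three nodes.
    initiator.previous, initiator.next = previous, nxt
    previous.next = initiator
    nxt.previous = initiator
    for r in reserved:
        r.release()
    return True
```

In this sketch a rollback only releases the reservations, because no links are changed before every participant has answered positively. A Proxy-Leave transaction would follow the same two phases with PREVIOUS and NEXT taken from the INITIATOR's own neighbors.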
For the topology maintenance algorithm with Proxy-Attach/Detach procedures, only two nodes are involved, INITIATOR and its PARENT. Since the Proxy-Attach/Detach procedures are similar to, but simpler than, the Proxy-Join/Leave procedures, we omit their details.
At the end of this section, we briefly introduce a mobility detection mechanism. A DP becomes aware of which MHs are in its coverage area by using heartbeat messages. Periodically, the DP broadcasts ‘heartbeat’ messages within its coverage area. Upon receiving a ‘heartbeat’ message, an MH announces its presence to the DP by sending back a ‘greeting’ message that contains its host identities. Upon receiving a ‘greeting’ message, the DP registers the MH as a group member. The MH then refreshes its presence by sending messages to the DP periodically. If the DP does not receive such a refresh message within a timeout interval, it assumes that the MH has either left or failed. In addition, if an MH simultaneously receives several ‘heartbeat’ messages from different DPs, it selects one of them to respond to with a ‘greeting’ message. The selection procedure may be co-located with the MH’s handoff strategy, which is beyond the scope of this paper. In this paper, we assume a simple handoff strategy: if an MH
receives several ‘heartbeat’ messages from different DPs, then it simply selects the nearest DP and responds to it with a ‘greeting’ message.
4. The fault tolerant topology maintenance protocol
In this section, we first introduce a failure detection mechanism as the basis for maintaining the RingNet hierarchy in the presence of failures, then show the failure cases within the hierarchy, and then present the Ring/Hierarchy-Repair algorithms that exclude the failures from the hierarchy. We also analyze the Function-Well probability of the hierarchy, which shows that the proposed hierarchy is highly resilient to failures. In particular, we show that the proposed hierarchy has the property of self-organization, in the sense that it can heal itself as quickly as possible in the presence of failures that occur in the hierarchy by running the proposed Ring/Hierarchy-Repair algorithms.
4.1. The failure-detection mechanism
We introduce a failure detection mechanism to detect failures in the RingNet hierarchy. In this paper, we differentiate two kinds of neighbors, namely, current neighbors and candidate neighbors. The ways of detecting whether these two kinds of neighbors are faulty differ slightly, as follows.
In order to detect whether a proxy’s current neighbors are faulty or not: (1) each operational proxy in the hierarchy periodically sends heartbeat messages to show that ‘I’m alive’ to its current neighbors: Previous, Next, Parent and Children nodes; (2) each heartbeat message contains the node identities of the sender’s Leader, Previous and Next nodes, and information about whether the Leader is at the ‘top’ logical ring or not, i.e. whether the Leader has a Parent node or not; (3) each operational proxy in the hierarchy listens for such heartbeat messages, and reacts to the PreviousTimeout, NextTimeout, ParentTimeout, and ChildrenTimeout timeout settings independently.
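As an illustration of step (3), the following sketch tracks the last heartbeat heard from each current neighbor and raises a suspicion when the corresponding timeout interval elapses. The class `NeighborMonitor`, its method names, and the explicit `now` arguments (used instead of a real clock so that the example is deterministic) are assumptions made for this sketch; the payload of the heartbeat messages is not modeled.

```python
from typing import Dict, List


class NeighborMonitor:
    """Timeout-based suspicion of current neighbors, one timeout per role."""

    def __init__(self, timeouts: Dict[str, float], now: float = 0.0) -> None:
        # e.g. {"previous": PreviousTimeout, "next": NextTimeout, ...}
        self.timeouts = dict(timeouts)
        self.last_heard = {role: now for role in timeouts}
        self.ok = {role: True for role in timeouts}  # PreviousOK, NextOK, ...

    def on_heartbeat(self, role: str, now: float) -> None:
        """Record an 'I'm alive' message from the neighbor in this role."""
        self.last_heard[role] = now
        self.ok[role] = True

    def check(self, now: float) -> List[str]:
        """Return the roles whose timeout has just expired and mark them faulty."""
        suspected = []
        for role, timeout in self.timeouts.items():
            if self.ok[role] and now - self.last_heard[role] > timeout:
                self.ok[role] = False
                suspected.append(role)
        return suspected


monitor = NeighborMonitor({"previous": 3.0, "next": 3.0, "parent": 5.0})
monitor.on_heartbeat("previous", now=2.0)
newly_suspected = monitor.check(now=4.0)  # only "next" has been silent > 3.0
```

Each role is evaluated against its own interval, which mirrors the statement that the timeout settings are handled independently. A newly suspected `previous` or `next` role would start the Ring-Repair algorithm, and a suspected `parent` or child would start the Hierarchy-Repair algorithm.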
In order to detect whether a proxy’s candidate neighbors are faulty or not, we use polling messages rather than heartbeat messages, as follows. If an operational proxy q needs to detect whether a candidate proxy p is faulty or not, it periodically sends polling messages to p. When p receives a polling message, it sends the message back to q together with the node identities of its Previous, Next and Leader nodes, and information about whether or not p is at the ‘top’ logical ring within the sub-hierarchy where it resides. The proxy q then listens for the returned polling messages and decides whether to suspect p by reacting to the CandidateParentTimeout and CandidateSiblingTimeout timeout settings independently.
4.2. The failure cases in the RingNet hierarchy
In Section 3, we considered the RingNet hierarchy with fixed tiers. In real scenarios, the mobile users are not uniformly distributed in the mobile NGNs. Furthermore, the real networks
do not have the same size. Therefore, it is more reasonable to consider that each IPT1 or IPT2 tier may have its own sub-tiers. In this section, we illustrate our protocol with sub-tiers at the IPT1 tier only.
There are many failure cases in the RingNet hierarchy. For simplicity, we only consider node failures. For link failures, we assume that all the signaling messages, except the heartbeat/polling messages, are always reliably transmitted by some retransmission mechanism. If a node failure breaks a previous–next relationship in a logical ring, then a Ring-Repair algorithm (see Section 4.3) runs to exclude the failure. If a node failure terminates a parent–child relationship in the hierarchy, then a Hierarchy-Repair algorithm (see Section 4.4) runs to deal with the failure. We discuss three kinds of node failure cases as follows (see also Fig. 3).
Case 1. Single non-leader failure. For example, if one IP node in the ring R2 becomes faulty, then the ring is broken, and a Ring-Repair algorithm runs to exclude the IP node from the ring. At the same time, the parent–child relationship between the IP node and its child node in the ring R4 is terminated, and a Hierarchy-Repair algorithm runs to make the ring R4 either attach to the hierarchy or merge with a sibling logical ring.
Case 2. Single leader failure. For example, the leader IP node in the ring R3 becomes faulty. Similar to Case 1, both the Ring-Repair and Hierarchy-Repair algorithms start to run. Different from Case 1, the parent–child relationship between the leader and its parent is also terminated. Therefore, a
Hierarchy-Repair algorithm then runs to make the ring R3 itself either attach to the hierarchy or merge with a sibling ring.
Case 3. Consecutive failures in a logical ring. For example, the IP07 and IP08 nodes in ring R5 become faulty simultaneously. Similar to Case 1 and Case 2, both the Ring-Repair and Hierarchy-Repair algorithms start to run. Different from them, the Ring-Repair algorithm in this case is more complicated than in the single failure cases (see below for the Slow-Repair procedure).
4.3. The Ring-Repair algorithm
We now informally describe the Ring-Repair algorithm. For each proxy running the proposed protocol, if its PreviousTimeout or NextTimeout event occurs, it assumes that its previous or next node in the current logical ring is faulty, sets its PreviousOK or NextOK to FALSE accordingly, and starts to run a Fast-Repair procedure to repair the ring; if the Fast-Repair procedure cannot make progress after a timeout interval, it assumes that consecutive failures have occurred in the ring, and starts to run a Slow-Repair procedure to repair the ring.
The basic idea of the Fast-Repair procedure is as follows. In order to exclude a single failure from a logical ring, the proxy that is aware of the failure issues a Fast-Repair message to a DESTINATION node to establish a new previous–next relationship between them. The DESTINATION is the proxy’s
Fig. 3. The failure cases in the RingNet hierarchy.
G. Wang et al. / Computer Communications 29 (2006) 1730–1743
previous-to-previous or next-to-next node in the logical ring where the proxy resides; this information is obtained from the heartbeat messages used for detecting failures.

The Slow-Repair procedure is somewhat more complicated. When the Fast-Repair procedure cannot make progress after a timeout interval, the proxy issues a Slow-Repair message that is forwarded along the logical ring to find a DESTINATION node, and then establishes a new previous–next relationship with that node. Notice that the DESTINATION is the node that cannot forward the Slow-Repair message along the logical ring any longer.

If failures occur infrequently, it is very probable that only the Fast-Repair procedure runs; if failures occur frequently and at least two consecutive nodes happen to be faulty simultaneously, then the Slow-Repair procedure runs. Since only the two neighbors of the faulty node are involved in the Fast-Repair procedure, the procedure is executed very efficiently, and that is why we call it Fast-Repair. The Slow-Repair procedure is less efficient because the Slow-Repair message has to traverse all the successive non-faulty nodes along the ring, and that is why we call it Slow-Repair. If the faulty node excluded from the logical ring happens to be a leader, then either the DESTINATION or the node that issued the Fast-Repair or Slow-Repair message is designated as the new leader, and all the nodes in the logical ring are notified of the new leader.

4.4. The Hierarchy-Repair algorithm

We informally describe the Hierarchy-Repair algorithm as follows. For each proxy running the proposed protocol, if its ParentTimeout or ChildrenTimeout event occurs, it assumes that its parent or one of its children nodes in the hierarchy is faulty, and sets its ParentOK or ChildrenOK[ ] to FALSE accordingly.
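As a rough illustration of the Ring-Repair logic of Section 4.3, the following minimal Python sketch repairs a ring from the viewpoint of a proxy whose NextTimeout event has occurred. The function name `repair_next`, the list representation of the ring, and the `alive` set are assumptions made for this example only; message exchange, timeouts and leader re-election are omitted, and a single run of consecutive failures directly after the proxy is assumed.

```python
def repair_next(ring, alive, me):
    """Repair a logical ring after `me` suspects its next node.

    ring  -- node ids in ring order (hypothetical list representation)
    alive -- set of node ids that are currently non-faulty
    me    -- the proxy whose NextTimeout event occurred
    Returns (new_ring, procedure); procedure is "none", "fast" or "slow".
    Assumes a single run of consecutive failures right after `me`.
    """
    n = len(ring)
    pos = ring.index(me)
    if ring[(pos + 1) % n] in alive:
        return list(ring), "none"  # next node is fine, nothing to repair

    # Fast-Repair: DESTINATION is the next-to-next node, whose identity
    # is assumed to be known from earlier heartbeat messages.
    dest = (pos + 2) % n
    procedure = "fast"
    if ring[dest] not in alive:
        # Slow-Repair: forward the message the other way round the ring
        # until a node can no longer forward it; that node is DESTINATION.
        procedure = "slow"
        dest = pos
        while ring[(dest - 1) % n] in alive:
            dest = (dest - 1) % n

    # New previous-next relationship me -> DESTINATION; the faulty run
    # between them is excluded from the ring.
    new_ring = [me]
    i = dest
    while i != pos:
        new_ring.append(ring[i])
        i = (i + 1) % n
    return new_ring, procedure


ring = ["A", "B", "C", "D", "E", "F"]
print(repair_next(ring, {"A", "B", "D", "E", "F"}, "B"))  # single failure: C
print(repair_next(ring, {"A", "B", "E", "F"}, "B"))       # consecutive failures: C, D
```

In the first call only the two neighbors of the faulty node are involved, whereas in the second call the search visits every non-faulty node of the ring, which mirrors the efficiency argument given for the two procedures.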
When a leader’s ParentOK becomes FALSE, it re-joins the hierarchy either by a Proxy-Attach procedure through one of its candidate parents, or by a Ring-Merge procedure through one of its candidate siblings in order to merge two logical rings into one. The Proxy-Attach procedure here is the same as the Proxy-Attach procedure discussed in Section 3.3. In addition, if the logical ring contains the leader only, then the Ring-Merge procedure here is the same as the Proxy-Join procedure discussed in Section 3.3.

If the logical ring contains more than one node, then the Ring-Merge procedure can be considered as an extension of the Proxy-Join procedure as follows. The leader first issues a Ring-Merge request to some candidate siblings. If more than one sibling responds to the request with a positive acknowledgement, then the leader chooses one among them according to some rules. A two-phase commit technique is then adopted to merge the two logical rings. In each Ring-Merge transaction, four proxies are involved, namely, the LEADER, the LEADER’s next node called NEXT, the CANDIDATE, and the CANDIDATE’s next node called CANDIDATE-NEXT. The identity of the last one is obtained from the polling messages returned from the candidate sibling. In the first phase, LEADER sends a Leader-Merge message to the other three proxies, and the other three respond either positively or negatively. In the second phase, if the three responses are all positive, then LEADER commits and notifies the other three to commit; otherwise, LEADER rolls back and notifies the other three to roll back. When the two rings are successfully merged into one, LEADER issues a Leader-Change message, using the Leader information obtained from the polling messages, to notify the proxies in the former logical ring where LEADER resided to update their leader information.

In the Ring-Merge procedure, there is an extreme case that causes problems. When the two logical rings being merged happen to be the two ‘top’ logical rings of their sub-hierarchies, the two LEADERs may simultaneously start their Ring-Merge procedures. Two problems may occur: (1) if at least two common proxies are in the two sets of four proxies participating in the two transactions, then the two transactions may fail to make progress because the participating proxies cannot be locked; (2) if the two sets have no proxies in common and the two transactions commit independently, then an inconsistent logical ring may be formed. To solve these problems, a simple condition is added: of the two LEADERs, only the one with the larger node identity is allowed to start its transaction.

4.5. The analysis on Function-Well probability

We first informally define the Function-Well concept for a logical ring. If failures do not simultaneously occur in any two consecutive proxies in a logical ring, then only the Fast-Repair procedure is required to exclude individual failures from the logical ring. In this case, we say the logical ring functions well. If a logical ring contains at least two consecutive failures, then it needs to run the Slow-Repair procedure to exclude such failures from the logical ring, and we say the logical ring does not function well.
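To make the definition concrete, the following minimal Python sketch checks whether a given failure pattern leaves a logical ring functioning well. The function name `functions_well` and the boolean-list encoding of the ring are assumptions made for illustration; they are not part of the protocol.

```python
def functions_well(faulty):
    """Return True if no two consecutive proxies in the ring are faulty.

    faulty -- list of booleans in ring order, True meaning "faulty".
    The ring is cyclic, so the last and the first proxies are neighbors.
    """
    r = len(faulty)
    if r < 2:
        return True  # a ring with fewer than two proxies has no consecutive pair
    return not any(faulty[i] and faulty[(i + 1) % r] for i in range(r))


# Two isolated failures: Fast-Repair suffices.
print(functions_well([False, True, False, True, False, False]))
# Two adjacent failures: Slow-Repair would be needed.
print(functions_well([False, True, True, False, False, False]))
```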
We then present an analytical model for the Function-Well (fw) probability of the RingNet hierarchy with the following parameters: n, the total number of the DP nodes; h, the height of the hierarchy; r, the maximal number of nodes in each logical ring; c, the maximal number of children nodes attached to each proxy; f, the node failure probability with uniform and independent fault distribution in the hierarchy; and k, the maximal number of allowed Slow-Repair procedures, i.e. if at most k Slow-Repair procedures are allowed, then we say the hierarchy functions well. The Function-Well probability t of each logical ring is defined as:

$$t \stackrel{\mathrm{def}}{=} \mathrm{Prob}^{\mathrm{ring}}_{fw} = \sum_{i=0}^{\lfloor r/2 \rfloor} B(r,i)\,(1-f)^{r-i} f^{i} \tag{1}$$

In Formula (1), B(r, i) stands for the number of failure occurrences such that there are exactly i faulty nodes and there are not any two consecutive faulty nodes in the logical ring that contains altogether r nodes. For 0 ≤ i ≤ 3 and r ≥ 5, it is easy to deduce B(r, i) as follows:

$$B(r,0) = \binom{r}{0} = 1 \tag{2}$$

$$B(r,1) = \binom{r}{1} = r \tag{3}$$

$$B(r,2) = \binom{r}{2} - \binom{r}{1} = \frac{r(r-3)}{2} \tag{4}$$

$$B(r,3) = \binom{r}{3} - \binom{r}{1} - \binom{r}{1}\binom{r-4}{1} = \frac{r(r^{2}-9r+20)}{6} \tag{5}$$

However, it is difficult to deduce a generic formula to compute B(r, i) when i ≥ 4. Fortunately, this work can be done routinely by a computer program: for each i, 0 ≤ i ≤ ⌊r/2⌋, we simply enumerate all the possible occurrences of i faulty nodes among r nodes, and count the number of occurrences such that there are not any two consecutive faulty nodes. Based on this idea, we programmed the procedure and got the program output for B(r, i) in Table 1.

Table 1
The B(r, i) values for computing the Function-Well probability of a logical ring

| r | B(r,0) | B(r,1) | B(r,2) | B(r,3) | B(r,4) | B(r,5) |
|---|---|---|---|---|---|---|
| 4 | 1 | 4 | 2 | - | - | - |
| 6 | 1 | 6 | 9 | 2 | - | - |
| 8 | 1 | 8 | 20 | 16 | 2 | - |
| 10 | 1 | 10 | 35 | 50 | 25 | 2 |

To compute the Function-Well probability of the RingNet hierarchy, we suppose that the hierarchy is full: it contains maximal number of tiers; each tier contains maximal number of logical rings; and each logical ring contains maximal number of proxies; and each proxy is attached by maximal number of children proxies. Intuitively, under the same network situation, a small hierarchy may have higher probability to function well than a large hierarchy. Therefore, we can make the worst-case analysis by using such a full hierarchy. Such a full hierarchy contains $n = (r \times c)^{h-2}$ DPs, and $tn = \sum_{i=0}^{h-3} (r \times c)^{i}$ logical rings, among which at most k logical rings do not function well. We then present the Function-Well probability of the hierarchy as:

$$\mathrm{Prob}^{\mathrm{hierarchy}}_{fw} \stackrel{\mathrm{def}}{=} \sum_{i=0}^{k} \binom{tn}{i}\, t^{\,tn-i}\,(1-t)^{i} \tag{6}$$

Numerical results according to Formulae (1) and (6) are shown in Table 2 with h = 4 and c = r.

Table 2
The Function-Well probability of a RingNet hierarchy

| n | r | f (%) | fw (%), k = 0 | fw (%), k = 1 | fw (%), k = 2 |
|---|---|---|---|---|---|
| 256 | 4 | 0.1 | 99.993 | 100.000 | 100.000 |
| 256 | 4 | 1.0 | 99.329 | 99.998 | 100.000 |
| 256 | 4 | 5.0 | 85.012 | 98.882 | 99.947 |
| 1296 | 6 | 0.1 | 99.978 | 100.000 | 100.000 |
| 1296 | 6 | 1.0 | 97.826 | 99.977 | 100.000 |
| 1296 | 6 | 5.0 | 58.837 | 90.269 | 98.438 |
| 4096 | 8 | 0.1 | 99.948 | 100.000 | 100.000 |
| 4096 | 8 | 1.0 | 94.981 | 99.874 | 99.998 |
| 4096 | 8 | 5.0 | 28.869 | 65.081 | 87.443 |
| 10 000 | 10 | 0.1 | 99.899 | 100.000 | 100.000 |
| 10 000 | 10 | 1.0 | 90.483 | 99.537 | 99.985 |
| 10 000 | 10 | 5.0 | 8.976 | 30.873 | 57.320 |

From the table we deduce the following conclusions:

• The RingNet hierarchy functions well in the sense that, with high probability of 99.899%, a RingNet hierarchy with up to 10 000 DPs directly attached by a large number of MHs only needs simple and efficient Fast-Repair procedures to repair broken logical rings when the node failure probability is bounded by 0.1%. If at most two Slow-Repair procedures are allowed, then the Function-Well probability of the hierarchy is 100.0%.
• Under the definition of a Function-Well hierarchy with at most two Slow-Repair procedures, with high probability of 99.985%, a group with up to 10 000 DPs can guarantee that the hierarchy still functions well when the node failure probability is bounded by 1.0%.
• With the node failure probability increasing to 5.0%, the small-scale hierarchy still functions well with very high probability. For example, the Function-Well probability is 99.947% for a small-scale hierarchy with up to 256 DPs directly attached by MHs. However, the large-scale hierarchy with up to 10 000 DPs functions well with low probability of 57.320%.
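The enumeration procedure and the two probability formulae of this section can be sketched in a few lines of Python. This is not the program used to produce the tables; it is a minimal re-implementation under the definitions of Formulae (1) and (6), and the function names `count_b`, `ring_fw` and `hierarchy_fw` are assumptions made for this example.

```python
from itertools import combinations
from math import comb


def count_b(r, i):
    """B(r, i): ways to choose i faulty nodes on a ring of r nodes
    such that no two chosen nodes are neighbors (brute-force enumeration)."""
    total = 0
    for chosen in combinations(range(r), i):
        picked = set(chosen)
        # A choice is valid if the ring successor of every chosen node is not chosen.
        if all((x + 1) % r not in picked for x in chosen):
            total += 1
    return total


def ring_fw(r, f):
    """Formula (1): Function-Well probability t of one logical ring."""
    return sum(count_b(r, i) * (1 - f) ** (r - i) * f ** i
               for i in range(r // 2 + 1))


def hierarchy_fw(r, c, h, f, k):
    """Formula (6): Function-Well probability of a full hierarchy
    in which at most k logical rings may need Slow-Repair."""
    t = ring_fw(r, f)
    tn = sum((r * c) ** i for i in range(h - 2))  # i = 0 .. h-3
    return sum(comb(tn, i) * t ** (tn - i) * (1 - t) ** i
               for i in range(k + 1))


# Row r = 10 of Table 1, then one entry of Table 2 (h = 4, c = r).
print([count_b(10, i) for i in range(6)])
print(round(100 * hierarchy_fw(10, 10, 4, 0.001, 0), 3))
```

With r = c = 10, h = 4, f = 0.1% and k = 0, the sketch yields a value that agrees with the 99.899% entry of Table 2 to the precision shown there. The brute-force count is exponential in r, which is acceptable for the ring sizes considered here (r ≤ 10).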
5. Performance evaluation

In this section, we evaluate the performance of our topology maintenance protocol through extensive simulations. In our simulations, we used a variation of the one-round token passing membership propagation algorithm proposed in [24] in order to improve the efficiency of the membership propagation. Within each logical ring, a Token is used to propagate the membership information kept in each proxy’s MQ. In the normal case, when the Token arrives at a proxy, the proxy’s ListOfMembers[ ] is updated using Token.MQ, and Token.MQ is then updated using the proxy’s MQ. In this way, Token.MQ maintains the up-to-date membership change/update information piggybacked from all the proxies in the logical ring, and this information can be propagated along the logical ring efficiently. Also notice that in our simulations, we used a simplified version of the reliable multicast dissemination algorithm proposed in [25] in order to derive only the metrics related to the topology maintenance protocol proposed in this paper.

In this section, we first describe the performance metrics, which are designed to capture average-case behavior of the topology maintenance protocol, and then present the simulation scenarios and simulation results.

5.1. Performance metrics

Join-Delay is defined as the difference between the time at which an MH received the first multicast message and the time at which the MH issued its request to join the group.

Handoff-Delay is defined as the difference between the time at which an MH received the first multicast message from a new DP and the time at which the MH became aware that it had handed off to the new DP.

Service-Speed is defined as the difference between the time at which an MH’s membership information is successfully registered with the leader of the top logical ring and the time at which the MH issued its request to join the group.

Signaling-Overhead is defined as the total number of signaling messages received by all the proxies for topology maintenance during the simulation, divided by the total number of proxies and then by the total simulated time. It stands for the average signaling overhead of the proposed protocol. We call this metric the normalized number of signaling messages, or Norm.Num.Msgs for short.
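The four definitions above translate directly into simple arithmetic. The following Python sketch is only an illustration: the function and parameter names are assumptions, times are taken to be in seconds, and the sample numbers are hypothetical rather than measured values.

```python
def join_delay(t_join_issued, t_first_msg):
    """Time from issuing the join request to receiving the first multicast message."""
    return t_first_msg - t_join_issued


def handoff_delay(t_handoff_detected, t_first_msg_new_dp):
    """Time from detecting the handoff to the first message from the new DP."""
    return t_first_msg_new_dp - t_handoff_detected


def service_speed(t_join_issued, t_registered_at_top_leader):
    """Time from issuing the join request to registration at the top ring's leader."""
    return t_registered_at_top_leader - t_join_issued


def norm_num_msgs(total_signaling_msgs, num_proxies, simulated_time_s):
    """Signaling messages per proxy per second (Norm.Num.Msgs)."""
    return total_signaling_msgs / num_proxies / simulated_time_s


# Hypothetical example: 57 600 signaling messages, 96 proxies, 600 s.
print(norm_num_msgs(57600, 96, 600.0))  # 1.0 message per proxy per second
```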
5.2. Simulation scenarios

The ns-2 simulation tool [23] is used for our simulations. For each network topology, the DPs are configured into an m × n mesh, together with (m × n)/4 IPs at the IPT1 tier, (m × n)/4 IPs at the IPT2 tier, and (3 × m × n)/2 other proxies acting as intermediaries among the DPs and IPs, one of which serves as a multicast sender. Each DP is attached by one MH for sparse mode (SM) simulation and two MHs for dense mode (DM) simulation. At any time, around one group member appears within a set of eight DPs in SM simulation, and around one group member within each DP in DM simulation. Enough physical links are configured to guarantee good connectivity among all the proxies. Initially, each DP is connected to its parent IP through some intermediate proxies. To deal with fault tolerance, each DP is configured with four candidate IPs at the IPT1 tier for Proxy-Attach/Detach procedures, each IP at the IPT1 tier is configured with four candidate IPs at the IPT1 tier for Proxy-Join/Leave procedures and four candidate IPs at the IPT2 tier for Proxy-Attach/Detach procedures, and each IP at the IPT2 tier is configured with four candidate IPs at the IPT2 tier for Proxy-Join/Leave procedures. Fig. 4 shows an example 4 × 4 DP configuration with physical links at the DP tier.

We simulate 8 × 4, 12 × 9, 16 × 16, 20 × 25, and 24 × 36 DP configurations, with each IP at the IPT1 tier initially being attached by a set of 4 DPs, with each IP at the IPT2 tier initially being attached by 2, 3, 4, 5, 6 IPs at the IPT1 tier, respectively, and with each IP logical ring initially consisting of 2, 3, 4, 5, 6 IPs, respectively. The smallest topology in SM consists of 32 MHs, 32 DPs, 8 IPs at the IPT1 tier, 8 IPs at the IPT2 tier, and 48 other proxies, with a whole coverage area of 5360 m × 2680 m when the coverage area of each DP is 670 m × 670 m. Therefore, the total number of nodes in the smallest topology in SM is 128.
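The node counts of each simulated topology follow from the configuration rules above. The short Python sketch below reproduces them; the function name `topology_size` and the dictionary keys are assumptions made for illustration.

```python
def topology_size(m, n, mhs_per_dp):
    """Node counts for an m x n DP mesh.

    mhs_per_dp is 1 for sparse mode (SM) and 2 for dense mode (DM).
    """
    dps = m * n
    ipt1 = dps // 4          # (m x n)/4 IPs at the IPT1 tier
    ipt2 = dps // 4          # (m x n)/4 IPs at the IPT2 tier
    others = 3 * dps // 2    # (3 x m x n)/2 other proxies
    mhs = dps * mhs_per_dp
    wired = dps + ipt1 + ipt2 + others
    return {"MHs": mhs, "DPs": dps, "IPT1": ipt1, "IPT2": ipt2,
            "others": others, "wired": wired, "total": wired + mhs}


for m, n in [(8, 4), (12, 9), (16, 16), (20, 25), (24, 36)]:
    sm = topology_size(m, n, 1)["total"]
    dm = topology_size(m, n, 2)["total"]
    print(f"{m} x {n}: SM total = {sm}, DM total = {dm}")
```

For the 8 × 4 mesh in SM the sketch gives the 128 nodes stated above; the number of wired nodes is always 3 × m × n.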
The largest topology in DM consists of 1728 MHs, 864 DPs, 216 IPs at the IPT1 tier, 216 IPs at the IPT2 tier, and 1296 other proxies, with a whole coverage area of 16 080 m × 24 120 m when the coverage area of each DP is 670 m × 670 m. Therefore, the total number of nodes in the largest topology in DM is 4320. In all the scenarios, the total simulated time is fixed to 600 s. We fix the link bandwidth to 10 Mbps, the link delay to 10 ms,
Fig. 4. The example 4!4 DP configuration.
and the message loss rate to 1.0% for all the wired links; for all the wireless links between DPs and MHs, we fix the equivalent link bandwidth to 2 Mbps, the link delay to 20 ms, and the message loss rate to 2.0%. The multicast sender sends CBR traffic, with a message size of 512 bytes and a message rate of one message every 100 ms.

The proposed protocol uses several timeout settings. In order to detect whether a proxy’s current neighbors are faulty or not, the interval for sending heartbeat messages and the timeout for suspecting a node are set to 50 and 200 ms, respectively. In addition, in order to detect whether a proxy’s candidate neighbors are faulty or not, the interval for sending polling messages and the timeout for suspecting a node are set to 50 and 250 ms, respectively. The timeout interval at each DP for mobility detection is fixed to 1 s, and the timeout interval at each MH to refresh its status is set to 1 s. The timeout interval at the leader in each logical ring to propagate its membership to its parent (if one exists) is fixed to 1 s. The timeout interval for signaling message retransmission is set to 100 ms and the maximal number of retransmissions is set to 3. The timeout interval for each DP to check whether or not any member has been inactive for too long is set to 3 s. The Lazy-Leave/Detach timeout interval for any proxy to really leave/detach from the hierarchy is set to 3 s. The timeout interval for the Fast-Repair procedure to start the Slow-Repair procedure is set to 1 s.

To simulate dynamic locations of the MHs, we adopt the CMU mobility model [23] with a maximal speed of 15 m/s and a pause time of 5 s. To simulate dynamic membership, we design a Member-Join/Leave pattern with two parameters: the Minimal/Maximal Interval, defined as the minimal/maximal time interval between any two consecutive Member-Join/Leave events for the same MH, which is set to {50 s, 70 s}. The start time for an MH to trigger its Member-Join event is defined as a random variable ranging from 1.3 to 20.0 s. In order to dynamically keep the ratio of group members to all the MHs close to an expected value, each time a Member-Join event is triggered, a random variable is used to decide whether the MH really joins the group or not. For example, in order to keep the member ratio in SM simulations at 12.5%, i.e. around one member within a set of eight DPs, if the random variable associated with the Member-Join event of an MH is less than 12.5%, then the MH really joins the group; otherwise, the Member-Join event is neglected.

To simulate dynamic networks, a proxy failure is emulated by breaking all its incident links simultaneously, and we do not simulate any group member MH failure. We use the deterministic model in ns-2 [23] with the following four parameters: Start-Time, which denotes the time at which a proxy starts to be faulty; Up-Interval and Down-Interval, which denote the periods of time during which the proxy is up (i.e. non-faulty) and down (i.e. faulty), respectively; and Ratio, which is the ratio of the number of proxies that may become faulty to the total number of proxies in the simulation. Start-Time is set to a random number ranging from 0.0 to 100.0 s. The three-tuple {Up-Interval, Down-Interval, Ratio} represents the node failure probability, and is set to {95.0 s, 5.0 s, 0.2}, {95.0 s, 5.0 s, 1.0}, and {90.0 s, 10.0 s, 1.0} for node failure probabilities of 1.0, 5.0, and 10.0% in our simulations, respectively.
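One reading that is consistent with all three quoted settings is that the node failure probability equals the fraction of proxies that may fail multiplied by the fraction of time such a proxy is down. The following Python sketch checks this interpretation; the formula is an inference from the numbers above, not a definition given in the text, and the function name is an assumption.

```python
def node_failure_probability(up_s, down_s, ratio):
    """Assumed interpretation: Ratio * Down / (Up + Down)."""
    return ratio * down_s / (up_s + down_s)


for up, down, ratio in [(95.0, 5.0, 0.2), (95.0, 5.0, 1.0), (90.0, 10.0, 1.0)]:
    p = node_failure_probability(up, down, ratio)
    print(f"up={up} s, down={down} s, ratio={ratio}: {100 * p:.1f}%")
```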
Fig. 5. The simulation results for the scalability property.
Fig. 6. The simulation results for the reliability property.
5.3. Simulation results

We have conducted extensive simulations with SM and DM member populations for various network sizes and node failure probabilities. The reported results are averages over 10 independent simulation runs. Fig. 5 shows the scalability property, where the X-axis shows the network sizes and the Y-axis shows the performance metrics under evaluation. The network sizes appear as pairs of numbers, which stand for the total number of simulated nodes in SM and DM, respectively. In each sub-figure, two curves are plotted, one for the SM population and one for the DM population. We do not report the number of heartbeat/polling related messages in our simulations because they are localized to neighbors only and their numbers are relatively stable for different network settings with the same timeout intervals. Such messages include the heartbeat/greeting messages for mobility detection, the messages issued by each MH for its status refreshment, the messages issued by the leader in each logical ring to its parent for its membership update, and the heartbeat/polling messages for failure detection. Since our simulations show similar trends for the reliability property in all the topology settings, we report results only for the largest topology setting. In Fig. 6, the X-axis shows the node failure probabilities, and the Y-axis shows the performance metrics under evaluation. From the simulation results we observe that:

• Fig. 5 shows that the proposed protocol scales very well. When the size of the network becomes large and the density of members is fixed, the performance of the proposed protocol, i.e. the Join-Delay, Handoff-Delay and Service-Speed, remains good and varies within a small range, while the signaling overhead, i.e. Norm.Num.Msgs, is low and stays almost at the same level. For example, the Join-Delay values for the five network sizes in SM are 116.86, 120.40, 121.11, 124.15, and 128.92 ms, with the largest difference being only 128.92 − 116.86 = 12.06 ms, and the maximal Norm.Num.Msgs in SM is 2.2 signaling messages per node per second. Notice that the two curves of the signaling overhead metric tend to decrease when the network size increases. We explain the reason as follows. Since the Token messages for propagating membership change/update information in each logical ring are one of the major components of all the signaling messages, we compute the ratio of the total number of Tokens to the total number of wired nodes in the different network topologies. Because the total number of logical rings, i.e. the total number of Tokens, changes with topological changes of the RingNet hierarchy, we estimate it by its initial value when the simulation starts. For the five simulated network topologies, the estimated ratios are (1 + 2 × 2)/(3 × 8 × 4), (1 + 3 × 3)/(3 × 12 × 9), (1 + 4 × 4)/(3 × 16 × 16), (1 + 5 × 5)/(3 × 20 × 25), and (1 + 6 × 6)/(3 × 24 × 36), which are 5.21, 3.09, 2.21, 1.73, and 1.43%, respectively. The ratio decreases when the network size increases. As a consequence, the observed signaling overhead decreases when the network size increases.
• Fig. 6 shows that the performance is affected by the node failure probability of proxies in the hierarchy when both the network size and the density of members are fixed. As the node failure probability increases from 0.0 to 10.0%, the performance degrades gracefully. For example, the Join-Delay values for the four node failure probabilities in SM are 128.92, 133.57, 143.45, and 167.54 ms, with a difference of only 167.54 − 128.92 = 38.62 ms between the largest fault case and the fault-free case.
• Figs. 5 and 6 show that the performance is also affected by the density of members involved. (1) If a group is densely populated, e.g. in DM, then it is highly probable for an MH to receive messages immediately when the MH joins a DP. (2) If a group is sparsely populated, e.g. in SM, then an MH may have to wait for some time to receive messages when the MH joins a DP because the DP may have to join the hierarchy. These are the reasons why we observed an apparent gap between the two curves in the Join-Delay sub-figures. However, the gap between the two curves in the Handoff-Delay sub-figures is not apparent because our protocol uses a reservation mechanism, which reduces the difference between SM and DM.
• The Service-Speed metric is relatively stable in all the simulated scenarios because we fixed the height of the hierarchy to 4 and the timeout interval for generating membership change/update messages to 1 s. In a real communication environment, these parameters vary with different application requirements. In this sense, we reported a relative Service-Speed metric for evaluating the proposed protocol in different cases.
• The Signaling-Overhead is mainly affected by the density of members and is insensitive to network sizes and node failure probabilities. Since the size of the group in SM is smaller than that in DM with the same network size and the same node failure probability, the number of membership change/update messages and other related signaling messages in SM is naturally less than that in DM. According to the definition of Signaling-Overhead, which is the total number of signaling messages divided by the total number of simulated wired nodes in each time unit, the overhead in SM is lower than that in DM.
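The Token-to-wired-node ratios quoted in the first observation can be reproduced with a few lines of Python. The sketch below simply evaluates the expressions given in the text, i.e. (1 + k × k) Tokens over 3 × m × n wired nodes with k = 2, ..., 6 for the five topologies; the function and parameter names are assumptions made for illustration.

```python
def token_ratio(m, n, k):
    """Estimated ratio of Tokens (logical rings) to wired nodes.

    m, n -- dimensions of the DP mesh
    k    -- the per-topology value 2..6 used in the expressions of the text
    """
    tokens = 1 + k * k
    wired_nodes = 3 * m * n
    return tokens / wired_nodes


configs = [(8, 4, 2), (12, 9, 3), (16, 16, 4), (20, 25, 5), (24, 36, 6)]
for m, n, k in configs:
    print(f"{m} x {n}: {100 * token_ratio(m, n, k):.2f}%")
```

The printed percentages decrease monotonically with the network size, matching the explanation of the downward trend in the signaling overhead curves.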
6. Conclusions

In this paper, we proposed a self-organizable topology maintenance protocol for mobile next-generation networks based on a RingNet hierarchy of proxies. The proposed protocol runs in a parallel and distributed way, in the sense that each proxy in the hierarchy maintains local information about its current neighbors and candidate neighbors, and that each proxy independently decides whether to join or leave the hierarchy. Besides the topology maintenance protocol in this paper, we have also done some preliminary work on reliable and secure group communications using the hierarchy [25,26].
Acknowledgements

This work is supported by the Hong Kong Polytechnic University Central Research Grant G-YY41, the University Grant Council of Hong Kong under the CERG Grant PolyU/03E, and the National Basic Research Program (973) MOST of China under Grant No. CB312002.

References

[1] J.F. Huber, Mobile next-generation networks, IEEE Multimedia 11 (1) (2004) 72–83.
[2] I. Keidar, J. Sussman, K. Marzullo, D. Dolev, Moshe: a group membership service for WANs, ACM Transactions on Computer Systems 20 (3) (2002) 191–238.
[3] U. Varshney, Multicast support in mobile commerce applications, IEEE Computer 35 (2) (2002) 115–117.
[4] A. Dutta, J.M. Chennikara, W. Chen, O. Altintas, H. Schulzrinne, Multicasting streaming media to mobile users, IEEE Communications Magazine 41 (10) (2003) 81–89.
[5] A. Dutta, H. Schulzrinne, MarconiNet: overlay mobile content distribution network, IEEE Communications Magazine 42 (2) (2004) 64–75.
[6] H.W. Holbrook, S.K. Singhal, D.R. Cheriton, Log-based receiver-reliable multicast for distributed interactive simulation, Proceeding of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 1995), Cambridge, Massachusetts, United States, 1995, pp. 328–341.
[7] M. Hofmann, T. Braun, G. Carle, Multicast communication in large scale networks, Proceeding of the IEEE 1995 Workshop on the Architecture and Implementation of High Performance Communication Subsystems (HPCS 1995), 1995, pp. 147–150.
[8] S. Paul, K.K. Sabnani, J.C.-H. Lin, S. Bhattacharyya, Reliable multicast transport protocol (RMTP), IEEE Journal on Selected Areas in Communications 15 (3) (1997) 407–421.
[9] A.P. Markopoulou, F.A. Tobagi, Hierarchical reliable multicast: Performance analysis and placement of proxies, Proceeding of the NGC 2000 on Networked Group Communication, Palo Alto, California, United States, 2000, pp. 27–35.
[10] Y. Chawathe, S. McCanne, E.A. Brewer, RMX: Reliable multicast for heterogeneous networks, Proceeding of the IEEE Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2000), vol. 2, 2000, pp. 795–804.
[11] A. Acharya, B.R. Badrinath, A framework for delivering multicast messages in networks with mobile hosts, ACM/Kluwer Mobile Networks and Applications 1 (2) (1996) 199–219.
[12] K. Brown, S. Singh, RelM: Reliable Multicast for mobile networks, Computer Communications (Elsevier Science) 21 (16) (1998) 1379–1400.
[13] G. Anastasi, A. Bartoli, F. Spadoni, A reliable multicast protocol for distributed mobile systems: design and evaluation, IEEE Transactions on Parallel and Distributed Systems 12 (10) (2001) 1009–1022.
[14] E.A. Brewer, R.H. Katz, Y. Chawathe, S.D. Gribble, T. Hodes, G. Nguyen, M. Stemm, T. Henderson, E. Amir, H. Balakrishnan, A. Fox, V.N. Padmanabhan, S. Seshan, A network architecture for heterogeneous mobile computing, IEEE Personal Communications 5 (5) (1998) 8–24.
[15] T.B. Zahariadis, K.G. Vaxevanakis, C.P. Tsantilas, N.A. Zervos, N.A. Nikolaou, Global roaming in next-generation networks, IEEE Communications Magazine 40 (2) (2002) 145–151.
[16] E. Gustafsson, A. Jonsson, Always best connected, IEEE Wireless Communications 10 (1) (2003) 49–55.
[17] W. Kellerer, H.-J. Vogel, A communication gateway for infrastructure-independent 4G wireless access, IEEE Communications Magazine 40 (3) (2002) 126–131.
[18] T. Tamura, T. Takahashi, T. Morita, K. Ohtaki, H. Takeda, IMT-2000 core network node systems, IEEE Wireless Communications 10 (1) (2003) 15–21.
[19] M.M. Buddhikot, G. Chandranmenon, S. Han, Y.-W. Lee, S. Miller, L. Salgarelli, Design and implementation of a WLAN/CDMA2000 interworking architecture, IEEE Communications Magazine 41 (11) (2003) 90–100.
[20] S. Deering, D.R. Cheriton, Multicast routing in datagram internetworks and extended LANs, ACM Transactions on Computer Systems 8 (2) (1990) 85–110.
[21] C. Perkins, IP mobility support, IETF RFC 2002, 1996.
[22] Q. Tian, D.C. Cox, Optimal replication algorithms for hierarchical mobility management in PCS networks, Proceeding of the IEEE Wireless Communications and Networking Conference (WCNC 2002), vol. 2, 2002, pp. 556–562.
[23] http://www.isi.edu/nsnam/ns/
[24] G. Wang, J. Cao, K.C.C. Chan, RGB: a scalable and reliable group membership protocol in mobile Internet, Proceeding of the IEEE 33rd International Conference on Parallel Processing (ICPP 2004), Montreal, Quebec, Canada, August 2004, pp. 326–333.
[25] G. Wang, J. Cao, K.C.C. Chan, A reliable totally-ordered group multicast protocol for mobile Internet, Proceeding of the IEEE 33rd International Conference on Parallel Processing Workshops (ICPP 2004 Workshops), Montreal, Quebec, Canada, 2004, pp. 108–115.
[26] G. Wang, L. Liao, J. Cao, K.C.C. Chan, Key management for secure multicast using the RingNet hierarchy, Proceeding of the 2004 International Symposium on Computational and Information Sciences (CIS 2004), Shanghai, P.R. China, 2004, pp. 77–84.
Dr. Guojun Wang was a Research Fellow (March 2003-March 2005) in the Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. He is currently an Associate Professor (September 2001-) in the Department of Computer Science, The Central South University, Changsha, P. R. China. He is also the director of the Mobile Computing Lab in the department. He received his B.Sc. degree in Geophysics, M.Sc. degree in Computer Science, and Ph.D. degree in Computer Science, from The Central South University, in 1992, 1996, 2002, respectively. His research interests include computer networks, group communications, fault tolerance, and mobile computing. He has published over 80 technical papers in the above areas. He has served as a reviewer for many international journals such as IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, and also as a TPC member for many international conferences such as ICC, IWCMC, EUC, ISPA, and MSN. He is also a senior member of the China Computer Federation, a YOCSEF member of the China Computer Federation, and a member of the Hunan Provincial Association of Computers.
Prof. Jiannong Cao received the BSc degree in computer science from The Nanjing University, Nanjing, China in 1982, and the MSc and the Ph.D. degrees in computer science from The Washington State University, Pullman, WA, USA, in 1986 and 1990 respectively. He is currently a Professor in the Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong. He is also the director of the Internet and Mobile Computing Lab in the department. He was on the faculty of computer science at The James Cook University and The University of Adelaide in Australia, and The City University of Hong Kong. His research interests include parallel and distributed computing, networking, mobile computing, fault tolerance, and distributed software architecture and tools. He has published over 160 technical papers in the above areas. He has served as a member of editorial boards of several international journals, a reviewer for many international journals / conference proceedings, and also as an organizing / programme committee member for many international conferences. Dr. Cao is a member of the IEEE Computer Society, the IEEE Communication Society, IEEE, and ACM. He is also a member of the IEEE Technical Committee on Distributed Processing, IEEE Technical Committee on Parallel Processing, IEEE Technical Committee on Fault Tolerant Computing, and Computer Architecture Professional Committee of The China Computer Federation.
Prof. Keith C. C. Chan received the B.Math. degree in Computer Science and Statistics, and the M.A.Sc. and Ph.D. degrees in Systems Design Engineering from The University of Waterloo, Waterloo, Ontario, Canada. He had worked for a number of years at The IBM Canada Laboratory, Toronto, Ontario, where he was involved in the development of software engineering tools. In 1993, he joined the Department of Electrical and Computer Engineering, The Ryerson University, Toronto, Ontario, Canada, as an Associate Professor and in 1994, he joined the Department of Computing of The Hong Kong Polytechnic University, Hong Kong where he is now Professor and Head. He is also an Adjunct Professor of the Institute of Software, The Chinese Academy of Sciences, Beijing, China. He is active in consultancy and has served as a consultant to government agencies and various companies in Hong Kong, China, Singapore, Malaysia and Canada. His research interests are in data mining, bioinformatics and software engineering.