On selective tuning in unreliable wireless channels


Data & Knowledge Engineering 28 (1998) 209-231

Kian-Lee Tan*, Beng Chin Ooi

Dept. of Information Systems and Computer Science, National University of Singapore, Lower Kent Ridge Road, Singapore 119260, Singapore

Received 30 October 1997; revised 23 February 1998; accepted 27 March 1998

Abstract

In a wireless computing environment, the server disseminates information by periodically broadcasting data on 'air', while clients 'catch' their desired data on the fly. To minimize energy consumption, these data are usually multiplexed with indexes to facilitate selective tuning. However, most existing mechanisms are designed with the inherent assumption that the wireless channel is reliable. In practice, the transmitted signals are prone to path loss, fading, interference and time dispersion. Because of these impairments, clients may receive 'corrupted' data or miss their data altogether. As a result, clients may have to wait for the next, or even subsequent, broadcast cycles to receive their data correctly, which significantly increases both the access time and the tuning time of data retrieval. In this paper, we propose three selective tuning mechanisms for unreliable wireless channels that can effectively keep the access time low without incurring excessive tuning time. These schemes are variations of three existing schemes: the tree-based, hash-based and flexible schemes. The basic idea is to continue the search process within the current broadcast when an access failure occurs, rather than restarting the search from the next broadcast. We conducted an extensive simulation study to evaluate the effectiveness of these schemes. Our results demonstrate that these schemes can keep the tuning and access times low in unreliable channels. No single scheme outperforms the others in all cases. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Wireless computing; Selective tuning; Unreliable wireless channels; Tuning time; Access time

* Corresponding author. E-mail: [email protected]

0169-023X/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved. PII: S0169-023X(98)00018-4

1. Introduction

Recent advances in wireless networks and computer downsizing technologies have led to the development of the concept of mobile computing. In the near future, tens of millions of mobile users will be equipped with small, yet powerful, battery-operated laptops. Through the wireless networks, these portable devices will become an integral part of the existing distributed computing environment, and mobile users will be able to access information anywhere and at any time. Because of the competitive advantage conferred by access to information independent of location and time, the potential market for mobile computing applications has been estimated to be billions of dollars annually [9].

A wireless communication channel typically comprises a downlink subchannel and an uplink subchannel (in practice, there can be more than one downlink and uplink subchannel). To cope with the limited wireless bandwidth (typically 19.2 kbps), frequently demanded data are periodically broadcast on the downlink channel by the server. Mobile clients listen to the channel and download the desired data by filtering the incoming data stream. The main advantage of this method is that it is independent of the number of clients tuning to the channel. Moreover, for data that are less frequently requested, the clients can submit query requests on the uplink channel and the server responds by sending the data on the downlink channel. This demand-driven approach can minimize the waste of bandwidth used for broadcasting. In this paper, we focus on the push model of information dissemination, i.e. data are disseminated via periodic broadcasting.

As argued in [2,10], power conservation is a key issue for small palmtops that typically run on small AA batteries. For an 'average user', the power source is expected to last 2-3 hours before replacing or recharging becomes necessary. Therefore, it is important to manage the power resource effectively to extend the battery life. Several recent papers [10-12,14,18,26] propose efficient techniques for organizing broadcast data such that clients can navigate and selectively tune to the desired records. The effect of selective tuning is that battery consumption can be reduced, and hence the effective battery life is extended. However, selective tuning mechanisms have largely been studied in the context of a reliable wireless channel.

In reality, data transmission in a wireless communication environment is vulnerable to noise and signal distortion. If some of a mobile client's data are 'corrupted' (and cannot be corrected), the client may have to wait for the next, or even subsequent, broadcast cycles to receive the desired data. This significantly increases not only the access time of the data retrieval but also the tuning time. Thus, the issues of receiving data correctly and recovering from errors are important.

In this paper, we reexamine the issue of selective tuning in the context of unreliable wireless channels. Traditional methods that multiplex indexes with the data are not very helpful here because some of the index buckets along the path of the search process may be corrupted. In such cases, the search process may have to be restarted from scratch, which can lead to high access time and tuning time. We propose three selective tuning mechanisms for unreliable wireless channels that can effectively keep the access time low without incurring excessive tuning time. These schemes are variations of three existing schemes: the tree-based, hash-based and flexible schemes. The basic idea is to continue the search process within the current broadcast when an access failure occurs, rather than restarting the search from the next broadcast. We conducted an extensive simulation study to evaluate the effectiveness of these schemes. Our results demonstrate that these schemes can keep the tuning and access times low in unreliable channels. No single scheme outperforms the others in all cases.

The remainder of this paper is organized as follows. In the next section, we present the background to this work and review existing work. In Section 3, we examine the three proposed techniques: the tree-based, hash-based and flexible selective tuning schemes. We present the simulation study of the proposed schemes and report our findings in Section 4. Finally, we conclude in Section 5 with directions for future work.


2. Preliminaries

2.1. Basics of wireless broadcasting

We adopted the mobile environment discussed in [4,10]. The environment consists of two distinct sets of entities: a large number of mobile hosts and relatively fewer, but more powerful, fixed hosts (called servers). The servers are interconnected via a wired communication network. Some of the servers, called mobile support stations, are equipped with wireless communication capability, and are responsible for disseminating data via the wireless channels to mobile hosts that are within their areas of coverage (called wireless cells).

For simplicity, we also made several reasonable assumptions. First, at any given instant of time, a mobile host may logically belong to only one cell; its current cell defines its location. Next, each server has a complete copy of the database, i.e. the database is fully replicated across all the servers. Furthermore, data are read-only, i.e. there are no updates either by the clients or at the servers. Finally, clients do not cache objects.¹ Caching is an orthogonal issue and has been studied in [4,7,25]. Thus, it suffices for us to restrict our discussion to just one fixed host and its wireless cell.

Retrieving data via wireless broadcasting involves the following:
• At the server, the file whose records can be identified by values of a particular attribute (for example the key) is transmitted on the downlink channel.
• At the client, each record on the downlink channel is examined until the desired record is found.

There are two metrics that can be used to characterize the performance of data access of a mobile client: access time and tuning time. The access time is the time elapsed from the moment a client requests a record to the point when the desired record is downloaded by the client. The tuning time, on the other hand, is the time spent by a client listening to the channel. Since listening to the channel requires the CPU to be in full operation (active mode), the tuning time is often used as a measure of the energy efficiency of data access.

The most straightforward broadcasting technique is to send the file periodically. This results in an average access time of half the time between successive broadcasts of the file. However, this method requires a tuning time equal to the access time. To reduce the energy consumption, Imielinski, Viswanathan and Badrinath [10,11] introduced the concept of selective tuning. The basic idea is to organize broadcast data such that the CPU can operate in the less power-consuming doze mode most of the time and wake up to listen to the channel only when the data of interest are broadcast.² This is realized by broadcasting an index with the file. Thus, the client will now perform the following:
• Operate in active mode to listen to the channel for the next available index for the file.
• Operate in doze mode until the desired index is broadcast.
• Operate in active mode to access the index to determine the address of the desired record.
• Operate in doze mode until the desired record is broadcast.
• Operate in active mode to retrieve the desired record.
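To make the trade-off between the two metrics concrete, the following minimal sketch estimates the energy spent on one data access from its access time and tuning time. It uses the active-mode and doze-mode power figures quoted for the Hobbit chip in the footnote; the function name `energy_mj` and the example timings are assumptions chosen for illustration only, and the model assumes the client is either active or dozing for the whole access time.

```python
ACTIVE_MW = 250.0  # active-mode power (mW), from the Hobbit chip footnote
DOZE_MW = 50.0     # doze-mode power (mW), from the same footnote


def energy_mj(access_time_s, tuning_time_s):
    """Energy (mJ) for one access: active while tuning, dozing otherwise."""
    if not 0 <= tuning_time_s <= access_time_s:
        raise ValueError("tuning time must lie between 0 and the access time")
    doze_time_s = access_time_s - tuning_time_s
    return ACTIVE_MW * tuning_time_s + DOZE_MW * doze_time_s


# Hypothetical timings: a flat broadcast keeps the client active throughout,
# while an indexed broadcast is slightly longer but is mostly spent dozing.
flat = energy_mj(access_time_s=10.0, tuning_time_s=10.0)
indexed = energy_mj(access_time_s=12.0, tuning_time_s=1.0)
print(flat, indexed)
```

Under these illustrative numbers the indexed broadcast has a longer access time but a much lower energy cost, which is the effect selective tuning aims for.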

¹ Alternatively, one can picture this as restricting our discussion to requests whose data have not been cached or have been invalidated.

² The Hobbit chip by AT&T is a processor that consumes 250 mW in active mode and only 50 mW in doze mode [16].


While selective tuning can reduce the tuning time, the average access time becomes longer as a result of broadcasting the index as well (since the broadcast channel is a sequential medium). As in [10,11], each version of a file along with the associated index information constitutes a broadcast cycle. Furthermore, (index or data) records are organized into buckets, where a bucket is the smallest unit of broadcast and access. We also restrict our discussion to applications whose file size remains static most of the time. In other words, the content of the file may be updated, but insertions and deletions are infrequent.

Fig. 1 illustrates a data broadcast. In the figure, the file has been split into k data segments, D_1, D_2, ..., D_k. Associated with each data segment is an index segment. Note that the index segment directs the search to data in other data segments, and provides a direct offset to data in its associated data segment. As shown in the figure, a broadcast cycle comprises k index segments and k data segments. When the channel is error-free, the access time corresponds to the time from the moment the client tunes in to the time when the desired data are downloaded. However, if failures can occur, the access time can be much larger.

2.2. Fault on air

Wireless transmission is, unfortunately, error-prone. Many factors contribute to this. Signals may be lost due to external interference such as electromagnetic noise, or obstruction by tall buildings and trees. The signal may also be weakened by the clients' greater distance from the server or the high speed at which the clients are moving. In addition, data may be delayed when clients move from one cell to another (since the data may not be broadcast in the same manner), or because of congested network traffic. In this paper, we shall restrict our discussion to the cases when data are lost or corrupted. We assume that there is a mechanism for a client to detect that its data are corrupted. One standard technique is the cyclic redundancy code [20], which has a low overhead of redundant information and is efficient to compute.

Data loss can occur at any bucket, index or data. With selective tuning, data buckets are examined on only two occasions: during the initial probe and during the final probe for the target data bucket. For the former, we need to probe repeatedly until an error-free bucket is obtained. For the latter, there is nothing to be done except for the client to wait for the next appearance of the data item in the next cycle. This may involve restarting the search process (unless the cycle length is fixed and known to the clients).

Fig. 1. A data broadcast. [Figure labels: first tune; desired object; desired object (cycle j+1); access time for the error-free, failure and fault-tolerant cases.]

If an access failure occurs while navigating through the sequence of index buckets, the problem is much harder to handle. This is because the sequence of accesses is critical for clients to continue the search process, and a failure will render the sequence out of order. Under existing selective tuning schemes, there is no mechanism for the clients to continue searching. Instead, the clients have to restart the search process or reprobe for the erroneous bucket (provided the cycle length is known and fixed). Both methods can lead to high access time: restarting does not guarantee that errors will not occur in the new sequence of probes, and may miss the data if they are in the same segment as the erroneous bucket; reprobing implies waiting for the next broadcast cycle for the same bucket to be available. Furthermore, in the event that the sequence of index buckets is corrupted but the target data bucket is not, waiting for a complete broadcast cycle (which can be long) is certainly unnecessary and undesirable.

Referring to Fig. 1, suppose the client first tunes in at segment D_{k-1} of cycle (j - 1) and its desired data are in segment D_2. Suppose also that the access sequence, if the channel is reliable, is: buckets in I_k, buckets in I_1, buckets in I_2 and the target data bucket in D_2. Suppose an error occurs when probing the sequence of buckets in I_2. Then, either restarting the search or reprobing implies that the client will miss the target bucket in D_2, so the access time increases by at least one broadcast cycle. Yet if the target data bucket is not corrupted, the client could have retrieved the data with the same access time as in an error-free channel.

While tuning time is critical, we believe that the access time should not be excessively high. Instead, it is important to design access methods that avoid unnecessary increases in the access time without incurring excessive tuning time. In other words, if the target data bucket is not corrupted, it should be retrieved without having to wait for the next broadcast cycle; selective tuning mechanisms should reduce the tuning time without significantly increasing the access time.

2.3. Related work

Disseminating information to a large number of users using the broadcast paradigm has been studied for many years. Work has been done in the context of fixed networks [1,3,6,24], FM channels [8] and wireless networks [10-12,14,18,22]. The nice property of broadcasting is that its performance is not affected by the actual number of users; in other words, it is a scale-free mechanism. Since the clients in [1,3,6,24] have an 'unlimited' power supply, selective tuning is not an issue there. Furthermore, with the exception of [15,22], all other work ignores the issue of channel reliability.

Ammar and Wong [3] were the first to discuss the scheduling problem for broadcast information delivery. In [24], an analytical study of this problem was presented and the condition for an optimal schedule was determined. Vaidya and Hameed [22] extended the results to nonhomogeneous objects and unreliable channels.

The use of selective tuning mechanisms in the broadcast paradigm has been investigated in [10-12,14,18]. Essentially, the techniques multiplex indexes with the data objects broadcast. By following the indexes, clients know when the desired objects will be broadcast, and so they can operate in doze mode most of the time. Several categories of selective tuning schemes have been proposed: tree-based [10], hash-based and flexible [11], and signature-based [14]. The use of secondary indexes for selective tuning was addressed in [12].


Acharya et al. [1] studied nonuniform broadcast (where some records are broadcast more frequently than others) to improve the average performance for frequently accessed data. The work focused on structuring a broadcast program that determines how data should be broadcast. In addition, several new cache management policies to support nonuniform broadcasting were developed and studied. However, power management is not an issue in that study because the clients are connected to the wired network and have an 'infinite' power supply. Tan and Yu integrated the concepts of selective tuning and nonuniform broadcast and came up with several data organization methods for indexing data on air [18,26].

Leong and Si [15] examined the issue of reliable broadcast in a wireless environment. Their work is based on a multiple-channel environment in which one channel is used exclusively for index broadcast. However, the basic assumption in their paper is that the index channel has very high reliability, and thus the focus is restricted to the broadcast schedule of data over multiple unreliable data channels. Two schemes were proposed. In the first scheme, every channel broadcasts the entire file. To reduce the access time, however, the file is broadcast at a different pace on each channel, such that the data broadcast over channel j lag behind those over channel j - 1 by a phase difference. In this way, when a client misses its data in channel j - 1, it can switch to channel j to obtain the data without waiting for the next broadcast in the same channel. Furthermore, since data appear in all channels, the client can obtain its data from other channels if some of them are unreliable. The second method is designed for file access and is based on Rabin's Information Dispersal Algorithm (IDA) [17]. The IDA transforms an n-block file into N blocks (N > n) and guarantees that any n of the N blocks can be used to reconstruct the file. A similar technique is also proposed in [5] for time-critical applications.

3. Selective tuning schemes

In this section, we present the proposed selective tuning schemes. For each scheme, we first review the predecessor from which it is derived, and then propose the variation that is robust to access failures.

3.1. Tree-based scheme

3.1.1. The basic scheme

The tree-based selective tuning schemes were first proposed in [10]. We shall restrict our discussion to the distributed indexing scheme with the partial path replication technique. In this scheme, a data file is associated with a B+-tree index structure (see Fig. 2a). Since the broadcast channel is a sequential medium, the data file and index must be flattened so that the data and index are broadcast following a preorder traversal³ of the tree. Furthermore, the index comprises two portions: the first k levels of the index are partially replicated, and the remaining levels are not replicated. Essentially, a node at a replicated level is replicated once at the beginning of the broadcast of each of its child nodes. For example, the root node is replicated only at the first a_1, the first a_2, and so on, and node a_1 is replicated at the beginning of b_1, b_2 and b_3. Fig. 2b shows the organization of the broadcast for the data file of Fig. 2a. Note that the broadcast effectively comprises segments, each of which corresponds to the data rooted at a node at the (k + 1)th level.

³ The original scheme broadcasts the index structure in preorder fashion only. In our variant, this is extended to the data file as well.

Fig. 2. Illustration of the distributed indexing scheme with partial path replication. (a) A partial data file and its index tree. (b) The data broadcast.

To facilitate selective tuning, each node contains meta-data that help in the traversal of the tree. All non-replicated (index and data) buckets contain a pointer that directs the search to the next copy of their replicated ancestors. For example, data buckets 1, 2 and 3 provide an offset to the second a_1. On the other hand, all replicated buckets contain two pairs, (x_1, y_1) and (x_2, y_2), that can continue the search in the appropriate segments. The first pair indicates that key values less than x_1 can be found y_1 offset away, while the second pair indicates that key values greater than x_2 can be found y_2 offset away. As an example, the first a_1 will contain the pairs (1, offset to first I) and (81, offset to second I).

The major problem with this basic scheme is that there is no mechanism to handle access failures. Consider, for example, a client that tunes in at d_1 and is accessing data record 40. If there are no access failures, the access path would be: d_1, second a_1, b_2, c_5, d_14 and the data bucket containing record 40. However, if a failure occurs at b_2, then, assuming there are no more failures, the access path becomes: d_1, second a_1, b_2 (error), c_4, third a_1, second I, first a_1, b_2, c_5, d_14 and the target data bucket. The consequence is that the access time is increased by one broadcast cycle. This increase is unnecessary if the target data bucket is not corrupted at all.
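The broadcast order produced by partial path replication can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree is a toy (name, children) structure in which the subtrees below the b nodes are elided, the nodes `a2` and `b4` are hypothetical placeholders for the part of the file not shown, and `flatten` is a name chosen here. A node at a replicated level (depth < k, counting the root as depth 0) is emitted once ahead of each of its children; every other node is emitted once in preorder.

```python
def flatten(node, depth, k):
    """Yield bucket names in broadcast order for a (name, children) tree.

    Nodes at depth < k are replicated: one copy precedes each child subtree.
    """
    name, children = node
    if depth < k and children:
        for child in children:
            yield name                       # replicated copy ahead of this child
            yield from flatten(child, depth + 1, k)
    else:
        yield name                           # non-replicated node: plain preorder
        for child in children:
            yield from flatten(child, depth + 1, k)


# Toy index: root I, then a-level and b-level nodes (deeper levels elided).
tree = ("I", [("a1", [("b1", []), ("b2", []), ("b3", [])]),
              ("a2", [("b4", [])])])

order = list(flatten(tree, depth=0, k=2))
print(order)
```

With k = 2, the copies of a1 ahead of b1, b2 and b3 correspond to the first, second and third a_1 referred to in the text, and the copy of I ahead of a2 corresponds to the second I.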

3.1.2. The enhanced scheme

The enhanced scheme is very similar to the basic scheme except that each index node (except the root) has an additional (maxvalue, pointer) pair. The maxvalue of a node is the largest data value that can be found in the subtree rooted at the node. Referring to Fig. 3, the maxvalues of nodes b_1 and c_2 are 27 and 18, respectively. For any node, each ancestor's maxvalue is greater than or equal to its own maxvalue. We call those ancestors whose maxvalue equals that of the node the E_ancestors of the node. Note that a rightmost child node may have more than one E_ancestor. We refer to the one furthest away (closest to the root) as its F_ancestor. For example, d_9 has two E_ancestors, c_3 and b_1, and b_1 is its F_ancestor. Similarly, d_27 has three E_ancestors, c_9, b_3 and a_1, with a_1 being its F_ancestor.

Fig. 3. An enhanced index tree.

The additional pointer comes in two flavors:
• For all the children of a node, except its rightmost child, the pointer provides the offset to its right sibling. In Fig. 3, c_1 points to c_2, and c_2 points to c_3.
• For the rightmost child of a node, the pointer points to the right sibling of its F_ancestor. In Fig. 3 again, d_3 points to c_2 since c_2 is the right sibling of its F_ancestor c_1. By the same logic, d_27 and c_9 both point to a_2.

Fig. 3 shows the tree structure. Its broadcast organization turns out to be exactly the same as that in the basic scheme. The revised selective tuning scheme works with the following access protocol (assuming the desired record is in the broadcast):
(1) Tune to the current bucket and listen for an error-free bucket.
(2) If the bucket is an index bucket, examine the entries to determine the offset of the next bucket that will eventually lead to the target data bucket. This may require examining the (maxvalue, pointer) pair. Switch to doze mode until that bucket is broadcast and goto 1.
(3) If the bucket is a data bucket, examine the data bucket for the desired record. If the record is found, terminate; otherwise goto 1.

We use Fig. 2b and Fig. 3 for illustration. Suppose the desired data has a key value of 54 and the initial probe occurs at d_1. Then:
• If there are no faulty accesses, the sequence of buckets accessed is: d_1, d_2, d_3, c_2, c_3, b_2, c_6, d_18 and the data bucket containing key 54.
• If bucket d_3 is faulty, then the sequence becomes: d_1, d_2, d_3, the data bucket containing 7-9, c_2, c_3, b_2, c_6, d_18 and the target data bucket.
• If bucket b_2 is faulty, then the sequence becomes: d_1, d_2, d_3, c_2, c_3, b_2, c_4, c_5, c_6, d_18 and the target data bucket.
• If buckets b_2 and c_4 are faulty, then the sequence becomes: d_1, d_2, d_3, c_2, c_3, b_2, c_4, d_10, d_11, d_12, c_5, c_6, d_18 and the target data bucket.
• If the target data bucket is corrupted, then the sequence becomes: d_1, d_2, d_3, c_2, c_3, b_2, c_6, d_18, the target data bucket, third a_1, second I, second a_1, b_2, c_6 and the target bucket again.

Note that the access time is the same for all the cases as long as the data bucket that contains the desired record is not corrupted. Thus, the proposed enhancements can maintain the expected access time while keeping the energy consumption low.
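The access protocol and the probe sequences listed above can be reproduced with a small simulation. The sketch below is illustrative only and rests on several assumptions: it models just the subtree under a_1 (keys 1 to 81, three children per index node, one data bucket of three keys per d node), omits the replicated copies of I and a_1 from the broadcast, assumes the target has not yet been passed when the client tunes in, and treats a corrupted bucket as "keep listening to the next bucket". All class and function names are hypothetical.

```python
class Node:
    def __init__(self, name, keys=()):
        self.name = name
        self.keys = tuple(keys)      # non-empty only for data buckets
        self.children = []
        self.parent = None
        self.maxvalue = max(self.keys) if self.keys else None
        self.next_ptr = None         # the extra pointer of the enhanced scheme


def add(parent, child):
    child.parent = parent
    parent.children.append(child)
    return child


def build_tree():
    """a1 -> b1..b3 -> c1..c9 -> d1..d27 -> data buckets holding keys 1..81."""
    a1, key, ci, di = Node("a1"), 1, 0, 0
    for bi in range(1, 4):
        b = add(a1, Node(f"b{bi}"))
        for _ in range(3):
            ci += 1
            c = add(b, Node(f"c{ci}"))
            for _ in range(3):
                di += 1
                d = add(c, Node(f"d{di}"))
                add(d, Node(f"data {key}-{key + 2}", range(key, key + 3)))
                key += 3
    return a1


def right_sibling(n):
    if n.parent is None:
        return None
    sibs = n.parent.children
    i = sibs.index(n)
    return sibs[i + 1] if i + 1 < len(sibs) else None


def fill_max(n):
    """Set maxvalue of every index node to the largest key in its subtree."""
    if n.children:
        n.maxvalue = max(fill_max(c) for c in n.children)
    return n.maxvalue


def fill_ptr(n):
    """Point each non-root index node at its right sibling, or at the right
    sibling of its F_ancestor when it is a rightmost child."""
    if n.children and n.parent is not None:
        target = right_sibling(n)
        if target is None:
            f = n.parent
            while f.parent is not None and f.parent.maxvalue == n.maxvalue:
                f = f.parent
            target = right_sibling(f)
        n.next_ptr = target
    for c in n.children:
        fill_ptr(c)


def preorder(n):
    yield n
    for c in n.children:
        yield from preorder(c)


def search(root, start, key, faulty=()):
    """Return the names of the buckets probed, or None if the key is not reached."""
    order = list(preorder(root))
    pos = [n.name for n in order].index(start)
    trace = []
    while pos < len(order):
        n = order[pos]
        trace.append(n.name)
        if n.name in faulty:                 # corrupted: listen to the next bucket
            pos += 1
        elif n.keys:                         # data bucket
            if key in n.keys:
                return trace
            pos += 1
        elif key <= n.maxvalue:              # target lies below this index node
            pos = order.index(next(c for c in n.children if key <= c.maxvalue))
        elif n.next_ptr is None:
            return None
        else:                                # skip this subtree via the extra pointer
            pos = order.index(n.next_ptr)
    return None


root = build_tree()
fill_max(root)
fill_ptr(root)
print(search(root, "d1", 54))
print(search(root, "d1", 54, faulty={"b2", "c4"}))
```

In each fault case the simulated client reaches the data bucket holding key 54 within the same pass over the broadcast, only probing a few more buckets along the way.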

3.2. Hash-based scheme

3.2.1. The basic scheme

In a hash-based scheme, data are hashed into a set of partitions. Because of the distribution of the data, partitions may differ in size. Partitions are further organized into fixed-size buckets, i.e. each partition consists of one or more buckets, with the last bucket possibly being only partially full. The data are then broadcast in partitions as a sequence of buckets. Fig. 4 shows an example with 4 partitions. In the figure, each box represents a bucket. An unfilled box represents the beginning of a partition. As shown, partitions 1, 2, 3 and 4 have 4, 6, 3 and 3 buckets, respectively.

Fig. 4. A broadcast with 4 partitions. [Figure labels: first buckets of partitions; hash buckets.]

To facilitate selective tuning, every bucket contains some control information: a bucket id, which is the offset of the bucket from the beginning of the version; a hash function (denoted h); and a pointer, which is the offset to the beginning of the next version. From this information, a client can doze off and tune to the beginning of the next version if it has missed the desired partition. Furthermore, for a file with N partitions, N (predetermined) buckets contain additional control information: a shift pointer. We shall refer to these N buckets as the hash buckets. The hash buckets are needed because the offset of the desired partition cannot be determined from the hash function alone, since partitions may be of unequal length and the broadcast channel is a sequential medium. The hash buckets thus provide an indirect addressing mechanism to the partitions: the hash function on a key returns the bucket id of one of the hash buckets, which identifies the target hash bucket. The shift pointer stored in the hash bucket provides the offset (from the hash bucket) to the actual address of the desired partition. The client therefore finds its desired data by tuning to the broadcast, obtaining the hash function, employing it to determine the hash bucket, dozing off until the hash bucket is broadcast, obtaining the shift pointer, dozing off until the desired partition is broadcast and, finally, examining the buckets of the target partition for the desired record.

We shall focus on the Hashing B protocol in [11], which is shown in Fig. 4. Basically, in this technique, the hash buckets are spaced out evenly (except for the last hash bucket). The gap between hash buckets is given by the size of the smallest partition in terms of number of buckets. For example, in Fig. 4, each hash bucket is 3 buckets away from the previous one, since the smallest partition contains 3 buckets. This method has been shown to outperform the simple approach of assigning the first N consecutive buckets in a broadcast as hash buckets (which is equivalent to having a gap of 1).
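The indirect addressing described above can be sketched for the partition sizes of Fig. 4. This is a minimal illustration under assumptions that are not spelled out in the text: bucket ids are 0-based, the i-th hash bucket (counting from 0) sits at bucket id i × gap, and the stated exception for the last hash bucket is ignored. The function names are hypothetical.

```python
def hashing_b_layout(partition_sizes):
    """Return (hash_buckets, starts, shifts) as lists of bucket ids/offsets."""
    gap = min(partition_sizes)               # size of the smallest partition
    starts, pos = [], 0
    for size in partition_sizes:
        starts.append(pos)                   # id of the partition's first bucket
        pos += size
    hash_buckets = [i * gap for i in range(len(partition_sizes))]
    # Shift pointer: offset from each hash bucket to its partition's first bucket.
    shifts = [s - h for s, h in zip(starts, hash_buckets)]
    return hash_buckets, starts, shifts


def probe_sequence(tune_in, partition, hash_buckets, shifts):
    """Buckets read before scanning the partition, when nothing has been missed."""
    h = hash_buckets[partition]
    if tune_in > h:
        raise ValueError("hash bucket already missed; not covered by this sketch")
    probes = [tune_in]
    if h != tune_in:
        probes.append(h)                     # doze until the hash bucket
    start = h + shifts[partition]
    if start != h:
        probes.append(start)                 # doze until the partition begins
    return probes


hash_buckets, starts, shifts = hashing_b_layout([4, 6, 3, 3])
print(hash_buckets, starts, shifts)
print(probe_sequence(0, 2, hash_buckets, shifts))   # third partition, index 2
```

Because every partition is at least as large as the gap, each partition starts at or after its hash bucket, so the shift pointers computed this way are never negative.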

3.2.2. The enhanced scheme

The basic scheme collapses when the channel is error-prone. For example, if the bucket containing the shift pointer is corrupted, then the client can no longer obtain the starting address of the partition, i.e. the client may have to redo its search process from scratch, which increases the access time by at least one broadcast cycle. This is undesirable, especially when the target partition is far away from the hash bucket (e.g. partitions 3 and 4 in Fig. 4) and the target partition is not corrupted.

Fig. 5. An improved hash-based scheme of Fig. 4. [Figure labels: partitions 1-4; hash buckets h_1-h_4.]

Consider an n-partition file. Let P_i denote the ith partition, and h_i denote the ith hash bucket. To reduce the number of unnecessary restarts in an error-prone channel, we propose that the scheme be modified as follows (see Fig. 5):

• Control information can also be stored in some of the buckets between hash buckets. Consider two consecutive hash buckets, h_i and h_{i+1}. There are two possible scenarios:
  • The first bucket of partition P_i is broadcast before bucket h_{i+1}. For example, in Fig. 5, partition 1 appears before bucket h_2, and partition 2 begins before h_3. Let the offset at h_i be j. Then, we can include a shift pointer in j buckets following h_i, where bucket (h_i + k) has a shift pointer value of (j - k), for 1 ≤ k ≤ [...]

[...] h_i AND pid > h_i, then N_doze is given by the sum of the offset to the beginning of the next version and h_i. For example, if bid = 10 and h_2 is the target hash bucket, then N_doze = 6 + 3 = 9.
• The hash bucket has been missed but the target partition has not. If bid > h_i AND pid ≤ h_i, then the bucket is either part of the target partition or the target partition has yet to be broadcast in the cycle. In this case, N_doze = 0. For example, if h_2 is the target hash bucket and bid = 5, then N_doze = 0. As another example, if h_3 is the target hash bucket and bid = 8, then N_doze = 0.

(3) If N_doze = 0, then:
• If bid < h_{i+1} AND pid < h_i, obtain the shift value. Note that these shift values are the 'spare' shift pointers provided by the enhanced scheme. Switch to doze mode for shift buckets. Goto 1.
• If bid ≥ h_{i+1} AND pid < h_i, then, although the target partition has not yet been broadcast, there are no shift pointers to facilitate dozing off. So, we can either scan through all the buckets until the target partition is found, or handle the case as if we have missed the partition. For the former, the tuning cost may be high, especially if the target partition is very far away (when the minimum displacement is small). As a compromise, if h_i - pid < τ (τ serves as a threshold, and is set to 3 in our study), we scan all the buckets, i.e. goto 1. Otherwise, we recompute N_doze based on the case when the target partition has been missed.
• Otherwise (i.e. pid = h_i), examine the records in the bucket. Since the records are sorted, even if the client tunes in mid-way through the partition, the client can tell whether the desired record has been missed.
  • If the record is found, terminate.
  • If the first record has key > K, then the desired record has been missed and the number of buckets to doze off is given by the sum of the offset to the beginning of the next version and h_i. Goto 1.
  • If all the record keys < K (i.e. data not found in the current data bucket), goto 1.
(4) If $N_{doze}$ > 0, doze off for $N_{doze}$ buckets (i.e. until slot h(K) is broadcast). Goto 1.

We use Fig. 5 to illustrate the access protocol. Suppose the desired record is in bucket 2 of partition 3. Let $P_{ij}$ denote bucket j of partition i. Suppose the initial probe occurs at $P_{11}$. Then:
• If there are no faulty accesses, the sequence of buckets accessed is: $P_{11}$, $P_{23}$, $P_{31}$ and $P_{32}$ (the target bucket).
• If bucket $P_{23}$ is faulty, then the sequence becomes: $P_{11}$, $P_{23}$, $P_{24}$, $P_{31}$ and $P_{32}$ (the target bucket).
• If buckets $P_{23}$ and $P_{31}$ are faulty, then the sequence becomes: $P_{11}$, $P_{23}$, $P_{24}$, $P_{31}$ and $P_{32}$ (the target bucket).
Note that the access time is the same in all the cases as long as the bucket that contains the desired record is not corrupted.

3.3. Flexible indexing scheme

3.3.1. The basic scheme

The flexible indexing scheme, proposed in [11], splits a sorted list of records into several equal-sized segments, and provides indexes to navigate through the segments. This is achieved as follows. At the beginning of each segment, there is a control index which comprises two components:


a global index and a local index. The global index is used to determine the segment in which a record may be found, while the local index provides the offset to the portion within the segment where the record may be found.

Suppose the file is organized into p segments. Then the global index at a segment, say s, has $\lceil \log_2 i \rceil$ (key, ptr) pairs, where i is the number of segments in front of and including segment s, key is a record key, and ptr is an offset. For the first entry, key is the key value of the first data item in segment s and ptr is the offset to the beginning of the next version. By examining this pair, the client will know if it has missed the data, and if so, it will wait until the broadcast of the next version of the file. For the jth entry (j > 1), key is the key value of the first data item in the $(\lfloor i/2^{j-1} \rfloor + 1)$th segment following segment s and ptr is the offset to the first data bucket of that segment.

The local index consists of m (key, ptr) pairs that essentially partition each segment further into m + 1 sections. For the first entry, key is the key value of the first data item of section m + 1 and ptr is the offset to that section. For the jth pair (j > 1), key is the key value of the first data item of section (m + 2 - j) and ptr is the offset to the first data bucket of that section.

From the above description, it is clear that the number of segments and the number of sections per segment can affect the performance of the scheme. Increasing the number of segments or sections will increase the length of the broadcast cycle and reduce the tuning time, and vice versa. Thus, the scheme is flexible in the sense that it can be tuned to fit an application's needs.

Fig. 6 illustrates the flexible indexing scheme. Here, the file is divided into 8 segments of 4 records each (except for segment 8). Consider segment 2. There are 3 entries in the global index.
The first entry, (12, 27), implies that any record that is less than 12 can be found after the 27th record, i.e. in the beginning of the next version of the file. The second entry, (54, 13), means that a record larger than or equal to 54 can be found after the 13th record, i.e. in segment 5. Similarly, the third entry, (42, 5), indicates that a record between 42 and 54 may be found 5 entries away, i.e. two segments away. Implicitly, records that may be found in the current segment follow the local index. The local index has 3 entries. Consider the local index of segment 2. The entry (29, 4) means that a record with value 29 and above may be found 4 records away. The entry (22, 3) indicates that a record with value at least 22 but less than 29 can possibly be found 3 records away. Likewise, the entry (20, 2) points to where records between 20 and 22 can be found. Implicitly, any record with value less than 20 can be found immediately after the local index.
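The lookup just described for segment 2 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names `global_index_size` and `lookup` are hypothetical, the index entries are the ones quoted in the example, and the implicit first section is encoded as offset 1 (an assumption chosen to be consistent with the offsets 2, 3 and 4 of the other sections).

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (key, ptr) pair as described in the text


def global_index_size(i: int) -> int:
    """Number of global index entries, ceil(log2(i)), using integer arithmetic."""
    if i < 1:
        raise ValueError("i must be at least 1")
    return (i - 1).bit_length()


def lookup(K: int, global_index: List[Entry], local_index: List[Entry]) -> Tuple[str, int]:
    """Decide where to go next for search key K (hypothetical helper).

    global_index[0] is (first key of this segment, offset to the next version);
    the remaining global entries and the local entries are in decreasing key order.
    """
    first_key, next_version_offset = global_index[0]
    if K < first_key:
        # The record was broadcast earlier: wait for the next version.
        return ("next_version", next_version_offset)
    for key, ptr in global_index[1:]:
        if K >= key:
            return ("later_segment", ptr)
    for key, ptr in local_index:
        if K >= key:
            return ("this_segment", ptr)
    # Assumption: the first section starts right after the local index.
    return ("this_segment", 1)


# Entries of segment 2 as quoted in the example above.
SEG2_GLOBAL = [(12, 27), (54, 13), (42, 5)]
SEG2_LOCAL = [(29, 4), (22, 3), (20, 2)]
```

With 7 segments in front of and including segment 2, `global_index_size(7)` gives the 3 global entries mentioned in the example, and `lookup(61, SEG2_GLOBAL, SEG2_LOCAL)` selects the entry (54, 13).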

Fig. 6. Flexible index scheme.


3.3.2. The enhanced scheme

As in the other schemes, the flexible scheme is effective when the channel is reliable. However, when the global index is corrupted, the client will have to restart with the next segment's global index in order to proceed with the search. It is interesting to note that the flexible scheme requires minimal modification. For each data bucket, instead of an offset to the next segment, we advocate a (maxkey, offset) pair. While the offset remains the same, maxkey indicates the largest key value within the segment. Given a search key, K, the access protocol is revised as follows:
(1) Tune into the channel and listen for an error-free bucket.
(2) If the bucket is a data bucket, we have two scenarios. Suppose x is the first key value in the bucket.
• If K > maxkey or K < x, then obtain the offset from the bucket and doze off until the next segment is broadcast. Goto 1.
• If x ≤ K ≤ maxkey, examine all records in the bucket. If found, terminate; otherwise, goto 1.
(3) If the bucket is an index bucket:
• If it is a global index bucket, examine its entries.
  • If the target record falls into a segment whose offset is located in this index bucket, get the offset, doze off for the appropriate amount of time and goto 1.
  • Otherwise (i.e. there are several global index buckets and the current bucket does not contain the desired offset), goto 1 (i.e. read the next bucket).
• If it is a local index bucket, examine its entries.
  • If the record can be found in one of the sections covered by the current local index bucket, obtain the offset to the respective section, switch to doze mode for the amount of time determined by the offset, and goto 1.
  • Otherwise, goto 1.

To illustrate, let us use Fig. 6 as our example. Suppose each bucket contains only one data record, each global index is contained in one bucket, and each local index is contained in one bucket. Suppose the search key is 61.
Then:
• If the client tunes in at record 59 and there is no fault, the access path will be: 59 and 61.
• If the client tunes in at record 10 and there is no fault, the access path will be: 10, I2's global index, I5's global index, I6's global index, I6's local index, and 61.
• If the client tunes in at record 10, and an error occurs at I2's global index, then the access path becomes: 10, I2's global index, I2's local index, 12, I3's global index, I6's global index, I6's local index, and 61.
• If the client tunes in at record 10, and errors occur at I2's global and local indexes, then the access path becomes: 10, I2's global index, I2's local index, 12, I3's global index, I6's global index, I6's local index, and 61.
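Step (2) of the revised protocol, the decision taken on an error-free data bucket, can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `data_bucket_step`, the action labels, and the maxkey and offset values used in the usage note are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def data_bucket_step(K: int, keys: List[int], maxkey: int, offset: int) -> Tuple[str, int]:
    """Action for an error-free data bucket (hypothetical helper).

    keys   -- sorted record keys stored in the bucket
    maxkey -- largest key value within the bucket's segment
    offset -- offset to the next segment, as carried by the bucket
    """
    x = keys[0]  # first key value in the bucket
    if K > maxkey or K < x:
        # K cannot be found from here in this segment: doze to the next segment.
        return ("doze", offset)
    if K in keys:
        return ("found", 0)
    # x <= K <= maxkey but K is not in this bucket: read the next bucket.
    return ("read_next", 0)
```

For instance, assuming the segment that holds record 59 has maxkey 70, `data_bucket_step(61, [59], 70, 2)` returns `("read_next", 0)`, and the following bucket then yields `("found", 0)`, which matches the access path "59 and 61" in the first case above.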

4. A performance study

In order to study the effectiveness of the proposed schemes, we conducted an extensive simulation study. The simulation model comprises a server module, a downlink channel module and a set of client modules. The server organizes the data to be broadcast by interleaving the index with the database according to the selective tuning scheme adopted. The clients pick up signals from the downlink channel according to the respective access protocols in order to obtain the desired data


objects. The simulation model has been implemented using the C-based simulation package by Watkins [23]. The simulation is run for 100 000 queries, and for each query the numbers of buckets accessed and tuned are recorded. The average numbers of buckets accessed and tuned are used as the performance measures in our experiments.

Table 1 shows the notations and default settings of the parameters used in the simulation. The database contains N objects. Each object is of size $O$ and includes an $O_{id}$-bit key. Each pointer in the index node is offset bits long. The size of a bucket is assumed to be $B_{size}$ bytes. The reliability of the channel is modeled by the error rate, p. For each packet that is broadcast on the channel, a random value is generated. If it is in the range [0, p], then the packet is corrupted; otherwise it is a clean and correct packet. If the client receives a corrupted packet, then there is an access failure; otherwise it is a successful probe. In our experiments, unless otherwise stated, the default values shown in Table 1 are used.

For each of the proposed schemes, we also compare it against the following schemes:
• ReStart. Scheme ReStart restarts the entire search process upon a failure.
• ReProbe. Under scheme ReProbe, the client reprobes for the erroneous bucket upon a failure. To do so, we assume that the broadcast cycle length is fixed and known to the clients.
Both ReStart and ReProbe serve to model how conventional approaches can be employed in an unreliable channel environment without modification. Note that each proposed scheme is evaluated against its own versions of ReStart and ReProbe (i.e. tree-based, hash-based and flexible-based).

4.1. On tree-based selective tuning schemes

We first conducted experiments to evaluate the proposed tree-based scheme (denoted SelTune) against the reference algorithms (denoted ReStart and ReProbe respectively). As shown in Table 1,

Table 1
Notation of parameters and their default settings

Parameter            Description                            Default value
System parameters
  $N$                Size of database                       30 000
  $B_{size}$         Size of a bucket                       128 bytes
  $O$                Size of a record                       40 bytes
  $O_{id}$           Size of a key                          32 bits
  offset             Size of a pointer/offset               32 bits
  $p$                Error rate of a bucket                 0.2
Algorithmic parameters
 Tree-based scheme
  $r$                Replication level                      2
  $N_{size}$         Size of tree node                      $B_{size}$
 Hash-based scheme
  $N_{partition}$    Number of partitions                   60
  $\theta$           Zipf factor of data distribution       0.2
 Flexible-based scheme
  $N_{segment}$      Number of segments                     50
  $N_{localindex}$   Number of local index entries          50

Fig. 7. Effect of replication level: (a) access time; (b) tuning time. Curves: ReStart, ReProbe and SelTune, plotted against the replication level.

the size of each node is the same as the bucket size. Fig. 7 shows the result when we vary the replication level from 1 to 4. We note that there is a certain replication level (in our case, r = 2) which is optimal for access time performance. This is expected, since too much replication increases the broadcast cycle, and too little replication may increase the probe wait time (the time to find the offset to the target bucket). We also observe that the proposed selective tuning scheme performs the best. This demonstrates the effectiveness of the scheme SelTune. Scheme ReProbe is the worst since it incurs additional broadcast cycles in order to reaccess the corrupted nodes. Scheme ReStart performs fairly well too. We also note that for all the schemes, the tuning time is largely unaffected by the replication level since the number of probes is essentially the same. It is also clear that schemes ReStart and ReProbe are more energy efficient than scheme SelTune. This is expected since scheme SelTune may tune to more buckets (including data buckets) while the other two schemes attempt to tune to only index buckets (except for the target data buckets). Scheme ReProbe incurs the least tuning time as it reprobes only the corrupted nodes. ReStart has to start from the root and hence more nodes have to be examined. In Fig. 8, we show the result on the effect of error rate on the tree-based schemes. As expected, all the three schemes degrade in performance as the error rate increases. In terms of access time, we observe that ReProbe degrades much faster than the other two schemes. We also note that the difference between schemes ReStart and SelTune widens as the error rate gets higher. This again demonstrates the effectiveness of SelTune. We, however, see a different picture for the tuning time results. We notice that, at high error rate, scheme SelTune's tuning time degrades significantly. Scheme ReProbe remains the best, followed by scheme ReStart. 
From both results, we note that the proposed scheme sacrifices tuning time for access time. As the

Fig. 8. Effect of error rate: (a) access time; (b) tuning time. Curves: ReStart, ReProbe and SelTune, plotted against the error rate.

increase in tuning time is very small compared to the saving in access time, the proposed scheme is certainly more useful and practical.
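The channel model used throughout these experiments (each broadcast bucket is corrupted independently with probability p) can be illustrated with a short simulation. The sketch below is not the simulator used in the study: the function names are hypothetical, and the comparison with 1/(1 - p) is simply the mean of a geometric distribution under the independence assumption. It indicates why a scheme that must wait a full broadcast cycle to re-read a corrupted bucket, as ReProbe does, pays roughly p/(1 - p) extra cycles per probed bucket.

```python
import random


def bucket_is_corrupted(rng: random.Random, p: float) -> bool:
    """Draw a random value in [0, 1); the bucket is corrupted if it falls below p."""
    return rng.random() < p


def expected_reads_per_bucket(p: float) -> float:
    """Mean number of reads until one clean copy is received (geometric distribution)."""
    return 1.0 / (1.0 - p)


def simulate_reads_per_bucket(p: float, trials: int, seed: int = 0) -> float:
    """Average number of reads of the same bucket until a clean copy is received."""
    rng = random.Random(seed)
    total_reads = 0
    for _ in range(trials):
        reads = 1
        while bucket_is_corrupted(rng, p):
            reads += 1  # corrupted copy: the bucket has to be read again
        total_reads += reads
    return total_reads / trials
```

With the default error rate p = 0.2, `expected_reads_per_bucket(0.2)` is 1.25, i.e. on average a quarter of an extra read per probed bucket under this model.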

4.2. On hash-based selective tuning schemes

We also evaluated the proposed hash-based scheme (denoted SelTune) against its versions of ReProbe and ReStart. We model the data distribution using a Zipf-like distribution [13] with a Zipf factor of $\theta$. Under the Zipf-like distribution, when $\theta$ = 0 we have the uniform distribution, and when $\theta$ = 1 we have the pure Zipf distribution [27]. Figs. 9-11 show the results of the experiments. In Fig. 9, we study the effect of the number of partitions on performance (using the default Zipf factor). As shown, the access time increases with the number of partitions. This is because the minimum displacement decreases as the number of partitions increases. We also note that the proposed SelTune hash-based approach performs the best. This is expected since SelTune will not miss the target data bucket unless it is corrupted. The result also shows that scheme ReProbe is inferior to scheme ReStart, as ReProbe may miss the target bucket as a result of reprobing a corrupted bucket. The tuning time result shows that all the schemes are more energy efficient when a larger number of partitions is used. This is so since the size of the partitions is smaller. We also observe that all the schemes are relatively close in terms of tuning performance, with ReProbe being slightly better than SelTune, which in turn is slightly better than ReStart. In Fig. 10, we show the result as the Zipf factor varies from 0.1 to 1.0 (while keeping the other parameters fixed). As expected, the access time increases as the data distribution becomes highly nonuniform (i.e. larger Zipf factor). This is due to the minimum displacement, which decreases with increasing value of the Zipf factor. Again, we note that SelTune is superior while ReProbe performs the worst. The tuning time under the various techniques is largely unaffected by the Zipf factor. The

Fig. 9. Effect of number of partitions: (a) access time; (b) tuning time. Curves: ReStart, ReProbe and SelTune, plotted against the number of partitions.

relative performance in terms of tuning time remains the same: ReProbe is the best, followed by SelTune, followed by ReStart. In our next experiment, we vary the error rate from 0 to 0.5 while keeping the default settings for the other parameters. The access time result clearly demonstrates the effectiveness of SelTune, and confirms our observations in the earlier experiments. The tuning result is also similar to that observed earlier, showing that the proposed scheme can keep the access time low without sacrificing much on the tuning time.
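The Zipf-like data distribution used in the hash-based experiments can be sketched as below. The exact formula used in the simulation is not given in this section, so the form here is an assumption: the probability of the item of rank i is taken to be proportional to (1/i)^θ, which reproduces the two endpoints stated in the text (uniform at θ = 0, pure Zipf at θ = 1). The function name `zipf_like` is hypothetical.

```python
from typing import List


def zipf_like(n: int, theta: float) -> List[float]:
    """Probabilities of n ranked items, assuming p_i proportional to (1/i)**theta."""
    if n < 1:
        raise ValueError("n must be at least 1")
    weights = [(1.0 / rank) ** theta for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `zipf_like(4, 0.0)` gives four equal probabilities of 0.25, while `zipf_like(2, 1.0)` assigns about 2/3 of the probability mass to the first item.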

Fig. 10. Effect of Zipf factor: (a) access time; (b) tuning time. Curves: ReStart, ReProbe and SelTune, plotted against the Zipf factor.

Fig. 11. Effect of error rate: (a) access time; (b) tuning time. Curves: ReStart, ReProbe and SelTune, plotted against the error rate.

4.3. On flexible-based selective tuning schemes

For the flexible-based scheme, we fixed the number of local index entries at 50 (see Table 1). This is based on earlier works which showed that the number of local entries does not affect the access time very much compared to the number of segments [18,19]. Our first experiment varies the number of segments from 10 to 80. The result in Fig. 12 shows that all the schemes increase in access time as the number of segments increases. This is slightly different from previous work [18] which

Fig. 12. Effect of number of segments: (a) access time; (b) tuning time. Curves: ReStart, ReProbe and SelTune, plotted against the number of segments.

showed that there is an optimal segment size. The reason is that in our case the channel is unreliable. Since the broadcast cycle increases with the number of segments, waiting for another broadcast cycle when an access failure occurs naturally increases the average access time. As in the other schemes, we note that the proposed scheme is the most effective, followed by ReStart and finally ReProbe. In terms of tuning time, as in the hash-based scheme, the tuning time decreases with increasing number of segments. This follows from the fact that fewer indexes need to be examined. We note that the tuning times of all schemes are roughly the same. Fig. 13 shows the result of the experiment when the error rate is varied. The result again shows the superiority of the proposed scheme over the other two schemes as the error rate increases. The tuning time result shows that scheme ReProbe is the most energy efficient, followed by SelTune, followed by scheme ReStart.

4.4. Comparative study

The above experiments show clearly that the various proposed schemes can outperform their counterparts in terms of access time without sacrificing much on the tuning time. In this section, we compare the relative performance of the three proposed schemes. We shall denote the tree-based, hash-based and flexible-based schemes as Tree, Hash and Flexible, respectively. Fig. 14 shows the result when we vary the error rate. First, we observe that the tree-based scheme performs the worst in terms of access time. Upon investigation, we note that this is the result of a longer broadcast cycle. In fact, the tree-based scheme has the longest broadcast cycle as its meta-data overhead is the most significant. The hash-based scheme is superior when the error rate is high and the flexible-based method performs the best at lower error rates. Two factors contribute to this result: the cycle length of the broadcast (the flexible-based method has a longer broadcast cycle than the hash-based scheme) and the minimum displacement of the hash-based scheme. In general, the

Fig. 13. Effect of error rate: (a) access time; (b) tuning time. Curves: ReStart, ReProbe and SelTune, plotted against the error rate.


Fig. 14. Comparison of the three proposed schemes: (a) access time; (b) tuning time. Curves: Flexible, Tree and Hash, plotted against the error rate.

hash-based scheme performs worse because of clients that miss the hash buckets in the initial probe. As the error rate increases, the contribution of the longer cycle length of the flexible scheme to the access time becomes significant, leading to its poorer performance at high error rates. In terms of tuning time, we notice that the tree-based scheme is superior under low error rates, but degrades rapidly as the error rate increases. The hash-based scheme performs poorly at low error rates by virtue of its need to examine records of the target partitions. The flexible scheme performs slightly worse than the tree-based scheme under low error rates, and is the most energy efficient under high error rates.

5. Conclusion

Reliability is a critical issue in wireless environments which has been largely ignored in existing work. An access failure can lead to long access time and tuning time. To keep the access and tuning time low, we have proposed three selective tuning schemes that are variations of existing schemes. The schemes allow clients to continue the search process within the existing broadcast. Our extensive simulation results show that the proposed schemes can reduce the access time and tuning time compared to the traditional method that does not deal with failure explicitly. While none of the three methods performs best in all cases, we find that the tree-based approach is generally worse than the other two methods. The flexible method is promising for channels with low error rates, whereas the hash-based approach can be a better choice for high error rate environments. We are currently extending our work in the following ways. First, we have largely focused on errors due to loss of data. We also plan to extend this work to study mechanisms that are robust to other failures, such as delay. Second, we are studying how updates may affect the various schemes. Finally, we are planning to implement and study the techniques in our CRAM project [21].


Acknowledgement This work is partially funded by NUS research grant RP960683.

References

[1] S. Acharya, R. Alonso, M. Franklin, S. Zdonik, Broadcast disks: Data management for asymmetric communication environments. In: Proc. 1995 ACM-SIGMOD Intl. Conf. on Management of Data (May 1995) pp. 199-210.
[2] R. Alonso, H. Korth, Database systems in nomadic computing. In: Proc. 1993 ACM-SIGMOD Intl. Conf. on Management of Data (June 1993) pp. 388-392.
[3] M.H. Ammar, J.W. Wong, The design of teletext broadcast cycles. Performance Evaluation 5(4) (1985) 235-242.
[4] D. Barbara, T. Imielinski, Sleepers and workaholics: Caching in mobile distributed environments. In: Proc. 1994 ACM-SIGMOD Intl. Conf. on Management of Data (June 1994) pp. 1-12.
[5] A. Bestavros, AIDA-based real-time fault-tolerant broadcast disks. In: Proc. IEEE Real-Time Technology and Applications Symp., Boston, MA (June 1996).
[6] T.F. Bowen, G. Gopal, G. Herman, T. Hickey, K.C. Lee, W.H. Mansfield, J. Raitz, A. Weinrib, The datacycle architecture, Comm. of the ACM 35(12) (December 1992).
[7] J. Cai, K.L. Tan, B.C. Ooi, On incremental cache coherency schemes in mobile computing environment. In: Proc. 13th Intl. Conf. on Data Engineering (April 1997) pp. 114-123.
[8] D.K. Gifford, Polychannel systems for mass digital communications, Comm. of the ACM 33(2) (February 1990).
[9] Y. Huang, P. Sistla, O. Wolfson, Data replication for mobile computers. In: Proc. 1994 ACM-SIGMOD Intl. Conf. on Management of Data (June 1994) pp. 13-24.
[10] T. Imielinski, S. Viswanathan, B.R. Badrinath, Energy efficient indexing on air. In: Proc. 1994 ACM-SIGMOD Intl. Conf. on Management of Data (June 1994) pp. 25-36.
[11] T. Imielinski, S. Viswanathan, B.R. Badrinath, Power efficient filtering of data on air. In: Proc. 4th Intl. Conf. on Extending Database Technology (March 1994) pp. 245-258.
[12] T. Imielinski, S. Viswanathan, B.R. Badrinath, Data on air: Organization and access. IEEE Trans. on Knowledge and Data Engineering 9(3) (1997) 353-372.
[13] D.E. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison Wesley (1973).
[14] W.C. Lee, D. Lee, Using signature and caching techniques for information filtering in wireless and mobile environments, Journal of Distributed and Parallel Databases 4(3) (1996) 205-227.
[15] H.V. Leong, A. Si, Data broadcasting strategies over multiple unreliable wireless channels. In: Proc. 4th Intl. Conf. on Information and Knowledge Management (November 1995) pp. 96-104.
[16] P.V. Argade et al., Hobbit: A high-performance, low-power microprocessor. In: Proc. COMPCON'93 (February 1993) pp. 88-95.
[17] M.O. Rabin, Efficient dispersal of information for security, load balancing and fault tolerance, Journal of the ACM 36(2) (April 1989) 335-348.
[18] K.L. Tan, J.X. Yu, Energy efficient filtering of nonuniform broadcast. In: Proc. 16th IEEE Intl. Conf. on Distributed Computing Systems (May 1996) pp. 520-527.
[19] K.L. Tan, J.X. Yu, P.K. Eng, Supporting range queries in wireless environment with nonuniform broadcast, in press.
[20] A.S. Tanenbaum, Computer Networks, Prentice-Hall (1987).
[21] The Cram Project (1997); http://www.iscs.nus.edu.sg/tankl/wireless.html
[22] N. Vaidya, S. Hameed, Data broadcast in asymmetric environments. In: Proc. 1st Intl. Workshop on Satellite-based Information Services (November 1996).
[23] K. Watkins, Discrete Event Simulation in C, McGraw-Hill (1993).
[24] J.W. Wong, Broadcast delivery, Proc. IEEE 76(12) (December 1988) 1566-1577.
[25] K.L. Wu, P.S. Yu, M.S. Chen, Energy-efficient caching for wireless mobile computing. In: Proc. 12th Intl. Conf. on Data Engineering (February 1996) pp. 336-343.


[26] J.X. Yu, K.L. Tan, An analysis of selective tuning schemes for nonuniform broadcast, Data and Knowledge Engineering 22(3) (May 1997) 319-344.
[27] G.K. Zipf, Human Behavior and the Principle of Least Effort. Addison Wesley (1949).

Kian-Lee Tan received his B.Sc. and Ph.D. in Computer Science from the National University of Singapore, in 1989 and 1994, respectively. He is currently a lecturer in the Department of Information Systems and Computer Science, National University of Singapore. His major research interests include multimedia information retrieval, wireless information retrieval, query processing and optimization in multiprocessor and distributed systems, and database performance. He has published over 50 papers in international conferences and journals. He has also co-authored a tutorial entitled 'Query Processing in Parallel Relational Database Systems' (IEEE CS Press, 1994), and a monograph entitled 'Indexing Techniques for Advanced Database Systems' (Kluwer Academic Publishers, 1997). Kian-Lee was a Visiting Scientist at IBM's Almaden Research Center, California (Jan-Jul 92), and CSIRO's Canberra Laboratory, Australia (Jun 94-Jun 95). Kian-Lee is a member of the Association for Computing Machinery (ACM) and the IEEE Computer Society.

Beng Chin Ooi received his B.Sc. and Ph.D. in Computer Science from Monash University, Australia, in 1985 and 1989, respectively. He was with the Institute of Systems Science from 1989 to 1991 before joining the Department of Information Systems and Computer Science, National University of Singapore. His research interests include database performance issues, multimedia databases and applications, high-dimensional databases and GIS. He is the author of a monograph entitled 'Efficient Query Processing in Geographic Information Systems' (Springer-Verlag, LNCS, No. 471, 1990), and a co-author of a recent monograph entitled 'Indexing Techniques for Advanced Database Systems' (Kluwer Academic Publishers, 1997). He has published 50 conference/journal papers and served as a PC member for a number of international conferences (including ACM SIGMOD, VLDB, EDBT, DASFAA) and serves as an editor for Geoinformatics, Intl. Journal of GIS, and Journal on Universal Computer Science. He is a member of the Association for Computing Machinery (ACM) and the IEEE Computer Society.