Personalized cache management for mobile computing environments

Information Processing Letters 87 (2003) 221–228 www.elsevier.com/locate/ipl

Ho-Sook Kim *, Hwan-Seung Yong
Department of Computer Science and Engineering, Ewha Institute of Science and Technology, Seoul, Republic of Korea
Received 29 November 2001; received in revised form 28 December 2002
Communicated by A. El Abbadi

Keywords: Databases; Cache management; Mobile computing

* Corresponding author. E-mail address: [email protected] (H.-S. Kim).
0020-0190/$ – see front matter © 2003 Published by Elsevier B.V. doi:10.1016/S0020-0190(03)00352-1

1. Introduction

The mobile computing market is growing rapidly, and several methods have been proposed to deal effectively with the restrictions of mobile computing environments. In particular, many studies have addressed caching methods. Caching frequently accessed data on the mobile host side can reduce interactions between clients and servers [1,6]. Previous cache methods were designed around temporal properties of data, such as access time or update frequency [2,6]. These studies are limited, however, because they overlook users’ mobility and the spatial attributes of geographical data.

To support users’ mobility, Liu et al. [7] suggested a server-oriented cache management method that exploits information on the estimated direction and range of a mobile host. Each mobile client, however, freely and frequently connects to, or disconnects from, the wireless network. It is therefore infeasible for a server to keep track of all cached copies of individual items, and clients should take a more active role in maintaining cached items [3]. Based on the location of users and the location-dependent nature of data, Dunham et al. [5] envisioned geometrically distributed spatial replicas located at different sites throughout an entire geographic domain. These methods, however, assume that all spatial attributes of data in a cell are identical except for their locations. Ren et al. [8] proposed a semantic caching scheme called FAR, which considers the location, speed and direction of users. To support virtual walkthrough applications in distributed virtual environments, Chim et al. [4] proposed a caching algorithm that considers the importance of data, such as the distance of an object from the viewer, as well as the size and orientation of an object. The density of a region and various query patterns, however, are not considered in their algorithms.

In this paper, we show that the movement of a mobile host determines the changes in the value and semantics of data in the mobile host’s cache. We argue that nearby data is better suited to answering users’ queries in mobile environments. In addition, we define spatial properties of data such as the location and region. Using these spatial properties, we propose two new cache replacement methods that efficiently support users’ mobility and the spatial attributes of data. We also analyze a variety of factors affecting the cache hit ratio of a mobile host, and we evaluate the performance of cache replacement methods based on these factors. Finally, we propose a personalized cache replacement selection algorithm with optimal performance in variable mobile environments.

2. Cache replacement methods using the spatial properties of data

2.1. Basic assumptions and definitions of terms

We assume that a server broadcasts information on the spatial data included in its area and that all data in its database have the same size. Each datum has a spatial location and a logical range in which it has importance. We call this spatial range the region of a datum. Weather information, for example, is meaningful in a certain geographical area, and traffic information is useful in a smaller area [3]. Another example is the target area of mobile advertisements, which are transmitted to wireless hosts such as PDAs and cellular phones, based on characteristics of the wireless Internet such as personalization and immediacy [9]. When stores wish to send locality-based advertisements to mobile hosts, the advertising cost determines the size of the region covered. A hamburger chain, for example, may want to advertise a discount coupon at lunchtime to mobile hosts within 200 m of the store. In this paper, the region of a datum is represented by the minimum bounding rectangle (MBR).

The properties of a mobile host comprise the current position of the mobile host (Xh, Yh), the cache region (CR) and the cache size (CS). The CR is represented as a circle whose radius is rCR. The cache of a mobile host can store all the data located in the CR. The CS represents the number of data that can be stored in the cache.

2.2. The design of new cache replacement methods based on spatial properties of data

Since a mobile host changes its location over time, the response to a query such as “Where is the nearest subway station?” depends on the location of the mobile host. This means that the value and weight of geographic data in a cell change as the mobile host changes its location. Hence, to improve the cache hit ratio, we propose new cache replacement methods that support users’ mobility and the spatial properties of data.

Definition 1 (Cache replacement based on the location of data (CR_LOC)). CR_LOC is a cache replacement method based on the distance between the location of a mobile host and the location of a datum. When a mobile host replaces the cache, the datum whose location is farthest from the mobile host is deleted from the cache.

Definition 2 (Distance from a mobile host to the region of a datum). If a mobile host is inside the region of a datum, the distance is 0. Otherwise, it is defined as the shortest Euclidean distance from the current position of the mobile host to the MBR of the region of the datum.

Definition 3 (Cache replacement based on the region of data (CR_REG)). CR_REG is a cache replacement method based on the distance between the location of a mobile host and the region of a datum. When a mobile host replaces the cache, it selects as the victim the datum with the longest distance from the mobile host to its region.

Fig. 1 shows changes in the cache state according to the movements of a mobile host to which CR_LOC and CR_REG are applied when the CS is 4. Figs. 1(a) and 1(d) show the initial states of the cache when H was located at (3, 7). After H moved to (5, 5), data 1, 2, 4 and 5 were stored in the cache (Figs. 1(b) and 1(e)). In Fig. 1(c) (CR_LOC), data 1, 3, 4 and 5 remained in the cache after H moved to (7, 3). Because the CS was 4, datum 2 had to be deleted to insert datum 3. In Fig. 1(f) (CR_REG), on the other hand, data 2, 3, 4 and 5 remained in the cache, and datum 1 was deleted. In terms of location, datum 1 was nearer than datum 2 to the position of the mobile host (7, 3). Datum 2, however, which had a larger region than datum 1, remained in the cache for a long time. Moreover, datum 5 remained valuable although its region was small, because it was on the path of the mobile host’s movement.
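Definitions 1 to 3 can be illustrated with a minimal Python sketch. The `Datum` class, the function names and the coordinates below are hypothetical choices made for illustration; they are not taken from the paper or from Fig. 1. The sketch assumes each region is an axis-aligned MBR given by its corner coordinates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A cached datum: a point location plus the MBR of its region (hypothetical layout)."""
    ident: int
    x: float
    y: float
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def distance_to_location(hx, hy, d):
    """Euclidean distance from the host position to the datum's location (used by CR_LOC)."""
    return math.hypot(d.x - hx, d.y - hy)


def distance_to_region(hx, hy, d):
    """Definition 2: 0 inside the MBR, otherwise the shortest distance to the MBR."""
    dx = max(d.xmin - hx, 0.0, hx - d.xmax)
    dy = max(d.ymin - hy, 0.0, hy - d.ymax)
    return math.hypot(dx, dy)


def select_victim(cache, hx, hy, method):
    """Return the datum that the given method would evict (the farthest one)."""
    if method == "CR_LOC":
        dist = distance_to_location
    elif method == "CR_REG":
        dist = distance_to_region
    else:
        raise ValueError(f"unknown method: {method}")
    return max(cache, key=lambda d: dist(hx, hy, d))


# Illustrative data only: datum 1 is close but has a small region,
# datum 2 is farther away but has a large region that nearly reaches the host.
small = Datum(1, 5.0, 5.0, 4.5, 4.5, 5.5, 5.5)
large = Datum(2, 3.0, 6.0, 1.0, 2.0, 6.5, 10.0)
victim_loc = select_victim([small, large], 7.0, 3.0, "CR_LOC")
victim_reg = select_victim([small, large], 7.0, 3.0, "CR_REG")
```

With these made-up values, CR_LOC evicts the distant datum 2, whereas CR_REG evicts datum 1 because the large region of datum 2 lies closer to the host, which mirrors the qualitative behaviour described for Fig. 1.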

Fig. 1. Changes in the cache state (H: mobile host; 1–5: data; circle: CR; rectangle: region of a datum; arrow: the moving path of the mobile host).

3. Various factors affecting the cache hit ratio of a mobile host

We use the cache hit ratio as the criterion for the efficiency of cache replacement methods. The cache hit ratio is the number of queries whose target data is stored in the cache divided by the total number of queries, expressed as a percentage. We selected four factors that affect the cache hit ratio of cache replacement methods: the CS, the distribution of query target objects, the density of a region and query patterns. We define them below.

Definition 4 (Distribution of query target objects (DTO)). The DTO is the geometrical distribution area of data required by queries performed at the current position of the mobile host. The DTO is represented by a circle whose radius, rDTO, is determined by the distance from the mobile host to the farthest datum among the required data. The DTO represents the spatial area of interest of a mobile host.

When we determine rDTO, we either apply a specific value to each user or apply the same value to all users located in a particular server’s area. In the former case, each mobile host analyzes the historical queries of the user and applies their average rDTO value to cache management. If users usually access data near the mobile host, then rDTO is small. In the latter case, the server analyzes the historical queries, calculates the average value of rDTO, and applies that value to all users located in the server’s area. The latter approach suits environments where particular query patterns usually occur in accordance with the characteristics of the data stored in the server.

Definition 5 (Density of a region (DR)). The DR is the density of data in an area relative to the average density of data over the total area managed by the server. We grade it at three levels: sparse, average and dense.

To determine the DR, we divide a server’s area into I (horizontal) × J (vertical) blocks. When the number of data in the server is N, one block contains block_Avg = N/(I × J) objects on average. If a block has more data than (1 + α) × block_Avg, we classify the block as a dense region. If a block has fewer than (1 − α) × block_Avg, the block is classified as sparse. Otherwise, it is an average region.

Definition 6 (Query pattern). A query pattern is a set of characteristics of users’ queries. We classify query patterns as location-oriented queries or region-oriented queries. A query based on the locational property of data, such as “How far from here to the Shin-chun station?”, is called a location-oriented query. A query oriented by the regional property of data, such as “Which factories have influence on the pollution of this stream?”, is a region-oriented query.

When we perform cache replacement methods, we need to calculate rCR while considering the CS and the characteristics of the area in which the mobile host is located:

\[ r_{CR} = \sqrt{\frac{S_{area} \times CS}{S_{count} \times \pi}} \]

where Sarea is the area covered by the server’s region, Scount is the number of data stored in the server’s database and π is the ratio of the circumference of a circle to its diameter (approximately 3.14). In other words, a circle of radius rCR is expected to contain CS data at the server’s average density Scount/Sarea.

Fig. 2 shows an example in which 20 data items (represented by points) are distributed in a server’s area of 100 square units (10 units × 10 units). We calculated the proper radius of the CR in this example. The CS of the mobile host located at (5, 5) was assumed to be 5. Since Sarea is 100 and Scount is 20, rCR ≈ 2.82. The inner circle, whose radius is rCR, indicates the area that the cache of the mobile host can cover. In Fig. 2, if (a) is the farthest datum required at the current location of the mobile host, all the data required by the queries are located in the outer circle. In this case, rDTO is 4.

Fig. 2. Distribution of data in a server.
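The radius calculation for the Fig. 2 example can be checked with a few lines of Python. This is only a sketch: the function name `cache_region_radius` and its parameter names are assumptions introduced here, and the optional `pi` argument exists solely so that the paper's rounded value of 3.14 can be reproduced.

```python
import math


def cache_region_radius(server_area, cache_size, data_count, pi=math.pi):
    """Radius of the cache region: sqrt(S_area * CS / (S_count * pi))."""
    if server_area <= 0 or cache_size < 0 or data_count <= 0:
        raise ValueError("area and data count must be positive, cache size non-negative")
    return math.sqrt(server_area * cache_size / (data_count * pi))


# Values from the Fig. 2 example: S_area = 100, CS = 5, S_count = 20, pi taken as 3.14.
r_cr = cache_region_radius(100, 5, 20, pi=3.14)
print(f"rCR = {r_cr:.2f}")  # prints "rCR = 2.82"

# Sanity check: at the average density S_count / S_area, a circle of this radius
# is expected to hold exactly CS data.
expected_in_circle = (20 / 100) * 3.14 * r_cr ** 2
```

Using `math.pi` instead of 3.14 changes the result only in the third decimal place, so the rounded value is the same.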

4. Performance study

To validate the efficiency of the proposed cache replacement methods, we constructed a spatial database using the Informix Spatial DataBlade Module. We evaluated the performance under various conditions that affect the cache hit ratio of a mobile host, including the CR_LOC and CR_REG methods proposed in this paper, as well as the CR_LRU method, which uses the conventional cache replacement method LRU. The parameters used in our experiments are listed in Table 1.

Table 1. Parameter list for experiments

Notation         Description
CS               Size of the cache (percentage of server DB)
rCR              Radius of the CR
rDTO             Radius of the DTO
CR_type          Cache replacement methods: CR_LRU, CR_LOC, CR_REG
Path_no          Moving paths of a mobile host
Query pattern    Location-oriented queries : region-oriented queries

Our experiments covered a rectangular area around Seodaemun-Gu, 5.5 km wide and 3.5 km long. We set 100 m as the unit for coordinates. Consequently, the server’s total area was evenly subdivided into 1,925 (55 × 35) square units. The data in the server were classified into 9 types, such as post offices, schools and stores; the number of data was 1,020.

To determine the densities of regions, we subdivided the server’s area into 77 (11 × 7) blocks with sides 500 m long. When α = 0.3, there were 30 sparse region blocks (39%), 26 average region blocks (34%) and 21 dense region blocks (27%). Nine straight-line paths were selected as the moving paths of a mobile host. The density of a path was calculated as the average number of data in the blocks that intersected the path. With the value of block_Avg normalized to 1.0, the density of each path is as shown in Table 2.

Table 2. Densities of paths

Path_no            1        2        3        4        5        6        7        8        9
Density            0.59     0.68     0.91     0.95     1.02     1.09     1.15     1.31     1.37
Type (α = 0.3)     sparse   sparse   average  average  average  average  average  dense    dense

In the experiments, we performed two types of queries: location-oriented queries referred to the location of a target datum for queries within the DTO, and region-oriented queries referred to the region of a target datum for queries that overlapped with the DTO.

Three different methods, depending on the type of data, were used to determine the region of each datum. For the first type, public offices such as post offices and police stations, the region was determined by the MBR that includes the jurisdiction area. For the second type, such as schools and subway stations, the size of the region was assumed to be inversely proportional to the number of data of the same type; for example, since the number of schools in the server’s area was 65, the school-type region was a square with a side length of √(1925/65) ≈ 5. For the third type, profit-making organizations such as stores and hospitals, the region was determined by the weight attribute; in this case we randomly assigned a weight to each object. The parameter settings used in these experiments are summarized in Table 3.

Table 3. Parameter values for the experiments

                           Exp. 1                    Exp. 2                 Exp. 3              Exp. 4              Exp. 5
CS                         5%, 10%, 15%, 20%, 30%    15%                    5%, 10%             5%, 10%, 15%, 20%   5%, 10%, 15%, 20%
rCR                        5, 7, 9, 11, 13           9                      5, 7                5, 7, 9, 11         5, 7, 9, 11
rDTO                       9                         5, 7, 9, 11, 13, 15    5, 7, 9, 11         7                   7
CR_type                    CR_LRU, CR_LOC, CR_REG    CR_REG                 CR_LOC, CR_REG      CR_LOC, CR_REG      CR_LOC, CR_REG
Path_no                    3, 4, 6, 7, 8, 9          1, 3, 5, 8             2, 9                1∼9                 1∼9
Query pattern (Loc:Reg)    0:10                      0:10                   (a) 0:10 (b) 10:0   5:5                 10:0, 7:3, 5:5, 3:7, 0:10
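The density grading of Definition 5, applied with the experimental settings (1,020 data, 11 × 7 blocks, α = 0.3), can be sketched as follows. The function name `classify_density` is an assumption made for this illustration, and applying the same thresholds to the normalized path densities is inferred from the labels in Table 2 rather than stated explicitly in the text.

```python
def classify_density(count, block_avg, alpha):
    """Grade a density as 'sparse', 'average' or 'dense' relative to block_avg."""
    if block_avg <= 0 or not 0 <= alpha < 1:
        raise ValueError("block_avg must be positive and alpha must lie in [0, 1)")
    if count > (1 + alpha) * block_avg:
        return "dense"
    if count < (1 - alpha) * block_avg:
        return "sparse"
    return "average"


ALPHA = 0.3
BLOCK_AVG = 1020 / (11 * 7)  # about 13.2 data per block on average

# Normalized path densities from Table 2 (block_Avg scaled to 1.0).
PATH_DENSITY = {1: 0.59, 2: 0.68, 3: 0.91, 4: 0.95, 5: 1.02,
                6: 1.09, 7: 1.15, 8: 1.31, 9: 1.37}
path_grades = {path: classify_density(density, 1.0, ALPHA)
               for path, density in PATH_DENSITY.items()}
```

Under these settings a block counts as dense when it holds more than about 17.2 data and as sparse when it holds fewer than about 9.3, and the grades computed for the nine paths agree with the sparse, average and dense labels in Table 2.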

4.1. Experiment 1: Cache hit ratio according to increases in the CS using the CR_LOC, CR_REG and CR_LRU methods

Our first set of experiments compared the performance of the CR_LOC, CR_REG and CR_LRU methods according to increases in the CS. Fig. 3 shows that the cache hit ratio increases as the CS increases; furthermore, the CR_LOC and CR_REG methods consistently outperform the CR_LRU method. Fig. 3 also shows that the cache hit ratio of the CR_LRU method increased steadily as the CS increased, whereas the rate of increase of the cache hit ratio for the CR_REG method declined. This occurs because, when the CS of a mobile host is larger than the total number of data located in the DTO, the cache includes data less related to the current position, and such data contribute little to the cache hit ratio.

Fig. 3. Cache hit ratio of cache replacement methods as the CS varies from 5 to 30%.

4.2. Experiment 2: Cache hit ratio according to an increase of the rDTO using the CR_REG method

The DTO may vary with the characteristics of application programs. In experiment 2, we tested the cache hit ratio as the rDTO increased when region-oriented queries were performed. Fig. 4 shows the results of cache replacement when the rDTO varied from 5 to 15 and the rCR was 9. When the rDTO was smaller than the rCR, the CS of a mobile host was larger than the number of data required by queries. Consequently, the cache hit ratio was high. However, when the rDTO was larger than the rCR, the hit ratio dropped quickly.

Fig. 4. Cache hit ratio as the rDTO varies from 5 to 15 when the rCR is 9.

4.3. Experiment 3: A comparison of the CR_LOC and CR_REG methods in relation to the cache hit ratio according to changes in the DR and query patterns

We compared the performance of the two cache replacement methods (CR_LOC and CR_REG) under more realistic environments. We selected two paths with different densities (sparse and dense) and two kinds of queries (region-oriented and location-oriented). The CR_LOC method had a higher hit ratio than the CR_REG method when the DR was high. When the DR was low, the CR_REG method performed better. This result arises because the CR_REG method handles more data when processing cache replacement than the CR_LOC method does. Consequently, when a mobile host moves through a dense region, the size of the cache is smaller than the number of data in the DTO of the mobile host. This situation causes frequent cache replacements and lowers the cache’s efficiency.

Moreover, Fig. 5 shows that the hit ratio is also affected by query patterns, even under the same DR. As in Fig. 5(a), when the query pattern is region-oriented, a mobile host passing through a dense region has nearly the same hit ratio whether it uses the CR_REG or the CR_LOC method. As in Fig. 5(b), however, when the queries performed are mainly location-oriented, a mobile host passing through dense regions performs better with the CR_LOC method than with the CR_REG method. This means that, for a higher cache hit ratio, a mobile host needs to consider query patterns when it passes through regions of various densities.

Fig. 5. Cache hit ratio for two different DRs under (a) region-oriented queries and (b) location-oriented queries.

4.4. Experiment 4: Cache hit ratio according to an increase in the cache requirement rate using the CR_LOC and CR_REG methods

The results of experiments 1–3 show that the CS, DTO and DR all influence the cache hit ratio. Hence, we need a new parameter that simultaneously reflects these elements in order to generate a more efficient cache replacement method. We defined the cache requirement rate (CRR) and tested the cache hit ratio while varying the CRR for the two cache replacement methods.

Definition 7 (Cache requirement rate (CRR)). The CRR is a percentage derived by dividing the number of data in the DTO by the CS, as follows:

\[ \mathrm{CRR} = \frac{\text{Number of data in DTO}}{CS} \times 100\ (\%) \]

In Fig. 2, for example, if the number of data in the DTO is 8 and the CS is 5, the CRR is (8/5) × 100 = 160%; that is, 1.6 times the current CS is required for a 100% cache hit ratio.

In experiment 4, we compared the performance of the CR_LOC and CR_REG methods as the CRR varied under various CSs and different DRs. The results are shown in Fig. 6. When the CRR value is small, the CR_REG method is better than the CR_LOC method; when the CRR value is large, the CR_LOC method performs better. Specifically, before the CRR value reaches 40, the CR_REG method has a higher hit ratio; after the CRR value reaches 40, the CR_LOC method performs better. We define the cross point (CP) as the value of the CRR at which the cache replacement method should be changed.
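Definition 7 and the cross-point rule can be expressed in a short Python sketch. The function names are hypothetical, and the cross point of 40 used in the example is simply the value quoted above for Fig. 6; in general the CP depends on the data and query pattern.

```python
def cache_requirement_rate(data_in_dto, cache_size):
    """CRR in percent: number of data in the DTO divided by the CS, times 100."""
    if cache_size <= 0:
        raise ValueError("cache size must be positive")
    return data_in_dto * 100.0 / cache_size


def select_method(crr, cross_point):
    """Pick CR_REG below the cross point and CR_LOC at or above it."""
    return "CR_REG" if crr < cross_point else "CR_LOC"


# Worked example from the text: 8 data in the DTO and a CS of 5 give a CRR of 160%.
crr_example = cache_requirement_rate(8, 5)
method_example = select_method(crr_example, cross_point=40)
```

With a CRR of 160 and a cross point of 40, the sketch selects CR_LOC; a CRR below 40 would select CR_REG instead.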

Fig. 6. Cache hit ratio of cache replacement methods as the CRR varies from 15 to 129.

Definition 8 (The CP of the CRR). As the CRR increases, the more efficient cache replacement method changes from the CR_REG method to the CR_LOC method. We define the CP as the value of the CRR at which this changeover occurs.

4.5. Experiment 5: Variations of the CP value based on changes in query patterns

Our final set of experiments studied variations of the CP value based on changes in query patterns. Fig. 7 shows how the CP is affected by users’ query patterns. When queries are mainly about the location of data, the CP is low; the more the queries are about the region of the data, the higher the CP is. The value of the CP is determined by the characteristics of the data stored in a database server, and the CP is broadcast from the server to each mobile host.

Fig. 7. Change of CP values under different query patterns.

4.6. Personalized cache replacement selection algorithm

Our experiments show that the CS, DTO, DR and query patterns affect the cache hit ratio. Therefore, to select the optimum cache replacement method for each mobile host, we propose an algorithm that considers these elements. We assume that mobile hosts play the main role in cache management and that they know the CS, the DTO and users’ query patterns. In addition, a server broadcasts the size of the total area, the number and distribution of data stored in the server’s database and the standard CP values of the server area for each query pattern.

The scenario of the personalized cache replacement selection algorithm is as follows: when a mobile host enters a specific server’s area (line 1), it receives the size of the total server area, the number of data stored in the database and the standard CP values (line 2); it then calculates the rCR using its own CS and DR (line 3) and determines its personalized base CP value by choosing, among the standard CP values that have been broadcast, the one that matches its own query pattern (line 4); the server broadcasts the spatial data information, and the mobile host receives the data related to its location (line 6); when the mobile host needs to replace its cache, it calculates the current CRR value using the CS, the DTO and the DR at that position (line 7); if the current CRR value is smaller than the base CP value, it should select the CR_REG method for cache replacement (line 9); otherwise, it should select the CR_LOC method (line 10). Algorithm 1 summarizes the process.

          cache_replacement_selection( ) {
line1:      Connect to a Server;
line2:      Get Server_region, Data_count, and Standard_CPs;
line3:      Calculate rCR;
line4:      Set base_CP;
line5:      While (Current_Position is included in Server’s area) {
line6:        Get Broadcasted Data;
line7:        Calculate Current_CRR;
line8:        IF (Current_CRR < base_CP)
line9:          Cache_Replacement_Method = CR_REG;
line10:       ELSE Cache_Replacement_Method = CR_LOC;
            }
          }

Algorithm 1. Personalized cache replacement selection algorithm.

5. Conclusion

In this paper, we proposed two new cache replacement methods, the CR_LOC method and the CR_REG method, to efficiently support the mobility of users and the spatial properties of data in mobile computing environments. We tested the relationship between the cache hit ratio and factors such as the CS of a mobile host, the DTO, the variable density of the target region and query patterns. Finally, based on the results of the experiments, we proposed a personalized cache replacement selection algorithm with optimal efficiency for each mobile host in different conditions. In the future, we will extend the performance experiments to compare our cache selection algorithm with other algorithms, including FAR [8]. Moreover, we would like to study the directory architecture of a data server for broadcast with respect to the spatial properties of data.

References

[1] R. Alonso, H.F. Korth, Database system issues in nomadic computing, in: Proceedings of ACM SIGMOD Conference, 1993, pp. 388–392.
[2] D. Barbara, T. Imielinski, Sleepers and workaholics: caching strategies in mobile environments, in: Proceedings of ACM SIGMOD Conference, 1994, pp. 1–12.
[3] B.Y. Chan, A. Si, H.V. Leong, Cache management for mobile databases: design and evaluation, in: Proceedings of IEEE CS Data Engineering, 1998, pp. 54–63.
[4] J.H.P. Chim, M. Green, R.W.H. Lau, H.V. Leong, A. Si, On caching and prefetching of virtual objects in distributed virtual environments, in: ACM Multimedia 98 - Electronic Proceedings, 1998, http://www.acm.org/sigs/sigmm/MM98/electronic_proceedings.
[5] M.H. Dunham, V. Kumar, Location dependent data and its management in mobile databases, in: DEXA Workshop, 1998, pp. 414–419.
[6] J. Jing, A. Helal, A. Elmagarmid, Client-server computing in mobile environments, ACM Comput. Surv. 31 (2) (1999) 117–157.
[7] G.Y. Liu, G.Q. Maguire Jr, A mobility-aware dynamic database caching scheme for wireless mobile computing and communications, Distributed and Parallel Databases 4 (3) (1996) 271–288.
[8] Q. Ren, M.H. Dunham, Using semantic caching to manage location dependent data in mobile computing, in: Proceedings of MobiCom 2000, Boston, MA, 2000, pp. 210–221.
[9] Y.-S. Youn, S.-W. Shin, Marketing and trade of the mobile advertising service, J. Korean Inform. Process. Soc. 9 (2) (2002) 32–37.