Decision Support Systems 46 (2009) 492–500
Performance evaluation for implementations of a network of proxy caches

Chetan Kumar ⁎

Department of Information Systems and Operations Management, College of Business Administration, California State University San Marcos, 333 South Twin Oaks Valley Road, San Marcos, CA 92096, United States
Article history: Received 9 January 2008; received in revised form 13 August 2008; accepted 4 September 2008; available online 14 September 2008.

Keywords: Caching; Proxy cache network; Collaboration mechanism; Performance evaluation
Abstract

In a network of proxy-level caches, such as IRCache (www.ircache.net), nodes collaborate with one another to satisfy object requests. However, since collaboration in current implementations of proxy cache networks is typically limited to sharing cache contents, there may be unnecessary redundancies in storing objects. It is expected that a mechanism that considers the objects cached at every node in the network would be more effective for reducing user delays. In this study we construct algorithms for different implementations of such a mechanism using the theoretical approach of Tawarmalani et al. [Tawarmalani, M., Karthik, K., and De, P., Allocating Objects in a Network of Caches: Centralized and Decentralized Analyses, (2007) Purdue University Working Paper], which investigates caching policies where nodes do consider objects held by their neighbors. The caching implementations are also compared and contrasted with numerical computations using simulated data. The performance results should provide useful directions for computer network administrators to develop proxy caching implementations that are suited for differing network and demand characteristics. There is a significant potential for deploying proxy cache networks in order to reduce the delays experienced by web users due to increasing congestion on the Internet. Therefore we believe that this study contributes to network caching research that is beneficial for Internet users.

© 2008 Elsevier B.V. All rights reserved.
⁎ Corresponding author. Tel.: +1 760 477 3976. E-mail address: [email protected].
doi:10.1016/j.dss.2008.09.002

1. Introduction and problem motivation

Caching involves storing copies of web objects in locations that are relatively close to the end user. As a result user requests can be served more quickly than if they were served directly from the origin web server [3,9,10,13]. Caching may be performed at different levels in a computer network. These include the browser, proxy, and web-server levels [9,14]. This paper deals with proxy-level caching. Proxy caching is widely utilized by computer network administrators, technology providers, and businesses to reduce user delays and to alleviate Internet congestion (www.web-caching.com). Some well known examples include proxy caching solution providers such as Microsoft (www.microsoft.com/isaserver) and Oracle (www.oracle.com/technology/products/ias/web_cache/), Internet service providers (ISP) such as AT&T (www.att.com), and content delivery network (CDN) firms such as Akamai (www.akamai.com). Effective proxy caching mechanisms are beneficial for reducing network traffic, load on servers, and the average delays experienced by web users [3,13,19]. Specifically we focus on a framework of a network of proxy caches. Unlike the case of a single proxy cache dedicated to one network, proxy caches that are connected together as a network collaborate with each other to improve the overall caching performance in terms of user delays. If a particular proxy cache cannot satisfy a request arriving at it then the object is searched for at its
neighbors in the network of caches. If none of its neighbors is able to satisfy the object request either, then the request is satisfied directly from the origin server. Note that collaboration in this setup is limited to only sharing the cache contents. Fig. 1 illustrates how proxy caches may collaborate in a network with nodes at three locations. The demand for web pages chrysler.com, ford.com, and mercedes-benz.com at the U.K. node, which has not cached them, is satisfied from the U.S. and Germany nodes. Therefore the U.K. node need not go to the origin server to satisfy requests for objects it does not hold itself, but that are cached by its neighbors. As networked nodes collaborate and share cache contents, objects have to be fetched from the origin server less frequently, thereby reducing user delays. However, the degree of collaboration among the nodes can vary depending on the cache network under consideration. Proxy cache networks have been implemented both for public usage as well as in private organizations. Examples of public domain implementations include IRCache (www.ircache.net) and DESIRE (www.uninett.no/arkiv/desire/). Private organizations that utilize proxy cache networks include CDN providers such as Akamai (www.akamai.com), and ISPs such as AOL (www.aol.com). Network caching protocols, such as ICP and CARP, are supported by most well-known proxy servers including Squid (note that IRCache utilizes Squid), Microsoft ISA Proxy Server, and Sun Java System Web Proxy Server (www.web-caching.com/proxy-caches.html). Proxy cache networks can significantly reduce user delays. For example, in the IRCache network, objects that are cached at locations close to the user can be served in fractions of a second, about five times faster than the alternative. By reducing delays proxy caching
Fig. 1. A proxy cache network with three nodes (Source: www.ircache.net).
mechanisms benefit both the specific network where they are used and Internet users in general. In a typical proxy cache network implementation, such as IRCache, each node in the network makes its own caching decisions based on the request patterns it observes. Current network caching protocols primarily focus on collaboration by sharing cache contents, and the caching decisions do not effectively take into account objects already held by neighboring nodes [17]. Hence multiple copies of the same object may be unnecessarily stored within the cache network. It is expected that a mechanism that considers the objects cached at every node in the network would incur lower user request delays than current cache network implementations. A few studies have investigated such caching policies, where nodes do consider objects held by their neighbors, under different coordination scenarios in the network [2,18]. The network coordination scenarios include centralized and decentralized frameworks. An example of a centralized implementation is when a single firm that owns a number of caches has control over the network caching decisions. In a decentralized framework the caches operate in a competitive environment and do not coordinate their actions. This paper's primary contribution is to use simulated data to perform numerical analyses of the caching policies investigated in Tawarmalani et al. [18]. We develop algorithms for implementing a network of caches under both centralized and decentralized frameworks. The caching implementations are also compared and contrasted using numerical computations. The performance results should provide useful directions for computer network administrators to develop proxy caching implementations that are suited for differing network and demand characteristics. There is a significant potential for deploying proxy cache networks in order to reduce the delays experienced by web users due to increasing congestion on the Internet [3].
Therefore we believe that this study contributes to network caching research that is beneficial for Internet users. The plan of the rest of this paper is as follows. We first discuss literature related to our topic. We then review the theoretical developments for implementations of cache networks. Next we perform numerical computations for implementing the proxy cache networks and comparing their performance. Finally we discuss conclusions and areas for future research.

2. Related literature

Caching has been extensively studied in computer science and other technical areas. In recent times there has been a growing interest in the topic in Information Systems research. Datta et al. [3], and Mookherjee
and Tan [14], among others, have noted caching as an important research area because of its usefulness in reducing user delays due to increasing Internet congestion. Podlipnig and Boszormenyi [16], and Datta et al. [3], provide comprehensive surveys of a number of caching techniques. These include widely used cache replacement strategies such as least recently used (LRU), where the least recently requested object is evicted from the cache to make space for a new one; least frequently used (LFU), where the least frequently requested object is removed; and their many extensions. A majority of studies on caching focus on improving performance on metrics such as user latency, which is the delay in serving user requests, and bandwidth reduction. There have been relatively few studies that analytically model the behavior of caches, and consider the economic implications, while providing insights for managing them effectively. Mookherjee and Tan [14] provide a rigorous analytical framework for the LRU cache replacement policy. The framework is utilized to evaluate LRU policy performance under various demand and cache characteristics. Their study models caching at the browser level for individual caches. Hosanagar et al. [9] develop an incentive compatible pricing scheme for caching with multiple levels of Quality of Service. Their model considers the case of a monopolistic proxy caching service provider and multiple content publishers. Hadjiefthymiades et al. [8] model a game-theoretic approach for caching, but also for a case of a single proxy cache and multiple users. The study develops a noncooperative game where competing users are allocated proxy cache space such that monopolizing scenarios are avoided. Using this scheme a pure equilibrium is identified that guarantees similar performance levels for all users. Park and Feigenbaum [15] design a bidding mechanism that provides incentives for users to truthfully reveal their willingness to pay for caching services.
The study constructs a computationally tractable algorithm for implementing the mechanism, though for the case of multiple users connected to a single cache. Kumar and Norris [13] develop a model for proxy caching that exploits historical patterns of user requests for caching decisions. Their mechanism, which is shown to perform favorably versus the LRU policy using a web trace dataset, is specific to individual proxy caches. Though there has recently been a growing interest in the benefits of proxy cache networks, literature in this area is relatively scarce. Most existing caching network protocols, such as ICP, which is supported by IRCache, focus on collaboration by sharing cache contents [17]. Since individual caches do not consider the objects held by their neighbors while determining their own holding strategies, the current caching networks can have unnecessary object replications. This independent cache behavior, referred to as an "ad hoc scheme" by Ramaswamy and Liu [17], can lead to suboptimal performance for the network in terms of
user delays. A few studies have considered the problem of determining decisions that are optimal for the proxy cache network. Chun et al. [2] consider optimal decisions, under both centralized coordination (optimal social welfare perspective) and decentralized game (selfish cache behavior perspective) scenarios, where caches have a distance metric for accessing objects within the network. The study models the decision problem without cache capacity restrictions. As a result, the centralized coordination scenario is shown to reduce to the mini-sum facility location problem, and a pure equilibrium is identified for the decentralized game. Hosanagar and Tan [10] develop a model for optimal replication of objects, in a centralized scenario, using a version of the LRU policy. The study considers a framework of two caches whose capacities are partitioned into regions where the level of duplication is controlled. Ercetin and Tassiulas [4] construct a market-based mechanism for minimizing latency in a network where CDN proxy caches and content providers behave selfishly in a noncooperative game. In addition to characterizing the equilibrium of this game, the study also investigates the conditions for the existence of a unique equilibrium. However the centralized coordination scenario is not considered. This study is distinct from prior work in that we consider proxy cache networks with cache capacity restrictions, and numerically evaluate the optimal performance under both centralized coordination and decentralized game scenarios. The basic model for the two approaches is taken from the Tawarmalani et al. [18] study. The centralized and decentralized models are implemented using algorithms and their performance is compared with numerical computations. This study expands on the preliminary computations of Kumar [12] by providing a comprehensive performance evaluation of the proposed mechanisms.

3. Model review

In this section we review the theoretical developments for implementations of proxy cache networks from Tawarmalani et al. [18]. The models are reviewed in some detail here so that readers have a descriptive illustration for the algorithms developed in Section 4 on numerical computations. In addition we describe the current network caching method, and discuss motivations for other caching implementations in contrast to it. Later, in the numerical computations, we construct some illustrative numerical examples for the caching approaches for different problem sizes. In the following subsections we first describe the current implementation and characteristics of a proxy cache network such as IRCache. Next we discuss two frameworks, centralized and decentralized, under which network caching may be implemented [2,18]. In the centralized framework all caching decisions are completely coordinated based on overall network request patterns. Under a decentralized framework the caches interact with one another without the presence of any controlling authority. The motivation for both approaches is further described in their respective subsections.

3.1. Proxy cache network structure

As a first step we consider the current implementation of a proxy cache network such as IRCache. Note that in this implementation a user chooses a particular cache and necessarily goes through that node for accessing any object. Every cache in the network attempts to minimize the waiting time of satisfying its local demand for different objects. We model this behavior for a "snapshot" time period assuming a known demand for every object at each proxy-cache location. In current network cache implementations, such as IRCache (www.ircache.net), nodes with cached objects are regularly measured for network proximity using echo requests. Some proxy caching mechanisms use historical request patterns to predict object demand, though they may allow exceptions for dynamic content requests [3,20].
Analogous to these cases we assume object demand at nodes is known a priori as a first cut of the model. Subsequent versions may relax this assumption. Let the sets N = {1,…,n} and M = {1,…,m},
represent objects and caches, respectively. The aggregate demand for object i ∈ N at cache j ∈ M for any "snapshot" time period is denoted by α_ij, and α_ij ≥ 0, ∀i ∈ N, ∀j ∈ M. Let the caches j ∈ M have fixed capacities denoted by K = {k_1,…,k_m}, and let k_j ≤ n, ∀k_j ∈ K. For simplicity, we assume that all objects are of unit size, and that there are no communication congestion delays between locations. We associate a cost with the waiting time that any end user faces between requesting an object and actually receiving it. The waiting time costs can also be viewed as the cost of communication that is incurred while the caches and end users are interacting with one another. Let c_l, c_n, and c_o represent the unit waiting cost of satisfying an object request from the local cache, a neighbor cache, and directly from the origin web server, respectively. By definition, c_l ≤ c_n ≤ c_o, as the origin server typically involves the highest waiting time, followed in decreasing order by a neighbor cache and the local cache. Every cache's objective is then to minimize the total cost of waiting. (Note that traditional measures for caching performance—for example, maximization of hit-ratio, external bandwidth saving, etc.—all aim at reducing user delays.) Of course this objective is constrained by the fact that the cache has a limited capacity. This problem can be easily formulated as a mathematical program for each cache location, shown briefly as follows: min {c_l(objects served from local cache) + c_n(objects served from neighbor cache) + c_o(objects served from origin server)}, subject to: a request is satisfied from any one of the local or neighbor caches, or else the origin server; and the local cache stores objects restricted by its capacity. An intuitive solution for the cost minimization objective would be for every cache j to rank the objects in decreasing order of demand α_ij. The cache is then filled to capacity with the most demanded objects.
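This per-cache ranking heuristic can be sketched as follows; a minimal illustration assuming unit-size objects and known demands (the function name is hypothetical):

```python
def fill_cache_greedy(demand, capacity):
    """Rank objects by local demand and fill the cache to capacity.

    demand: dict mapping object id -> aggregate demand alpha_ij at this cache
    capacity: number of unit-size objects the cache can hold (k_j)
    """
    ranked = sorted(demand, key=demand.get, reverse=True)
    return set(ranked[:capacity])

# e.g., five objects with demands 50, 20, 10, 5, 3 and a capacity of two
print(fill_cache_greedy({1: 50, 2: 20, 3: 10, 4: 5, 5: 3}, 2))  # {1, 2}
```

The cache simply keeps its most demanded objects, independent of what its neighbors hold.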
The demand for remaining objects, which could not be accommodated within the cache, is then satisfied by either a neighbor cache or the origin server. Of course the origin server route is used only if none of the neighbor caches has this object.

3.2. Centralized mechanism

As noted earlier, in the current implementation of cache networks, because of the lack of coordination among caches there could be redundant copies of objects at multiple locations. We now consider the case where there is a central administrator who coordinates all the caching decisions based on complete information about the object request patterns in the network. An example of this scenario would be a single firm that owns a number of caches across multiple locations and has complete control over the network caching decisions. The firm's objective is to minimize the overall network waiting time by deciding which objects should be stored at each of the caches. The solution to this problem provides the optimal social welfare outcome in terms of the network costs. The performance of the centralized mechanism may then be used as a benchmark for comparison against other mechanisms that do not assume complete coordination among the caches. The central administrator's decision process is formulated as a 0-1 mathematical program model as follows [18]:

(LS)   \min \sum_{j=1}^{m} \Big[ (c_l - c_n) \sum_{i=1}^{n} \alpha_{ij} x_{ij} + c_n \sum_{i=1}^{n} \alpha_{ij} + (c_o - c_n) \sum_{i=1}^{n} \alpha_{ij} y_i \Big]   (1)

s.t.   \sum_{i=1}^{n} x_{ij} \le k_j, \quad \forall j \in M   (2)

       y_i \ge 1 - \sum_{j=1}^{m} x_{ij}, \quad \forall i \in N   (3)

       y_i \ge 0, \quad \forall i \in N   (4)

       x_{ij} \in \{0,1\}, \quad \forall i \in N, \forall j \in M   (5)

       y_i \in \{0,1\}, \quad \forall i \in N,   (6)
where x_ij is 1 if object i is held in cache j and 0 otherwise, and y_i is 1 if object i is procured from the origin server and 0 otherwise. In Problem
(LS), variable x_ij provides the optimal solution for the mathematical program model, i.e., the locations at which objects should be cached. The objective function (1) captures the central planner's goal of minimizing network costs. Constraints (2) provide the cache capacity restrictions. Constraints (3) and (4), along with the non-negative objective coefficient of y_i and the minimization objective function (1), ensure that (a) any object held by a neighbor cache is only obtained when it is not present in the local cache, and (b) an object is obtained from the origin server only when no cache within the network has that object. The optimal solution to (LS) minimizes redundancy of objects cached in the network. Tawarmalani et al. [18] show that (LS) can be solved in polynomial time since it reduces to the transportation problem [1,6]. As a result the optimal social welfare caching decisions can be efficiently determined by computer network administrators under the centralized scenario.

3.3. Decentralized mechanism

In the centralized mechanism we assume that all the caches in the network can be fully coordinated within a single firm. Instead of requiring complete coordination among the caches, we next let the caches interact with one another in a decentralized manner, where they operate in a competitive environment and do not coordinate their actions. In this mechanism every cache behaves selfishly and individually tries to minimize its own costs based on network demand patterns. Note that existing implementations of proxy cache networks such as IRCache can also be viewed as decentralized mechanisms. However at IRCache the cooperation between caches is primarily limited to sharing their cache contents.
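The selfish objective of a single cache can be sketched directly; a minimal illustration using the unit waiting costs c_l = 1, c_n = 2, c_o = 3 from the numerical computations (the function name and example demands are our own):

```python
def cache_cost(alpha, held, neighbor_sets, cl=1, cn=2, co=3):
    """Waiting cost faced by one cache, given its own holdings and the
    holdings of its neighbors: c_l per unit of demand served locally,
    c_n via a neighbor, and c_o from the origin server."""
    cost = 0
    for i, a in alpha.items():
        if i in held:
            cost += cl * a                       # served from the local cache
        elif any(i in nb for nb in neighbor_sets):
            cost += cn * a                       # served from a neighbor cache
        else:
            cost += co * a                       # fetched from the origin server
    return cost

# e.g., demands 50, 20, 10, 5, 3; this cache holds {1, 2}, its neighbor {1, 3}
alpha = {1: 50, 2: 20, 3: 10, 4: 5, 5: 3}
print(cache_cost(alpha, {1, 2}, [{1, 3}]))  # 114
```

Each cache tries to choose `held` so as to minimize this quantity, given the neighbors' holdings.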
In the decentralized mechanism the decisions made by the individual caches are based not only on their own demands, but also on the demands of their neighbors. Hence this approach may significantly reduce caching redundancies compared to current implementations. An example of such an arrangement could be when a number of firms decide to share their cache contents for reducing user delays, but each is interested in getting the best possible delay-reduction performance at its own caches. The key difference here, with respect to the centralized mechanism, is that every cache makes its own caching decisions without the presence of a controlling authority. We use the network structure outlined in Section 3.1 to discuss the decentralized caching framework. As before, the network demand patterns and cache capacities are assumed to be known a priori. Every cache j chooses to hold the set of objects that minimizes its cost, given the holding strategies of the other caches j' ≠ j. The actions of caches in a general setting are modeled using a game-theoretic approach by Tawarmalani et al. [18]. Similar to earlier studies, we focus on the pure strategy Nash equilibrium solutions of this simultaneous move game [2,18]. This is the outcome where no cache benefits by changing its object holding strategy while other nodes keep their strategies unchanged [5]. In the "caching game" each node j solves the following problem, given the strategy of every other cache j' ≠ j, j' ∈ M [18]:

(GS)   \min_{x_{1j},\ldots,x_{nj}} \; (c_l - c_n) \sum_{i=1}^{n} \alpha_{ij} x_{ij} + c_n \sum_{i=1}^{n} \alpha_{ij} + (c_o - c_n) \sum_{i=1}^{n} \alpha_{ij} \Big( \prod_{j'=1, j' \ne j}^{m} (1 - \bar{x}_{ij'}) \Big) (1 - x_{ij})   (7)

s.t.   \sum_{i=1}^{n} x_{ij} \le k_j   (8)

       x_{ij}, \bar{x}_{ij'} \in \{0,1\}, \quad \forall i \in N, \forall j' \ne j,   (9)
where \bar{x}_{ij'} is 1 if cache j' ≠ j, j' ∈ M holds object i and 0 otherwise. The optimal solution to (GS) provides the best response object holding strategy for any cache j ∈ M, given the strategies of neighbor caches j' ≠ j captured by the non-linear expression 1 - \prod_{j' \ne j}(1 - \bar{x}_{ij'}) in objective function (7). Constraint (8) provides the cache capacity restrictions. Tawarmalani et al. [18] demonstrate that a pure equilibrium exists for the above caching game, and that there may be multiple pure equilibria that can have different network costs. The authors also construct an integer programming formulation for the caching game (GS), referred to as (BP), by using the Glover and Woolsey method [7] and introducing variables to linearize 1 - \prod_{j' \ne j}(1 - \bar{x}_{ij'}). Solving (BP) with different objective functions allows us to identify the multiple pure equilibria whose performance we are interested in comparing in terms of network costs. A question that follows naturally is: what is the performance gap between the equilibrium that attains the minimum network cost (referred to hereafter as the best social equilibrium, as it maximizes social welfare in the caching game) and the one with the maximum network cost (referred to hereafter as the worst social equilibrium)? In addition, how do both of these equilibrium solutions of the decentralized mechanism compare with the optimal social welfare outcome provided by the centralized framework? In the following section, algorithms are developed for implementing both the centralized and decentralized frameworks, and numerical computations are performed to evaluate differences between the two mechanisms.

4. Numerical computations

We now compare the performance of the centralized and decentralized caching mechanisms in terms of network costs. As mentioned in Section 3.2, the centralized mechanism provides the optimal social welfare outcome. The optimal social welfare solution is obtained by solving Problem (LS) for a given set of parameters.
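For intuition on what (LS) computes, the network cost of a candidate allocation can be evaluated directly and minimized by brute force on tiny instances; the sketch below is an illustration only (helper names are hypothetical, and exhaustive enumeration replaces the transportation-problem reduction used in the paper):

```python
from itertools import combinations, product

def network_cost(alloc, demand, cl=1, cn=2, co=3):
    """Total waiting cost of an allocation: alloc[j] is the set of objects
    held by cache j; demand[j][i] is the demand alpha_ij for object i at j."""
    total = 0
    for j, held in enumerate(alloc):
        for i, a in demand[j].items():
            if i in held:
                total += cl * a                  # served locally
            elif any(i in h for h in alloc):
                total += cn * a                  # served by a neighbor
            else:
                total += co * a                  # fetched from the origin
    return total

def solve_ls_bruteforce(demand, k):
    """Enumerate all capacity-k allocations and return a cheapest one."""
    objects = list(demand[0])
    allocs = product(combinations(objects, k), repeat=len(demand))
    best = min(allocs, key=lambda al: network_cost([set(s) for s in al], demand))
    return [set(s) for s in best], network_cost([set(s) for s in best], demand)

# Example 1 data: two symmetric caches, five objects, capacity two each
d = {1: 50, 2: 20, 3: 10, 4: 5, 5: 3}
alloc, cost = solve_ls_bruteforce([d, d], k=2)
print(cost)  # 238, matching the optimal social welfare cost in Table 1
```

Brute force is exponential in the number of caches and objects; the point of the transportation-problem reduction in [18] is precisely that (LS) can instead be solved in polynomial time.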
As mentioned in Section 3.3, the decentralized mechanism can have multiple pure equilibria solutions. We specifically consider two pure equilibria outcomes: (a) the best equilibrium for social welfare – the equilibrium with the least network cost, and (b) the worst equilibrium for social welfare – the equilibrium with the highest network cost. The best and worst social equilibrium solutions are obtained by solving Problem (BP), with minimization and maximization objective functions, respectively, for a given set of parameters. We further discuss the motivation for considering different equilibria outcomes later using Example 1. The performance of the mechanisms is compared for the following set of model parameters: n = 15 objects, m = 3 caches, k_j = 3 objects, c_l = 1, c_n = 2, c_o = 3, and \sum_{i=1}^{n} \alpha_{ij} = 600, ∀j ∈ M. We are interested in observing model performance with varying object demand patterns, while keeping other parameters constant. The demand patterns are generated such that the total demand for objects at each cache is always 600, though the demand for individual objects at different cache locations may vary (depending on whether or not the caches are symmetric). In addition the cache capacities are kept constant. In this way we can simulate different aggregate patterns while ensuring that caches vary only in terms of distribution patterns and not in overall network demand and size characteristics. The mechanisms were modeled using GAMS version 21.7 and the corresponding mathematical programs were solved using CPLEX version 8.1. Note that we do not include any cost of coordination among the caches in the centralized network caching framework. This is because the coordination cost may be considered as a fixed cost, say F, for managing the caching decisions of the entire proxy-cache network. The incurred cost can be regarded as a one-time fixed charge (i.e., F), or as being proportional to the number of caches m in the network (i.e., mF).
Irrespective of the type of fixed cost, including it in the minimization objective function does not influence the solution of the mathematical program model. Therefore we ignore coordination costs and use the waiting time performance of the centralized framework as
Table 1
Pure equilibria and optimal social welfare

Demand            | Pure equilibria (objects held by cache 1, cache 2, and network cost)                                             | Optimal social welfare (objects held by cache 1, cache 2, and network cost)
[50,20,10,5,3]    | {1,2},{1,2}, cost 248; {1,2},{1,3} or {1,3},{1,2}, cost 238                                                      | {1,2},{1,3} or {1,3},{1,2}, cost 238
[50,20,15,10,3]   | {1,2},{1,3} or {1,3},{1,2}, cost 295                                                                             | Identical to pure equilibria
[50,45,40,35,30]  | {1,2},{3,4} or {3,4},{1,2}, cost 690; {1,3},{2,4} or {2,4},{1,3}, cost 690; {1,4},{2,3} or {2,3},{1,4}, cost 690 | Identical to pure equilibria
[80,70,30,5,3]    | {1,2},{1,2}, cost 528                                                                                            | {1,2},{1,3} or {1,3},{1,2}, cost 508
a benchmark for comparison against the decentralized mechanisms. If we had included a (sufficiently high) coordination cost with the optimal solution, the centralized model might not always produce better results than other mechanisms and could not be used as a benchmark. For consistency in comparison, we also do not include any fixed cost of setting up a decentralized mechanism.

4.1. Symmetric network performance

We first consider the performance of our network caching mechanism for a network of symmetric caches (i.e., k_j = k, ∀j ∈ M, and α_ij = α_ij' = α_i, ∀i ∈ N, ∀j, j' ∈ M). This symmetric setup can be used as a benchmark for comparison of the benefits of network proxy caching under a general setting. Prior to discussing both caching implementations for larger problem sizes, we first illustrate some characteristics of the caching game and the centralized model with the following symmetric example using some numerical values for costs and demand.

4.1.1. Example 1

Let us consider a symmetric case (two caches j and j' are said to be symmetric when k_j = k_j' = k, and for each object i, α_ij = α_ij' = α_i) with five objects, two caches, and each cache having a capacity of two objects. Let c_l = 1, c_n = 2, and c_o = 3. We now discuss the equilibrium characteristics and optimal social welfare solutions for this caching game, shown in Table 1, with varying demand patterns. Here [α_1,…,α_5] represents the network object demand and {i, i'}, i' ≠ i, are the objects held by any cache. When the network object demand is [50,20,10,5,3] there is a pure equilibrium solution, where cache 1 holds objects {1,2} and cache 2 holds objects {1,3} (or vice versa), which is identical to the optimal social welfare outcome with a cost of 238. This is the best social equilibrium. In addition, there exists another pure equilibrium where both caches hold objects {1,2} with a higher network cost of 248. This is the worst social equilibrium.
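These two equilibria can be reproduced with simple best-response dynamics; the sketch below is our own illustration, not the (BP) formulation used in the paper, and the tie-breaking flag is a device we introduce to select between equilibria when several best responses tie:

```python
def best_response(alpha, neighbor_held, k, cl=1, cn=2, co=3, prefer_unheld=False):
    """Hold the k objects with the largest waiting-cost savings, given the
    neighbor's holdings: holding object i saves (c_n - c_l) * alpha_i if the
    neighbor already holds it, and (c_o - c_l) * alpha_i otherwise."""
    def saving(i):
        return alpha[i] * ((cn - cl) if i in neighbor_held else (co - cl))
    def key(i):
        # tie-break: optionally prefer objects the neighbor does not hold
        return (-saving(i), (i in neighbor_held) if prefer_unheld else 0, i)
    return set(sorted(alpha, key=key)[:k])

def iterate(alpha, k, prefer_unheld, rounds=10):
    """Alternate best responses between two caches for a fixed number of
    rounds; for this tiny instance the holdings stabilize quickly."""
    h1, h2 = set(), set()
    for _ in range(rounds):
        h1 = best_response(alpha, h2, k, prefer_unheld=prefer_unheld)
        h2 = best_response(alpha, h1, k, prefer_unheld=prefer_unheld)
    return h1, h2

alpha = {1: 50, 2: 20, 3: 10, 4: 5, 5: 3}
print(iterate(alpha, 2, prefer_unheld=False))  # both caches hold {1, 2}: worst, cost 248
print(iterate(alpha, 2, prefer_unheld=True))   # {1, 2} and {1, 3}: best, cost 238
```

The two tie-breaking rules converge to the worst and best social equilibria of Table 1, respectively, illustrating how indifference among best responses is what separates the equilibria in this example.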
Note that the number of objects cached in the network can be different across the equilibria outcomes, as well as between some equilibria and social welfare solutions. Network demands of [50,20,15,10,3] and [50,45,40,35,30] yield two and six pure equilibria, respectively, that are identical to their optimal social welfare solutions. In these cases the best and worst social equilibria are also identical. When the demand is [80,70,30,5,3], there is a unique pure equilibrium where both caches hold objects {1,2} at a network cost of 528. Since this equilibrium is unique, the best and worst social equilibria coincide. Note that this pure equilibrium differs from the optimal social welfare solution, where one cache holds objects {1,2} and the other holds objects {1,3}, with a cost of 508. From these examples we observe that the caching game has the following characteristics: (a) a pure equilibrium solution exists though it may not be unique, (b) there may be multiple equilibria solutions (e.g., best and worst social equilibria) that differ in terms of network costs and caching decisions, and (c) it may be that no equilibrium solution attains the optimal centralized framework cost, which is the least possible for the network. From the above examples we know that the caching game can have multiple pure equilibria, including the best and worst social equilibria. It
can be argued that if the cache administrators had the opportunity to communicate prior to making caching decisions, the best equilibrium may emerge as a "focal point," since the network costs are lower in that outcome. The theory of focal points suggests that in some real-life situations players may be able to coordinate on a particular equilibrium by using information that is abstracted away by the strategic form [5]. However, in our case this necessitates some sort of prior communication among the cache operators. For example, in Table 1, when the demand is [50,20,10,5,3], {1,2},{1,3} is the best social equilibrium with a cost of 238, which also happens to be the optimal social welfare outcome. Arguably cache operators may coordinate on that, rather than on the {1,2},{1,2} equilibrium with a higher cost of 248. However this assumes some communication among the cache operators prior to decision making. In the present form of the caching game we do not consider such situations. In subsequent versions we can consider how the best social, or any other, equilibrium may emerge as a focal point in the cache network under realistic scenarios. Further illustrations and characteristics of both the centralized and decentralized caching implementations are discussed next for larger problem sizes using algorithms. Let the network demand for objects in the symmetric setting be denoted by D_i = {α_1,…,α_i,…,α_n}. We define the base case of the symmetric network demand pattern to be C_i = {c,…,c}, where all objects have the same demand c (i.e., α_i = c, ∀i ∈ N). We are interested in observing how model performance for both the centralized and decentralized mechanisms varies as the demand patterns deviate from C_i to any generated D_i. Let θ be the angle formed between demand vectors D_i and C_i in an n-dimensional space (refer to Fig. 2). The measure for demand variance in the symmetric network setting is defined as tan θ [11].
Note that as Di diverges from Ci, tan θ increases correspondingly, since the orthogonal distance h between the demand vectors increases as well. The performance of the centralized and decentralized mechanisms under the symmetric setting can now be compared using the following algorithm.

Algorithm Symmetric Network Performance
Input: n = 15, m = 3, kj = 3, cl = 1, cn = 2, co = 3, and ∑i∈N αi = 600.
Output: xij, ∀i ∈ N, ∀j ∈ M; performance of the symmetric network.
Steps:
1. Let c = (∑i∈N αi)/n; Ci = {c,…,c}; z = 15
2. Let di be uniformly distributed between (−z, +z)
3. Scale di to d′i such that ∑i∈N d′i = 0
4. wmin = −c/max(d′i); wmax = c/max(−d′i)
5. w = wmin
6. while w < wmax
7.   Di = Ci + wd′i
8.   tan θ = w/(c√n)
9.   Solve Problem (LS) and Problem (BP), using Di and the other parameter values, to obtain the centralized and decentralized mechanism performances, respectively
10.  increment w
11. end while
12. Repeat Steps 2 through 11 for z = 20 and 25.
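The demand-generation portion of this algorithm (Steps 1–8) can be sketched in Python as follows; solving problems (LS) and (BP) in Step 9 requires an integer-programming solver and is omitted. The function name, the number of w steps, and the fixed random seed are assumptions for illustration.

```python
import math
import random

# Sketch of Steps 1-8 of Algorithm Symmetric Network Performance:
# generate zero-sum deviations d'_i, sweep the scaling factor w over
# [wmin, wmax], and record each demand vector Di with its tan(theta).

def symmetric_demands(n=15, total=600, z=15, w_steps=10, seed=0):
    rng = random.Random(seed)
    c = total / n                           # base demand vector Ci = (c, ..., c)
    d = [rng.uniform(-z, z) for _ in range(n)]
    mean = sum(d) / n
    d = [di - mean for di in d]             # Step 3: deviations sum to zero
    w_min = -c / max(d)                     # Step 4: bounds keep demands >= 0
    w_max = c / max(-di for di in d)
    out = []
    for s in range(w_steps + 1):            # Steps 5-11: sweep w over its range
        w = w_min + s * (w_max - w_min) / w_steps
        D = [c + w * di for di in d]        # Step 7: Di = Ci + w * d'_i
        tan_theta = w / (c * math.sqrt(n))  # Step 8
        out.append((D, tan_theta))
    return out
```

Each generated Di preserves the total demand of 600 and stays non-negative at the boundary values of w, which is the purpose of the wmin/wmax bounds in Step 4.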
Fig. 2. Symmetric network demand variance — tan θ.
The objective function values of the optimal solutions of (LS) and (BP) provide the mechanisms' performance in terms of network costs. The mechanism performance and tan θ were recorded for different values of the demand deviation vector di. Fig. 3 plots network cost versus tan θ for three cases (a), (b), and (c), where di varies between (−15, +15), (−20, +20), and (−25, +25), respectively. As can be observed, in all three cases the network performance of the centralized social welfare mechanism monotonically increases with higher absolute values of
tan θ (note that lower network costs translate to better network performance). This is because as |tan θ| increases, and there is greater variance in the object demands, the caches benefit to a greater degree from holding and sharing different objects. Since greater demand variance leads to better network performance, it can be expected that the centralized mechanism performance should improve as the range of di increases. This can be verified by observing that the slope of the centralized mechanism line progressively increases across
Fig. 3. Symmetric Network Performance. (a) Symmetric demand deviation di range = (−15, +15). (b) Symmetric demand deviation di range = (−20, +20). (c) Symmetric demand deviation di range = (−25, +25).
cases (a), (b), and (c) in Fig. 3. Since the centralized mechanism identifies solutions that result in minimum network costs, its performance is always better than or equal to that of the decentralized mechanisms. The decentralized mechanism performance for both the best and worst equilibrium also generally tends to improve with increasing |tan θ| for all three cases. However, a key difference is that in the decentralized mechanisms there can be deviations at some points where a higher |tan θ| actually leads to worse performance. An example is case (a) of Fig. 3, where the best equilibrium network cost increases from 3544 to 3567 when tan θ increases from 0.0121 to 0.0128. In Fig. 3 case (b) the worst equilibrium cost increases from 3611 to 3648 when tan θ increases from 0.0076 to 0.0081. Another example is case (c) of Fig. 3, where the best equilibrium cost increases from 3552 to 3578 as tan θ increases from 0.007 to 0.0075. These deviations occur because in the decentralized caching game the nodes behave selfishly. Therefore in some cases the best response moves by the caches result in pure equilibrium outcomes where some of the nodes are better off than others at the expense of overall network performance. Since individual nodes act selfishly, the network costs for the decentralized mechanism may increase with increasing variance. Other than the deviations mentioned above, the mechanisms generally tend to perform better with increasing demand variance in the symmetric network.

4.2. Asymmetric network performance

We now consider the performance of our network caching mechanisms in a general setting, referred to as an asymmetric network, where the caches j can have different object demands and capacities. This scenario is similar to a real world setting where proxy nodes in the network may have differing demand patterns. Let Aj = (α1j,…,αnj) denote the aggregate demand faced by each cache j, and K = {k1,…,km} denote the cache capacities.
For comparison purposes, Aj, ∀j ∈ M, in the asymmetric setting are generated such that the demands diverge from the corresponding Di in the symmetric setting. As before, the total object demand for every cache is set at 600. These network demand characteristics are generated using the transportation problem framework, a classical combinatorial optimization problem described as follows [1,6]: find the minimum cost to fulfill the product demand of n demand locations using the available capacities of m supply locations, given the variable costs of transporting product units from supply to demand locations. We utilize this framework by setting the object demand constraints ∑i∈N αij = 600, ∀j ∈ M, and ∑j∈M αij = Di, ∀i ∈ N. In addition, on the supply side kj = 3, ∀j ∈ M, to ensure that cache sizes are the same across both settings. The relative costs of satisfying object requests from neighbor caches or the origin server are also identical for the two settings. This ensures that the asymmetric and symmetric network patterns have a basis for comparison. Let the total sample variance, which is an instance of generalized variance, be the measure of demand variance in the asymmetric network setting [11]. The total sample variance, var, is defined as the sum of the lengths of the deviations between Aj, ∀j ∈ M, and the mean demand vector Ci. The performance of the centralized and decentralized mechanisms under the asymmetric setting can now be compared using the following algorithm.

Algorithm Asymmetric Network Performance
Input: n = 15, m = 3, kj = 3, cl = 1, cn = 2, co = 3, ∑i∈N αij = 600, ∀j ∈ M, and Di, ∀i ∈ N.
Output: xij, ∀i ∈ N, ∀j ∈ M; performance of the asymmetric network.
Steps:
1. Let c = (∑i∈N αij)/n; Ci = {c,…,c}
2. Generate Aj, ∀j ∈ M, using the transportation problem framework such that ∑i∈N αij = 600, ∀j ∈ M, and ∑j∈M αij = Di, ∀i ∈ N
3. vi = √(∑j∈M (αij − Ci)²)
4. var = ∑i∈N vi
5. Solve Problem (LS) and Problem (BP), using Aj, ∀j ∈ M, and the other parameter values, to obtain the centralized and decentralized mechanism performances in the asymmetric network setting, respectively
6. Repeat Steps 2 through 5 for different values of Aj, ∀j ∈ M, that diverge from specific symmetric Di values.

The network costs of the optimal solutions of (LS) and (BP), and the corresponding var, were recorded for different values of Aj, ∀j ∈ M, to measure the mechanisms' performance under the asymmetric setting. Fig. 4 plots network costs versus var for three asymmetric cases (a), (b), and (c) that correspond to symmetric Di with θ values of 24.08, 28.91, and 31.31, respectively. As can be observed, in all three cases there is a general trend of improved performance with increasing var for the centralized mechanism. However, a key difference in the asymmetric setting is that the centralized mechanism performance does not increase monotonically with var. This is because, unlike tan θ in the symmetric setting, var is an approximate measure of demand variance. The trend of better performance with increasing var is less pronounced for the decentralized mechanisms, and there are more deviations, because the caches behave selfishly. As before, the centralized mechanism performs no worse than the decentralized mechanisms. However, compared to the symmetric setting, there is a greater divergence between the performance of the decentralized worst equilibrium mechanism and that of the other two. Examples can be observed in cases (a) and (c) of Fig. 4, where the worst equilibrium almost never achieves the costs of the other mechanisms. This occurs due to the greater variance in object demands in the asymmetric setting, which leads to increased benefits from sharing objects in the network. Since the centralized and best equilibrium mechanisms allow caches to share objects more effectively, they perform much better than the worst equilibrium.
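The variance measure of Steps 3–4 can be sketched directly in Python; the transportation-problem generation of Aj in Step 2 requires an optimization solver and is omitted. The function name and the example matrices in the usage below are illustrative assumptions.

```python
import math

# Sketch of the total sample variance measure of Algorithm Asymmetric
# Network Performance: v_i = sqrt(sum over caches j of (alpha_ij - c)^2),
# and var = sum over objects i of v_i.

def total_sample_variance(alpha, c):
    """alpha[i][j]: demand for object i at cache j; c: mean demand C_i."""
    return sum(
        math.sqrt(sum((a - c) ** 2 for a in row))  # deviation length v_i
        for row in alpha
    )
```

A perfectly symmetric demand matrix (every αij equal to c) yields var = 0, and var grows as cache demands spread away from the mean, which is why it serves as the asymmetric counterpart of tan θ.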
It is also for the same reason that the centralized mechanism outperforms the best equilibrium mechanism to a greater extent here. Therefore we can conclude that, due to the increased demand variance in the asymmetric setting, the mechanisms benefit to a greater degree from sharing objects. This implies that in practical scenarios, where there may be differing object demand patterns at nodes, there are increased benefits in deploying proxy cache networks. Note that we use the integer programming model (BP) for the decentralized mechanism in both Algorithm Symmetric Network Performance and Algorithm Asymmetric Network Performance. Problem (BP) belongs to the class of NP-Hard problems [6]. This means that the models cannot be solved to optimality efficiently for very large problem sizes. For such cases, however, alternative solution approaches that aim to find good results quickly can be developed. Examples include heuristic procedures and dynamic programming [1,6]. An effective proxy caching mechanism is beneficial for all Internet users, due to reduced network traffic, load on web servers, and user delays [3,13]. The difference can be immediately apparent to an end user, for whom a cached website may seem to load instantaneously compared to a delay of several seconds in the alternative case. In addition, Internet companies can conserve investment in server farms around the world for replicating web content to improve load speeds (www.web-caching.com). A network of proxy caches can further significantly reduce user delays, as illustrated by IRCache (www.ircache.net). If a requested object is cached at a node in the network close to the user, then the waiting time, measured in fractions of a second, can be five times lower than in the alternative scenario. At the aggregate level the reduction in waiting times across all user requests in a network of proxy caches can be quite significant.
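As one example of such an alternative approach, a myopic greedy heuristic for the placement problem might look as follows. The marginal-benefit rule, cost parameters, and function name are illustrative assumptions, not a procedure from the paper.

```python
# Sketch of a myopic greedy heuristic for capacitated cache placement,
# of the kind suggested for large instances where solving (BP) exactly
# is impractical. demand[j][i] is the request count for object i at
# cache j; each cache has k slots. Per-request costs cl < cn < co are
# assumed as in the algorithms above.

def greedy_placement(demand, k, cl=1, cn=2, co=3):
    m, n = len(demand), len(demand[0])
    held = [set() for _ in range(m)]
    anywhere = set()                  # objects already cached at some node
    for _ in range(m * k):
        best = None
        for j in range(m):
            if len(held[j]) == k:     # cache j is full
                continue
            for i in range(n):
                if i in held[j]:
                    continue
                # myopic saving: a local copy turns origin fetches (co) or
                # neighbor fetches (cn) into local hits (cl) for cache j only
                save = demand[j][i] * ((cn - cl) if i in anywhere else (co - cl))
                if best is None or save > best[0]:
                    best = (save, j, i)
        if best is None or best[0] <= 0:
            break
        _, j, i = best
        held[j].add(i)
        anywhere.add(i)
    return held
```

Because the saving for an object already held elsewhere in the network is only (cn − cl), the rule naturally steers caches toward holding different objects, echoing the benefit of diversity observed in the numerical results.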
We have shown with our numerical computations, under different demand and network characteristics, that the performance of proxy caching networks can be improved when nodes also consider objects held by their neighbors. Given our results, we believe that there is significant
Fig. 4. Asymmetric Network Performance. (a) Asymmetric demand diverging from symmetric θ = 24.08. (b) Asymmetric demand diverging from symmetric θ = 28.91. (c) Asymmetric demand diverging from symmetric θ = 31.31.
potential for deploying proxy cache networks in order to reduce the delays experienced by web users due to congestion on the Internet.

5. Conclusions

Proxy caching is widely used by computer network administrators, businesses, and technology providers to reduce user delays on the increasingly congested Internet (www.web-caching.com). Effective
proxy caching mechanisms are useful for reducing network traffic, load on servers, and the average delays experienced by web users [3,13,19]. Specifically we focus on a framework of a network of proxy caches. In a typical proxy cache network implementation, such as IRCache (www.ircache.net), each node in the network makes its own caching decisions based on the request patterns it observes. Current network caching protocols primarily focus on collaboration by sharing cache contents, and the caching decisions do not effectively take into
account objects already held by neighboring nodes [17]. A few studies have investigated caching policies where nodes do consider objects held by their neighbors under different coordination scenarios in the network [2,18]. The network coordination scenarios include centralized and decentralized frameworks. An example of a centralized implementation is when a single firm that owns a number of caches has control over the network caching decisions. In a decentralized framework the caches operate in a competitive environment and do not coordinate their actions. This paper's primary contribution is to use simulated data to perform numerical analyses of the caching policies investigated in Tawarmalani et al. [18]. We develop algorithms for implementing a network of caches under both centralized and decentralized frameworks. The caching implementations are also compared and contrasted using numerical computations. The results demonstrate that the performance of proxy caching networks can be improved when nodes also consider objects held by their neighbors. We show that the centralized mechanism always performs no worse than the decentralized mechanisms, and it can serve as a benchmark for other caching approaches. We also demonstrate that the mechanisms improve performance with greater object demand variance among the proxy caches. This implies that in practical scenarios, where there may be differing object demand patterns at nodes, there are increased benefits in deploying proxy cache networks. The performance results should provide useful directions for computer network administrators to develop proxy caching implementations that are suited for differing network and demand characteristics. There are a number of interesting areas for future research. Thus far we have assumed that demand patterns are known a priori in our mechanisms.
An area of future research could be to relax this assumption and develop models where demand patterns are not known, including cases involving requests for dynamic content. It would also be interesting to determine realistic scenarios under which the best social equilibrium could emerge as a focal point among multiple equilibria of the caching game. Another research area could be to compare the performance of our mechanisms against that of existing network cache implementations such as IRCache, as well as caching policies such as LRU, using actual proxy trace datasets. Finally, we can also investigate alternative solution approaches for our caching models, such as dynamic programming and heuristic procedures, which aim to find good solutions quickly [1,6]. To the best of our knowledge, our research is the first to evaluate the performance of implementations of capacitated proxy cache networks. There is a significant potential for deploying proxy cache networks in order to reduce the delays experienced by web users due to congestion on the Internet. Therefore we believe that this study contributes to network caching research that is beneficial for Internet users.

Acknowledgements

We thank Mohit Tawarmalani, Prabuddha De, Karthik Kannan, seminar participants of Purdue University, and the 2004 International
Conference on Information Systems (ICIS) Doctoral Consortium for valuable comments and contributions on this study.

References

[1] R.K. Ahuja, T.L. Magnanti, J.B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, NJ, 1993.
[2] B.G. Chun, K. Chaudhuri, H. Wee, M. Barreno, C.H. Papadimitriou, J. Kubiatowicz, Selfish caching in distributed systems: a game-theoretic analysis, Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, 2004, pp. 21–30.
[3] A. Datta, K. Dutta, H. Thomas, D. VanderMeer, World wide wait: a study of Internet scalability and cache-based approaches to alleviate it, Management Science 49 (10) (2003) 1425–1444.
[4] O. Ercetin, L. Tassiulas, Market-based resource allocation for content delivery in the Internet, IEEE Transactions on Computers 52 (12) (2003) 1573–1585.
[5] D. Fudenberg, J. Tirole, Game Theory, MIT Press, Boston, 1991.
[6] M.R. Garey, D.S. Johnson, Computers and Intractability, W.H. Freeman, New York, 1979.
[7] F. Glover, E. Woolsey, Converting a 0–1 polynomial programming problem to a 0–1 linear program, Operations Research 22 (1974) 180–182.
[8] S. Hadjiefthymiades, Y. Georgiadis, L. Merakos, A game theoretic approach to web caching, NETWORKING 2004: Proceedings of the Third International IFIP-TC6 Networking Conference, 2004.
[9] K. Hosanagar, Y. Tan, Optimal duplication in cooperative web caching, Proceedings of the 13th Workshop on Information Technology and Systems (WITS), 2004.
[10] K. Hosanagar, R. Krishnan, J. Chuang, V. Choudhary, Pricing and resource allocation in caching services with multiple levels of QoS, Management Science 51 (12) (2005) 1844–1859.
[11] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall, Englewood Cliffs, NJ, 2002.
[12] C. Kumar, Implementation and Evaluation of Proxy Cache Networks, California State University, San Marcos, 2007, Working Paper.
[13] C. Kumar, J.B. Norris, A new approach for a proxy-level web caching mechanism, Decision Support Systems (2008). doi:10.1016/j.dss.2008.05.001.
[14] V.S. Mookherjee, Y. Tan, Analysis of a least recently used cache management policy for web browsers, Operations Research 50 (2) (2002) 345–357.
[15] C. Park, J. Feigenbaum, Incentive Compatible Web Caching, Yale University, 2001, Working Paper.
[16] S. Podlipnig, L. Boszormenyi, A survey of web cache replacement strategies, ACM Computing Surveys 35 (4) (2003) 374–398.
[17] L. Ramaswamy, L. Liu, An expiration age-based document placement scheme for cooperative web caching, IEEE Transactions on Knowledge and Data Engineering 16 (2004) 585–600.
[18] M. Tawarmalani, K. Karthik, P. De, Allocating Objects in a Network of Caches: Centralized and Decentralized Analyses, Purdue University, 2007, Working Paper.
[19] E.F. Watson, Y. Shi, Y. Chen, A user-access model-driven approach to proxy cache performance analysis, Decision Support Systems 25 (1999) 309–338.
[20] D. Zeng, F. Wang, M. Liu, Efficient web content delivery using proxy caching techniques, IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 34 (3) (2004) 270–280.

Chetan Kumar is an Assistant Professor in the Department of Information Systems and Operations Management at the College of Business Administration, California State University San Marcos. He received his PhD from the Krannert School of Management, Purdue University. His research interests include pricing and optimization mechanisms for managing computer networks, caching mechanisms, peer-to-peer networks, ecommerce mechanisms, web analytics, and IS strategy for firms. He has presented his research at INFORMS, WEB, WISE, ICIS Doctoral Consortium, and AMCIS Doctoral Consortium conferences. His research has been published in the DSS journal, and he has served as a reviewer for the EJOR, JMIS, DSS, and JECR journals.