Off-line algorithms for the list update problem


Information Processing Letters 60 (1996) 75-80

Nick Reingold a,1, Jeffery Westbrook b,*
a AT&T Bell Laboratories, P.O. Box 636, Murray Hill, NJ 07974-0636, USA
b Department of Computer Science, Yale University, New Haven, CT 06520-2158, USA

Received 27 February 1996; revised 3 September 1996
Communicated by V. Ramachandran

Abstract

Optimum off-line algorithms for the list update problem are investigated. The list update problem involves implementing a dictionary of items as a linear list. Several characterizations of optimum algorithms are given; these lead to an optimum algorithm which runs in time O(2^n (n-1)! m), where n is the length of the list and m is the number of requests. The previous best algorithm, an adaptation of a more general algorithm due to Manasse et al. (1988), runs in time O((n!)^2 m).

Keywords: Sequential search; List update; On-line algorithms; Off-line algorithms; Competitive analysis; Analysis of algorithms

1. Introduction

A dictionary is an abstract data type that stores a collection of keyed items and supports the operations access, insert, and delete. In the sequential search or list update problem, a dictionary is implemented as a simple linear list, either stored as a linked collection of items or as an array. An access is done by starting at the front of the list and examining each succeeding item until either finding the desired item or reaching the end of the list and reporting the item not present. To insert a new item, the entire list is scanned to ensure the item is not already present, and then the new item is inserted at the back of the list. An item is deleted by first searching for it and then removing it.

* Corresponding author. Email: [email protected]. Research partially supported by NSF grant CCR-9009753.
¹ Corresponding author. Research partially supported by NSF grants CCR-8808949 and CCR-8958528.


At various times it may be advantageous to reorder the list by exchanging pairs of consecutive items, for example, moving a commonly requested item closer to the front. A list update algorithm is an algorithm that is invoked by the dictionary after each operation and determines what rearrangements to perform in order to reduce the total time to process the requests. An on-line list update algorithm must decide what rearrangements to perform knowing only the sequence of requests up to the current time, without knowing the future. On-line algorithms for list update have been investigated by many researchers [1-7,10,12-14,16]. The list update problem is further distinguished by being one of the two problems in Sleator and Tarjan's seminal paper on competitive analysis [14].²

² An on-line algorithm A is called c-competitive if, on all request sequences σ, the cost incurred by A is no more than c times the optimum cost to process the sequence [9].


In this paper we investigate off-line algorithms for list update. An off-line algorithm determines rearrangements knowing the entire sequence of requests, past and future. An optimum off-line algorithm generates rearrangements that minimize the total cost of processing the request sequence. (No on-line algorithm can always achieve the minimum cost.) Besides being interesting in its own right, study of the off-line problem has two benefits: first, an algorithm for the off-line problem is useful in empirical analyses of on-line algorithms, and second, an understanding of the off-line problem may help to improve on-line algorithms. To our knowledge, no previous papers have investigated off-line list update algorithms. A naive adaptation of an off-line algorithm of Manasse, McGeoch, and Sleator [9] for task systems yields an off-line algorithm for list update which runs in time Θ(m (n!)^2) and space O(n!), where m is the number of requests and n is the number of items in the list. The main result of this paper is an improved algorithm running in worst-case time Θ(m 2^n (n-1)!) and space O(n!). We also give several characterizations of off-line algorithms. The algorithm developed in this paper has been implemented and used in research into the on-line problem.

2. The list update model and notation

An instance of the list update problem consists of an initial list L_0 and a sequence of requests σ = (r_1, r_2, ..., r_m), where (i_1, i_2, ...) denotes an ordered sequence. A request is either an access to an item, an insertion of an item, or a deletion of an item. An exchange is a transposition of two adjacent items in the list. We identify the items in L_0 by the numbers from 1 to n, and identify items subsequently inserted by n+1, n+2, ..., according to order of insertion.

In the standard model defined by Sleator and Tarjan [14], accessing or deleting the ith item from the front of the list costs i. Inserting a new item costs n+1, where n is the length of the list prior to the insertion; the new item is inserted at the end of the list. Immediately following an access or insertion, the accessed or inserted item can be moved at no cost to any position closer to the front of the list by a sequence of exchanges. These are called free exchanges.


After an operation and its associated free exchanges (if any) have been performed, the list may be further rearranged by any number of paid exchanges. Each paid exchange costs 1. Paid exchanges may also be done before the first request arrives.

We also consider a generalization of the standard model called the paid exchange model [12], in which there are no free exchanges and a paid exchange has cost d ≥ 1. Accesses, insertions, and deletions have the same cost as in the standard model. (This might model a linear list on a tape drive.) The paid exchange model with exchange cost d is denoted P^d.

Given a request sequence σ = (r_1, ..., r_m), a service sequence ω is a series of rearrangements (E_1, E_2, ..., E_m) in which E_i is the sequence of exchanges to be performed between servicing r_{i-1} and r_i. (E_1 rearranges the initial list L_0.) Define cost(L_0, σ, ω) to be the sum of all costs due to paid exchanges, accesses, insertions, and deletions when ω is used to service σ. The minimum cost to service σ is denoted opt(σ). In general there are several service sequences that achieve the optimum cost. An optimum off-line algorithm takes σ as input and computes opt(σ) and a service sequence that realizes this cost.

Sleator and Tarjan claim (Theorem 3 of [14]) that in the standard model there is a minimum-cost service sequence that contains only free exchanges. Unfortunately, this claim is erroneous: paid exchanges are sometimes necessary, as the following counterexample shows. If the initial list is (1,2,3) and the request sequence is (3,2,2,3), then any service sequence containing only free exchanges must incur a cost of at least 9. On the other hand, two paid exchanges can make the list look like (2,3,1), and then the accesses can be performed at an additional cost of only 6, for a total cost of 8.

In contrast, for sequences of accesses only in the standard model, free exchanges are not necessary.

Theorem 1. For any sequence of accesses σ, there exists a service sequence containing no free exchanges that achieves the minimum cost to service σ in the standard model.

Proof. Let ω be an arbitrary service sequence. We modify ω into a sequence ω' that eliminates free exchanges. Suppose that on some request to item x, ω uses free exchanges to move item x forward l places. Then ω' will use paid exchanges to move item x forward l places just before the request, and then make no free exchanges after the request.


ω' costs l extra for the paid exchanges to move x forward, but saves l on the access. □

In general, if a list update algorithm is charged f(i) to access the ith item, then the theorem is true if the cost of exchanging the ith and the (i+1)st items is f(i+1) - f(i).
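The counterexample above, and the effect of forbidding paid exchanges, can be checked mechanically for small lists. The sketch below is ours, not part of the paper, and all names are illustrative; it computes the optimum off-line cost of an access-only sequence in the standard model by dynamic programming over list states, batching paid exchanges just before each access.

```python
from itertools import permutations

def inv_dist(a, b):
    """Minimum number of adjacent exchanges turning list a into list b."""
    pos = {x: i for i, x in enumerate(b)}
    c = [pos[x] for x in a]
    return sum(1 for i in range(len(c)) for j in range(i + 1, len(c)) if c[i] > c[j])

def opt_cost(init, requests, allow_paid=True):
    """Optimum cost of an access-only sequence in the standard model.
    States are list orders; paid exchanges (cost 1 each) may precede any access,
    and the accessed item may then be moved forward for free."""
    states = {tuple(init): 0}
    for r in requests:
        nxt = {}
        for lst, cost in states.items():
            pre = [(lst, cost)] if not allow_paid else \
                  [(p, cost + inv_dist(lst, p)) for p in permutations(lst)]
            for cur, c in pre:
                i = cur.index(r)
                c += i + 1                       # access cost = 1-based position
                for j in range(i + 1):           # free move of the accessed item
                    new = cur[:j] + (r,) + cur[j:i] + cur[i + 1:]
                    if c < nxt.get(new, float("inf")):
                        nxt[new] = c
        states = nxt
    return min(states.values())

print(opt_cost([1, 2, 3], [3, 2, 2, 3]))                    # 8 (paid exchanges allowed)
print(opt_cost([1, 2, 3], [3, 2, 2, 3], allow_paid=False))  # 9 (free exchanges only)
```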


3. Preliminaries

It is convenient to regard a list L as a permutation of the numbers 1 to n. We give some useful definitions and results about permutations. (General references for this section are [11, Chapter 5] and [15].)

Let L_1 and L_2 be two different permutations. An inversion between L_1 and L_2 is a pair of items (x, y) such that x is in front of y in one list but behind y in the other. We write inv(L_1, L_2) for the set of inversions between L_1 and L_2. The inversion table of a permutation L is a sequence (a_1, a_2, ..., a_n), where a_i is the number of elements less than i and to its right in L.³ An inversion table uniquely defines a permutation L and can be used to encode L by an integer n(L) ∈ [0..(n!-1)] by letting n(L) = Σ_{j=2}^{n} a_j (j-1)!. If two adjacent items (x, y) are exchanged in L, giving (y, x) in permutation L', then n(L') can be derived from n(L) by adding (y-1)! if x < y, or subtracting (x-1)! otherwise.

The minimum number of exchanges of adjacent elements required to sort L is equal to |inv(L, (1,2,...,n))|. This number is achieved by applying insertion sort to L, and it cannot be improved because any exchange removes at most one inversion. Similarly, an arbitrary list L_1 can be converted to another list L_2 with precisely |inv(L_1, L_2)| exchanges, by renumbering the elements so that L_2 is sorted and then sorting the renumbered L_1.

It will be useful to have a routine that generates all permutations of (1, 2, ..., n) and for each permutation L calculates n(L) and |inv(L, (1,2,...,n))|. Reingold et al. [11] describe an algorithm, due independently to Trotter [17] and Johnson [8], in which each successive permutation is generated from its predecessor by a single exchange. The algorithm generates all permutations in O(n!) time. This algorithm can be easily extended to output the index and number of inversions: after each transposition of adjacent elements, the new index is calculated from the old index as described above, and it is determined whether the transposition creates or removes an inversion.⁴

³ Our definitions of inversion table and n(L) differ slightly from those in [11,15] but are more convenient to program.
⁴ In our software we use a somewhat simpler recursive algorithm that generates permutations by transposing adjacent elements in some previously generated permutation, but not necessarily the most recently generated one.
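For concreteness, here is a small sketch of these definitions (ours; the paper's own software is not reproduced here): the inversion table, the index n(L), the inversion count, and the adjacent-exchange update rule for the index.

```python
from math import factorial

def inversion_table(L):
    """a_i = number of elements smaller than i that appear to the right of i in L."""
    a = {}
    for pos, x in enumerate(L):
        a[x] = sum(1 for y in L[pos + 1:] if y < x)
    return [a[i] for i in range(1, len(L) + 1)]

def index_of(L):
    """n(L) = sum_{j=2..n} a_j (j-1)!, an integer in [0, n!-1]."""
    a = inversion_table(L)
    return sum(a[j - 1] * factorial(j - 1) for j in range(2, len(L) + 1))

def inversions(L):
    """|inv(L, (1,...,n))|: minimum adjacent exchanges needed to sort L."""
    return sum(inversion_table(L))

L = [3, 1, 4, 2]
print(inversion_table(L), index_of(L), inversions(L))   # [0, 0, 2, 1] 10 3
# Exchanging the adjacent pair (1, 4) gives [3, 4, 1, 2]; since 1 < 4 the index
# increases by (4-1)! = 6, as the update rule predicts.
print(index_of([3, 4, 1, 2]) - index_of(L))              # 6
```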

4. Subset transfers

To derive our off-line algorithm, we first restrict our attention to request sequences containing only accesses. Between each pair of accesses there are, a priori, n! ways to rearrange the list. In this section we show that in computing opt(σ) it is possible to restrict our attention to at most 2^n of these rearrangements.

Suppose item x is accessed. A subset transfer is a sequence of paid exchanges made just prior to the access that moves a subset of the items preceding x to just behind x, in such a way that the ordering of items in the subset among themselves remains unchanged.

Theorem 2. Let σ = (r_1, r_2, ..., r_m) be any sequence of accesses. There is an optimal service sequence in which every rearrangement is a subset transfer.

Proof. The proof is by induction on the number of requests. The basis, when the number of requests is zero, is trivially true. For the inductive step, let ω = (E_1, ..., E_m) be an arbitrary service sequence containing only paid exchanges (cf. Theorem 1). We show below that ω can be converted into a new service sequence ω' = (E'_1, ..., E'_m) such that rearrangement E'_1 is a subset transfer and cost(L_0, σ, ω') ≤ cost(L_0, σ, ω). To complete the inductive step, we apply the inductive hypothesis to the problem instance consisting of requests (r_2, ..., r_m) and initial list L'_1, where L'_1 is the list that results from applying E'_1 to L_0.


Now we show how to convert ω to ω'. Suppose that request r_1 is an access to item x. Let L_1 be the list after applying E_1 to L_0. Let R be the items that are in front of x in both L_0 and L_1; let S be the items that are in front of x in L_0 but behind x in L_1; and let T be the items that are behind x in L_0 and in front of x in L_1. Let r, s, and t be the sizes of R, S, and T, respectively. Let p_0 be the number of exchanges in E_1. Then the total cost of E_1 plus r_1 is p_0 + r + t + 1. Construct ω' as follows.

1. E'_1 consists of the minimal number of paid exchanges, say p_1, needed to perform a subset transfer on the items in S. Recall that L'_1 denotes the result of applying E'_1 to L_0.
2. E'_2 consists of two consecutive subsequences of exchanges, E_a and E_b. Subsequence E_a consists of the minimal number of paid exchanges, say p_2, needed to convert L'_1 into list L_1. Subsequence E_b is simply E_2.
3. For 2 < i ≤ m, E'_i = E_i.

The total cost of executing E'_1, request r_1, and E_a is p_1 + p_2 + r + 1. We will show that p_0 ≥ p_1 + p_2, which implies that cost(L_0, σ, ω') ≤ cost(L_0, σ, ω). Recall that the minimal number of paid exchanges required to convert list L_1 to L_2 is equal to |inv(L_1, L_2)|. Any inversion between L_0 and L'_1 must involve an element of S and either x or an element of R. No inversion between L'_1 and L_1 can be of this form, so inv(L_0, L'_1) ∩ inv(L'_1, L_1) = ∅. It follows from this that inv(L_0, L'_1) and inv(L'_1, L_1) partition inv(L_0, L_1). Therefore,

p_0 ≥ |inv(L_0, L_1)| = |inv(L_0, L'_1)| + |inv(L'_1, L_1)| = p_1 + p_2.  □

Theorem 2 holds in the standard model and in any P^d model, since the above proof refers only to paid exchanges.
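A subset transfer itself is easy to state in code. The sketch below is an illustration of the definition (names are ours): it applies a transfer for an access to x and reports its price in paid exchanges, which by the discussion above equals the number of inversions it creates.

```python
def move_cost(a, b):
    """|inv(a, b)|: minimum number of paid exchanges turning list a into list b."""
    pos = {v: i for i, v in enumerate(b)}
    c = [pos[v] for v in a]
    return sum(1 for i in range(len(c)) for j in range(i + 1, len(c)) if c[i] > c[j])

def subset_transfer(L, x, S):
    """Move the items of S (all of which must precede x in L) to just behind x,
    keeping their relative order; the rest of the list is untouched."""
    i = L.index(x)
    assert all(L.index(s) < i for s in S)
    front = [y for y in L[:i] if y not in S]
    moved = [y for y in L[:i] if y in S]
    return front + [x] + moved + L[i + 1:]

L = [5, 2, 4, 1, 3]
Lp = subset_transfer(L, 1, {5, 4})
print(Lp, move_cost(L, Lp))   # [2, 1, 5, 4, 3] 3
```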

5. Implementation of an optimum algorithm

Recall we assume that the request sequence σ contains only accesses, that n is the length of the list, and that m is the number of requests. In [9], Manasse et al. discuss task systems, of which list update is a special case. They give a dynamic programming algorithm for finding the minimal cost to service a request sequence in any task system. When applied to list update this works as follows. Let DYN(L, i) be the optimum cost to service the first i requests of σ and end with the list in order L, where L is one of the n! permutations of L_0. Let MOVE(L', L) be the minimum number of exchanges to move from list L' to list L, and let POS(r, L) be the position of r in L. Then DYN(L, 0) = MOVE(L_0, L), and

DYN(L, i) = min_{L'} { DYN(L', i-1) + POS(r_i, L') + MOVE(L', L) }

for 0 < i ≤ m. Finally, opt(σ) is the minimum over all L of DYN(L, m). As usual for dynamic programming, an m × n! table is used. The values for DYN(L, ·) are stored in row n(L), and the columns are filled in from i = 0 to m. If the optimum service sequence is not needed, but only opt(σ), the space required is O(n!). In this naive application of the algorithm of Manasse et al., the computation of DYN(L, i) for a fixed L and i requires n! probes into the table. Hence Θ((n!)^2) time is needed to process the ith request. (This running time can be achieved, although we will not elaborate further.)

By using the subset transfer theorem (Theorem 2) we can substantially reduce the number of probes. Consider the ith access in the request sequence. There are (n-1)! permutations of the list in which the accessed item is at position k. If the item is at position k, there are 2^{k-1} subsets that can be moved. Summing over all k, the number of probes into the table to process the ith request is O(2^n (n-1)!).

To perform these probes efficiently, we use a procedure subset(L, m, p) which solves the following problem: given a list L and two indexes m and p, generate all lists L' such that L can be produced from L' by a subset transfer, so that the requested item ends up at position p in L and the subset transfer only involves elements in positions m or later in L'. When invoked as subset(L, 1, p), this routine gives the set of lists whose values of DYN(·, i-1) are needed to compute DYN(L, i), given that the ith requested item is at position p in L. Procedure subset can be implemented recursively as follows. First, output the list L (corresponding to a null subset transfer). Then take the element x at position p+1 in L and repeatedly exchange it forward one place. After each exchange, call subset(L', j+1, p+1), where L' is list L with x moved forward to position j.


This can be implemented in O(1) time per output permutation L', and the cost of the subset transfer and the index n(L') can be calculated at the same time, with O(1) overhead.

The complete implementation is as follows. For request i, the set of all permutations is generated, together with the corresponding indexes, using the method described in Section 3. For each permutation L, procedure subset is invoked to generate each permutation L' that must be examined, together with n(L') and the value of MOVE(L', L). These are then used to update DYN(L, i).
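The recursion just described can be rendered as follows; this is our reading of the procedure (argument conventions and names are assumptions), using 1-based positions as in the text.

```python
def subset_lists(L, m, p, out=None):
    """All lists L' from which L arises by one subset transfer, where the requested
    item sits at (1-based) position p of L and the transferred items occupy
    positions >= m in L'.  The top-level call is subset_lists(L, 1, p)."""
    if out is None:
        out = []
    out.append(list(L))                        # the null subset transfer: L' = L
    if p < len(L):
        L = list(L)                            # work on a copy
        for t in range(p - 1, m - 2, -1):      # move the element behind position p
            L[t], L[t + 1] = L[t + 1], L[t]    # forward one place; it now sits at
            subset_lists(L, t + 2, p + 1, out) # 1-based position t + 1
    return out

for Lp in subset_lists([1, 2, 3, 4], 1, 2):    # requested item is at position 2
    print(Lp)   # prints the six lists L' that must be probed in this case
```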

Theorem 3. There is an algorithm which computes the optimum cost for servicing m accesses to an n-item list and which runs in time O(m 2^n (n-1)!).
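As a concrete baseline, the recurrence can also be evaluated directly over all n! list states; the following sketch (ours, for small n only) does exactly that and can be cross-checked against the counterexample of Section 2. The improvement of Theorem 3 comes from probing only the predecessors produced by the subset procedure rather than all n! lists.

```python
from itertools import permutations

def move(a, b):
    """MOVE(a, b): minimum number of paid exchanges turning list a into list b."""
    pos = {v: i for i, v in enumerate(b)}
    c = [pos[v] for v in a]
    return sum(1 for i in range(len(c)) for j in range(i + 1, len(c)) if c[i] > c[j])

def opt_offline(L0, requests):
    """Evaluate DYN(L, i) = min_{L'} DYN(L', i-1) + POS(r_i, L') + MOVE(L', L)
    naively over all n! states and return opt(sigma) = min_L DYN(L, m)."""
    states = list(permutations(L0))
    dyn = {L: move(tuple(L0), L) for L in states}        # DYN(L, 0) = MOVE(L0, L)
    for r in requests:
        dyn = {L: min(dyn[Lp] + Lp.index(r) + 1 + move(Lp, L) for Lp in states)
               for L in states}
    return min(dyn.values())

print(opt_offline([1, 2, 3], [3, 2, 2, 3]))   # 8, as in the counterexample of Section 2
```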

At n = 7, for example, there are approximately ninety-two thousand probes of the DYN(L, i) table per request using subset transfers, while the naive application of the algorithm of Manasse et al. [9] does approximately twenty-five million probes per request.

5.1. Insertions and deletions

An optimum off-line algorithm that uses the subset transfer theorem for sequences of accesses can be extended to one for sequences of accesses and insertions. In the standard model, suppose that L is the initial list and σ is the request sequence. Construct a new initial list L' given by L followed by all the items to be inserted during σ, in order of insertion. In σ, replace each insertion of an item by a request to that item, giving request sequence σ'. The application of the subset transfer algorithm to the problem (L', σ') generates a service sequence ω' from which a service sequence ω for the original problem (L, σ) can be derived. Sequence ω is the same as ω' except for moves prior to insertions. Let r ∈ σ be an insertion of item x, and let r' ∈ σ' be the corresponding (first) access to x. The subset transfer algorithm on (L', σ') will not involve x in an exchange until just before x is first accessed. To service insertion r ∈ σ, the subset transfer that occurs just before r' ∈ σ' is performed, except for those exchanges involving x. Then, after x is inserted, it is moved with free exchanges to just before the set of items that were involved in the subset transfer. This procedure ensures

cost(L, σ, ω) = cost(L', σ', ω').


Conversely, any service sequence for (L, σ) is a service sequence of identical cost for (L', σ'). Since the items inserted in L never leave their initial positions in L' until first accessed, the cost of the first access in σ' is equal to the insertion cost in σ. Since costs are preserved in both directions, an optimal algorithm for (L', σ') gives an optimal algorithm for (L, σ).

The subset transfer algorithm can also be extended to handle insertions in the P^d model. The new list L' is constructed as above for the standard model, while σ' is obtained from σ by simply removing all insertions. A similar argument to that above shows that opt(σ) = opt(σ') + Σ_{k=n+1}^{n'} k, where n is the initial size of L and n' the final size.

Handling deletions is somewhat more complicated. Our dynamic programming algorithm can be extended to deletions in two ways. One possibility is to simply leave deleted elements in the list, but modify the cost function so that they add nothing to the cost of either accesses or subset transfers. A second possibility is to modify the algorithm to simulate a deletion by moving the deleted element to the back of the list, at zero cost. This saves some space, since deleted elements may be reused for subsequently inserted items. In either case, the optimum cost computed is the same.
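The reduction for insertions in the standard model is mechanical; a minimal sketch follows (the request encoding with ('access', x) / ('insert', x) tags is ours).

```python
def reduce_insertions(L, sigma):
    """Build (L', sigma') from an access/insert instance (L, sigma): append the
    items to be inserted, in order of insertion, to the back of L, and turn each
    insertion into the first access to that item."""
    L_prime = list(L) + [x for op, x in sigma if op == 'insert']
    sigma_prime = [x for op, x in sigma]
    return L_prime, sigma_prime

L = [1, 2, 3]
sigma = [('access', 3), ('insert', 4), ('access', 4), ('access', 2)]
print(reduce_insertions(L, sigma))   # ([1, 2, 3, 4], [3, 4, 4, 2])
```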

6. Off-line algorithms and lookahead

A list update algorithm is said to have lookahead-k(n) if it makes each decision knowing only the next k(n) unprocessed requests, where k(n) is some function independent of the request sequence, but perhaps depending on the initial list size. An on-line algorithm for list update therefore has lookahead-0, while the optimum algorithm presented above has unbounded lookahead, since it examines the entire sequence to determine the optimum first move. It is interesting to ask whether there is an algorithm with bounded lookahead that computes an optimal sequence of moves.

Proposition 4. The following lookahead-1 algorithm for maintaining a 2-item list achieves the optimum cost on every request sequence: after each request, move the requested item to the front via free exchanges if the next request is also to that item; otherwise do nothing.


Proof. We use induction on the length of the request sequence. Let A(σ) be the cost of servicing request sequence σ according to the rule above. Let L_A(σ) denote the final list produced by A when run on σ. Let OPT_σ(L) be the minimum cost to service σ if the initial list is L. Observe that OPT_σ((1,2)) ≤ 1 + OPT_σ((2,1)), and vice versa, for all σ. Write σ = ab σ', where ab is one of 11, 12, 21, 22. A simple case analysis shows that A(ab) + OPT_{σ'}(L_A(ab)) ≤ A'(ab) + OPT_{σ'}(L_{A'}(ab)) for any algorithm A'. Then apply the inductive hypothesis. □
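Proposition 4 is also easy to confirm by exhaustion over short sequences; the following check (ours, purely illustrative) compares the lookahead-1 rule against a brute-force optimum on the 2-item list (1, 2).

```python
from itertools import product

def rule_cost(reqs):
    """Cost of the lookahead-1 rule of Proposition 4, starting from list (1, 2)."""
    lst, cost = (1, 2), 0
    for i, r in enumerate(reqs):
        cost += lst.index(r) + 1
        if i + 1 < len(reqs) and reqs[i + 1] == r:
            lst = (r,) + tuple(x for x in lst if x != r)   # free move-to-front
    return cost

def brute_opt(reqs):
    """Optimum off-line cost on the 2-item list (1, 2) in the standard model."""
    best = {(1, 2): 0}
    for r in reqs:
        pre = {}
        for lst, c in best.items():                        # optional paid exchange
            for new, extra in ((lst, 0), (lst[::-1], 1)):
                pre[new] = min(pre.get(new, float("inf")), c + extra)
        nxt = {}
        for lst, c in pre.items():
            c += lst.index(r) + 1                          # access cost
            for new in (lst, (r,) + tuple(x for x in lst if x != r)):   # free move
                nxt[new] = min(nxt.get(new, float("inf")), c)
        best = nxt
    return min(best.values())

assert all(rule_cost(s) == brute_opt(s)
           for m in range(1, 10) for s in product((1, 2), repeat=m))
print("lookahead-1 rule matches the optimum on all sequences of length < 10")
```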



Theorem 5. For lists of size 3 or greater, no deterministic algorithm with bounded lookahead always computes an optimum sequence of moves.


Proof. Suppose A is a lookahead-k algorithm for maintaining a 3-item list. Let L_0 = (1,2,3), and consider the request sequences σ_1 = 3 1^j 2 and σ_2 = 3 1^j 3 (where 1^j denotes j consecutive requests to item 1), with j ≥ max{2, k}. To achieve the optimum cost on σ_1, A must do nothing, while to be optimal on σ_2, A must exchange the last two items, with either a paid or a free exchange, just before or just after the first access, respectively. Since A is deterministic with only k lookahead, the moves it makes before servicing the first request to 1 are the same whether it receives σ_1 or σ_2. Hence A cannot be optimum on both σ_1 and σ_2. □


7. Remarks

We leave open the problem of finding an algorithm that is polynomial in n and m, or proving that the problem is NP-hard. Another interesting question is how well the problem can be approximated. For instance, is there a polynomial-time approximation scheme (PTAS) for the list update problem?

Acknowledgements

We thank Bill Gasarch and the anonymous referees for their helpful suggestions.


References

[1] S. Albers, Improved randomized on-line algorithms for the list update problem, in: Proc. 6th ACM-SIAM Symp. on Discrete Algorithms (1995) 412-419.
[2] J.L. Bentley and C.C. McGeoch, Amortized analyses of self-organizing sequential search heuristics, Comm. ACM 28 (4) (1985) 404-411.
[3] P.J. Burville and J.F.C. Kingman, On a model for storage and search, J. Appl. Probab. 10 (1973) 697-701.
[4] F.R.K. Chung, D.J. Hajela and P.D. Seymour, Self-organizing sequential search and Hilbert's inequality, in: Proc. 17th ACM Symp. on Theory of Computing (1985) 217-223.
[5] G. Gonnet, J.I. Munro and H. Suwanda, Towards self-organizing linear search, in: Proc. 19th IEEE Symp. on Foundations of Computer Science (1979) 169-174.
[6] W.J. Hendricks, An account of self-organizing systems, SIAM J. Comput. 5 (4) (1976) 715-723.
[7] S. Irani, Two results on the list update problem, Inform. Process. Lett. 38 (1991) 301-306.
[8] S.M. Johnson, Generation of permutations by adjacent transposition, Math. Comp. 17 (1963) 282-285.
[9] M. Manasse, L.A. McGeoch and D. Sleator, Competitive algorithms for on-line problems, in: Proc. 20th ACM Symp. on Theory of Computing (1988) 322-333.
[10] J. McCabe, On serial files with relocatable records, Oper. Res. 13 (1965) 609-618.
[11] E.M. Reingold, J. Nievergelt and N. Deo, Combinatorial Algorithms: Theory and Practice (Prentice-Hall, Englewood Cliffs, NJ, 1977).
[12] N. Reingold, J. Westbrook and D.D. Sleator, Randomized algorithms for the list update problem, Algorithmica 11 (1994) 15-32.
[13] R. Rivest, On self-organizing sequential search heuristics, Comm. ACM 19 (2) (1976) 63-67.
[14] D.D. Sleator and R.E. Tarjan, Amortized efficiency of list update and paging rules, Comm. ACM 28 (2) (1985) 202-208.
[15] D. Stanton and D. White, Constructive Combinatorics (Springer, New York, 1986).
[16] B. Teia, A lower bound for randomized list update algorithms, Inform. Process. Lett. 47 (1993) 5-9.
[17] H.F. Trotter, Algorithm 115, Comm. ACM 5 (1962) 434-435.