An improved list-searching algorithm


INFORMATION PROCESSING LETTERS, Volume 15, Number 1, 19 August 1982

AN IMPROVED LIST-SEARCHING ALGORITHM

Leon S. LEVY
Bell Laboratories, Whippany, NJ 07981, U.S.A.

Received 5 June 1981; revised version received November 1981

Keywords: List searching, linear list, detecting cycles

1. Introduction

A new algorithm is described for finding a given element in a linear list or detecting cycles in that list. It is always as good as or better than the best generally known algorithm when measured in the number of elements accessed.

Abstractly, this problem can be stated as follows: Given is a linear list in which each element $x$ has a unique successor $s(x)$, except possibly for the last element, which may be a terminal marker, #. Variable head is a pointer to the head of the list. In other words, if $y$ is a list element, then for some $k \ge 0$, $y = s^k(\mathit{head})$, where $s^0(x) = x$ and $s^k(x) = s(s^{k-1}(x))$ for $k > 0$.

Of course, in a typical implementation the items head, s(head), etc. are indices (or addresses) of elements, and the values searched for would be in value(head), value(s(head)), etc.

Assume a non-cyclic linear list. The typical algorithm to find an element $y$ examines elements beginning with the head:

alg 1:
    x := head;
    do x ≠ # cand x ≠ y → x := s(x) od
    {x = y or (x = # and y ∉ list)}

Unfortunately, this algorithm does not terminate if the list is cyclic and $y$ is not in the list. In the next section we present a new algorithm for cycle detection and compare it with the best generally known alternative algorithm [1]. The measure of efficiency used for comparison is the number of list elements examined.

Let $k$ be the number of elements preceding the cycle and $r$ the number of elements in the cycle of a linear, cyclic list. Thus, the list has $k + r$ elements and can be described as follows:

$$\mathit{head},\ s(\mathit{head}),\ \ldots,\ s^{k-1}(\mathit{head}),\ \underbrace{s^{k}(\mathit{head}),\ \ldots,\ s^{k+r-1}(\mathit{head})}_{\text{cycle of } r \text{ elements}}, \qquad s^{k+r}(\mathit{head}) = s^{k}(\mathit{head}).$$

0020-0190/82/0000-0000/$02.75 © 1982 North-Holland

2. The new algorithm

The new algorithm sequences through the elements of the list as does alg 1. However, as it proceeds, it checks whether the elements being examined are part of a cycle of a given length $q$. If so, $x_0 = s^k(x_0)$ will be detected for some $k$ satisfying $0 < k \le q$ and the algorithm terminates. The algorithm checks the following for cycles, in the order:

$q = 1$, $x_0 = \mathit{head}$, cycle: $x_0, s^1(x_0)$

$q = 2$, $x_0 = s^1(\mathit{head})$, cycle: $x_0, s^1(x_0), s^2(x_0)$

$q = 4$, $x_0 = s^3(\mathit{head})$, cycle: $x_0, s^1(x_0), s^2(x_0), s^3(x_0), s^4(x_0)$

$q = 8$, $x_0 = s^7(\mathit{head})$, cycle: $x_0, s^1(x_0), s^2(x_0), s^3(x_0), \ldots$

...


If the list is cyclic and $y$ is not in it, at some point the reference element $x_0$ will lie in the cycle and the proposed cycle length $q$ will be at least the actual cycle length $r$, and the cycle will be detected. The algorithm is written as follows:

alg 2:
    c, q, x, x0 := 0, 1, head, head;
    {partial invariant: elements head, s(head), ..., pred(x) are not y,
     and x0, s(x0), ..., s^q(x0) contains a proposed cycle of length ≤ q,
     and q is a power of two,
     and x = s^c(x0), i.e., the next element to be examined is the c-th element from x0}
    do x ≠ # cand x ≠ y cand not (x = x0 and c ≠ 0) →
        if c = q → x0, c, q := x, 0, 2*q
        [] c ≠ q → x, c := s(x), c + 1
        fi
    od
    {(x = # ⇒ y not in list and list not cyclic)
     and (x = y ⇒ y in list)
     and (x ≠ # and x ≠ y ⇒ y not in list and list is cyclic)}

A loop is detected when c ≠ 0 and x = x0.
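As an illustration, alg 2 can be transcribed into Python roughly as follows. This is a minimal sketch under stated assumptions, not the author's implementation: list elements are taken to be integer indices, `None` stands in for the terminal marker #, and the names `search_alg2`, `make_successor`, and the `examined` counter are hypothetical helpers introduced here. The counter tallies every value taken by x, including head.

```python
TERMINAL = None  # assumption: None plays the role of the terminal marker '#'


def make_successor(k, r):
    """Hypothetical helper: successor function s for a list of integer
    elements 0..k+r-1 with k elements before a cycle of r elements.
    With r == 0 the list is non-cyclic and ends in TERMINAL."""
    n = k + r

    def s(x):
        if x + 1 < n:
            return x + 1
        return k if r > 0 else TERMINAL

    return s


def search_alg2(head, s, y):
    """Sketch of alg 2. Returns (status, examined), where status is
    'found', 'absent' (non-cyclic list without y) or 'cyclic'."""
    c, q, x, x0 = 0, 1, head, head
    examined = 1  # head itself counts as the first element examined
    while x is not TERMINAL and x != y and not (x == x0 and c != 0):
        if c == q:
            # No match within q steps: x becomes the new reference
            # element and the proposed cycle length doubles.
            x0, c, q = x, 0, 2 * q
        else:
            x, c = s(x), c + 1
            examined += 1
    if x is TERMINAL:
        return "absent", examined
    if x == y:
        return "found", examined
    return "cyclic", examined


if __name__ == "__main__":
    s = make_successor(4, 3)        # k = 4, r = 3
    print(search_alg2(0, s, y=-1))  # -1 is not an element of the list
    print(search_alg2(0, s, y=5))
```

For a list with k = 4 elements before a cycle of r = 3 elements, searching for an absent element reports a cycle after 11 elements have been examined, in agreement with the corresponding entry of Table 1.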

3. The conventional two-speed algorithm

The conventional algorithm [1] searches through the list twice, in parallel, the second search proceeding twice as fast as the first. Thus, if there is a cycle, the second search will lap the first and this lapping can be detected:

    x, z := head, head;
    {x = element being examined in slow search,
     z = element being examined in fast search}
    do z ≠ x cand z ≠ y cand z ≠ # cand s(z) ≠ x cand s(z) ≠ y cand s(z) ≠ # →
        x, z := s(x), s(s(z))
    od
    {if z and s(z) are not y or #, there is a cycle and y is not in list}
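For comparison, the two-speed search can be sketched in Python in the same setting. This sketch uses the usual move-then-compare formulation of the two-pointer method rather than a literal transcription of the guarded loop, since both pointers start at head and would compare equal before any step is taken. Integer indices as elements, `None` for #, and the names `search_two_speed`, `make_successor`, and `retrieved` are assumptions made for illustration.

```python
TERMINAL = None  # assumption: None plays the role of the terminal marker '#'


def make_successor(k, r):
    """Hypothetical helper: successor function s for a list of integer
    elements 0..k+r-1 with k elements before a cycle of r elements.
    With r == 0 the list is non-cyclic and ends in TERMINAL."""
    n = k + r

    def s(x):
        if x + 1 < n:
            return x + 1
        return k if r > 0 else TERMINAL

    return s


def search_two_speed(head, s, y):
    """Sketch of the two-speed search. Returns (status, retrieved), where
    retrieved counts the applications of s by both pointers."""
    if head is TERMINAL:
        return "absent", 0
    if head == y:
        return "found", 0
    x = z = head  # x: slow search, z: fast search
    retrieved = 0
    while True:
        # The fast pointer takes two steps; every element it passes is
        # checked, so y and the terminal marker cannot be skipped.
        for _ in range(2):
            z = s(z)
            retrieved += 1
            if z is TERMINAL:
                return "absent", retrieved
            if z == y:
                return "found", retrieved
        # The slow pointer takes one step; meeting the fast pointer
        # means the fast search has lapped the slow one.
        x = s(x)
        retrieved += 1
        if x == z:
            return "cyclic", retrieved


if __name__ == "__main__":
    s = make_successor(4, 3)             # k = 4, r = 3
    print(search_two_speed(0, s, y=-1))  # -1 is not an element of the list
```

Each iteration costs three retrievals, two for the fast pointer and one for the slow pointer, which is the source of the factor 3 in the analysis of Section 4.2.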


4. Performance analysis

If the searched-for element occurs in the list, the new algorithm is better since an additional pointer need not be moved along the list. Thus we only need consider the case where a cycle occurs.

4.1. Analysis of the new algorithm

It is easy to see that the reference element $x_0$ is always updated on the $2^i$th element, and the particular reference element which will cause a match is the first reference element chosen that lies in the cycle when the number of elements compared to it is at least $r$. Therefore, the number of elements examined will be $2^i + r$ with $i$ chosen to be the least $i$ such that $2^i > \max(k, r - 1)$. This gives

$$E_n = \text{the number of elements retrieved by the new algorithm} = 2^i + r,$$

where $i$ is the least $i$ such that $2^i > \max(k, r - 1)$.

4.2. Analysis of the two-speed algorithm

In the two-speed algorithm the cycle is detected as soon as the 'fast' pointer catches the 'slow' pointer in the cycle. At that time the slow pointer will have referenced $k + d$ elements and the fast pointer will have referenced two elements for each element referenced by the slow pointer, $2(k + d)$ elements. This gives

$$E_t = \text{the number of elements retrieved by the two-speed algorithm} = 3(k + d),$$

where $d$ is the least $d$ such that $k + d \equiv 0 \pmod{r}$ (with $k + d > 0$, since at least one step is taken).

4.3. Comparison of the two algorithms
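The two counts derived in Sections 4.1 and 4.2 can also be evaluated numerically. The sketch below is an illustration only: the function names `retrievals_new` and `retrievals_two_speed` are hypothetical, and it assumes that k + d is the least positive multiple of r that is at least k, so that d = r when k = 0.

```python
def retrievals_new(k, r):
    """E_n = 2**i + r, with i the least integer such that
    2**i > max(k, r - 1)."""
    i = 0
    while 2 ** i <= max(k, r - 1):
        i += 1
    return 2 ** i + r


def retrievals_two_speed(k, r):
    """E_t = 3 * (k + d), taking k + d as the least positive multiple
    of r that is >= k (an assumption for the case k = 0)."""
    laps = max(1, -(-k // r))  # ceil(k / r), but at least one full lap
    return 3 * r * laps


if __name__ == "__main__":
    # Print (k, r, E_t, E_n) for a few sample lists.
    for k, r in [(0, 1), (4, 3), (8, 8)]:
        print(k, r, retrievals_two_speed(k, r), retrievals_new(k, r))
```

Evaluating both functions over a grid of k and r values gives a direct numerical check of the case analysis.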

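We proceed by a case analysis:

Case 1. $r > k$:

$$2^i > r - 1 \ge 2^{i-1}, \qquad 2r - 2 \ge 2^i, \qquad 2^i + r \le 3r - 2 < 3r.$$

Since $k + d$ is a positive multiple of $r$, $E_t = 3(k + d) \ge 3r$. (The chain above holds for $r \ge 2$; for $r = 1$ we have $k = 0$ and $E_n = 2 < 3 = E_t$ directly.) Therefore, the new algorithm is better in this case.

Case 2. $r \le k$: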


$$2k \ge 2^i > k,$$

since $i$ is the least $i$ such that $2^i > \max(k, r - 1)$, so

$$2^i + r \le 2k + r \le 3k, \qquad 2^i + r \le 3(k + d).$$

The new algorithm is better except in the case where $r = k = 2^{i-1}$, in which case they have the same number of retrievals. Table 1 shows a comparison of the two algorithms for all cases where $0 \le k \le 9$ and $1 \le r \le 10$.

Table 1
Number of retrievals for the two algorithms. Rows give $k$, columns give the cycle length $r$; each entry is $E_t$ / $E_n$.

| k \ r | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 / 2 | 6 / 4 | 9 / 7 | 12 / 8 | 15 / 13 | 18 / 14 | 21 / 15 | 24 / 16 | 27 / 25 | 30 / 26 |
| 1 | 3 / 3 | 6 / 4 | 9 / 7 | 12 / 8 | 15 / 13 | 18 / 14 | 21 / 15 | 24 / 16 | 27 / 25 | 30 / 26 |
| 2 | 6 / 5 | 6 / 6 | 9 / 7 | 12 / 8 | 15 / 13 | 18 / 14 | 21 / 15 | 24 / 16 | 27 / 25 | 30 / 26 |
| 3 | 9 / 5 | 12 / 6 | 9 / 7 | 12 / 8 | 15 / 13 | 18 / 14 | 21 / 15 | 24 / 16 | 27 / 25 | 30 / 26 |
| 4 | 12 / 9 | 12 / 10 | 18 / 11 | 12 / 12 | 15 / 13 | 18 / 14 | 21 / 15 | 24 / 16 | 27 / 25 | 30 / 26 |
| 5 | 15 / 9 | 18 / 10 | 18 / 11 | 24 / 12 | 15 / 13 | 18 / 14 | 21 / 15 | 24 / 16 | 27 / 25 | 30 / 26 |
| 6 | 18 / 9 | 18 / 10 | 18 / 11 | 24 / 12 | 30 / 13 | 18 / 14 | 21 / 15 | 24 / 16 | 27 / 25 | 30 / 26 |
| 7 | 21 / 9 | 24 / 10 | 27 / 11 | 24 / 12 | 30 / 13 | 36 / 14 | 21 / 15 | 24 / 16 | 27 / 25 | 30 / 26 |
| 8 | 24 / 17 | 24 / 18 | 27 / 19 | 24 / 20 | 30 / 21 | 36 / 22 | 42 / 23 | 24 / 24 | 27 / 25 | 30 / 26 |
| 9 | 27 / 17 | 30 / 18 | 27 / 19 | 36 / 20 | 30 / 21 | 36 / 22 | 42 / 23 | 48 / 24 | 27 / 25 | 30 / 26 |

5. Possible speed-up of the new algorithm

The principle on which the new algorithm works is the use of a reference element against which subsequent elements are compared for a certain interval. If no match is found, a new reference element is chosen and the interval over which comparisons are made is increased. By choosing more than one reference element and staggering the intervals for comparison one can trade a fixed constant amount of storage for a speedup of the algorithm. This more general algorithm is described in [2]. The relationship between the algorithm given there and the 'special case' described here is not obvious, and I am indebted to Tom Szymanski for establishing the connection. Nevertheless, the special case given here can be shown directly and, because it is simpler to program, it should be of interest.

After the present work was completed, [2] was brought to the author's attention by A.V. Aho.

References

[1] D.E. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms (Addison-Wesley, Reading, MA, 1969) p. 7.

[2] R. Sedgewick, T.G. Szymanski and A.C. Yao, The complexity of finding cycles in periodic functions, Proc. 11th SIGACT Meeting, 1979.