Storage for consecutive retrieval


Volume 5, number 3

INFORMATION PROCESSING LETTERS

August 1976

STORAGE FOR CONSECUTIVE RETRIEVAL*

F. LUCCIO** and F.P. PREPARATA†
University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA

Received 16 March 1976, in revised form 19 April 1976

organization, storage requirements, consecutive retrieval, multiple-attribute retrieval, combinatorial problems

* [...] by the Joint Services [...] U.S. Navy, and U.S. Air [...] 72-C-0259.
** On leave from the University [...].
† [...] Engineering and Coordinated [...].

Ehrich and Lipski suggested [...] organization, which [...] to the construction of inverted files [...] attributes, and can be used in information [...] of wide generality [1,2]. Specifically, let $V_n$ be the set of $n$-component binary vectors and let $v = (c_{n-1}(v), c_{n-2}(v), \ldots, c_0(v))$ be a vector of $V_n$. Let $S_i$ be a sequence of $2^{n-1}$ terms consisting of all the vectors $v$ with $c_i(v) = 1$. The problem studied in [1] is to construct a sequence $A_n$ of vectors of $V_n$ for single-attribute retrieval, i.e., a sequence containing each $S_i$ in $2^{n-1}$ consecutive positions, $0 \le i \le n - 1$. All the vectors in $S_i$ are distinct; a vector with more than one component equal to 1 may have multiple occurrences in $A_n$, [...] the corresponding segments. [...] of minimizing such occurrences, by allowing the segments to overlap. In [1] it is shown that the length of the proposed sequence [...] (1) [...] to an asymptotic value of $\tfrac{2}{3} n 2^{n-1}$, where $n 2^{n-1}$ is the length of the sequence composed of nonoverlapping segments (as used in a straightforward inverted file organization).

In this note we present two results on consecutive retrieval:
(i) The optimality of the Ehrich-Lipski algorithm, i.e., the length of the single-attribute sequence $A_n$ generated by their algorithm is minimal;
(ii) A scheme for constructing a sequence $B_n$ of binary vectors for multiple-attribute consecutive retrieval.
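The single-attribute requirement can be made concrete with a small checker. The following is a minimal sketch, not taken from the paper: vectors are encoded as Python integers (bit $i$ of the integer plays the role of $c_i(v)$), and the names `segment_starts` and `naive_sequence` are hypothetical helpers introduced only for illustration.

```python
def segment_starts(seq, n):
    """For each attribute i, return the first index a_i at which 2**(n-1)
    consecutive terms of seq form exactly the set {v : c_i(v) = 1},
    or None when no such window exists."""
    size = 2 ** (n - 1)
    starts = []
    for i in range(n):
        wanted = {v for v in range(2 ** n) if (v >> i) & 1}
        found = None
        for a in range(len(seq) - size + 1):
            # A window of len(wanted) terms equal to wanted as a set
            # necessarily consists of distinct vectors.
            if set(seq[a:a + size]) == wanted:
                found = a
                break
        starts.append(found)
    return starts


def naive_sequence(n):
    """Concatenate S_0, S_1, ..., S_{n-1} without trying to overlap them."""
    return [v for i in range(n) for v in range(2 ** n) if (v >> i) & 1]
```

For $n = 2$ the nonoverlapping sequence has $n 2^{n-1} = 4$ terms, while the three-term sequence 01, 11, 10 already contains both segments because the vector 11 is shared between them.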

2. Minimal length of sequences for single-attribute consecutive retrieval

Let $A_n$ be a minimal length sequence of $n$-component binary vectors for single-attribute retrieval and let $l_n = \mathrm{length}(A_n)$. Let $A_n[j]$ be the $j$th term of $A_n$ and let $A_n[a_i]$ be the initial term of $S_i$, i.e., $S_i$ occupies positions $a_i, a_i + 1, \ldots, a_i + 2^{n-1} - 1$ of $A_n$.

Without loss of generality we may assume that $a_i < a_j$ for $i < j$.

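Property 1. No term of $A_n$ belongs to both $S_i$ and $S_{i+2}$, i.e., $S_i \cap S_{i+2} = \emptyset$.

Proof. [...] The conditions [...] and $a_i < a_{i+1} < a_{i+2}$ imply $S_i \cap S_{i+2} = \emptyset$. Since [...], the property is proved. □

A maximal sequence $\gamma(v) = c_{j+s-1}(v), \ldots, c_j(v)$ of consecutive components with $c_{j+s-1}(v) = \cdots = c_j(v) = 1$ (where, conventionally, $c_{-1}(v) = c_n(v) = 0$) is called a run of 1's of $v$; we let $\{\gamma_1(v), \ldots, \gamma_r(v)\}$ be the set of runs of 1's of $v$. Define the weight $W(v)$ of $v$ as the sum of $\lceil |\gamma_j(v)|/2 \rceil$ over its runs, i.e.,

$$W(v) = \sum_{j=1}^{r} \left\lceil \frac{|\gamma_j(v)|}{2} \right\rceil .$$

For example, $W(11101011) = 4$. We have:

Property 2. A vector $v$ occurs at least $W(v)$ times in $A_n$.

Proof. By property 1, a given occurrence of $v$ in $A_n$ can be shared only by two segments with consecutive indices, hence $\lceil |\gamma_j(v)|/2 \rceil$ occurrences of $v$ are required for each run $\gamma_j(v)$. □

Let now $v_i^k$ indicate the vector of $V_k$ such that $c_{k-1}(v_i^k) \cdots c_0(v_i^k)$ is the binary representation of $i$, and let

$$W_k = \sum_{i=0}^{2^k - 1} W(v_i^k).$$

It follows immediately from property 2 that each vector $v$ of $V_n$ must appear in $A_n$ at least $W(v)$ times, so that $l_n \ge W_n$.

The following property provides an inductive method for computing the weights of vectors with $k$ components from the weights of vectors with $(k - 1)$ and $(k - 2)$ components.

Property 3.

$$W(v_i^k) = W(v_i^{k-1}) \quad \text{for } 0 \le i \le 2^{k-1} - 1; \qquad (2)$$

$$W(v^k_{2^{k-1}+i}) = W(v^k_{2^{k-1}+2^{k-2}+i}) = W(v_i^{k-2}) + 1 \quad \text{for } 0 \le i \le 2^{k-2} - 1. \qquad (3)$$

Splitting the sum that defines $W_n$ according to the two leading components, we have

$$W_n = \sum_{i=0}^{2^{n-1}-1} W(v_i^n) + \sum_{i=0}^{2^{n-2}-1} W(v^n_{2^{n-1}+i}) + \sum_{i=0}^{2^{n-2}-1} W(v^n_{2^{n-1}+2^{n-2}+i}).$$

Using (2) and (3) we readily obtain:

$$W_n = \sum_{i=0}^{2^{n-1}-1} W(v_i^{n-1}) + 2 \sum_{i=0}^{2^{n-2}-1} \bigl( W(v_i^{n-2}) + 1 \bigr) = W_{n-1} + 2 W_{n-2} + 2^{n-1}.$$

This recurrence equation, with initial conditions $W_1 = 1$ and $W_2 = 3$ (obtained by inspection), yields function (1) as its solution. Since $l_n \ge W_n$ and the lengths of the sequences constructed by Ehrich and Lipski meet the bound with equality, we conclude that the latter are optimal.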
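The weight function and the recurrence for $W_n$ can be checked numerically. The sketch below is an illustration rather than part of the paper: it takes $W(v)$ to be the sum of $\lceil s/2 \rceil$ over the runs of 1's of $v$ (which agrees with the worked example $W(11101011) = 4$), encodes vectors as integers, and uses hypothetical function names. The closed form in `total_weight_closed`, $W_n = \bigl((3n+1)2^n - (-1)^n\bigr)/9$, is obtained here by solving the recurrence $W_n = W_{n-1} + 2W_{n-2} + 2^{n-1}$ with $W_1 = 1$, $W_2 = 3$; it is an assumption of this sketch, not a transcription of formula (1).

```python
def weight(v):
    """W(v): sum of ceil(s / 2) over the maximal runs of 1's (length s) of v."""
    total, run = 0, 0
    while v:
        if v & 1:
            run += 1
        else:
            total += (run + 1) // 2   # close the current run (0 if run == 0)
            run = 0
        v >>= 1
    return total + (run + 1) // 2     # close a run that reaches the top bit


def total_weight(n):
    """W_n by brute force over all n-component vectors."""
    return sum(weight(v) for v in range(2 ** n))


def total_weight_rec(n):
    """W_n from W_n = W_{n-1} + 2 W_{n-2} + 2^(n-1), W_1 = 1, W_2 = 3."""
    w = {1: 1, 2: 3}
    for k in range(3, n + 1):
        w[k] = w[k - 1] + 2 * w[k - 2] + 2 ** (k - 1)
    return w[n]


def total_weight_closed(n):
    """Closed form derived in this sketch by solving the recurrence."""
    return ((3 * n + 1) * 2 ** n - (-1) ** n) // 9
```

The leading term of the closed form is $n 2^n / 3 = \tfrac{2}{3} n 2^{n-1}$, which matches the asymptotic value quoted for the Ehrich-Lipski construction.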

3. A scheme for multiple-attribute consecutive retrieval

Let $B_n$ be a sequence of vectors of $V_n$ with the multiple-attribute consecutive retrieval (MACR) property, i.e., for each nonvoid $\sigma \subseteq \{0, 1, \ldots, n - 1\}$ there is a segment $S_\sigma$ of $2^{n-|\sigma|}$ consecutive terms of $B_n$ consisting of all vectors $v$ for which $c_i(v) = 1$ for every $i \in \sigma$. Sequences with the MACR property for $n = 2$ and 3 are given below [...]

[...] coincide respectively with those of [...] sequences. We shall now illustrate [...] constructing a $B_n$ for arbitrary $n$, under the inductive assumption that a sequence $B_{n-1}$ has the MACR property and the following additional properties:
(i) the first $3 \cdot 2^{n-3}$ terms of $B_{n-1}$ are distinct;
(ii) the last $2^{n-4}$ terms of $B_{n-1}$ belong exclusively to the segment $S_{n-2}$.
We begin the induction with $n = 5$ since the following sequence $B_4$ satisfies (i) and (ii) and the MACR property:

$c_0$: 111111110000110101100111100
$c_1$: 01[?]00111111111[?]0000111110000
$c_2$: 001111000011111111110011010
$c_3$: 01100110011001001111111111[?]
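The MACR property lends itself to a direct mechanical check. The following sketch is an illustration under stated assumptions: vectors are encoded as integers with bit $i$ standing for $c_i(v)$, and `has_macr` and `naive_macr_sequence` are hypothetical names, not functions from the paper.

```python
from itertools import combinations


def has_macr(seq, n):
    """True if, for every nonempty sigma in {0, ..., n-1}, some window of
    2**(n - |sigma|) consecutive terms of seq is exactly the set of vectors
    with c_i(v) = 1 for all i in sigma."""
    for r in range(1, n + 1):
        size = 2 ** (n - r)
        for sigma in combinations(range(n), r):
            mask = sum(1 << i for i in sigma)
            wanted = {v for v in range(2 ** n) if (v & mask) == mask}
            if not any(set(seq[a:a + size]) == wanted
                       for a in range(len(seq) - size + 1)):
                return False
    return True


def naive_macr_sequence(n):
    """Concatenate all segments S_sigma without overlap (length 3**n - 2**n)."""
    seq = []
    for r in range(1, n + 1):
        for sigma in combinations(range(n), r):
            mask = sum(1 << i for i in sigma)
            seq.extend(v for v in range(2 ** n) if (v & mask) == mask)
    return seq
```

For $n = 2$ the three-term sequence 01, 11, 10 passes the check, whereas the nonoverlapping arrangement needs $3^2 - 2^2 = 5$ terms.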

The following procedure constructs $B_n$ (refer to fig. 1 for an illustration of the construction of $B_5$).

Step 1. Form $B'_n$ as the sequence of vectors of $V_n$ obtained by replacing each vector $u$ in $B_{n-1}$ with $1u$ and $0u$.

Comment: $B'_n$ contains a segment $S_\sigma$ of $n$-component vectors for each $\sigma \subseteq \{0, 1, \ldots, n - 2\}$; furthermore, by hypothesis (ii), the last $2 \times 2^{n-4} = 2^{n-3}$ terms of $B'_n$ can be arbitrarily permuted without destroying the stated property. Notice that, by construction, the first $3 \cdot 2^{n-2}$ terms of $B'_n$ are distinct.

Step 2. Form $B''_n$ as the sequence of vectors of $V_n$ obtained by rearranging the last $2^{n-3}$ terms of $B'_n$, so that the last $2^{n-4}$ positions of $B''_n$ contain the vectors $u$ with $c_{n-1}(u) = 1$, but in reverse order with respect to $B'_n$.
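Step 1 alone can be sketched in a few lines. This is a hedged illustration, not the authors' code: vectors are integers, the pair is emitted in the order $1u$, $0u$ (the order within the pair is an assumption), and `step1` and `has_segment` are hypothetical names. The check uses a small three-term sequence for $n - 1 = 2$ simply to show that doubling preserves the segments $S_\sigma$ with $\sigma \subseteq \{0, \ldots, n - 2\}$; it does not exercise hypotheses (i) and (ii), which only matter for the later steps.

```python
def step1(prev, n):
    """Replace each (n-1)-component vector u of prev by the pair 1u, 0u."""
    top = 1 << (n - 1)            # the new leading component c_{n-1}
    out = []
    for u in prev:
        out.extend([top | u, u])
    return out


def has_segment(seq, n, sigma):
    """True if some window of consecutive terms of seq is exactly S_sigma."""
    mask = sum(1 << i for i in sigma)
    wanted = {v for v in range(2 ** n) if (v & mask) == mask}
    size = len(wanted)            # equals 2**(n - len(sigma))
    return any(set(seq[a:a + size]) == wanted
               for a in range(len(seq) - size + 1))
```

Applied to 01, 11, 10 with $n = 3$, the doubled sequence is 101, 001, 111, 011, 110, 010: it contains $S_{\{0\}}$, $S_{\{1\}}$ and $S_{\{0,1\}}$ but no segment for the new attribute, which is what the remaining steps have to supply.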

Fig. 1. Construction of the sequence $B_5$.

Comment: Since we have only rearranged the last $2^{n-3}$ terms of $B'_n$, the sequence $B''_n$ also contains a segment $S_\sigma$ for each $\sigma \subseteq \{0, 1, \ldots, n - 2\}$. The array formed by components $c_0, \ldots, c_{n-2}$ of the last $2^{n-4}$ terms of $B''_n$ coincides with $B_{n-1}[l_{n-1} - 2^{n-4} + 1, l_{n-1}]$ [...].

[...]

[...] a sequence consisting of all terms with $c_{n-1} = 1$ which do not appear in the last $3 \cdot 2^{n-3}$ positions of $B^*_n$. Set $B_n = B''_n B^*_n C_n$.

Comment: By property (i) the first $3 \cdot 2^{n-3}$ terms of $B_{n-1}$, hence the last $3 \cdot 2^{n-3}$ terms of $B^*_n$, are distinct; thus the last $2^{n-1}$ terms of $B^*_n C_n$ are distinct and have $c_{n-1} = 1$, i.e., they form $S_{n-1}$. It follows that $B_n$ has the MACR property and that $\mathrm{length}(C_n) = 2^{n-3}$. Moreover, since $C_n$ is used only to construct $S_{n-1}$, property (ii) is then extended. Finally the first $3 \cdot 2^{n-2}$ terms of $B''_n$, hence of $B_n$, are distinct, thereby extending property (i).

We can now easily determine $l_n$. In fact

$$l_n = \mathrm{length}(B''_n) + \mathrm{length}(B^*_n) + \mathrm{length}(C_n) = 2 l_{n-1} + (l_{n-1} - 2^{n-4}) + 2^{n-3} = 3 l_{n-1} + 2^{n-4}.$$

This recurrence relation in $l_n$, with initial condition $l_4 = 27$, is solved as

$$l_n = \tfrac{29}{81}\, 3^n - 2^{n-3}.$$

Since the length of a sequence of nonoverlapping segments is $(3^n - 2^n)$, we see that this scheme achieves a storage saving of approximately [...]. However, we have not succeeded in evaluating the optimality of the result.
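The length recurrence for the multiple-attribute scheme can be iterated and compared with its closed form. The sketch below assumes the recurrence $l_n = 3 l_{n-1} + 2^{n-4}$ with $l_4 = 27$ and the solution $l_n = \tfrac{29}{81} 3^n - 2^{n-3}$, written as $29 \cdot 3^{n-4} - 2^{n-3}$ to stay in integers; the function names are hypothetical, and the saving is computed here relative to the nonoverlapping length $3^n - 2^n$.

```python
from fractions import Fraction


def macr_length(n):
    """l_n from l_n = 3 l_{n-1} + 2^(n-4), starting at l_4 = 27 (n >= 4)."""
    length = 27
    for k in range(5, n + 1):
        length = 3 * length + 2 ** (k - 4)
    return length


def macr_length_closed(n):
    """Closed form (29/81) 3^n - 2^(n-3), valid for n >= 4."""
    return 29 * 3 ** (n - 4) - 2 ** (n - 3)


def saving(n):
    """Fraction of storage saved relative to nonoverlapping segments."""
    return 1 - Fraction(macr_length(n), 3 ** n - 2 ** n)
```

Under these assumptions the saving tends to $1 - 29/81 = 52/81$, roughly 0.64, as $n$ grows.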

References

[1] H.D. Ehrich and W. Lipski, Jr., On the storage space requirement of consecutive retrieval with redundancy, Inform. Proc. Lett. 4 (1976) 101-104.
[2] W. Marek and Z. Pawlak, Information storage and retrieval systems - mathematical foundations I, CC PAS Reports No. 149, Warsaw (1974).
