Volume 5, number 3
INFORMATION PROCESSING LETTERS
August 1976
STORAGE FOR CONSECUTIVE RETRIEVAL*

F. LUCCIO** and F.P. PREPARATA†

University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA

Received 16 March 1976, in revised form 19 April 1976

Keywords: file organization, storage requirements, consecutive retrieval, multiple-attribute retrieval, combinatorial problems
1. Introduction

Ehrich and Lipski suggested a file organization which [...] to the construction of inverted files [...] attributes, and can be used in information systems of wide generality [1,2]. Specifically, let $V_n$ be the set of $n$-component binary vectors, and let $v = (c_{n-1}(v), c_{n-2}(v), \ldots, c_0(v))$ be a vector of $V_n$. Let $S_i$ be a sequence of the $2^{n-1}$ vectors $v$ of $V_n$ with $c_i(v) = 1$. The problem is to construct a sequence $A_n$ of vectors of $V_n$ for single-attribute retrieval, i.e., a sequence containing each $S_i$ in $2^{n-1}$ consecutive positions, $0 \le i \le n-1$. All the vectors in $S_i$ are distinct, but a vector with more than one component equal to 1 may have multiple occurrences in $A_n$, in the corresponding segments. The goal is that of minimizing such occurrences by allowing the segments to overlap. In [1] it is shown that the length of the proposed sequence is

$$\frac{(3n+1)\,2^n - (-1)^n}{9}, \qquad (1)$$

which tends to its asymptotic value $\frac{2}{3}\, n 2^{n-1}$,
where $n 2^{n-1}$ is the length of the sequence composed of nonoverlapping segments (as used in a straightforward inverted file organization). In this note we present two results on consecutive retrieval: (i) the optimality of the Ehrich-Lipski algorithm, i.e., the length of the single-attribute sequence $A_n$ generated by their algorithm is minimal; (ii) a scheme for constructing a sequence $B_n$ of binary vectors for multiple-attribute consecutive retrieval.

* This work was supported by the Joint Services [...] U.S. Navy, and U.S. Air Force [...]-0259.
** On leave from the Uni[...].
† [...] Engineering and Coordinated [...].
2. Minimal length of sequences for single-attribute consecutive retrieval

Let $A_n$ be a minimal-length sequence of $n$-component binary vectors for single-attribute retrieval, and let $l_n = \mathrm{length}(A_n)$. Let $A_n[j]$ be the $j$th term of $A_n$ and let $A_n[a_i]$ be the initial term of $S_i$, i.e., $S_i$ occupies positions $a_i, a_i + 1, \ldots, a_i + 2^{n-1} - 1$ of $A_n$. Without loss of generality we may assume that $a_i < a_j$ for $i < j$.
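The single-attribute requirement can be checked mechanically. The Python sketch below is only an illustration: it assumes a vector $v$ is encoded as an integer whose bit $i$ is $c_i(v)$, and `segment_starts` is a hypothetical helper that finds, for each $i$, the first window of $2^{n-1}$ pairwise distinct terms all having $c_i = 1$ (such a window is exactly $S_i$). The sequence `A3` is a hand-built example for $n = 3$, not one taken from [1]; its 9 terms equal $((3n+1)2^n - (-1)^n)/9$ at $n = 3$, against $n 2^{n-1} = 12$ for nonoverlapping segments.

```python
def segment_starts(seq, n):
    """For each attribute i, return the 0-based start of the first window of
    2^(n-1) consecutive, pairwise distinct terms that all have c_i = 1, i.e. a
    window holding exactly the vectors of S_i; None if there is none.
    A vector v is encoded as an int whose bit i is c_i(v)."""
    k = 1 << (n - 1)
    starts = []
    for i in range(n):
        starts.append(next(
            (s for s in range(len(seq) - k + 1)
             if len(set(seq[s:s + k])) == k
             and all((v >> i) & 1 for v in seq[s:s + k])),
            None))
    return starts


# Hypothetical hand-built A_3, written as c_2 c_1 c_0; 101 and 111 occur twice.
A3 = [0b001, 0b101, 0b111, 0b011, 0b010, 0b110, 0b100, 0b101, 0b111]

print(segment_starts(A3, 3))                      # [0, 2, 5]
print(len(A3), 3 * 2 ** (3 - 1))                  # 9 12 (overlapping vs. nonoverlapping)
print(((3 * 3 + 1) * 2 ** 3 - (-1) ** 3) // 9)    # 9
```

The starts 0, 2, 5 increase with $i$, and consecutive starts differ by at least $2^{n-2} = 2$.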
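Property 1. $S_i$ and $S_j$ share no term of $A_n$ whenever $|i - j| \ge 2$.

Proof. Every term $A_n[p]$ with $a_{i+1} \le p \le a_i + 2^{n-1} - 1$ lies in both $S_i$ and $S_{i+1}$, i.e., it has $c_i = c_{i+1} = 1$. Since the terms of $S_i$ are distinct and only $2^{n-2}$ vectors of $V_n$ have $c_i = c_{i+1} = 1$, at most $2^{n-2}$ terms are shared, so $a_{i+1} \ge a_i + 2^{n-2}$. The conditions $a_{i+1} \ge a_i + 2^{n-2}$ and $a_{i+2} \ge a_{i+1} + 2^{n-2}$ imply $a_{i+2} \ge a_i + 2^{n-1}$, i.e., $S_i \cap S_{i+2} = \emptyset$. Since $a_j > a_{i+2}$ for $j > i + 2$, the property is proved. □

Let $v \in V_n$, and set conventionally $c_{-1}(v) = c_n(v) = 0$. A sequence of consecutive components $c_{j+r-1}(v), \ldots, c_j(v)$ with $c_{j+r-1}(v) = \cdots = c_j(v) = 1$ and $c_{j+r}(v) = c_{j-1}(v) = 0$ is called a run of 1's of $v$, of length $r$; we let $\{r_1(v), \ldots, r_k(v)\}$ be the set of runs of 1's of $v$. Define the weight $W(v)$ of $v$ as

$$W(v) = \sum_{h=1}^{k} \left\lceil \frac{|r_h(v)|}{2} \right\rceil,$$

i.e., each run of length $r$ contributes $\lceil r/2 \rceil$. For example, $W(11101011) = 4$: the runs have lengths 3, 1 and 2 and contribute $2 + 1 + 1$.

Property 2. A vector $v$ occurs at least $W(v)$ times in $A_n$.

Proof. By Property 1, a given occurrence of $v$ in $A_n$ can be shared only by two segments with consecutive indices, say $S_j$ and $S_{j+1}$; then $c_j(v) = c_{j+1}(v) = 1$, so $j$ and $j+1$ lie in the same run. Hence each run $r_h(v)$ requires at least $\lceil |r_h(v)|/2 \rceil$ occurrences of $v$, distinct from those required by the other runs. □

Let now $v_i^n$ indicate the vector of $V_n$ such that $c_{n-1}(v_i^n) \cdots c_0(v_i^n)$ is the binary representation of $i$, and let

$$W_n = \sum_{i=0}^{2^n - 1} W(v_i^n).$$

It follows immediately from Property 2 that $l_n \ge W_n$, since each vector $v$ of $V_n$ must appear in $A_n$ at least $W(v)$ times.

The following property provides an inductive method for computing the weights of vectors with $k$ components from the weights of vectors with $(k-1)$ and $(k-2)$ components.

Property 3. For $k \ge 3$,

$$W(v_i^k) = W(v_i^{k-1}) \quad \text{for } 0 \le i \le 2^{k-1} - 1; \qquad (2)$$

$$W(v_{2^{k-1}+i}^k) = W(v_{2^{k-1}+2^{k-2}+i}^k) = W(v_i^{k-2}) + 1 \quad \text{for } 0 \le i \le 2^{k-2} - 1. \qquad (3)$$

Indeed, a leading 0 leaves the runs unchanged; a leading 10 adds a run of length 1; a leading 11 either adds a run of length 2 or lengthens the leading run of $v_i^{k-2}$ by 2, and since $\lceil (r+2)/2 \rceil = \lceil r/2 \rceil + 1$, in every case the weight grows by exactly 1.

Splitting $V_n$ according to whether the leading components are 0, 10 or 11, we have

$$W_n = \sum_{i=0}^{2^{n-1}-1} W(v_i^n) + \sum_{i=0}^{2^{n-2}-1} W(v_{2^{n-1}+i}^n) + \sum_{i=0}^{2^{n-2}-1} W(v_{2^{n-1}+2^{n-2}+i}^n).$$

Using (2) and (3) we readily obtain:

$$W_n = \sum_{i=0}^{2^{n-1}-1} W(v_i^{n-1}) + 2\sum_{i=0}^{2^{n-2}-1} \bigl(W(v_i^{n-2}) + 1\bigr) = W_{n-1} + 2W_{n-2} + 2^{n-1}.$$

This recurrence equation, with initial conditions $W_1 = 1$ and $W_2 = 3$ (obtained by inspection), yields function (1) as its solution. Since $l_n \ge W_n$ and the lengths of the sequences constructed by Ehrich and Lipski meet this bound with equality, we conclude that the latter are optimal.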
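The weight argument lends itself to a quick numerical check. This Python sketch assumes the same integer encoding of vectors (bit $i$ holds $c_i$); the function names are illustrative. It computes $W(v)$ run by run, verifies equations (2) and (3) exhaustively for $3 \le k \le 10$, and confirms that direct summation, the recurrence $W_n = W_{n-1} + 2W_{n-2} + 2^{n-1}$ and the closed form $((3n+1)2^n - (-1)^n)/9$ agree for $n \le 10$. It is a sanity check on small cases, not a proof.

```python
def weight(v, n):
    """W(v): over the runs of 1's of v, the sum of ceil(run length / 2).
    v is an int whose bit i is c_i(v), 0 <= i < n."""
    total, run = 0, 0
    for i in range(n + 1):            # i == n plays the role of c_n(v) = 0
        if i < n and (v >> i) & 1:
            run += 1
        else:
            total += (run + 1) // 2   # ceil(run / 2); adds 0 when run == 0
            run = 0
    return total


def W_brute(n):
    """W_n by direct summation over all 2^n vectors of V_n."""
    return sum(weight(v, n) for v in range(1 << n))


def W_closed(n):
    """Closed form ((3n + 1) 2^n - (-1)^n) / 9."""
    return ((3 * n + 1) * 2 ** n - (-1) ** n) // 9


# Equations (2) and (3), checked exhaustively for small k.
for k in range(3, 11):
    for i in range(1 << (k - 1)):
        assert weight(i, k) == weight(i, k - 1)                     # (2): leading 0
    for i in range(1 << (k - 2)):
        w = weight(i, k - 2) + 1
        assert weight((1 << (k - 1)) + i, k) == w                   # (3): leading 10
        assert weight((1 << (k - 1)) + (1 << (k - 2)) + i, k) == w  # (3): leading 11

# Recurrence W_n = W_{n-1} + 2 W_{n-2} + 2^(n-1) with W_1 = 1, W_2 = 3.
W = {1: 1, 2: 3}
for n in range(3, 11):
    W[n] = W[n - 1] + 2 * W[n - 2] + 2 ** (n - 1)

assert all(W_brute(n) == W[n] == W_closed(n) for n in range(1, 11))
print(weight(0b11101011, 8))          # 4
print([W[n] for n in range(1, 8)])    # [1, 3, 9, 23, 57, 135, 313]
```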
3. A scheme for multiple-attribute consecutive retrieval
Let $B_n$ be a sequence of vectors of $V_n$ with the multiple-attribute consecutive retrieval (MACR) property, i.e., for each nonvoid $I \subseteq \{0, 1, \ldots, n-1\}$ there is a segment $S_I$ of $2^{n-|I|}$ consecutive terms of $B_n$ consisting of all vectors $v$ for which $c_i(v) = 1$ for every $i \in I$. Sequences with the MACR property for $n = 2$ and $3$ are given below [...]
[...] coincide respectively with those of [...] sequences. We shall now [...] constructing a $B_n$ for arbitrary $n$, under the inductive assumption that $B_{n-1}$ has the MACR property and the additional properties:
(i) the first $3 \cdot 2^{n-3}$ terms of $B_{n-1}$ are distinct;
(ii) the last $2^{n-4}$ terms of $B_{n-1}$ belong exclusively to $S_{\{n-2\}}$.
We start the induction with $n = 5$, since the following $B_4$ satisfies (i), (ii) and the MACR property (row $c_i$ lists component $c_i$ of the 27 terms):

$c_0$: 111111110000110101100111100
$c_1$: 000011111111110000111110000
$c_2$: 001111000011111111110011010
$c_3$: 011001100110010011111111111
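The MACR property of this $B_4$ can be confirmed by exhaustive search. In the Python sketch below (helper names hypothetical, vectors again encoded as integers with bit $i$ equal to $c_i$), `macr_segments` looks, for every nonvoid $I$ given as a bit mask, for a window of $2^{n-|I|}$ consecutive, pairwise distinct terms all having $c_i = 1$ for $i \in I$; such a window is exactly $S_I$. The rows are those of the table.

```python
# Component rows of B_4: row i holds c_i of the 27 terms.
B4_ROWS = [
    "111111110000110101100111100",  # c_0
    "000011111111110000111110000",  # c_1
    "001111000011111111110011010",  # c_2
    "011001100110010011111111111",  # c_3
]


def to_vectors(rows):
    """Read the table column by column; each term becomes an int with bit i = c_i."""
    return [sum(int(row[j]) << i for i, row in enumerate(rows))
            for j in range(len(rows[0]))]


def macr_segments(seq, n):
    """Map every nonvoid I (bit mask) to the 0-based start of a window of
    2^(n-|I|) consecutive, pairwise distinct terms v with c_i(v) = 1 for all i
    in I (that window is then exactly S_I); None if no such window exists."""
    found = {}
    for mask in range(1, 1 << n):
        k = 1 << (n - bin(mask).count("1"))
        found[mask] = next(
            (s for s in range(len(seq) - k + 1)
             if len(set(seq[s:s + k])) == k
             and all((v & mask) == mask for v in seq[s:s + k])),
            None)
    return found


B4 = to_vectors(B4_ROWS)
segs = macr_segments(B4, 4)
print(len(B4))                                    # 27
print(all(s is not None for s in segs.values()))  # True: MACR holds
print(len(set(B4[:12])) == 12)                    # True: property (i) for n = 5
print(segs[0b1000])                               # 19: S_{3} is the last 8 terms
```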
The following procedure constructs $B_n$ (refer to Fig. 1 for an illustration of the construction of $B_5$). Here $1u$ ($0u$) denotes the vector of $V_n$ with $c_{n-1} = 1$ ($c_{n-1} = 0$) whose remaining components are those of $u$.

Step 1. Form $B'_n$ as the sequence of vectors of $V_n$ obtained by replacing each vector $u$ in $B_{n-1}$ with $1u$ and $0u$.
Comment: $B'_n$ contains a segment $S_I$ of $n$-component vectors for each nonvoid $I \subseteq \{0, 1, \ldots, n-2\}$; furthermore, by hypothesis (ii), the last $2 \times 2^{n-4} = 2^{n-3}$ terms of $B'_n$ can be arbitrarily permuted without destroying the stated property. Notice that, by construction, the first $3 \cdot 2^{n-2}$ terms of $B'_n$ are distinct.

Step 2. Form $B''_n$ as the sequence of vectors of $B'_n$ obtained by rearranging the last $2^{n-3}$ terms of $B'_n$, so that the last $2^{n-4}$ positions of $B''_n$ contain the vectors $v$ with $c_{n-1}(v) = 1$, but in reverse order with respect to $B'_n$.
[Fig. 1. Construction of the sequence $B_5$.]
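Comment: Since we have only rearranged the last $2^{n-3}$ terms of $B'_n$, the sequence $B''_n$ also contains a segment $S_I$ for each nonvoid $I \subseteq \{0, 1, \ldots, n-2\}$. The array formed by the components $c_0, \ldots, c_{n-2}$ of the last $2^{n-4}$ terms of $B''_n$ coincides with $B_{n-1}[l_{n-1} - 2^{n-4} + 1 : l_{n-1}]$ read in reverse order, where $l_{n-1} = \mathrm{length}(B_{n-1})$.

Step 3. Form $B'''_n$ as the sequence of vectors $1u$, where $u$ ranges over $B_{n-1}[1 : l_{n-1} - 2^{n-4}]$ in reverse order.
Comment: The last $2^{n-4}$ terms of $B''_n$ followed by $B'''_n$ are the vectors $1u$ for $u$ ranging over all of $B_{n-1}$ in reverse order; hence $B''_n B'''_n$ contains a segment $S_{I \cup \{n-1\}}$ for each nonvoid $I \subseteq \{0, 1, \ldots, n-2\}$.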
Then let $C_n$ be a sequence consisting of all terms with $c_{n-1} = 1$ which do not appear in the last $3 \cdot 2^{n-3}$ positions of $B'''_n$. Set $B_n = B''_n B'''_n C_n$.
Comment: By property (i) the first $3 \cdot 2^{n-3}$ terms of $B_{n-1}$, hence the last $3 \cdot 2^{n-3}$ terms of $B'''_n$, are distinct; thus the last $2^{n-1}$ terms of $B'''_n C_n$ are distinct and have $c_{n-1} = 1$, i.e., they form $S_{\{n-1\}}$. It follows that $B_n$ has the MACR property and that $\mathrm{length}(C_n) = 2^{n-3}$. Also, since $C_n$ is used only to construct $S_{\{n-1\}}$, property (ii) is extended. Finally, the first $3 \cdot 2^{n-2}$ terms of $B_n$, i.e., those of $B'_n$, are distinct, thereby extending property (i).

We can now easily determine $l_n = \mathrm{length}(B_n)$. In fact

$$l_n = \mathrm{length}(B''_n) + \mathrm{length}(B'''_n) + \mathrm{length}(C_n) = 2l_{n-1} + (l_{n-1} - 2^{n-4}) + 2^{n-3} = 3l_{n-1} + 2^{n-4}.$$

This recurrence relation in $l_n$, with initial condition $l_4 = 27$, is solved as

$$l_n = \frac{29}{81}\, 3^n - 2^{n-3}.$$

Since the length of a sequence of nonoverlapping segments is $3^n - 2^n$, we see that this scheme achieves a storage saving of approximately $1 - 29/81 = 52/81$ (about 64%) for large $n$. However, we have not succeeded in evaluating the optimality of the result.
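The inductive construction can be exercised on small cases. The Python sketch below follows the construction steps as read here and checks the resulting $B_5$ and $B_6$; it is an illustration, not the authors' code, and the helper names are hypothetical. Assumptions not fixed by the text: vectors are integers with bit $i$ equal to $c_i$; in Step 1 each $u$ becomes $1u$ followed by $0u$; the $0u$ terms of the rearranged tail keep their order; $C_n$ is listed in increasing numeric order. The argument leaves these orders free, so other choices should work as well.

```python
B4_ROWS = [  # component rows c_0 .. c_3 of the 27 terms of B_4
    "111111110000110101100111100",
    "000011111111110000111110000",
    "001111000011111111110011010",
    "011001100110010011111111111",
]


def to_vectors(rows):
    """Columns of the table as ints with bit i = c_i."""
    return [sum(int(row[j]) << i for i, row in enumerate(rows))
            for j in range(len(rows[0]))]


def has_macr(seq, n):
    """True iff every nonvoid I (bit mask) has a window of 2^(n-|I|) consecutive,
    pairwise distinct terms v with c_i(v) = 1 for all i in I."""
    for mask in range(1, 1 << n):
        k = 1 << (n - bin(mask).count("1"))
        if not any(len(set(seq[s:s + k])) == k
                   and all((v & mask) == mask for v in seq[s:s + k])
                   for s in range(len(seq) - k + 1)):
            return False
    return True


def extend(prev, n):
    """Build B_n from B_{n-1} following the steps above (sketch)."""
    top = 1 << (n - 1)          # the new component c_{n-1}
    m = 1 << (n - 4)            # 2^(n-4): freely permutable tail of B_{n-1}, hypothesis (ii)
    head, tail = prev[:-m], prev[-m:]
    # Steps 1-2: B'_n = (1u, 0u) for each u; its last 2^(n-3) terms are rearranged
    # so that the 1u of the tail come last, in reverse order (this is B''_n).
    b2 = [w for u in head for w in (top | u, u)]
    b2 += tail + [top | u for u in reversed(tail)]
    # Step 3: B'''_n = 1u for the remaining terms of B_{n-1}, in reverse order.
    b3 = [top | u for u in reversed(head)]
    # C_n: vectors with c_{n-1} = 1 missing from the last 3 * 2^(n-3) terms of B'''_n.
    seen = set(b3[-3 * (1 << (n - 3)):])
    c = [v for v in range(top, 2 * top) if v not in seen]
    return b2 + b3 + c


B = {4: to_vectors(B4_ROWS)}
for n in (5, 6):
    B[n] = extend(B[n - 1], n)

for n, seq in sorted(B.items()):
    closed = 29 * 3 ** (n - 4) - 2 ** (n - 3)
    print(n, len(seq), closed, has_macr(seq, n), 3 ** n - 2 ** n)
# 4 27 27 True 65
# 5 83 83 True 211
# 6 253 253 True 665
```

Each row shows $n$, the length of $B_n$, the value of $\frac{29}{81}3^n - 2^{n-3}$, the MACR check, and the nonoverlapping length $3^n - 2^n$.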
References

[1] H.D. Ehrich and W. Lipski, Jr., On the storage space requirement of consecutive retrieval with redundancy, Inform. Proc. Lett. 4 (1976) 101-104.
[2] W. Marek and Z. Pawlak, Information storage and retrieval systems: mathematical foundations I, CC PAS Reports No. 149, Warsaw (1974).