Inform. Systems Vol. 2, pp. 199-205.
Pergamon Press 1977. Printed in Great Britain
A UNIMODALITY PROPERTY OF OPTIMAL EXHAUSTIVE PREFIX CODES AND RETRIEVAL TREES OVER ALPHABETS OF VARYING SIZE

L. E. STANFEL†
Department of Applied Physics, Chr. Michelsen Institute, Bergen, Norway

(Received 21 June 1976; in revised form 15 November 1976)

Abstract: Consider a set of equally probable words encoded by a coding alphabet in which each new symbol is more costly than the last. The average word cost of an exhaustive prefix code (equivalent to the total cost in this case) clearly varies with the subset chosen from the possible alphabet. The present paper establishes the nature of this variation. It shows that the average word length is non-increasing up to a point and non-decreasing beyond it, which makes any search for a best alphabet simple. The result is established first for an alphabet with costs {1, 2, 3, ...}, which is important in information retrieval applications. It is then established for arbitrary but strictly increasing costs, and finally for arbitrary non-decreasing costs.

1. PROBLEM DEFINITION
Given an alphabet of $r$ symbols with arbitrary costs, it is fairly easy to construct an optimal exhaustive prefix code for $n$ equally probable items, provided $n = r + k(r-1)$ for some nonnegative integer $k$. This was done, in fact, in [5].

It is not intended to claim that equally probable words are often found in real situations. However, the assumption is a reasonable one, and may be the best choice, for files whose items have inquiry probabilities yet to be determined, or where these probabilities vary over time (which is common). It is also the case that, in problems of this general variety, results have been obtained only for the equal-probability case. Both theoretical analysis and computation become greatly complicated when the probabilities are general.

By exhaustive, we recall, we mean only that any string composed of the $r$ symbols is either a sequence of valid code words or the prefix of a sequence of valid code words. In terms of the usual tree representation, this means only that every non-terminal vertex possesses $r$ successors on the next tree level. For example, the exhaustive prefix encoding

{111, 112, 113, 12, 13, 21, 22, 23, 3}

has the tree representation of Fig. 1.

[Fig. 1. Tree representation of the code {111, 112, 113, 12, 13, 21, 22, 23, 3}; the branches at each vertex are labeled 1, 2, 3.]

The present paper addresses the question of finding the alphabet that produces the best code, under the same assumptions, when $r$-symbol alphabets of known cost are available for several $r$ satisfying

$$ n = r + k(r-1), \qquad k \in \{0, 1, \ldots\}. \tag{1} $$

To be more explicit, we assume there are available as many as $N$ alphabet symbols, with costs $C_1 < C_2 < \cdots < C_N$, while if we use some $r < N$ of the $C_i$ [...]

†On leave from the Department of Industrial Engineering, University of Texas at Arlington, Arlington, TX 76019, U.S.A.

2. A FIRST CASE: RETRIEVAL APPLICATION
Initially, we wish to treat the case where $C_j = j$, $j = 1, \ldots, r_1$, where $r_1$ is the alphabet size. This case is helpful in treating the more general one. It is also the instance in which word cost is identical to retrieval time, when information retrieval is accomplished by double chaining in the tree (see, e.g. [2-4]). This case, then, provides an important application.

In double chaining, we recall, one searches the tree across until a match with a presented symbol is obtained, and then down to begin comparisons for the next symbol. Suppose the information file is represented as in Fig. 2. Reaching the circled vertex from the root of the tree requires 6 comparisons: one to match A, two to match B and three to match C. That vertex is where we would store the address of the record labeled ABC. Clearly, if we let A have cost 1, B have cost 2 and C have cost 3, this is the same as saying that the string ABC has cost 6.

[Fig. 2. An information file represented as a tree over the symbols A, B, C; the circled vertex stores the address of record ABC.]

In terms of the graphs themselves, [5] shows that a best exhaustive prefix code may be constructed iteratively. Each iteration consists of appending a one-level subtree of all $r_1$ branches to a least-cost string in the present tree. For three symbols with costs 1, 2, 3, Fig. 3 shows the first four minimal-cost exhaustive prefix codes, along with the cost of each word. In what follows we shall build up any desired configuration over an alphabet of $r_1$ symbols by the process illustrated in Fig. 3. The numbers of words generated, by iteration, are

$$ r_1,\; 2r_1 - 1,\; 3r_1 - 2,\; \ldots,\; k r_1 - (k-1),\; \ldots \tag{2} $$

and we can mark the total word lengths $t_i$ at stage $i$ by

$$ t_i = t_{i-1} - \mathrm{min}_i + \sum_{j=1}^{r_1} (\mathrm{min}_i + j), $$

or, combining some sums,

$$ t_i = t_{i-1} - \mathrm{min}_i + r_1\,\mathrm{min}_i + \frac{r_1(r_1+1)}{2}, $$

where $\mathrm{min}_i$ is the cost of a minimum-cost word at the beginning of the $i$th stage. (We may define $\mathrm{min}_1 = 0$.) The number $k_1$ of such subtrees that we add to obtain $n$ words using $r_1$ symbols is given by

$$ (k_1 - 1)(r_1 - 1) + r_1 = n, \qquad\text{or}\qquad k_1 = \frac{n - r_1}{r_1 - 1} + 1. $$

Let us increase the alphabet size to $r_2 > r_1$. Of course, not every $r_2$ corresponds to an exhaustive tree, so let $r_2$ be the smallest integer greater than $r_1$ that does. Suppose we then generate $n$ words with the $r_2$ alphabet and compare terms, in order, with those of (2). We find (i) fewer terms in the new list, and (ii) every term of the $r_2$ list larger than its corresponding term in the $r_1$ list. In fact, with $k_2 = (n - r_2)/(r_2 - 1) + 1$ we can express both the decrease in (i) and the increase in (ii):

$$ \text{Decrease} = \sum_{i=k_2+1}^{k_1} \left[ -\mathrm{min}_i + r_1\,\mathrm{min}_i + \frac{r_1(r_1+1)}{2} \right], \tag{3} $$

$$ \text{Increase} = \sum_{i=1}^{k_2} \left[ -\mathrm{min}_i' + r_2\,\mathrm{min}_i' + \frac{r_2(r_2+1)}{2} + \mathrm{min}_i - r_1\,\mathrm{min}_i - \frac{r_1(r_1+1)}{2} \right], \tag{4} $$

where $\mathrm{min}_i'$ denotes the minimum word cost available at iteration $i$ with the $r_2$ alphabet. The net increase, then, is (4) minus (3):

$$ \sum_{i=1}^{k_2} \left[ (r_2 - 1)\,\mathrm{min}_i' + \frac{r_2(r_2+1)}{2} \right] - \sum_{i=1}^{k_1} \left[ (r_1 - 1)\,\mathrm{min}_i + \frac{r_1(r_1+1)}{2} \right] \tag{5} $$

$$ = \sum_{i=1}^{k_1}\mathrm{min}_i - r_1 \sum_{i=1}^{k_1}\mathrm{min}_i + r_2\sum_{i=1}^{k_2}\mathrm{min}_i' - \sum_{i=1}^{k_2}\mathrm{min}_i' + \frac{k_2\, r_2(r_2+1)}{2} - \frac{k_1\, r_1(r_1+1)}{2}. \tag{6} $$
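The iterative construction above can be sketched in a few lines of Python. At each step it appends a one-level subtree to a least-cost word, following the construction the text attributes to [5]. This is a minimal sketch, assuming symbol costs $1, 2, \ldots, r$ and an alphabet size that satisfies (1). The function name `total_cost_consecutive` is hypothetical. A heap supplies $\mathrm{min}_i$ at each stage, and the running total follows the recurrence $t_i = t_{i-1} + (r-1)\,\mathrm{min}_i + r(r+1)/2$.

```python
from __future__ import annotations

import heapq


def total_cost_consecutive(n: int, r: int) -> int:
    """Total word cost of the exhaustive prefix code for n equally probable
    words, built iteratively over an r-symbol alphabet with costs 1, ..., r."""
    if r < 2 or (n - 1) % (r - 1) != 0:
        raise ValueError("r does not satisfy n = r + k(r - 1)")
    iterations = (n - 1) // (r - 1)   # k_1 = (n - r)/(r - 1) + 1
    leaves = [0]                      # the root, cost 0, so min_1 = 0
    t = 0                             # running total t_i of word costs
    for _ in range(iterations):
        m = heapq.heappop(leaves)     # min_i: cost of a least-cost word
        for j in range(1, r + 1):     # append a one-level subtree of r branches
            heapq.heappush(leaves, m + j)
        # t_i = t_{i-1} + (r - 1) * min_i + r(r + 1)/2
        t += (r - 1) * m + r * (r + 1) // 2
    # internal consistency: n words, and t equals the sum of word costs
    assert len(leaves) == n and t == sum(leaves)
    return t
```

For $n = 37$ and $r = 2, 3, 4, 5, 7, 10, 13, 19, 37$, the sketch returns 281, 228, 219, 219, 234, 265, 309, 398 and 703, the totals of Example 1. A heap is used only because it is the simplest way to find a least-cost word. The closed forms in the text do not depend on it.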
[Fig. 3. The first four minimal-cost exhaustive prefix codes for three symbols with costs 1, 2, 3, with the cost of each word.]

If we can show that the terms

$$ (r_2 - 1)\sum_{i=1}^{k_2}\mathrm{min}_i' + \frac{k_2\, r_2(r_2+1)}{2} \tag{7} $$

from (6) constitute an increasing function of $r_2$, then, assuming the minimum is achieved for alphabet size $r_1$, we shall know that once the minimum occurs the function is strictly increasing thereafter. Since $k_2 = (n - r_2)/(r_2 - 1) + 1$, the expression (7) is

$$ \underbrace{\left(\frac{n - r_2}{r_2 - 1} + 1\right)\frac{r_2(r_2+1)}{2}}_{T_1} + \underbrace{(r_2 - 1)\sum_{i=1}^{(n-r_2)/(r_2-1)+1}\mathrm{min}_i'}_{T_2}. \tag{8} $$

We are interested in only some discrete values of $r_2$. Even so, let us temporarily consider $r_2$ a continuous variable and write the derivative of (8), $T_1 + T_2$, treating one term at a time. Because $(n - r_2)/(r_2 - 1) + 1 = (n-1)/(r_2-1)$,

$$ \frac{dT_1}{dr_2} = \frac{n-1}{r_2-1}\cdot\frac{4r_2^2 - 8r_2 - 4}{8(r_2-1)} = \frac{(n-1)(r_2^2 - 2r_2 - 1)}{2(r_2-1)^2}, $$

$$ \frac{dT_2}{dr_2} = (r_2 - 1)\sum_{i=1}^{(n-r_2)/(r_2-1)+1}\frac{d}{dr_2}\mathrm{min}_i' + \sum_{i=1}^{(n-r_2)/(r_2-1)+1}\mathrm{min}_i'. \tag{9} $$

To treat the expression (9), let us return to the concept of building trees iteratively. The successive trees formed may be classified into epochs according to the cost of the minimum-cost word available. Figure 1 itself shows this: 1 is the cost of the minimum-cost word for one iteration, 2 is the minimum cost for two iterations, 3 will be it for four iterations, 4 for eight, and so on. By $F_i$ we denote the epoch length for minimum word cost $i$ using the $r_1$ alphabet, and by $F_i'$ the same quantity when the $r_2$ alphabet is used.

Charting some epoch lengths for $r_1$, $r_2$ values in the case $r_2 = r_1 + 1$, we can view the change we know will occur. Beyond some $i = i_0$, the larger alphabet stretches the length of each epoch; i.e. $F_i' > F_i$ for $i > i_0$ and $F_i' = F_i$ for $1 \le i \le i_0$. (In fact, the value $i_0$ is apparent and will be mentioned later.) Thus, compare iteration by iteration, up to the number $k_2$ of iterations we have with the larger alphabet. The minimum-cost word is always the same, or one unit of cost less, in the $r_2$ alphabet as compared with the $r_1$ alphabet. Now let $d_i = F_i' - F_i$. The fraction of iterations, through the first $k$ epochs with the $r_2$ alphabet, at which the cost of the minimum-cost word decreases is then

$$ \frac{d_1 + (d_1 + d_2) + \cdots + (d_1 + d_2 + \cdots + d_k)}{F_1' + F_2' + \cdots + F_k'}. $$

Elsewhere, there is no change. In computing $dT_2/dr_2$, we have the problem of deciding $(d/dr_2)\,\mathrm{min}_i'$. As our proxy for $\sum_i (d/dr_2)\,\mathrm{min}_i'$, then, we take

$$ -\frac{d_1 + (d_1 + d_2) + \cdots + (d_1 + d_2 + \cdots + d_k)}{F_1' + F_2' + \cdots + F_k'}. $$

Over the first $k$ epochs, the minimum word costs sum to $F_1' + 2F_2' + \cdots + kF_k'$. Our estimate for $dT_2/dr_2$ therefore becomes

$$ -(r_2 - 1)\,\frac{d_1 + (d_1 + d_2) + \cdots + (d_1 + \cdots + d_k)}{F_1' + F_2' + \cdots + F_k'} + F_1' + 2F_2' + \cdots + kF_k'. \tag{10} $$

We are interested in the sign of that expression. Multiplying by the positive quantity $F_1' + \cdots + F_k'$, this is equivalently the sign of

$$ (F_1' + 2F_2' + \cdots + kF_k')(F_1' + \cdots + F_k') - (r_2 - 1)\left[d_1 + (d_1 + d_2) + \cdots + (d_1 + \cdots + d_k)\right]. $$

Since $F_1' + 2F_2' + \cdots + kF_k' \ge F_1' + \cdots + F_k'$, this is at least

$$ (F_1' + \cdots + F_k')^2 - k(r_2 - 1)d_1 - (k-1)(r_2 - 1)d_2 - \cdots - 2(r_2 - 1)d_{k-1} - (r_2 - 1)d_k. $$
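The epoch lengths $F_i$ and the differences $d_i = F_i' - F_i$ can be tabulated directly. This is a minimal sketch, assuming integer symbol costs. It relies on the fact that, in the iterative construction, the number of iterations in epoch $i$ equals the number of words of cost $i$ in the infinite tree, i.e. the number of ordered sums of symbol costs equal to $i$. The helper names `epoch_lengths` and `epoch_differences` are hypothetical.

```python
from __future__ import annotations


def epoch_lengths(costs: list[int], k: int) -> list[int]:
    """F_1, ..., F_k: number of words of cost i (i = 1..k) in the infinite
    tree over symbols with the given positive integer costs."""
    ways = [1] + [0] * k            # ways[0] = 1 stands for the root (empty word)
    for c in range(1, k + 1):
        # a word of cost c is a word of cost c - cj followed by a symbol of cost cj
        ways[c] = sum(ways[c - cj] for cj in costs if cj <= c)
    return ways[1:]


def epoch_differences(r1: int, r2: int, k: int) -> list[int]:
    """d_i = F_i' - F_i for costs 1..r1 versus 1..r2, for i = 1..k."""
    f = epoch_lengths(list(range(1, r1 + 1)), k)
    f_new = epoch_lengths(list(range(1, r2 + 1)), k)
    return [b - a for a, b in zip(f, f_new)]
```

For costs $1, \ldots, 4$ the first four epoch lengths are 1, 2, 4, 8, the pattern quoted above. For $r_1 = 3$, $r_2 = 4$ the first five differences are 0, 0, 0, 1, 2. The first positive difference therefore appears in the epoch whose index equals the new alphabet size, and it equals one.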
First, suppose $k$, the number of epochs, is larger than or equal to the alphabet size. There appears to be something of a disparity between our concern for $k_2$ iterations and $k$ epochs. If we are able to obtain the proper result for $k$ epochs, however, we shall have no difficulty in adapting the result to iterations below. Let us think of $k$ as the number of epochs required to provide the requisite number, $k_2$, of iterations. Now

$$ (F_1' + \cdots + F_k')^2 > (k-1)^2 (d_1 + \cdots + d_k), $$

since $F_1' + \cdots + F_k' > (k-1)^2$ and $F_1' + \cdots + F_k' > d_1 + d_2 + \cdots + d_k$. Further,

$$ (k-1)^2(d_1 + d_2 + \cdots + d_k) \ge k(r_2 - 1)d_1 + (k-1)(r_2 - 1)d_2 + \cdots + 2(r_2 - 1)d_{k-1} + (r_2 - 1)d_k. $$

Especially in the retrieval context, we envision that $n \gg r$ for any $r$ we might be able to use, so $k$ will certainly exceed $r$. Nevertheless, in case $r > k$, the negative term in (10) is zero, since all the $d_j$ ($j = 1, \ldots, k$) are zero. We have not constructed enough epochs to realize any positive differences $F_j' - F_j$. Hence our derivative estimate is always positive.

Next, to account for cases where iterations may not coincide with whole epochs, let us return to our derivative expressions. $dT_1/dr_2$ remains in terms of iterations and requires no attention.
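The whole-epoch sign expression derived from (10) can be evaluated numerically for small cases. This is a check, not a proof. It is a minimal sketch, assuming consecutive costs $1, \ldots, r_1$, $r_2 = r_1 + 1$, and $k$ counting whole epochs. The helpers `_ways` and `sign_terms` are hypothetical. The function returns two values: the exact quantity $(\sum_i iF_i')(\sum_i F_i') - (r_2-1)\sum_j (k-j+1)d_j$, and the lower bound obtained by replacing $\sum_i iF_i'$ with $\sum_i F_i'$.

```python
from __future__ import annotations


def _ways(costs, k: int) -> list[int]:
    """Number of words of each cost 0..k in the infinite tree over `costs`."""
    ways = [1] + [0] * k
    for c in range(1, k + 1):
        ways[c] = sum(ways[c - cj] for cj in costs if cj <= c)
    return ways


def sign_terms(r1: int, k: int) -> tuple[int, int]:
    r2 = r1 + 1
    f_old = _ways(range(1, r1 + 1), k)[1:]   # F_1, ..., F_k
    f_new = _ways(range(1, r2 + 1), k)[1:]   # F_1', ..., F_k'
    d = [b - a for a, b in zip(f_old, f_new)]
    s = sum(f_new)
    weighted = sum(i * f for i, f in enumerate(f_new, start=1))
    # d_1 + (d_1 + d_2) + ... + (d_1 + ... + d_k) = sum_j (k - j + 1) d_j
    cumulative = sum((k - j + 1) * dj for j, dj in enumerate(d, start=1))
    exact = weighted * s - (r2 - 1) * cumulative
    bound = s * s - (r2 - 1) * cumulative
    return exact, bound
```

For $r_1 = 3$, $k = 4$ the sketch gives $(732, 222)$. The bound stays positive across small $r_1$ and $k$, in line with the argument above.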
In our estimate of the derivative of the second part, suppose we use

$$ -(r_2 - 1)\,\frac{d_1 + (d_1 + d_2) + \cdots + (d_1 + \cdots + d_k) + (d_1 + \cdots + d_{k+1})}{F_1' + F_2' + \cdots + F_k'}. \tag{11} $$

That is, we include all the variation in $\mathrm{min}_i'$ through the $(k+1)$st epoch, but retain a sum of $F'$s that is actually smaller than it should be. From the form of (11), then, we are using an estimate of the derivative smaller than it should truly be. The sign of interest is then that of

$$ (F_1' + \cdots + F_k')^2 - (k+1)(r_2 - 1)d_1 - k(r_2 - 1)d_2 - \cdots - 3(r_2 - 1)d_{k-1} - 2(r_2 - 1)d_k - (r_2 - 1)d_{k+1}. $$

In the $k \ge r$ case we have

$$ F_1' + \cdots + F_k' > k^2 - 1 = (k+1)(k-1) \qquad\text{and}\qquad F_1' + \cdots + F_k' > d_1 + \cdots + d_k + d_{k+1}. $$

In the $r > k$ case, $d_1, \ldots, d_k$ are still zero. The first variation in the $d$'s is in the $i_0$ epoch, where $i_0$ is the new alphabet size, and the first positive $d$ is always unity. Hence $d_{k+1}$ may equal one if $r = k + 1$. In that event, the entire derivative expression is

$$ \frac{(n-1)(r^2 - 2r - 1)}{2(r-1)^2} - \frac{r-1}{\sum_i F_i'} + \sum_i iF_i'. $$

Since $\sum_i F_i' \ge 1$, and since we do not treat alphabets of size larger than $n$, we have positivity again. We have thus accounted for partial epochs in one of the alphabets. For the present, we regard these two cases as complete.

Next, there is the question of the validity of approximating a derivative for a function that is actually discontinuous. We demonstrate this below, together with the way the monotonicity of our function follows from it. The situation with our estimate is as follows. We have $g(r) = c(r) + d(r)$, where $g$ is the total cost function and $c$, $d$ are the continuous and discrete components of $g$, respectively. We have shown that on $[r, r+1]$

$$ c' + d(r+1) - d(r) > 0, \tag{12} $$

and we inquire whether we may have $g(r+1) < g(r)$, or $c(r+1) + d(r+1) < c(r) + d(r)$. If so, then

$$ c(r+1) - c(r) + d(r+1) - d(r) < 0. \tag{13} $$

The mean value theorem is applicable to $c$. It states that for some $X \in (r, r+1)$, $c(r+1) - c(r) = c'(X)$. Then, for this $X$, (13) states $c'(X) + d(r+1) - d(r) < 0$, which contradicts (12). Thus we have shown that $g(r+1) > g(r)$, and our function (8) is strictly increasing beyond the minimizing $r$.

The final small point on which we might comment relates to alphabet size. Beginning with an alphabet of size $r$ in the analysis, $r + 1$ may not be feasible in the sense of (1), and the next feasible alphabet may be $r + p$ for some $p > 1$. Suppose, however, that one approximates derivatives linearly, by

$$ \frac{y_2 - y_1}{r_2 - r_1},\; \frac{y_3 - y_2}{r_3 - r_2},\; \ldots,\; \frac{y_p - y_{p-1}}{r_p - r_{p-1}}. $$

The sum of these is simply $y_p - y_1$ if $r_j = r_{j-1} + 1$ for all $j$, whereas the estimate over the entire interval is $(y_p - y_1)/(r_p - r_1)$. That is, the average of the unit-interval estimates is the whole-interval estimate. All our estimated slopes were positive, and we can obtain the proper sign result for the smallest of these. The average value will be no smaller than this minimum. We therefore obtain the positivity of the whole derivative estimate over the entire interval $[r, r + p]$ and, consequently, the strict increase of the total cost function over it.

Looking back at (6) and (7), then, we have half of our desired result: once the total cost function achieves its minimum, it is strictly increasing over all feasible alphabets thereafter. We wish also to demonstrate its strictly decreasing nature up to its minimum. But this is actually the same property. What we desire in this case is to show that expression (7), with its sign reversed, is a strictly increasing function as $r$ decreases. Equivalently, (7) is strictly decreasing as $r$ decreases, which we have already shown. As a result, consider the total cost function of optimal exhaustive codes over all feasible alphabet sizes, where the $r$-symbol alphabet has symbol costs $1, 2, \ldots, r$. We know that it is strictly decreasing up to its turning point and strictly increasing thereafter.

(The same result may fail to hold over all such alphabets, in which case an optimal code may not be exhaustive. The only obstacle is that the cost function may then remain constant over ranges. Consequently, a search to locate a best alphabet can still be carried out nearly as efficiently as in the all-exhaustive case.)

Our results, however, pertain to two cases. If both alphabets yield partial epochs, as in Example 1, we may find unchanging function values.
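The averaging argument above can be checked in a few lines. For unit-spaced abscissas the unit-interval slopes telescope, so their mean equals the slope over the whole interval. This is a minimal sketch. The helper names are hypothetical, and the sample values are the Example 1 totals for $r = 2, 3, 4, 5$ (281, 228, 219, 219). They serve only as a convenient sequence $y_j$, not as derivative estimates from the paper. Exact fractions avoid floating-point noise.

```python
from __future__ import annotations

from fractions import Fraction


def unit_slopes(rs: list[int], ys: list[int]) -> list[Fraction]:
    """Linear slope estimates (y_{j+1} - y_j) / (r_{j+1} - r_j)."""
    return [Fraction(y1 - y0, r1 - r0)
            for r0, r1, y0, y1 in zip(rs, rs[1:], ys, ys[1:])]


def mean_equals_whole(rs: list[int], ys: list[int]) -> bool:
    """True when the mean of the interval slopes equals the overall slope."""
    slopes = unit_slopes(rs, ys)
    whole = Fraction(ys[-1] - ys[0], rs[-1] - rs[0])
    return sum(slopes) / len(slopes) == whole
```

With unit spacing the identity holds: the slopes $-53, -9, 0$ average to $-62/3$, which is the whole-interval slope. With unequal spacing, such as $r = 5, 7, 10$, it generally fails. This is why the text states the condition $r_j = r_{j-1} + 1$.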
3. GENERAL, INCREASING COSTS

It is next reasonable to inquire whether a like property holds for alphabets with arbitrary symbol costs. If the costs are rational, multiplication by a suitable integer will provide a set of integer symbol costs, and the average (or total) code cost is changed only by that same multiple. Thus, if we first prohibit irrational symbol costs, we lose no generality in assuming the symbol costs $C_1 < C_2 < \cdots < C_r$ to be positive integers, where perhaps $C_i - C_{i-1} > 1$ for some $i$. When we consider expanding the alphabet size, we will now have $C_{r+1} > C_r$, but perhaps $C_{r+1} \ne C_r + 1$. Looking at the process of building up exhaustive trees, we now find that the per-iteration increase in total word costs is

$$ -\mathrm{min}_i + \sum_{j=1}^{r}\left[\mathrm{min}_i + C_j\right] = (r-1)\,\mathrm{min}_i + \sum_{j=1}^{r} C_j $$

for the $r$-symbol alphabet. Upon expanding the alphabet, the difference in total cost corresponding to (5) becomes

$$ \sum_{i=1}^{k_1}\mathrm{min}_i - r_1\sum_{i=1}^{k_1}\mathrm{min}_i + r_2\sum_{i=1}^{k_2}\mathrm{min}_i' - \sum_{i=1}^{k_2}\mathrm{min}_i' + k_2\sum_{j=1}^{r_2} C_j - k_1\sum_{j=1}^{r_1} C_j, $$

where the notation is the same as in (5). As before, the only quantity about which we are uncertain is the rate of change of the minimum word cost with increases in alphabet size. In the new problem, however, we find this rate less than we had previously. Any gaps of greater than unit cost in the symbol costs reduce the numbers of words of a given cost later on. The effect, then, is to reduce the epoch length relative to that of the smaller alphabet. The result is that what we denoted $(d/dr_2)\,\mathrm{min}_i'$ is greater than it previously was. Since that was the only difficult term before, the positivity of the total derivative is easily established in the new case. The same property of the total word length function is therefore present with the more general alphabet. (As a subsidiary observation, we note that consecutive integer costs $1, 2, \ldots, r$ maximize epoch length.)

4. NON-DECREASING SYMBOL COSTS

A natural extension of the arbitrary increasing symbol-cost case is the similar one in which a new symbol may cost the same as the last one, and not necessarily more. For the kinds of analyses undertaken previously here, the principal difficulty is that the epochs are lengthened. They become longer than they would be if the next symbol had unit cost more than its predecessor. Let us return to our estimate (10) for this case, and assume the first symbol has unit cost, with all costs, as before, integers. We observe first that

$$ F_1' + 2F_2' + \cdots + kF_k' \ge \frac{k}{2}\left(F_1' + \cdots + F_k'\right). \tag{14} $$

Taking $k$ even, inequality (14) is equivalent to

$$ \left(k - \tfrac{k}{2}\right)F_k' + \left(k - 1 - \tfrac{k}{2}\right)F_{k-1}' + \cdots + 0\cdot F_{k/2}' \ge \left(\tfrac{k}{2} - 1\right)F_1' + \left(\tfrac{k}{2} - 2\right)F_2' + \cdots + \left(\tfrac{k}{2} - \left(\tfrac{k}{2} - 1\right)\right)F_{k/2-1}', $$

which is clearly correct, by the non-decreasing nature of the $F_i'$. At the same time, none of epochs $1, 2, \ldots, k$ increases by as much as a factor $k/2$, so

$$ -\frac{k}{2}(r_2 - 1)\,\frac{d_1 + (d_1 + d_2) + \cdots + (d_1 + \cdots + d_k)}{F_1' + F_2' + \cdots + F_k'} $$

is a liberal estimate of the negative contribution, while

$$ -\frac{k}{2}(r_2 - 1)\,\frac{d_1 + (d_1 + d_2) + \cdots + (d_1 + \cdots + d_k)}{F_1' + F_2' + \cdots + F_k'} + \frac{k}{2}\left(F_1' + F_2' + \cdots + F_k'\right) \tag{15} $$

is a conservative estimate of the sum. But (15) is simply $k/2$ times a quantity already shown positive. Hence the entire derivative estimate is again positive, and we are finished. In this case again, however, strict increase may not result when it is necessary to deal in partial epochs, that is, in iterations. Thus, we may find a constant total word length over a range of feasible $r$. Example 3 below illustrates this phenomenon.
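Inequality (14) can be checked numerically for a non-decreasing sequence of epoch lengths. This is a numerical illustration, not a proof. It is a minimal sketch, assuming integer costs that include a unit-cost symbol (which keeps the epoch lengths non-decreasing). The helper names are hypothetical. The sample costs are taken from the cost list of Example 3. The comparison is done as $2\sum_i iF_i' \ge k\sum_i F_i'$, so that it stays in integers for odd $k$ too.

```python
from __future__ import annotations


def epoch_lengths(costs: list[int], k: int) -> list[int]:
    """F_1..F_k: number of words of cost i in the infinite tree over `costs`
    (repeated costs count as distinct symbols)."""
    ways = [1] + [0] * k
    for c in range(1, k + 1):
        ways[c] = sum(ways[c - cj] for cj in costs if cj <= c)
    return ways[1:]


def check_inequality_14(f: list[int]) -> bool:
    """F_1 + 2F_2 + ... + kF_k >= (k/2)(F_1 + ... + F_k), in integer form."""
    k = len(f)
    weighted = sum(i * fi for i, fi in enumerate(f, start=1))
    return 2 * weighted >= k * sum(f)
```

For costs {1, 2, 2, 4} the first four epoch lengths are 1, 3, 5, 12, and the inequality holds. A decreasing sequence such as 10, 1, 1, 1 violates it, which shows that the argument depends on the non-decreasing nature of the $F_i'$.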
5. IMPLICATION

The result allows very efficient searches for a best alphabet, much as the unimodality of continuous functions permits very efficient searches for maxima or minima. A function is said to be unimodal [1] if it is strictly increasing to a point and strictly decreasing thereafter, or the opposite. We have shown our total cost or average cost function to be unimodal in a discretely varying fashion. Efficient searches are well known even for the discrete variety of unimodal functions [6]. Even rudimentary procedures can exploit the unimodality, however. Whenever the function begins to increase, the optimal solution has been passed. As long as the function decreases, we have either just found an optimal solution or it lies yet ahead of us. Clearly, the adjustment necessary for the possibility of constant ranges of function value is minor. The assumption of a uniform distribution is obviously restrictive, but for large sets of data, as in retrieval applications, it is a reasonable compromise.

6. EXAMPLES

We illustrate the variation in total word cost for several example situations.
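Condition (1) and the rudimentary search of Section 5 can be combined into a short procedure. This is a minimal sketch, assuming integer symbol costs listed in non-decreasing order. Each feasible alphabet is evaluated with the iterative construction that the text attributes to [5]. The function names are hypothetical. The scan stops at the first strict increase. Flat stretches are allowed, and ties keep the smallest minimizing $r$.

```python
from __future__ import annotations

import heapq


def feasible_sizes(n: int, max_r: int) -> list[int]:
    """Alphabet sizes r, 2 <= r <= max_r, with n = r + k(r - 1) for some k >= 0."""
    return [r for r in range(2, min(n, max_r) + 1) if (n - 1) % (r - 1) == 0]


def total_cost(n: int, costs: list[int]) -> int:
    """Total word cost of the exhaustive code built by repeatedly expanding a
    least-cost word; n must be feasible for an alphabet of len(costs) symbols."""
    r = len(costs)
    leaves = [0]                                  # the root, cost 0
    for _ in range((n - 1) // (r - 1)):
        m = heapq.heappop(leaves)                 # a least-cost word
        for c in costs:
            heapq.heappush(leaves, m + c)
    return sum(leaves)


def search_best_alphabet(n: int, all_costs: list[int]) -> tuple[int, int] | None:
    """Scan feasible r upward; stop once the total cost strictly increases."""
    best = None                                   # (r, total cost) so far
    for r in feasible_sizes(n, len(all_costs)):
        cost = total_cost(n, all_costs[:r])       # use the r cheapest symbols
        if best is not None and cost > best[1]:
            break                                 # the minimum has been passed
        if best is None or cost < best[1]:
            best = (r, cost)
    return best
```

For $n = 37$, `feasible_sizes(37, 37)` gives the valid $r$ listed in Example 1. The search selects $r = 4$ for Example 1 (the first of the tied minimizers $r = 4, 5$), $r = 4$ for Example 2 and $r = 5$ for Example 3, in agreement with the optimizing values stated with the examples. It also evaluates the Example 2 total at $r = 4$ as 327.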
Example 1. n = 37; alphabet costs = {1, 2, 3, 4, ...}. Valid r for exhaustive trees: r = 2, 3, 4, 5, 7, 10, 13, 19, 37. We illustrate the sequential build-up in terms of (number of words, length) pairs through each full epoch. We then give the final result, which may require only several iterations of the next complete epoch. Each column of pairs represents an epoch, or the final solution.

[Columns of (number of words, length) pairs for each r.]

r = 2: Total cost = 281
r = 3: Total cost = 228
r = 4: Total cost = 219
r = 5: Total cost = 219
r = 7: Total cost = 234
r = 10: Total cost = 265
r = 13: Total cost = 309
r = 19: Total cost = 398
r = 37: Total cost = 703

The optimizing value is r = 4 or r = 5.

Example 2. n = 37. Costs = {1, 3, 6, 7, 9, 12, 14, 15, 16, 17, ..., 37}.

[Columns of (number of words, length) pairs for each r.]

r = 2: Total cost = 356
r = 3: Total cost = 334
r = 4: Total cost = [illegible]
r = 5: Total cost = 338
r = 7: Total cost = 390
r = 10: Total cost = 454
r = 13: Total cost = 507
etc. The optimizing value is r = 4.

Example 3. n = 37. Costs = {1, 2, 2, 4, 4, 4, 6, 7, 7, 10, 12, 12, 12, ..., 12}.

[Columns of (number of words, length) pairs for each r.]

r = 2: Same as Example 1, r = 2 (Total cost = 281).
r = 3: Total cost = 198
r = 4: Total cost = [illegible]
r = 5: Total cost = 193
r = 7: Total cost = 198
r = 10: Total cost = 233
r = 13: Total cost = 285
etc. The optimizing value is r = 5.
REFERENCES

[1] B. D. Sivazlian and L. E. Stanfel: Optimization Techniques in Operations Research. Prentice-Hall, Englewood Cliffs, New Jersey (1975).
[2] L. E. Stanfel: Tree structures for optimal searching. J. A.C.M. 17(3), 508-517 (1970).
[3] L. E. Stanfel: Optimal tree lists for information storage and retrieval. Inform. Systems 2(2), 65-70 (Aug. 1976).
[4] E. H. Sussenguth: Use of tree structures for processing files. Commun. A.C.M. 6(5), 272-279 (1963).
[5] B. Varn: Optimal variable length codes (arbitrary symbol cost and equal code word probability). Inform. Control 19, 289-301 (1971).
[6] D. Wilde: Optimum Seeking Methods. Prentice-Hall, Englewood Cliffs, New Jersey (1964).