CHAPTER
VI
Densities
0. INTRODUCTION
We have used measures for studying codes as early as the first chapter. In particular, we have seen that for any code $X$ over an alphabet $A$ and for any Bernoulli distribution $\pi$ over $A^*$, we have $\pi(X) \le 1$. We have also seen that a thin code $X$ is a maximal code iff $\pi(X) = 1$. We thus have obtained an astonishingly simple method for testing the maximality of a code. This is a mark of the importance of the role played by probability measures in the theory of codes.
In this chapter we again use measures. We will study asymptotic properties of certain sets related to a code. For this the notion of density of a subset $L$ of $A^*$ is introduced. It is the limit in mean, provided it exists, of the probability that a word of length $n$ is in $L$. We shall prove a fundamental formula (Theorem 3.1) that relates the density of the submonoid generated by a thin complete code to that of its sets of left factors and right factors. For this, and as a preparation, we shall see how one can compute the density of a set of words by transferring the problem to a study of probabilities in abstract monoids.
1. DENSITIES
Let $\pi$ be a positive Bernoulli distribution on $A^*$. It is a morphism $\pi: A^* \to\ ]0,1]$ such that
$$\sum_{a \in A} \pi(a) = 1.$$
In the sequel, we use the notation
$$A^{(n)} = \{1\} \cup A \cup \cdots \cup A^{n-1}.$$
In particular $A^{(0)} = \emptyset$, $A^{(1)} = \{1\}$. Let $L$ be a subset of $A^*$. The set $L$ is said to have a density with respect to $\pi$ if the sequence of the $\pi(L \cap A^n)$ converges in mean, i.e., if
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \pi(L \cap A^k)$$
exists. If this is the case, the density of $L$ (relative to $\pi$), denoted by $\delta(L)$, is this limit, which can also be written as
$$\delta(L) = \lim_{n \to \infty} \frac{1}{n}\, \pi(L \cap A^{(n)}).$$
An elementary result from analysis shows that if the sequence $\pi(L \cap A^n)$ has a limit, then its limit in mean also exists, and both are equal. This remark may sometimes simplify computations. Observe that $\delta(A^*) = 1$ and $0 \le \delta(L) \le 1$ for any subset $L$ of $A^*$ having a density. If $L$ and $M$ are subsets of $A^*$ having a density, then so has $L \cup M$, and
$$\delta(L \cup M) \le \delta(L) + \delta(M).$$
If $L \cap M = \emptyset$, and if two of the three sets $L$, $M$ and $L \cup M$ have a density, then the third one also has a density and
$$\delta(L \cup M) = \delta(L) + \delta(M).$$
The function $\delta$ is a partial function from $\mathfrak{P}(A^*)$ into $[0,1]$. Of course, $\delta(\{w\}) = 0$ for all $w \in A^*$. This shows that in general
$$\delta(L) \ne \sum_{w \in L} \delta(\{w\}).$$
Observe that if $\pi(L) < \infty$, then $\delta(L) = 0$ since $\pi(L \cap A^{(n)}) \le \pi(L)$, whence
$$\frac{1}{n}\, \pi(L \cap A^{(n)}) \to 0.$$
EXAMPLE 1.1 Let $L = (A^2)^*$ be the set of words of even length. Then
$$\pi(L \cap A^{(2k)}) = \pi(L \cap A^{(2k-1)}) = k.$$
Thus $\delta(L) = \tfrac{1}{2}$.
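The partial means that appear in the definition of a density can be computed by brute force for small $n$. The following sketch is only an illustration: the function names, the two-letter alphabet and the weights $1/3$, $2/3$ are assumptions made here, not part of the text. It evaluates $(1/n)\,\pi(L \cap A^{(n)})$ for the set of words of even length.

```python
from fractions import Fraction
from itertools import product

def word_prob(word, pi):
    """Bernoulli measure of a word: the product of its letter weights."""
    result = Fraction(1)
    for letter in word:
        result *= pi[letter]
    return result

def mean_density(in_language, pi, n):
    """Return (1/n) * pi(L ∩ A^(n)), where A^(n) is the set of words of length < n."""
    total = Fraction(0)
    for length in range(n):
        for letters in product(sorted(pi), repeat=length):
            word = "".join(letters)
            if in_language(word):
                total += word_prob(word, pi)
    return total / n

# Illustrative weights (an assumption): pi(a) = 1/3, pi(b) = 2/3.
PI = {"a": Fraction(1, 3), "b": Fraction(2, 3)}

def even_length(word):
    return len(word) % 2 == 0

mean_even_n = mean_density(even_length, PI, 10)  # n = 2k with k = 5
mean_odd_n = mean_density(even_length, PI, 9)    # n = 2k - 1 with k = 5
```

For $n = 2k$ the mean is $k/(2k)$ and for $n = 2k-1$ it is $k/(2k-1)$; both tend to $1/2$, whatever the weights.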
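EXAMPLE 1.2 Let $D^* = \{w \in A^* \mid |w|_a = |w|_b\}$ over $A = \{a, b\}$, and set $p = \pi(a)$, $q = \pi(b)$. Then (see Example I.4.5)
$$\pi(D^* \cap A^{2n}) = \binom{2n}{n} p^n q^n, \qquad \pi(D^* \cap A^{2n+1}) = 0.$$
Using Stirling's formula, we get
$$\pi(D^* \cap A^{2n}) \sim \frac{1}{\sqrt{\pi n}}\, 4^n p^n q^n$$
(under the square root, $\pi$ denotes the constant $3.14159\ldots$). Since $4pq \le 1$, this shows that for all values of $p$ and $q$,
$$\lim_{n \to \infty} \pi(D^* \cap A^n) = 0.$$
Thus $\delta(D^*) = 0$.
The definition of density clearly depends only on the values of the numbers $\pi(L \cap A^n)$. It appears to be useful to consider an analogous definition for formal power series: we restrict ourselves to formal power series in one variable with coefficients in $\mathbb{R}_+$. Let
$$f = \sum_{n \ge 0} f_n t^n$$
be a formal series. The density of $f$, denoted by $\delta(f)$, is the limit in mean, provided it exists, of the sequence $f_n$,
$$\delta(f) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f_i.$$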
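The limit in mean of a coefficient sequence is easy to explore numerically. The sketch below takes the coefficients $f_n = \pi(D^* \cap A^n)$ of Example 1.2, compares one of them with the Stirling estimate, and computes a partial mean. The function names and the choice $p = 1/2$ (the slowest decay, since then $4pq = 1$) are assumptions made for the illustration; logarithms of factorials are used only to avoid huge integers.

```python
from math import exp, lgamma, log, pi as PI_CONST, sqrt

def balanced_coeff(n, p):
    """pi(D* ∩ A^n) for D* = {w : |w|_a = |w|_b}, with pi(a) = p and pi(b) = 1 - p."""
    if n % 2:
        return 0.0
    half = n // 2
    # log of binomial(n, half) * (p * q)^half
    log_value = lgamma(n + 1) - 2 * lgamma(half + 1) + half * log(p * (1 - p))
    return exp(log_value)

def stirling_estimate(half, p):
    """Stirling approximation of pi(D* ∩ A^(2 * half))."""
    return (4 * p * (1 - p)) ** half / sqrt(PI_CONST * half)

def mean_of_coeffs(coeff, n):
    """Partial mean (1/n) * (f_0 + ... + f_(n-1)) of a coefficient sequence."""
    return sum(coeff(i) for i in range(n)) / n

P = 0.5
exact_200 = balanced_coeff(200, P)
approx_200 = stirling_estimate(100, P)
mean_2000 = mean_of_coeffs(lambda i: balanced_coeff(i, P), 2000)
```

Even in the case $p = q = 1/2$ the coefficients tend to $0$ (like $1/\sqrt{n}$), so the partial means tend to $0$ as well; for $p \ne q$ the decay is geometric.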
To each set $L \subset A^*$ we associate a series $f_L \in \mathbb{R}_+[[t]]$ by
$$f_L = \sum_{n \ge 0} \pi(L \cap A^n)\, t^n. \qquad (1.1)$$
Clearly $f_L$ has a density iff $L$ has a density, and
$$\delta(L) = \delta(f_L).$$
Let $f_L$ be given by (1.1). We denote by $\rho_L$ the radius of convergence of the series $f_L$. It is the unique real number $\rho \in \mathbb{R}_+$ such that
$$\sum_{n \ge 0} \pi(L \cap A^n)\, z^n$$
converges for $|z| < \rho$ and diverges for $|z| > \rho$. For any set $L$, we have $\rho_L \ge 1$ since $\pi(L \cap A^n) \le 1$ for all $n \ge 0$. The following proposition is a more precise formulation of Proposition I.5.6.
PROPOSITION 1.1 Let $L$ be a subset of $A^*$ and let $\pi$ be a positive Bernoulli distribution. If $L$ is thin, then $\rho_L > 1$ and $\delta(L) = 0$.
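Proof Let $w$ be a word in $\bar{F}(L)$, that is, a word which is not a factor of any word of $L$, and set $n = |w|$. Then we have, for $0 \le i < n$ and $k \ge 0$,
$$L \cap A^i (A^n)^k \subset A^i (A^n - \{w\})^k.$$
Hence
$$\pi(L \cap A^i (A^n)^k) \le (1 - \pi(w))^k.$$
Thus for any $\rho \in \mathbb{R}_+$ satisfying $(1 - \pi(w))\rho^n < 1$, we have
$$\sum_{i=0}^{n-1} \sum_{k \ge 0} \pi(L \cap A^{i+nk})\, \rho^{i+nk} \le \sum_{i=0}^{n-1} \rho^i \sum_{k \ge 0} \bigl((1 - \pi(w))\rho^n\bigr)^k < \infty.$$
This proves that
$$\rho_L \ge \left(\frac{1}{1 - \pi(w)}\right)^{1/n} > 1.$$
In particular the series $f_L$ converges for $t = 1$, that is, $\pi(L) < \infty$, whence $\delta(L) = 0$. $\square$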
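For later use, we need an elementary result concerning the convergence of certain series. For the sake of completeness we include the proof.
PROPOSITION 1.2 Let $f = \sum_{n \ge 0} f_n t^n$, $g = \sum_{n \ge 0} g_n t^n \in \mathbb{R}_+[[t]]$ be two power series satisfying
(i) $0 < \sigma = \sum_{n \ge 0} g_n < \infty$;
(ii) for all $n \ge 0$, $0 \le f_n \le 1$.
Then $\delta(f)$ exists iff $\delta(fg)$ exists and in this case,
$$\delta(fg) = \delta(f)\sigma. \qquad (1.2)$$
Proof Set
$$h = fg = \sum_{n \ge 0} h_n t^n.$$
Then for $n \ge 1$,
$$\sum_{i=0}^{n-1} h_i = \sum_{i=0}^{n-1} f_i \sum_{j=0}^{n-1-i} g_j = \sigma \sum_{i=0}^{n-1} f_i - \sum_{i=0}^{n-1} f_i\, r_{n-i},$$
where $r_i = \sum_{j \ge i} g_j$. Let $s_n = \sum_{i=0}^{n-1} f_i\, r_{n-i}$. Then for $n \ge 1$,
$$\frac{1}{n} \sum_{i=0}^{n-1} h_i = \sigma\, \frac{1}{n} \sum_{i=0}^{n-1} f_i - \frac{s_n}{n}. \qquad (1.3)$$
Furthermore
$$s_n = \sum_{i=0}^{n-1} f_i\, r_{n-i} \le \sum_{i=0}^{n-1} r_{n-i} = \sum_{i=1}^{n} r_i. \qquad (1.4)$$
Since $\sum g_n$ converges, we have $\lim_{i \to \infty} r_i = 0$, showing that also
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} r_i = 0,$$
and in view of (1.4),
$$\lim_{n \to \infty} \frac{1}{n}\, s_n = 0.$$
Since $\sigma \ne 0$, Eq. (1.3) shows that $\delta(f)$ exists iff $\delta(h)$ exists and that $\delta(f)\sigma = \delta(h)$. This proves (1.2) and the proposition. $\square$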
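The statement just proved can be checked on a small numerical instance. In the following sketch, which is an illustration under assumptions chosen here (the two series, the truncation to 2000 coefficients and the helper names are not from the text), $f$ is the indicator of even indices, so that its density is $1/2$, and $g$ has finitely many nonzero coefficients with sum $\sigma$ close to $6$.

```python
def cauchy_product(f, g):
    """First len(f) coefficients of the product of two series given as coefficient lists.

    Coefficients of g beyond the end of its list are taken to be zero.
    """
    n = len(f)
    h = [0.0] * n
    for j, gj in enumerate(g):
        for i in range(n - j):
            h[i + j] += f[i] * gj
    return h

def mean(coeffs):
    """Partial mean of a finite list of coefficients."""
    return sum(coeffs) / len(coeffs)

N = 2000
f = [1.0 if i % 2 == 0 else 0.0 for i in range(N)]  # 0 <= f_n <= 1, density 1/2
g = [3.0 * 0.5 ** j for j in range(40)]              # nonnegative, finite sum
sigma = sum(g)
h = cauchy_product(f, g)

mean_f = mean(f)
mean_h = mean(h)
```

The difference between `mean_h` and `sigma * mean_f` is the term $s_n/n$ of the proof, which is of order $1/n$ here because the tails $r_i$ of $g$ decrease geometrically.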
PROPOSITION 1.3 Let $\pi$ be a positive Bernoulli distribution on $A^*$. Let $L$, $M$ be subsets of $A^*$ such that
(i) $0 < \pi(M) < \infty$;
(ii) the product $LM$ is unambiguous.
Then $LM$ has a density iff $L$ has a density, and if this is the case,
$$\delta(LM) = \delta(L)\pi(M).$$
Proof Since the product $LM$ is unambiguous, we have
$$f_{LM} = f_L f_M. \qquad (1.5)$$
In view of the preceding proposition
$$\delta(LM) = \delta(f_{LM}) = \delta(f_L)\sigma,$$
where $\sigma = \sum_{n \ge 0} \pi(M \cap A^n) = \pi(M)$. $\square$
This proposition will be useful in the sequel. As a first illustration of its use, note
COROLLARY 1.4 Each right (left) ideal $I$ of $A^*$ has a nonnull density. Each prefix-closed set has a density. More precisely
$$\delta(I) = \pi(X),$$
where $X = I - IA^+$.
Proof Let $I$ be a right ideal and let $X = I - IA^+$. By Proposition II.1.2, the set $X$ is prefix and $I = XA^*$. The product $XA^*$ is unambiguous because $X$ is prefix. Further $\pi(X) \le 1$ since $X$ is a code, and $\pi(X) > 0$ since $I \ne \emptyset$ and consequently also $X \ne \emptyset$. Thus, applying the (symmetrical version of the) preceding proposition, we obtain
$$\delta(I) = \delta(XA^*) = \pi(X)\delta(A^*) = \pi(X) \ne 0.$$
Finally, the complement of a prefix-closed set is a right ideal. $\square$
Let $X$ be a code over $A$. Then $\pi(X) \le 1$, and $\pi(X) = 1$ if $X$ is thin and complete. For a code $X$ such that $\pi(X) = 1$ we define the average length of $X$ (relative to $\pi$) as the finite or infinite number
$$\lambda(X) = \sum_{x \in X} |x|\,\pi(x) = \sum_{n \ge 0} n\, \pi(X \cap A^n). \qquad (1.6)$$
The following fundamental theorem gives a link between the density and the average length.
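THEOREM 1.5 Let $X \subset A^+$ be a code and let $\pi$ be a positive Bernoulli distribution. If
(i) $\pi(X) = 1$,
(ii) $\lambda(X) < \infty$,
then $X^*$ has a density and $\delta(X^*) = 1/\lambda(X)$.
Proof Set $f_n = \pi(X \cap A^n)$. Then $f_X = \sum_{n \ge 0} f_n t^n$. Since $X$ is a code, we have as a consequence of Proposition I.1.6,
$$f_{X^*} = (1 - f_X)^{-1}. \qquad (1.7)$$
Let $g = \sum_{n \ge 0} g_n t^n$ be defined by
$$g = \frac{1 - f_X}{1 - t}. \qquad (1.8)$$
Identifying terms, we have for $n \ge 0$,
$$g_n = 1 - \sum_{i=0}^{n} f_i.$$
From $\pi(X) = \sum_{i \ge 0} f_i = 1$ we get that
$$g_n = \sum_{i \ge n+1} f_i. \qquad (1.9)$$
Note that by (1.9), $g_n \ge 0$ for $n \ge 0$. Moreover, by (1.6) and (1.9),
$$\sum_{n \ge 0} g_n = \sum_{n \ge 0} \sum_{i \ge n+1} f_i = \sum_{i \ge 0} i f_i = \lambda(X). \qquad (1.10)$$
We see that (1.8) and (1.7) give
$$t^* = f_{X^*}\, g, \qquad (1.11)$$
where $t^* = \sum_{n \ge 0} t^n = (1 - t)^{-1}$. Since $\lambda(X)$ is finite and not zero (indeed $\pi(X) = 1$ shows that $X \ne \emptyset$), we can apply Proposition 1.2 to (1.11). Since $\delta(t^*) = 1$, $\delta(f_{X^*})$ exists and
$$1 = \delta(t^*) = \delta(f_{X^*})\,\lambda(X). \qquad \square$$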
Note the following important special case of Theorem 1.5.
THEOREM 1.6 Let $X$ be a thin complete code over $A$, and let $\pi$ be a positive Bernoulli distribution. Then $X^*$ has a density. Further $\delta(X^*) > 0$, $\lambda(X) < \infty$, and $\delta(X^*) = 1/\lambda(X)$.
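The relation $\delta(X^*) = 1/\lambda(X)$ can be observed numerically on a finite complete code. The sketch below uses the code $X = \{aa, ba, baa, bb, bba\}$ that appears later in Example 3.1; the weights $\pi(a) = 1/3$, $\pi(b) = 2/3$ and all function names are assumptions made for the illustration. The coefficients $u_n = \pi(X^* \cap A^n)$ are obtained from the identity $f_{X^*} = 1 + f_X f_{X^*}$, which is another way of writing $f_{X^*} = (1 - f_X)^{-1}$.

```python
from fractions import Fraction

X = ["aa", "ba", "baa", "bb", "bba"]
PI = {"a": Fraction(1, 3), "b": Fraction(2, 3)}  # illustrative weights (assumption)

def word_prob(word):
    """Bernoulli measure of a word."""
    result = Fraction(1)
    for letter in word:
        result *= PI[letter]
    return result

# f[k] = pi(X ∩ A^k)
MAX_LEN = max(len(x) for x in X)
f = [Fraction(0)] * (MAX_LEN + 1)
for x in X:
    f[len(x)] += word_prob(x)

measure_of_X = sum(f)                                  # pi(X)
average_length = sum(k * fk for k, fk in enumerate(f))  # lambda(X)

def star_coeffs(n):
    """u[k] = pi(X* ∩ A^k) for k < n, from u_0 = 1 and u_k = sum_j f_j * u_(k-j)."""
    weights = [float(v) for v in f]
    u = [0.0] * n
    u[0] = 1.0
    for k in range(1, n):
        u[k] = sum(weights[j] * u[k - j] for j in range(1, min(k, MAX_LEN) + 1))
    return u

u = star_coeffs(3000)
mean_u = sum(u) / len(u)   # partial mean approximating delta(X*)
```

Here `measure_of_X` equals $1$, as it must for a thin complete code, and `mean_u` is close to `1 / average_length`.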
Proof Since $X$ is a thin and complete code, $\pi(X) = 1$. Next, since $X$ is thin, $\rho_X > 1$ by Proposition 1.1. Thus the derivative of $f_X$, which is the series
$$f_X' = \sum_{n \ge 1} n\, \pi(X \cap A^n)\, t^{n-1},$$
also has a radius of convergence strictly greater than 1. Hence $f_X'(1)$ is finite. Now
$$f_X'(1) = \sum_{n \ge 1} n\, \pi(X \cap A^n) = \lambda(X).$$
Thus $\lambda(X) < \infty$ and the hypotheses of Theorem 1.5 are satisfied. $\square$
2. PROBABILITIES OVER A MONOID
A detailed study of the density of a code, in relation to some of the fundamental parameters, will be presented in the next section. The aim of the present section is to prepare this investigation by the proof of some rather delicate results. We will show how certain monoids can be equipped with idempotent measures. This in turn allows us to determine the sets having a density, and to compute it. We need the following lemma which constitutes a generalization of Proposition 1.2.
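LEMMA 2.1 Let $I$ be a set, and for each $i \in I$, let
$$f^{(i)} = \sum_{n \ge 0} f_n^{(i)} t^n, \qquad g^{(i)} = \sum_{n \ge 0} g_n^{(i)} t^n$$
be formal power series with coefficients in $\mathbb{R}_+$ satisfying
(i) $\sigma = \sum_{i \in I} \sum_{n \ge 0} g_n^{(i)} < \infty$,
(ii) $0 \le f_n^{(i)} \le 1$ for all $i \in I$, $n \ge 0$,
(iii) $\delta(f^{(i)})$ exists for all $i \in I$.
Set $\sigma^{(i)} = \sum_{n \ge 0} g_n^{(i)}$ for $i \in I$. Then $\sum_{i \in I} f^{(i)} g^{(i)}$ admits a density and
$$\delta\Bigl(\sum_{i \in I} f^{(i)} g^{(i)}\Bigr) = \sum_{i \in I} \delta(f^{(i)})\, \sigma^{(i)}.$$
Proof Set
$$h^{(i)} = f^{(i)} g^{(i)} = \sum_{n \ge 0} h_n^{(i)} t^n, \qquad h = \sum_{i \in I} h^{(i)} = \sum_{n \ge 0} h_n t^n.$$
Set also
$$r_n^{(i)} = \sum_{j \ge n} g_j^{(i)}, \qquad s_n^{(i)} = \sum_{k=0}^{n-1} f_k^{(i)}\, r_{n-k}^{(i)}, \qquad s_n = \sum_{i \in I} s_n^{(i)}.$$
We have, for $n \ge 1$ and $i \in I$,
$$\sum_{k=0}^{n-1} h_k^{(i)} = \sigma^{(i)} \sum_{k=0}^{n-1} f_k^{(i)} - s_n^{(i)}.$$
Dividing by $n$ and summing over $I$, one obtains
$$\frac{1}{n} \sum_{k=0}^{n-1} h_k = \sum_{i \in I} \sigma^{(i)}\, \frac{1}{n} \sum_{k=0}^{n-1} f_k^{(i)} - \frac{s_n}{n}. \qquad (2.1)$$
Set
$$d = \sum_{i \in I} \delta(f^{(i)})\, \sigma^{(i)}.$$
Since $\delta(f^{(i)}) \le 1$ for all $i \in I$, we have $d \le \sum_{i \in I} \sigma^{(i)} = \sigma$. This shows that $d < \infty$.
Let now $\varepsilon > 0$ be a real number. Then there exists a finite subset $F$ of $I$ such that
$$\sigma \ge \sum_{i \in F} \sigma^{(i)} \ge \sigma - \varepsilon.$$
This is equivalent to
$$\sum_{i \in I - F} \sigma^{(i)} \le \varepsilon.$$
Observing that $\delta(f^{(i)}) \le 1$ for $i \in I$ and $(1/n) \sum_{k=0}^{n-1} f_k^{(i)} \le 1$, we get that both
$$\sum_{i \in I - F} \delta(f^{(i)})\, \sigma^{(i)} \le \varepsilon \quad \text{and} \quad \sum_{i \in I - F} \sigma^{(i)}\, \frac{1}{n} \sum_{k=0}^{n-1} f_k^{(i)} \le \varepsilon.$$
In view of (2.1), this leads to the evaluation
$$\Bigl| d - \frac{1}{n} \sum_{k=0}^{n-1} h_k \Bigr| \le \sum_{i \in F} \sigma^{(i)} \Bigl| \delta(f^{(i)}) - \frac{1}{n} \sum_{k=0}^{n-1} f_k^{(i)} \Bigr| + 2\varepsilon + \frac{s_n}{n}. \qquad (2.2)$$
Let us estimate the $s_n$. Since $f_k^{(i)} \le 1$, we have
$$s_n^{(i)} \le \sum_{l=1}^{n} r_l^{(i)}.$$
Now $r_l^{(i)} \le \sigma^{(i)}$ for all $l$, hence
$$\frac{1}{n} \sum_{i \in I - F} s_n^{(i)} \le \sum_{i \in I - F} \sigma^{(i)} \le \varepsilon.$$
Thus $s_n/n \le \bigl(\sum_{i \in F} (1/n) \sum_{l=1}^{n} r_l^{(i)}\bigr) + \varepsilon$. Next for each $i \in I$, we have $\lim_{l \to \infty} r_l^{(i)} = 0$, whence also $\lim_{n \to \infty} (1/n) \sum_{l=1}^{n} r_l^{(i)} = 0$. Consequently, by choosing a sufficiently large integer $n$, we have for each index $i$ in the finite set $F$ the inequality
$$\frac{1}{n} \sum_{l=1}^{n} r_l^{(i)} \le \frac{\varepsilon}{\mathrm{Card}(F)}.$$
Thus for sufficiently large $n$, $s_n/n \le 2\varepsilon$. Substituting in (2.2), we get for sufficiently large $n$,
$$\Bigl| d - \frac{1}{n} \sum_{k=0}^{n-1} h_k \Bigr| \le \sum_{i \in F} \sigma^{(i)} \Bigl| \delta(f^{(i)}) - \frac{1}{n} \sum_{k=0}^{n-1} f_k^{(i)} \Bigr| + 4\varepsilon.$$
Now, if $n$ is large enough, we also have
$$\sigma^{(i)} \Bigl| \delta(f^{(i)}) - \frac{1}{n} \sum_{k=0}^{n-1} f_k^{(i)} \Bigr| \le \frac{\varepsilon}{\mathrm{Card}(F)}$$
and this holds for each index $i$ in $F$. Thus
$$\Bigl| d - \frac{1}{n} \sum_{k=0}^{n-1} h_k \Bigr| \le 5\varepsilon.$$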
This proves that $d = \lim_{n \to \infty} (1/n) \sum_{k=0}^{n-1} h_k$. $\square$
Lemma 2.1 leads to the following proposition which extends Proposition 1.3.
PROPOSITION 2.2 Let $I$ be a set and for each $i \in I$, let $L_i$ and $M_i$ be subsets of $A^*$. Let $\pi$ be a Bernoulli distribution on $A^*$ and suppose that
(i) $\sum_{i \in I} \pi(M_i) < \infty$,
(ii) the products $L_i M_i$ are unambiguous and the sets $L_i M_i$ are pairwise disjoint,
(iii) each $L_i$ has a density $\delta(L_i)$.
Then $\bigcup_{i \in I} L_i M_i$ has a density, and
$$\delta\Bigl(\bigcup_{i \in I} L_i M_i\Bigr) = \sum_{i \in I} \delta(L_i)\, \pi(M_i).$$
Proof Set in Lemma 2.1,
$$f_n^{(i)} = \pi(L_i \cap A^n), \qquad g_n^{(i)} = \pi(M_i \cap A^n).$$
Then $f^{(i)} = f_{L_i}$, $g^{(i)} = f_{M_i}$; furthermore $\delta(f^{(i)}) = \delta(L_i)$, $\sigma^{(i)} = \pi(M_i)$, and in particular $\sigma = \sum_{i \in I} \pi(M_i) < \infty$. According to the lemma, we have
$$\delta\Bigl(\sum_{i \in I} f^{(i)} g^{(i)}\Bigr) = \sum_{i \in I} \delta(L_i)\, \pi(M_i).$$
Since condition (ii) of the statement implies that
$$f_{\bigcup_{i \in I} L_i M_i} = \sum_{i \in I} f^{(i)} g^{(i)},$$
the proposition follows. $\square$
Let $\varphi$ be a morphism from $A^*$ onto a monoid $M$, and let $\pi$ be a positive Bernoulli distribution on $A^*$. Provided $M$ possesses certain properties which will be described below, each subset of $A^*$ of the form $\varphi^{-1}(P)$, where $P \subset M$, has a density. The study of this phenomenon will lead us to give an explicit expression of the value of the densities of the sets $\varphi^{-1}(m)$ for $m \in M$, as a function of parameters related to $M$.
A monoid $M$ is called well founded if it has a unique minimal ideal, if moreover this ideal is the union of the minimal left ideals of $M$, and also of the minimal right ideals, and if the intersection of a minimal right ideal and of a minimal left ideal is a finite group. Any unambiguous monoid of relations of finite minimal rank is well founded by Proposition IV.4.10 and Theorem IV.4.11. It appears that the development given now does not depend on the fact that the elements of the monoid under concern are relations; therefore we present it in the more abstract frame of well-founded monoids.
Let $\varphi: A^* \to M$ be a morphism onto an arbitrary monoid, and let $m, n \in M$. We define
$$C_{m,n} = \{w \in A^* \mid m\varphi(w) = n\} = \varphi^{-1}(m^{-1}n).$$
The set $C_{n,n}$ is a right-unitary submonoid of $A^*$: for $u, uv \in C_{n,n}$ we have $n = n\varphi(uv) = n\varphi(u)\varphi(v) = n\varphi(v)$, since $n\varphi(u) = n$. Thus $C_{n,n}$ is free. Let $X_n$ be its base; it is a prefix code. Let $Z_{m,n} = C_{m,n} - C_{m,n}A^+$ be the initial part of $C_{m,n}$. Then
$$C_{m,n} = Z_{m,n} X_n^*$$
and this product is unambiguous. Note also that $C_{1,n} = \varphi^{-1}(n)$.
PROPOSITION 2.3 Let $\varphi: A^* \to M$ be a morphism onto a well-founded monoid $M$, and let $\pi$ be a positive Bernoulli distribution on $A^*$. Let $K$ be the minimal ideal of $M$.
1. For all $m, n \in M$, the set $C_{m,n} = \varphi^{-1}(m^{-1}n)$ has a density.
2. We have
$$\delta(C_{m,n}) = \begin{cases} \pi(Z_{m,n})\,\delta(X_n^*) & \text{if } n \in K \text{ and } m^{-1}n \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases}$$
3. For $m, n \in K$ such that $nM = mM$, $\pi(Z_{m,n}) = 1$ and consequently
$$\delta(C_{m,n}) = \delta(C_{n,n}) = \delta(X_n^*).$$
Proof Let $n \in M$, with $n \notin K$. Then $m^{-1}n \cap K = \emptyset$. Indeed, assume that $p \in m^{-1}n \cap K$. Then $mp = n$ and since $K$ is an ideal, $p \in K$ implies $n \in K$. Thus for an element $n \notin K$, the set $C_{m,n}$ does not meet the ideal $\varphi^{-1}(K)$. Consequently $C_{m,n}$ is thin, and by Proposition 1.1, $\delta(C_{m,n}) = 0$.
Consider now the case where $n \in K$. Let $R = nM$ be the minimal right ideal containing $n$. Consider the deterministic automaton over $A$, $\mathcal{A} = (R, n, n)$ with transition function defined by $r \cdot a = r\varphi(a)$ for $r \in R$, $a \in A$. We have $|\mathcal{A}| = X_n^*$. Since $R$ is a minimal right ideal, the automaton is complete and trim, and every state is recurrent. In particular, $X_n$ is a complete code (Proposition II.3.9).
Let us verify that the monoid $\varphi_{\mathcal{A}}(A^*)$ has finite minimal rank. For this, let $u \in A^*$ be a word such that $\varphi(u) = n$. Since $\mathcal{A}$ is deterministic, it suffices to compute $\mathrm{rank}_{\mathcal{A}}(u)$. Now
$$\mathrm{rank}(\varphi_{\mathcal{A}}(u)) = \mathrm{rank}_{\mathcal{A}}(u) = \mathrm{Card}(R \cdot u) = \mathrm{Card}(Rn) = \mathrm{Card}(nMn).$$
By assumption, $nMn = nM \cap Mn$ is a finite group. Thus $\mathrm{rank}(\varphi_{\mathcal{A}}(u))$ is finite and the monoid $\varphi_{\mathcal{A}}(A^*)$ has finite minimal rank. By Corollary IV.5.4, the code $X_n$ is complete and thin, and according to Theorem 1.6, $X_n^*$ has a positive density. Since $Z_{m,n}$ is a prefix set, we have $\pi(Z_{m,n}) \le 1$. In view of Proposition 1.3, the set $C_{m,n}$ has a density and
$$\delta(C_{m,n}) = \pi(Z_{m,n})\,\delta(X_n^*).$$
Clearly
$$C_{m,n} = \emptyset \iff m^{-1}n = \emptyset \iff Z_{m,n} = \emptyset;$$
moreover, $\pi$ being positive, $\pi(Z_{m,n}) > 0$ iff $Z_{m,n} \ne \emptyset$; this shows that $\delta(C_{m,n}) \ne 0$ if $m^{-1}n \ne \emptyset$. This proves (2) and (1).
3. Let $u \in A^*$ be a word such that $n\varphi(u) = m$ and $n\varphi(u') \ne n$ for each proper nonempty left factor $u'$ of $u$. Then
$$uZ_{m,n} \subset X_n.$$
Thus $Z_{m,n}$ is formed of right factors of words in $X_n$, and in particular $Z_{m,n}$ is thin. To show that $Z_{m,n}$ is right complete, let $w \in A^*$ and let $n' = m\varphi(w)$. Then $n' \in R = nM$, and since $R$ is right-minimal, there exists $n'' \in M$ such that $n'n'' = n$. Let $v \in A^*$ be such that $\varphi(v) = n''$. Then $m\varphi(wv) = n$, and consequently $wv \in C_{m,n}$. This shows that $Z_{m,n} = \{1\}$ or $Z_{m,n}$ is a thin right complete prefix code, thus a maximal code. Thus $\pi(Z_{m,n}) = 1$. Consequently $\delta(C_{m,n}) = \delta(X_n^*)$. $\square$
THEOREM 2.4 Let $\varphi: A^* \to M$ be a morphism onto a well-founded monoid, and let $\pi$ be a positive Bernoulli distribution on $A^*$. Let $K$ be the minimal ideal of $M$.
1. For all $n \in M$, the set $\varphi^{-1}(n)$ has a density.
2. We have
$$\delta(\varphi^{-1}(n)) = 0 \quad \text{iff} \quad n \notin K;$$
for all $\mathcal{R}$-equivalent $m, n \in K$,
$$\delta(\varphi^{-1}(n)) = \delta(\varphi^{-1}(m^{-1}n))\,\delta(\varphi^{-1}(nM)).$$
3. For all $n \in K$,
$$\delta(\varphi^{-1}(n)) = \frac{\delta(\varphi^{-1}(nM))\,\delta(\varphi^{-1}(Mn))}{\mathrm{Card}(nM \cap Mn)}.$$
Note that according to Corollary 1.4, every one-sided ideal has a positive density; thus $\delta(\varphi^{-1}(nM))$ and $\delta(\varphi^{-1}(Mn))$ exist.
Proof 1. Since $\varphi^{-1}(n) = C_{1,n}$, the claim results from Proposition 2.3.
2. By Proposition 2.3, $\delta(C_{1,n}) \ne 0$ iff $n \in K$, since $C_{1,n}$ is never empty. Let us prove the second formula. For this, let $Y = \varphi^{-1}(K) - \varphi^{-1}(K)A^+$ be the initial part of the ideal $\varphi^{-1}(K)$. We have $\varphi^{-1}(K) = YA^*$. Since the set $A^* - \varphi^{-1}(K)$ is thin, we have $\delta(\varphi^{-1}(K)) = 1$. Consequently $\pi(Y) = 1$. For each $\mathcal{R}$-class $R$ of $K$, consider $Y_R = Y \cap \varphi^{-1}(R)$. Then $\varphi^{-1}(R) = Y_R A^*$, and hence, $\delta(\varphi^{-1}(R)) = \pi(Y_R)$. Let now $R = nM$. Then
$$\varphi^{-1}(n) = \bigcup_{r \in R} (Y_R \cap \varphi^{-1}(r))\, C_{r,n}. \qquad (2.3)$$
Indeed, each word $w \in \varphi^{-1}(n)$ factorizes uniquely into $w = uv$, where $u$ is the shortest left factor of $w$ such that $\varphi(u) \in R$. Then $u \in Y_R \cap \varphi^{-1}(r)$ for some $r \in R$, and $v \in C_{r,n}$. The converse inclusion is clear. The union in (2.3) is disjoint, and the products are unambiguous because the sets $Y_R \cap \varphi^{-1}(r)$ are prefix. Each $C_{r,n}$ has a density, and moreover
$$\sum_{r \in R} \pi(Y_R \cap \varphi^{-1}(r)) = \pi(Y_R) \le 1.$$
Thus we can apply Proposition 2.2, which yields
$$\delta(\varphi^{-1}(n)) = \sum_{r \in R} \pi(Y_R \cap \varphi^{-1}(r))\,\delta(C_{r,n}).$$
According to Proposition 2.3, all the $\delta(C_{r,n})$ for $r \in R$ are equal. Thus, for any $m \in R$,
$$\delta(\varphi^{-1}(n)) = \delta(C_{m,n})\,\pi(Y_R) = \delta(\varphi^{-1}(m^{-1}n))\,\pi(Y_R) = \delta(\varphi^{-1}(m^{-1}n))\,\delta(\varphi^{-1}(R)).$$
3. Set $R = nM$, $L = Mn$, and $H = R \cap L$. Then we claim that
$$L = \bigcup_{m \in H} (m^{-1}n \cap K)$$
and furthermore that the union is disjoint. First consider an element $k \in m^{-1}n \cap K$ for some $m \in H$. Then $mk = n$. Thus $n \in Mk$, and since $n$ is in the minimal ideal, $Mn = Mk$. Therefore, $k \in Mn = L$. This proves the first inclusion. For the converse, let $k \in L = Mn$. The right multiplication by $k$,
$$m \mapsto mk,$$
is a bijection which exchanges the $\mathcal{L}$-classes in $K$ and preserves $\mathcal{R}$-classes (Proposition 0.5.2). In particular, this function maps the $\mathcal{L}$-class $L$ onto $Lk = L$, thus onto itself. It follows that there exists $m \in L$ such that $mk = n$. This element is $\mathcal{R}$-equivalent with $n$. Consequently $m \in H$ and $k \in m^{-1}n$ for some $m \in H$. Since the function $m \mapsto mk$ is a bijection, the sets $m^{-1}n$ are pairwise disjoint. This proves the formula. For all $m, n \in K$,
$$\delta(\varphi^{-1}(m^{-1}n \cap K)) = \delta(\varphi^{-1}(m^{-1}n))$$
since the set $\varphi^{-1}(m^{-1}n \cap (M - K))$ is thin and therefore has density 0. The set $H$ being finite, we have
$$\delta(\varphi^{-1}(L)) = \sum_{m \in H} \delta(\varphi^{-1}(m^{-1}n)).$$
Using the expression for $\delta(\varphi^{-1}(n))$ proved above, we obtain
$$\delta(\varphi^{-1}(L)) = \sum_{m \in H} \frac{\delta(\varphi^{-1}(n))}{\delta(\varphi^{-1}(R))} = \mathrm{Card}(H)\, \frac{\delta(\varphi^{-1}(n))}{\delta(\varphi^{-1}(R))}.$$
This proves the last claim of the theorem. $\square$
Let $E$ be a denumerable set. A probability measure over $E$ is a function
$$p: \mathfrak{P}(E) \to [0,1]$$
that satisfies the following two conditions: for any subset $F$ of $E$,
$$p(F) = \sum_{e \in F} p(e) \qquad (2.4)$$
and $p(E) = 1$. The support of a probability measure $p$ is the set
$$S = \{e \in E \mid p(e) \ne 0\}.$$
A probability measure over $E$ is completely determined by a function
$$p: E \to [0,1]$$
such that
$$\sum_{e \in E} p(e) = 1.$$
Indeed, in this case, any family $(p(e))_{e \in F}$, where $F$ is a subset of $E$, is summable, and $p$ is extended to $\mathfrak{P}(E)$ by (2.4). Note once more that the density $\delta$ defined by a Bernoulli distribution on $A^*$ is not a probability measure over $A^*$ since it does not satisfy (2.4).
We consider, for probability measures over $E$, the following notion of convergence: if $(p_n)_{n \ge 0}$ and $p$ are probability measures over $E$, then we set
$$p = \lim_{n \to \infty} p_n$$
iff for all elements $e \in E$, we have
$$p(e) = \lim_{n \to \infty} p_n(e).$$
The following proposition is useful.
PROPOSITION 2.5 Let $(p_n)_{n \ge 0}$ and $p$ be probability measures over $E$, such that $p = \lim_{n \to \infty} p_n$. Then for all subsets $F$ of $E$,
$$p(F) = \lim_{n \to \infty} p_n(F).$$
Proof The conclusion clearly holds when $F$ is finite. In the general case, set
$$\sigma = \liminf p_n(F), \qquad \tau = \limsup p_n(F),$$
and let $\bar{F} = E - F$. Of course, $\sigma \le \tau$ and
$$1 - \tau = \liminf p_n(\bar{F}).$$
Let $F'$ be a finite subset of $F$. Then $p_n(F') \le p_n(F)$ for all $n$, and taking the inferior limit, $p(F') \le \sigma$. It follows that
$$p(F) = \sup_{F' \subset F,\ F' \text{ finite}} p(F') \le \sigma.$$
Similarly, $p(\bar{F}) \le 1 - \tau$. Since $p(F) + p(\bar{F}) = p(E) = 1$, we obtain $1 \le \sigma + (1 - \tau)$, whence $\sigma \ge \tau$. Thus $\sigma = \tau$. Since $p(F) \le \sigma$ and $p(\bar{F}) \le 1 - \sigma$, one has both $p(F) \le \sigma$ and $p(F) \ge \sigma$, showing that $p(F) = \sigma$. $\square$
THEOREM 2.6 Let $\varphi: A^* \to M$ be a morphism onto a well-founded monoid, and let $\pi$ be a positive Bernoulli distribution on $A^*$. For any subset $F$ of $M$, the set $\varphi^{-1}(F) \subset A^*$ has a density and
$$\delta(\varphi^{-1}(F)) = \sum_{m \in F} \delta(\varphi^{-1}(m)).$$
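Proof It is convenient to introduce the shorthand $\nu(F) = \delta(\varphi^{-1}(F))$. Let $K$ be the minimal ideal of $M$, let $\Gamma$ be the set of its $\mathcal{R}$-classes and $\Lambda$ the set of its $\mathcal{L}$-classes. We have verified, during the proof of Theorem 2.4, that
$$\nu(K) = \pi(Y) = 1, \qquad \nu(R) = \pi(Y_R),$$
where $Y$ (resp., $Y_R$) is the initial part of $\varphi^{-1}(K)$ (resp., of $\varphi^{-1}(R)$, with $R \in \Gamma$). Since $K$ is the disjoint union of its $\mathcal{R}$-classes, we have
$$Y = \bigcup_{R \in \Gamma} Y_R,$$
hence
$$\nu(K) = \sum_{R \in \Gamma} \nu(R) = \sum_{L \in \Lambda} \nu(L) = 1,$$
where the intermediate assertion follows by symmetry. Now consider a fixed $\mathcal{R}$-class $R \in \Gamma$. Then by Theorem 2.4,
$$\sum_{n \in R} \nu(n) = \sum_{L \in \Lambda} \sum_{n \in R \cap L} \nu(n) = \sum_{L \in \Lambda} \nu(R)\nu(L) = \nu(R) \sum_{L \in \Lambda} \nu(L) = \nu(R)$$
and also
$$\sum_{n \in K} \nu(n) = \sum_{R \in \Gamma} \sum_{n \in R} \nu(n) = \sum_{R \in \Gamma} \nu(R) = 1.$$
Since $\nu(n) = 0$ for $n \notin K$, it follows that
$$\sum_{n \in M} \delta(\varphi^{-1}(n)) = 1.$$
Consider the family $\mathcal{F}$ of subsets of $A^*$,
$$\mathcal{F} = \{\varphi^{-1}(n) \mid n \in M\}.$$
The preceding formula gives
$$\sum_{T \in \mathcal{F}} \delta(T) = 1.$$
This shows that there exists a probability measure $\hat{\delta}$ over $\mathcal{F}$, defined for a subset $\mathcal{X} \subset \mathcal{F}$ by
$$\hat{\delta}(\mathcal{X}) = \sum_{T \in \mathcal{X}} \delta(T).$$
For any positive integer $n$, define $p_n: \mathcal{F} \to [0,1]$ by
$$p_n(T) = \frac{1}{n}\, \pi(T \cap A^{(n)}),$$
where $A^{(n)} = \{w \in A^* \mid |w| < n\}$. The morphism $\varphi$ being surjective, the sets $T \in \mathcal{F}$ form a partition of $A^*$. Thus
$$\sum_{T \in \mathcal{F}} p_n(T) = \frac{1}{n}\, \pi(A^{(n)}) = 1,$$
showing that $p_n$ ($n \ge 1$) is a probability measure over $\mathcal{F}$. Moreover, we have
$$\hat{\delta}(T) = \delta(T) = \lim_{n \to \infty} p_n(T)$$
by the definition of a density. Thus $\hat{\delta} = \lim p_n$. Proposition 2.5 shows that for any subset $\mathcal{X}$ of $\mathcal{F}$,
$$\hat{\delta}(\mathcal{X}) = \lim_{n \to \infty} p_n(\mathcal{X}).$$
Now $\mathcal{X} = \{\varphi^{-1}(m) \mid m \in P\}$ for some subset $P$ of $M$. Thus
$$\hat{\delta}(\mathcal{X}) = \sum_{m \in P} \delta(\varphi^{-1}(m)),$$
but also
$$p_n(\mathcal{X}) = \sum_{m \in P} p_n(\varphi^{-1}(m)) = \frac{1}{n} \sum_{m \in P} \pi(\varphi^{-1}(m) \cap A^{(n)}).$$
Consequently $p_n(\mathcal{X}) = (1/n)\,\pi(\varphi^{-1}(P) \cap A^{(n)})$, and thus
$$\lim_{n \to \infty} \frac{1}{n}\, \pi(\varphi^{-1}(P) \cap A^{(n)}) = \sum_{m \in P} \delta(\varphi^{-1}(m)).$$
The left side of the formula is $\delta(\varphi^{-1}(P))$ and the right side is $\hat{\delta}(\mathcal{X})$. Thus $\delta(\varphi^{-1}(P))$ exists and has the announced value. $\square$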
PROPOSITION 2.7 Let $\varphi: A^* \to M$ be a morphism onto a well-founded monoid and let $\pi$ be a positive Bernoulli distribution on $A^*$. The function $\nu: M \to [0,1]$ defined by $\nu(m) = \delta(\varphi^{-1}(m))$ is a probability measure over $M$. Let $K$ be the minimal ideal of $M$. Then the following formulas hold:
$$\nu(m) \ne 0 \quad \text{iff} \quad m \in K,$$
$$\nu(m) = \nu(n^{-1}m)\,\nu(mM) \quad \text{if } m, n \in K \text{ and } n \mathrel{\mathcal{R}} m,$$
$$\nu(m) = \frac{\nu(mM)\,\nu(Mm)}{\mathrm{Card}(mM \cap Mm)} \quad \text{if } m \in K, \qquad (2.6)$$
$$\nu(M') = \nu(M' \cap K) \quad \text{for } M' \subset M.$$
For each $\mathcal{H}$-class $H \subset K$, and $h \in H$,
$$\nu(h) = \frac{\nu(H)}{\mathrm{Card}(H)}. \qquad (2.7)$$
Proof All these formulas, with the exception of (2.7), are immediate consequences of the relations given in Theorem 2.4. For (2.7) observe that the value of $\nu$ is the same for all $h \in H$ by formula (2.6). Next $\nu(H) = \sum_{h \in H} \nu(h)$. This proves (2.7). $\square$
EXAMPLE 2.1 Let $\varphi: A^* \to G$ be a morphism onto a finite group. Let $\pi$ be a positive Bernoulli distribution. For $g \in G$,
$$\delta(\varphi^{-1}(g)) = \frac{1}{\mathrm{Card}(G)}$$
in view of formula (2.7) and observing that $H = K = G$. This gives another method for computing the density in Example 1.1. To that example corresponds a morphism $\varphi: A^* \to \mathbb{Z}/2\mathbb{Z}$ onto the additive group $\mathbb{Z}/2\mathbb{Z}$ with $\varphi(a) = 1$ for any letter $a$ in $A$.
EXAMPLE 2.2 Let $\varphi: A^* \to M$ be the morphism from $A^*$ onto the unambiguous monoid of relations $M$ over $Q = \{1, 2, 3\}$ defined by $\alpha = \varphi(a)$, $\beta = \varphi(b)$, with
[display of the matrices $\alpha$ and $\beta$ not recoverable]
This monoid has already been considered in Example IV.5.3. Its minimal ideal $J$ is composed of elements of rank 1 and is represented in Fig. 6.1. Let $\pi$ be the Bernoulli distribution defined by $\pi(a) = p$, $\pi(b) = q$, with $p + q = 1$, $p, q > 0$. Let us compute the probability measure $\nu = \delta\varphi^{-1}$ over $M$.
[Fig. 6.1 The minimal ideal of the monoid $M$; the row vectors 001 and 110 label the $\mathcal{L}$-classes $L_1$ and $L_2$.]
With the notations of Fig. 6.1, we have the equalities
$$L_1\alpha = L_2, \qquad L_1\beta = L_2, \qquad L_2\alpha = L_2, \qquad L_2\beta = L_1. \qquad (2.9)$$
Set $X_1 = \varphi^{-1}(L_1)$, $X_2 = \varphi^{-1}(L_2)$. By (2.9),
$$X_1a^{-1} \cap \varphi^{-1}(J) = \emptyset, \qquad X_1b^{-1} \cap \varphi^{-1}(J) = X_2,$$
$$X_2a^{-1} \cap \varphi^{-1}(J) = X_1 \cup X_2, \qquad X_2b^{-1} \cap \varphi^{-1}(J) = X_1. \qquad (2.10)$$
Indeed consider, for instance, the last equation: if $w \in X_1$, then $\varphi(w) \in L_1$, hence $\varphi(wb) \in L_2$ by the fact that $L_1\beta = L_2$. Thus $wb \in X_2$, and $w \in X_2b^{-1} \cap \varphi^{-1}(J)$. Conversely, let $w \in X_2b^{-1} \cap \varphi^{-1}(J)$. Since $w \in \varphi^{-1}(J)$, $w \in X_1 \cup X_2$. But if $w \in X_2$, then $\varphi(wb) \in L_1$, showing that $wb \in X_1$, whence $w \notin X_2b^{-1}$. Thus $w \in X_1$. In view of (2.10),
$$X_1a^{-1} = T_1, \qquad X_1b^{-1} = X_2 \cup T_1',$$
$$X_2a^{-1} = X_1 \cup X_2 \cup T_2, \qquad X_2b^{-1} = X_1 \cup T_2',$$
where $T_1, T_1', T_2, T_2'$ are disjoint from $\varphi^{-1}(J)$. Multiplication by $a$ and $b$ on the right gives
$$X_1 = X_2b \cup (T_1a \cup T_1'b),$$
$$X_2 = X_1a \cup X_2a \cup X_1b \cup (T_2a \cup T_2'b).$$
Since $T_1$ is thin, $\delta(T_1a) = \delta(T_1)\pi(a) = 0$, and similarly for the other $T$'s. Thus
$$\delta(X_2) = \delta(X_1) + \delta(X_2)p,$$
$$\delta(X_1) = \delta(X_2)q,$$
which together with $\delta(X_1) + \delta(X_2) = 1$ gives
$$\delta(X_1) = \frac{q}{1+q}, \qquad \delta(X_2) = \frac{1}{1+q}.$$
Thus
$$\nu(L_1) = \frac{q}{1+q}, \qquad \nu(L_2) = \frac{1}{1+q}.$$
P v(R,) = 1+p'
1
V(R2)= 1+p'
3.
309
CONTEXTS
In particular, since R2 n L, = {pa}, we obtain 1 v(pa) = (1 + p)(l + 4)3. CONTEXTS
Let $X \subset A^+$ be a thin complete code. We have seen that the degree $d(X)$ of $X$ is the integer which is the minimal rank of the monoid of relations associated with any unambiguous trim automaton recognizing $X^*$. It is also the degree of the permutation group $G(X)$, and it is also the minimum of the number of disjoint interpretations in $X$ (see Section IV.6). In this section, we shall see that $d(X)$ is related in a quite remarkable manner to the density $\delta(X^*)$.
The set $G_X$ of left completable words is
$$G_X = \{w \in A^* \mid A^*w \cap X^* \ne \emptyset\} = (A^*)^{-1}X^* = A^-X^*;$$
symmetrically, the set $D_X$ of right-completable words is
$$D_X = \{w \in A^* \mid wA^* \cap X^* \ne \emptyset\} = X^*(A^*)^{-1} = X^*A^-.$$
THEOREM 3.1 Let $X \subset A^+$ be a thin complete code, and let $\pi$ be a positive Bernoulli distribution on $A^*$. Then
$$\delta(X^*) = \frac{1}{d(X)}\,\delta(G_X)\,\delta(D_X). \qquad (3.1)$$
Proof Let $\mathcal{A} = (Q, 1, 1)$ be an unambiguous trim automaton recognizing $X^*$, let $\varphi$ be the associated morphism and $M = \varphi(A^*)$. In view of Corollary IV.5.4, the monoid $M$ is well founded. Set $\nu = \delta\varphi^{-1}$. By Proposition 2.7, $\nu$ is a probability measure over $M$, and the values of $\nu$ may be computed by the formulas of this proposition. Let $K$ be the minimal ideal of $M$. Since $\nu$ vanishes outside of $K$, we have
$$\delta(X^*) = \nu(\varphi(X^*) \cap K).$$
Let $\hat{R}$ be the union of the $\mathcal{R}$-classes in $K$ meeting $\varphi(X^*)$, and similarly let $\hat{L}$ be the union of those $\mathcal{L}$-classes in $K$ that meet $\varphi(X^*)$. Then
$$\nu(\varphi(X^*) \cap K) = \nu(\varphi(X^*) \cap \hat{R} \cap \hat{L}) = \sum_{H} \nu(\varphi(X^*) \cap H),$$
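where the sum is over all $\mathcal{H}$-classes $H$ contained in $\hat{R} \cap \hat{L}$. For such an $\mathcal{H}$-class $H$, we have, for each $h \in H$,
$$\nu(h) = \frac{\nu(R)\,\nu(L)}{\mathrm{Card}(H)},$$
where $R$ and $L$ are the $\mathcal{R}$-class and $\mathcal{L}$-class containing $H$. Thus
$$\nu(\varphi(X^*) \cap H) = \frac{\mathrm{Card}(\varphi(X^*) \cap H)}{\mathrm{Card}(H)}\, \nu(R)\,\nu(L).$$
Now observe that for any $\mathcal{H}$-class $H \subset \hat{R} \cap \hat{L}$,
$$\frac{\mathrm{Card}(\varphi(X^*) \cap H)}{\mathrm{Card}(H)} = \frac{1}{d(X)}.$$
Thus the formula becomes
$$\delta(X^*) = \sum_{H} \frac{1}{d(X)}\, \nu(R)\,\nu(L) = \frac{1}{d(X)}\, \nu(\hat{R})\,\nu(\hat{L}).$$
Now
$$\varphi^{-1}(\hat{R}) = D_X \cap \varphi^{-1}(K). \qquad (3.2)$$
Indeed, let $w \in D_X \cap \varphi^{-1}(K)$. Then $wu \in X^*$ for some word $u$. Consequently, $\varphi(wu) = \varphi(w)\varphi(u) \in \varphi(X^*) \cap K$, showing that the $\mathcal{R}$-class of $\varphi(w)$, which is the same as the $\mathcal{R}$-class of $\varphi(wu)$, meets $\varphi(X^*)$. Thus $\varphi(w) \in \hat{R}$. Conversely, let $w \in \varphi^{-1}(\hat{R})$. Then $\varphi(w) \in \hat{R}$ and there is some $m \in M$ such that $\varphi(w)m \in \varphi(X^*) \cap K$. Thus $w\varphi^{-1}(m) \cap X^* \ne \emptyset$ and we derive that $w \in D_X$. It follows from (3.2) that
$$\nu(\hat{R}) = \delta(\varphi^{-1}(\hat{R})) = \delta(D_X \cap \varphi^{-1}(K)).$$
Since $A^* - \varphi^{-1}(K)$ is thin, we have $\delta(D_X) = \delta(D_X \cap \varphi^{-1}(K))$. Thus $\delta(D_X) = \nu(\hat{R})$ and similarly $\nu(\hat{L}) = \delta(G_X)$. This concludes the proof. $\square$
The following corollary is a consequence of Theorem 1.6.
COROLLARY 3.2 Let $X \subset A^+$ be a thin complete code, and let $\pi$ be a positive Bernoulli distribution on $A^*$. Then
$$\lambda(X) = \frac{d(X)}{\delta(G_X)\,\delta(D_X)}. \qquad (3.3)$$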
We observe that for a thin maximal biprefix code $X \subset A^+$, we have $G_X = D_X = A^*$. Thus in this case, (3.3) becomes $\lambda(X) = d(X)$. This gives another proof of Corollary III.3.9.
EXAMPLE 3.1 Let $A = \{a, b\}$ and consider our old friend
$$X = \{aa, ba, baa, bb, bba\},$$
which is a finite complete code. In Fig. 6.2 an automaton $\mathcal{A} = (Q, 1, 1)$ recognizing $X^*$ is represented.
[Fig. 6.2 An unambiguous trim automaton recognizing $X^*$.]
We have $\varphi_{\mathcal{A}}(a) = \alpha$, $\varphi_{\mathcal{A}}(b) = \beta$, where $\alpha$, $\beta$ are the relations considered in Example 2.2. To obtain the set $D_X$, we can compute the deterministic automaton having as a set of states the set of row vectors $\{0,1\}^Q$. The transition function is determined by multiplication on the right, with the elements in $\varphi_{\mathcal{A}}(A^*)$ considered as matrices. The part of this automaton that is accessible from the vector 100 is drawn in Fig. 6.3. (Note that this automaton can also be considered as the deterministic automaton associated with $\mathcal{A}$ by the subset construction, as given in Proposition 0.4.1, with sets of states represented by vectors.) The elements of $D_X$ are the words which are labels of paths from 100 to a state distinct from 000. We obtain
$$D_X = a^* \cup (a^2)^*bA^*.$$
A similar computation on column vectors gives
$$G_X = b^* \cup A^*a(b^2)^*.$$
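The computation on row vectors described above can be sketched as follows. The two $0/1$ matrices used here are an assumption: they were chosen so that the three-state automaton $(Q, 1, 1)$ they define recognizes $X^*$ for $X = \{aa, ba, baa, bb, bba\}$, and they should not be taken as a verbatim copy of the relations $\alpha$, $\beta$ of Example 2.2. All names in the code are likewise illustrative.

```python
import re
from itertools import product

# ASSUMPTION: hypothetical transition relations, chosen so that the automaton
# with initial and final state 1 recognizes X* for X = {aa, ba, baa, bb, bba}.
ALPHA = ((0, 1, 0), (1, 0, 0), (1, 1, 0))
BETA = ((0, 0, 1), (0, 0, 0), (1, 1, 0))
RELATIONS = {"a": ALPHA, "b": BETA}
START = (1, 0, 0)

def step(vector, relation):
    """Boolean product of a row vector with a relation (one transition)."""
    size = len(vector)
    return tuple(
        int(any(vector[i] and relation[i][j] for i in range(size)))
        for j in range(size)
    )

def accessible_part(start):
    """States and transitions of the deterministic automaton accessible from start."""
    states, todo, edges = {start}, [start], {}
    while todo:
        vector = todo.pop()
        for letter, relation in RELATIONS.items():
            target = step(vector, relation)
            edges[(vector, letter)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    return states, edges

def right_completable(word):
    """True iff the path labeled by word from START ends in a nonzero vector."""
    vector = START
    for letter in word:
        vector = step(vector, RELATIONS[letter])
    return any(vector)

STATES, EDGES = accessible_part(START)

# Compare with the expression a* ∪ (a^2)* b A* on all words of length < 8.
D_X = re.compile(r"a*|(aa)*b[ab]*")
WORDS = ["".join(w) for n in range(8) for w in product("ab", repeat=n)]
MISMATCHES = [w for w in WORDS if right_completable(w) != bool(D_X.fullmatch(w))]
```

Under this assumption the accessible part has five states, among them the sink 000, and the words that avoid the sink are exactly those of $a^* \cup (a^2)^*bA^*$ on the lengths tested. Replacing row vectors by column vectors and reading words from right to left gives the analogous computation for $G_X$.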
[Fig. 6.3 A deterministic automaton for $D_X$.]
Let $\pi$ be the positive Bernoulli distribution given by $\pi(a) = p$, $\pi(b) = q$, with $p, q > 0$, $p + q = 1$. Then
$$\delta(D_X) = \delta(a^*) + \delta((a^2)^*bA^*) = \delta((a^2)^*bA^*)$$
since $\delta(a^*) = 0$. Since $(a^2)^*b$ is a prefix code, we have
$$\delta(D_X) = \pi((a^2)^*b).$$
Thus
$$\delta(D_X) = \frac{q}{1 - p^2} = \frac{1}{1+p}.$$
In a similar fashion, we obtain
$$\delta(G_X) = \frac{1}{1+q}.$$
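These two densities, and the resulting value of the average length, can be verified with exact arithmetic. The following sketch relies only on the expressions $D_X = a^* \cup (a^2)^*bA^*$ and $G_X = b^* \cup A^*a(b^2)^*$ and on $d(X) = 1$; the function names and the value $p = 1/3$ are assumptions made for the illustration.

```python
from fractions import Fraction

X = ["aa", "ba", "baa", "bb", "bba"]

def average_length(p):
    """Direct computation of the average length of X for pi(a) = p, pi(b) = 1 - p."""
    weight = {"a": p, "b": 1 - p}
    total = Fraction(0)
    for x in X:
        prob = Fraction(1)
        for letter in x:
            prob *= weight[letter]
        total += len(x) * prob
    return total

def density_D(p):
    """pi((a^2)* b) = q * (1 + p^2 + p^4 + ...) = q / (1 - p^2)."""
    return (1 - p) / (1 - p * p)

def density_G(p):
    """pi(a (b^2)*) = p / (1 - q^2), by the symmetric computation."""
    q = 1 - p
    return p / (1 - q * q)

def mass_D(n, p):
    """pi(D_X ∩ A^n): the word a^n, plus the words a^(2k) b v with 2k + 1 <= n."""
    q = 1 - p
    return p ** n + q * sum(p ** (2 * k) for k in range((n + 1) // 2))

p = Fraction(1, 3)
degree = 1                                                    # d(X) for this code
length_from_densities = degree / (density_G(p) * density_D(p))
length_direct = average_length(p)
partial_mean_D = sum(mass_D(n, float(p)) for n in range(400)) / 400
```

Both `length_from_densities` and `length_direct` equal $(1+p)(1+q) = 2 + pq$, and `partial_mean_D` is close to $1/(1+p)$, in agreement with the definition of the density as a limit in mean.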
On the other hand, $d(X) = 1$ since the monoid $\varphi_{\mathcal{A}}(A^*)$ has minimal rank 1. Thus by formula (3.3),
$$\lambda(X) = (1+p)(1+q).$$
This can also be verified by a direct computation of the average length of $X$. The computations made in this example are of course similar to those of Example 2.2.
Let $X \subset A^+$ be a code. A context of $w \in A^*$ is a pair $(u, v)$ of words such that the following two conditions hold:
$$uwv = x_1x_2 \cdots x_n \qquad (x_i \in X)$$
and
$$|u| < |x_1|, \qquad |v| < |x_n|.$$
< IxnI*
The set of contexts of a word w E A* (with respect to X)is denoted by C(w). The set C(l) is C(1) = {(u,u) E A + x A + I uu E X} u ((1, l)}. The context of a word can be interpreted in terms of paths in the flower automaton d t ( X ) = (P, (1, l), (1,l)):there exists a bijection between the set C(w)and the set P(w) of paths labeled w in the flower automaton. Indeed let c:
(u, u') A (u', u)
be a path labeled w in "QIg(X).Then uwu E X*.Thus either uwu = 1, or uwu = X l X 2 * * * x,
with x i E X and n > 0. In that case, IuI < lxll and 101 < Ix,I. Thus in both cases, (u,u) is a context. Consider another path c: (u, ii') (3, u).
3.
3’3
CONTEXTS
Then both paths
(1,l) A (u,u’) JL (u’,u) L(1, l), (1,l)
u-. (u,E’) -%
(F’,u) L(1,l)
are labeled uwu. By unambiguity, c = C: Conversely, if (u, u) is a context of w and uwu = x1***x,,,define two words u’, u’ by u’ = u-1 x1
if u # 1, u ’ = 1 otherwise,
u’ = x,,u-l
if
u#
1, v’
=
1 otherwise.
Then (u, u’) and (u’, u) are states in d g ( X ) ,and there is a path (u, u’) A (u’, u). The following result shows a strong relationship between all sets of contexts. THEOREM 3.3 Let X c A + be a thin complete code, and let a be a positiue Bernoulli distribution on A*. For all w E A*, A(X) =
1
(u.0) E C ( W )
n(uu).
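Proof Let $\mathcal{A}_D^*(X) = (P, (1,1), (1,1))$ be the flower automaton of $X$, let $M = \varphi_D(A^*)$ and set $\nu = \delta\varphi_D^{-1}$. Let $w \in A^*$, $m = \varphi_D(w)$, and let
$$T(m) = \{(r, l) \in M \times M \mid rml \in \varphi_D(X^*)\}, \qquad t(m) = \sum_{(r,l) \in T(m)} \nu(r)\,\nu(l).$$
We compute $t(m)$ in two ways. First define, for each state $p \in P$,
$$R_p = \{r \in M \mid r_{1,p} = 1\}, \qquad L_p = \{l \in M \mid l_{p,1} = 1\}.$$
Then $rml \in \varphi_D(X^*)$ iff there exist $p, q \in P$ such that $r_{1,p} = 1$, $m_{p,q} = 1$, $l_{q,1} = 1$, and by unambiguity such a pair $(p, q)$ is unique. Consequently, $T(m)$ is the disjoint union
$$T(m) = \bigcup_{m_{p,q} = 1} R_p \times L_q.$$
Thus
$$t(m) = \sum_{m_{p,q} = 1} \nu(R_p)\,\nu(L_q).$$
Set $p = (u, u')$, $q = (v', v)$. Then $m_{p,q} = 1$ iff there is a path $c: p \to q$ labeled $w$. According to the bijection defined above, this holds iff $(u, v) \in C(w)$. Next,
$$\varphi_D^{-1}(R_p) = X^*u, \qquad \varphi_D^{-1}(L_q) = vX^*,$$
hence
$$\nu(R_p) = \delta(X^*u) = \delta(X^*)\,\pi(u), \qquad \nu(L_q) = \delta(vX^*) = \pi(v)\,\delta(X^*).$$
Consequently
$$t(m) = \sum_{(u,v) \in C(w)} \delta(X^*)\,\pi(u)\,\pi(v)\,\delta(X^*) = [\delta(X^*)]^2 \sum_{(u,v) \in C(w)} \pi(uv).$$
This is the first expression for $t(m)$. Now we compute $t(m)$ in the monoid $M$. Let $K$ be the minimal ideal of $M$. Since $\nu$ vanishes for elements not in $K$, we have
$$t(m) = \sum_{\substack{(r,l) \in K \times K \\ rml \in \varphi_D(X^*)}} \nu(r)\,\nu(l).$$
Let $N = \varphi_D(X^*) \cap K$. Then
$$t(m) = \sum_{r \in K} \nu(r) \sum_{n \in N} \nu((rm)^{-1}n).$$
Let $r \in K$. Since $(rm)^{-1}n \ne \emptyset$ iff $n \in rmM$, and since $r \mathrel{\mathcal{R}} rm$, we have $(rm)^{-1}n \ne \emptyset$ iff $r \mathrel{\mathcal{R}} n$, and in that case
$$\nu((rm)^{-1}n) = \frac{\nu(n)}{\nu(nM)}$$
by Proposition 2.7. Further
$$t(m) = \sum_{n \in N} \frac{\nu(n)}{\nu(nM)} \sum_{r \in nM} \nu(r) = \sum_{n \in N} \nu(n) = \nu(N) = \delta(X^*).$$
Comparing both expressions for $t(m)$, we get
$$\sum_{(u,v) \in C(w)} \pi(uv) = \frac{1}{\delta(X^*)} = \lambda(X). \qquad \square$$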
There is an interesting interpretation of the preceding result. With the notations of the theorem, set for any word $w \in A^*$,
$$\gamma(w) = \frac{1}{\lambda(X)} \sum_{(u,v) \in C(w)} \pi(uwv).$$
Call $\gamma(w)$ the contextual probability of $w$. Then the theorem claims that if $\pi$ is a Bernoulli distribution we have identically
$$\gamma(w) = \pi(w).$$
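For a finite code the sets of contexts can be enumerated by brute force, which gives a direct check of this identity. The sketch below takes the code $X = \{aa, ba, baa, bb, bba\}$ of Example 3.1; the weights $1/3$, $2/3$, the function names, and the definition of the contextual probability as the sum of the $\pi(uwv)$ over the contexts $(u, v)$ of $w$, divided by the average length of $X$, are assumptions stated here for the illustration.

```python
from fractions import Fraction
from itertools import product

X = ["aa", "ba", "baa", "bb", "bba"]
PI = {"a": Fraction(1, 3), "b": Fraction(2, 3)}  # illustrative weights (assumption)
MAX_LEN = max(len(x) for x in X)

def prob(word):
    """Bernoulli measure of a word."""
    result = Fraction(1)
    for letter in word:
        result *= PI[letter]
    return result

def factorize(word):
    """Factorization of word into words of X, or None if word is not in X*.

    When it exists it is unique, because X is a code.
    """
    if word == "":
        return []
    for x in X:
        if word.startswith(x):
            rest = factorize(word[len(x):])
            if rest is not None:
                return [x] + rest
    return None

def contexts(w):
    """All pairs (u, v) with uwv = x_1 ... x_n, |u| < |x_1| and |v| < |x_n|."""
    # |u| < |x_1| forces |u| < MAX_LEN, and similarly for v.
    short = ["".join(t) for k in range(MAX_LEN) for t in product("ab", repeat=k)]
    found = []
    for u in short:
        for v in short:
            xs = factorize(u + w + v)
            if xs is None:
                continue
            # An empty factorization occurs only for u = w = v = 1.
            if not xs or (len(u) < len(xs[0]) and len(v) < len(xs[-1])):
                found.append((u, v))
    return found

AVG_LEN = sum(len(x) * prob(x) for x in X)

def context_sum(w):
    """Sum of pi(uv) over the contexts (u, v) of w."""
    return sum(prob(u + v) for u, v in contexts(w))

def contextual_probability(w):
    """Sum of pi(uwv) over the contexts of w, divided by the average length."""
    return sum(prob(u + w + v) for u, v in contexts(w)) / AVG_LEN
```

For instance, `contexts("a")` consists of the six pairs obtained from the occurrences of the letter $a$ in the words of $X$, and `context_sum(w)` returns the same value `AVG_LEN` for each of the words tried.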
The fact that $\gamma$ and $\pi$ coincide is particular to Bernoulli distributions (see Exercise 3.3).
We now study one-sided contexts. Let $X \subset A^+$ be a code, and let $w \in A^*$. The set of right contexts of $w$ is
$$C_r(w) = \{v \in A^* \mid (1, v) \in C(w)\}.$$
Thus $v \in C_r(w)$ iff $wv = x_1x_2 \cdots x_n$ ($x_i \in X$) with $|v| < |x_n|$. Symmetrically, the set of left contexts of $w$ is
$$C_l(w) = \{u \in A^* \mid (u, 1) \in C(w)\}.$$
We observe that
$$C_r(w)X^* = w^{-1}X^*. \qquad (3.4)$$
The product $C_r(w)X^*$ is unambiguous, because $X$ is a code.
PROPOSITION 3.4 Let X c A' be a thin complete code and let d = (Q, 1,l) be an unambiguous trim automaton recognizing X*. Let K be the minimal ideal of the monoid M = qd(A*). Let n be a positive Bernoulli distribution. For all w E cp,'(K) n D,, we have
(3.6) n(Cdw))G(G,) = 1. Proof Set cp = cpd, v = &-',and let k (resp., 2) be the union of the Yeclasses (resp., 9-classes) in K that meet cp(X*). We have seen, in the proof of Theorem 3.1, that G(D,) = v(R) and 6(Gx)= ~ ( 2 According ). to formula (3.4),
6(w-'x*) = n(C,(w)>d(X*).
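Set $n = \varphi(w)$ and $T = \{k \in K \mid nk \in \varphi(X^*)\}$. Then $T \subset \hat{L}$, since for $k \in T$ we have $nk \in Mk \cap \varphi(X^*)$, showing that the left ideal $Mk$ meets $\varphi(X^*)$. Let $H$ be an $\mathcal{H}$-class contained in $\hat{L}$. The function $h \mapsto nh$ is a bijection from $H$ onto the $\mathcal{H}$-class $nH$. Since $n \in \hat{R}$, we have $nH \subset \hat{R}$; since $H \subset \hat{L}$, we have $nH \subset \hat{L}$. Thus $nH \subset \hat{R} \cap \hat{L}$. This implies that $nH \cap \varphi(X^*) \neq \emptyset$. Indeed, let $R$ and $L$ denote the $\mathcal{R}$-class and $\mathcal{L}$-class containing $nH$, and take $m \in R \cap \varphi(X^*)$, $m' \in L \cap \varphi(X^*)$. Then $mm' \in R \cap L \cap \varphi(X^*) = nH \cap \varphi(X^*)$. Setting $d = d(X)$, it follows that
$$\operatorname{Card}(nH \cap \varphi(X^*)) = \frac{1}{d}\operatorname{Card}(nH).$$
Since $H \cap T = \{k \in H \mid nk \in \varphi(X^*)\}$ is in bijection with $nH \cap \varphi(X^*)$, we have
$$\operatorname{Card}(H \cap T) = \operatorname{Card}(nH \cap \varphi(X^*)) = \frac{1}{d}\operatorname{Card}(H).$$
Therefore
$$\nu(T) = \frac{1}{d}\,\nu(\hat{L}).$$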
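We observe that $\varphi^{-1}(T) = w^{-1}X^* \cap \varphi^{-1}(K)$. Consequently $\delta(w^{-1}X^*) = \nu(T) = (1/d)\,\nu(\hat{L})$. Since also $\nu(\hat{L}) = \delta(G_X)$, we obtain $\pi(C_r(w))\,\delta(X^*) = (1/d)\,\delta(G_X)$, that is,
$$\pi(C_r(w))\,\delta(D_X) = \frac{1}{d}\,\frac{\delta(G_X)\,\delta(D_X)}{\delta(X^*)}. \tag{3.7}$$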
By Theorem 3.1, the last expression is equal to 1. $\square$
PROPOSITION 3.5 Let $X \subset A^+$ be a thin complete code. Let $\pi$ be a positive Bernoulli distribution on $A^*$. For all $w \in A^*$, the following conditions are equivalent.
(i) The set $C_r(w)$ is maximal among the sets $C_r(u)$, for $u \in A^*$.
(ii) $\pi(C_r(w))\,\delta(D_X) = 1$.
Proof With the notations of Proposition 3.4, consider a word $x \in \varphi^{-1}(K) \cap X^*$. Then $C_r(w) \subset C_r(xw)$, hence also $\pi(C_r(w)) \le \pi(C_r(xw))$. On the other hand, $xw \in \varphi^{-1}(K) \cap D_X$. Indeed, the right ideal generated by $x$ is minimal, and therefore there exists $u \in A^*$ such that $\varphi(xwu) = \varphi(x)$. Thus $xwu \in X^*$. By Proposition 3.4, we have $\pi(C_r(xw))\,\delta(D_X) = 1$, showing that
$$\pi(C_r(w)) \le 1/\delta(D_X). \tag{3.8}$$
Now assume $C_r(w)$ maximal. Then $C_r(w) = C_r(xw)$, implying the equality sign in the formula. This proves (i) $\Rightarrow$ (ii). Conversely, formula (3.8) shows the implication (ii) $\Rightarrow$ (i). $\square$
In fact, the set of words $w \in A^*$ such that the set of right contexts is maximal is an old friend; in Chapter II, Section 8, we defined the sets of extendable and simplifying words by
$$E = \{u \in A^* \mid \forall v \in A^*,\ \exists w \in A^* : uvw \in X^*\},$$
$$S = \{u \in A^* \mid \forall x \in X^*,\ \forall v \in A^* : xuv \in X^* \Rightarrow uv \in X^*\}.$$
We have seen that these sets are equal provided they are both nonempty (Proposition II.8.5). It can be shown (Exercise 3.1) that, for a thin complete code $X$, the following three conditions are equivalent for all words $w \in A^*$:
(i) $w \in E$,
(ii) $w \in S$,
(iii) $C_r(w)$ is maximal.
This leads to a natural interpretation of formula (3.5) (see Exercise 3.1).
We now establish, as a corollary of formula (3.5), a property of finite maximal codes which generalizes the property for prefix codes shown in Chapter II (Theorem II.6.4).
THEOREM 3.6 Let $X \subset A^+$ be a finite maximal code. For any letter $a \in A$, the order of $a$ is a multiple of $d(X)$.
Recall that the order of $a$ is the integer $n$ such that $a^n \in X$.
Proof Let $\pi$ be a positive Bernoulli distribution on $A^*$. Let $\mathcal{A} = (Q, 1, 1)$ be a trim unambiguous automaton recognizing $X^*$. Let $K$ be the minimal ideal of the monoid $M = \varphi_{\mathcal{A}}(A^*)$. Let $x \in X^* \cap \varphi_{\mathcal{A}}^{-1}(K)$. According to Proposition 3.4 and its symmetric statement for left contexts,
$$\pi(C_l(x))\,\delta(G_X) = 1, \qquad \pi(C_r(x))\,\delta(D_X) = 1.$$
By formula (3.3) the average length of $X$ is
$$\lambda(X) = \frac{d(X)}{\delta(D_X)\,\delta(G_X)}.$$
Thus
$$\lambda(X) = d(X)\,\pi(C_r(x))\,\pi(C_l(x)).$$
Let $a$ be a fixed letter and let $n$ be its order. Consider a sequence $(\pi_k)_{k \ge 0}$ of positive Bernoulli distributions such that
$$\lim_{k \to \infty} \pi_k(a) = 1$$
and, for any $b \in A - a$, $\lim_{k \to \infty} \pi_k(b) = 0$. For any word $w \in A^*$, we have $\lim_{k \to \infty} \pi_k(w) = 1$ if $w \in a^*$, and $\lim_{k \to \infty} \pi_k(w) = 0$ otherwise. For any $k \ge 0$, denote by $\lambda_k(X)$ the average length of $X$ with respect to $\pi_k$. Then
$$\lambda_k(X) = d(X)\,\pi_k(C_r(x))\,\pi_k(C_l(x)).$$
But also, by definition,
$$\lambda_k(X) = \sum_{x \in X} |x|\,\pi_k(x).$$
Since $X$ is finite, this sum is over a finite number of terms, and going to the limit,
$$\lim_{k \to \infty} \lambda_k(X) = \sum_{x \in X} |x| \lim_{k \to \infty} \pi_k(x).$$
Since $\lim_{k \to \infty} \pi_k(x) = 0$ unless $x \in a^*$, we have
$$\lim_{k \to \infty} \lambda_k(X) = n,$$
where $n$ is the order of $a$. On the other hand,
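the words in $C_r(x)$ are right factors of words in $X$. Since $X$ is finite, $C_r(x)$ is finite. Thus, going to the limit, we have
$$\lim_{k \to \infty} \pi_k(C_r(x)) = \sum_{v \in C_r(x)} \lim_{k \to \infty} \pi_k(v) = \operatorname{Card}(C_r(x) \cap a^*).$$
Similarly
$$\lim_{k \to \infty} \pi_k(C_l(x)) = \sum_{u \in C_l(x)} \lim_{k \to \infty} \pi_k(u) = \operatorname{Card}(C_l(x) \cap a^*).$$
Thus
$$n = d(X)\,\operatorname{Card}(C_r(x) \cap a^*)\,\operatorname{Card}(C_l(x) \cap a^*).$$
This proves that $d(X)$ divides $n$. $\square$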
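The limiting step of this proof can be observed numerically. The sketch below is an illustration under assumptions: the finite maximal prefix code $X = \{aa, ab, b\}$, the distributions $\pi_k(a) = 1 - 1/k$, $\pi_k(b) = 1/k$, and the helper names `measure`, `average_length` and `order` are hypothetical choices, not taken from the text. It computes $\lambda_k(X) = \sum_{x \in X} |x|\,\pi_k(x)$ exactly and shows it approaching the order of the letter $a$.

```python
from fractions import Fraction

def measure(word, pi):
    """Bernoulli measure of a word: the product of its letter probabilities."""
    result = Fraction(1)
    for letter in word:
        result *= pi[letter]
    return result

def average_length(code, pi):
    """Average length of a finite code: sum of |x| * pi(x) over x in the code."""
    return sum(len(x) * measure(x, pi) for x in code)

def order(letter, code):
    """The integer n such that letter^n is in the code, or None if there is none."""
    for x in code:
        if set(x) == {letter}:
            return len(x)
    return None

# Illustrative finite maximal prefix code over {a, b}.
X = ["aa", "ab", "b"]
print(order("a", X))  # 2
for k in (10, 100, 1000):
    # pi_k(a) tends to 1 and pi_k(b) tends to 0 as k grows.
    pi_k = {"a": 1 - Fraction(1, k), "b": Fraction(1, k)}
    print(k, average_length(X, pi_k))
```

For this particular code, $\lambda_k(X) = 2 - 1/k$, so the printed values tend to 2, the order of $a$. The sketch does not compute $d(X)$ or the context sets; it only illustrates the limit $\lim_{k \to \infty} \lambda_k(X) = n$.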
EXERCISES
SECTION 1
1.1. Let $X$ be a recognizable subset of $A^*$ and let $\pi$ be a positive Bernoulli distribution.
(a) Show that $\rho_X = \infty$ iff $X$ is finite.
(b) Show that, if $X$ is infinite, then $\rho_X$ is a pole of the rational function $f_X$.
SECTION 2
2.1. Show that any recognizable subset of $A^*$ has a density.
2.2. Let $M$ be a monoid, and let $\mu$, $\nu$ be two probability measures over $M$. The convolution of $\mu$ and $\nu$ is defined as the probability measure given by
$$\mu * \nu(m) = \sum_{uv = m} \mu(u)\,\nu(v).$$
(a) Show that
$$\Bigl(\lim_{n \to \infty} \mu_n\Bigr) * \nu = \lim_{n \to \infty} (\mu_n * \nu).$$
(b) Let $\pi$ be a positive Bernoulli distribution on $A^*$. For $n \ge 0$, let $\pi^{(n)}$ be the probability measure defined by
$$\pi^{(n)}(L) = \pi(L \cap A^n)$$
for $L \subset A^*$. Show that
$$\pi^{(n+1)} = \pi^{(n)} * \pi^{(1)}.$$
(c) Let $\varphi : A^* \to M$ be a morphism onto a well-founded monoid. Let $\pi$ be as above and let $\nu = \delta\varphi^{-1}$ be the probability measure over
$M$ defined in Proposition 2.7. Show that $\nu$ is idempotent, i.e., $\nu * \nu = \nu$.
2.3. Let $\mathcal{A} = (Q, i, T)$ be a deterministic automaton and let $\pi$ be a Bernoulli distribution on $A^*$. A function $\gamma : Q \to [0, 1]$ is a stationary distribution if
(a) Show that if $\mathcal{A}$ has finite minimal rank, and if all states of $\mathcal{A}$ are recurrent, it admits a unique stationary distribution given for $p \in Q$ by
$$\gamma(p) = 1/\lambda(X_p),$$
where $X_p$ is the prefix code such that $X_p^* = \operatorname{Stab}(p)$. (Hint: show that for a stationary distribution $\gamma$
where $L_{p,q}$ is the set recognized by the automaton $(Q, p, q)$.)
(b) Let $d$ be the minimal rank of $\mathcal{A}$, with the hypotheses of (a). Let $\mathcal{I}$ be the set of minimal images of $\mathcal{A}$. Let $\mathcal{B}$ be the deterministic automaton with states $\mathcal{I}$ and with the action induced by $\mathcal{A}$. Let $\gamma$ be the stationary distribution of $\mathcal{A}$, and $\sigma$ that of $\mathcal{B}$. Show that
where the sum runs over the minimal images $I \in \mathcal{I}$ such that $q \in I$.
SECTION 3
3.1. Let $X \subset A^+$ be a thin complete code. Let $S(X)$ and $E(X)$ be the sets of simplifying and extendable words defined in II.8. Show that for $w \in A^*$, the following conditions are equivalent:
(i) $w \in S(X)$;
(ii) $w \in E(X)$;
(iii) $C_r(w)$ is maximal among all $C_r(u)$, $u \in A^*$.
3.2. Use Proposition II.8.6 to give another proof of formula (3.7).
3.3. Let $\pi : A^* \to [0, 1]$ be a function. Then $\pi$ is called an invariant distribution if
(i) $\pi(1) = 1$;
(ii) $\sum_{a \in A} \pi(wa) = \sum_{a \in A} \pi(aw) = \pi(w)$ for $w \in A^*$.
Let $X \subset A^+$ be a code and $\alpha : B^* \to A^*$ a coding morphism for $X$, i.e., $\alpha(B) = X$. Let $\pi$ be an invariant distribution on $B^*$. Show that the function $\pi^{\alpha}$ from $A^*$ into $[0, 1]$ defined by
with $\lambda(\alpha) = \sum_{x \in X} |x|\,\pi(\alpha^{-1}(x))$ is an invariant distribution on $A^*$. Compare with the definition of the contextual probability.
NOTES
Theorem 1.5 is a special case of a theorem of Feller which can be formulated as follows in our context: "Let $X \subset A^+$ be a code, and let $\pi$ be a positive Bernoulli distribution such that $\pi(X) = 1$. Let $p$ be the g.c.d. of the lengths of the words in $X$. Then the sequence $\pi(X^* \cap A^{np})$ $(n \ge 0)$ has a limit, which is $0$ or $p/\lambda(X)$, according to $\lambda(X) = \infty$ or not" (see Feller, 1957). Theorem 1.5 is less precise on two points: (i) we only consider the case where $\lambda(X) < \infty$ and (ii) we only consider the limit in mean of the sequence $\pi(X^* \cap A^n)$.
The results of Section 2, and related results, can be found in Grenander (1963) and Martin-Löf (1965). Theorem 3.1 is due to Schützenberger (1965b). Theorem 3.3 is from Hansel and Perrin (1983). Exercise 2.1 belongs to the folklore of automata theory. The notion of stationary distribution introduced in Exercise 2.3 is a classical one in the theory of Markov processes.
Further developments of the results presented in this chapter may be found in Blanchard and Perrin (1980) and Hansel and Perrin (1983). In particular, these papers discuss the relationship of the concepts developed in this chapter with ergodic theory.