String Patterns of Leading Digits

S. Sitharama Iyengar
Department of Computer Science
Louisiana State University
Baton Rouge, Louisiana 70803

and

A. K. Rajagopal
Department of Physics and Astronomy
Louisiana State University
Baton Rouge, Louisiana 70803

Transmitted by Robert Kalaba
ABSTRACT

This paper reviews numerous theoretical results for properties of string sequences generated by leading digits of $2^n$ and explores their practical implications. The result of Rajagopal et al. on statistical properties of string sequences is discussed. Several recent ideas on the graph-theoretic complexity of string sequences of leading digits are then examined. The application of McCabe's complexity measure to the directed graphs of strings of leading digits is given. In conclusion, uses of these string sequences in different areas of computer science are discussed.
1. INTRODUCTION

Over the years, a number of papers have been published on various properties of the distribution of first significant digits of powers of two.
APPLIED MATHEMATICS AND COMPUTATION 12:321-337 (1983)
© Elsevier Science Publishing Co., Inc., 1983
52 Vanderbilt Ave., New York, NY 10017
0096-3003/83/$03.00
Notable are the works of Benford [1], Pinkham [2], Hamming [3], Knuth [4], Tsao [5], Raimi [6,7], Diaconis [8], Macon and Moser [9], and Kuipers and Niederreiter [10]. The results from these papers have placed the peculiar behavior of the distribution of first significant digits of powers of two on a firm foundation. Rajagopal et al. [11] first observed that the powers of two in base 10 generate sequences of 5 different strings and found that the process of generating strings follows a non-Markovian but ergodic process. Recently Kak [12] has examined in some detail the leading digits and strings in powers of 3, 5, 7, etc. and has found interesting string structure and the corresponding state-transition graphs of the leading digits.

In this paper we present a review of new results on the behavior of string patterns generated by the leading digits of powers of two and discuss their practical implications. The remainder of this paper is organized as follows: a description of the statistical properties of strings, higher-order patterns of strings within strings, a measure of complexity of string patterns, and conclusions.
2. STATISTICAL PROPERTIES OF STRINGS
This section presents various statistical properties of string sequences of first digits of powers of two. The structural sequence of first digits of powers of 2 in base 10 generates a pattern of 5 different unique strings: {1248}, {1249}, {125}, {136}, and {137}, and these strings appear in a random pattern. The state-transition graph for these strings is shown in Figure 1. Several interesting observations on the state-transition graphs of strings are in order at this stage, which, though evident, do not seem to have been noted before in the vast literature on the subject.

Formation of String Patterns

The strings {1248}, {1249}, {125}, {136}, and {137} appear in a random fashion. Let us represent these strings as a, b, c, d, and e. These strings have the following unique markers: string a is associated with 8 (the last digit of the string), string b with 9, string c with 5, string d with 6, and string e with 7. These strings, as stated above, are marked by 5, 6, 7, 8, 9, all of which go back to 1 upon multiplication by 2. One could therefore argue that in the limit of large values of n the strings will also occur with asymptotic probabilities $\pi(\alpha)$, with $\alpha$ standing for one of the strings a, b, c, d, and e. Thus there is a connection between the last digit of the string (e.g., 8 in the case of the string 1248) and the stationary probability of occurrence of 8. We are now ready to state a theorem on the above observation of the probability of occurrence of strings.
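As a hedged illustration of how the five strings arise, the following Python sketch cuts the sequence of leading digits of $2^n$ at every occurrence of the digit 1. The function names and the cutoff of 200 powers are assumptions made for this example only; they do not come from the cited papers.

```python
def leading_digits_of_powers_of_two(count):
    """Return the leading decimal digits of 2**0, 2**1, ..., 2**(count - 1)."""
    digits = []
    # Invariant: scale is a power of 10 with scale <= power < 10 * scale.
    power, scale = 1, 1
    for _ in range(count):
        digits.append(power // scale)
        power *= 2
        if power >= 10 * scale:
            scale *= 10
    return digits


def split_into_strings(digits):
    """Cut the digit sequence at each leading 1; the unfinished last string is dropped."""
    strings, current = [], ""
    for d in digits:
        if d == 1 and current:
            strings.append(current)
            current = ""
        current += str(d)
    return strings


strings = split_into_strings(leading_digits_of_powers_of_two(200))
print(strings[:6])           # ['1248', '136', '125', '1248', '136', '125']
print(sorted(set(strings)))  # the distinct strings seen so far
```

Exact integer arithmetic is used here instead of floating-point logarithms so that small powers such as 2, 4, and 8 are never misclassified by rounding.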
2.2.
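FIG. 1. State-transition graph of strings.

RESULT 1. The asymptotic probabilities of occurrence of the strings are given by

$$\pi(a)=\log_2\left(1+\tfrac{1}{8}\right),\quad \pi(b)=\log_2\left(1+\tfrac{1}{9}\right),\quad \pi(c)=\log_2\left(1+\tfrac{1}{5}\right),\quad \pi(d)=\log_2\left(1+\tfrac{1}{6}\right),\quad \pi(e)=\log_2\left(1+\tfrac{1}{7}\right).$$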
First, let us recall that the asymptotic probability of occurrence of the digit p is $\log_{10}(1+1/p)$, $p=1,\ldots,9$. As noted earlier, the strings are uniquely determined by the last digit in each of them, and by the above result on the probability of occurrence of the digit p, the asymptotic probabilities of strings are proportional to the logarithm to base 10 of these digits, i.e.

$$\pi(a)=K\log_{10}\left(1+\tfrac{1}{8}\right),\quad \pi(b)=K\log_{10}\left(1+\tfrac{1}{9}\right),\quad \pi(c)=K\log_{10}\left(1+\tfrac{1}{5}\right),\quad \pi(d)=K\log_{10}\left(1+\tfrac{1}{6}\right),\quad \pi(e)=K\log_{10}\left(1+\tfrac{1}{7}\right).$$

The constant of proportionality K is determined by the condition

$$\sum_{\alpha=a}^{e}\pi(\alpha)=1.$$

Since the five logarithms telescope to $\log_{10}(10/5)=\log_{10}2$, this gives

$$K=(\log_{10}2)^{-1},$$

and hence the proof.

Several researchers who discuss leading-digit behavior have often used the tools of uniform distribution mod 1 [9], and Raimi [6] states that the strong Benford sequences were first defined by Cigler [13] using the concept of uniform distribution. It is interesting to see the behavior of string patterns of leading digits using the language of uniform distribution. Assume n is sufficiently large so that $2^n$ is at least a four-digit number starting with a 1. Then the string of leading digits that occurs depends directly on the 3 digits following the 1 in the number $2^n$. If these digits are from 000-124, the string is 1248; if from 125-249, the string is 1249; if from 250-499, the string is 125; if from 500-749, the string is 136; if from 750-999, the string is 137. This behavior of strings suggests that if the three digits following the 1 were uniformly distributed, then $\pi(a)=0.125$, $\pi(b)=0.125$, $\pi(c)=0.25$, $\pi(d)=0.25$, and $\pi(e)=0.25$. The fact that Result 1 does not give these probabilities is interesting and suggests that these other digits are not quite uniform. It does, however, explain why we might expect $\pi(a)$ and $\pi(b)$ to be smaller than $\pi(c)$, $\pi(d)$, and $\pi(e)$. Furthermore, Hamming [3] finds that the leading digits of numbers used in digital computers (and numbers found in tables) tend overall not to the uniform distribution, as one might expect, but toward one in which the smallest digits are significantly overrepresented [14,15]. In the next section of our paper we show that the five strings generated by $2^n$ are an example of a non-Markovian process with a stationary limit distribution.
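The comparison between Result 1 and the uniform-digit guess can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names, the dictionary layout, and the sample size of 20000 powers are choices made for this example. It tabulates the probabilities of Result 1, the observed string frequencies, and the values 0.125 and 0.25 quoted above.

```python
import math
from collections import Counter

# Marker (last) digit of each string, as listed in the text.
MARKER = {"a": 8, "b": 9, "c": 5, "d": 6, "e": 7}
NAME = {"1248": "a", "1249": "b", "125": "c", "136": "d", "137": "e"}
# Values expected if the three digits after the leading 1 were uniform.
UNIFORM_GUESS = {"a": 0.125, "b": 0.125, "c": 0.25, "d": 0.25, "e": 0.25}


def result1_probabilities():
    """pi(s) = log2(1 + 1/p), where p is the marker digit of string s."""
    return {s: math.log2(1 + 1 / p) for s, p in MARKER.items()}


def empirical_frequencies(count):
    """Relative frequencies of the strings among the first `count` powers of 2."""
    # Invariant: scale is a power of 10 with scale <= power < 10 * scale.
    power, scale = 1, 1
    current, tally = "", Counter()
    for _ in range(count):
        digit = power // scale
        if digit == 1 and current:
            tally[NAME[current]] += 1
            current = ""
        current += str(digit)
        power *= 2
        if power >= 10 * scale:
            scale *= 10
    total = sum(tally.values())
    return {s: tally[s] / total for s in MARKER}


theory = result1_probabilities()
observed = empirical_frequencies(20000)
for s in "abcde":
    print(s, round(theory[s], 4), round(observed[s], 4), UNIFORM_GUESS[s])
```

In this sketch the theoretical values sum to 1, and $\pi(a)$ and $\pi(b)$ come out larger than 0.125 while $\pi(d)$ and $\pi(e)$ come out smaller than 0.25, which is the departure from uniformity that the paragraph describes.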
RESULT 2. The five strings can be considered as states of a chain.

The chain is of interest as an example of a non-Markovian process with a stationary limit distribution. The reason it is non-Markovian is that the transition probability from one state to another is not independent of the previous states visited. The non-Markovian nature is apparent only when looking at the two-step transition probability matrix, which is not the square of the one-step transition matrix.
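The statement about the two-step matrix can be examined empirically. The following sketch, with hypothetical helper names and an assumed sample of 20000 powers of two, estimates the one-step and two-step transition frequencies between strings and compares the two-step estimate with the square of the one-step estimate. It is an illustration, not the procedure of [11].

```python
from collections import Counter

STATES = "abcde"
NAME = {"1248": "a", "1249": "b", "125": "c", "136": "d", "137": "e"}


def string_sequence(count):
    """Sequence of string names produced by the first `count` powers of 2."""
    # Invariant: scale is a power of 10 with scale <= power < 10 * scale.
    power, scale = 1, 1
    current, sequence = "", []
    for _ in range(count):
        digit = power // scale
        if digit == 1 and current:
            sequence.append(NAME[current])
            current = ""
        current += str(digit)
        power *= 2
        if power >= 10 * scale:
            scale *= 10
    return sequence


def transition_matrix(sequence, lag):
    """Empirical probability of seeing state t exactly `lag` strings after state s."""
    pairs = Counter(zip(sequence, sequence[lag:]))
    starts = Counter(sequence[:len(sequence) - lag])
    return {s: {t: pairs[(s, t)] / starts[s] for t in STATES} for s in STATES}


def matrix_square(p):
    """Square of a transition matrix stored as a dict of dicts."""
    return {s: {t: sum(p[s][u] * p[u][t] for u in STATES) for t in STATES}
            for s in STATES}


sequence = string_sequence(20000)
one_step = transition_matrix(sequence, 1)
two_step = transition_matrix(sequence, 2)
one_step_squared = matrix_square(one_step)
print("b -> c in two steps:", round(two_step["b"]["c"], 3))
print("same entry of the squared one-step matrix:", round(one_step_squared["b"]["c"], 3))
```

For a Markov chain the two printed numbers would agree up to sampling noise; here they differ noticeably, because the string that follows e depends on which string preceded it.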
3. HIGHER-ORDER PATTERNS OF STRINGS
Our notion of a pattern is a simple one: a pattern is a nonnull finite string of digits. The path of a pattern is all strings obtainable starting from a given string back to itself. In graph theory this path is a circuit. (The two terminal vertices of a path coincide, and the remaining vertices are distinct.) For example, the path (adca) describes a circuit, because the two terminal vertices of this path coincide at the string a. In the case of elementary strings, for example 125, the digit 1 returns after 5. Furthermore, a pattern generates an associated set of strings describing a unique path sequence.

The patterns of "strings within strings" are also related to the induced transformation of the interval [1, 2), which, when partitioned into the sets {[1, 1.125), [1.125, 1.25), [1.25, 1.50), [1.50, 1.75), [1.75, 2.0)}, generates 5 different strings {a}, {b}, {c}, {d}, and {e}. [See Figure 2(a).] Again, by applying the induced transformation on the subinterval [1, 1.125) of the previous set, we can partition it into sub-subsets of the following form by using the tower transformation theorem of Robertson et al. [17]: {[1.0, 1.09375), [1.09375, 1.0986328125), [1.0986328125, 1.110223025), [1.110223025, 1.1175870895), [1.1175870895, 1.125)}. Thus a point in the interval [1.0, 1.09375) generates the pattern adca, a point in the interval [1.09375, 1.0986328125) generates the pattern aeca, a point in the interval [1.0986328125, 1.110223025) generates the lengthy pattern aecbecbecbedbedbedca, and so on. Figure 2 shows a pattern chart for strings within strings. An algorithm to compute the threshold values for higher-order patterns of strings within strings has been developed [16], and readers interested in this algorithm are requested to write to the authors of [16].

Based on the above observation, we shall now state a theorem.
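The patterns attached to subintervals of [1, 1.125) can be traced with a short sketch. It assumes, as an illustration rather than as the algorithm of [16], that the induced transformation sends the mantissa x at the start of a string to the mantissa at the start of the next string: 16x/10 after the four-digit strings a and b, and 8x/10 after the three-digit strings c, d, and e. The names `string_of`, `induced_map`, and `pattern_from` are hypothetical.

```python
from fractions import Fraction

# Partition of [1, 2) given in the text, as (string name, lower endpoint) pairs.
PARTITION = [("a", Fraction(1)), ("b", Fraction(9, 8)), ("c", Fraction(5, 4)),
             ("d", Fraction(3, 2)), ("e", Fraction(7, 4))]


def string_of(x):
    """Name of the string whose subinterval of [1, 2) contains x."""
    name = None
    for label, lower in PARTITION:
        if x >= lower:
            name = label
    return name


def induced_map(x):
    """Assumed induced transformation: mantissa at the start of the next string."""
    if x < Fraction(5, 4):
        return x * Fraction(8, 5)  # strings a, b: four doublings, then divide by 10
    return x * Fraction(4, 5)      # strings c, d, e: three doublings, then divide by 10


def pattern_from(x, max_steps=1000):
    """Follow x until the path first returns to the string it started in."""
    path = string_of(x)
    for _ in range(max_steps):
        x = induced_map(x)
        path += string_of(x)
        if path[-1] == path[0]:
            return path
    raise ValueError("no return within max_steps")


print(pattern_from(Fraction(105, 100)))    # a point in [1.0, 1.09375)
print(pattern_from(Fraction(1096, 1000)))  # a point in [1.09375, 1.0986328125)
print(pattern_from(Fraction(11, 10)))      # a point in [1.0986328125, 1.110223025)
```

Rational arithmetic is chosen so that points close to a threshold are assigned to the correct subinterval without rounding error.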
FIG. 2. (a) Formation of elementary strings. (b) Higher-order formation of strings within strings.
THEOREM (Transition mapping of strings). Let A be any subset of X of positive $\mu$-measure, and let

$$A_n = A \cap T^{-n}(A) \cap \bigcap_{k=1}^{n-1} T^{-k}(A^{c})$$

be the set of points which return to A for the first time in n steps. If A = [1, 2), then the induced transformation $T_A$ of string a onto itself also defines a mapping of strings: string a is associated with [1.0, 1.125), string b with [1.125, 1.250), string c with [1.250, 1.50), string d with [1.50, 1.75), and string e with [1.75, 2.0).
RESULT 3. The number of elementary strings associated with first digits of powers of two is 5, and the same number of strings appears when higher-order transitions of strings within strings are constructed.

FIG. 3. Higher-order pattern of string a (1248).
FIG. 4. Higher-order pattern of string b (1249).
FIG. 5. Higher-order pattern of string c (125).

4. GRAPH-THEORETIC COMPLEXITIES OF STRING PATTERNS
State-transition graphs for string patterns have proven very useful because they illuminate the structural characteristics of a string sequence path. It is the aim of this section to provide an introduction to measuring the string-pattern complexity of leading digits by the use of state-transition graphs. The complexity measure developed here is defined in terms of basic path sequences of strings that, when taken in combination, will generate every possible path. In a graph G of n vertices, e edges, and p connected components, the complexity of the graph (as defined in McCabe's work [18]) is

$$V(G) = e - n + p.$$

This number is called the cyclomatic number and defines the structural characteristics of a graph. A strongly connected (directed) graph G is one in which there exists a path from $V_1$ to $V_2$ for any vertices $V_1$ and $V_2$ in G. McCabe's major theorem on complexity deals with the number of linearly independent paths and its relation to the cyclomatic number. The theorem is stated as follows: In a strongly connected graph G, the cyclomatic number is equal to the maximum number of linearly independent circuits. Since all of the graphs we are dealing with are strongly connected (as seen from Figures 3, 4, 5, 6, and 7), the theorem is applicable in our case.

FIG. 6. Higher-order pattern of string d (136).
FIG. 7. Higher-order pattern of string e (137).

The application of the above theorem is made as follows: Given a state-transition graph (Figure 1), we associate it with a description of the number of nodes, number of edges, and number of connected components as follows:

Number of nodes = 5 (each node corresponds to a string).
Number of edges = 9.
Number of connected components = 1.
Cyclomatic number V(G) = 9 - 5 + 1 = 5.

The connectivity matrix¹ is as follows:

        a  b  c  d  e
    a   0  0  0  1  1
    b   0  0  0  0  1
    c   1  1  0  0  0
    d   0  1  1  0  0
    e   0  0  1  1  0
Thus the overall strategy will be to measure the complexity of the directed graph by computing the number of linearly independent paths, which is equal to the cyclomatic number. Similarly, one could compute the cyclomatic complexity of the other directed graphs shown in Figures 3, 4, 5, 6, and 7.
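The count V(G) = 9 - 5 + 1 = 5 can be reproduced with a few lines of Python. The edge list below is an assumption of this sketch: it is a reconstruction of the state-transition graph of Figure 1 from the doubling rule (for example, string 1248 can only be followed by 136 or 137), chosen because it has the 5 nodes and 9 edges stated in the text. The function names are hypothetical.

```python
# Assumed successor lists for the state-transition graph of Figure 1.
EDGES = {"a": "de", "b": "e", "c": "ab", "d": "bc", "e": "cd"}


def reachable(graph, start):
    """Set of vertices reachable from `start` by a depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(graph):
    """True if every vertex can reach every other vertex."""
    return all(reachable(graph, v) == set(graph) for v in graph)


def cyclomatic_number(graph, components=1):
    """V(G) = e - n + p for a graph given as successor lists."""
    n = len(graph)
    e = sum(len(targets) for targets in graph.values())
    return e - n + components


print(is_strongly_connected(EDGES), cyclomatic_number(EDGES))
```

The same two functions could be applied to successor lists read off Figures 3 to 7, provided each graph is first checked to be strongly connected so that McCabe's theorem applies.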
¹Refer to the Appendix for the definition of the connectivity matrix.
A computation of the cyclomatic complexity for other state-transition graphs indicates that the number of string sequences is always 5, but what constitutes the string is not unique. Now we are ready to state Result 4, based on our observations on the complexities of string patterns.

RESULT 4. The cyclomatic complexity conforms to Result 3. The number computed above equals the number of strings generated by leading digits of powers of two, which equals the number of linearly independent circuits. This result is true for higher-order patterns of strings within strings.

5. CONCLUDING REMARKS
For the past quarter century, binary arithmetic has been used in the design of digital computers, and it is promoted as the way of the future. Decimal numbers have not been used because of the cost of hardware. Chen and Ho [14] have demonstrated that the storage-efficiency disadvantage of decimal machines is no longer serious when decimal sequences are used. A Huffman code could easily be designed to encode groups of digits as described in our paper. In many aspects of cryptographic processes, the data transformation needs sequences of uniform quasirandom numbers. From the structure of the transition probabilities $\pi(\alpha \to \beta)$, we may explore the possibilities of using these string sequences for error-correcting codes and cryptography. More analysis is needed to identify and evaluate the full scope of these interesting patterns of sequences in other areas of computer science.

APPENDIX

A cycle in a directed graph is a path of length ≥ 1 from a vertex to itself. A graph G is strongly connected if, for every pair of vertices a and b, there is a path from a to b. The connectivity matrix is the same as the matrix of the reflexive-transitive closure of the relation described by the adjacency matrix [19]. The complexity measure described in this paper describes the number of paths starting from any string back to itself. The corresponding intermediate strings that can be reached are described in the form of a connectivity (reachability) matrix. Such matrices are presented in Tables 1-3 in order of increasing size.

TABLE 2. CONNECTIVITY MATRIX
Starting string e1; terminal string e1.

REFERENCES

1 Frank Benford, The law of anomalous numbers, Proc. Amer. Philos. Soc. 78:551-572 (1938).
2 Roger S. Pinkham, On the distribution of first significant digits, Ann. Math. Statist. 32:1223-1230 (1961).
3 R. W. Hamming, On the distribution of numbers, Bell System Tech. J. 49:1609-1625 (1970).
4 D. Knuth, The Art of Computer Programming, Vol. 2, Addison-Wesley, Reading, Mass., 1969, pp. 219-229.
5 N. K. Tsao, The distribution of significant digits and round off errors, Comm. ACM 17:269-271 (1974).
6 Ralph A. Raimi, The peculiar distribution of the first digits, Sci. Amer. 221:109-120 (1969).
7 Ralph A. Raimi, The first digit problem, Amer. Math. Monthly 83:521-538 (1976).
8 Persi Diaconis, The distribution of leading digits and uniform distribution mod 1, Ann. Probab. 5(1):72-81 (1977).
9 N. Macon and L. Moser, The distribution of first digits of powers, Scripta Math. 16:290-291 (1950).
10 L. Kuipers and H. Niederreiter, Uniform Distribution of Sequences, Wiley, New York, 1974.
11 A. K. Rajagopal, V. R. R. Uppuluri, David Scott, S. S. Iyengar, and Mohan Yellayi, New statistical aspects of the first significant digits of $2^n$, unpublished report.
12 S. C. Kak, New results on the first digit problem, Tech. Rep. EE m7, Louisiana State Univ., Baton Rouge, LA 70803.
13 J. Cigler, Methods of summability and uniform distribution mod 1, Compositio Math. 16:44-51 (1964).
14 T. C. Chen and I. T. Ho, Storage efficient representation of decimal data, Comm. ACM 18, No. 8 (1975).
15 Alan J. Smith, Comments on a paper by T. C. Chen and I. T. Ho, Comm. ACM 18, No. 8 (1975).
16 S. Sitharama Iyengar, A. K. Rajagopal, and Frank Ramos, On the distribution of string sequences, J. Combin. Inform. and System Sci., to appear.
17 James B. Robertson, V. R. R. Uppuluri, and A. K. Rajagopal, First digit phenomenon and ergodic theory, J. Math. Anal. Appl., to appear.
18 T. J. McCabe, A complexity measure, IEEE Trans. Software Engrg. SE-2(4):308-320 (1976).
19 Leon S. Levy, Discrete Structures of Computer Science, Wiley, 1980.