String patterns of leading digits


S. Sitharama Iyengar

Department of Computer Science, Louisiana State University, Baton Rouge, Louisiana 70803

and

A. K. Rajagopal

Department of Physics and Astronomy, Louisiana State University, Baton Rouge, Louisiana 70803

Transmitted by Robert Kalaba

ABSTRACT

This paper reviews numerous theoretical results for properties of string sequences generated by leading digits of $2^n$ and explores their practical implications. The result of Rajagopal et al. on statistical properties of string sequences is discussed. Several recent ideas on the graph-theoretic complexity of string sequences of leading digits are then examined. The application of McCabe's complexity measure to the directed graphs of strings of leading digits is given. In conclusion, uses of these string sequences in different areas of computer science are discussed.

APPLIED MATHEMATICS AND COMPUTATION 12:321-337 (1983)

© Elsevier Science Publishing Co., Inc., 1983
52 Vanderbilt Ave., New York, NY 10017

0096-3003/83/$03.00

1. INTRODUCTION

Over the years, a number of papers have been published on various properties of the distribution of first significant digits of powers of two. Notable are the works of Benford [1], Pinkham [2], Hamming [3], Knuth [4], Tsao [5], Raimi [6, 7], Diaconis [8], Macon and Moser [9], and Kuipers and Niederreiter [10]. The results from these papers have placed the peculiar behavior of the distribution of first significant digits of powers of two on a firm foundation.

Rajagopal et al. [11] first observed that the powers of two in base 10 generate sequences of 5 different strings and found that the process of generating strings follows a non-Markovian but ergodic process. Recently Kak [12] has examined in some detail the leading digits and strings in powers of 3, 5, 7, etc. and has found interesting string structure and the corresponding state-transition graphs of the leading digits.

In this paper we present a review of new results on the behavior of string patterns generated by the leading digits of powers of two and discuss their practical implications. The remainder of this paper is organized as follows: a description of the statistical properties of strings, higher-order patterns of strings within strings, a measure of complexity of string patterns, and conclusions.

2. STATISTICAL PROPERTIES OF STRINGS

This section presents various statistical properties of string sequences of first digits of powers of two. The structural sequence of first digits of powers of 2 in base 10 generates a pattern of 5 different unique strings: {1248}, {1249}, {125}, {136}, and {137}, and these strings appear in a random pattern. The state-transition graph for these strings is as shown in Figure 1. Several interesting observations on the state-transition graphs of strings are in order at this stage, which, though evident, do not seem to have been noted before in the vast literature on the subject.

Formation of String Patterns

The strings {1248}, {1249}, {125}, {136}, and {137} appear in a random fashion. Let us represent these strings as a, b, c, d, and e. These strings have the following unique markers: string a is associated with 8 (the last digit of the string), string b with 9, string c with 5, string d with 6, and string e with 7. These strings, as stated above, are marked by 5, 6, 7, 8, 9, all of which go back to 1 upon multiplication by 2. One could therefore argue that in the limit of large values of $n$ the strings will also occur with asymptotic probabilities $\pi(\alpha)$, with $\alpha$ standing for one of the strings a, b, c, d, and e. Thus there is a connection between the last digit of the string (e.g., 8 in the case of the string 1248) and the stationary probability of occurrence of 8. We are now ready to state a theorem on the above observation of the probability of occurrence of strings.
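The formation of the five strings can be checked directly. The following is a minimal sketch, not the authors' procedure: it assumes that a string starts at every power of two whose leading digit is 1 and ends just before the next one. The function names `leading_digits` and `split_into_strings` and the cutoff `n_max = 2000` are illustrative choices.

```python
from collections import Counter


def leading_digits(n_max):
    """Leading decimal digit of 2**n for n = 0..n_max, using exact integers."""
    digits, power = [], 1
    for _ in range(n_max + 1):
        digits.append(int(str(power)[0]))
        power *= 2
    return digits


def split_into_strings(digits):
    """Cut the digit sequence at every leading 1.

    The last, possibly incomplete, string is dropped.
    """
    strings, current = [], []
    for d in digits:
        if d == 1 and current:
            strings.append("".join(map(str, current)))
            current = []
        current.append(d)
    return strings


strings = split_into_strings(leading_digits(2000))
counts = Counter(strings)
print(strings[:6])
print(sorted(counts))
```

Only the five strings named in the text should appear among the keys of `counts`, and each one ends in its own marker digit.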

2.2.

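FIG. 1. State-transition graph of strings.

RESULT 1. The asymptotic probabilities of occurrence of the strings are given by

$$\pi(a) = \log_2\left(1 + \tfrac{1}{8}\right), \quad \pi(b) = \log_2\left(1 + \tfrac{1}{9}\right), \quad \pi(c) = \log_2\left(1 + \tfrac{1}{5}\right),$$

$$\pi(d) = \log_2\left(1 + \tfrac{1}{6}\right), \quad \pi(e) = \log_2\left(1 + \tfrac{1}{7}\right).$$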
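As a quick consistency check on Result 1 (a worked calculation added here, with the decimal values rounded to three places), the five probabilities telescope to one:

```latex
\begin{align*}
\sum_{\alpha \in \{a,b,c,d,e\}} \pi(\alpha)
  &= \log_2\!\left(\frac{9}{8}\cdot\frac{10}{9}\cdot\frac{6}{5}\cdot\frac{7}{6}\cdot\frac{8}{7}\right)
   = \log_2 2 = 1, \\
\pi(a) &\approx 0.170, \quad
\pi(b) \approx 0.152, \quad
\pi(c) \approx 0.263, \quad
\pi(d) \approx 0.222, \quad
\pi(e) \approx 0.193 .
\end{align*}
```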

First, let us recall that the asymptotic probability of occurrence of the digit $p$ is $\log_{10}(1 + 1/p)$, $p = 1, \ldots, 9$. As noted earlier, the strings are uniquely determined by the last digit in each of them, and by the above result on the probability of occurrence of the digit $p$, the asymptotic probabilities of strings are proportional to the logarithm to base 10 of these digits, i.e.

$$\pi(a) = K \log_{10}\left(1 + \tfrac{1}{8}\right), \quad \pi(b) = K \log_{10}\left(1 + \tfrac{1}{9}\right), \quad \pi(c) = K \log_{10}\left(1 + \tfrac{1}{5}\right),$$

$$\pi(d) = K \log_{10}\left(1 + \tfrac{1}{6}\right), \quad \pi(e) = K \log_{10}\left(1 + \tfrac{1}{7}\right).$$

The constant of proportionality $K$ is determined by the condition

$$\sum_{\alpha = a}^{e} \pi(\alpha) = 1, \quad \text{and so} \quad K = (\log_{10} 2)^{-1},$$

and hence the proof.

REMARKS. Several researchers who discuss leading-digit behavior have often used the tools of uniform distribution mod 1 [9], and Raimi [6] states that the strong Benford sequences were first defined by Cigler [13] using the concept of uniform distribution. It is interesting to see the behavior of string patterns of leading digits using the language of uniform distribution.

Assume $n$ is sufficiently large so that $2^n$ is at least a four-digit number starting with a 1. Then the string of leading digits that occurs depends directly on the 3 digits following the 1 in the number $2^n$. If these digits are from 000-124, the string is 1248; if from 125-249, the string is 1249; if from 250-499, the string is 125; if from 500-749, the string is 136; if from 750-999, the string is 137. This behavior of strings suggests that if the three digits following the 1 were uniformly distributed, then $\pi(a) = 0.125$, $\pi(b) = 0.125$, $\pi(c) = 0.25$, $\pi(d) = 0.25$, and $\pi(e) = 0.25$. The fact that Result 1 does not give these probabilities is interesting and suggests that these other digits are not quite uniform. It does, however, explain why we might expect $\pi(a)$ and $\pi(b)$ to be smaller than $\pi(c)$, $\pi(d)$, and $\pi(e)$.

Furthermore, Hamming [3] finds that the leading digits of numbers used in digital computers (and numbers found in tables) tend overall not to the uniform distribution as one might expect, but toward one in which the smallest digits are significantly overrepresented [14, 15]. In the next section of our paper we show that the five strings generated by $2^n$ are an example of a non-Markovian process with a stationary limit distribution.
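The gap between the uniform guess and Result 1 can be seen numerically. The sketch below is illustrative only: it assumes the interval partition of [1, 2) implied by the three-digit ranges above, writes Result 1 in the equivalent form $\log_2(\text{upper}/\text{lower})$ for each interval, and estimates frequencies by counting the marker digits 5 to 9 among the leading digits of $2^n$. The names `uniform_guess`, `result_1`, `empirical` and the cutoff of 5000 are assumptions of this sketch.

```python
import math

# Sub-intervals of [1, 2) for the leading part of 2**n at the start of a string.
INTERVALS = {
    "a": (1.0, 1.125),
    "b": (1.125, 1.25),
    "c": (1.25, 1.5),
    "d": (1.5, 1.75),
    "e": (1.75, 2.0),
}
# Closing digit of each string (its marker).
MARKER = {8: "a", 9: "b", 5: "c", 6: "d", 7: "e"}


def uniform_guess():
    """Probabilities if the digits after the leading 1 were uniform."""
    return {s: hi - lo for s, (lo, hi) in INTERVALS.items()}


def result_1():
    """Result 1, written as log2(hi / lo) for each interval."""
    return {s: math.log2(hi / lo) for s, (lo, hi) in INTERVALS.items()}


def empirical(n_max):
    """Relative frequency of each string among 2**0 .. 2**n_max."""
    counts = dict.fromkeys("abcde", 0)
    power = 1
    for _ in range(n_max + 1):
        digit = int(str(power)[0])
        if digit in MARKER:          # a marker digit closes exactly one string
            counts[MARKER[digit]] += 1
        power *= 2
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


guess, exact, observed = uniform_guess(), result_1(), empirical(5000)
for s in "abcde":
    print(s, guess[s], round(exact[s], 4), round(observed[s], 4))
```

The observed frequencies should track the logarithmic values rather than the uniform ones, which is the point made in the remarks.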

RESULT 2. The five strings can be considered as states of a chain. The chain is of interest as an example of a non-Markovian process with a stationary limit distribution. The reason it is non-Markovian is that the transition probability from one state to another is not independent of the previous states visited. The non-Markovian nature is apparent only when looking at the two-step transition probability matrix, which is not the square of the one-step transition matrix.
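The claim about the two-step matrix can be illustrated empirically. This is a rough sketch under stated assumptions, not the authors' computation: transition probabilities are estimated from the strings found among $2^0, \ldots, 2^{5000}$, and the helper names `string_labels`, `empirical_transitions`, and `compose` are hypothetical.

```python
from collections import Counter

LABEL = {"1248": "a", "1249": "b", "125": "c", "136": "d", "137": "e"}
STATES = "abcde"


def string_labels(n_max):
    """Sequence of string labels generated by the leading digits of 2**n."""
    labels, current, power = [], "", 1
    for _ in range(n_max + 1):
        digit = str(power)[0]
        if digit == "1" and current:
            labels.append(LABEL[current])
            current = ""
        current += digit
        power *= 2
    return labels


def empirical_transitions(labels, step):
    """Estimated probability of seeing t exactly `step` strings after s."""
    pairs = Counter(zip(labels, labels[step:]))
    totals = Counter(labels[:-step])
    return {s: {t: pairs[(s, t)] / totals[s] for t in STATES} for s in STATES}


def compose(p, q):
    """Matrix product of two transition tables."""
    return {
        s: {t: sum(p[s][u] * q[u][t] for u in STATES) for t in STATES}
        for s in STATES
    }


labels = string_labels(5000)
one_step = empirical_transitions(labels, 1)
two_step = empirical_transitions(labels, 2)
squared = compose(one_step, one_step)
print(two_step["a"]["c"], round(squared["a"]["c"], 3))
```

In this sketch the observed two-step probability from a to c is 1, while the square of the one-step matrix gives a visibly smaller value, which is the kind of discrepancy Result 2 describes.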

3. HIGHER-ORDER PATTERNS OF STRINGS

Our notion of a pattern is a simple one: a pattern is a nonnull finite string of digits. The path of a pattern is all strings obtainable starting from a given string back to itself. In graph theory this path is a circuit. (The two terminal vertices of a path coincide, and the remaining vertices are distinct.) For example, the path (adca) describes a circuit, because the two terminal vertices of this path coincide at the string a. In the case of elementary strings, for example 125, the digit 1 returns after 5. Furthermore, a pattern generates an associated set of strings describing a unique path sequence.

The patterns of "strings within strings" are also related to the induced transformation of the interval $[1, 2)$, which when partitioned into sets $\{[1, 1.125), [1.125, 1.25), [1.25, 1.50), [1.50, 1.75), [1.75, 2.0)\}$ generates 5 different strings {a}, {b}, {c}, {d}, and {e}. [See Figure 2(a).] Again by applying the induced transformation on the subinterval $[1, 1.125)$ of the previous set, we can partition it into subsubsets of the following form by using the tower transformation theorem of Robertson et al. [17]:

$$\{[1.0, 1.09375),\ [1.09375, 1.0986328125),\ [1.0986328125, 1.110223025),\ [1.110223025, 1.1175870895),\ [1.1175870895, 1.125)\}.$$

Thus a point in the interval $[1.0, 1.09375)$ generates the pattern adca, a point in the interval $[1.09375, 1.0986328125)$ generates the pattern aeca, a point in the interval $[1.0986328125, 1.110223025)$ generates the lengthy pattern aecbecbecbedbedbedca, and so on. Figure 2 shows a pattern chart for strings within strings. An algorithm to compute the threshold values for higher-order patterns of strings within strings has been developed [16], and readers interested in this algorithm are requested to write to the authors of [16]. Based on the above observation, we shall now state a theorem.
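The return patterns quoted above can be reproduced with exact rational arithmetic. The sketch below is not the algorithm of [16]; it is a minimal illustration that assumes the five-interval partition of $[1, 2)$ given in the text and the fact that strings a and b span four doublings while c, d, and e span three. The names `PARTITION`, `classify`, `next_start`, and `return_pattern` are hypothetical.

```python
from fractions import Fraction

# (label, lower bound, upper bound, doublings until the next leading 1)
PARTITION = [
    ("a", Fraction(1), Fraction(9, 8), 4),
    ("b", Fraction(9, 8), Fraction(5, 4), 4),
    ("c", Fraction(5, 4), Fraction(3, 2), 3),
    ("d", Fraction(3, 2), Fraction(7, 4), 3),
    ("e", Fraction(7, 4), Fraction(2), 3),
]


def classify(x):
    """Return (string label, number of doublings) for a start value x in [1, 2)."""
    for label, lo, hi, doublings in PARTITION:
        if lo <= x < hi:
            return label, doublings
    raise ValueError("x must lie in [1, 2)")


def next_start(x):
    """Leading part of the power of two that starts the next string."""
    _, doublings = classify(x)
    return x * 2**doublings / 10


def return_pattern(x, target="a", max_steps=10_000):
    """Sequence of strings visited from x until the first return to `target`."""
    label, _ = classify(x)
    if label != target:
        raise ValueError("x must start in the target string")
    pattern = [label]
    for _ in range(max_steps):
        x = next_start(x)
        label, _ = classify(x)
        pattern.append(label)
        if label == target:
            return "".join(pattern)
    raise RuntimeError("no return within max_steps")


print(return_pattern(Fraction(21, 20)))      # a point in [1.0, 1.09375)
print(return_pattern(Fraction(1095, 1000)))  # a point in [1.09375, 1.0986328125)
print(return_pattern(Fraction(11, 10)))      # a point in [1.0986328125, 1.110223025)
```

Using `Fraction` keeps the comparisons at the interval boundaries exact; a floating-point version would behave the same away from the thresholds.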

FIG. 2. (a) Formation of elementary strings. (b) Higher-order formation of strings within strings. [Axis marks: 1.000, 1.125, 1.250, 1.500, 1.750, 2.000.]

THEOREM (Transition mapping of strings). Let $A$ be any subset of $X$ of positive $\mu$ measure, and let

$$A_n = A \cap \left( \bigcap_{k=1}^{n-1} T^{-k}(A^c) \right) \cap T^{-n}(A)$$

be the set of points which return to $A$ for the first time in $n$ steps. If $A = [1, 2)$, then the induced transformation $T_A$ of string a onto itself also defines a mapping of strings: string a is associated with $[1.0, 1.125)$, string b with $[1.125, 1.250)$, string c with $[1.250, 1.50)$, string d with $[1.50, 1.75)$, and string e with $[1.75, 2.0)$.

RESULT 3. The number of elementary strings associated with first digits of powers of two is 5, and the same number of strings appears when higher-order transitions of strings within strings are constructed.

The tree

FIG. 3. Higher-order pattern of string a (1248).

FIG. 4. Higher-order pattern of string b (1249).

FIG. 5. Higher-order pattern of string c (125).

4. GRAPH-THEORETIC COMPLEXITIES OF STRING PATTERNS

State-transition graphs for string patterns have proven very useful because they illuminate the structural characteristics of a string sequence path. It is the aim of this section to provide an introduction to measuring the string-pattern complexity of leading digits by the use of state-transition graphs. The complexity measure developed here is defined in terms of basic path sequences of strings that when taken in combination will generate every possible path.

FIG. 6. Higher-order pattern of string d (136).

FIG. 7. Higher-order pattern of string e (137).

In a graph $G$ of $n$ vertices, $e$ edges, and $p$ connected components, the complexity of a graph (defined by McCabe's work [18]) is defined as follows:

$$V(G) = e - n + p.$$

This number is called the cyclomatic number and defines the structural characteristics of a graph. A strongly connected (directed) graph $G$ is one in which, if $V_1$ and $V_2$ are vertices, there exists a path from $V_1$ to $V_2$ for any $V_1$ and $V_2$ in $G$.

McCabe's major theorem on complexity deals with the number of linearly independent paths and its relation to the cyclomatic number. The theorem is stated as follows: In a strongly connected graph $G$, the cyclomatic number is equal to the maximum number of linearly independent circuits. Since all of the graphs we are dealing with are strongly connected (as seen from Figures 3, 4, 5, 6, and 7), the theorem is applicable in our case.

The application of the above theorem is made as follows: Given a state-transition graph (Figure 1), we associate it with a description of the number of nodes, number of edges, and number of connected components as follows:

Number of nodes = 5 (each node corresponds to a string).
Number of edges = 9.
Number of connected components = 1.
Cyclomatic number $V(G) = 9 - 5 + 1 = 5$.

The connectivity matrix¹ is as follows:

$$\begin{array}{c|ccccc}
  & a & b & c & d & e \\ \hline
a & 0 & 0 & 0 & 1 & 1 \\
b & 0 & 0 & 0 & 0 & 1 \\
c & 1 & 1 & 0 & 0 & 0 \\
d & 0 & 1 & 1 & 0 & 0 \\
e & 0 & 0 & 1 & 1 & 0
\end{array}$$

Thus the overall strategy will be to measure the complexity of the directed graph by computing the number of linearly independent paths, which is equal to the cyclomatic number. Similarly one could compute the cyclomatic complexity of the other directed graphs shown in Figures 3, 4, 5, 6, and 7.

¹Refer to the Appendix for the definition of the connectivity matrix.
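The cyclomatic number of the string graph can be recomputed from an edge list. The sketch below is illustrative: the edge list `EDGES` is an assumption of this example, derived from the interval partition of Section 3 (for instance, a start value in $[1, 1.125)$ is followed by one in $[1.6, 1.8)$, so a leads to d or e), and the helper names are hypothetical.

```python
# Successors of each string in the state-transition graph (assumed edge list).
EDGES = {"a": "de", "b": "e", "c": "ab", "d": "bc", "e": "cd"}


def reachable_from(start, edges):
    """Set of nodes reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for successor in edges[node]:
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def is_strongly_connected(edges):
    """True if every node can reach every other node."""
    return all(reachable_from(node, edges) == set(edges) for node in edges)


def cyclomatic_number(edges, components=1):
    """V(G) = e - n + p for a graph given as a successor table."""
    n_nodes = len(edges)
    n_edges = sum(len(successors) for successors in edges.values())
    return n_edges - n_nodes + components


print(is_strongly_connected(EDGES), cyclomatic_number(EDGES))
```

With 5 nodes, 9 edges, and 1 component this reproduces $V(G) = 9 - 5 + 1 = 5$ from the text.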

TABLE 1. CONNECTIVITY MATRIX (entries not recoverable)

A computation of the cyclomatic complexity for other state-transition graphs indicates that the number of string sequences is always 5, but what constitutes the string is not unique. Now we are ready to state Result 4, based on our observations on the complexities of string patterns.

RESULT 4. The cyclomatic complexity conforms to Result 3. The number computed above equals the number of strings generated by leading digits of powers of two, which equals the number of linearly independent circuits. This result is true for higher-order patterns of strings within strings.

5. CONCLUDING REMARKS

For the past quarter century, binary arithmetic has been used in the design of digital computers, and it is promoted as the way of the future. Decimal numbers have not been used because of the cost of hardware. Chen and Ho [14] have demonstrated that the storage efficiency disadvantage of decimal machines is no longer a serious problem using decimal sequences. A Huffman code could easily be designed to encode groups of digits as described in our paper. In many aspects of cryptographic processes, the data transformation needs sequences of uniform quasirandom numbers. From the structure of the transition probabilities $\pi(\alpha \to \beta)$, we may explore the possibilities of using these string sequences for error-correcting codes and cryptography. More analysis is needed to identify and evaluate the full scope of these interesting patterns of sequences in other areas of computer science.

APPENDIX

A cycle in a directed graph is a path of length $\geq 1$ from a vertex to itself. A graph $G$ is strongly connected if, for every pair of vertices $a$ and $b$, there is a path from $a$ to $b$. The connectivity matrix is the same as the matrix of the reflexive-transitive closure of the relation described by the adjacency matrix [19]. The complexity measure described in this paper describes the number of paths starting from any string back to itself. The corresponding intermediate string that can be reached is described in the form of a connectivity (reachability) matrix. Such matrices are presented in Tables 1-3 in order of increasing size.

TABLE 2. CONNECTIVITY MATRIX (rows and columns indexed by the strings $a_1$ to $a_6$, $b_1$, $c_1$ to $c_6$, $d_1$ to $d_5$, and $e_1$; entries not recoverable)

TABLE 3. CONNECTIVITY MATRIX (entries not recoverable)

REFERENCES

1 Frank Benford, The law of anomalous numbers, Proc. Amer. Philos. Soc. 78:551-572 (1938).
2 Roger S. Pinkham, On the distribution of first significant digits, Ann. Math. Statist. 32:1223-1230 (1961).
3 R. W. Hamming, On the distribution of numbers, Bell System Tech. J. 49:1609-1625 (1970).
4 D. Knuth, The Art of Computer Programming, Vol. 2, Addison-Wesley, Reading, Mass., 1969, pp. 219-229.
5 N. K. Tsao, The distribution of significant digits and round off errors, Comm. ACM 17:269-271 (1974).
6 Ralph A. Raimi, The peculiar distribution of the first digits, Sci. Amer. 221:109-120 (1969).
7 Ralph A. Raimi, The first digit problem, Amer. Math. Monthly 83:521-538 (1976).
8 Persi Diaconis, The distribution of leading digits and uniform distribution mod 1, Ann. Probab. 5(1):72-81 (1977).
9 N. Macon and L. Moser, The distribution of first digits of powers, Scripta Math. 16:290-291 (1950).
10 L. Kuipers and H. Niederreiter, Uniform Distribution of Sequences, Wiley, New York, 1974.
11 A. K. Rajagopal, V. R. R. Uppuluri, David Scott, S. S. Iyengar, and Mohan Yellayi, New statistical aspects of the first significant digits of $2^n$, unpublished report.
12 S. C. Kak, New results on the first digit problem, Tech. Rep. EE m7, Louisiana State Univ., Baton Rouge, LA 70803.
13 J. Cigler, Methods of summability and uniform distribution mod 1, Compositio Math. 16:44-51 (1964).
14 T. C. Chen and I. T. Ho, Storage efficient representation of decimal data, Comm. ACM 18, No. 8 (1975).
15 Alan J. Smith, Comments on a paper by T. C. Chen and I. T. Ho, Comm. ACM 18, No. 8 (1975).
16 S. Sitharama Iyengar, A. K. Rajagopal, and Frank Ramos, On the distribution of string sequences, J. Combin. Inform. and System Sci., to appear.
17 James B. Robertson, V. R. R. Uppuluri, and A. K. Rajagopal, First digit phenomenon and ergodic theory, J. Math. Anal. Appl., to appear.
18 T. J. McCabe, A complexity measure, IEEE Trans. Software Engrg. SE-2(4):308-320 (1976).
19 Leon S. Levy, Discrete Structures of Computer Science, Wiley, 1980.