Recovering highly-complex linear recurrences of integer sequences

Accepted Manuscript Recovering highly-complex linear recurrences of integer sequences Gadi Aleksandrowicz, Andrei Asinowski, Gill Barequet, Ronnie Ba...

Download PDF

270KB Sizes 2 Downloads 64 Views

Report

PDF Reader
Full Text

Accepted Manuscript Recovering highly-complex linear recurrences of integer sequences

Gadi Aleksandrowicz, Andrei Asinowski, Gill Barequet, Ronnie Barequet

PII: DOI: Reference:

S0020-0190(17)30128-X http://dx.doi.org/10.1016/j.ipl.2017.07.006 IPL 5562

To appear in:

Information Processing Letters

Received date: Revised date: Accepted date:

31 July 2015 12 June 2017 7 July 2017

Please cite this article in press as: G. Aleksandrowicz et al., Recovering highly-complex linear recurrences of integer sequences, Inf. Process. Lett. (2017), http://dx.doi.org/10.1016/j.ipl.2017.07.006

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights • The Berlekamp–Massey algorithm is used for recovering complex linear recurrences. • The Chinese Remainder Theorem breaks the problem into simpler subproblems. • Method is used for recovering formulae enumerating polyominoes on twisted cylinders.

Recovering Highly-Complex Linear Recurrences of Integer Sequences$ Gadi Aleksandrowicza , Andrei Asinowskib , Gill Barequeta,∗, Ronnie Barequetc a b

Dept. of Computer Science, The Technion—Israel Inst. of Technology, Haifa 3200003, Israel.

Inst. of Discrete Mathematics and Geometry, Vienna Univ. of Technology, Wiedner Hauptstraße 8-10/104, 1040 Vienna, Austria. c

Dept. of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.

Abstract We suggest a variant of the Berlekamp-Massey algorithm, originally used for recovering a linear shift register from known output bits, for recovering a linear recurrence satisﬁed by a sequence of natural numbers from known values of the sequence. We present an application of the algorithm to recovering extremely complex recurrences satisﬁed by the sequences enumerating polyominoes on twisted cylinders. Keywords: Algorithms, recurrence formula, Chinese remainder theorem, polyominoes.

1. Introduction In this paper we revisit the problem of recovering a linear recurrence satisﬁed by a sequence of integer numbers, given the ability to produce as many elements of the sequence as we wish. We present a new algorithm, based on existing tools, and show an application of the algorithm to recovering highly complex recurrences with extremely large coeﬃcients. To the best of our knowledge, available commercial mathematical software packages use diﬀerent methods which fail on signiﬁcantly simpler instances of the problem. An algorithm presented by Berlekamp [6, §7] for decoding BCH (Bose-Chaudhuri-Hocquenghem) codes was used by Massey [14] in order to ﬁnd the shortest linear recurrence that satisﬁes a sequence of elements from a ﬁeld. A straightforward implementation of the algorithm runs in O(n2 ) time, where n is the order (number of terms) of the recurrence. Using a recursive version of the algorithm and the fast Fourier transform [1], the algorithm can be implemented in O(n log n log log n) time [7, 8]. The analysis of the complexity of the algorithm assumes that the size of the values incurred throughout the course of running the algorithm is ﬁxed (that is, that the ﬁeld is ﬁnite). Consequently, using the algorithm in order to ﬁnd the optimal (simplest) linear recurrence satisﬁed by a sequence of large integer numbers (or rational numbers) leads to very time-consuming computations. Reeds and Sloane [15] generalized this method to the case in which the input to the algorithm consists of elements of a sequence of integer numbers taken modulo some arbitrary number whose prime factorization is known. (Note that the original elements of the sequence are unknown.) We $ A preliminary (but much diﬀerent) version of this paper appeared in [2]. Work on this paper by the second and third authors has been supported in part by a grant from Bar-Nir Bergreen Software Technology Center of Excellence. Work on this paper by the third author has also been supported in part by E. and J. Bishop Research Fund, and by Grant 575/15 from the Israel Science Foundation (ISF). ∗ Corresponding author Email addresses: [email protected] (Gadi Aleksandrowicz), [email protected] (Andrei Asinowski), [email protected] (Gill Barequet), [email protected] (Ronnie Barequet)

Preprint submitted to Elsevier

July 20, 2017

use a similar idea in order to ﬁnd the shortest linear recurrence satisﬁed by a sequence of known integer number. Given an upper bound on the magnitude of the coeﬃcients of the recurrence, we can easily ﬁnd a set P = {p1 , p2 , . . . , pk } of primes whose product surpasses this bound. Then, we compute k “modulo sequences,” each one consisting of the elements of the original sequence taken modulo some prime pi ∈ P , and proceed by applying the Berlekamp-Massey algorithm to each such “modulo sequence” independently. Finally, using the Chinese Remainder Theorem (CRT) as Reeds and Sloane do, we combine the recurrences satisﬁed by all the “modulo sequences” into the recurrence satisﬁed by the original sequence. The motivation behind taking this approach is clear: While the elements of the original sequence (as well as the coeﬃcients in the respective recurrence) might be extremely large (thereby, preventing the algorithm from running), the magnitude of elements of the ith “modulo sequence” is small—at the order of the number pi . This allows us to perform arithmetic operations eﬃciently with computer-size words instead of very time-consuming “big-number” variables. This is particularly important since we use the simple quadratic-time implementation of the Berlekamp-Massey algorithm.

2. The Algorithm M Deﬁnition 1. An integer sequence s satisﬁes a linear recurrence of order M if sn = − i=1 ci sn−i for n > M , where for 1 ≤ i ≤ M the numbers si are the initialization values, and ci are the coeﬃcients of the recurrence. Problem 2. Assume that we can generate an arbitrarily-long preﬁx of an integer sequence s, which satisﬁes a linear recurrence of order at most M . Recover the minimal recurrence satisﬁed by s. Note that we are not interested in solving the recurrence, that is, in ﬁnding a closed-form formula for sn , but rather just in recovering the recurrence formula. Observing that a recurrence remains the same if it is taken modulo an integer number,1 we use an idea similar to the one used by Reeds and Sloane [15]. The algorithm consists of the following steps:

1. 2.

3. 4. 5.

Input: A generator of an integer sequence s satisfying an unknown linear recurrence r of order M ; Output: r; Initialization: Set k := k ∗ (where k ∗ is some nominal number); Let P = {p1 = 2, p2 = 3, p3 = 5, . . . , pk } be the set containing the ﬁrst k prime numbers. For each 1 ≤ i ≤ k, compute the ﬁrst 2M elements of the “modulo sequence” si , that is, elements of s modulo pi ; Compute, using the Berlekamp-Massey algorithm, the linear recurrences r1 , . . . , rk satisﬁed by the sequences s1 , . . . , sk ; Using CRT, recover the original recurrence formula r from the recurrences r1 , . . . , rk ; “Veriﬁcation test”: Compare the leading 2M elements produced by r with the respective elements of s. If there is any diﬀerence, set k := 2k and go to Step 2; Otherwise, output r and halt.

The remainder of this section contains details and explanations of the diﬀerent steps of the algorithm. Correctness of the Algorithm. Naturally, if a sequence s satisﬁes a recurrence r, then the sequence, obtained by taking all elements of s modulo a prime p, satisﬁes the recurrence obtained by taking all coeﬃcients of r modulo p. Since all primes are obviously coprime, we can use CRT to recover separately 1 In the sense that all the coeﬃcients of the recurrence, as well as the elements of the sequence, are taken modulo the same number.

2

each original coeﬃcient of r from the respective coeﬃcients of the modulo sequences, independently of the other coeﬃcients of r. However, this process yields the correct original recurrence r only if we use enough prime numbers and, consequently, enough modulo sequences, as we proceed to explain. The Number of Modulo Sequences. In favorable situations, the “correct” value of k ∗ (i.e., the number of needed modulo sequences) is known in advance. Starting with a value of k larger than needed makes no harm, as the correct recurrence will be recovered anyway. Otherwise, when k ∗ is less than required, the algorithm will discover this when the recovered recurrence r will fail the veriﬁcation test in Step 5 (see below). How many modulo sequences are needed? Let L be the largest coeﬃcient (in absolute value) of k the sought recurrence formula, and let N be the primorial pk # = i=1 pi (the product of the ﬁrst k prime numbers). Note that N is even. The value of k should be high enough so as to make N greater than 2L. Indeed, if this is the case, then the original non-negative coeﬃcients of r are recovered correctly (by applying CRT) in the range [0, N/2), while the negative coeﬃcients are recovered uniquely in the range [N/2, N ). If a recovered coeﬃcient c exceeds N/2 − 1, we subtract N from c in order to obtain its correct value. In practice, we may have no clue a priori about how large L is, and so we do not know what the correct value of k is. Hence, we start from some nominal number of modulo sequences, and if the recovered recurrence fails the veriﬁcation test (see below), we double the number of modulo sequences and repeat the process until the recovered recurrence passes the veriﬁcation test. (As an alternative, we could increase the number of modulo sequences so as to double the product of the respective prime numbers.) Recovering a Single Modulo Recurrence. For recovering a linear recurrence from a preﬁx (list of elements) of a sequence, we use the known Berlekamp-Massey algorithm. Details are omitted, and the interested reader is referred to their original works [6, 14]. Recovering the Original Recurrence. Given the ith coeﬃcients of all the modulo sequences, each sequence obtained by taking the original recurrence modulo a diﬀerent prime number, a direct application of CRT recovers ci , the ith coeﬃcient of the original recurrence. As explained above, if ci exceeds N/2 − 1, where N is the product of all prime numbers used to create the modulo sequences, we subtract N from ci in order to obtain its true (negative) value. Verifying a Recovered Recurrence (Veriﬁcation Test). Now we arrive at the heart of the algorithm. It is known [14, Thm. 3] that if M is the order of the minimal recurrence r satisﬁed by a sequence, then 2M elements of the sequence suﬃce for the algorithm to recover r uniquely. Hence, verifying the correctness of a recovered recurrence r could easily be performed by comparing the ﬁrst 2M values produced by r with an equally-long preﬁx of the sequence s. (Recall that we can generate as many elements of s as we need.) If M is just an upper bound on the order of the linear recurrence, then 2M values surely suﬃce for ensuring that the recurrence is recovered correctly. We suspect that our algorithm is not the algorithm used in commercial tools for recovering the linear recurrence satisﬁed by a sequence of numbers from a preﬁx of it: Such tools broke down for recurrences much simpler and with much smaller coeﬃcients than the recurrences which our package handled quite easily, with a few orders of magnitude in diﬀerence.

3. An Application A polyomino of size n is an edge-connected set of n cells on the square lattice Z2 . Fixed polyominoes are considered distinct if they diﬀer in their shapes or orientations. The number of ﬁxed 3

13 14 15 10 11 12 7 8 9 3 4 5 6 1 2 3 4 {1, 3, 4, 5, 7, 8, 9, 12, 13}

(a) Cylinder

1

(b) Polyomino

3

4

5

7

8

9

12 13

(c) Subgraph of the host graph

Figure 1: A polyomino on the twisted cylinder of width 3.

⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝

(a) Automaton

1 0 1 1 1 1 1 1

1 0 1 0 0 0 0 0

0 0 0 1 1 1 1 1

1 0 0 0 0 0 0 0

0 1 0 0 0 0 0 0

0 0 0 1 1 0 0 1

0 0 0 0 0 1 1 0

0 0 1 0 0 0 0 0

⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠

(b) Transfer matrix

Figure 2: An automaton implementing the building of polyominoes on the twisted cylinder of width 3, and the corresponding transfer matrix.

polyominoes of

size n is usually denoted in the literature by A(n). Klarner [11] proved that the limit λ := limn→∞ n A(n) exists. (Henceforth, λ has been called “Klarner’s constant.”) Only three decades later Madras [13] showed that the limit limn→∞ A(n + 1)/A(n) exists and, hence, is equal to λ. The currently best known lower [5] and upper [12] bounds on λ are 4.0025 and 4.6496, respectively. It is generally assumed (see, e.g., [9]), as a conclusion from numerical methods applied to the known values of A(n), that λ = 4.06 ± 0.02. The currently best estimation of λ, 4.0625696 ± 0.0000005, is by Jensen [10]. The method employed for setting the lower bound uses so-called twisted cylinders [4]. For a ﬁxed natural number w, the twisted cylinder of width w is the spiral surface obtained from the Euclidean plane by identifying all pairs of points of the form (x, y), (x − kw, y + k), for k ∈ Z. Pictorially, we can imagine this as taking the strip [0, w] × R and gluing, for all y, (w, y) with (0, y + 1) (see Figure 1). It was proven [ibid.] that the growth constant of polyominoes on a twisted cylinder of any width is a lower bound on λ, the growth constant of polyominoes in the plane. In a preliminary version of this paper [2], we showed that polyominoes of size n on any twisted cylinder are in bijection with words of length n in a language accepted by a ﬁnite automaton, and showed constructively how to build the automata corresponding to cylinders of any width. A direct consequence is that the sequence enumerating polyominoes of size n on a twisted cylinder of width w (which also enumerates words of size n accepted by the respective automaton) satisﬁes a linear recurrence. Figure 2(a) shows the automaton that models the growth of polyominoes on a twisted cylinder of width 3. (For the meaning of the names of the states, the reader is referred to a previous work [4].) Figure 2(b) shows the transfer matrix that corresponds to this automaton. In principle, the generating function of the sequence, as well as the recurrence it satisﬁes, can be obtained directly from the transfer matrix encoding the automaton. Using standard methods, one can easily obtain that the generating function of the sequence enumerating polyominoes on a twisted cylinder of width 3 is

4

6

5

ln ln L(w)

4

3

2

1

4

5

6

7 w

8

9

10

Figure 3: A plot of ln ln L(w) as a function of w.

(x3 + x2 + x − 1)/(2x3 + x2 + 2x − 1), which implies immediately [16, p. 202, Thm. 4.1.1]2 that the recurrence formula is an = 2an−1 + an−2 + 2an−3 . However, the size of the transfer matrix grows rapidly with w. The number of states in the automaton corresponding to a twisted cylinder of width w is Mw+1 , where Mw ≈ 3w /w3/2 is the (w+1)st Motzkin number. Hence, the size of the transfer matrix, Mw+1 × Mw+1 prohibits its manipulation for computing the recurrence already for small values of w. In our experiments, this method broke down already at w = 7. Hence, we attempted to compute the linear recurrence directly from the sequence of numbers of polyominoes obtained by running the automaton. (Recall that we can build and run the automaton even without understanding its structure, thus obtain a preﬁx, of any desired length, of the sequence enumerating polyominoes.) We then computed modulo sequences as described in the previous section. We fed each such modulo sequence to a C++ program in which we coded the simple O(n2 )-time implementation of the Berlekamp-Massey algorithm, where n is the order of the recurrence. (As mentioned in the introduction, asymptotically more eﬃcient versions are known but are much more diﬃcult to code. The running times of our implementation were still manageable.) Finally, we recovered the original recurrence formula from all the modulo recurrences using a program written in Python, which has a built-in capability of handling big numbers. In order to “help” the algorithm, we tried to estimate L(w), the largest coeﬃcient in absolute value in the recurrence that corresponds to a twisted cylinder of width w. From a plot of the values of ln ln L(w) as a function of w, for 4 ≤ w ≤ 10 (see Figure 3), it seems that L(w) grows in a double exponential manner with respect to w, speciﬁcally, ln ln L(w) ∼ w−4. This double exponential behavior can also be expected from the upper bound O((3w )!) on the coeﬃcients of the characteristic polynomial of the transfer matrix. Now, using the asymptotic approximation pk # ∼ e(1+o(1))(k ln k) , we conclude that k = O(ln L(w)/ ln ln L(w)) = O(ew /w) prime numbers suﬃce for our purpose. In practice, at least for w ≤ 10, much less than ew /w primes were needed. Thus, we started with a small nominal number of primes, and whenever the obtained recurrence did not pass the veriﬁcation test, we doubled the number of primes and repeated the procedure. Since Mw+1 is an upper bound on the order of the 2 The cited theorem says that the recurrence is manifested in the denominator of the generating function. Actually, the recurrence can be obtained directly from the transfer matrix (once its minimal polynomial is known), but we ﬁnd it beneﬁcial to also have the generating function at hand.

5

sought recurrence, checking whether the procedure succeeded could be performed by comparing 2Mw+1 values produced by the recovered recurrence to the values computed by the automaton. Our two programs were run on a home laptop with a 2.2 GHz processor and 3 GB of main memory (RAM). The computation time of the entire procedure for cylinders of up to width 8 was negligible. For width 8, the obtained recurrence included 315 terms, and the largest coeﬃcient in absolute value was about 5.48 · 1019 (66 bits). For width 9, the computation took about 20 seconds. The obtained recurrence included 826 terms, and the largest coeﬃcient in absolute value was about 5.18 · 1051 (172 bits). The computation of the linear recurrence for the number of polyominoes on a twisted cylinder of width 10 required about 5 minutes, and the obtained recurrence included 2,168 terms, with the largest coeﬃcient in absolute value being about 6.39 · 10129 (432 bits). The complete speciﬁcations of the recurrences are found in an archive [3]. Here is a summary of our results. Theorem 3. The recurrence formulae enumerating polyominoes on twisted cylinders of width 1 ≤ w ≤ 10 are as follows. Largest w Complexity Coeﬃcient Initialization and Recurrence (abs. value) 1 1 1 (1) ; (−1) 2 1 2 (1) ; (−2) 3 3 2 (1, 2, 6) ; (−2, −1, −2) 4 7 8 (1, 2, 6, 19, 59, 181, 555) ; (−5, 8, −7, 2, 2, −6, 2) 5 18 25 (1, 2, 6, 19, 63, 211, 707, 2360, 7853, 26070, 86434, 286416, 948991, 3144464, 10419886, 34530671, 114435963, 379251561) ; (−4, 1, 6, −5, 0, −1, −25, −9, −18, −10, 14, 16, −22, −23, −1, 14, 0, 2) 6 48 2.95 · 103 (1, 2, 6, 19, 63, 216, 754, 2650, 9311, 32648, 114210, 398711, 1389910, 4840947, 16852765, 58657588, 204151644, 710538469, 2473082513, 8608079201, 29963112912, 104298075975, 363053581718, 1263768927868, 4399117783874, 15313119079728, 53304212536356, 185549212643365, 645886848026350, 2248296466237295, 7826193746149272, 27242536831652344, 94829719987493235, 330096849979162682, 1149048320483048893, 3999771751380580614, 13922977784041234403, 48465093550790494424, 168704234381507799362, 587249847637486241853, 2044183333472295697123, 7115685974133683488273, 24769298361096626327975, 86220519493365324684086, 300128726780117770956748, 1044731035631558645331234, 3636649341693369874678279, 12658969612510481574745708) ; (−8, 23, −30, 21, −32, 100, −217, 260, −204, 107, −46, 208, −435, 550, −787, 1411, −2181, 2416, −2815, 2102, −1941, 1109, −367, −793, 2537, −2472, 2952, −1924, 2893, −1016, 1200, 1622, 183, 1101, −330, −120, −299, −731, 245, −147, −161, 201, −103, −34, 1, −7, −9, 2) 7 121 8.74 · 107 (1, 2, 6, . . . ) ; (−7, 15, −17, . . . , 56, −5, 2) [3] 8 315 5.48 · 1019 (1, 2, 6, . . . ) ; (−13, 72, −241, . . . , 42, −22, 2) [3] 9 826 5.18 · 1051 (1, 2, 6, . . . ) ; (−12, 59, −179, . . . , 122, −8, 2) [3] 10 2, 168 6.39 · 10129 (1, 2, 6, . . . ) ; (−25, 300, −2359, . . . , −419, 47, 2) [3] 2

6

References [1] A.V. Aho, J.E. Hopcroft, and J.D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA, 1974. [2] G. Aleksandrowicz, A. Asinowski, G. Barequet, and R. Barequet, Formulae for polyominoes on twisted cylinders, Proc. 8th Int. Conf. on Language and Automata Theory and Applications, Madrid, Spain, Lecture Notes in Computer Science, 8370, Springer-Verlag, 76–87, March 2014. [3] Recurrence data for polyominoes on http://www.cs.technion.ac.il/~barequet/twisted/ .

twisted

cylinders,

available

at

´ , and G. Rote, Counting polyominoes on twisted cylinders, Integers: The [4] G. Barequet, M. Moffie, A. Ribo Electronic Journal of Combinatorial Number Theory, 6 (2006), #A22, 37 pp. [5] G. Barequet, G. Rote, and M. Shalah, λ > 4: An improved lower bound on the growth constant of polyominoes, Comm. of the ACM, 59 (2016), 88–95. [6] E.R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, New York, 1968. [7] R.E. Blahut, Theory and Practice of Error Control Codes, Addison-Wesley, Reading, MA, 1983. [8] S.R. Blackburn, Fast rational interpolation, Reed-Solomon decoding and the linear complexity proﬁles of sequences, IEEE Trans. on Information Theory, 43 (1997), 537–548. [9] D.S. Gaunt, M.F. Sykes, and H. Ruskin, Percolation processes in d-dimensions, J. of Physics A: Mathematical and General, 9 (1976), 1899–1911. [10] I. Jensen, Counting polyominoes: A parallel implementation for cluster computing, Proc. Int. Conf. on Computational Science, part III, Melbourne, Australia and St. Petersburg, Russia, Lecture Notes in Computer Science, 2659, Springer, 203–212, June 2003. [11] D.A. Klarner, Cell growth problems, Canadian J. of Mathematics, 19 (1967), 851–863. [12] D.A. Klarner and R.L. Rivest, A procedure for improving the upper bound for the number of n-ominoes, Canadian J. of Mathematics, 25 (1973), 585–602. [13] N. Madras, A pattern theorem for lattice clusters, Annals of Combinatorics, 3 (1999), 357–384. [14] J.L. Massey, Shift-register synthesis and BCH decoding, IEEE Trans. on Information Theory, 15 (1969), 122–127. [15] J.A. Reeds and N.J.A. Sloane, Shift-register synthesis (modulo m), SIAM J. on Computing, 14 (1985), 505–513. [16] R. Stanley, Enumerative Combinatorics, vol. I, Cambridge University Press, 1997.

7

Recovering highly-complex linear recurrences of integer sequences

Recovering highly-complex linear recurrences of integer sequences

Recommend Documents