Discrete Mathematics 229 (2001) 29–55
www.elsevier.com/locate/disc
Randomness in lattice point problems
József Beck
Mathematics Department, Busch Campus, Hill Center, Rutgers University, New Brunswick, NJ 08903, USA
Keywords: Discrepancy; Lattice points; Continued fraction
E-mail address: [email protected] (J. Beck).
© 2001 Published by Elsevier Science B.V. All rights reserved. 0012-365X/01/$ - see front matter. PII: S0012-365X(00)00200-4

1. Super irregularity

Discrepancy theory, the counterpart of uniform distribution, was initiated by the classical papers of H. Weyl, van der Corput and van Aardenne-Ehrenfest, and was developed into a coherent theory by Roth and Schmidt. Here is a typical question: what is the ‘most uniform’ way to place $n$ points in the unit square? We can measure uniformity with respect to natural geometric classes like axis-parallel rectangles, tilted rectangles, circles, convex sets, and so on. How big is the inevitable ‘irregularity’ for each class? Discrepancy theory has important applications in number theory, combinatorics and computational geometry. We refer to the books of Beck and Chen [7] and Matoušek [19]. Matoušek’s book is especially good as an introductory textbook.

Ramsey theory, on the other hand, is one of the most extensively developed theories in discrete mathematics. It originated in the fundamental works of van der Waerden and Ramsey, and was greatly extended by Paul Erdős and his ‘school’. It became a deep theory: it is enough to refer to contributions such as Szemerédi’s theorem, Furstenberg’s ergodic method, Shelah’s primitive recursive upper bound on the Hales–Jewett numbers, and Gowers’ very recent analytic work. Ramsey theory has many important applications beyond combinatorics, mainly in number theory and geometry. We refer to the book of Graham et al. [10].

Section 1 is an attempt to extend discrepancy theory in the direction of Ramsey theory.

Roth’s Theorem on Long Arithmetic Progressions: For any partition of the integers $1, 2, \ldots, N$ into two sets $S_1$ and $S_2$, there exists an arithmetic progression $P = \{a, a+d, a+2d, \ldots, a+(k-1)d\}$ in $1, 2, \ldots, N$ such that
\[ \bigl|\, |P \cap S_1| - |P \cap S_2| \,\bigr| > \frac{1}{20} N^{1/4}. \]
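The quantity in Roth's theorem can be explored numerically for very small $N$. The following is a minimal brute-force sketch, not taken from the paper: the function names `max_ap_discrepancy` and `min_over_colorings` are hypothetical, a partition is encoded as a list of $\pm 1$ values, and the search is exponential in $N$, so it only illustrates the definition.

```python
from itertools import product


def max_ap_discrepancy(coloring):
    """Largest | |P ∩ S1| - |P ∩ S2| | over arithmetic progressions P in 1..N.

    coloring[i] is +1 if the integer i+1 belongs to S1 and -1 if it belongs to S2.
    """
    n = len(coloring)
    best = 0
    for d in range(1, n + 1):          # common difference
        for a in range(1, n + 1):      # first term
            total = 0
            x = a
            while x <= n:              # extend the progression term by term
                total += coloring[x - 1]
                best = max(best, abs(total))
                x += d
    return best


def min_over_colorings(n):
    """Smallest achievable maximum discrepancy over all 2^n partitions of 1..n."""
    return min(max_ap_discrepancy(c) for c in product((1, -1), repeat=n))
```

For realistic $N$ the exhaustive minimum is out of reach; the sketch is only meant to make the objects in the statement concrete.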
Compare Roth’s theorem to the following fundamental result of Ramsey theory.

Van der Waerden’s Theorem on Short Arithmetic Progressions: For any integer $k$ there exists an $N$ such that for any partition of the integers $1, 2, \ldots, N$ into two sets $S_1$ and $S_2$, there exists an arithmetic progression $P = \{a, a+d, a+2d, \ldots, a+(k-1)d\}$ of length $k$ which is entirely contained in either $S_1$ or $S_2$. In other words,
\[ \bigl|\, |P \cap S_1| - |P \cap S_2| \,\bigr| = |P| = k. \]

Though these two theorems have pretty much the same structure, there is a fundamental difference: in van der Waerden’s theorem the discrepancy is as large as possible. On the other hand, the length $k$ of the ‘monochromatic’ arithmetic progression in terms of the length $N$ of the underlying interval $[1, N]$ is very short: it is known that
\[ \log N > k > \log\log\log\log\log\log N. \]
The six times iterated logarithm lower bound is a remarkable, very recent result of the 1998 Fields medalist Gowers.

Next consider the following basic theorem from geometric discrepancy theory.

Schmidt’s Theorems on Rectangles: Let $P$ be an arbitrary set of $N$ points in the unit square.
(i) There exists an axis-parallel rectangle $A$ in the unit square such that
\[ \bigl|\, |P \cap A| - N\,\mathrm{area}(A) \,\bigr| > \frac{1}{100} \log N. \]
(ii) There exists a tilted rectangle $B$ in the unit square such that
\[ \bigl|\, |P \cap B| - N\,\mathrm{area}(B) \,\bigr| > N^{1/4-\varepsilon}. \]
Both statements (i) and (ii) are best possible.

We mention that circles have roughly the same discrepancy as tilted rectangles, and even for the class of all possible convex sets in the unit square the discrepancy is less than $N^{1/3+\varepsilon}$. Is there any natural geometric shape which has ‘extra large’ discrepancy? Well, the answer is yes if we change the normalization, and instead of taking $N$-element point sets in the unit square, we consider infinite point sets of density one in the whole plane.

‘Extra Large Discrepancy’ Question: Let $\lambda$ be an arbitrarily large real number. Does there exist a set $S_\lambda$ of area $\lambda$ such that for any point set $P$ of density one on the plane there is a congruent copy $S'_\lambda$ of $S_\lambda$ such that
\[ |S'_\lambda \cap P| > (1 + c)\lambda, \tag{1} \]
and there is another congruent copy $S''_\lambda$ of $S_\lambda$ such that
\[ |S''_\lambda \cap P| < (1 - c)\lambda? \tag{2} \]
Here $c > 0$ is an absolute constant independent of $\lambda$, $S_\lambda$ and $P$.

We managed to give a positive answer to this question (see [5]). We proved that hyperbola-segments like
\[ S_\lambda = \{(x, y) \in \mathbb{R}^2 : 1 \le x \le e^{\lambda/2},\ -1/x < y < 1/x\} \tag{3} \]
satisfy requirements (1) and (2). The proof uses a version of the Roth–Halász method from discrepancy theory. What is so special about hyperbolas? Well, the shape of the hyperbola ‘resembles’ a lacunary Fourier series with Hadamard gap condition. For these gap series a well-known theorem of S. Sidon states that the maximum is ‘almost as big as possible’. More precisely, the maximum is bigger than a small absolute constant multiple of the sum of the absolute values of the Fourier coefficients (which is of course the absolute limit). Lacunary Fourier series with Hadamard gap condition behave like ‘random series’ (for a similar idea, see (4) in Section 3, and the argument after it). Sidon’s theorem corresponds to ‘extra large deviation’ in the following sense: tossing a coin $n$ times, it is extremely unlikely to have $n$ heads, or even more than 51% heads, but the probability is positive, so it can happen in a very, very long run. The whole of Section 2 is devoted to the probabilistic aspects.

Open Problem 1: Does there exist a set $S_\lambda$ in the plane such that its translates alone satisfy requirements (1) and (2), respectively?

We can answer the analogous question for 2-colorings of the lattice points $\mathbb{Z}^2$: Given any 2-coloring $f : \mathbb{Z}^2 \to \{-1, +1\}$ of the lattice points and given an arbitrarily large number $\lambda$, there is a plane set $S^{*}_\lambda$ of area $\lambda$ such that
\[ \sup_{C \in \mathbb{R}^2} \Bigl| \sum_{n \in \mathbb{Z}^2 \cap (S^{*}_\lambda + C)} f(n) \Bigr| > c \cdot \lambda. \tag{4} \]
Here $S^{*}_\lambda$ is a tilted copy of the hyperbola-segment $S_\lambda$ with slope $\sqrt{2}$ (the slope can be any other quadratic irrational), $S^{*}_\lambda + C$ is the copy of $S^{*}_\lambda$ translated by the vector $C$, and $c > 0$ is a universal constant. (4) is an ‘almost van der Waerden’ theorem for translated copies. This is a kind of new phenomenon, because in Ramsey theory magnification plays an absolutely crucial role in the proofs.

Open Problem 2: Does there exist a convex set $S_\lambda$ such that its congruent copies $S'_\lambda$ and $S''_\lambda$ satisfy requirements (1) and (2), respectively?
It is not hard to see that a positive answer to Open Problem 2 implies the positive solution of two famous unsolved problems in geometry.
Danzer’s Conjecture (early 1950s): Let $P$ be a point set on the plane such that any convex set of area one contains a point of $P$. Then the density of $P$ is $\infty$.

Dual Danzer: Let $P$ be a point set of density one on the plane. Then for arbitrarily large $n$, there is a convex set of area one which contains (at least) $n$ points of $P$.

We cannot help mentioning here an old ‘extra large fluctuation’ type problem of Erdős and Turán in combinatorial number theory.

Erdős–Turán Conjecture (1930s): Let $A = \{a_1, a_2, a_3, \ldots\} \subset \mathbb{N}$ be an infinite sequence of integers. Let $f(n)$ denote the number of solutions of $n = a_i + a_j$. If $f(n) \ge 1$ for all $n \in \mathbb{N}$, then $f(n)$ cannot be bounded; what is more, $f(n) > c \cdot \log n$ for infinitely many $n$.

This conjecture was motivated by the highly irregular behavior of the well-known arithmetical functions
\[ r(n) = \sum_{n = x^2 + y^2} 1 \quad\text{and}\quad d(n) = \sum_{n = x^2 - y^2} 1. \]

Answering a 20-year-old problem of S. Sidon, in 1954 Erdős proved the existence of a sequence $A = \{a_1, a_2, a_3, \ldots\} \subset \mathbb{N}$ such that for all $n$
\[ c_1 \cdot \log n < f(n) < c_2 \cdot \log n. \]
The proof was one of the first applications of the Probabilistic Method. Erdős realized that his method cannot guarantee the stronger requirement
\[ \frac{f(n)}{\log n} \to c > 0. \]
This led him to the following

Erdős Conjecture (1950s): Let $A$ and $f(n)$ be as in the Erdős–Turán conjecture above. Then
\[ \frac{f(n)}{\log n} \to c > 0 \]
is impossible.

Erdős offered $500 for a first solution to either of the last two conjectures.
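To make the representation function $f(n)$ concrete, here is a small sketch that tabulates it for a finite initial segment of a sequence. The name `representation_counts` is hypothetical, and the sketch assumes that ordered pairs $(i, j)$ are counted, which the text does not specify; counting unordered pairs would change the values by at most a factor of two.

```python
from collections import Counter


def representation_counts(a, limit):
    """Return [f(0), f(1), ..., f(limit)], where f(n) counts ordered pairs
    (x, y) of elements of `a` with x + y = n (an assumed convention)."""
    terms = sorted({x for x in a if 1 <= x <= limit})
    f = Counter()
    for x in terms:
        for y in terms:
            if x + y <= limit:
                f[x + y] += 1
    return [f[n] for n in range(limit + 1)]


# Example: the squares, for which f(n) is closely related to r(n) above.
squares = [k * k for k in range(1, 8)]
f_squares = representation_counts(squares, 50)
```

With the squares as input, `f_squares[n]` is zero for most `n` and occasionally larger, which is the kind of irregular behavior the conjecture is about.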
2. Counting lattice points

It is a well-known classical problem to understand the local and global properties of the irrational rotation $n\alpha \bmod 1$ ($n = 1, 2, 3, \ldots$).

A local question is to decide whether an inequality
\[ \|n\alpha\| < \frac{1}{n\varphi(n)} \iff \Bigl|\alpha - \frac{m}{n}\Bigr| < \frac{1}{n^2 \cdot \varphi(n)}, \quad\text{or in general,}\quad \|n\alpha - \beta\| < \frac{1}{n\varphi(n)} \tag{1} \]
has infinitely many integral solutions, and if this is the case, to determine the solutions, or at least to determine the asymptotic number of the integral solutions. Here $\|x\|$ denotes, as usual, the distance of a real $x$ from the nearest integer, and $\varphi(n)$ is a positive increasing function of $n$.

A global question is to give a quantitative description of the equidistribution property of the irrational rotation $n\alpha \bmod 1$. We are going to emphasize the similarities between the local and global aspects by formulating parallel theorems.

Consider first the subclass of quadratic irrationals. They play a very special role in diophantine approximation: numbers like $\alpha = \sqrt{2}, \sqrt{3}, \ldots$ have the worst rational approximation property in the sense that
\[ \Bigl|\alpha - \frac{p}{q}\Bigr| > \frac{\mathrm{const}(\alpha)}{q^2} \quad\text{for all rationals } p/q \]
(called ‘badly approximable’ numbers), and at the same time, they give the most uniformly distributed $n\alpha \bmod 1$ sequences. From an algebraic point of view the great advantage is the extremely simple cyclic group structure of the units in the corresponding quadratic field.

Local Case (Lattice points in tilted hyperbola-segments). Consider first an inhomogeneous Pell inequality like
\[ -c \le (x + \omega)^2 - 2y^2 \le c \tag{2} \]
where $c > 0$ and $\omega \in [0, 1)$ are constants. (Note that the restriction to a symmetric interval in (2) is just a matter of taste; everything works for the general case, too. Similarly, the factor 2 can be replaced by any other square-free integer.) In view of the factorization
\[ (x + \omega)^2 - 2y^2 = (x + \omega - y\sqrt{2})(x + \omega + y\sqrt{2}), \tag{3} \]
the asymptotic number of integral solutions of (2) heavily depends on the local behavior of $n\sqrt{2} \bmod 1$. In fact, (2) and (1) with $\alpha = \sqrt{2}$, $\beta = \omega$ and $1/\varphi = c/2\sqrt{2}$ are essentially equivalent.

Let $f_\omega(c, N)$ be the number of integral solutions $(x, y) \in \mathbb{Z}^2$ of (2) satisfying $1 \le x \le N$ and $1 \le y$. In the special case $c = 1$ and $\omega = 0$, (2) becomes the classical Pell equation $x^2 - 2y^2 = \pm 1$. The integral solutions form a cyclic group generated by the smallest solution, so trivially
\[ f_0(1, N) = \frac{\log N}{\log(1 + \sqrt{2})} + O(1). \]
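The counting function $f_\omega(c, N)$ is easy to evaluate by direct enumeration for moderate $N$. The sketch below is an illustration under stated assumptions rather than anything from the paper: the name `count_pell_solutions` is hypothetical, and floating-point arithmetic is used for $(x+\omega)^2$, which is exact for $\omega = 0$ and small integers but only approximate in general.

```python
import math


def count_pell_solutions(omega, c, n_max):
    """Count integer pairs (x, y), 1 <= x <= n_max, y >= 1, with
    -c <= (x + omega)^2 - 2*y^2 <= c."""
    count = 0
    for x in range(1, n_max + 1):
        u2 = (x + omega) ** 2
        # Any admissible y satisfies (u2 - c)/2 <= y^2 <= (u2 + c)/2;
        # widen the integer window by one on each side to be safe.
        y_low = max(1, math.isqrt(int(max(0.0, (u2 - c) / 2.0))) - 1)
        y_high = math.isqrt(int((u2 + c) / 2.0)) + 1
        for y in range(y_low, y_high + 1):
            if -c <= u2 - 2 * y * y <= c:
                count += 1
    return count


# Homogeneous Pell case: compare with log N / log(1 + sqrt(2)).
n_max = 10_000
observed = count_pell_solutions(0.0, 1.0, n_max)
main_term = math.log(n_max) / math.log(1 + math.sqrt(2))
```

For $\omega = 0$, $c = 1$ the solutions found are exactly the pairs $(1,1), (3,2), (7,5), (17,12), \ldots$ generated by $1 + \sqrt{2}$, so `observed` and `main_term` differ by a bounded amount.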
In general, by a theorem of Lang [18], for every homogeneous Pell inequality the asymptotic number of integral solutions behaves like $\mathrm{const} \cdot \log N$ with uniformly bounded fluctuations. In contrast to this ‘deterministic behavior’ in the homogeneous case $\omega = 0$, the inhomogeneous case turns out to be entirely different: for a ‘randomly chosen’ $\omega$ the asymptotic number of solutions $f_\omega(c, N)$ as $N \to \infty$ possesses ‘perfect randomness’.

The two basic parameters of a random variable are the mean value and the variance. The mean value of $f_\omega(c, N)$ as $0 \le \omega < 1$ is
\[ \int_0^1 f_\omega(c, N)\, d\omega = \frac{c}{\sqrt{2}} \log N + O(1). \tag{4} \]
Formula (4) expresses the simple fact that the average number of lattice points contained in all the translated copies of a hyperbola domain is precisely the area of the domain. Since for any $1 < M < N$,
\[ f_\omega(c, N) - f_\omega(c, M) \ll \log N - \log M = \log\frac{N}{M}, \]
it is more natural to use the exponential scaling $f_\omega(c, e^N)$ instead of the linear one. The variance comes from the following limit formula: for any $c > 0$ there is a positive constant $\sigma = \sigma(c) > 0$ such that
\[ \lim_{N \to \infty} \frac{1}{N} \int_0^1 \Bigl( f_\omega(c, e^N) - \frac{c}{\sqrt{2}} N \Bigr)^2 d\omega = \sigma^2(c). \tag{5} \]
Formula (5) follows from Fourier analysis by using the arithmetic of $\mathbb{Q}(\sqrt{2})$. The first result is a Central limit theorem, see [5].

Theorem 1.1. For any $c > 0$, the renormalized counting function
\[ \frac{f_\omega(c, e^N) - \frac{c}{\sqrt{2}} N}{\sigma(c)\sqrt{N}}, \quad 0 \le \omega < 1, \]
has a standard normal limit distribution as $N \to \infty$; that is, for every fixed value of $\lambda$,
\[ \text{Lebesgue-measure}\Bigl\{ \omega \in [0, 1) : \frac{f_\omega(c, e^N) - \frac{c}{\sqrt{2}} N}{\sigma(c)\sqrt{N}} \le \lambda \Bigr\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-u^2/2}\, du \]
as $N \to \infty$. Here $\sigma(c) > 0$ is a constant depending on $c$ only; see (5).

What is the intuition behind Theorem 1.1? Write
\[ g_j(\omega) = f_\omega(c, e^j) - f_\omega(c, e^{j-1}), \quad j = 1, 2, \ldots, N. \]
That is, $g_j(\omega)$ is the number of integral solutions $(x, y)$ of (2) satisfying $e^{j-1} < x \le e^j$. The key observation is that the function $g_j(\omega)$ resembles the $j$th Rademacher function, so the sum
\[ f_\omega(c, e^N) - \frac{c}{\sqrt{2}} N = \sum_{j=1}^{N} \Bigl( g_j(\omega) - \frac{c}{\sqrt{2}} \Bigr), \]
as a function of $\omega \in [0, 1)$, behaves like a sum of $\approx N$ independent Bernoulli variables:
\[ f_\omega(c, e^N) - \frac{c}{\sqrt{2}} N \approx \pm 1 \pm 1 \pm \cdots \pm 1 \quad (\approx N \text{ terms}). \tag{6} \]

Let us go back to Theorem 1.1: what can we say about the sequence of those $n$’s which satisfy inequality (7) below (note that $-\infty < \lambda_1 < \lambda_2 < \infty$ are fixed)
\[ \frac{c}{\sqrt{2}} n + \lambda_1 \sigma \sqrt{n} < f_\omega(c, e^n) < \frac{c}{\sqrt{2}} n + \lambda_2 \sigma \sqrt{n} \tag{7} \]
for a ‘randomly chosen’ $\omega$? We proved that the ‘logarithmic density’ of these $n$’s does exist for every fixed $-\infty < \lambda_1 < \lambda_2 < \infty$, and equals $(2\pi)^{-1/2} \int_{\lambda_1}^{\lambda_2} e^{-u^2/2}\, du$.

Theorem 1.2 (Beck [5]). For almost every $\omega \in [0, 1)$,
\[ \lim_{N \to \infty} \frac{1}{\log N} \sum_{\substack{1 \le n \le N: \\ n \text{ satisfies (7)}}} \frac{1}{n} = \frac{1}{\sqrt{2\pi}} \int_{\lambda_1}^{\lambda_2} e^{-u^2/2}\, du \tag{8} \]
holds for all fixed $-\infty < \lambda_1 < \lambda_2 < \infty$.

Note that the logarithmic density law for the random walk was discovered by Paul Lévy (see Chapter 3 in [8]) in the late 1930s.

As an analog of Khintchine’s famous Law of the iterated logarithm for the oscillation of the random walk (see, e.g. Chapter 8 in [8]), we can show that the number of solutions $f_\omega(c, e^n)$ of (2) oscillates between the sharp bounds
\[ \frac{c}{\sqrt{2}} n - \sigma\sqrt{n}\sqrt{2 \log\log n} < f_\omega(c, e^n) < \frac{c}{\sqrt{2}} n + \sigma\sqrt{n}\sqrt{2 \log\log n} \]
as $n \to \infty$ for almost every $\omega$. More precisely, given arbitrarily small but fixed $\varepsilon > 0$, for almost every $\omega$, the inequalities
\[ \frac{c}{\sqrt{2}} n - \sigma\sqrt{n}\sqrt{(2 + \varepsilon) \log\log n} < f_\omega(c, e^n) < \frac{c}{\sqrt{2}} n + \sigma\sqrt{n}\sqrt{(2 + \varepsilon) \log\log n} \]
hold for all sufficiently large $n$, i.e. if $n > n_0(\omega, \varepsilon)$. On the other hand, given arbitrarily small but fixed $\varepsilon > 0$, for almost every $\omega$,
\[ \frac{c}{\sqrt{2}} n - \sigma\sqrt{n}\sqrt{(2 - \varepsilon) \log\log n} > f_\omega(c, e^n) \]
holds for infinitely many values of $n$. Similarly,
\[ f_\omega(c, e^n) > \frac{c}{\sqrt{2}} n + \sigma\sqrt{n}\sqrt{(2 - \varepsilon) \log\log n} \]
holds for infinitely many values of $n$ as well. What we actually formulate is the ‘ultimate’ Kolmogorov–Erdős form, which contains Khintchine’s form as an easy corollary.
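The mean value formula (4) and the normalization in (5) can be checked empirically by sampling the shift $\omega$ on a grid. This is only a rough numerical sketch under assumptions: the helper names are hypothetical, the integral over $\omega$ is replaced by a midpoint average, floating-point arithmetic is used throughout, and the range $e^N$ is kept very small, so the output says nothing rigorous about the limit.

```python
import math


def count_solutions(omega, c, x_max):
    """Number of (x, y), 1 <= x <= x_max, y >= 1, with |(x+omega)^2 - 2y^2| <= c."""
    count = 0
    for x in range(1, x_max + 1):
        u2 = (x + omega) ** 2
        y_low = max(1, math.isqrt(int(max(0.0, (u2 - c) / 2.0))) - 1)
        y_high = math.isqrt(int((u2 + c) / 2.0)) + 1
        for y in range(y_low, y_high + 1):
            if -c <= u2 - 2 * y * y <= c:
                count += 1
    return count


def sample_mean_and_scaled_variance(c, big_n, samples):
    """Midpoint-rule estimates of the mean in (4) and of the quantity in (5),
    evaluated at the exponential scale x_max = floor(e^big_n)."""
    x_max = int(math.exp(big_n))
    center = c / math.sqrt(2) * big_n
    values = [count_solutions((k + 0.5) / samples, c, x_max) for k in range(samples)]
    mean = sum(values) / samples
    scaled_variance = sum((v - center) ** 2 for v in values) / samples / big_n
    return mean, scaled_variance


mean, scaled_variance = sample_mean_and_scaled_variance(1.0, 6, 200)
predicted_mean = 1.0 / math.sqrt(2) * 6
```

The sample mean should sit within $O(1)$ of `predicted_mean`; `scaled_variance` is a crude finite-$N$ stand-in for $\sigma^2(c)$ and is not claimed to be close to the limit.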
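Theorem 1.3 (Beck [5]). Let $\varphi(n)$ be an arbitrary positive increasing function of $n$. For almost every $\omega$,
\[ f_\omega(c, e^n) > \frac{c}{\sqrt{2}} n + \varphi(n)\,\sigma\sqrt{n} \tag{9} \]
holds for infinitely many $n$’s if and only if the series
\[ \sum_{n=1}^{\infty} \frac{\varphi(n)}{n}\, e^{-\varphi^2(n)/2} \]
diverges. Exactly the same holds for the other inequality
\[ f_\omega(c, e^n) < \frac{c}{\sqrt{2}} n - \varphi(n)\,\sigma\sqrt{n}. \tag{9$'$} \]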
It is well-known that both the Law of the iterated logarithm and Lévy’s law (8) for the random walk are distant consequences of the Central limit theorem. In particular, the Law of the iterated logarithm corresponds to the case $\lambda = \lambda(N) \approx \sqrt{2 \log\log N}$ in the Central limit theorem. Note that for even moderately large values of $\lambda$ like $\lambda \approx \log N$, the Central limit theorem ‘becomes’ the following limit problem:
\[ \frac{\text{Lebesgue-measure}\bigl\{ \omega \in [0, 1) : \frac{c}{\sqrt{2}} N + \lambda\sigma\sqrt{N} < f_\omega(c, e^N) \bigr\}}{\frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^2/2}\, du} \to 1 \tag{10} \]
where both $\lambda$ and $N$ tend to infinity. Results like (10) are called ‘Large deviation’ theorems in Probability theory (see e.g. [9]).

Theorem 1.4 (Beck [5]). If $\lambda = \lambda(N)$ varies with $N$ in such a way that $0 < \lambda < N^{\mathrm{const}}$ (where const means a ‘small’ positive absolute constant), then (10) is true. Changing $\lambda$ into $-\lambda$ we obtain exactly the same result for the ‘left tail’.

However, if $\lambda$ is around $\sqrt{N}$, (10) fails even in the ‘ideal case’ of independent random variables. So the following exciting question arises: Let $c > 0$ be fixed. Is it possible that the ‘Law of large numbers’
\[ \frac{f_\omega(c, e^n)}{n} \to \frac{c}{\sqrt{2}} \quad\text{as } n \to \infty \tag{11} \]
holds for every single $\omega \in [0, 1)$? Of course, Theorem 1.3 implies (11) for almost every $\omega$. The answer to this question is negative.

Theorem 1.5 (Beck [5]). For every $c > 0$ there are continuum many ‘divergence points’ $\omega^* = \omega^*(c) \in [0, 1)$ such that
\[ \limsup_{n \to \infty} \frac{f_{\omega^*}(c, e^n)}{n} > \frac{c}{\sqrt{2}} \tag{12} \]
and
\[ \liminf_{n \to \infty} \frac{f_{\omega^*}(c, e^n)}{n} < \frac{c}{\sqrt{2}}. \tag{12$'$} \]
It is easy to see that the fluctuation in (12) is as large as possible, so (12) can be interpreted as an ‘extra large discrepancy’ result (see Section 1).

Beyond quadratic irrationals: In view of factorization (3), (2) and the inhomogeneous inequality
\[ \|n\alpha - \beta\| < \gamma/2n \tag{13} \]
are essentially equivalent. So Theorems 1.1–1.5 describe the randomness of the asymptotic number of the integral solutions of (13) when $\alpha = \sqrt{2}$, or any other quadratic irrational, and $\beta$ is the variable. These results are corollaries of (6), which is an ‘almost independence’ property in the inhomogeneous local case. Of course, the analog of Theorems 1.1–1.5 (and of (6)) must be true for a much bigger class of $\alpha$’s. To obtain a ‘simple’ norming factor (see (5)), we need some kind of ‘regularity’ for the sequence of the partial quotients $a_1, a_2, a_3, \ldots$ of $\alpha$. For example,
\[ e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \ldots] = [2; \ldots, 1, 2i, 1, \ldots], \tag{14} \]
\[ \sqrt{e} = [1; 1, 1, 1, 5, 1, 1, 9, 1, 1, 13, 1, 1, \ldots] = [1; \ldots, 1, 4i + 1, 1, \ldots], \tag{15} \]
\[ e^2 = [7; 2, 1, 1, 3, 18, 5, 1, 1, 6, 30, \ldots] = [7; \ldots, 3i - 1, 1, 1, 3i, 12i + 6, \ldots], \]
and so on. For these special numbers we can prove the perfect analog of the probabilistic theorems above.

What about the class of ‘badly approximable’ numbers (i.e., $a_i = O(1)$)? The only technical, or aesthetic, problem with badly approximable numbers is as follows: though the Central limit theorem and the other results hold true, the norming factor oscillates between two constant multiples of $\sqrt{N}$, and the limit in (5) does not necessarily exist. In other words, for this class the Central limit theorem (and the others) are not so ‘elegant’.

Global Case (Lattice points in tilted rectangles): Theorems 1.1–1.5 describe the ‘randomness’ of the local behavior of $n\sqrt{2} \bmod 1$. We have surprisingly similar results on the global behavior of $n\sqrt{2} \bmod 1$, too. The global version of a local inequality $\|n\alpha - \beta\| < \gamma/2n$ is
\[ \|n\alpha - \beta\| < \gamma/2 \tag{16} \]
where $0 < \gamma < 1$ is a constant. What (16) means is the following. If $\alpha$ is irrational then the sequence $n\alpha \bmod 1$ is uniformly distributed in the unit interval $[0, 1)$. So among the first $N$ members of the $n\alpha \bmod 1$ sequence, there are around $\gamma N$ in the interval $I = (\beta - \gamma/2, \beta + \gamma/2)$ (mod 1) of length $\gamma$. Formally, let $G(\alpha; \beta, \gamma; N)$ denote the number of integral solutions $n \in \mathbb{N}$ of (16) satisfying $1 \le n \le N$. The equidistribution theorem states that for every irrational $\alpha$ and for every interval $I = (\beta - \gamma/2, \beta + \gamma/2)$ of length $|I| = \gamma < 1$,
\[ \frac{G(\alpha; \beta, \gamma; N)}{N} \to |I| = \gamma \quad\text{as } N \to \infty. \]
It is, therefore, natural to study the discrepancy
\[ D(\alpha; I; N) = D(\alpha; \beta, \gamma; N) = G(\alpha; \beta, \gamma; N) - \gamma N \tag{17} \]
where $I = (\beta - \gamma/2, \beta + \gamma/2)$ runs over all the subintervals of $[0, 1)$. The classical works of Hardy and Littlewood [12,13], and Ostrowski [20] give an upper bound on $D(\alpha; I; N)$ in terms of the partial quotients $a_i$ of $\alpha$: Let $p_i/q_i$ be the $i$th convergent of $\alpha$, let $s$ be defined by $q_s \le N < q_{s+1}$, and write $a_{s+1}(N) = N/q_s$. Now the upper bound is
\[ \max_I |D(\alpha; I; N)| < 3\bigl(a_1 + a_2 + \cdots + a_s + a_{s+1}(N)\bigr). \tag{18} \]
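The discrepancy (17) and the bound (18) can be compared numerically for $\alpha = \sqrt{d}$. The sketch below is an illustration with hypothetical function names; it assumes the indexing $q_0 = 1$, $q_1 = a_1$ for the convergent denominators, uses the standard integer recurrence for the continued fraction of $\sqrt{d}$, and evaluates $n\alpha \bmod 1$ in floating point, which is adequate only for moderate $N$.

```python
import math


def sqrt_partial_quotients(d, count):
    """First `count` partial quotients a_0, a_1, ... of sqrt(d) (d not a square),
    computed exactly with the classical (m, den, a) integer recurrence."""
    a0 = math.isqrt(d)
    m, den, a = 0, 1, a0
    quotients = [a0]
    while len(quotients) < count:
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den
        quotients.append(a)
    return quotients


def ostrowski_bound(quotients, n):
    """Right-hand side of (18): 3 * (a_1 + ... + a_s + n / q_s), q_s <= n < q_{s+1}."""
    q_prev, q = 0, 1              # q_{-1} and q_0 (assumed indexing)
    total = 0
    for a in quotients[1:]:
        q_next = a * q + q_prev
        if q_next > n:
            break
        total += a
        q_prev, q = q, q_next
    else:
        raise ValueError("not enough partial quotients supplied for this n")
    return 3 * (total + n / q)


def discrepancy(alpha, beta, gamma, n):
    """D = G - gamma * n, where G counts 1 <= k <= n with ||k*alpha - beta|| < gamma/2."""
    hits = 0
    for k in range(1, n + 1):
        t = (k * alpha - beta) % 1.0
        if min(t, 1.0 - t) < gamma / 2:
            hits += 1
    return hits - gamma * n


n = 5000
bound = ostrowski_bound(sqrt_partial_quotients(2, 30), n)
samples = [discrepancy(math.sqrt(2), b / 7, g / 5, n) for b in range(7) for g in range(1, 5)]
```

For $\sqrt{2}$ the sampled discrepancies are far below the bound, which is consistent with (18) being a worst-case estimate.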
The intuition about the global behavior of $n\alpha \bmod 1$ is as follows: For a fixed irrational $\alpha$, the discrepancy function $D(\alpha; I; n)$ behaves like a sum of independent random variables
\[ D(\alpha; I; n) \approx \pm a_1 \pm a_2 \pm \cdots \pm a_s \pm a_{s+1}(n) \tag{19} \]
as both $n$ and $I$ vary in $1 \le n \le N$ and $I \subset [0, 1)$, respectively. (That is, we have 3 variables.) Heuristic (19) is a sort of global analog of the local heuristic (6). Let us emphasize that the ‘almost independence’ in (6) and (19) is completely different from the well-known almost independence property of the distribution of the partial quotients $a_i = a_i(\alpha)$ as $\alpha$ varies in an interval (see the classical works of Kusmin, Paul Lévy and Khintchine in [16]). In our case $\alpha$ is fixed (like $\alpha = \sqrt{2}$).

It is an important program to justify intuition (19) for every single $\alpha$. We worked out the details in the special case $\alpha = \sqrt{2}$ (see [1]), and as a corollary obtained that the renormalized discrepancy function
\[ \frac{D(\sqrt{2}; \beta, \gamma; n)}{\sqrt{\dfrac{\log N}{12\sqrt{2}\,\log(1 + \sqrt{2})}}} \]
has a standard normal limit distribution as $0 \le \beta < 1$, $0 \le \gamma < 1$, $1 \le n \le N$. In other words, we have a 3-parameter central limit theorem: for every fixed $-\infty < \lambda < \infty$,
\[ \mathrm{volume}\bigl\{ (\beta, \gamma, \xi) \in [0, 1)^3 : D(\sqrt{2}; \beta, \gamma; \xi N) < \lambda c_0 \sqrt{\log N} \bigr\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-u^2/2}\, du \tag{20} \]
as $N \to \infty$. Here $c_0 = 1/\sqrt{12\sqrt{2}\,\log(1 + \sqrt{2})}$. Observe that (20) is a sort of global version of Theorem 1.1. Of course the proof of the special case $\alpha = \sqrt{2}$ easily extends to the whole class of quadratic irrationals.

Can we get a similar 3-parameter central limit theorem for a ‘randomly chosen’ $\alpha$? The answer is rather surprising: in contrast to the local case, in the global case we cannot expect a normal limit distribution. The reason is that a necessary condition for a normal limit distribution is the relation
\[ \frac{\max_{1 \le i \le s} a_i^2}{a_1^2 + a_2^2 + \cdots + a_s^2} \to 0 \quad\text{as } s \to \infty. \tag{21} \]
This means that the components are ‘individually negligible’. However, a basic theorem in the theory of continued fractions, the so-called Gauss–Kusmin theorem (see [16]), which describes the asymptotic distribution of the partial quotients $a_i(\alpha)$ as $i \to \infty$, implies that (21) actually fails for almost every $\alpha$. Indeed, according to the Gauss–Kusmin theorem, for a typical $\alpha \in [0, 1)$, among the first $s$ partial quotients $a_1, a_2, \ldots, a_s$ of $\alpha$ ($s$ is ‘large’), we can expect (roughly speaking) one partial quotient between $s/2$ and $s$, two partial quotients between $s/4$ and $s/2$, four partial quotients between $s/8$ and $s/4$, eight partial quotients between $s/16$ and $s/8$, and so on. This distribution clearly contradicts (21).

As we already said, (20) is a global version of Theorem 1.1. The global version of the ‘large deviation’ Theorem 1.4 is as follows. The problem is to prove the limit of the ratio
\[ \frac{\mathrm{volume}\Bigl\{ (\beta, \gamma, \xi) \in [0, 1)^3 : D(\sqrt{2}; \beta, \gamma; \xi N) > \lambda \sqrt{\dfrac{\log N}{12\sqrt{2}\,\log(1 + \sqrt{2})}} \Bigr\}}{\frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^2/2}\, du} \to 1 \tag{22} \]
where both $\lambda$ and $N$ tend to infinity.

Theorem 2.1 (Beck [6]). If $\lambda = \lambda(N)$ varies with $N$ in such a way that $0 < \lambda < (\log N)^{\mathrm{const}}$ (where const means a ‘small’ positive absolute constant), then (22) is true. Changing $\lambda$ into $-\lambda$ we obtain exactly the same result for the ‘left tail’.

Next we formulate the analogs of Theorems 1.2–1.3. What can we say about the set of those positive real numbers $x \in \mathbb{R}^+$ which satisfy inequality (23) below
\[ \lambda_1 < \frac{D(\sqrt{2}; I; e^x)}{\sqrt{\dfrac{x}{12\sqrt{2}\,\log(1 + \sqrt{2})}}} < \lambda_2 \tag{23} \]
for a ‘randomly chosen’ subinterval $I$ of $[0, 1)$?

Theorem 2.2 (Beck [6]). For almost every subinterval $I$ of $[0, 1)$,
\[ \lim_{N \to \infty} \frac{1}{\log N} \int_{\substack{0 \le x \le N: \\ x \text{ satisfies (23)}}} \frac{dx}{x} = \frac{1}{\sqrt{2\pi}} \int_{\lambda_1}^{\lambda_2} e^{-u^2/2}\, du \tag{24} \]
holds for all $-\infty < \lambda_1 < \lambda_2 < \infty$.

Why did we switch to real numbers (and to an integral) in Theorem 2.2? The reason is a basic difference between the local and global counting functions. We recall: if $1 \le M < N$ then
\[ f_\omega(c, N) - f_\omega(c, M) \ll \log N - \log M = \log\frac{N}{M}, \]
and so $f_\omega(c, n)$ does not change by more than $O(1)$ as (say) $N/e < n \le N$. It is, therefore, enough to study the asymptotic behavior of $f_\omega(c, N)$ on an exponentially rare
subsequence like $N = e^n$, $n \in \mathbb{N}$. On the other hand, for the global function $G(\sqrt{2}; \beta, \gamma; N)$: if $0 < M < N$ then the difference $G(\sqrt{2}; \beta, \gamma; N) - G(\sqrt{2}; \beta, \gamma; M)$ can deviate from its expected value $\gamma(N - M)$ by as much as a constant multiple of $\log(N - M)$. Hence the global discrepancy function $D(\sqrt{2}; \beta, \gamma; n)$ might change by as much as its maximum $\log N$ on a short interval like $N - \sqrt{N} < n < N$. So in the global case we cannot restrict ourselves to an exponentially rare subsequence of integers.

The ‘global law of the iterated logarithm’ goes as follows.

Theorem 2.3 (Beck [6]). Let $\varphi(n)$ be an arbitrary positive increasing function of $n$. Consider the set $\{x \in [0, \infty) : x \text{ satisfies (25)}\}$ where
\[ D(\sqrt{2}; I; e^x) > \varphi(x) \sqrt{\frac{x}{12\sqrt{2}\,\log(1 + \sqrt{2})}}. \tag{25} \]
For almost every subinterval $I$ of $[0, 1)$, the Lebesgue measure of the set $\{x \in [0, \infty) : x \text{ satisfies (25)}\}$ is $\infty$ if and only if the integral
\[ \int_1^{\infty} \frac{\varphi(x)}{x}\, e^{-\varphi^2(x)/2}\, dx \]
diverges. Exactly the same holds for the other inequality
\[ D(\sqrt{2}; I; e^x) < -\varphi(x) \sqrt{\frac{x}{12\sqrt{2}\,\log(1 + \sqrt{2})}}. \tag{25$'$} \]

Remark: Switching back to the linear scaling we have:
\[ \text{Lebesgue-measure}\{x \in [0, \infty) : x \text{ satisfies (25) with } e^x = y\} = \infty \iff \int_{\{y \in [1, \infty) :\ x = \log y \text{ satisfies (25)}\}} \frac{dy}{y} = \infty. \]
In other words, the logarithmic density is infinite.

Now if we drop the rather strong infinite logarithmic density requirement $\sum' 1/n = \infty$, and just want infinitely many integers $n$ for which $D(\sqrt{2}; I; n)$ is ‘large’, then the fluctuation becomes much bigger than the $\approx \sqrt{\log n \cdot \log\log\log n}$ that the law of the iterated logarithm gives. Indeed, the fluctuation becomes as large as $\log n$, i.e. as large as possible. In fact Vera T. Sós [24] proved that for every $\alpha$ (not just for quadratic irrationals) and for almost every subinterval $I$ of $[0, 1)$,
\[ \limsup_{n \to \infty} \frac{D(\alpha; I; n)}{\log n} > 0. \tag{26} \]
Note that Halász [11] and independently Tijdeman and Wagner [25] proved far-reaching generalizations of (26) for arbitrary sequences (not just for the $n\alpha \bmod 1$ sequences). Sós’ theorem can be interpreted as a global ‘extra large discrepancy’ result. In other words, it is a sort of global version of Theorem 1.5.
Simultaneous Case: The main difficulty is that the theory of continued fractions does not seem to extend to higher dimensions. Some of the most natural problems, like Littlewood’s famous conjecture, are still completely hopeless. Here we just mention one result which extends a 70-year-old theorem of Khintchine [17] to higher dimensions (see [4]). Khintchine proved, by using continued fractions, that for almost every $\alpha$, the usual interval discrepancy of the $n\alpha \bmod 1$ sequence is between $\log n \cdot (\log\log n)^{1 \pm \varepsilon}$. We proved, by using a completely different approach, that for almost every $(\alpha_1, \ldots, \alpha_k) \in \mathbb{R}^k$, the usual box discrepancy of the $(n\alpha_1, \ldots, n\alpha_k) \bmod 1$ sequence is between $(\log n)^k \cdot (\log\log n)^{1 \pm \varepsilon}$. Similarly to Khintchine, we in fact proved a precise ‘convergence–divergence criterion’ (i.e., a Borel–Cantelli type theorem).

Note that one can easily formulate the analogs of the above-mentioned ‘one-dimensional’ probabilistic local and global theorems in the simultaneous case. We are pessimistic: we do not believe in any normal limit distribution in higher dimensions. The reason is that the same difficulty appears here as in Littlewood’s notoriously intractable problem $\liminf n\,\|n\alpha\|\,\|n\beta\| = 0$ for any pair of reals $\alpha$ and $\beta$.

Note that there were some earlier attempts to give a systematic study of probabilistic methods in diophantine approximations (see, e.g. Kac [14] and Kemperman [15]), but they were discussing $n\alpha \pmod 1$ for almost every $\alpha$ only, and did not say anything about concrete sequences like $n\sqrt{2} \pmod 1$.
3. Continued fractions and quadratic fields

In Section 2 we studied the local and global behavior of the sequence $n\alpha \pmod 1$ for concrete values of $\alpha$. So far it did not make any difference whether $\alpha = \sqrt{2}$ or $\sqrt{3}$. But there is a substantial difference between the real quadratic fields $\mathbb{Q}(\sqrt{2})$ and $\mathbb{Q}(\sqrt{3})$: the corresponding fundamental units $1 + \sqrt{2}$ and $2 + \sqrt{3}$ have norm $-1$ and $1$, respectively. In other words, in contrast to $x^2 - 2y^2 = -1$, which has infinitely many integral solutions, the equation $x^2 - 3y^2 = -1$ does not have an integer solution. This simple fact will play an important role in what follows.

Consider the classical diophantine series ($\alpha$ is a fixed irrational and $\{\ldots\}$ stands for the fractional part)
\[ S_\alpha(n) = \sum_{k=1}^{n} \bigl( \{k\alpha\} - 1/2 \bigr). \tag{1} \]
This series has been thoroughly discussed by Hardy and Littlewood, Hecke, Ostrowski, Behnke, and more recently by Vera T. Sós and others. They concentrated on the maximum fluctuations as $n \to \infty$. We focused on the typical fluctuations, and managed to prove an elegant Central limit theorem for individual $\alpha$’s, including the class of quadratic irrationals. In particular for $\alpha = \sqrt{2}$ this Central limit theorem goes as follows, see [5].
Theorem 3.1. There is a positive absolute constant $c$ such that
\[ \frac{1}{N} \Bigl| \Bigl\{ 1 \le n \le N : \frac{S_{\sqrt{2}}(n)}{c\sqrt{\log n}} \le \lambda \Bigr\} \Bigr| \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-u^2/2}\, du \]
for every fixed value of $\lambda$ as $N \to \infty$.

The reason behind a Central limit theorem is usually some kind of ‘independence’. Where does the ‘independence’ in Theorem 3.1 come from? To understand it, consider the $j$th convergent denominator $q_j$ of
\[ \sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}} = [1; 2, 2, 2, \ldots]: \]
\[ q_j = \frac{1}{2\sqrt{2}} \bigl( (1 + \sqrt{2})^j - (1 - \sqrt{2})^j \bigr). \tag{2} \]
Note that replacing $\sqrt{2}$ with the golden ratio $(\sqrt{5} - 1)/2$, $q_j$ becomes the $j$th Fibonacci number. It is well-known that every positive integer can be uniquely represented as a sum of distinct Fibonacci numbers if we do not allow two consecutive ones. Similarly, every positive integer $n$ can be uniquely written in the form
\[ n = \sum_{j=0}^{l} b_j q_j, \tag{3} \]
where $b_j = b_j(n) \in \{0, 1, 2\}$, if we make the extra restriction $b_{i+1} = 2 \Rightarrow b_i = 0$. Let
\[ t_1 = t_1(n) = \sum_{k=1}^{b_l q_l} \bigl( \{k\sqrt{2}\} - 1/2 \bigr), \]
\[ t_2 = t_2(n) = \sum_{k = b_l q_l + 1}^{b_l q_l + b_{l-1} q_{l-1}} \bigl( \{k\sqrt{2}\} - 1/2 \bigr), \]
\[ t_3 = t_3(n) = \sum_{k = b_l q_l + b_{l-1} q_{l-1} + 1}^{b_l q_l + b_{l-1} q_{l-1} + b_{l-2} q_{l-2}} \bigl( \{k\sqrt{2}\} - 1/2 \bigr), \]
and so on. The last term is
\[ t_{l+1} = t_{l+1}(n) = \sum_{k = n - b_0 q_0 + 1}^{n} \bigl( \{k\sqrt{2}\} - 1/2 \bigr). \]
Thus by (3) we get the following decomposition of sum (1):
\[ S_{\sqrt{2}}(n) = t_1(n) + t_2(n) + t_3(n) + \cdots + t_{l+1}(n). \tag{4} \]
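The digit expansion (3) and the block decomposition (4) can be sketched in a few lines. The code below is an illustration, not the paper's construction: the function names are hypothetical, the denominators are taken to be $1, 2, 5, 12, \ldots$ (the values $q_1, q_2, \ldots$ of formula (2), an assumption about the intended indexing), and the digits are produced by the greedy algorithm, which for this sequence yields digits in $\{0, 1, 2\}$ obeying the restriction $b_{i+1} = 2 \Rightarrow b_i = 0$.

```python
import math


def pell_denominators(limit):
    """Denominators 1, 2, 5, 12, ... (q_{j+1} = 2 q_j + q_{j-1}) not exceeding `limit`."""
    qs = [1, 2]
    while 2 * qs[-1] + qs[-2] <= limit:
        qs.append(2 * qs[-1] + qs[-2])
    return [q for q in qs if q <= limit]


def greedy_digits(n):
    """Pairs (b_j, q_j), most significant first, with n = sum of b_j * q_j."""
    pairs = []
    for q in reversed(pell_denominators(n)):
        b, n = divmod(n, q)
        pairs.append((b, q))
    return pairs


def block_sums(n):
    """The terms t_1, t_2, ... of decomposition (4) for alpha = sqrt(2)."""
    alpha = math.sqrt(2)
    sums, start = [], 0
    for b, q in greedy_digits(n):
        length = b * q
        sums.append(sum((k * alpha) % 1.0 - 0.5 for k in range(start + 1, start + length + 1)))
        start += length
    return sums


def direct_sum(n):
    """S_{sqrt(2)}(n) computed straight from definition (1)."""
    alpha = math.sqrt(2)
    return sum((k * alpha) % 1.0 - 0.5 for k in range(1, n + 1))
```

Calling `max(abs(t) for t in block_sums(n))` for various `n` gives a quick look at how small the individual blocks stay while their number grows like $\log n$.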
Note that here $l \approx \log n$, and each term $t_i = t_i(n)$ is between $-2$ and $2$. But the crucial observation is that the terms $t_i = t_i(n)$ are ‘almost independent’ in the precise sense that the correlation between $t_i = t_i(n)$ and $t_{i+d} = t_{i+d}(n)$ is going down exponentially fast (as a function of $d$) as $n$ runs through the integers $1, 2, \ldots, N$. In this sense decomposition (4) resembles the partial sums of a lacunary Fourier series like
\[ \sum_{i=1}^{\infty} \sin(\lambda_i x), \quad \frac{\lambda_{i+1}}{\lambda_i} \ge 1 + \sqrt{2}, \]
for which a Central limit theorem was proved in the 1940s and 1950s. An adaptation of the proof of this classical result gives Theorem 3.1.

Now what happens if in Theorem 3.1 we replace $\sqrt{2}$ with $\sqrt{3}$? Well, the big difference is that the mean value of
\[ S_{\sqrt{3}}(n) = \sum_{k=1}^{n} \bigl( \{k\sqrt{3}\} - 1/2 \bigr) \]
is not zero:
\[ \frac{1}{N + 1} \sum_{n=0}^{N} S_{\sqrt{3}}(n) = \frac{\log N}{12 \log(2 + \sqrt{3})} + O(1). \tag{5} \]
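The average on the left-hand side of (5) can be computed directly from the partial sums, or, after exchanging the order of summation, as a weighted sum of the terms $\{n\alpha\} - 1/2$ with weights $1 - n/(N+1)$. The following sketch checks numerically that the two forms agree; the function names are hypothetical, floating-point arithmetic is used, and no claim is made about how close the value is to the main term in (5) for any particular $N$.

```python
import math


def partial_sums(alpha, n_max):
    """[S(0), S(1), ..., S(n_max)] with S(n) = sum_{k<=n} ({k*alpha} - 1/2)."""
    sums = [0.0]
    for k in range(1, n_max + 1):
        sums.append(sums[-1] + (k * alpha) % 1.0 - 0.5)
    return sums


def mean_of_partial_sums(alpha, n_max):
    """(1 / (N + 1)) * sum_{n=0}^{N} S(n)."""
    return sum(partial_sums(alpha, n_max)) / (n_max + 1)


def weighted_form(alpha, n_max):
    """sum_{n=1}^{N} (1 - n / (N + 1)) * ({n*alpha} - 1/2)."""
    return sum((1 - n / (n_max + 1)) * ((n * alpha) % 1.0 - 0.5)
               for n in range(1, n_max + 1))


n_max = 2000
mean_sqrt3 = mean_of_partial_sums(math.sqrt(3), n_max)
mean_sqrt2 = mean_of_partial_sums(math.sqrt(2), n_max)
main_term_sqrt3 = math.log(n_max) / (12 * math.log(2 + math.sqrt(3)))
```

Printing `mean_sqrt3`, `mean_sqrt2` and `main_term_sqrt3` for a range of `n_max` values is one way to see the slow logarithmic drift for $\sqrt{3}$ against the bounded behavior for $\sqrt{2}$.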
The left-hand side of (5) is called the Cesàro mean of the series $\sum (\{n\sqrt{3}\} - 1/2)$. In general, let us define the Cesàro mean of (1):
\[ T_\alpha(N) = \frac{1}{N + 1} \sum_{n=0}^{N} S_\alpha(n) = \sum_{n=1}^{N} \Bigl( 1 - \frac{n}{N + 1} \Bigr) \bigl( \{n\alpha\} - 1/2 \bigr). \tag{6} \]
What is the reason that the Cesàro mean $T_\alpha(N)$ is $O(1)$ for $\alpha = \sqrt{2}$, but it is a constant times $\log N$ for $\alpha = \sqrt{3}$? What can we say about the Cesàro mean $T_\alpha(N)$ for an arbitrary real $\alpha$? Well, this question has a surprisingly simple and elegant answer.

Theorem 3.2. For arbitrary $N$
\[ T_\alpha(N) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^l a_l}{12} + O\Bigl( \max_{1 \le i \le l} a_i \Bigr), \]
where $\alpha$ has the continued fraction expansion
\[ \alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_0; a_1, a_2, a_3, \ldots], \]
$q_i$ is the $i$th convergent denominator of $\alpha$, and $l$ is the first index for which $q_l \ge N$.

Remark: In particular, if $N = q_l - 1$ then we can prove a bounded error term:
\[ \Bigl| T_\alpha(N) - \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^l a_l}{12} \Bigr| < 20. \]
A proof of this special case can be found in [3]. In the Appendix we are going to derive Theorem 3.2 from this particular case.
Now we have a perfect understanding of the difference between $\alpha = \sqrt{2}$ and $\sqrt{3}$. For $\sqrt{3} = [1; 1, 2, 1, 2, 1, 2, \ldots]$ the period is $1, 2$. Since
\[ p_{2i-1} \pm \sqrt{3}\, q_{2i-1} = (2 \pm \sqrt{3})^i, \]
we have
\[ q_{2i-1} = \frac{1}{2\sqrt{3}} \bigl( (2 + \sqrt{3})^i - (2 - \sqrt{3})^i \bigr). \]
Therefore,
\[ q_{2i-1} \approx \frac{1}{2\sqrt{3}} (2 + \sqrt{3})^i \approx N \]
implies that $i = \log N / \log(2 + \sqrt{3}) + O(1)$. By Theorem 3.2 the Cesàro mean equals
\[ T_{\sqrt{3}}(N) = \frac{-1 + 2 - 1 + 2 \mp \cdots - 1 + 2}{12} + O(1) = \frac{-1 + 2}{12} \cdot \frac{\log N}{\log(2 + \sqrt{3})} + O(1), \]
proving (5).

On the other hand, for $\sqrt{2} = [1; 2, 2, 2, \ldots]$ the alternating sum $-2 + 2 - 2 + 2 \mp \cdots$ in Theorem 3.2 cancels out. This is the reason why for any quadratic irrational $\alpha$ for which the length of the continued fraction period is odd, the Cesàro mean $T_\alpha(N) = O(1)$. Is this observation useful? Yes, for the subclass $\alpha = \sqrt{p}$, $p$ prime, we know exactly the parity of the length of the period: odd if $p \equiv 1 \pmod 4$, even if $p \equiv 3 \pmod 4$. Unfortunately, we do not have a good characterization like this for the whole class of quadratic irrationals.

How about special numbers like $e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \ldots, 1, 2i, 1, \ldots]$? Well, the alternating sum
\[ (-1 + 2 - 1) + (1 - 4 + 1) + (-1 + 6 - 1) + \cdots + (-1)^i (1 - 2i + 1) \]
equals $i - 1$ if $i$ is odd, and $-i$ if $i$ is even. Thus by Theorem 3.2 we have
\[ T_e(N) = O(\log N / \log\log N), \tag{7} \]
which is the true order of magnitude.

Cesàro sum of $\sum (\{n\alpha\} - 1/2)$ and quadratic fields: In view of Theorem 3.2 the Cesàro sum $T_\alpha(N)$ has a surprisingly simple formula. The proof is not easy, but it was equally difficult to find the right conjecture. What was the motivation to guess the right formula? To explain it, we discuss an alternative approach to figure out the Cesàro mean $T_\alpha(N)$. We start with the well-known Fourier series expansion of the fractional part function (warning: it is not absolutely convergent)
\[ \{x\} = \frac{1}{2} - \sum_{n=1}^{\infty} \frac{\sin(2\pi n x)}{\pi n}. \tag{8} \]
Substituting it back to (6), after some straightforward manipulations we obtain N 1 1 X + O max ai ; T (N ) = − 16i6l 2 n tan(n) n=1
(9)
where $l$ is the least index for which $q_l \ge N$. Let $\alpha = \sqrt{d}$, $d$ square-free positive integer. We clearly have ($m$ is the nearest integer to $n\sqrt{d}$):

$$\tan(\pi n\sqrt{d}) \approx \pm\pi\|n\sqrt{d}\| = \pi(n\sqrt{d} - m) \approx \frac{-\pi(m^2 - dn^2)}{2n\sqrt{d}}. \tag{10}$$

In view of (9) and (10), the following formula is not too surprising:

$$T_{\sqrt{d}}(N) = \frac{\sqrt{d}}{\pi^2}\sum_{x=1}^{N}\sum_{y=1}^{N} \frac{1}{x^2 - dy^2} + O(1). \tag{11}$$

If $d \equiv 3 \pmod 4$ then $x^2 - dy^2$ is the norm of the algebraic integer $x + \sqrt{d}\,y$ in $\mathbb{Q}(\sqrt{d})$. We recall some basic definitions from the theory of quadratic fields. Let $D$ be a square-free positive or negative integer, and consider the quadratic field $\mathbb{Q}(\sqrt{D})$. The discriminant of $\mathbb{Q}(\sqrt{D})$ is $4D$ if $D \equiv 2$ or $3 \pmod 4$, and $D$ if $D \equiv 1 \pmod 4$. $(a + b\sqrt{D})/2$ is an algebraic integer in $\mathbb{Q}(\sqrt{D})$ iff $a$ and $b \in \mathbb{Z}$ are integers satisfying $a \equiv b \equiv 0 \pmod 2$ when $D \equiv 2$ or $3 \pmod 4$, and $a \equiv b \pmod 2$ when $D \equiv 1 \pmod 4$. So the norm

$$\frac{a + b\sqrt{D}}{2}\cdot\frac{a - b\sqrt{D}}{2} = \frac{a^2 - b^2 D}{4}$$

of $(a + b\sqrt{D})/2$ is always an integer. An algebraic integer in $\mathbb{Q}(\sqrt{D})$ is called a unit if its norm is $\pm 1$. There exists a unit $\varepsilon = \varepsilon_D$ in $\mathbb{Q}(\sqrt{D})$ such that any unit in $\mathbb{Q}(\sqrt{D})$ is representable as $\pm\varepsilon^n$, $n = 0, \pm 1, \pm 2, \ldots$. This number $\varepsilon = \varepsilon_D$ is called the fundamental unit in $\mathbb{Q}(\sqrt{D})$.

Let $F(x, y) = ax^2 + bxy + cy^2$ be an integral binary quadratic form of discriminant $\Delta = b^2 - 4ac$ ($a, b, c \in \mathbb{Z}$ are integers). If an integral binary quadratic form $F(x, y)$ is transformed into the form $F_1(x_1, y_1)$ by an integral unimodular transformation $x = Ux_1 + Vy_1$, $y = Wx_1 + Zy_1$ where $UZ - VW = 1$, then $F$ and $F_1$ are called equivalent. The class number $h(D)$ is basically the number of non-equivalent integral binary quadratic forms of discriminant $\Delta$. More precisely, by computing the class number we do not distinguish a quadratic form from its negative, though they might be non-equivalent (which is exactly the case if $D > 0$ and $x^2 - Dy^2 = -1$ does not have an integer solution).

For example, let $D = 79$; then the discriminant is $4 \times 79 = 316$, and there are 6 non-equivalent integral binary forms of discriminant 316: $F_1 = x^2 - 79y^2$, $-F_1 = -x^2 + 79y^2$; $F_2 = 3x^2 + 4xy - 25y^2$, $-F_2 = -3x^2 - 4xy + 25y^2$; $F_3 = 3x^2 + 2xy - 26y^2$, $-F_3 = -3x^2 - 2xy + 26y^2$. So the class number $h(79)$ of the quadratic field $\mathbb{Q}(\sqrt{79})$ is 3 (and not 6).

If $h(D) = 1$ then the algebraic integers in $\mathbb{Q}(\sqrt{D})$ have unique factorization into algebraic primes. The 'first' quadratic field with class number $> 1$ is $\mathbb{Q}(\sqrt{-5})$. The discriminant is $4 \times (-5) = -20$, and there are two non-equivalent integral binary quadratic forms of discriminant $-20$: $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$. So the class number $h(-5)$ is 2. A counter-example to the unique prime factorization is

$$(1 + \sqrt{-5})(1 - \sqrt{-5}) = 6 = 2 \times 3,$$

where all the 4 factors $(1 + \sqrt{-5})$, $(1 - \sqrt{-5})$, 2, and 3 are primes in $\mathbb{Q}(\sqrt{-5})$.
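The arithmetic in the two examples above is easy to check by machine. The sketch below is only an illustration with hypothetical helper names: it recomputes the discriminants of the listed forms and confirms that no element $x + y\sqrt{-5}$ has norm 2 or 3, which is the reason the factors of 6 cannot be split further. It does not attempt to verify that the listed forms are pairwise non-equivalent.

```python
from math import isqrt


def discriminant(a, b, c):
    """Discriminant b^2 - 4ac of the binary quadratic form a x^2 + b xy + c y^2."""
    return b * b - 4 * a * c


def is_norm_in_q_sqrt_minus5(n):
    """True if n = x^2 + 5 y^2 for some integers x, y (norm of x + y sqrt(-5))."""
    return any(
        x * x + 5 * y * y == n
        for x in range(isqrt(n) + 1)
        for y in range(isqrt(n // 5) + 1)
    )


# The six forms listed in the text for D = 79, as (a, b, c) triples.
FORMS_79 = [(1, 0, -79), (-1, 0, 79), (3, 4, -25), (-3, -4, 25), (3, 2, -26), (-3, -2, 26)]
# The two forms listed for D = -5.
FORMS_MINUS5 = [(1, 0, 5), (2, 2, 3)]

if __name__ == "__main__":
    print([discriminant(*f) for f in FORMS_79])
    print([discriminant(*f) for f in FORMS_MINUS5])
    # Norms 2 and 3 do not occur, while 6 = 1^2 + 5 * 1^2 does.
    print([is_norm_in_q_sqrt_minus5(n) for n in (2, 3, 6)])
```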
Now let us return to (11). If we make the extra hypothesis that $h(d) = 1$ then the right-hand side of (11) becomes

$$\frac{\sqrt{d}}{\pi^2}\cdot\frac{\log N}{\log\varepsilon_d}\,L(1, \chi^*) + O(1), \tag{12}$$

where $L(1, \chi^*)$ is a Dirichlet L-function, $\chi^*$ is the 'norm-sign' character, and $\varepsilon_d$ is the fundamental unit of $\mathbb{Q}(\sqrt{d})$. It is a well-known fact (a corollary of the Euler product and the quadratic reciprocity theorem) that

$$L(1, \chi^*) = L(1, \chi)L(1, \chi') \quad\text{where}\quad \chi(n) = \left(\frac{-4}{n}\right),\ \chi'(n) = \left(\frac{d}{n}\right). \tag{13}$$

By Dirichlet's class number formula,

$$L(1, \chi) = \frac{\pi}{4} \quad\text{and}\quad L(1, \chi') = \frac{\pi h(-d)}{\sqrt{d}}. \tag{14}$$
Now this is where the famous Hirzebruch–Meyer–Zagier formula (HMZ formula, in short) enters the story: $h(-p)$ can be expressed in terms of an alternating sum of the digits in the period of $\sqrt{p}$. But before formulating the HMZ formula, let us speak first about continued fractions and quadratic fields in general. The basic fact is that quadratic irrationals are perfectly characterized by the periodicity of their continued fraction expansions. It is well-known how to read out the least solution of the Pell equation $x^2 - dy^2 = 1$ from the period of $\sqrt{d}$. Moreover, the parity of the length of the period describes the sign of the norm of the fundamental unit: odd length means $-1$; even length means $+1$. Combining Dirichlet's class number formulas with the ineffective Siegel theorem we obtain the deep asymptotic formulas

$$h(d)\log\varepsilon_d = d^{1/2 \pm \epsilon}, \tag{15$'$}$$

$$h(-d) = d^{1/2 \pm \epsilon}, \tag{15$''$}$$

where $h(d)$ and $h(-d)$ are the class numbers of the real and complex quadratic fields $\mathbb{Q}(\sqrt{d})$ and $\mathbb{Q}(\sqrt{-d})$, respectively, and $\varepsilon_d$ is the fundamental unit of $\mathbb{Q}(\sqrt{d})$. Note that the order of magnitude of $\log\varepsilon_d$ is roughly around the length of the period of the continued fraction of $\sqrt{d}$.

The beautiful Hirzebruch–Meyer–Zagier formula (HMZ formula) was discovered in the 1970s:

$$h(-p) = \frac{-a_1 + a_2 - a_3 \pm \cdots + a_{2s}}{3},$$

where $p \equiv 3 \pmod 4$ is prime, $h(p) = 1$, and $a_1, a_2, \ldots, a_{2s}$ forms the period of $\sqrt{p}$ (since $p \equiv 3 \pmod 4$ is prime, the length $2s$ of the period has to be even).
Combining the HMZ formula with (11)–(14), we conclude

$$T_{\sqrt{p}}(N) = \frac{-a_1 + a_2 \mp \cdots + a_{2s}}{12}\cdot\frac{\log N}{\log\varepsilon_p} + O(1) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^l a_l}{12} + O(1), \tag{16}$$

where $l$ is the least index for which $q_l \ge N$. Now from (16) it was very natural to guess Theorem 3.2 for arbitrary $\alpha$, and this is exactly how we figured out the right conjecture. Because our proof of Theorem 3.2 is completely elementary, reversing the previous argument one obtains an elementary, alternative proof of the HMZ formula.

The next application of Theorem 3.2 is a far-reaching generalization of Theorem 3.1, see [6]. For what $\alpha$'s can we prove an analogous Central limit theorem? Well, the mean value (or first moment) is

$$T_\alpha(N) = \frac{1}{N+1}\sum_{n=0}^{N} S_\alpha(n) = \frac{1}{12}\sum_{i:\,q_i \le N} (-1)^i a_i + O\left(\max_{i:\,q_i \le N} a_i\right).$$
For the variance (or second moment) we do not have a similar elegant formula, but we still have the analog of (9):

$$V_\alpha(N) = \frac{1}{N+1}\sum_{n=0}^{N} \left(S_\alpha(n) - T_\alpha(N)\right)^2 = \frac{1}{4\pi^2}\sum_{n=1}^{N} \frac{1}{(n\tan(\pi n\alpha))^2} + O\left(\left(\max_{1\le i\le l} a_i\right)^2\right). \tag{19}$$

Note that the right-hand side of (19) is between two absolute constant multiples of $\sum_{i:\,q_i \le N} a_i^2$.
Theorem 4.1 (General CLT). Let $\alpha = [a_0; a_1, a_2, \ldots]$, and assume

$$\frac{a_l^2}{\sum_{i=1}^{l} a_i^2} \to 0 \quad\text{as } l \to \infty. \tag{20}$$

Then

$$\frac{1}{N}\left|\left\{1 \le n \le N :\ \frac{S_\alpha(n) - T_\alpha(N)}{\sqrt{V_\alpha(N)}} \le \lambda\right\}\right| \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\lambda} e^{-u^2/2}\,du$$

for every fixed value of $\lambda$ as $N \to \infty$.

Observe that (20) expresses the apparently necessary condition that the components are 'individually negligible' (Lindeberg condition, see e.g. [9]). This is why Theorem 4.1 is the most general result that we can hope for.
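Condition (20) is simple to evaluate for a given list of partial quotients. The sketch below is an illustration under stated assumptions, with a hypothetical function name: it computes the ratio $a_l^2/\sum_{i\le l} a_i^2$ for the expansions of $\sqrt{3}$ and $e$ quoted earlier, and for an artificial sequence $a_i = 2^i$ chosen only to show a case where the ratio does not tend to 0.

```python
def lindeberg_ratio(a, l):
    """a_l^2 / (a_1^2 + ... + a_l^2) for partial quotients a = [a_0, a_1, ...]."""
    return a[l] ** 2 / sum(x * x for x in a[1:l + 1])


# sqrt(3) = [1; 1, 2, 1, 2, ...]: bounded partial quotients.
SQRT3 = [1] + [1, 2] * 500

# e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, ...]: blocks (1, 2k, 1).
E = [2]
for k in range(1, 101):
    E += [1, 2 * k, 1]

# Artificial example (an assumption for illustration): a_i = 2^i grows too fast.
POWERS = [1] + [2 ** i for i in range(1, 21)]

if __name__ == "__main__":
    print(lindeberg_ratio(SQRT3, 1000))  # small: (20) plausible
    print(lindeberg_ratio(E, 299))       # a_299 = 200, still small
    print(lindeberg_ratio(POWERS, 20))   # stays near 3/4: (20) fails
```

A finite computation like this cannot prove that (20) holds; it only indicates how the ratio behaves along the supplied prefix.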
Appendix A. On the series $\sum(\{n\alpha\} - 1/2)$

The partial sums

$$S_\alpha(n) = \sum_{k=1}^{n} (\{k\alpha\} - 1/2)$$

are wildly fluctuating as $n \to \infty$; but the mean value of $S_\alpha(n)$ (i.e. the Cesàro sum of $\{n\alpha\} - 1/2$):

$$T_\alpha(N) = \frac{1}{N}\sum_{n=0}^{N-1} S_\alpha(n) = \sum_{n=1}^{N-1}\left(1 - \frac{n}{N}\right)(\{n\alpha\} - 1/2)$$

has a much nicer, smooth behavior. It turns out that the Cesàro sum $T_\alpha(N)$ can be expressed in terms of the alternating sum of the partial quotients $a_i$ in the continued fraction expansion of

$$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_0; a_1, a_2, a_3, \ldots].$$

Let $p_i/q_i = [a_0; a_1, a_2, a_3, \ldots, a_i]$ denote the $i$th convergent of $\alpha$. In particular, $q_i$ is the $i$th convergent denominator of $\alpha$. We recall

Theorem 3.2. For arbitrary $\alpha$ and $N$,

$$T_\alpha(N) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^l a_l}{12} + O\left(\max_{1\le i\le l} a_i\right),$$

where $l$ is the first index for which $q_l > N$, and the implicit constant in the error term is absolute, independent of $\alpha$ and $N$.

We are actually going to prove a better error term from which Theorem 3.2 immediately follows.

Theorem A: Let $q_{l-1} \le N < q_l$; then

$$T_\alpha(N) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^l a_l}{12} + O\left(a_l + a_{l-1}\frac{q_{l-1}}{N} + a_{l-2}\frac{q_{l-2}}{N} + a_{l-3}\frac{q_{l-3}}{N} + \cdots\right),$$
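where the implicit constant in the error term is absolute, independent of $\alpha$ and $N$.

Corollary A.1. Theorem 3.2.

Proof: It immediately follows from the fact that, since $q_{l-1} \le N$ and $q_{j-2}/q_j \le 1/2$, the sum of the weights is bounded by an absolute constant:

$$\frac{q_1}{N} + \frac{q_2}{N} + \frac{q_3}{N} + \cdots + \frac{q_{l-2}}{N} + \frac{q_{l-1}}{N} = O(1).$$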
The 'weight' $q_{l-i}/N$ of $a_{l-i}$ goes down exponentially fast as $i$ increases, so the contribution of the partial quotient $a_{l-i}$ to the error term is negligible if $i$ is 'large'. By repeated application of the trivial inequality $q_{j-2}/q_j \le 1/2$ we obtain the next consequence of Theorem A.

Corollary A.2. The error term is less than

$$O\left(\sum_{i=0}^{l-1} a_{l-i}\,2^{-i/2}\right)$$

where the implicit constant is absolute.

If we use the stronger inequality $q_{j-2}/q_j \le 1/(a_j a_{j-1} + 1)$, we obtain a much more complicated but better error term.

Corollary A.3: The error term is less than

$$O\left(a_l + a_{l-1} + a_{l-2} + \frac{a_{l-3} + a_{l-4}}{a_{l-1}a_{l-2} + 1} + \frac{a_{l-5} + a_{l-6}}{(a_{l-1}a_{l-2} + 1)(a_{l-3}a_{l-4} + 1)} + \cdots\right)$$

where the implicit constant is absolute.

Finally, if we use the 'natural' inequality $q_{j-1}/q_j \le 1/a_j$, we obtain a more elegant error term than that of Corollary A.3.

Corollary A.4: The error term is less than

$$O\left(a_l + a_{l-1} + \frac{a_{l-2}}{a_{l-1}} + \frac{a_{l-3}}{a_{l-1}a_{l-2}} + \frac{a_{l-4}}{a_{l-1}a_{l-2}a_{l-3}} + \cdots\right)$$

where the implicit constant is absolute.

Unfortunately Corollary A.4 has an obvious handicap: the product $a_{l-1}a_{l-2}a_{l-3}\cdots$ in the denominator is not necessarily 'large' if the majority of the partial quotients are equal to 1. This is why in general Corollary A.4 is not stronger than Theorem 3.2. (Of course, if all the partial quotients are $\ge 2$ then Corollary A.4 is the best.)

We are going to derive Theorem A from the special case where $N$ is precisely a convergent denominator of $\alpha$, i.e. if $N = q_l$.

Theorem B (Beck [3]). For every $\alpha$ and $l$,

$$T_\alpha(q_l) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^l a_l}{12} + O(1),$$

where the implicit constant is absolute, independent of $\alpha$ and $l$.
Deduction of Theorem A from Theorem B: First we recall the well-known recursive formulas on the convergents $p_i/q_i$ of $\alpha = [a_0; a_1, a_2, \ldots]$:

$$p_0 = a_0,\quad p_1 = a_0 a_1 + 1,\quad\text{and for all } i \ge 1,\quad p_{i+1} = a_{i+1}p_i + p_{i-1};$$

$$q_0 = 1,\quad q_1 = a_1,\quad\text{and for all } i \ge 1,\quad q_{i+1} = a_{i+1}q_i + q_{i-1}.$$
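The recursion above translates directly into code. The following is a minimal sketch with a hypothetical function name; it assumes the partial quotients are given as a finite list with at least two entries.

```python
def convergents(a):
    """Return (p, q) with p[i] / q[i] = [a[0]; a[1], ..., a[i]].

    Uses p_0 = a_0, p_1 = a_0 a_1 + 1, q_0 = 1, q_1 = a_1 and
    x_{i+1} = a_{i+1} x_i + x_{i-1} for x = p, q and i >= 1.
    """
    p = [a[0], a[0] * a[1] + 1]
    q = [1, a[1]]
    for i in range(1, len(a) - 1):
        p.append(a[i + 1] * p[i] + p[i - 1])
        q.append(a[i + 1] * q[i] + q[i - 1])
    return p, q


if __name__ == "__main__":
    # First convergents of sqrt(3) = [1; 1, 2, 1, 2, 1, ...].
    p, q = convergents([1, 1, 2, 1, 2, 1])
    print(list(zip(p, q)))
```

For $\sqrt{3}$ the odd-indexed convergents satisfy $p_{2i-1}^2 - 3q_{2i-1}^2 = 1$, consistent with the identity $p_{2i-1} \pm \sqrt{3}\,q_{2i-1} = (2 \pm \sqrt{3})^i$ used in Section 3.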
Let $\varepsilon_i = \varepsilon_i(\alpha) = q_i\alpha - p_i$. It follows from the definition that $\varepsilon_{i+1} = a_{i+1}\varepsilon_i + \varepsilon_{i-1}$. Let $\mathrm{sign}(\varepsilon_i) = \pm 1$ be the usual sign: $+1$ or $-1$. It is well-known that for every fixed $\alpha$, $\varepsilon_i = \varepsilon_i(\alpha)$ is an alternating sequence as $i = 0, 1, 2, \ldots$.

There is a unique way to express an arbitrary positive integer $n$ as a linear combination of the $q_i$'s (convergent denominators) as follows:

$$n = \sideset{}{^*}\sum_{0\le i\le s} b_i q_i, \qquad 0 \le b_i = b_i(n) \le a_{i+1},$$

where the $*$ indicates the restriction that if $b_i = a_{i+1}$ then $b_{i-1} = 0$. We recall the well-known facts that $|\varepsilon_i| < 1/q_{i+1}$ and

$$\|m\alpha\| \ge \|q_{i-1}\alpha\| = |\varepsilon_{i-1}| \quad\text{for all } 0 < m < q_i.$$
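The representation of $n$ by convergent denominators can be computed greedily, from the largest $q_i \le n$ downward. The sketch below is an assumption-laden illustration rather than the paper's construction: the function name is hypothetical, and the greedy choice is a standard way to obtain digits satisfying the starred restriction. The list of partial quotients must be long enough that some $q_i$ exceeds $n$.

```python
def ostrowski_digits(a, n):
    """Greedy digits b_0, ..., b_s with n = sum b_i q_i, for a = [a_0, a_1, ...].

    Returns (b, q) where q holds the denominators q_0, ..., q_s used.
    Assumes n >= 1 and that `a` is long enough for some q_i to exceed n.
    """
    q = [1, a[1]]
    while q[-1] <= n:
        i = len(q)
        q.append(a[i] * q[i - 1] + q[i - 2])
    q.pop()  # drop the first denominator exceeding n
    b = [0] * len(q)
    remainder = n
    for i in range(len(q) - 1, -1, -1):
        b[i], remainder = divmod(remainder, q[i])
    return b, q


if __name__ == "__main__":
    sqrt3_quotients = [1] + [1, 2] * 10
    b, q = ostrowski_digits(sqrt3_quotients, 100)
    print(b, q)  # 100 = 56 + 41 + 3 in terms of the denominators of sqrt(3)
```

In the example, every digit $b_i$ is at most $a_{i+1}$, and whenever $b_i = a_{i+1}$ the digit $b_{i-1}$ is 0, as the restriction requires.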
We are going to use a weak version of a classical upper bound on the sum $S_\alpha(n)$, due to Ostrowski (see [20]).

Lemma A.1 (Ostrowski). Let $q_l \le n < q_{l+1}$, and write $n = \sum^*_{0\le i\le l} b_i q_i$. Then

$$S_\alpha(n) = O\left(\sum_{i=0}^{l} b_i\right),$$

where the implicit constant is absolute.

We also need the following

Lemma A.2. If $0 \le b < a_{i+1}$ and $0 \le m < q_i$ then

$$\{(bq_i + m)\alpha\} = b\varepsilon_i + \{m\alpha\}.$$

Proof: We distinguish two cases.

Case 1: $\varepsilon_i > 0$ and $m \ge 1$. In this case,

$$\{(bq_i + m)\alpha\} = \{bq_i\alpha\} + \{m\alpha\}$$

since the sum of the fractional parts on the right-hand side is less than 1. Indeed, to prove $\{bq_i\alpha\} + \{m\alpha\} < 1$, we note that

$$\|bq_i\alpha\| \le (a_{i+1} - 1)\|q_i\alpha\| < a_{i+1}\|q_i\alpha\| = a_{i+1}\varepsilon_i.$$

Since $0 > \varepsilon_{i+1} = a_{i+1}\varepsilon_i + \varepsilon_{i-1}$, it follows that $|\varepsilon_{i-1}| > a_{i+1}\varepsilon_i > 0$. Therefore, $\|bq_i\alpha\| < |\varepsilon_{i-1}|$. On the other hand, $\|m\alpha\| \ge \|q_{i-1}\alpha\| = |\varepsilon_{i-1}|$ for all $0 < m < q_i$. This implies that $\{m\alpha\} \le 1 - |\varepsilon_{i-1}|$ for all $0 < m < q_i$. Summarizing, we have

$$\{bq_i\alpha\} + \{m\alpha\} < |\varepsilon_{i-1}| + (1 - |\varepsilon_{i-1}|) = 1,$$

which proves $\{(bq_i + m)\alpha\} = \{bq_i\alpha\} + \{m\alpha\}$. It follows that

$$\{(bq_i + m)\alpha\} = \{bq_i(p_i/q_i + \varepsilon_i/q_i)\} + \{m\alpha\} = b\varepsilon_i + \{m\alpha\}.$$

Case 2: $\varepsilon_i < 0$ and $m \ge 1$. Then $\{(bq_i + m)\alpha\} = \{bq_i\alpha\} + \{m\alpha\} - 1$. Indeed,

$$\{bq_i\alpha\} = \{bq_i(p_i/q_i + \varepsilon_i/q_i)\} = b\varepsilon_i + 1,$$

$$\{m\alpha\} = \{m(p_i/q_i + \varepsilon_i/q_i)\} = \{mp_i/q_i\} + \frac{m}{q_i}\varepsilon_i,$$

and

$$\{bq_i\alpha\} + \{m\alpha\} = \{mp_i/q_i\} + \frac{bq_i + m}{q_i}\varepsilon_i + 1 > \frac{1}{q_i} - \frac{1}{q_i} + 1 = 1,$$

since $bq_i + m \le a_{i+1}q_i < q_{i+1}$ and $|\varepsilon_i| < 1/q_{i+1}$. Thus we have $\{(bq_i + m)\alpha\} = \{bq_i\alpha\} + \{m\alpha\} - 1$ which equals $(1 + b\varepsilon_i) + \{m\alpha\} - 1 = b\varepsilon_i + \{m\alpha\}$.

Finally, if $m = 0$ then Lemma A.2 is trivial.

Now we are ready to derive Theorem A from Theorem B. In Theorem A the integer $N$ is not necessarily a convergent denominator. Let $q_s \le N < q_{s+1}$, and consider the unique representation of $N$ as a linear combination of the $q_i$'s as follows:

$$N = \sideset{}{^*}\sum_{0\le i\le s} b_i q_i, \qquad 0 \le b_i = b_i(N) \le a_{i+1},$$

where the $*$ indicates the restriction that if $b_i = a_{i+1}$ then $b_{i-1} = 0$. In what follows $b_i$ always means $b_i(N)$. We have

$$T_\alpha(N) = \frac{1}{N}\sum_{n=0}^{N-1} S_\alpha(n) = \frac{1}{N}\left(\sum_{n=0}^{b_sq_s-1} S_\alpha(n) + \sum_{n=b_sq_s}^{b_sq_s+b_{s-1}q_{s-1}-1} S_\alpha(n) + \sum_{n=b_sq_s+b_{s-1}q_{s-1}}^{b_sq_s+b_{s-1}q_{s-1}+b_{s-2}q_{s-2}-1} S_\alpha(n) + \cdots\right)$$

$$= T_0 + T_1 + T_2 + \cdots + T_j + \cdots$$

where

$$T_j = \frac{1}{N}\sum_{n=n_j}^{n_j+b_{s-j}q_{s-j}-1} S_\alpha(n),$$
and $n_j = b_sq_s + b_{s-1}q_{s-1} + \cdots + b_{s-j+1}q_{s-j+1}$. In view of Lemma A.2,

$$\sum_{n=bq_s}^{(b+1)q_s-1} S_\alpha(n) = \sum_{M=0}^{q_s-1}\left(S_\alpha(bq_s) + \sum_{m=1}^{M}(\{(bq_s+m)\alpha\} - 1/2)\right)$$

$$= \sum_{M=0}^{q_s-1}\left(S_\alpha(bq_s) + Mb\varepsilon_s + \sum_{m=1}^{M}(\{m\alpha\} - 1/2)\right)$$

$$= q_sS_\alpha(bq_s) + q_sT_\alpha(q_s) + \frac{q_s(q_s-1)}{2}b\varepsilon_s.$$

By Lemma A.1 $S_\alpha(bq_s) = O(b)$ where the implicit constant is absolute. Therefore,

$$T_0 = \frac{1}{N}\left(\sum_{n=0}^{q_s-1} S_\alpha(n) + \sum_{n=q_s}^{2q_s-1} S_\alpha(n) + \cdots + \sum_{n=(b_s-1)q_s}^{b_sq_s-1} S_\alpha(n)\right)$$

$$= \frac{1}{N}\left(b_sq_sT_\alpha(q_s) + O(b_s^2q_s) + O(b_s^2q_s^2\varepsilon_s)\right).$$

Since $b_sq_s \le N < q_{s+1}$ and $|\varepsilon_s| < 1/q_{s+1}$, it follows that

$$T_0 = \frac{b_sq_s}{N}\left(T_\alpha(q_s) + O(b_s)\right)$$

where the implicit constant is absolute.

Next we estimate $T_1, T_2, \ldots, T_j, \ldots$ exactly the same way as we did $T_0$. We recall: $N = \sum^*_{0\le i\le s} b_iq_i$, $0 \le b_i = b_i(N) \le a_{i+1}$, is the unique representation of $N$. (The $*$ indicates the restriction that if $b_i = a_{i+1}$ then $b_{i-1} = 0$.) If $0 \le b < b_i$ and $0 \le m < q_i$, then by repeated application of Lemma A.2,
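$$\{(b_sq_s + b_{s-1}q_{s-1} + \cdots + b_{i+1}q_{i+1} + bq_i + m)\alpha\}$$

$$= b_s\varepsilon_s + \{(b_{s-1}q_{s-1} + \cdots + b_{i+1}q_{i+1} + bq_i + m)\alpha\}$$

$$= b_s\varepsilon_s + b_{s-1}\varepsilon_{s-1} + \{(b_{s-2}q_{s-2} + \cdots + b_{i+1}q_{i+1} + bq_i + m)\alpha\}$$

$$= b_s\varepsilon_s + b_{s-1}\varepsilon_{s-1} + \cdots + b_{i+1}\varepsilon_{i+1} + b\varepsilon_i + \{m\alpha\}.$$

In view of this, with $n' = n_j + bq_{s-j} = b_sq_s + b_{s-1}q_{s-1} + \cdots + b_{s-j+1}q_{s-j+1} + bq_{s-j}$,

$$\sum_{n=n'}^{n'+q_{s-j}-1} S_\alpha(n) = \sum_{M=0}^{q_{s-j}-1}\left(S_\alpha(n') + \sum_{m=1}^{M}(\{(n'+m)\alpha\} - 1/2)\right)$$

$$= \sum_{M=0}^{q_{s-j}-1}\left(S_\alpha(n') + M(b_s\varepsilon_s + \cdots + b_{s-j+1}\varepsilon_{s-j+1} + b\varepsilon_{s-j}) + \sum_{m=1}^{M}(\{m\alpha\} - 1/2)\right)$$

$$= q_{s-j}S_\alpha(n') + q_{s-j}T_\alpha(q_{s-j}) + \frac{q_{s-j}(q_{s-j}-1)}{2}(b_s\varepsilon_s + b_{s-1}\varepsilon_{s-1} + \cdots + b_{s-j+1}\varepsilon_{s-j+1} + b\varepsilon_{s-j}).$$

By Lemma A.1, $S_\alpha(n') = O(b_s + b_{s-1} + \cdots + b_{s-j+1} + b)$ where the implicit constant is absolute. Therefore, with $n_j = b_sq_s + b_{s-1}q_{s-1} + \cdots + b_{s-j+1}q_{s-j+1}$,

$$T_j = \frac{1}{N}\left(\sum_{n=n_j}^{n_j+q_{s-j}-1} S_\alpha(n) + \sum_{n=n_j+q_{s-j}}^{n_j+2q_{s-j}-1} S_\alpha(n) + \cdots + \sum_{n=n_j+(b_{s-j}-1)q_{s-j}}^{n_j+b_{s-j}q_{s-j}-1} S_\alpha(n)\right)$$

$$= \frac{1}{N}\left(b_{s-j}q_{s-j}T_\alpha(q_{s-j}) + b_{s-j}q_{s-j}\,O(b_s + b_{s-1} + \cdots + b_{s-j+1} + b_{s-j})\right)$$

$$\quad + \frac{1}{N}\left(b_{s-j}q_{s-j}^2\,O(b_s\varepsilon_s + b_{s-1}\varepsilon_{s-1} + \cdots + b_{s-j+1}\varepsilon_{s-j+1} + b_{s-j}\varepsilon_{s-j})\right).$$

Since $|\varepsilon_i| < 1/q_{i+1} \le 1/q_{s-j+1}$ for all $s-j \le i \le s$, it follows that

$$T_j = \frac{b_{s-j}q_{s-j}}{N}\left(T_\alpha(q_{s-j}) + O(b_s + b_{s-1} + \cdots + b_{s-j+1} + b_{s-j})\right)$$

where the implicit constant is absolute.

Summarizing, by using Theorem B, with $q_s \le N = \sum^*_{0\le i\le s} b_iq_i < q_{s+1}$ we have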
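$$T_\alpha(N) = T_0 + T_1 + T_2 + \cdots + T_j + \cdots$$

$$= \sum_{j=0}^{s}\frac{b_{s-j}q_{s-j}}{N}\left(T_\alpha(q_{s-j}) + O\left(\sum_{i=s-j}^{s} b_i\right)\right)$$

$$= \sum_{j=0}^{s}\frac{b_{s-j}q_{s-j}}{N}\left(\frac{1}{12}\sum_{k=1}^{s-j}(-1)^k a_k + O(1) + O\left(\sum_{i=s-j}^{s} b_i\right)\right)$$

$$= \frac{1}{12}\sum_{k=1}^{s}(-1)^k a_k + O(1) + O\left(a_s\frac{\sum_{l=0}^{s-1} b_lq_l}{N} + a_{s-1}\frac{\sum_{l=0}^{s-2} b_lq_l}{N} + a_{s-2}\frac{\sum_{l=0}^{s-3} b_lq_l}{N} + \cdots\right)$$

$$\quad + O\left(b_s + b_{s-1}\frac{\sum_{l=0}^{s-1} b_lq_l}{N} + b_{s-2}\frac{\sum_{l=0}^{s-2} b_lq_l}{N} + b_{s-3}\frac{\sum_{l=0}^{s-3} b_lq_l}{N} + \cdots\right).$$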
Since $b_{s-j} \le a_{s-j+1}$ and $\sum_{l=0}^{s-j} b_lq_l \le q_{s-j+1}$, we conclude

$$T_\alpha(N) = \frac{\sum_{k=1}^{s}(-1)^k a_k}{12} + O\left(a_{s+1} + a_s\frac{q_s}{N} + a_{s-1}\frac{q_{s-1}}{N} + a_{s-2}\frac{q_{s-2}}{N} + a_{s-3}\frac{q_{s-3}}{N} + \cdots\right),$$

and Theorem A follows. This completes the argument to derive Theorem A from Theorem B. Finally, a proof of Theorem B can be found in [3].

4. Uncited references

[2,21–23,27,28]

References

[1] J. Beck, A central limit theorem for quadratic irrational rotations, preprint, 1991, 90 p.
[2] J. Beck, Diophantine approximation and quadratic fields, in: Győry, Pethő, Sós (Eds.), Number Theory, Walter de Gruyter GmbH, Berlin, 1998, pp. 55–93.
[3] J. Beck, From probabilistic diophantine approximation to quadratic fields, Random and Quasi-Random Point Sets, Lecture Notes in Statistics, Vol. 138, Springer, New York, 1998, pp. 1–48.
[4] J. Beck, Probabilistic diophantine approximation, Part I: Kronecker sequences, Ann. Math. 140 (1994) 451–502.
[5] J. Beck, Probabilistic diophantine approximation, Part II: Local results, preprint, 1998.
[6] J. Beck, Probabilistic diophantine approximation, Part III: Global results, preprint, 1998.
[7] J. Beck, W.W.L. Chen, Irregularities of Distribution, Cambridge Tracts in Mathematics, Vol. 89, Cambridge University Press, Cambridge, 1987.
[8] W. Feller, An Introduction to Probability Theory and its Applications, Vol. 1, 3rd Edition, Wiley, New York, 1968.
[9] W. Feller, An Introduction to Probability Theory and its Applications, Vol. 2, 2nd Edition, Wiley, New York, 1971.
[10] R.L. Graham, B.L. Rothschild, J.H. Spencer, Ramsey Theory, Wiley-Interscience Series in Discrete Mathematics, Wiley, New York, 1980.
[11] G. Halász, On Roth's method in the theory of irregularities of point distributions, Recent Progress in Analytic Number Theory, Vol. 2, Academic Press, London, 1981, pp. 79–94.
[12] G. Hardy, J. Littlewood, The lattice-points of a right-angled triangle. I, Proc. London Math. Soc. 3 (1920) 15–36.
[13] G. Hardy, J. Littlewood, The lattice-points of a right-angled triangle. II, Abh. Math. Sem. Hamburg 1 (1922) 212–249.
[14] M. Kac, Probability methods in some problems of analysis and number theory, Bull. Amer. Math. Soc. 55 (1949) 641–665.
[15] J.H.B. Kemperman, Probability methods in the theory of distributions modulo one, Compositio Math. 16 (1964) 106–137.
[16] A. Khintchine, Continued Fractions (English translation), P. Noordhoff, Groningen, The Netherlands, 1963.
[17] A. Khintchine, Ein Satz über Kettenbrüche mit arithmetischen Anwendungen, Math. Z. (1923) 289–306.
[18] S. Lang, Introduction to Diophantine Approximations, Addison-Wesley, Reading, MA, 1966.
[19] J. Matoušek, Geometric Discrepancy, Algorithms and Combinatorics, Vol. 18, Springer, Berlin, 1999.
[20] A. Ostrowski, Bemerkungen zur Theorie der Diophantischen Approximationen. I, Abh. Hamburg Sem. 1 (1922) 77–98.
[21] K.F. Roth, On irregularities of distribution, Mathematika 1 (1954) 73–79.
[22] W.M. Schmidt, A metrical theorem in diophantine approximation, Canad. J. Math. 12 (1960) 619–631.
[23] W.M. Schmidt, Metrical theorems on fractional parts of sequences, Trans. Amer. Math. Soc. 110 (1964) 493–518.
[24] V. Sós, On the discrepancy of the sequence $\{n\alpha\}$, Coll. Math. Soc. János Bolyai 13 (1974) 359–367.
[25] G. Tijdeman, G. Wagner, A sequence has almost nowhere small discrepancy, Monatshefte Math. 90 (1980) 315–329.
[26] H. Weyl, Über die Gleichverteilung von Zahlen mod Eins, Math. Ann. 77 (1916) 313–352.
[27] D. Zagier, Nombres de classes et fractions continues, J. Arithmétiques Bordeaux, Astérisque (24–25) (1975) 81–97.
[28] D. Zagier, Zeta-funktionen und quadratische Körper, Springer, Berlin, 1981, Hochschultext.