On randomised strategies in the λ-calculus

Ugo Dal Lago and Gabriele Vanoni
Università di Bologna, Italy / INRIA Sophia Antipolis, France
Article history: received 28 January 2019; revised 18 September 2019; accepted 19 September 2019.
Keywords: λ-calculus; probabilistic rewriting; reduction strategies.
Abstract. In this work we study randomised reduction strategies—a notion already known in the context of abstract reduction systems—for the λ-calculus. We develop a simple framework that allows us to prove a randomised strategy to be positive almost-surely normalising. We then propose a simple example of a randomised strategy for the λ-calculus that has such a property, and we show why it is non-trivial with respect to classical deterministic strategies such as leftmost-outermost or rightmost-innermost. We conclude by studying this strategy for two sub-λ-calculi, namely those where duplication and erasure are syntactically forbidden, showing some non-trivial properties.
1. Introduction

There are different possible strategies you can follow to evaluate expressions. Some are better than others, and bring you to the result in a smaller number of steps. Since programs in pure functional languages are essentially expressions, the problem of defining good strategies is particularly interesting. Finding minimal strategies, i.e. strategies that minimise the number of steps to normal form, seems even more interesting. However, the problem of picking the redex leading to the reduction sequence of minimal length has been proven undecidable for the λ-calculus [2, Section 13.5], the core of pure functional programming languages.

In the last decades several reduction strategies have been developed. They are crucial in the study of evaluation orders in functional programming languages, and define an important part of their semantics. The reader can think about the differences between Haskell (call-by-need) and Caml (call-by-value). The λ-calculus is a good abstraction to study reduction mechanisms because of its very simple structure. In fact, although Turing-complete, it can be seen as a rewriting system [3], where terms can be formed in only two ways, by abstraction and application, and only one rewriting rule, the β-rule, is present. Reduction strategies for the λ-calculus are typically defined according to the position of the contracted redex, e.g. leftmost-outermost, leftmost-innermost, rightmost-innermost. As the following trivial examples show, the adopted strategy can indeed have a strong impact on evaluation performance, and possibly also on termination behaviour.

Example 1. Let I = λx.x, ω = λx.xx and Ω = ωω. We now consider the reduction of the term (λx.I)Ω according to two different reduction strategies, namely leftmost-outermost (LO) and rightmost-innermost (RI).
This work was partially supported by ANR grant Elica ANR-14-CE25-0005 and by the Inria/JSPS EA grant Crecogi EQA04150. A preliminary version of this paper appeared in the Proceedings of the 19th Italian Conference on Theoretical Computer Science, ICTCS 2018 [1].
(λx.I)Ω −→LO I
(λx.I)Ω −→RI (λx.I)Ω −→RI (λx.I)Ω −→RI ···

The term Ω is a looping combinator, i.e. it reduces to itself. However, in (λx.I)Ω the argument Ω is discarded, since the function returns the identity combinator. Thus, leftmost-outermost (akin to call-by-name in functional programming languages) yields a normal form in one step. Conversely, rightmost-innermost (akin to call-by-value) keeps evaluating the argument Ω of (λx.I)Ω, though this is useless, and always rewrites the same term, yielding a non-terminating process.

Example 2. We now consider the reduction of the term (λx.xx)(II), according to the LO and RI strategies, as above.
(λx.xx)(II) −→LO (II)(II) −→LO I(II) −→LO II −→LO I
(λx.xx)(II) −→RI (λx.xx)I −→RI II −→RI I

Here the argument II is duplicated, and thus it is much more convenient to reduce it before it is copied, as rightmost-innermost does. Leftmost-outermost does, indeed, some useless work.

In general, innermost strategies are considered more efficient, because programs often need to copy their arguments (as in Example 2). However, as seen in Example 1, rightmost-innermost is not normalising: there exist terms which have a normal form that can nonetheless be missed by innermost strategies. Instead, a classical result by Curry and Feys [4] states that the leftmost-outermost strategy is normalising, i.e. it always rewrites terms to their normal form, if it exists. Thus, leftmost-outermost is slower, but safer. Could we get, in a sense, the best of both worlds?
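The two strategies can be made concrete in a few lines of code. The following is a minimal sketch in Haskell (the encoding and all names are ours, and merely illustrative), using de Bruijn indices to sidestep α-conversion; it relies on the fact that listing redexes in a left-to-right preorder traversal orders them by their beginning symbol, so the LO-redex is the first element of the list and the RI-redex is the last one:

```haskell
data Term = Var Int | Lam Term | App Term Term deriving (Eq, Show)

-- shift d c t: add d to all indices >= cutoff c (standard de Bruijn shifting)
shift :: Int -> Int -> Term -> Term
shift d c (Var k)   = Var (if k >= c then k + d else k)
shift d c (Lam t)   = Lam (shift d (c + 1) t)
shift d c (App t u) = App (shift d c t) (shift d c u)

-- subst j s t: capture-avoiding substitution of s for Var j in t
subst :: Int -> Term -> Term -> Term
subst j s (Var k)   = if k == j then s else Var k
subst j s (Lam t)   = Lam (subst (j + 1) (shift 1 0 s) t)
subst j s (App t u) = App (subst j s t) (subst j s u)

-- contract a beta-redex at the root
betaRoot :: Term -> Term
betaRoot (App (Lam t) u) = shift (-1) 0 (subst 0 (shift 1 0 u) t)
betaRoot t               = t

-- one-step reducts, listed by the position of the contracted redex's
-- beginning symbol (left-to-right preorder traversal)
reducts :: Term -> [Term]
reducts t0 = go t0 id
  where
    go r@(App (Lam _) _) ctx = ctx (betaRoot r) : inside r ctx
    go r                 ctx = inside r ctx
    inside (Var _)   _   = []
    inside (Lam b)   ctx = go b (ctx . Lam)
    inside (App f a) ctx = go f (\f' -> ctx (App f' a))
                        ++ go a (\a' -> ctx (App f a'))

-- LO takes the leftmost redex, RI the rightmost (Definitions 10 and 11 below)
loStep, riStep :: Term -> Maybe Term
loStep t = case reducts t of { [] -> Nothing; (r : _) -> Just r }
riStep t = case reducts t of { [] -> Nothing; rs -> Just (last rs) }
```

For instance, with identity = Lam (Var 0), omega = Lam (App (Var 0) (Var 0)) and bigOmega = App omega omega, the encoding App (Lam identity) bigOmega of the term of Example 1 reaches its normal form in one loStep, while riStep maps it to itself forever.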
All reduction strategies for the λ-calculus in the literature up to now are deterministic, i.e. they are (partial) functions on (possibly shared representations of) terms. There is however some work on probabilistic term rewriting systems [5–7], in particular regarding termination, and about randomised strategies in the abstract [8]. What would happen if the redexes to reduce were picked according to some probability distribution? How many steps would a term need to reach a normal form on average? In this work we consider a simple randomised reduction strategy Pε, where the LO-redex is reduced with probability ε and the RI-redex is reduced with probability 1 − ε. This is not necessarily the most interesting example, but certainly a good starting point for our investigation. The uniform randomised strategy, which picks one among all the redexes in the term uniformly at random, looks more natural, although it is much more difficult to analyse: there is no fixed lower bound on the probability of picking the standard redex, i.e. the leftmost-outermost one. The following are our main results.

• The uniform strategy is not positive almost-surely normalising.
• For every 0 < ε ≤ 1, the strategy Pε is positive almost-surely normalising on weakly normalising terms. That means that if a term M is weakly normalising, then the expected number of reduction steps from M to its normal form with strategy Pε is finite. This is in contrast to the rightmost-innermost strategy, as can be seen from Example 1. Rightmost-innermost, in other words, is the only non-normalising strategy in the family {Pε}0≤ε≤1, namely P0.
• The function ExpLenM(ε), representing the average number of steps a term M needs to reach normal form under strategy Pε, is a power series.
• The family of strategies {Pε}0<ε<1 is shown to be non-trivial. In other words, there exist a class of terms and a real number 0 < μ < 1 for which Pμ outperforms, on average, both LO and RI. This shows that randomisation can indeed be useful in this context. This is not surprising: in computer science there are many situations in which adding a random factor improves performance, e.g. in randomised algorithms [9], which are often faster (on average) than any (known) deterministic algorithm. Furthermore, we show that also the converse can hold, i.e. there exists a term for which both LO and RI outperform Pε for every 0 < ε < 1.
• The expected number of reduction steps to normal form with strategy Pε, seen as a function of ε, has its minimum at 1 for terms in the affine λ-calculus λA and its maximum at 1 for λI-terms. However, this function is neither monotonic nor convex nor concave in either λA or λI: already in these simple calculi it can be very chaotic.

The remainder of this paper is structured as follows. In Section 2, basic definitions and results for the untyped λ-calculus are given. In Section 3 we present our model of fully probabilistic abstract reduction systems, and we give a sufficient condition for positive almost-sure termination. In Section 4 we apply this model to the λ-calculus, defining a randomised reduction strategy and collecting some results. Section 5 concludes the paper with some ideas for further investigations on the subject.

Acknowledgements

Our interest in randomised strategies comes from a discussion the first author had with Prakash Panangaden. We also thank the anonymous reviewers for their many insightful comments.
2. Basic notions and notations

The following definitions are standard and are adapted from [3].

Definition 1. Assume a countably infinite set V of variables. The λ-calculus is the language of terms defined by the following grammar:

M, N ::= x ∈ V | MN | λx.M

We denote by Λ the set of all λ-terms. As usual, λ-terms are taken modulo α-equivalence, which allows us to appropriately define the capture-avoiding substitution of all the free occurrences of x for N in M, denoted by M{N/x}.

Lemma 1 (Substitution Lemma). For any M, N, L ∈ Λ, if x ≠ y and x is not free in L, then

M{N/x}{L/y} = M{L/y}{N{L/y}/x}.

Reduction will be defined based on the notion of a context, which needs to be given a formal status.

Definition 2. We define (one-hole) contexts by the following grammar:

C, D ::= □ | CM | MC | λx.C

We denote by Λ□ the set of all contexts. Intuitively, contexts are λ-terms with a hole that can be filled with another λ-term. We indicate with C[M] the term obtained by replacing □ with M in C; contexts can in fact bind variables. Those λ-terms of the form R = (λx.M)N are called β-reducible expressions, or β-redexes, and M{N/x} is said to be the contractum of R. This is justified by the following definition.

Definition 3. The relation of β-reduction, −→β ⊆ Λ × Λ, is defined as

−→β = {(C[(λx.M)N], C[M{N/x}]) | M, N ∈ Λ, C ∈ Λ□}.

We denote by ↠β the reflexive and transitive closure of −→β.

We introduce some concepts of rewriting theory that will be useful in the following sections. We borrow terminology from [10], except for the fact that we do not label reduction steps.

Definition 4 (ARS). An abstract reduction system (ARS) is a pair (A, →) where A is a set of objects of cardinality at most countable and → ⊆ A × A is the reduction relation.

Definition 5 (Sub-ARS). Let A = (A, →α) and G = (G, →γ) be two ARSs. A is a sub-ARS of G if A ⊆ G and →α ⊆ →γ.

If (a, b) ∈ → for a, b ∈ A in an ARS (A, →), then we write a → b, and if a → a1 → ··· → an−1 → b, we write a →^n b. We denote by ↠ the reflexive and transitive closure of →. A reduction sequence is a finite or infinite sequence σ : a0 → a1 → ···. If σ is finite, then |σ| is the length of σ. We say that a ∈ A is in normal form if there exists no b ∈ A such that a → b. We call NF(A) the subset of A whose elements are the normal forms. An object a is (weakly) normalising if there exists a reduction sequence such that a ↠ b and b is in normal form. An object a is strongly normalising if every reduction sequence from a is finite. We can see the λ-calculus defined above as an ARS (Λ, −→β). We denote by WN the set of weakly normalising terms of Λ and by SN the set of strongly normalising terms of Λ.

2.1. Two subcalculi of Λ

The full λ-calculus is very powerful and flexible, but it has a complicated dynamics. Sometimes it is useful to restrict ourselves to a subset of terms. In particular, we focus our attention on two subsystems whose terms satisfy a predicate on the number of occurrences of free variables. These systems are meaningful because they are stable w.r.t. β-reduction, i.e. if M ∈ S and M −→β N, then N ∈ S. The interested reader can find a comprehensive treatment in [11].
2.1.1. The λI-calculus

The λI-calculus was the original calculus studied by Alonzo Church in the 1930s [12], and [2] contains a whole section dedicated to it. In the λI-calculus there is no cancellation, in that variables have to occur free at least once when forming abstractions. Terms of the λI-calculus are not strongly normalising in general: as an example, Ω is a λI-term. One can prove, however, that on λI-terms weak normalisation implies strong normalisation, so all strategies are qualitatively equivalent. In other words, the equation WN = SN, although not true in general, holds in λI. This does not mean, however, that all strategies are quantitatively equivalent.

2.1.2. The λA-calculus

The λA-calculus is the dual of λI, and it is often called the affine λ-calculus in the literature. It is a very weak calculus in which variables bound by abstractions occur at most once free in the abstraction's body, thus forbidding duplication. The λA-calculus is strongly normalising, in the following strong sense: every reduction sequence from a term M has length bounded by the size of M. In other words, the equation WN = SN = Λ, although not true in general, holds in λA.

2.2. Reduction strategies

ARSs are sets endowed with a relation, and are thus a nondeterministic model of computation. The notion of a deterministic reduction strategy allows us to fix one redex among the available ones, thus turning reduction into a deterministic process.

Definition 6 (Reduction strategies). A reduction strategy for an ARS A = (A, →α) is a sub-ARS (A, →γ) of A, indicated with →γ when the set of objects is clear from the context, having the same objects and normal forms as A.

Definition 7 (Deterministic ARS). An ARS (A, →) is deterministic if for each a ∈ A there is at most one b ∈ A such that a → b.

Definition 8. Given an ARS (A, →) and a0 ∈ A, a finite reduction sequence σ : a0 → a1 → ··· → an is under strategy →γ if ai →γ ai+1 for each 0 ≤ i < n. An infinite reduction sequence σ : a0 → a1 → ··· → ai → ··· is under strategy →γ if ai →γ ai+1 for each i ≥ 0.

Since according to Definition 6 reduction strategies are (sub-)ARSs, deterministic reduction strategies are just reduction strategies which are deterministic ARSs. In the following, we will often employ a different definition of deterministic strategy, which turns out to be equivalent to the previous one, but more convenient.

Definition 9 (Deterministic strategies). Given an ARS (A, →), a deterministic reduction strategy for A is a partial function S : A ⇀ A such that S(a) is defined iff a is not in normal form, and a → S(a) whenever S(a) is defined. If σ : a0 → a1 → ··· → an is a reduction sequence under a deterministic strategy S and an is in normal form, we write StepsS(a0) = n = |σ|. If σ : a0 → a1 → ··· is infinite, we say that StepsS(a0) = +∞.

We can now define two deterministic reduction strategies for the λ-calculus that will be useful in the following sections.

Definition 10. Leftmost-outermost (LO) is the deterministic reduction strategy in which LO(M) = N if and only if M −→β N and the redex contracted in M is the leftmost among the ones in M (measuring the position of a redex by its beginning symbol). If LO(M) = N, we write M −→LO N.

Please note that the leftmost-outermost strategy is not always a deterministic strategy in the more general context of term rewriting systems [3], while in the λ-calculus it is.
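Continuing the earlier sketch (again ours, with hypothetical names), StepsS of Definition 9 is directly computable for any deterministic strategy presented as a partial function:

```haskell
-- Steps_S of Definition 9: number of steps to normal form under a
-- deterministic strategy s; diverges on terms whose reduction under s is
-- infinite, matching Steps_S = +infinity.
stepsUnder :: (t -> Maybe t) -> t -> Int
stepsUnder s = go 0
  where
    go n m = case s m of
      Nothing -> n              -- m is in normal form
      Just m' -> go (n + 1) m'
```

On the encoding of (λx.xx)(II) from Example 2, stepsUnder loStep returns 4 and stepsUnder riStep returns 3, as in the reductions displayed there.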
This will be fundamental in the definition of randomised strategies for the λ-calculus in the next sections.

Definition 11. Rightmost-innermost (RI) is the deterministic reduction strategy in which RI(M) = N if and only if M −→β N and the redex contracted in M is the rightmost among the ones in M (measuring the position of a redex by its beginning symbol). Again, if RI(M) = N, we write M −→RI N.

In the remainder of this section, we give some basic preliminary results about β-reduction, which form the building blocks of our subsequent analysis. Although standard, we prefer to state and prove such results for the sake of self-containedness. The following is a technical lemma which basically says that β-reduction and substitution commute. Crucially, the case in which the substituted variable occurs at least once is kept separate from the one in which it occurs at most once, this way allowing for a more informative result.

Lemma 2. Let M, N, L ∈ Λ. If M −→β N, then:
1. L{M/x} −→β^{≤1} L{N/x} if x is free at most once in L;
2. L{M/x} −→β^{≥1} L{N/x} if x is free at least once in L;
3. L{M/x} ↠β L{N/x};
4. M{L/x} −→β N{L/x}.

Proof.
1. We proceed by induction on the structure of L.
• L = x. Then L{M/x} = M −→β^1 N = L{N/x}.
• L = y with y ≠ x. Then L{M/x} = y −→β^0 y = L{N/x}.
• L = λy.P. Then L{M/x} = λy.P{M/x}. By induction hypothesis P{M/x} −→β^{≤1} P{N/x}, and thus L{M/x} = λy.P{M/x} −→β^{≤1} λy.P{N/x} = L{N/x}.
• L = PS. Then L{M/x} = P{M/x}S{M/x}. If x has no free occurrences in L, then L{M/x} −→β^0 L{N/x}. Otherwise, suppose the only occurrence of x is free in P. Then S{M/x} = S{N/x} and by induction hypothesis P{M/x} −→β^{≤1} P{N/x}. Thus L{M/x} = P{M/x}S{M/x} −→β^{≤1} P{N/x}S{N/x} = L{N/x}. The case in which x is free in S is analogous.
2. The proof is analogous to the one above in the variable and abstraction cases. One case is missing.
• L = PS. Then L{M/x} = P{M/x}S{M/x}. If x is free only in P or only in S, the proof is the same as the one above. Otherwise, let us suppose that x occurs free both in P and in S. Then we can apply the induction hypothesis twice, which yields P{M/x} −→β^{≥1} P{N/x} and S{M/x} −→β^{≥1} S{N/x}. Thus L{M/x} = P{M/x}S{M/x} −→β^{≥1} P{N/x}S{M/x} −→β^{≥1} P{N/x}S{N/x} = L{N/x}.
3. The result follows directly from the previous two.
4. We proceed by induction on the structure of M.
• M = x. This case cannot occur, because M is a reducible term.
• M = λy.P. Then M{L/x} = λy.P{L/x}. Since M −→β N, we have N = λy.S with P −→β S. Thus, by induction hypothesis, P{L/x} −→β S{L/x}. Then M{L/x} = λy.P{L/x} −→β λy.S{L/x} = N{L/x}.
• M = PS and the redex fired in M to get N is inside P. Then M{L/x} = P{L/x}S{L/x}, N = VS and P −→β V. By induction hypothesis P{L/x} −→β V{L/x}. Thus M{L/x} = P{L/x}S{L/x} −→β V{L/x}S{L/x} = N{L/x}. The case in which the fired redex is inside S is analogous.
• M = (λy.P)S and N = P{S/y}. Then M{L/x} = (λy.P{L/x})S{L/x} −→β P{L/x}{S{L/x}/y}. By the Substitution Lemma this last term is equal to P{S/y}{L/x} = N{L/x}. Thus M{L/x} −→β N{L/x}. □

Next, we need to give some quantitative refinements of the usual confluence result, which are however possible only in fragments of the λ-calculus rather than in the λ-calculus itself. Let us first consider the case of the λA-calculus. We can observe that in this case the leftmost-outermost strategy has some peculiarity over the other ones, in that every peak N ←LO M −→β L can be closed by a valley in which the path from N is not longer than the one from L, hinting at the efficiency of LO.

Lemma 3. Let M be a λA-term. If M −→LO N and M −→β L, then there exists a term P such that N −→β^n P, L −→LO^m P and n ≤ m ≤ 1.

Proof. Let us call R = (λx.S)V the LO-redex of M and Q the redex fired in the reduction from M to L. Let M = C[R]. We proceed by induction on the structure of C.
• C = □. Then M = R and N = S{V/x}. We distinguish some cases.
  – R is Q. Then N = L = P.
  – Q is inside S. Call S′ the contractum of S obtained by firing the redex Q. Then L = (λx.S′)V −→LO S′{V/x}. By Lemma 2.4, N = S{V/x} −→β S′{V/x}.
  – Q is inside V. Call V′ the contractum of V obtained by firing the redex Q. Then L = (λx.S)V′ −→LO S{V′/x}. By Lemma 2.1, N = S{V/x} −→β^{≤1} S{V′/x}.
• C = AD. Q cannot be inside A, otherwise R would not be the LO-redex. Thus the result follows by the induction hypothesis.
• C = DA. If Q is inside D[R], then the result follows by the induction hypothesis. Otherwise R and Q are independent. Then M = D[R]E[Q]. Moreover, N = D[R′]E[Q] and L = D[R]E[Q′], where R′ and Q′ are the contracta of R and Q, respectively. Thus N = D[R′]E[Q] −→β^1 D[R′]E[Q′] = P and L = D[R]E[Q′] −→LO^1 D[R′]E[Q′] = P.
• C = λx.D. The result follows by the induction hypothesis. □

The situation in λI is dual: peaks can be closed (possibly) faster by not going leftmost-outermost.

Lemma 4. Let M be a λI-term. If M −→LO N and M −→β L, then there exists a term P such that N −→β^n P, L −→LO^m P and n ≥ m.
Proof. The proof is analogous to that of Lemma 3, using Lemma 2.2 instead of Lemma 2.1. □

Thirdly, we can consider the case of the full λ-calculus. Here, the only thing which can be said is that peaks can be closed in such a way as to guarantee that the valley has at most unitary length on the LO side.

Lemma 5. Let M be a λ-term. If M −→LO N and M −→β L, then there exists a term P such that N ↠β P, L −→LO^m P and m ≤ 1. In particular, m = 0 if and only if N = L.

Proof. The proof is analogous to that of Lemma 3, using Lemma 2.3 instead of Lemma 2.1. □
Finally, we can prove that StepsLO(·), seen as a function on λ-terms, can never increase along any β-reduction.

Lemma 6. If M −→β N, then StepsLO(N) ≤ StepsLO(M).

Proof. We argue by induction on StepsLO(M). We call R the redex contracted from M to N, L the LO-reduct of M and Q the LO-redex of M. The base of the induction is StepsLO(M) = 1, and it is proved as follows. By Lemma 5, there exists a term P such that L ↠β P, N −→LO^m P and m ≤ 1. Since StepsLO(M) = 1, L is in normal form and thus coincides with P. Thus StepsLO(N) ≤ 1. Now suppose the lemma true for each term P such that StepsLO(P) ≤ k, and let us consider a term M with StepsLO(M) = k + 1 < ∞. By Lemma 5, there exists a term P such that N −→LO^m P and L ↠β P. The case in which N and L coincide, and thus m = 0, is trivial, so let us consider distinct N and L, which implies m = 1. Now, StepsLO(L) = k, so by induction hypothesis, applied along the reduction L ↠β P, we get StepsLO(P) ≤ k. Hence StepsLO(N) = 1 + StepsLO(P) ≤ 1 + k = StepsLO(M). If StepsLO(M) = ∞, then StepsLO(N) ≤ StepsLO(M) trivially, since StepsLO(N) ≤ ∞. □

2.3. Minimal and maximal strategies in sub-λ-calculi

Since we exploit ARSs as models of computation, we are interested in defining quantitative properties on them. If one is interested in efficiency, it makes a lot of sense, of course, to implement an ARS with a strategy that minimises the number of steps to normal form. On the other hand, one could be interested in a worst-case analysis, thus seeking a maximal strategy.

Definition 12 ([10]). Let (A, →) be an ARS and →γ a strategy for it.
• →γ is normalising if every weakly normalising object only allows finite maximal reduction sequences under →γ.
• →γ is minimal if the length of any reduction sequence under →γ from any object to normal form is minimal among the lengths of all the reduction sequences from the former to the latter.
• →γ is perpetual if every object which is not strongly normalising only allows infinite maximal reduction sequences under →γ.
• →γ is maximal if the length of any reduction sequence under →γ from any object to normal form is maximal among the lengths of all the reduction sequences from the former to the latter.

It is well-known that a minimal and normalising strategy is not computable in the scope of the full λ-calculus [2, Section 13.5], i.e. one cannot select the redex leading to the minimal reduction sequence to normal form in an effective way. Instead, there exists an effective maximal and perpetual strategy [13]. In this section we prove that in λA (respectively, in λI) the leftmost-outermost strategy is minimal and normalising (respectively, maximal and perpetual). In order to prove this, we exploit a result due to van Oostrom [10].

Definition 13. Let (A, →) be an ARS and →α, →γ two reduction strategies for it. Suppose that for every a ∈ A, if b ←α a →γ c, then either there is an infinite reduction sequence from c under →α, or there is d ∈ A such that b →γ^n d and c →α^m d with n ≤ m. We then say that →α ordered locally commutes with →γ, abbreviated OLCOM(→α, →γ).

Theorem 1 ([10]). Let (A, →) be an ARS and →α a reduction strategy for it.
1. If OLCOM(→α, →), then →α is minimal and normalising.
2. If OLCOM(→, →α), then →α is maximal and perpetual.

This way, we have a proof method that allows us to state and prove in a simple way that the LO strategy is minimal and normalising (respectively, maximal and perpetual) in λA (respectively, in λI).

Theorem 2. In λA, LO is a minimal and normalising strategy; in λI, it is a maximal and perpetual one.
Fig. 1. The right-hand side of each diagram represents the reduction sequence under the rightmost-innermost strategy. On the left-hand side, a shorter (a) and a longer (b) reduction sequence are provided.
Proof. In λA, by Lemma 3, OLCOM(LO, −→β) holds, and in λI, by Lemma 4, OLCOM(−→β, LO) holds. The result then follows by Theorem 1. □

One might guess that a dual result holds, i.e. that the rightmost-innermost strategy is minimal for λI and maximal for λA. However this is not the case, as witnessed by the two following counterexamples (Fig. 1): the former is from [14], the latter was provided by Damiano Mazza in a personal communication.
• In λI, RI is not minimal: M_I = (λx.xI)(λx.(λz.zz)(xy)) reduces rightmost to normal form in five steps, while there exists a reduction sequence to normal form of four steps.
• In λA, RI is not maximal: M_A = (λx.xI)(λx.(λz.y)(xy)) reduces rightmost to normal form in three steps, while there exists a reduction sequence to normal form of four steps.

Please observe that the two counterexamples are built on very simple principles. In both of them there is the virtual redex xy that is copied (respectively, cancelled) before being contracted, if one reduces rightmost-innermost.

3. Probabilistic abstract reduction systems as strategies

We now introduce a framework suitable for defining randomised strategies. In particular, we shift the notion of ARS to the fully probabilistic case, where nondeterminism is resolved through probabilistic choice, obtaining this way a model akin to Markov chains. The first preliminary concept we need is that of a distribution.

Definition 14 (Distribution). A partial probability distribution over a countable set A is a mapping ρ : A → [0, 1] such that |ρ| ≤ 1, where |ρ| = Σ_{a∈A} ρ(a). We denote the set of partial probability distributions over A by PDist(A). The support of a partial distribution ρ ∈ PDist(A) is the set Supp(ρ) = {a ∈ A | ρ(a) > 0}. A probability distribution over a countable set A is a partial probability distribution μ such that |μ| = 1. Dist(A) denotes the set of probability distributions over A.

Strategies as from Definition 9 are deterministic: the process of picking a reduct among the many possible ones can only have one outcome. But what if this process becomes probabilistic? This is captured by the following notions.

Definition 15 (Randomised strategies). Given an ARS (A, →), a randomised reduction strategy P for (A, →) is a partial function P : A ⇀ Dist(A) such that P(a) is undefined if and only if a ∈ A is in normal form. Moreover, if P(a) = μ, then Supp(μ) ⊆ {b | a → b}. Intuitively, the last constraint ensures that one can pass with positive probability from an object a to an object b only if a → b.

Definition 16 (FPARS). If (A, →) is an ARS and P is a randomised reduction strategy for it, then we call (A, P) a fully probabilistic abstract reduction system (FPARS).

Our model is called fully probabilistic in that nondeterminism is not taken into account: PARSs as defined in [5,7] combine nondeterministic behaviour with randomised behaviour, while the dynamics of FPARSs is purely probabilistic. In the following, we will study randomised strategies seen as FPARSs. The dynamics of an FPARS can be handled by way of an appropriate notion of configuration, on which an evolution function can be defined.
Definition 17 (Configurations, computations). Let (A, P) be an FPARS and a, b ∈ A be two states. We define the probability P(a → b) of a transition from a to b as

P(a → b) = μ(b) if P(a) = μ, and P(a → b) = 0 if P(a) is undefined.

A configuration of an FPARS (A, P) is a partial probability distribution ρ ∈ PDist(A). The evolution of an FPARS (A, P) is the function E : PDist(A) → PDist(A) defined as follows:

E(ρ) = σ, where σ(a) = Σ_{b∈A} ρ(b) · P(b → a) for every a ∈ A.

If E(ρ) = σ we write ρ ⇝ σ. A computation is any sequence (ρi)_{i∈N} such that ρi ⇝ ρi+1.
Remark 1. Those computations (ρi)_{i∈N} where ρ0 is Dirac (i.e. there exists a ∈ A such that ρ0(a) = 1) are particularly interesting: they model the evolution of an FPARS starting from a state a ∈ A. We write in this case ρ0 = Dirac(a). In the following, we will always consider computations of this type.

Remark 2. Please note that if ρ ⇝ σ, then |σ| ≤ |ρ|. As proposed in [7], the total probability mass decreases along a reduction sequence, becoming 0 when the computation has terminated. In particular, if ρ = Dirac(a) and a is in normal form, then E(ρ) = σ, where |σ| = 0.

Example 3. Consider an ARS (A, →), where A = {a, b} and → = {(a, a), (a, b)}. We define a randomised strategy P on top of (A, →): P(a) = μ where μ(a) = μ(b) = 1/2, while P(b) is undefined (since b is in normal form). A computation (ρi)_{i∈N} starting from ρ0 = Dirac(a) has the following form:

ρ0 = (a ↦ 1, b ↦ 0) ⇝ ρ1 = (a ↦ 1/2, b ↦ 1/2) ⇝ ρ2 = (a ↦ 1/4, b ↦ 1/4) ⇝ ··· ⇝ ρk = (a ↦ 1/2^k, b ↦ 1/2^k) ⇝ ···
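The evolution function E is easy to implement. The following sketch (ours; the representation choices and names are assumptions, not part of the paper) models configurations as finite maps and randomised strategies as partial functions into finite distributions:

```haskell
import qualified Data.Map as Map

type Conf a     = Map.Map a Double          -- a partial probability distribution
type Strategy a = a -> Maybe [(a, Double)]  -- P(a); Nothing on normal forms

-- The evolution function E of Definition 17:
-- sigma(b) = sum over a of rho(a) * P(a -> b).
evolve :: Ord a => Strategy a -> Conf a -> Conf a
evolve p rho = Map.fromListWith (+)
  [ (b, q * w) | (a, q) <- Map.toList rho, Just mu <- [p a], (b, w) <- mu ]

-- |rho|, the total probability mass of a configuration.
mass :: Conf a -> Double
mass = sum . Map.elems

-- The FPARS of Example 3: A = {a, b}, P(a) = 1/2 a + 1/2 b, b a normal form.
data AB = A | B deriving (Eq, Ord, Show)

pEx :: Strategy AB
pEx A = Just [(A, 1/2), (B, 1/2)]
pEx B = Nothing

-- The computation rho_0 ~> rho_1 ~> ... starting from Dirac(a).
confs :: [Conf AB]
confs = iterate (evolve pEx) (Map.singleton A 1)
```

Evaluating map mass (take 4 confs) yields [1.0, 1.0, 0.5, 0.25], matching the configurations displayed above.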
How could we measure the length of a computation? It is natural to look for a definition capturing the average derivation length from a state a to its normal form.

Definition 18. Let (A, P) be an FPARS and ρ0 = Dirac(a), where a ∈ A. Given the computation (ρi)_{i∈N},

StepsP(a) = Σ_{i=1}^∞ |ρi|.
The definition above collapses to the one given for the deterministic case when P is deterministic. Moreover, observe that StepsP : A → R ∪ {+∞}, i.e. the average length of a computation could be infinite. A witness of such behaviour is the FPARS (A, P) where A = {a} and P(a) = μ such that μ(a) = 1. In the next section we provide a justification for this definition, considering the length of a computation as an appropriately defined random variable. Besides quantitative information on the length, one might just be interested in knowing whether a reduction is finite or infinite. Indeed, termination is a crucial problem in rewriting theory. Since we are in a probabilistic context, distinct such notions are possible. We define in our setting two classical termination properties, namely almost-sure termination and positive almost-sure termination [5–7].

Definition 19. An FPARS is almost-surely terminating (AST) if each computation (ρi)_{i∈N} is such that lim_{n→+∞} |ρn| = 0.
Definition 20. An FPARS (A, P) is positive almost-surely terminating (PAST) if for each a ∈ A, StepsP(a) < +∞. In this case we say that P is a positive almost-surely normalising strategy.

Example 4. Consider the same setting as in Example 3. It is easy to see that (A, P) is AST, since lim_{n→+∞} |ρn| = lim_{n→+∞} 1/2^{n−1} = 0. Moreover, (A, P) is PAST, since StepsP(a) = Σ_{n=1}^∞ |ρn| = Σ_{n=1}^∞ 1/2^{n−1} = 2 and StepsP(b) = 0.
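Reusing confs and mass from the previous sketch (again, our own illustration), the series of Definition 18 can be approximated by truncation:

```haskell
-- Truncated Definition 18: sum_{i=1}^{n} |rho_i|.
stepsApprox :: Int -> Double
stepsApprox n = sum (map mass (take n (tail confs)))
```

Here stepsApprox 50 is numerically indistinguishable from 2, in accordance with Example 4.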
While PAST clearly implies AST, it is well-known from the Markov chain literature that AST does not imply PAST, e.g. for the symmetric random walk on Z [15, Chapter 1.7]. We prove in our framework a result analogous to a classical theorem in Markov chain theory due to Foster [16], which gives a sufficient condition for PAST and a bound on the average number of steps to normal form.
Notation 1. For ε > 0 we write x >ε y if and only if x ≥ y + ε. This order is well-founded on the reals with a lower bound. Please note that if ε = 0, then the order is not well-founded on the reals with a lower bound: for example, one could have an infinite descending chain of strictly positive reals like 1 > 1/2 > 1/4 > ··· > 1/2^k > ···.
Definition 21. Given an FPARS (A, P), we call a function V : A → R Lyapunov [17] if the following are satisfied.
1. There exists k ∈ R such that V(a) ≥ k for each a ∈ A.
2. There exists ε > 0 such that for every a ∈ A, if P(a) = μ, then V(a) >ε V(μ), where V is extended to partial distributions as follows: V(μ) = Σ_{b∈A} V(b) · μ(b).
Remark 3. Without loss of generality, given a Lyapunov function V we can always obtain a Lyapunov function W with W(a) ≥ 0 for each a ∈ A, simply by adding a constant to V. Thus, in the following we will always assume Lyapunov functions to be non-negative.

Theorem 3. If we can define for an FPARS P = (A, P) a Lyapunov function V, then P is PAST and the average derivation length StepsP(a) of any computation (ρi)_{i∈N} starting from any a ∈ A is bounded by V(a)/ε.

Proof. Let us consider a generic transition ρ_{i−1} ⇝ ρi of P. We have

0 ≤ V(ρi) = Σ_{k∈A} V(k)·ρi(k) = Σ_{k∈A} V(k) · Σ_{j∈A} ρ_{i−1}(j)·P(j → k).

If j ∈ NF(A), then P(j → k) = 0; otherwise P(j → k) = μ_j(k), where μ_j = P(j). Thus, resuming the chain of inequalities,

0 ≤ Σ_{k∈A} V(k) · Σ_{j∈A\NF(A)} ρ_{i−1}(j)·μ_j(k) = Σ_{j∈A\NF(A)} ρ_{i−1}(j) · Σ_{k∈A} V(k)·μ_j(k) = Σ_{j∈A\NF(A)} ρ_{i−1}(j)·V(μ_j)
≤ Σ_{j∈A\NF(A)} ρ_{i−1}(j)·(V(j) − ε) ≤ Σ_{j∈A} ρ_{i−1}(j)·V(j) − Σ_{j∈A\NF(A)} ρ_{i−1}(j)·ε = V(ρ_{i−1}) − ε·|ρi|,

where the last equality uses the fact that |ρi| = Σ_{j∈A\NF(A)} ρ_{i−1}(j). Iterating the above inequality we have

0 ≤ V(ρi) ≤ V(ρ_{i−1}) − ε·|ρi| ≤ V(ρ_{i−2}) − ε·(|ρi| + |ρ_{i−1}|) ≤ ··· ≤ V(ρ0) − ε·Σ_{n=1}^i |ρn|.

Taking the limit for i → +∞, this yields 0 ≤ V(ρ0) − ε·StepsP(a), and thus StepsP(a) ≤ V(ρ0)/ε = V(a)/ε. □
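As a small worked instance of Theorem 3 (our own check, not from the paper): for the FPARS of Examples 3 and 4, the function V with V(a) = 2 and V(b) = 0 is Lyapunov with ε = 1, since V(μ) = (1/2)·V(a) + (1/2)·V(b) = 1 and thus V(a) = 2 >1 1. The theorem then bounds StepsP(a) by V(a)/ε = 2, which in this case is exactly the value computed in Example 4.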
3.1. FPARSs and Markov chains

All the machinery introduced up to now avoids the use of formal probability theory, namely probability spaces, measures and random variables. This was done on purpose, in order to provide a simple mathematical framework in which our theory could be settled. However, it seems appropriate to relate our definitions to those which appear in the literature on stochastic processes [18,17]. We first recall some basic notions of probability theory and Markov chain theory; please refer to the references for a more detailed discussion of the subject.

Definition 22. These definitions are standard in probability theory and are adapted from [18,6].

• A probability space is a triple (Ω, F, P) where Ω is the sample space, F ⊆ 2^Ω is a σ-algebra on Ω whose elements are said to be events, and P is a probability measure for F, i.e. P : F → [0, 1], P{Ω} = 1, and P is countably additive: the probability of a countable disjoint union of events is the sum of the individual probabilities.
• Given two events A and B, if P{B} ≠ 0 the conditional probability of A given B is defined as P{A | B} = P{A ∩ B} / P{B}.
• Let (Ω, F, P) be a probability space. A random variable X is a function from Ω to R ∪ {±∞} such that for each x ∈ R ∪ {±∞}, the set {X ≤ x} := {ω ∈ Ω : X(ω) ≤ x} ∈ F. A random variable X is discrete if the range of X is finite or countably infinite.
• Given a random variable X on (Ω, F, P), its expected value is E[X] = ∫_Ω X dP, where ∫ is the Lebesgue integral. In the case X is a discrete random variable assuming values in I, E[X] = Σ_{i∈I} i · P{X = i}.
We give an alternative way of computing the expected value of a discrete random variable, which will be useful in the remainder of this section.

Proposition 1 (Telescope formula, [17]). Let X be a discrete random variable with values in N ∪ {+∞}. Then

E[X] = Σ_{i=1}^∞ P{X ≥ i}.

Proof. Since X is discrete,

E[X] = Σ_{k=1}^∞ k · P{X = k} = Σ_{k=1}^∞ Σ_{i=1}^k P{X = k}.

We can exchange the summations by the Fubini-Tonelli theorem, which yields

E[X] = Σ_{i=1}^∞ Σ_{k=i}^∞ P{X = k} = Σ_{i=1}^∞ P{X ≥ i}. □
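As a quick worked instance (ours): for a geometric variable with P{X = k} = 2^{−k} for k ≥ 1, the defining formula gives E[X] = Σ_{k=1}^∞ k·2^{−k} = 2, while the telescope formula gives E[X] = Σ_{i=1}^∞ P{X ≥ i} = Σ_{i=1}^∞ 2^{−(i−1)} = 2 as well.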
If random variables represent the state of a dynamical system through time, we call this system a stochastic process. In this work, the reduction to normal form of a λ-term is a stochastic process, and in particular a discrete-time Markov chain. These are a well-studied class of memoryless stochastic processes: they satisfy the Markov property, which means that transitions from the current state to the next one do not depend on the whole history of the process, but only on the current state.

Definition 23 (Markov chain). Given a probability space (Ω, F, P), and a state space I at most countable and measurable with the σ-algebra of all its subsets, a sequence of random variables (Xn)_{n≥0} with values in I is a Markov chain if and only if it satisfies the Markov property, i.e. for each n ≥ 0 and for each i0, …, in, j ∈ I such that P{Xn = in, …, X0 = i0} > 0 it holds that

P{Xn+1 = j | Xn = in, …, X0 = i0} = P{Xn+1 = j | Xn = in} = p_{in,j}.

We can completely characterise a Markov chain M as a tuple (I, X0, P) where I is the state space, X0 is the initial distribution and P = (p_{ij})_{i,j∈I} is the transition matrix.

The first entrance time into a state is a very useful random variable in the study of the behaviour of a Markov chain. It is particularly interesting to know whether it is almost-surely finite and has finite expected value.

Definition 24 (First entrance time). Let M = (I, X0, P) be a Markov chain and j ∈ I be a state. We call first entrance time into j the random variable with values in N ∪ {+∞} defined by

T_j = min{n ≥ 1 | Xn = j} if {n ≥ 1 | Xn = j} ≠ ∅, and T_j = +∞ otherwise.
We now show how to derive a Markov chain M_P = (I, ·, P) from any FPARS P = (A, P). First we define a relation ≡⊥ on the states of A: for all a, b ∈ A,

a ≡⊥ b if and only if a, b ∈ NF(A) or a = b.
Clearly ≡⊥ is an equivalence relation. We call A/≡⊥ the quotient set of A by ≡⊥, and trm the equivalence class of any a ∈ NF(A). We define M_P = (I, ·, P) in the following way:

I = A/≡⊥,    p_ij = 1 if i, j ∈ NF(A), and p_ij = P(i → j) otherwise.

Actually, we have defined not one but a family of Markov chains, since the initial distribution is not specified. When we talk about M_P we are implicitly universally quantifying over all the initial laws, which are countable once one only considers initial configurations which are Dirac. For instance, for the FPARS of Example 3, the derived chain has states {a} and trm = {b}, with p_{a,a} = p_{a,trm} = 1/2 and p_{trm,trm} = 1. It is not obvious that the AST and PAST properties we defined are actually meaningful. We now prove their relationship with the classical properties defined for Markov chains (as AST and PAST were defined in [5,6]), in a way similar to what has been done in [7] for a slightly different model.

Remark 4. Given an FPARS P and a sequence of configurations (ρi)_{i∈N}, from how we have defined M_P it follows that |ρk| = P{T_trm ≥ k}.

Proposition 2. An FPARS P is AST if and only if P{T_trm < +∞} = 1 for the Markov chain M_P.

Proof. An FPARS P is AST if and only if lim_{n→+∞} |ρn| = 0. Since |ρn| = P{T_trm ≥ n}, we have P{T_trm < n} = 1 − |ρn|. Taking the limit at both sides,

lim_{n→+∞} P{T_trm < n} = P{T_trm < +∞} = lim_{n→+∞} (1 − |ρn|) = 1 − lim_{n→+∞} |ρn|.

Thus P{T_trm < +∞} = 1 if and only if lim_{n→+∞} |ρn| = 0. □
Proposition 3. Given an FPARS (A, P), ρ0 = Dirac(a) where a ∈ A, and the computation (ρi)_{i∈N}, E[T_trm] = Σ_{i=1}^∞ |ρi| = StepsP(a).

Proof. By Proposition 1 and Remark 4, E[T_trm] = Σ_{i=1}^∞ P{T_trm ≥ i} = Σ_{i=1}^∞ |ρi| = StepsP(a). □
3.2. Related work

ARSs were first considered with probabilistic strategies without nondeterminism in [8], and then with nondeterminism in [5]. In these works the AST and PAST properties were introduced, as a rewriting analogue of recurrence and positive recurrence in the Markov chain literature. This framework was refined in [7], which introduced multidistributions to handle nondeterminism in a simple and uniform way. We were inspired by this latter work and use a similar terminology, in particular partial probability distributions, whose mass is allowed to sum to less than one. However, while in [7] PARSs are first-class citizens, we attach probabilities to pre-existing ARSs in order to resolve nondeterminism. For this reason we have reformulated their model, getting rid of multidistributions, thus coming up with a very light and simple framework.

4. Randomised strategies in the λ-calculus

In the previous sections we have defined all the mathematical machinery needed to define randomised strategies for ARSs in the abstract. In this section we focus on the λ-calculus, as defined in Section 2, as the target of our investigation. In deterministic strategies, the redex which is being reduced is typically chosen according to its position in the term. In randomised strategies we have more freedom: intuitively, we have to assign a probability to each redex of any term, making the probabilities sum to one. The space of possible choices is indeed very large. A first design choice is the answer to the following question: should every redex in a term be reduced with strictly positive probability? If the answer is positive, then one should decide how to define these numbers.

4.1. The uniform strategy

Maybe the most obvious way in which one could assign probabilities to redexes is the uniform one.

Definition 25 (Uniform strategy). U is the randomised reduction strategy for the ARS (Λ, −→β) such that, for each reducible term M, U(M) = μ ∈ Dist(Λ), where for each N ∈ Λ:
μ(N) = |R_{M→N}| / |R_M|,

where R_M is the set of redex occurrences of M, R_{M→N} = {R ∈ R_M | M −→β N by firing R}, and |·| denotes the cardinality of a set.
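In code (our sketch, parametric in a function enumerating one-step reducts with multiplicity, e.g. the reducts of the earlier sketch), the uniform strategy simply gives each redex occurrence the same weight:

```haskell
-- Definition 25: each redex occurrence contributes 1/|R_M| to its reduct, so
-- duplicated reducts accumulate mass |R_{M->N}| / |R_M| once weights for the
-- same term are summed.
uniform :: (t -> [t]) -> t -> Maybe [(t, Double)]
uniform redexes m = case redexes m of
  [] -> Nothing
  ns -> Just [ (n, 1 / fromIntegral (length ns)) | n <- ns ]
```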
Fig. 2. The tree representing the reduction of the term M = (λx.λy.y)Ω4 under the randomised strategy U.
Although very easy to define, the uniform strategy is not easy at all to study. In fact, the number of redexes in terms along a reduction sequence can grow. As we have seen in the previous section, we study randomised strategies as FPARSs. In the case of the λ-calculus, we focus our attention on FPARSs of the form (WN, ·), because we are interested in studying quantitative properties of terminating terms. Whether (WN, U) is AST is still an open problem, but certainly (WN, U) is not PAST, as we are going to prove with the following counterexample.

Example 5. Let us consider the combinator M = (λx.λy.y)Ω4, where Ω4 = ω2 ω2 and ω2 = λx.(xx)(xx). We observe that Ω4 reduces to Ω4 Ω4. Thus, if we do not reduce leftmost, the number of redexes grows linearly with the length of the reduction sequence. Instead, reducing leftmost, we reach the normal form in one step. Considering the computation (ρi)_{i∈N} starting from ρ0 = Dirac(M) (see Fig. 2), we can compute the sequence |ρi|:

|ρ0| = 1,   |ρ1| = 1,   |ρ2| = |ρ1| − (1/2)·|ρ1|,   |ρ3| = |ρ2| − (1/3)·|ρ2|.

In general, if i ≥ 2, it holds that

|ρi| = |ρ_{i−1}| − (1/i)·|ρ_{i−1}| = |ρ_{i−1}|·(1 − 1/i).

Once the recurrence relation in the last line is solved, one has |ρi| = 1/i for each i ≥ 1. Thus

StepsU(M) = Σ_{i=1}^∞ |ρi| = Σ_{i=1}^∞ 1/i = +∞.
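The divergence above is easy to check numerically. In the following sketch (ours), terms are abstracted to their number of redexes: from a state with k redexes the normal form is reached with probability 1/k, and otherwise the number of redexes grows by one:

```haskell
-- masses !! i is |rho_i| of Example 5, via the recurrence
-- |rho_i| = |rho_{i-1}| * (1 - 1/i) for i >= 2.
masses :: [Double]
masses = 1 : 1 : zipWith (\m i -> m * (1 - 1 / i)) (tail masses) [2 ..]
-- take 6 masses == [1.0, 1.0, 0.5, 0.333..., 0.25, 0.2], i.e. |rho_i| = 1/i,
-- whose partial sums grow like the harmonic series.
```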
Since there is a term M ∈ WN such that StepsU(M) = +∞, the following holds.

Proposition 4. (WN, U) is not PAST.

4.2. The strategy Pε

In real programming languages we would like to have a normalising strategy, or at least, since we are in a probabilistic context, a (positive) almost-surely normalising one. For this reason, we devise a new strategy for the ARS (Λ, −→β), simpler than the uniform one, which mixes the leftmost-outermost and rightmost-innermost strategies. If ε ∈ [0, 1], call Pε the strategy in which the leftmost redex is chosen with probability ε and the rightmost one with probability 1 − ε.

Definition 26 (Strategy Pε). Given a reducible term M and ε ∈ [0, 1], the randomised strategy Pε is such that Pε(M) = μ ∈ Dist(Λ), where for each N ∈ Λ:

μ(N) = ε if M −→β N by firing the LO-redex, μ(N) = 1 − ε if M −→β N by firing the RI-redex, and μ(N) = 0 otherwise.
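As a sketch (ours), Pε is a randomised strategy in the sense of Definition 15, obtained from any pair of LO and RI step functions, e.g. the loStep and riStep of the earlier sketch:

```haskell
-- Definition 26 as a randomised strategy: the LO-reduct with probability eps,
-- the RI-reduct with probability 1 - eps. When the two redexes coincide, the
-- two entries denote the same reduct and their masses sum to 1.
pEps :: Double -> (t -> Maybe t) -> (t -> Maybe t) -> t -> Maybe [(t, Double)]
pEps eps lo ri m = case (lo m, ri m) of
  (Just n, Just l) -> Just [(n, eps), (l, 1 - eps)]
  _                -> Nothing   -- m is in normal form
```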
Notation 2. For readability reasons we define, for the FPARS (Λ, Pε), the family of functions ExpLenM : [0, 1] → R ∪ {+∞}, indexed by terms M ∈ Λ, where ExpLenM(ε) = StepsPε(M).
Fig. 3. The tree (a) and the cyclic graph (b) representing the reduction sequence of M.
Example 6. Let us consider the term M = (λx.y)Ω, where Ω = ωω with ω = λx.xx. There are two possible representations of the behaviour of Pε for this term, one as an infinite tree (Fig. 3a) and another one as a cyclic graph (Fig. 3b). According to the different representations, we can compute in different ways the probability of reaching normal form and the average derivation length. The results coincide, yielding in both cases probability of termination equal to 1 and average derivation length equal to 1/ε, if ε ≠ 0: indeed, from the cyclic representation, the average length satisfies ExpLenM(ε) = 1 + (1 − ε)·ExpLenM(ε), whence ExpLenM(ε) = 1/ε. If ε = 0, the system never terminates and thus the average derivation length is +∞.

The first theorem we are going to prove about Pε states that Pε is a positive almost-surely normalising strategy. The key aspect of this proof is that the LO-redex always has the same probability ε of being reduced, along the whole reduction sequence.

Theorem 4. The FPARS (WN, Pε) is PAST whenever ε > 0.

Proof. We use Foster's Theorem (Theorem 3) to prove the claim. Thus, we have to find a suitable Lyapunov function f : WN → R. We consider f = StepsLO. Certainly, condition (1) is verified, since StepsLO(M) ≥ 0 for each M ∈ WN. We have to verify (2). Suppose Pε(M) = μ. If M −→LO N and M −→RI L, by Lemma 6 we can write:

StepsLO(μ) = StepsLO(N)·ε + StepsLO(L)·(1 − ε) ≤ (StepsLO(M) − 1)·ε + StepsLO(M)·(1 − ε) = ε·StepsLO(M) − ε + StepsLO(M)·(1 − ε) = StepsLO(M) − ε.

Since 0 ≤ ε ≤ 1, StepsLO(M) >ε StepsLO(μ) for each normalising term M. Then, if ε > 0, (WN, Pε) is PAST and the average number of steps to normal form of a term M reduced with strategy Pε is bounded by StepsLO(M)/ε. □
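As a quick sanity check (ours): for the term M = (λx.y)Ω of Example 6 we have StepsLO(M) = 1, so the proof above bounds ExpLenM(ε) by 1/ε, which in that particular case coincides with the exact value computed in Example 6.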
The bound we obtain on ExpLenM(ε) from the above proof is very loose, and thus it does not give us any information on the actual nature of the function ExpLenM(ε). We show, by means of an example, that the strategy Pε is non-trivial, i.e. there exist a term M and 0 < ε < 1 such that ExpLenM(ε) < ExpLenM(1) = StepsLO(M) < ExpLenM(0) = StepsRI(M).

Example 7. Let us consider the family of terms Mn = N Ln, where:

N = λx.P x with P = (λy.z)Ω,    Ln = Cn S with S = (λx.x)y,    Cn = λx. xx···x (n occurrences of x).
After quite simple computations one can derive ExpLen_{Mn}(ε) = (n − 3)ε³ + 4ε² + 2/ε. Clearly, for ε = 0 the expression diverges. If n ≥ 2 there is a minimum for 0 < ε < 1, and thus ExpLen_{Mn}(ε) < ExpLen_{Mn}(1) = StepsLO(Mn) < ExpLen_{Mn}(0) = StepsRI(Mn) = +∞. ExpLen_{M2}(ε) is plotted in Fig. 4.

Studying the behaviour of ExpLenM(ε) for an arbitrary term M is a difficult task, which is outside the scope of this paper. However, we are able to characterise ExpLenM : [0, 1] → R ∪ {+∞} from an analytical point of view, e.g. investigating critical points and convexity. This is particularly interesting since we are interested in the efficiency of Pε. Since P0 is just RI and P1 is LO, a global minimum (respectively, maximum) of ExpLenM strictly between 0 and 1 would mean that reducing under Pε could lead to normal form in fewer (respectively, more) steps, on average, than reducing deterministically under either LO or RI. Call Poly(x) the set of polynomials in the unknown x with coefficients in R.

Lemma 7. Every configuration ρ of the FPARS (Λ, Pε) is such that ρ(M) ∈ Poly(ε) for each M ∈ Λ.

Proof. We prove this by induction on the length ℓ of the computation ρ0 ⇝ ρ1 ⇝ ··· ⇝ ρℓ = ρ. If ℓ = 0, then ρ = ρ0, thus ρ(M) = 1 if ρ0 = Dirac(M), and 0 otherwise. Now let us suppose that for computations of length ℓ ≤ k we have ρℓ(M) ∈ Poly(ε), and let us consider a computation ρ0 ⇝ ρ1 ⇝ ··· ⇝ ρk ⇝ ρ of length k + 1. Then ρ(M) = Σ_{N∈Λ} ρk(N) · P(N → M). Polynomials are closed with respect to addition and multiplication, thus by induction hypothesis ρ(M) ∈ Poly(ε) if P(N → M) ∈ Poly(ε). This is certainly true, because P(N → M) is either 0, ε, or 1 − ε, from the definition of Pε. □
Fig. 4. The function ExpLen_{M2}(ε).
Theorem 5. For each λ-term M, the function ExpLenM(ε) is a power series.

Proof. Let us consider the FPARS P = (Λ, Pε) and a computation (ρi)_{i∈N} starting from ρ0 = Dirac(M). Supp(ρi) is finite for every i ≥ 0, since the number of redexes in a term is finite; thus |ρi| ∈ Poly(ε) for every i ≥ 0, since it is a finite sum of polynomials by Lemma 7. Therefore ExpLenM(ε) = Σ_{i=1}^∞ |ρi| is a series whose terms are polynomials in ε, which, once the terms are reordered, is a power series. □
4.3. Optimality and pessimality results

As a further step in the direction of a full understanding of the nature of ExpLenM(ε), we study it in the special case in which M is a term of the subcalculi λA and λI we previously introduced. In particular, we prove that ExpLenM(ε) has its minimum (respectively, maximum) at ε = 1 for λA-terms (respectively, for λI-terms). All we need to do is to lift Theorem 2, a result about deterministic strategies, to the randomised setting. Some preliminary lemmas are necessary in order to do so appropriately. The following two lemmas tell us that the existence of a strictly partial probability distribution along a computation witnesses the existence of a deterministic computation leading to normal form.

Lemma 8. Let (A, P) be an FPARS and (ρi)_{i∈N} a computation, where ρ0 = Dirac(a0). For each a ∈ A, if there exists k ≥ 0 such that ρk(a) > 0, then there exists a reduction sequence a0 → a1 → ··· → a_{k−1} → a.

Proof. We argue by induction on k. If k = 0, then the reduction sequence is trivially a0 = a. If k = h, then ρh(a) = Σ_{b∈A} ρ_{h−1}(b)·P(b → a). Since ρh(a) > 0, there exists b ∈ A such that ρ_{h−1}(b)·P(b → a) ≠ 0, i.e. ρ_{h−1}(b) > 0 and P(b → a) > 0. Thus, by induction hypothesis, there exists a sequence a0 → a1 → ··· → a_{h−2} → b, and b → a. Hence there exists a reduction sequence a0 → a1 → ··· → a_{h−2} → b → a. □

Lemma 9. Let (A, P) be an FPARS and (ρi)_{i∈N} a computation, where ρ0 = Dirac(a0). If there exists k ≥ 1 such that |ρk| < 1, then there exists a sequence a0 → a1 → ··· → aj such that aj is in normal form and j ≤ k − 1.

Proof. We argue by induction on k. If k = 1, since ρ0 is Dirac(a0), then P(a0) must be undefined (otherwise |ρ1| = 1). Hence a0 is in normal form. If k = h and |ρ_{h−1}| < 1, by induction hypothesis we are done. So let us consider the case in which |ρ_{h−1}| = 1 and |ρh| < 1. We claim that there exists b ∈ NF(A) such that ρ_{h−1}(b) > 0. Indeed,
|ρh| = Σ_{a∈A} Σ_{b∈A} ρ_{h−1}(b)·P(b → a) = Σ_{b∈A} ρ_{h−1}(b)·Σ_{a∈A} P(b → a)
= Σ_{b∉NF(A)} ρ_{h−1}(b)·Σ_{a∈A} P(b → a) + Σ_{b∈NF(A)} ρ_{h−1}(b)·Σ_{a∈A} P(b → a).

If there were no b ∈ NF(A) such that ρ_{h−1}(b) > 0, then the second term in the sum would vanish and |ρh| = Σ_{b∉NF(A)} ρ_{h−1}(b) = 1. But |ρh| < 1 by hypothesis. Hence there exists b ∈ NF(A) such that ρ_{h−1}(b) > 0, and thus by Lemma 8 there exists a sequence a0 → a1 → ··· → a_{h−2} → b. □
We are almost done: the following lemma tells us that all configurations along a computation starting from a λA-term M are proper up to the n-th configuration, where n = StepsLO(M).

Lemma 10. Given the FPARS (Λ_A, Pε) and a computation (ρi)_{i∈N} where ρ0 = Dirac(M0), for each k ≤ StepsLO(M0) it holds that |ρk| = 1.
Proof. Let n = StepsLO(M0). If there were k ≤ n such that |ρk| < 1, then by Lemma 9 we could find a sequence M0 −→β M1 −→β ··· −→β Mj such that j ≤ k − 1 and Mj is in normal form. But this is impossible because of Theorem 2, since LO is minimal in λA. □

Corollary 1. For each term M in Λ_A, ExpLenM(ε) has its minimum at ε = 1.

Proof. Let n = StepsLO(M) = ExpLenM(1). Then

ExpLenM(ε) = Σ_{i=1}^∞ |ρi| = Σ_{i=1}^n |ρi| + Σ_{i=n+1}^∞ |ρi| = (by Lemma 10) n + Σ_{i=n+1}^∞ |ρi| = ExpLenM(1) + Σ_{i=n+1}^∞ |ρi| ≥ ExpLenM(1). □
Dually, the following lemma shows that all configurations ρk along a computation starting from a λI-term M are null (i.e. |ρk| = 0) if k > n = StepsLO(M).

Lemma 11. Given the FPARS (Λ_I, Pε) and a computation (ρi)_{i∈N} where ρ0 = Dirac(M0), for each k > StepsLO(M0) it holds that |ρk| = 0.

Proof. Let n = StepsLO(M0). If there were k > n such that |ρk| > 0, then by Lemma 8 we could find a reduction sequence M0 −→β M1 −→β ··· −→β Mk. But this is impossible because of Theorem 2, since LO is maximal in λI. □

Corollary 2. For each term M in Λ_I, ExpLenM(ε) has its maximum at ε = 1.

Proof. Let n = StepsLO(M) = ExpLenM(1). If n = +∞ we are done, so we consider n finite. Then
ExpLenM(ε) = Σ_{i=1}^∞ |ρi| = Σ_{i=1}^n |ρi| + Σ_{i=n+1}^∞ |ρi| = (by Lemma 11) Σ_{i=1}^n |ρi| ≤ n = ExpLenM(1). □
4.4. Further results

We have seen in the previous section that in particular cases ExpLenM(ε) has its maximum or minimum at ε = 1. However, we know nothing about the shape of the curve ExpLenM(ε). In this section we are going to show some counterintuitive results about the different shapes ExpLenM(ε) can have in λA, λI and in the full λ-calculus. In particular, we address the study of convexity and critical points. Indeed, we are interested in minima and maxima, i.e. those points where the randomised strategy is maximally efficient and, respectively, inefficient.

Definition 27 (Critical point). Let f : A ⊆ R → R. A point x ∈ A is a critical point of f if f is not differentiable at x or the derivative of f at x is 0.

First of all, the results in Section 4.3 may suggest that ExpLenM(·) could be monotonically decreasing in Λ_A, and monotonically increasing in Λ_I. This is actually not the case, and the counterexamples are precisely the terms M_A and M_I we already considered in Section 2, namely:
M_I = (λx.xI)(λx.(λz.zz)(xy))    M_A = (λx.xI)(λx.(λz.y)(xy))
The graphs of the two functions ExpLen_{M_I}(·) and ExpLen_{M_A}(·) are in Fig. 5. As a consequence:

Proposition 5. ExpLenM(·) is not monotonic, neither in Λ_A nor in Λ_I.

Looking at the terms M_I and M_A, one immediately notices not only that the functions we are considering fail to be monotonic in general, but also that they fail to be convex or concave. Is this common to all terms? If one considers terms like
Fig. 5. Plot of ExpLen_{M_I}(ε) (a) and ExpLen_{M_A}(ε) (b).

Fig. 6. Plot of ExpLen_{M∪}(ε) (a), ExpLen_{M∩}(ε) (b) and ExpLen_{M∼}(ε) (c).
M∪ = ((λy.z)(II))((λx.xx)(Iy))    M∩ = ((λx.xx)(Iy))((λy.z)(II))
one immediately realizes that ExpLenM(ε) can indeed be convex or concave in certain cases (see Fig. 6(a) and Fig. 6(b) for a plot). Playing a bit with terms allowed us to find terms for which ExpLenM(·) can be quite wild, having more than one critical point. Consider, as an example, the term
M∼ = ((λy.M∩)(II))((λx.xx)(Iy))

and the plot of ExpLen_{M∼}(·), reported in Fig. 6(c).

Proposition 6. ExpLenM(ε) can have more than one critical point.

5. Conclusions

In this work we have started the study of randomised reduction strategies for the λ-calculus. We have defined a family of examples of such strategies, and we have shown that all of them, except one, are positive almost-surely normalising. Then we have studied how those strategies behave in λA (the affine λ-calculus) and in λI, proving optimality and pessimality results. Moreover, we have shown that the family of strategies we defined behaves in a very complex way in the scope of the full λ-calculus. Further work could consist in better analysing the behaviour of the proposed strategies, in particular trying to characterise classes of λ-terms for which our strategies work strictly better than deterministic ones, and to develop methods to tune the parameter ε in order to get good performance. Moreover, it would be interesting to study the behaviour of randomised strategies on non-terminating λ-terms, investigating the perpetuality phenomenon.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References

[1] U. Dal Lago, G. Vanoni, On randomised strategies in the λ-calculus, in: Proc. of the 19th ICTCS, vol. 2243, CEUR-WS, 2018, pp. 92–103.
[2] H.P. Barendregt, The Lambda Calculus: Its Syntax and Semantics, North-Holland, 1984.
[3] Terese, Term Rewriting Systems, Cambridge University Press, 2003.
[4] H.B. Curry, R. Feys, Combinatory Logic, North-Holland, 1958.
[5] O. Bournez, F. Garnier, Proving positive almost-sure termination, in: Proc. of 16th RTA, in: LNCS, vol. 3467, Springer, 2005, pp. 323–337.
[6] L.M. Ferrer Fioriti, H. Hermanns, Probabilistic termination: soundness, completeness, and compositionality, in: Proc. of 42nd POPL, ACM, 2015, pp. 489–501.
[7] M. Avanzini, U. Dal Lago, A. Yamada, On probabilistic term rewriting, in: Proc. of 14th FLOPS, in: LNCS, vol. 10818, Springer, 2018, pp. 132–148.
[8] O. Bournez, C. Kirchner, Probabilistic rewrite strategies. Applications to ELAN, in: Proc. of 13th RTA, in: LNCS, vol. 2378, Springer, 2002, pp. 252–266.
[9] R. Motwani, P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
[10] V. van Oostrom, Random descent, in: Proc. of 18th RTA, in: LNCS, vol. 4533, Springer, 2007, pp. 314–328.
[11] F.-R. Sinot, Sub-λ-calculi, classified, Electron. Notes Theor. Comput. Sci. 203 (1) (2008) 123–133.
[12] A. Church, An unsolvable problem of elementary number theory, Am. J. Math. 58 (2) (1936) 345–363.
[13] F. van Raamsdonk, P. Severi, M.H.B. Sørensen, H. Xi, Perpetual reductions in λ-calculus, Inf. Comput. 149 (2) (1999) 173–225.
[14] A. Asperti, S. Guerrini, The Optimal Implementation of Functional Programming Languages, Cambridge University Press, 1998.
[15] J.R. Norris, Markov Chains, Cambridge University Press, 1998.
[16] F.G. Foster, On the stochastic matrices associated with certain queuing processes, Ann. Math. Stat. 24 (3) (1953) 355–360.
[17] P. Bremaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Springer-Verlag, 1999.
[18] R.B. Ash, C.A. Doléans-Dade, Probability and Measure Theory, 2nd edition, Academic Press, San Diego, 1999.