Universal computation and physical dynamics

Charles H. Bennett
IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, USA

Physica D 86 (1995) 268-273

Abstract

A dynamical system is said to be computationally universal if it can be programmed through its initial condition to perform any digital computation. Such systems include traditional abstract computers such as Turing machines and cellular automata, as well as more physical models such as hard sphere gases and even a single particle in an appropriately shaped box. Because of the ability of any two such systems to simulate one another, they are all equivalent in the range of computations they can perform; and the global dynamics of any one of them provides a microcosm of all cause/effect relations that can be expressed by deductive logic or numerical simulation. This allows universal computers to be used to define in an objective manner that sort of complexity which increases when a self-organizing system organizes itself.

1. Universal computers and undecidable dynamics

A universal computer is one that can be programmed to perform any digital computation. The classic model of computation, where the phenomenon of universality was first demonstrated, is the Turing machine. A Turing machine consists of an infinite passive tape, on which the program is supplied at the beginning and on which the output is left at the end of a successful computation. The tape is scanned by a finite active head, which moves back and forth on the tape, reading and manipulating the tape symbols one at a time according to a fixed set of rules. Fairly simple machines of this type can perform arbitrarily complicated computations, including simulating other computers much more complicated than themselves, and can do so rather efficiently, with at most an additive constant increase in program size and a polynomial slowdown in speed compared to the machine being simulated.
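The tape/head/rule-table picture above translates almost directly into code. The following minimal sketch is my own illustration, not anything from the paper; the particular rule table shown (a unary incrementer) is a hypothetical example. It steps a deterministic Turing machine until it reaches a halting state or exhausts a step bound.

```python
# Minimal deterministic Turing machine sketch (illustrative only).
# A rule table maps (state, scanned symbol) -> (new symbol, move, new state).

def run_turing_machine(rules, tape, state="start", halt_state="halt", max_steps=10_000):
    """Run until the halt state or a step bound (halting is undecidable in general)."""
    cells = dict(enumerate(tape))      # sparse tape: position -> symbol, blank is "_"
    head = 0
    for _ in range(max_steps):
        if state == halt_state:
            break
        symbol = cells.get(head, "_")
        new_symbol, move, state = rules[(state, symbol)]
        cells[head] = new_symbol
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells))

# Hypothetical rule table (not from the paper): append one '1' to a block of 1s.
rules = {
    ("start", "1"): ("1", "R", "start"),   # scan right across the block of 1s
    ("start", "_"): ("1", "R", "halt"),    # write a trailing 1, then halt
}
print(run_turing_machine(rules, "111"))    # prints "1111"
```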

A dynamical system is called computationally universal if it can be programmed through its initial conditions to perform any digital computation. This property has been proved for a number of models resembling those studied in mechanics and statistical mechanics, e.g. the hard sphere gas in an appropriate periodic potential [5], ground states of tiling problems and lattice gases [15] (programmed by their interaction energies rather than initial conditions), noisy cellular automata [7,8], systems of partial differential equations [14], and even a single classical particle moving in a finitely complicated array of plane and parabolic mirrors [12]. Any of these systems can be programmed to solve arbitrary problems in digital computation, such as finding the millionth digit of π, or determining by exhaustive search which side, if any, has a winning strategy in chess.
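None of the cited constructions is spelled out here, but as a stand-in illustration of a simple dynamics "programmed" entirely through its initial condition, one can take the elementary cellular automaton Rule 110, which has since been shown (by M. Cook) to be computationally universal. This example is my own addition, not one used in the paper; the whole update rule fits in a single byte, and the "program" is nothing but the initial row of cells.

```python
# Elementary cellular automaton Rule 110 -- a simple local dynamics whose
# initial condition can, in principle, encode an arbitrary computation.

RULE = 110   # the update table, packed into one byte

def step(cells):
    """One synchronous update of a row of 0/1 cells with periodic boundary conditions."""
    n = len(cells)
    return [
        (RULE >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

row = [0] * 40 + [1] + [0] * 40        # the "program" is simply the initial condition
for _ in range(20):
    print("".join(".#"[c] for c in row))
    row = step(row)
```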


A distinctive feature of computation is the ability to program an open-ended search for something that may or may not exist, such as a counterexample to Goldbach's conjecture, which asserts that every even number greater than 2 is expressible as the sum of two primes. If a counterexample exists, a properly programmed search will find it, but if it does not exist, the computation will continue forever. The lack of any general procedure for deciding a priori which computations terminate, called "the unsolvability of the halting problem", is the central negative result of computability theory. Not only termination, but any other global property of computations that could be used to signal the outcome of an open-ended search, is similarly algorithmically undecidable as a function of the initial condition.

For dynamical systems such as Moore's particle [12], "halting" of the computation does not imply a stopping of the dynamics, but rather an event, such as irreversible entry of the trajectory into a designated region of phase space, which has been chosen beforehand as the signal that the computation is finished.

Universality and undecidability are closely related: roughly speaking, if a universal computer could see into the future well enough to solve its own halting problem, it could be programmed to contradict itself, halting if and only if it foresaw that it would fail to halt.

We are thus left with a tantalizing situation. On the one hand, the global map of any computationally universal process - the iterative limit of its transition function - is a microcosm of all iterative processes and all cause-effect relations that can be demonstrated by deductive logic or numerical simulation. On the other hand, because of the unsolvability of the halting problem, this same global map cannot be computed directly. It can only be approximated as the limit of an iterative process that converges with uncomputable slowness.
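The self-contradiction sketched above (a machine programmed to defeat its own halting predictor) can be written out almost literally. The fragment below is my own illustration of the argument only: the halts() oracle is assumed for the sake of contradiction and is not something that can actually be implemented.

```python
# Sketch of the diagonal argument. Suppose, for contradiction, that a total,
# correct halting oracle existed:

def halts(program_source: str, input_data: str) -> bool:
    """Hypothetically returns True iff running the program on the input halts."""
    raise NotImplementedError  # no such procedure exists; placeholder only

CONTRARIAN = """
def contrarian(source):
    if halts(source, source):   # oracle predicts: "it halts"
        while True:             # ...so refuse to halt
            pass
    # oracle predicts: "it never halts" ...so halt immediately
"""

# Running contrarian on its own source text would halt if and only if the
# oracle says it does not halt -- contradicting the assumed correctness of halts().
```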


Undecidability is a more extreme kind of unpredictability than chaos. A chaotic system is unpredictable because it exponentially amplifies small differences in the initial condition; therefore, if the initial condition is known only approximately, the behavior soon becomes unpredictable. An undecidable process may or may not amplify small differences in its initial condition, but its long term behavior cannot be predicted from the initial condition, even if the initial condition is known exactly.

Perhaps the simplest computationally universal dynamics, Moore's one-particle system, generalizes the ordinary Smale horseshoe-map type of chaotic dynamics, in which a bundle of parallel trajectories in 3 dimensions is stretched in one transverse direction and squeezed in the other, then cut or folded and mapped back onto or into itself. This gives rise to an invariant set of points in the xy plane, topologically equivalent to the product of two Cantor sets, which the dynamics shuffles in a manner equivalent to left-shifting the doubly-infinite binary sequence representing the point's x and y address within the invariant set. Left-shifting a binary sequence is a rather trivial sort of Turing machine computation, in which the Turing machine head shifts along the tape always in the same direction, without changing the tape contents. Moore's construction consists of altering the simple horseshoe map so that each cycle of the dynamics performs a deterministic shuffle of the same invariant set, but now in a manner corresponding to one step of a universal Turing machine. The requisite local dynamics can be implemented using simple tools: plane mirrors to redirect the trajectories and to split and reassemble trajectory bundles according to a few of the most significant bits of their x and y values (thereby implementing Turing machine reading and writing), and parabolic mirrors to squeeze and stretch trajectory bundles transversely (thereby implementing shifts). This done, the induced global behavior becomes as microcosmic and unpredictable as that of a Turing machine. For example, there is an initial condition whose trajectory eventually enters one of two designated regions according to whether the millionth digit of π is even or odd, and another which enters a designated region if Goldbach's conjecture is false, but remains forever outside if the conjecture is true.
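Moore's mirror construction itself is not reproduced here, but the connection between horseshoe-like dynamics and the shift map is easy to check numerically. The sketch below is my own illustration, using the closely related baker's map rather than Moore's system: a point of the unit square is coded by its y digits to the left of the binary point and its x digits to the right, and one iteration of the map moves the leading x digit across the binary point, i.e. acts as a one-place shift of the combined digit sequence.

```python
# The baker's map (x, y) -> (2x mod 1, (y + floor(2x)) / 2) acts on the binary
# coding ...y2 y1 y0 . x0 x1 x2... of a point as a one-place shift.

def baker(x, y):
    b = int(2 * x)            # the leading bit x0
    return 2 * x - b, (y + b) / 2

def decode(y_bits, x_bits):
    """Point (x, y) of the unit square from finite lists of binary digits."""
    x = sum(b / 2 ** (i + 1) for i, b in enumerate(x_bits))
    y = sum(b / 2 ** (i + 1) for i, b in enumerate(y_bits))
    return x, y

x_bits = [1, 0, 1, 1, 0, 0, 1, 0]
y_bits = [0, 1, 1, 0, 1, 0, 0, 1]

x, y = decode(y_bits, x_bits)
x2, y2 = baker(x, y)

# After one step the leading x bit has moved across the binary point onto y:
x2_expected, y2_expected = decode([x_bits[0]] + y_bits, x_bits[1:])
assert abs(x2 - x2_expected) < 1e-12 and abs(y2 - y2_expected) < 1e-12
```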


Needless to say, Moore's system would be useless as a practical computer: since infinitely many logically distinct trajectories are crammed into a finite space, it would be exquisitely sensitive to noise, and could not compute reliably for more than a few steps in the real world.

Other physically motivated models besides particle dynamics have been shown to give rise to uncomputable global behavior. For example, tiling systems, in which one is given an alphabet of tiles and adjacency constraints, have been devised [13] in which it is possible to tile the plane, but not in any computable pattern. The possibility that actual physical quantities may be mathematically definable but uncomputable is considered by Geroch and Hartle [9] in the context of theories of quantum gravity. Shipman [16] has reviewed various other aspects of uncomputability in physics.

2. Computational definitions of physical complexity

The conspicuous complexity of many parts of the universe, especially living organisms and their byproducts, was once attributed to divine creativity, but is now generally thought to reflect a capacity of matter, implicit in known physical laws, to "self-organize" under certain conditions. But before this process can be understood at a fundamental level, we need a correspondingly fundamental mathematical characterization of "complexity", the quantity that increases when a self-organizing system organizes itself. Informally, a complex object typically contains structural features that could not plausibly have arisen save as the outcome of a long evolution. This notion of "logical depth" can be formalized with the help of the theory of universal computers.

The problem of deciding whether an object of unknown origin is complex or merely random is a classic puzzle, treated for example in Borges' story "The Library of Babel" [3], which describes a library containing one copy of every book-length sequence of letters, and the efforts of the librarians to interpret the books and discard the worthless ones.

Computational measures of intrinsic complexity are based on a computerized version of "Occam's razor", the principle that alternative scientific theories explaining the same phenomenon ought to be weighed according to the economy of their assumptions, the theory with the fewest assumptions being most plausible. The computerized version of Occam's razor uses a standard universal computer U to transform theories into predictions. The theories and predictions must be expressed digitally, as finite binary strings, which may be identified with natural numbers via the standard lexicographic ordering

the empty string, 0, 1, 00, 01, 10, 11, 000, ....

The expression U(p) = q will be used to mean that the standard universal computer, given a binary string p as input, embarks on a terminating computation with the string q as its output. The given object or phenomenon x, whose complexity we wish to define, is treated as an output of U, and the most plausible explanation of x is identified with its minimal program x*, defined as min{p : U(p) = x}, the least input that causes U to produce exactly x as output and then halt. Of course any given x can also be computed by infinitely many other programs besides x*, but according to the principle of Occam's razor these are less plausible because they are less economical. In particular, any x at all can be generated by a "print program" containing a verbatim description of the desired output. The size (in bits) of the minimal program x*, called the "algorithmic information content" or "Kolmogorov complexity" of x, is a property that has been extensively studied and reviewed [4,6,11,17].
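The ordering just displayed is the length-then-lexicographic enumeration of binary strings, which puts them in explicit one-to-one correspondence with the natural numbers. A minimal sketch of that correspondence (my own illustration, not notation from the paper):

```python
# Bijection between natural numbers and binary strings in the standard
# length-then-lexicographic order: 0 <-> "", 1 <-> "0", 2 <-> "1", 3 <-> "00", ...

def number_to_string(n: int) -> str:
    return bin(n + 1)[3:]          # binary digits of n+1 with the leading 1 removed

def string_to_number(s: str) -> int:
    return int("1" + s, 2) - 1     # reattach the leading 1 and undo the shift

assert [number_to_string(n) for n in range(7)] == ["", "0", "1", "00", "01", "10", "11"]
assert all(string_to_number(number_to_string(n)) == n for n in range(1000))
```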


It represents the amount of information required to generate x, and can be regarded as an approximately intrinsic property of x because it is independent of the choice of universal machine, up to an additive constant. Though it is sometimes called complexity, algorithmic information content is maximal for objects that are entirely random and structureless, such as a typical n-bit string produced by coin tossing. Thus it corresponds more to the intuitive notion of randomness or arbitrariness than to that of complexity. Indeed, algorithmic information provided the first satisfactory intrinsic definition of randomness (the previous definitions were extrinsic, a random string being one produced by a process, such as coin tossing, that samples a uniform probability distribution). By contrast, an algorithmically random string is defined as one of near-maximal information content, a string whose minimal program is about the same length as the string itself, because the string lacks any significant internal pattern that could have been exploited to encode it more concisely.

Even though information content is not a good measure of subjective complexity, the computerized Occam's razor paradigm can be used to define another intrinsic property that is. If an object's minimal program is its most plausible explanation, then the computation that generates the object from its minimal program represents the object's most plausible causal history, and the duration of this computation is a measure of the object's "logical depth" [1], or plausible amount of mathematical work required to create it. In order to make depth adequately machine-independent, it is necessary to refine the definition somewhat [2], taking weighted account not only of the minimal program, but of all programs that compute the desired output, with two (k+1)-bit programs receiving the same weight as one k-bit program.
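To make the definitions of x* and of the print program concrete, the toy sketch below uses a made-up two-instruction language as a stand-in for the universal machine U. Both the language and the examples are my own assumptions for illustration; for a genuinely universal machine, finding x* is of course uncomputable.

```python
# Toy stand-in for the universal machine U (an assumption for illustration).
# Programs are strings in a made-up two-instruction language:
#   "~" + data          -> output data verbatim        (the "print program")
#   "R" + digit + block -> output block repeated digit times

def U(p: str) -> str:
    if p.startswith("~"):
        return p[1:]
    if p.startswith("R") and len(p) >= 2 and p[1].isdigit():
        return p[2:] * int(p[1])
    raise ValueError("invalid program")

x = "010101010101"
print(U("~" + x))      # print program: length |x| + 1, works for ANY x
print(U("R6" + "01"))  # a shorter program exploiting the pattern in this particular x

# The algorithmic information content of x is the length of its *shortest*
# program; the print program shows it can never exceed |x| plus a constant,
# while a patterned x (like the one above) admits much shorter descriptions.
```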


This is analogous to the explanation of a phenomenon in science by appeal to an ensemble of possible causes, individually improbable but collectively probable, as in the kinetic theory of gases. Taking these considerations into account, the logical depth of x may be defined as the least time sufficient to compute x by a program or set of programs whose collective weight is a significant fraction (say ≥ 2^{-c}, where c is a significance parameter) of the total weight of all programs that compute x. Informally, this depth is the computation time more quickly than which x is unlikely to have originated, at the 2^{-c} significance level. (A similar refinement can be introduced in the definition of algorithmic information, but is not necessary.)

The notion of "weight" of a program needs to be made a little more precise. A picturesque way of doing so is to think of a computation by the U machine in which the input data, instead of being written on the tape at the beginning, is chosen randomly during the computation by tossing a coin whenever the machine is about to visit a square of tape for the first time (no coin is tossed on return visits, to allow the machine to re-read without confusion data it has itself written). Such a probabilistic computation is like the old idea of monkeys accidentally typing the works of Shakespeare, but here the monkeys in effect type at a computer keyboard instead of a typewriter. Almost all such computations would result in garbage, but any nontrivial program also has a finite probability of being chosen and executed. This probability is its weight.

Logical depth accords with intuitive notions of complexity, having a low value both for random strings (for which the print program provides a rapid computation path, whose plausibility is not undercut by any significantly more concise program) and for trivially ordered strings such as 111111..., which can be rapidly computed by small programs. The digit sequence of π would be somewhat deeper, reflecting the time required to compute it. The distinction between depth and information is illustrated by an object such as an ephemeris, which tabulates the positions of the moon and planets over a period of years.
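In this picture a k-bit program is chosen with probability 2^{-k}, which is its weight, so two (k+1)-bit programs indeed count as much as one k-bit program. The toy calculation below is my own sketch: the census of programs is hypothetical (enumerating the programs that produce a given x is itself uncomputable), but it shows how the depth of x at significance parameter c would be read off from a list of (program length, running time) pairs.

```python
# Toy calculation of logical depth at significance parameter c, given a
# hypothetical, hand-supplied list of programs producing x, each described by
# its length in bits and its running time in steps.

def depth(programs, c):
    """Least time t such that the programs producing x within t steps carry at
    least a 2**-c fraction of the total weight of all programs producing x."""
    total = sum(2.0 ** -length for length, _ in programs)
    threshold = 2.0 ** -c * total
    for _, t in sorted(programs, key=lambda p: p[1]):
        fast_weight = sum(2.0 ** -length for length, time in programs if time <= t)
        if fast_weight >= threshold:
            return t
    return None   # unreachable for c >= 0: all programs together always suffice

# Hypothetical census for some x: a longish but fast program (think of the
# print program) and a much shorter, slow one that exploits structure in x.
programs = [(100, 5_000), (60, 10_000_000)]
print(depth(programs, c=10))   # 10000000: the quick explanation is too implausible here
print(depth(programs, c=50))   # 5000: under a weaker significance threshold it counts
```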


Such a book contains no more information than the equations of motion and initial conditions from which it was calculated, perhaps half a page of information, but it saves its owner the work of repeating this calculation. Its value lies therefore in its depth rather than in its information. Similarly, the depth of a biological molecule such as DNA resides not in its information content ("frozen accidents" representing random historical choices among equally viable possibilities), nor in its superficial redundancy (unequal base frequencies, repeated sequences), but rather in subtler aspects of its redundancy (e.g. functionally significant correlations within and between enzymes) which on the one hand are determined by the constraint of viability and on the other hand could not plausibly have arisen without a long causal history.

An object is deep only if there is no short computational path, physical or non-physical, to construct it from a concise description. Therefore an object's depth may be less than its chronological age. Old igneous rocks, for example, typically contain evidence (e.g. fission tracks) of the time elapsed since their solidification, but such rocks would not be called deep if this history could be quickly simulated on a computer. Intuitively this means that the history, though long in a chronological sense, was rather uneventful, and thus short in a logical sense.

Because of its intimate connection with the halting problem, logical depth is not a computable function. This may appear disappointing, but in fact it is an intuitively necessary feature of any good definition of complexity, because of the open-endedness of science. An object that seems shallow and random may in fact be the output of some very small but slow-running program, whose halting we have no way of predicting. This corresponds scientifically to a phenomenon, or a book in Borges' library, that appears random merely because we do not yet understand it. If the cause of a phenomenon is unknown, we can never be sure we are not underestimating its depth and overestimating its randomness.

The uncomputability of depth is no great hindrance in a theoretical setting, because uncomputable functions are no harder to prove theorems about than computable ones. Indeed, the chief use of computational measures of complexity, such as logical depth, may be to guide the search for mathematical assumptions and physical conditions under which self-organization, appropriately defined, can be proven to occur.

Acknowledgements

The author wishes to acknowledge three decades of discussions and suggestions on the nature of complexity from Ray Solomonoff, Rolf Landauer, Gregory Chaitin, Dexter Kozen, Gilles Brassard, Leonid Levin, Peter Gacs, Stephen Wolfram, Geoff Grinstein, Tom Toffoli, Norman Margolus, Joe Shipman, and others.

References

[1] C.H. Bennett, Information, dissipation, and the definition of organization, in: Emerging Syntheses in Science, D. Pines, ed. (Addison-Wesley, 1987).
[2] C.H. Bennett, Logical depth and physical complexity, in: The Universal Turing Machine: a Half-Century Survey, R. Herken, ed. (Oxford Univ. Press, 1988) pp. 227-257.
[3] J.-L. Borges, Labyrinths: Selected Stories and Other Writings, D.A. Yates and J.E. Irby, eds. (New Directions, New York, 1964).
[4] G. Chaitin, A theory of program size formally identical to information theory, J. Assoc. Comput. Mach. 22 (1975) 329-340.
[5] E. Fredkin and T. Toffoli, Intern. J. Theor. Phys. 21 (1982) 219.
[6] G. Chaitin, Algorithmic information theory, IBM J. Res. Develop. 21 (1977) 350-359, 496.
[7] P. Gacs, J. Comput. Syst. Sci. 32 (1986) 15-78.
[8] P. Gacs and J. Reif, Proc. 17th ACM Symp. on the Theory of Computing (1985) pp. 388-395.
[9] R. Geroch and J. Hartle, Found. Phys. 16 (1986) 533.
[10] A.N. Kolmogorov, Three approaches to the quantitative definition of information, Prob. Inf. Trans. 1 (1965) 1-7.
[11] L.A. Levin, Randomness conservation inequalities: information and independence in mathematical theories, Inform. Control 61 (1984) 15-37.


[12] C. Moore, Phys. Rev. Lett. 64 (1990) 2354.
[13] D. Myers, Nonrecursive tilings of the plane II, J. Symb. Logic 39 (1974) 286-294.
[14] S. Omohundro, Physica D 10 (1984) 128-134.
[15] C. Radin, J. Math. Phys. 26 (1985) 1342; J. Stat. Phys. 43 (1986) 707.
[16] J. Shipman, Workshop on Physics and Computation (IEEE Computer Society, Los Alamitos, CA, 1993) pp. 299-314.


[17] A.K. Zvonkin and L.A. Levin, The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms, Russ. Math. Surv. 25:6 (1970) 83-124.