Quo vadimus? — Much hard work is still needed


Physica D 120 (1998) 1-11

Tommaso Toffoli 1
Boston University, Electrical and Computer Engineering, Boston, MA 02215, USA

1 E-mail: [email protected]

Abstract

Physical aspects of computation that just a few years ago appeared tentative and tenuous, such as energy recycling in computation and quantum computation, have now grown into full-fledged scientific businesses. Conversely, concepts born within physics, such as entropy and phase transitions, are now fully at home in computational contexts quite unrelated to physics. Countless symposia cannot exhaust the wealth of research that is turning up in these areas. The "Physics of Computation" workshops cannot and should not try to be an exhaustive forum for these more mature areas. I think it would be to everyone's advantage if the workshops tried to play a more specialized and more critical role; namely, to venture into uncharted territories and to do so with a sense of purpose and of direction. Here I briefly suggest a few possibilities; among these, the need to construct a general, model-independent concept of "amount of computation", much as we already have one for "amount of information". I suspect that, much as the inspiration and prototype for the latter was found in physical entropy, so the inspiration and prototype for the former will be found in physical action. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Physics and computation; Reversible computation; Quantum computation; Computation capacity; Action

1. Introduction

According to the legend, as Peter was fleeing Rome to escape Nero's persecution, Jesus appeared to him. "Quo vadis, Domine?" ("Lord, where are you going?"), asked Peter. "To Rome, to do the work you are leaving undone!" Spurred by these words, Peter turned back - to share his Christian flock's tribulations and eventually find martyrdom in Rome.

Where are we going? Judging by body count, PhysComp96 has been an unreserved success. Clearly, people interested in the connections between physics and computation can today come out in the open and practice their cult without fear of persecution; better still, some have been able to turn their vocation into a substantial scientific business. Twenty years after being tabled [3], reversible computation is beginning to show up in practical circuit designs [1,30]. Its offspring, quantum computation, has opened up a new and vigorous chapter of mathematical physics; she is also being courted by defense and industry for her reputedly fabulous - though as yet unsubstantiated - dowry. Perhaps owing to shrinking employment in their own discipline, some physicists have become self-appointed (and successful) consultants in many profane endeavors of a computational nature, such as neural networks [8] and optimization by simulated annealing [12]. What's more, even pure logic, when taken in bulk, appears to display a phenomenology (universal classes of behavior, self-organized criticality, phase transitions [9,13], etc.) that falls squarely within the "new" statistical mechanics.


Much as entropy, invented by physicists, has grown into a much more abstract and general concept - essentially belonging to logic [11] - of which physical entropy is but a special case, so many classes of behavior, originally discovered within statistical mechanics and thought to be "laws of physics", are turning out to be such only because they are special cases of "laws of logic", or combinatorial tautologies. Thus, the more the computer scientist turns to physics to ascertain the resources and the limits of computation, the more the physicist turns to computer science to help describe what's really going on at the bottom of physics. A synergy has been established that is beneficial to both fields.

So, we should be satisfied. I'm not. Besides a solid institutional core, PhysComp displays two wings that, for quite different reasons, represent distraction or at least noise. On one hand, there is an attempt to make it a forum for vague "interdisciplinary" philosophizing, risking turning it into a refuge for ideas that cannot stand the discipline of competitive peer review. On the other hand, there is steady pressure to make it one more venue for activities that are already very successful and well represented elsewhere - the "Bulletin of Scientific Stripmining", as it were. To paraphrase John Keynes, the role of an institution like PhysComp is not to address areas that are well served by the market - and to strive to serve them even better; its role is to identify areas that are important but are poorly served by market mechanisms. If this means a smaller and less visible PhysComp workshop, so be it!

In this paper I will try to identify questions that perhaps have not even been asked yet, and nevertheless may be closer to the original PhysComp spirit than many of the questions about which we now know a lot. In so doing, I will be outlining the program of what I would like the next PhysComp workshop to be.

2. Defensive computing

Reversible computing's concern is usually presented in terms of energy dissipation, but one could express it in an equivalent way, namely: Is there a general policy for minimizing the amount of noise produced by a computation? The answer is yes, but we cannot just order NAND gates that dissipate less; we need different logic primitives and a different organization of the computation. The algorithm the computer runs will no longer have the single goal of manufacturing the desired results as quickly and economically as possible; it will also have to be in charge of running its own "energy recycling" business. We can achieve less noise at the cost of more infrastructure. One wonders whether there may be an analogous approach to minimizing the effects of noise on a computation.

Like most human activities, computation turns free energy into heat. Landauer's original question [15] was, "To what extent is this empirical fact a logical necessity? Irreversible computing operations, for one, must dissipate an amount of energy proportional to their degree of irreversibility". One answer, starting with Bennett [3], was, "In principle, one can compute without irreversible operations, and thus evade that particular necessity to dissipate". It turns out that to carry out Bennett's precepts one has to make energy recycling (or more correctly, entropy recycling) part and parcel of the computation itself. This requires additional computational infrastructure (which in turn will tend to dissipate unless special precautions are taken) and incurs a number of other "regulation" costs; nonetheless, in certain contexts where energy is a scarce resource (e.g., portable computing) or thermal pollution is severely penalized (e.g., fast, high-density computing), some recycling in this sense may be beneficial. Note that total recycling of energy in computation is only an ideal, as the cost of infrastructure grows more and more steeply as we approach 100% efficiency. Thus, Bennett's approach can be practically summarized as follows: if we have a circuit originally drawn with 10 ordinary gates, then a new circuit consisting of perhaps 100 gates, but of which only one is ordinary and the other 99 are invertible, may perform the same function while generating less noise than the original one.
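To make the phrase "different logic primitives" concrete, here is a minimal sketch (my own illustration, not taken from the paper) of the compute-copy-uncompute discipline that Bennett's approach entails, using the invertible Toffoli (controlled-controlled-NOT) gate: NAND is computed into a scratch bit, the result is copied out through a controlled-NOT, and the scratch bit is then restored by running the same Toffoli gate again, so that no randomized garbage is left behind to be erased.

# Sketch (illustrative, not from the paper): NAND computed with invertible
# primitives and Bennett-style uncomputation of the scratch bit.

def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c iff a = b = 1; it is its own inverse."""
    return a, b, c ^ (a & b)

def cnot(a, b):
    """Controlled-NOT: flips b iff a = 1; also its own inverse."""
    return a, b ^ a

def reversible_nand(a, b):
    a, b, scratch = toffoli(a, b, 1)        # scratch (preset to 1) becomes NOT(a AND b)
    scratch, out = cnot(scratch, 0)         # copy the result onto a fresh output bit
    a, b, scratch = toffoli(a, b, scratch)  # uncompute: scratch returns to 1
    assert scratch == 1                     # no garbage bit left to dissipate later
    return a, b, out                        # inputs preserved, NAND delivered

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", reversible_nand(a, b)[2])   # NAND truth table: 1 1 1 0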

Now let us look at the other side of the coin. Instead of asking what kind of computing machinery is less likely to dump noise into the environment, we may ask what machinery is less likely to be affected by the noise already in the environment. Again, this is not simply a matter of ordering gates having a higher noise margin. We may have to favor certain logic primitives over others (just as we favored reversible primitives for "nondissipative" logic); and to compute the same function we may have to devise an alternative algorithm, perhaps one incorporating "error recycling" infrastructure.

We know that passive storage media (spins in a crystal, magnetic domains on a hard disk) may be very resistant to the buffeting of noise; a potential well a few kT deep will keep a bit stable for a long time (cf. [23]). Trivially, we may think of such a bit as a circuit consisting of one identity gate (just a piece of wire) through which data are continually recirculated. In a similar way, the piece of copper wire or coaxial cable that connects two ordinary gates can be thought of as a string of identity gates, and a properly set up 1:1 transformer can be thought of as a logical inverter. These passive "gates" are linear: low-level noise affecting the input signal will travel through the gate as if it were simply superposed on the signal itself, and will appear at the output in a way that is independent of the signal's value; in other words, here noise is additive. By contrast, an ordinary AND gate is designed so as to display a very nonlinear response. When both inputs are near the switching threshold, the effect on the output of noise at the inputs is extremely sensitive to the actual value of the inputs themselves; the gate is caught, as it were, "with its pants down". Can we redesign a circuit so that the number of such embarrassing occurrences is minimized? My surmise is that while production of noise is tied to irreversibility, sensitivity to noise is chiefly tied to nonlinearity; to the extent that this is true, reducing the number of nonlinear gates in a circuit - even at the cost of increasing the total number of gates - should, in an appropriate context, lead to better resistance to noise.


In the rest of this section I will sketch a scenario in which this approach indeed works.

Fig. 1. Invertible analog gate. The signal values are arbitrary phases; the phase of the bottom line is shifted by an amount φ = f(c1, c2), where f is a continuous function.

In most realizations of digital logic, the two discrete Boolean values 0 and 1 are actually distinguished values on a continuum, such as the 0 V and 5 V of the TTL logic family. In [24], we discussed logic "families" where the continuum in which these logic values are embedded has the topology of the circle rather than that of the line, so that 0 and 1 can be identified with the phase angles 0° and 180°. 2 In this setting, and incorporating the reversibility constraints mentioned above, a digital gate is just a special case of an analog gate organized as in Fig. 1 (the generalization from 2 to n control lines is immediate). Box f computes an arbitrary continuous function of the two control inputs c1 and c2; the result, φ = f(c1, c2), is added as a phase shift to the input value x of the controlled line, yielding the output value x + φ. Even though the adding of a phase shift to the x signal is represented in Fig. 1 as taking place in a lumped fashion, in an actual device we may assume that the phase increment φ will be distributed over the time T that x takes to traverse the gate. If the control inputs are affected by noise, the overall phase shift will be of the form ∫_T φ(t) dt. Suppose now that the noise on c1 and c2 is high-frequency noise that approximately averages out to zero during time T. In general, the resulting noise on φ will not average out to zero; however, it will do so if f is linear. Thus, linear gates are insensitive to this kind of noise. If, all other things being equal, we can rearrange the computation so as to use fewer nonlinear gates - even at the cost of increasing the total number of gates and using a more complex algorithm - we will still be better off as far as this kind of noise is concerned. For example, the XOR function, which is linear, is usually realized in conventional circuitry by means of four NAND gates, which are nonlinear; this use of nonlinear elements is clearly not essential and could be dispensed with. Note that linear gates by themselves are not computation-universal. Thus, the issue is not to eliminate nonlinearity, but to avoid nonessential uses of it.

In conclusion, just as we found ways to adjust our computational habits so as to produce less garbage, so we should study how to adjust our habits so as to be as transparent as possible to the garbage that surrounds us.

2 This approach was generalized by quantum computation [2,5], which embeds the two values 0 and 1 as phase angles on the "Bloch sphere", i.e., the set of pure states of a spin-1/2 particle. Also, see von Neumann's patent [29] for an early computing scheme using phase angles.
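The averaging argument above is easy to check numerically. In the sketch below (my own toy model; the particular functions, noise amplitude, and phase values are illustrative assumptions, not the paper's), zero-mean noise on the control inputs leaves the time-averaged phase shift of a linear gate essentially unchanged, while a nonlinear gate picks up a systematic bias.

# Sketch: time-averaged phase shift phi = f(c1, c2) under zero-mean control noise,
# for a linear f and a nonlinear f (illustrative choices, not from the paper).
import math, random

def averaged_phase(f, c1, c2, steps=200_000, amp=0.3):
    """Average f over the traversal time T, adding independent zero-mean
    noise to each control input at every instant."""
    total = 0.0
    for _ in range(steps):
        total += f(c1 + random.uniform(-amp, amp), c2 + random.uniform(-amp, amp))
    return total / steps

linear = lambda c1, c2: c1 + c2                          # noise averages out
nonlinear = lambda c1, c2: math.cos(c1) * math.cos(c2)   # AND-like response: noise leaves a bias

c1, c2 = 0.0, math.pi     # one "Boolean" control configuration (phases 0 and 180 degrees)
for name, f in (("linear", linear), ("nonlinear", nonlinear)):
    print(f"{name:9s} clean = {f(c1, c2):+.4f}  averaged under noise = {averaged_phase(f, c1, c2):+.4f}")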

3. Launching and catching signals in conservative computation

There is much ongoing research in what one may call conservative computation [1,22,30] - where the energy of signals is as far as possible just "borrowed" in order to perform switching tasks, rather than spent and regenerated at every step as in conventional circuitry. Conservative computation is not restricted to reversible computation. 3 In a typical conservative-computation scheme, a bit is represented by a charge that normally resides in a capacitor, and is moved from one capacitor to the next either by switched inductors or by switched resistors. 4

Here we raise the problem of connecting separate conservative-computation chips by transmission lines. This problem is representative of the difficulties one faces in conservative computation; to be effective, the conservative approach cannot be limited to just some components, but must be extended to the entire computer.

3 The two are independent concepts, though strongly interrelated in purposes and means. 4 Inductors are still impractical in integrated silicon; with resistors, the dissipation due to inrush current can be minimized using some clever multi-phase, quasi-static procedure.
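Footnote 4's "clever multi-phase, quasi-static procedure" can be illustrated with the standard adiabatic-charging estimate (the code below is my own sketch with arbitrary component values, not a procedure from the paper): charging a capacitor to V through a resistor in a single step dissipates CV²/2 no matter how small R is, whereas raising the supply in N equal steps dissipates roughly 1/N of that.

# Sketch: resistive charging losses, one big step versus an N-step quasi-static ramp.
# Standard adiabatic-charging estimate; all component values are illustrative.

def dissipation(n_steps, C=1e-12, V=1.0, R=1e3, dt=1e-12, settle=20):
    """Integrate i^2*R while the supply is raised to V in n_steps equal increments,
    waiting about `settle` RC time constants after each increment."""
    q, heat = 0.0, 0.0
    for k in range(1, n_steps + 1):
        v_supply = V * k / n_steps
        for _ in range(int(settle * R * C / dt)):
            i = (v_supply - q / C) / R   # current through the resistor
            q += i * dt                  # charge delivered to the capacitor
            heat += i * i * R * dt       # energy dissipated in the resistor
    return heat

C, V = 1e-12, 1.0
one_step_loss = 0.5 * C * V * V          # the classic CV^2/2 loss
for n in (1, 2, 4, 8, 16):
    print(f"N = {n:2d}  dissipated = {dissipation(n) / one_step_loss:.3f} x CV^2/2")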


Fig. 2. Unmatched (a) and matched (b) cable termination at a gate input. In (b), perfect termination occurs if the gate has very high impedance; but then virtually all of the signal energy ends up in the resistor rather than in the gate.

If we ask a computer engineer what's wrong with Fig. 2(a), he will immediately answer, "Impedance mismatch! A cable should be terminated by an impedance equal to its characteristic impedance, so that no energy will be reflected back". By this token, Fig. 2(b) is "right": all of the signal energy is absorbed by the 75 Ω terminating resistor. The resistor? But was not that energy meant for the gate? What are we doing, making sure that the signal we have so laboriously crafted in an earlier part of the circuit gets turned into heat as instantly and thoroughly as possible, just as we were about to make use of it?

What's going on (and this is real engineering practice) is that in complex circuitry cleanliness is next to godliness: no unclaimed or partially spent signals must be left around. The simplest way to do this is to let a signal travel virtually untouched down an unbranching transmission line, and then thoroughly destroy it at the end of the line. A gate along the line can "see" the signal but not "touch" it; that is, it is designed so as to have almost infinite input impedance. As a consequence, the gate will only be able to grab a minuscule fraction of the signal's energy. It will then have to greatly amplify this fraction (thus drawing power from the supply) in order for the signal to be usable for switching purposes, and finally further amplify the logic output so as to produce a standard-strength output signal.

In conservative computation, by contrast, signals are never destroyed; 5 consequently, any portion of a circuit will have exactly as many outputs as inputs.

5 In practice, once in a while they may have to be boosted or regenerated to compensate for inevitable losses.


Fig. 3. A conservative-computation circuit can be viewed as a collection of shift registers running parallel to one another. Through a Fredkin gate, the data on one control line determine whether the data on two controlled lines are swapped or go straight through. All logic reduces to conditional routing of signals.

A circuit may be conveniently thought of as a collection of shift registers running parallel to one another and interacting with one another via conditional permutations ("Fredkin gates" [7]), as in Fig. 3.
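As a minimal illustration of "conditional routing of signals" (my own toy code, not from the paper), the Fredkin gate below swaps its two controlled lines exactly when the control line carries a 1; it is its own inverse and it conserves the number of 1s, which is what makes it a natural primitive for conservative logic.

# Sketch: the Fredkin (controlled-swap) gate as conditional routing.

def fredkin(c, a, b):
    """If the control c is 1, swap the two controlled lines; otherwise pass them through."""
    return (c, b, a) if c else (c, a, b)

for bits in [(c, a, b) for c in (0, 1) for a in (0, 1) for b in (0, 1)]:
    out = fredkin(*bits)
    assert fredkin(*out) == bits      # invertible: applying the gate twice is the identity
    assert sum(out) == sum(bits)      # conservative: the number of 1s never changes
    print(bits, "->", out)

# Feeding a constant 0 into one controlled line turns routing into ordinary logic:
# the third output carries (c AND a), the second carries ((NOT c) AND a).
print(fredkin(1, 1, 0))               # (1, 0, 1)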

Fig. 4. In this LC shift register, discrete charges representing bits hop from capacitor to capacitor through inductors. Charge movement is regulated by a 4-phase switching sequence. Starting the cycle when all switches are open and charges are at rest in the capacitors: (a) Close switches 0 and 2; current builds up in the inductors. (b) When the capacitors are discharged and the current is at its peak, close 1 and open 0, isolating the capacitors; now the current recirculates in the inductors. (c) Open switch 2 and close 3; energy flows rightwards from the inductors into the capacitors. (d) When the capacitors are fully charged and the current is zero, open 1 and 3, completing the cycle.

In this scheme, a capacitor need not be only a parking place for a bit - a passive memory element. A capacitor may be part of a gate: the charge stored in it (the "control signal") modulates the gate's Hamiltonian without being, in first approximation, affected by the controlled signals. Typically, this role is played by the gate capacitance of an FET transistor. In the inductive switching scheme [6], with ideal circuit elements, a lossless, data-blind transfer of charge from one capacitor to the next can be accomplished in finite time, as in Fig. 4. This is a local transfer, done by means of discrete (lumped-parameter) circuit elements.

Suppose now that the source capacitor in an intended charge transfer is the last of a shift register on one chip, and the destination capacitor is the first of a shift register on another chip which may be some distance away. We would like to connect the two chips by a transmission line (a distributed-parameter circuit element), so that in one clock cycle 6 the entire charge will be transferred from one capacitor to the other. To what extent is this possible?

6 Plus, of course, the latency of the delay line, which can be made an integral multiple of the clock cycle.

Fig. 5. (a) Transmission line directly switched onto the capacitors. (b) Interposing matching networks to achieve a smoother energy transfer.

Simply switching the transmission line in and out, as in Fig. 5(a), leads to unacceptable losses - less than 75% of the charge (and 50% of the energy) makes it across the gap (Fig. 6(a)), and the rest is left sloshing in the transmission line; a more gentle way to launch the charge is needed (Fig. 5(b)). Inserting an inductor of appropriate value as a matching network already makes a big improvement (Fig. 6(b)). Adding a few more elements further improves the situation; even though a complete transfer of energy is only achieved in the limit of an infinite matching ladder, virtually all the energy can be transferred using a ladder of just a few elements, with little delay beyond that intrinsic to the transmission line itself, as shown in Fig. 7.
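The lossless local transfer of Fig. 4 is easy to verify numerically. The sketch below (my own, with arbitrary illustrative component values) integrates the ideal C-L-C loop: for equal capacitors joined by an inductor, the charge follows q2(t) = Q/2·(1 − cos ωt) with ω = √(2/LC), so at t = π/ω the charge has moved entirely to the second capacitor, the loop current is back to zero, and the switches can be opened with the energy intact.

# Sketch: ideal C-L-C charge transfer (cf. Fig. 4), integrated numerically
# with a symplectic (Euler-Cromer) step. Component values are illustrative.
import math

C, L, Q = 1e-12, 1e-9, 1e-12          # 1 pF capacitors, 1 nH inductor, 1 pC initial charge
w = math.sqrt(2.0 / (L * C))          # natural frequency of the loop
t_end = math.pi / w                   # half an oscillation period: full transfer
steps = 200_000
dt = t_end / steps

q1, q2, i = Q, 0.0, 0.0               # charges on the two capacitors, loop current
for _ in range(steps):
    i += ((q1 - q2) / C) / L * dt     # L di/dt = v1 - v2
    q1 -= i * dt                      # the loop current moves charge from C1...
    q2 += i * dt                      # ...to C2

E0 = Q * Q / (2 * C)                  # initial energy, all in C1
E = q1 * q1 / (2 * C) + q2 * q2 / (2 * C) + 0.5 * L * i * i
print(f"charge transferred : {q2 / Q:6.4f} of Q")     # ~ 1.0000
print(f"energy accounted for: {E / E0:6.4f} of E0")   # ~ 1.0000 (lossless)
print(f"current at switch-off: {i:.3e} A")            # tiny compared with the peak current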

In conclusion, in order to achieve the prospected efficiency gains, it is not enough to show that a circuit is reversible or conservative merely as a digital circuit. A plausible way must be shown to smoothly interpolate the jerkiness of digital behavior - watch the gas pedal: no infinite accelerations!


Fig. 6. Charge transfer as a function of time, with 1-, 2-, and 3-pole networks.

Fig. 7. Energy transfer efficiency versus delay, for increasingly longer ladder matching networks.

4. Quantum computation

The two foregoing sections naturally lead to some questions about quantum computation. From a theoretical point of view, quantum computation research has led to an explosive, impressive development of certain chapters of mathematical physics ([2,20]; also compare Peres's textbook [19] with earlier approaches to conceptual quantum mechanics). This is not unlike the understanding that reversible computing had yielded on connections between computation and thermodynamics. From a practical viewpoint, however, we must remember that, if plain reversible computation is beset by so many difficulties, quantum computation is even more demanding. Only extraordinary claims may induce us to forget for a moment the difficulties ahead. And the claims, or at least the veiled hopes, are indeed extraordinary: the vision of exponential speedup is hard to put down.

What I would like to see is plausible reasons why this untapped power should be there to begin with; and this inevitably impinges on the interpretation of quantum mechanics. One may say, to paraphrase David Deutsch, "Of course there is an exponential amount of computing power down there, as there are an exponential number of parallel worlds that we can tap!" To which I would reply, "Even if we granted that, what makes you believe that, as soon as we open the door to those parallel worlds, people in those worlds will not try to use our world to do their computations in?" Mine, of course, is not a physical argument; just an argument from symmetry. It parallels that made by Feynman about damping a ballistic galvanometer mirror [4]: we may try to exploit the enormous number of extra degrees of freedom that are "down there", but in the end we are at their mercy.

Even as schemes for quantum computation are being vigorously developed, common sense dictates that we explore two classes of questions:
- Are there not exponential costs lurking somewhere that would offset the prospected exponential benefits [16,17]?
- What is the fraction of problems that would benefit from quantum algorithms? I would not be surprised if it were a set of "measure zero" - even though this set may contain some important problems.

Most problems in combinatorial optimization are of exponential complexity, if we want an exact answer; the complexity can be lowered from exponential to polynomial, in special cases, if we slightly reformulate the problem so as to accept for good an almost optimal answer [27]. I would not be disappointed at all if, likewise, quantum computation turned out to be a special-purpose approach that dramatically lowers the complexity for a small class of problems [18].

In conclusion, along with the question "Can it be made to work?", I would like to see explored the questions "What new physical resources are we using that would make it plausible for it to work at all?" and "If it works, what can we use it for?"

5. Relativity

Computation may provide the key to deriving some principles of physics from simpler, more general and intuitive principles, just as information did for thermodynamics. There, a phenomenological discipline was replaced by a machinery for "prediction with incomplete information" [10,11] (i.e., statistical mechanics) whose form is to a great extent independent of physics even though its applicability to physics is impressive. Here I will give a toy example of reduction of physics to computation.

Consider a "canonical" computer consisting of a large lookup table, seen as the "program", and a "CPU" that just scans this table and outputs its contents one bit at a time. Almost all programs of this computer are random bit generators with a 50:50 distribution for the two bit states - to which we will assign numerical values +1 and −1. Let a particle start at the origin and describe a one-dimensional random walk as per the instructions of this random bit generator. As a function of the number t of steps, the statistical behavior of the particle is given by a distribution with center of mass μ = 0 and spread σ = √t. Alternatively, we may imagine an ensemble of particles diffusing as a cloud completely characterized, at a macroscopic level, by the parameters μ and σ.

Suppose now that we restrict our attention to only certain programs, namely, those with a given distribution {p, q} for +1 and −1. We have

1 = p + q,    β = p − q,

or

p = (1 + β)/2,    q = (1 − β)/2.

In this new situation, the position of the center of mass will be given by the equation

μ = βt,    (1)

i.e., the cloud will now move as a whole at a velocity β. How will the rate of spread be affected? We will have

σ = √(4pq) √t = √(1 − β²) √t.    (2)

Thus, as the cloud's velocity goes up to β, its rate of internal evolution (that's what the spread measures) slows down by a factor of √(1 − β²). Is it just an accident that this factor coincides with the well-known Lorentz time dilation? 7

Let us reason this way. At the microscopic level, the particle moves one position at every time step, no matter what; in this sense, it has a fixed amount of computational resources. Macroscopically, we may find it convenient to separate this amount into two parts: (a) that which is used for the coherent motion of the cloud, that is, the velocity of its center of mass, and (b) whatever is left, which is responsible for the incoherent motion of the particle, that is, the spread of the cloud with respect to its center of mass. Once we choose p, and thus β, we can "undo" the effect of the coherent part of this computation by just moving our frame of reference at a velocity β; what is left is just the incoherent part, which is less than the whole and decreases as β increases. (In fact, if we pick β = 1, then all of the computational resources are used for the coherent motion, and the internal time of the cloud stops.)

From this viewpoint, a Lorentz-like time dilation is but an elementary consequence of what one may (pompously) call a "principle of invariance of computational resources"; simply, by choosing β, we slice the resource pie into two parts ("coherent" and "incoherent") as we please, but, no matter what we do, the sum of the two pieces remains the same; at bottom, a combinatorial tautology. It is also clear that we must get a time dilation factor that is 1 when β = 0, and goes down to 0 when β = 1. The only thing that may surprise us is that with this toy model we stumbled exactly on the factor √(1 − β²), but this can be shown to be just a matter of kinematics. 8

In conclusion, I wonder how much more of physics we can simplify - and unify - by abstracting away certain details and looking at what is left as a general process of combinatorial dynamics - in other words, computation.

7 In the Lorentz transformation, what is transformed is the system's state, not its law as we do here. In a slightly richer and more realistic toy model (cf. [21]), the randomness comes from the data rather than from the program.

8 In the discrete diffusion process, which has discrete spacetime points, √(1 − β²) reflects the invariance of the number of such points when expressed in different reference frames.
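The toy model above is easy to check by direct simulation (the code below is my own illustration; ensemble sizes and β values are arbitrary): for a ±1 walk with p = (1 + β)/2, the sample mean of the cloud grows as βt, per Eq. (1), and its spread as √(1 − β²)·√t, per Eq. (2).

# Sketch: biased +/-1 random walk, checking mu = beta*t (Eq. (1)) and
# sigma = sqrt(1 - beta^2) * sqrt(t) (Eq. (2)). Sizes are kept small for speed.
import math, random

def cloud_stats(beta, t, particles):
    p = (1.0 + beta) / 2.0                      # probability of a +1 step
    xs = []
    for _ in range(particles):
        xs.append(sum(1 if random.random() < p else -1 for _ in range(t)))
    mu = sum(xs) / particles
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / particles)
    return mu, sigma

t, particles = 400, 4000
for beta in (0.0, 0.3, 0.6, 0.9):
    mu, sigma = cloud_stats(beta, t, particles)
    print(f"beta = {beta:.1f}  mu/t = {mu / t:+.3f} (expect {beta:+.3f})  "
          f"sigma/sqrt(t) = {sigma / math.sqrt(t):.3f} (expect {math.sqrt(1 - beta ** 2):.3f})")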

6. Computation capacity

Microscopically, a Mac and a PC are very different; however, their macroscopic behavior can be programmed to be essentially the same. In fact, we can run the "same" spreadsheet program on the two platforms, hitting the same keys and getting the same responses in the same time. When we say "the same program", we really mean programs that are microscopically very different, but the differences are such that they cancel out the differences in hardware. In some sense, the two computers have the same "computation capacity", just as a telephone line and a rubber hose have the same "communication capacity", so that with appropriate transducers (a microphone, a horn) one may speak indifferently through one or the other.

Communication capacity was satisfactorily characterized by Shannon; it seems much harder to arrive at a reasonable characterization of computation capacity. One reason for this may be that a theory of computation capacity is bound to be a nonlinear theory. To see how this comes about, imagine two users, A and B, wanting to share a communication channel and apportion its cost. They will need an input encoder, which will merge the two input streams, and likewise an output decoder, to split the output into two streams. Given unlimited and free computational resources, encoder and decoder working together with the channel can simulate two independent channels, one for A and one for B, whose capacities add up to that of the original channel (Shannon's theorem). Now, suppose that A and B want to share a computer rather than a communication channel, using again an encoder and a decoder to merge inputs and split outputs. But now the encoder and decoder use the same kind of resources, namely, computational ones, as the resources that have to be shared. To get the most out of the computer's capacity, one may have to spend a large amount of additional computation capacity on encoder and decoder. It is like going from Newtonian mechanics to general relativity, where mass creates a gravitational field, which in turn has mass, and so forth; the "dressed" mass is an unknown of the model rather than a given.

Much as communication capacity basically counts how many different states a system can be in, 9 I propose elsewhere [26] that computation capacity should be identified with how many (on a log scale) different behaviors a system can display. More generally, if we call a computational history an assignment of values to all inputs, outputs, and any internal registers that we want to count as part of the capacity to be measured, and we assign a distribution over the set of computational histories, the computation capacity of this computer is defined as the entropy of that distribution (the distribution over histories constitutes the "specs" of a computer much as a distribution over states constitutes the "specs" of a channel). (Much as Shannon had good reasons to call entropy the scale in which communication capacity is measured, there are good reasons to call action the scale for computation capacity.) What I want to show here is how, from this very abstract, model-independent definition, one gets values for computation capacity that make good sense from a physical point of view.

9 The log scale in the definition of entropy, its ability to take into account states with different weights, and the discounting for error rate are just technical details.

Consider a "canonical" computer consisting of an ordinary PROM (Programmable Read-Only Memory) (Fig. 8). This is a network consisting of a single node with m input lines called collectively the address, n output lines called the output, and k more input lines (where k = n2^m) called the program.

Fig. 8. The PROM as a canonical computer, with m lines of address x, n lines of output y, and n2^m bits of program p.

The distribution of histories for an ideal PROM is the following. Write the program as a table of 2^m rows and n columns. For every program p (i.e., an assignment of binary values to all k program lines) and address x, take the 2^n histories corresponding to the possible values of the output y, and mark the one in which y coincides with the xth row of p (the collection of marked histories represents the "correct" behavior, as per specs, of the PROM). Finally, assign equal weights to all marked histories (there are 2^(m+k) of them) and zero weight to all others, and normalize to 1. What is the action A, or the "computational worth", of this canonical computer? Using base-2 logs for evaluating entropy, we get

A = k + m = n2^m + m.

The overwhelming contribution is given by n2^m, which is simply the number of PROM program bits; this is the log of the number of distinct functions that our computer can compute, or the number of "different things it can do". The small correction m is proportional to the depth of the address decoding tree. Note that in a real PROM chip the programmable bits take up most of the chip's real estate; the decoding tree occupies a much smaller area and consists of fixed, nonprogrammable gates. Thus, our estimator A of computational worth is fully consistent with the market reality, where PROM chips of different organization (for example, 32- versus 8-bit wide) but with the same number of programmable bits and the same speed have essentially the same size, cost essentially the same, and have a comparable market share. We started with an abstract, model-independent definition, which did not know about transistors, area, speed, etc., and we ended up with a gauge that makes perfect sense from a physical point of view.
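For a PROM small enough to enumerate, the definition can be checked by brute force. The sketch below (my own code; only the formula A = k + m = n2^m + m comes from the text) marks exactly the histories whose output matches the addressed row of the program, weights them equally, and takes the base-2 entropy of that distribution.

# Sketch: brute-force computation capacity of a tiny ideal PROM,
# checking A = k + m = n*2^m + m.
from itertools import product
import math

m, n = 2, 2                   # 2 address lines, 2 output lines
k = n * 2 ** m                # 8 program bits

marked = 0
for p in product((0, 1), repeat=k):              # every program: 2^m rows of n bits
    rows = [p[i * n:(i + 1) * n] for i in range(2 ** m)]
    for x in range(2 ** m):                      # every address
        for y in product((0, 1), repeat=n):      # every candidate output
            if y == rows[x]:                     # mark the single correct history
                marked += 1

# Equal weights on the marked histories, zero elsewhere: entropy = log2(#marked).
A = math.log2(marked)
print(f"marked histories = {marked}  ->  A = {A:.1f} bits  (k + m = {k + m})")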

Suppose now that we want to estimate the degradation of computational worth if the chip becomes less-than-ideal; for instance, a stuck address or output line, a stuck program bit, a probability of error in programming, addressing, or read-out - or even a nonuniform distribution of possible address patterns (e.g., three-quarters of the accesses are expected to be in the lower half of the address range). We just modify the probability distribution so as to reflect the new "specs" and the new usage context, and from the same formula we get the computation capacity of the new situation. In conclusion, I believe that we are getting close to having a usable definition of computation capacity, and I would like to encourage work in this direction. This, incidentally, would have useful application in setting standards and estimating the market value of a chip or an architecture.
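As one worked instance of the degradation estimate described above (my own example, under the same brute-force setup as the previous sketch): a program bit stuck at 0 removes half of the realizable programs, so the marked histories are halved and the capacity drops by exactly one bit.

# Sketch: capacity of the same tiny PROM with program bit 0 stuck at 0.
# Histories requiring that bit to be 1 get zero weight; the rest keep equal weights.
from itertools import product
import math

m, n = 2, 2
k = n * 2 ** m

marked = 0
for p in product((0, 1), repeat=k):
    if p[0] != 0:                                # stuck-at-0 fault: program unrealizable
        continue
    rows = [p[i * n:(i + 1) * n] for i in range(2 ** m)]
    for x in range(2 ** m):
        for y in product((0, 1), repeat=n):
            if y == rows[x]:
                marked += 1

print(f"A = {math.log2(marked):.1f} bits  (ideal PROM: {k + m} bits)")   # 9.0 versus 10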

7. Emergent computation and universal analog computation

Today, when we speak of computation we commonly think of what happens in a digital computer with a von Neumann architecture (CPU and RAM). Of course, our brain, the cell machinery, or any large bureaucracy are living proof that there are other ways to compute effectively. So far, the success of the sequential-computation approach ("Forget about massive parallelism; just concentrate on making the single microprocessor faster and faster!") has been so outrageous that comparatively little energy has been invested in exploring the alternatives. However, the very unqualified success of miniaturization ("denser and faster!") may force us to investigate the potential of fine-grained parallel computation and, more generally, as soon as we introduce noise and tolerances, of computation in bulk, amorphous media. I believe the two approaches will eventually prove to be complementary rather than mutually exclusive.

I discuss motivation and issues of emergent computation in more detail elsewhere in this issue [14]. What I would like to stress here is that a theory of computation outside of the three "D"s (discrete circuits, discrete signal values, and discrete time) is practically nonexistent.


Given, say, the turbulent regime of a fluid between two planes sliding past one another, how would one go about determining whether this medium has universal computing capabilities? Areas of research that were suffocated by the advent of the digital computer (see, for instance, [28]) may have to be revived and newly cultivated. In this context, one noteworthy aspect of the measure of computation capacity mentioned in Section 6 is that it can be indifferently applied to discrete or continuous systems, to deterministic or stochastic ones, to ones with lumped or distributed parameters. Before investing much in trying to domesticate a medium for computational purposes, it may be worthwhile to know an upper bound on how much computation one might possibly extract from it.

8. Conclusions

Given a kit of computational primitives - be it a programming language, a set of axioms, a family of gates, a LEGO 10 assortment, or the Hilbert space of many spin-1/2 particles - it is in our nature to try to explore and exhaust its combinatorial possibilities. Though PhysComp may play a useful role in publicizing the first few interesting constructions with a novel kit, I believe its primary role should be in stimulating the emergence of new kits (new parts, new questions, new objectives) within the realm of physics. We know we can compute with a computer; but what makes a computer?

10 That of the trademark.

Acknowledgements

I would like to thank Edward Fredkin, Rolf Landauer, Günter Mahler, and all the members of the PhysComp96 organizing committee for lending their support to the workshop, and my collaborators Mike Biafore and Joao Leão for their help in organizing it. Partial funding for this research was provided by the Institute for Scientific Interchange Foundation (Turin, Italy) and by NSF (9305227-DMS). I would also like to thank Elsag-Bailey and the ISI Foundation for their research meetings in quantum computation, which have been such a fertile ground for research on physics and computation.

References

[1] W.C. Athas, L.J. Svensson, J.G. Koller, N. Tzartzanis, E. Chou, Low-power digital systems based on adiabatic-switching principles, IEEE Trans. VLSI Systems (1994) 398-407.
[2] A. Barenco, C. Bennett, R. Cleve, D. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. Smolin, H. Weinfurter, Report on new gate constructions for quantum computation, Phys. Rev. A 52 (1995) 3457-3467.
[3] C. Bennett, Logical reversibility of computation, IBM J. Res. Develop. 6 (1973) 525-532.
[4] R. Feynman, R. Leighton, M. Sands, The Feynman Lectures on Physics, vol. I, Addison-Wesley, Reading, MA, 1963.
[5] R. Feynman, Quantum-mechanical computers, Opt. News 11 (Feb. 1985) 11-20; reprinted in: Found. Phys. 16 (1986) 507-531.
[6] E. Fredkin, T. Toffoli, Design Principles for Achieving High-Performance Submicron Digital Technologies, proposal to DARPA (November 1978).
[7] E. Fredkin, T. Toffoli, Conservative logic, Int. J. Theor. Phys. 21 (1982) 219-253.
[8] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991.
[9] J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. USA 79 (1982) 2554-2558.
[10] E. Jaynes, Information theory and statistical mechanics, Phys. Rev. 106 (1957) 620-630; 108, 171-190.
[11] E. Jaynes, Probability theory as logic, in: P.F. Fougère (Ed.), Maximum Entropy and Bayesian Methods, Kluwer Academic Publishers, Dordrecht, 1990, pp. 1-16.
[12] S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimization by simulated annealing, Science 220 (1983) 671-680.
[13] S. Kirkpatrick, B. Selman, Critical behavior in the satisfiability of random Boolean expressions, Science 264 (1994) 1297-1301.
[14] P.R. Kotiuga, T. Toffoli, Potential for computation in micromagnetics via topological conservation laws, Physica D 120 (1998) 139-161.
[15] R. Landauer, Irreversibility and heat generation in the computing process, IBM J. 5 (1961) 183-191.
[16] R. Landauer, Is quantum mechanics useful? Phil. Trans. R. Soc. London A 353 (1995) 367-376.

[17] R. Landauer, Is quantum mechanically coherent computation useful? in: D.H. Feng, B.L. Hu (Eds.), Proceedings of the Drexel Fourth Symposium on Quantum Nonintegrability - Quantum Classical Correspondence, International Press, Boston, in press.
[18] R. Landauer, The physical nature of information, Phys. Lett. A 217 (1996) 188-193.
[19] A. Peres, Quantum Theory: Concepts and Methods, Kluwer Academic Publishers, Dordrecht, 1993.
[20] J. Schlienz, G. Mahler, Description of entanglement, Phys. Rev. A 52 (1995) 4396-4404.
[21] M. Smith, Cellular automata methods in mathematical physics, Tech. Rep. MIT/LCS/TR-615, MIT Laboratory for Computer Science (May 1994).
[22] P. Solomon, D.J. Frank, The case for reversible computation, Proceedings of the 1994 International Workshop on Low Power Design, Napa Valley, CA, pp. 93-98.
[23] J.A. Swanson, Physical versus logical coupling in memory systems, IBM Journal 4 (1960) 305-310; posthumous, completed by R. Landauer.


[24] T. Toffoli, Bicontinuous extension of reversible combinatorial functions, Math. Syst. Theory 14 (1981) 13-23.
[25] T. Toffoli, Reversible computing, in: de Bakker, van Leeuwen (Eds.), Automata, Languages and Programming, Springer, Berlin, 1980, pp. 632-644.
[26] T. Toffoli, How much of physics is just computation? in: Superlattices and Microstructures, Academic Press, New York, to appear.
[27] J. Traub, H. Woźniakowski, Breaking intractability, Sci. Am. 270 (1) (1994) 102-107.
[28] A.G. Vitushkin, Theory of Transmission and Processing of Information, Pergamon Press, Oxford, 1961.
[29] R.L. Wigington, A new concept in computing, Proc. IRE 47 (1959) 516-523.
[30] S. Younis, T. Knight, Practical implementation of charge recovering asymptotically zero power CMOS, Proceedings of 1993 Symposium on Integrated Systems, MIT Press, Cambridge, MA, 1993, pp. 234-250.