IEEE P896 - the Futurebus project
The committee on Futurebus have had some problems, both internally and with the IEEE microprocessor standards committee. P896 secretary, Paul Borrill, describes how the "standard" presently lies and details its probable format development
In 1978, the IEEE decided that a new bus standard should be drawn up which would cater for the industry's movement towards multimicroprocessor systems. It was important, though, that this standard should preempt the emergence of a new generation of proprietary buses which become de facto, like Multibus, before they are completely specified. This paper describes the background to P896 (Futurebus), how it has developed and where it stands at the moment. The paper then details the goals and objectives set up for Futurebus and explains why decisions were made. Since the standard has yet to be completed, the paper can only predict, though with some authority, how the standard will progress.
microsystems, bus standards, P896
Somewhere there are decisions made that are not rational in any sense, that are subject to nothing more than the personal bias of the decision maker. Logical methods, at best, rearrange the way in which personal bias is to be introduced into a problem. Of course, this 'at best' is rather important. Present intuitive methods unhappily introduce personal bias in such a way that it makes problems impossible to solve correctly. Our purpose must be to repattern the bias, so that it no longer interferes in this destructive way with the process of design, and no longer inhibits clarity of form.
Christopher Alexander, Notes on the Synthesis of Form
BACKGROUND
By the middle of 1978, the efforts of the IEEE MSC (microprocessor standards committee) towards developing standard specifications for the P696 (S100) and P796 (Multibus) buses had made clear the need to consider future microcomputer system bus requirements before the emergence of yet another generation of de facto but incompletely specified and incompatible buses. The Futurebus working group set up to consider this need concluded that the buses then being specified by other working groups within the MSC did not offer the capability of being extended to satisfy the requirements anticipated for future multimicroprocessor-based systems. The PAR (project authorization request) number P896 was approved by the IEEE Standards Board in September 1979 for this working group to develop an a priori standard for backplane buses to meet these requirements.
The EDISG (European distributed intelligence study group) set up a subgroup in May 1980 to interact with the IEEE P896 committee's work, and while UK experts had been participating in P896 since 1979, it was not until November 1981 that the IEE formed a subgroup of its Working Party on Standards to address the P896 work directly.
Although a draft specification D4.1 received a majority approval of the P896 committee in December 1981, the MSC responded to strong technical criticisms of the specification from dissenters within the P896 committee by rejecting the draft in January 1982 and returning it to the P896 committee for reworking. As a result, the chairman resigned, and a period of stagnation of around six months followed while another chairman attempted to redirect the committee's efforts towards a much higher performance bus. Due to the inadequate time allowed by his employer, the second chairman resigned in August 1982, and the third and current chairman, and a secretary, were appointed to continue with the work. Since August 1982, the P896 committee have picked up speed again and are now well on the way to redrafting the new specification, which is scheduled to be ready for passing on to the MSC by June 1983.
Department of Physics and Astronomy, University College London, Mullard Space Science Laboratory, Holmbury St Mary, Dorking, Surrey RH5 6NT, UK
vol 6 no 9 november 1982
P896 GOALS AND OBJECTIVES
Perhaps the single most important objective of the P896 project is to create a bus standard specification which provides 'a significant step forward' in the facilities and performance available to designers of future multimicroprocessor systems. The current goals and objectives of the P896 committee, as discussed at the September 1982 meeting, are given below.
System architecture independence
This allows for true multiprocessor systems, and uses a distributed control philosophy which does not rely on any specific cards being present; in effect, this rules out daisychain arbitration and interrupt-acknowledge schemes. Mechanisms will be provided to support multiple memory hierarchies, such as cache and virtual memory. Multiple identical parallel buses will be supported; operations extending beyond the local backplane environment are, however, considered as store and forward operations. This suggests a 'communications-oriented architecture', rather than an 'instruction fetch architecture' although fetching instructions across the bus will always be permitted. The bus should not constrain the system to any specific software or task scheduling mechanism to provide for multiple independent software systems. Full support for the synchronization of parallel processes will be provided.
0141-9331/82/090489-07 $03.00 © 1982 Butterworth & Co (Publishers) Ltd
Performance
This provides a burst-data-transfer rate of greater than 10 Mtransfer/s. Since the bus is optimized for (but not limited to) 32-bit transfers, this translates to a peak system bandwidth of 40 Mbyte/s. To achieve this minimum level of performance, an efficient protocol and a solution to the 'bus driving problem' will be required, both of which are described more fully later.
System configuration
The specifications will allow a wide variety of system configurations to be supported. As previously mentioned, start-up configuration support will be provided, though this is a requirement necessary for distributed control philosophies anyway. Dynamic reconfiguration will be provided for system flexibility and support of fault-tolerant systems. Methods for controlling individual processors, such as reset, halt and interrupt, will be provided, along with a simple global reset line to initialize the system to a known state on a cold start, and in the event of the system going out of control.
Interoperability
The bus will be processor- and manufacturer-independent, allowing various processors of differing types to be combined in the same system, and attempting to spread the interfacing penalty about equally over all manufacturers' devices. The bus will also be technology-independent (as far as possible), so future improvements in device speeds can be readily accommodated over the lifetime of the standard. In principle, this means that the electrical characteristics of the bus should not be dependent on a logic type, but be defined by the physics of the backplane environment; again, this is described more fully later.
Fault tolerance
This objective requires extensive support for methods of automatic fault detection (parity, etc), diagnosis (automatic fault location) and recovery (system reconfiguration). The design objective for transfer integrity (MTBF) is one correctable error per day and one uncorrectable error per year. Methods of conformance testing to the specification will be considered. Features for ease of maintenance (low MTTR) will include live insertion and withdrawal, and mechanisms for confidence testing and autoinitialization, for example without requiring the use of bit-switches.
Achievable with mid-1983 technology
Mechanical implementation
The bus will use the 603-2-IEC-C096-M (DIN 41612) connector and the IEC 297 SC48D card size (Eurocard) family. Although standard variations on 19-in rack mechanical specifications allow for only 28 slots (at 0.6 in), the P896 bus will provide logical support for at least 32 masters.
Figure 1. Capacitance versus transmission line models
The P896 goals will not be constrained by cost. This does not imply that systems designers need an open cheque to implement even small systems, but that the committee intend to take full advantage of the technology available, and assume that logic (when integrated) is essentially free. It does imply however, that the P896 committee are not intending to optimize the specification to build very small or cheap systems, at the same functional level supported by the STD bus for example.
Longevity
The standard will be required to support the design of new systems over a 10-year period, i.e. 1983-1993.
Scope
The scope of the specification will be restricted to the physical, link and network layers, in order to maintain flexibility in the design of a variety of system architectures.
There are several fundamental aspects of bus design which have hitherto been either disregarded, because earlier generations of processors did not demand high performance or efficiency of the backplane bus, or avoided, by restricting the kinds of system the bus may be used in. The P896 committee intend to identify and face these problems directly, and to create optimal solutions based on the physics of the backplane and the requirements of a general-purpose architecture, rather than on the peculiarities or limitations of particular devices. This section deals with three of the main problem areas in the design of a backplane bus standard. They are by no means the only problems which exist, but they are central to the concepts and attributes required for a bus specification intended to have a design lifetime of 10 years.
Bus driving
Cost efficiency
MAJOR PROBLEMS
Although the bus will be forward-looking, it will not require the development of any new devices or technologies. Bus driver and gate-array type of technologies should easily enable the design and building of prototype systems as soon as the specification is published.
a cold start, and in the event of the system going out of control.
The ability of a device to drive reliably a backplane that is variably loaded with other modules is probably the most widely misunderstood phenomenon among microcomputer system designers. The confusion arises from the misuse of a capacitance model for the backplane. This is an acceptable model for short buses, where the transition time of the signal is long with respect to the two-way propagation delay of the signal down the backplane, whereas a transmission line model is a much more faithful model of the behaviour of the signal at all times. When the transition time of the signal becomes comparable with the time the signal takes to travel down the backplane at the speed of light and be reflected from the ends, the transmission line model comes into its own, and predicts important effects which the capacitance model does not [1]. Figure 1 and Table 1 show a comparison of these two models and the nature of their predictions. In the limit as the signal transition time becomes longer than the two-way delay down the backplane, the two models merge together and predict the same result.
microprocessors and microsystems

Table 1. Comparison of capacitance and transmission line models

Propagation delay
  Capacitance model: modelled by an increase in transition time with an increase in capacitance (number of loads)
  Transmission line model: modelled by an increase in Tpd with an increase in the number of loads and the average capacitance of each

Effect of varying the value of a termination resistor tied to the line and 5 V
  Capacitance model: continuously variable propagation delay as a function of t = CR
  Transmission line model: quantum jump in propagation speed when the initial step reaches the receiver threshold point

The transmission line model introduces two new concepts which do not come into the traditional capacitance model. Firstly, the concept of a characteristic impedance, Zo (more precisely referred to as a characteristic resistance in lossless transmission line theory). Zo is the impedance seen by the electromagnetic signal as it travels through the air and dielectric material, guided by the conductors which make up the backplane cross section. There are two fundamental methods of construction of conductor tracks on a backplane, stripline and microstrip, as shown in Figure 2. Formulas for the calculation of characteristic impedances for various backplane constructions can be found elsewhere [2,3]. Secondly, the propagation delay, Tpd, which implies that the signal emitted by one module on a backplane takes a finite time to travel down the backplane to other modules. Characteristic impedance (Zo) and propagation delay (Tpd) are described by:

$$Z_o = \sqrt{L_o / C_o}$$

$$T_{pd} = \sqrt{L_o C_o}$$

where Lo and Co are the intrinsic line inductance and capacitance respectively. A backplane may be regarded as a 'tapped' transmission line, where the relatively low DC loads of the modules plugged into the backplane can be assumed to provide only 'additional capacitance' into the line parameters (providing the electrical distance between loads is small with respect to the transition time of the signal). In this way we can see that the effect of additional loads on the backplane is to reduce Zo. Taking typical values for backplane construction yielding a Zo of 100 Ω, and typical values of capacitance (Cd) for TTL transceivers, connectors and plated through holes (20 pF total), the final characteristic impedance Zo' of a fully loaded backplane will be:

$$Z_o' = \frac{Z_o}{\sqrt{1 + C_d / C_o}} = \frac{100}{\sqrt{1 + 300/18}} \approx 24\ \Omega$$

where Co ≈ 18 pF/ft and Cd ≈ 20 pF every 0.8 in = 300 pF/ft.
Note that very little can be gained by starting from a higher unloaded Zo, since the module capacitance dominates (starting from a Zo of 125 Ω gives only 3 Ω difference in Zo'). Also, each device must drive two microstrips in parallel if it is situated on a module in the center of the backplane, giving an output drive requirement into 12 Ω. In order to cater properly for the transmission line effects predicted by this model, each end of the transmission line should be terminated in a resistance as close as possible to its characteristic impedance so as to minimize reflections. Naturally, in a backplane bus system where Zo' varies widely with the number of modules plugged into it at any one time, it is very difficult in practice to match Zo' accurately with its proper termination resistance at all times. However, in a digital system, what happens to the signal above and below the defined logic thresholds is of little importance, and providing the termination and the drivers can support the proper transition of a signal through this predefined receiver transition region, the system will work correctly.
To drive such a backplane properly, the driver must be able to switch a minimum current determined by the threshold voltage required by the receiver (typically 1.2 V) and the resultant Zo', i.e. 1.2 V / 12 Ω ≈ 100 mA. If a Tristate device is used, then the total current switched is the sum of the current the driver was sinking at DC, before the transition, and the current switched into the line (source) during the transition by the upper transistor. If a weak (48 mA) sink capability is used, then the upper transistor must be able to source the rest, i.e. 100 - 48 = 52 mA, just to provide an initial step to reach past the receiver transition region (this assumes also that the bottom transistor is sinking its rated maximum current from the termination under DC conditions, which is not the case if the termination network does not have a low enough DC resistance).
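The arithmetic above can be checked with a few lines of code. The following Python sketch is illustrative only: the function names and the `pitch_in` parameter are introduced here and are not part of any P896 document, while the numeric values (100 Ω, 18 pF/ft, 20 pF every 0.8 in, a 1.2 V threshold and a 48 mA sink) are those quoted in the text.

```python
import math


def loaded_impedance(z0_ohm, c0_pf_per_ft, cd_pf, pitch_in):
    """Return Zo' = Zo / sqrt(1 + Cd/Co) for a uniformly loaded backplane.

    cd_pf is the capacitance added at each slot; pitch_in is the slot
    spacing in inches, used to express Cd per foot in the same way as Co.
    """
    cd_pf_per_ft = cd_pf * 12.0 / pitch_in
    return z0_ohm / math.sqrt(1.0 + cd_pf_per_ft / c0_pf_per_ft)


def step_current_ma(z_loaded_ohm, threshold_v, both_directions=True):
    """Current (mA) needed to step the line voltage by threshold_v.

    A driver in the middle of the backplane sees two line halves in
    parallel, so the impedance it drives is half of Zo'.
    """
    z_seen = z_loaded_ohm / 2.0 if both_directions else z_loaded_ohm
    return 1000.0 * threshold_v / z_seen


# Values quoted in the text.
z_loaded = loaded_impedance(100.0, 18.0, 20.0, 0.8)
i_step = step_current_ma(z_loaded, 1.2)
i_source = i_step - 48.0  # what the upper transistor must add to a 48 mA sink

print(f"Zo' = {z_loaded:.1f} ohm")
print(f"step current = {i_step:.0f} mA, source share = {i_source:.0f} mA")
```

Without rounding, the sketch gives a Zo' slightly below 24 Ω and a step current slightly above 100 mA, consistent with the rounded figures used in the text.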
Figure 3 graphically describes the physics of driving a transmission line with a typical TTL Tristate driver. Figures 4 and 5 show transitions of drivers on a real backplane, when empty and fully loaded, thus verifying the fidelity of this model. Note that since all the driver has to do is switch a total of around 100 mA into the line, this can in practice be done with the single transistor in an open collector driver, such as an Am26S12, providing the termination networks at the ends of the lines have a low enough DC resistance to allow the order of 100 mA to flow when a driver sinks the line to Vol. In principle, design of a bus is considerably simpler and more flexible if open collector drivers can be used in this way. The protocol can be at the same time more relaxed and more efficient, allowing signals from different boards to overlap during noncritical segments of the timing. Tristate drivers, however, are not allowed this luxury because of the bus-fights which will occur.
Figure 2. Stripline and microstrip backplane construction
Figure 3. Driving a backplane transmission line (Tristate). Annotations for Tristate drivers: one portion of the step is due to the standing current in the line being switched by the lower transistor; the other portion is due to the current (source) switched by the upper transistor (note this is generally very much smaller than the contribution by the sink transistor).
Figure 4. Driver comparison on an unloaded backplane. Panels: 74LS640-1 driving unloaded (100 Ω) backplane terminated 220/390 both ends; 26S10 driving unloaded (100 Ω) backplane terminated 100 Ω to 5 V both ends.
At this point the reader should note that the ability of a driver to switch the transmission line properly through the receiver transition region is a function of both the current it can supply and the resultant Zo'; however, Zo' is itself a function of the capacitance presented to the line by the transceiver driving it. Since heavier current transistors have larger chip areas (for the same Vol), and hence more capacitance, one soon gets the impression of chasing one's tail. Tristate drivers suffer slightly more in this respect because there is already more circuitry hanging from the output pin than for an equivalent open collector device, slightly increasing their capacitance. The answer might be for the semiconductor manufacturers to optimize the device capacitance, current and receiver thresholds to match more closely the physics of driving a backplane bus. To a large extent AMD have already done this for the Am26S12, which has distinctly different receiver thresholds from standard TTL for just this reason. At least one other semiconductor manufacturer is known to be developing a technique for reducing the capacitance of driver devices, making them more suitable for driving the backplane bus. In the meantime, it should be noted that another way to reduce the depression in Zo due to loading on the bus is to increase the distance between the loads, i.e. the pitch between connectors on the backplane.
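The trade-off between device capacitance, connector pitch and drive current can be made concrete with the loaded-line relation Zo' = Zo / sqrt(1 + Cd/Co). The sketch below is a hedged illustration: the function name, the alternative capacitance values (15 and 25 pF) and the alternative pitches (1.0 and 1.2 in) are hypothetical, and only the 100 Ω, 18 pF/ft, 20 pF, 0.8 in and 1.2 V figures come from the text.

```python
import math

Z0_OHM = 100.0       # unloaded characteristic impedance (from the text)
C0_PF_PER_FT = 18.0  # intrinsic line capacitance (from the text)
THRESHOLD_V = 1.2    # receiver threshold (from the text)


def drive_current_ma(cd_pf, pitch_in):
    """Step current (mA) for a driver in the middle of a loaded backplane.

    cd_pf is the capacitance each module adds to the line and pitch_in is
    the connector pitch in inches; both are varied here for illustration.
    """
    cd_pf_per_ft = cd_pf * 12.0 / pitch_in
    z_loaded = Z0_OHM / math.sqrt(1.0 + cd_pf_per_ft / C0_PF_PER_FT)
    # The driver sees both halves of the line in parallel, hence Zo'/2.
    return 1000.0 * THRESHOLD_V / (z_loaded / 2.0)


for cd_pf in (15.0, 20.0, 25.0):
    for pitch_in in (0.8, 1.0, 1.2):
        current = drive_current_ma(cd_pf, pitch_in)
        print(f"Cd = {cd_pf:4.1f} pF, pitch = {pitch_in:.1f} in: {current:6.1f} mA")
```

Under these assumptions the required current rises with device capacitance and falls as the connector pitch is increased, which is the tail-chasing effect and the remedy described above.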
Figure 5. Driver comparison on a loaded backplane. Panels: 74LS640-1 driving loaded (15 pF/20 mm) backplane terminated 220/390 both ends; 26S10 driving loaded (15 pF/20 mm) backplane terminated 100 Ω to 5 V both ends.
Arbitration
One of the most important decisions in bus design affecting the variety of architectures which can be supported is the arbitration technique. In a multiprocessor system using a shared bus, a processor wishing to gain access to the bus in order to make a transaction must first arbitrate in case other processors are also requesting the bus, since only one module may drive the bus at any one time. The arbitration mechanism used may be either centralized, i.e. information on requestors is passed to a central unit which decides who can have the bus next, or decentralized, where each module contends for the use of the resource with a unique priority and decides for itself if it is the winner.
Daisychains are a hybrid of the two. Normally, a single system controller receives the ORed result of all the requests on a small number of prioritized lines, and transmits a single grant on one of the grant lines. The modules tied to this grant line then either pass the grant signal through, if they were not requesting, or hold the line if they were, and exchange a bus grant acknowledge and bus busy signal with the system controller in order finally to gain mastership of the bus. The problems with daisychain schemes are numerous: they are inherently slow, due to the sum of the propagation delays through each module; they require 'jumpers' on empty slots and thus disallow live insertion and withdrawal; they force fixed priorities dictated by module position on the backplane; and they are sensitive to single-point failures, either on the system controller card or on any of the installed modules.
Centralized arbitration schemes are much more flexible in their allocation of priorities, but suffer from similar single-point failure problems to the daisychain schemes, and require a rather dedicated backplane in order to support the star connection of the request and grant lines to each potential requestor [4,5].
The P896 committee has chosen the completely distributed arbitration scheme because of the flexibility of priority allocation and the insensitivity to single-point failures (to support fault-tolerant systems). There are two kinds of distributed arbitration possible: decoded and encoded.
Decoded arbitration simply means that each requestor on the bus asserts line n of m lines on the backplane, where n is the unique priority assigned to that requestor, and m is the maximum number of requestors in the system. By the use of another open-collector control line, all requestors are synchronized to look at all m lines to see who else is requesting. Each requestor then compares the highest priority line being asserted, say line k, with the line it is itself asserting. If n = k then that module claims the bus by asserting a bus-busy line.
Encoded arbitration works in a similar manner, but its implementation is a little more complex. At a synchronization time, all requesting masters assert their unique binary priorities on the encoded arbitration lines and, at the same time, compare them with the value they are reading back off the lines. If the value read back at any individual bit position is not the same as the value a requestor has in its own priority at that bit position, the requestor removes that and all lower priority bits (keeping higher priority bits asserted). The effect is that after a variable 'arbitration time', only one requestor is left asserting all bits corresponding to its own priority, and it signals that it is the winner. The whole operation is rather like a ruck in rugby football, when the ball is under a mass of players and they all push and pull until one of them picks up the ball. Figure 6 shows the circuitry required to carry out this function.
The type of distributed arbitration the P896 bus will finally use is very much a tradeoff between the number of lines required and the length of the arbitration delay the P896 committee are prepared to accept. Either kind of distributed arbitration will allow the building of 'true' multiprocessor systems, i.e. systems in which the control of the bus is not vested in any particular module.
Metastable states
As system speeds increase, a problem of reliability of synchronization within the arbitration protocol and the bus transfer protocol arises. This is called the glitch, runt-pulse or metastable state [6,7]. Metastable states can never be eliminated, since their ultimate basis lies in quantum electrodynamic behaviour and the Heisenberg uncertainty principle. However, providing the problem of metastable states is acknowledged and accounted for by the designer, their rates of occurrence can be controlled, and the rate at which they produce fatal errors can be made lower than the failure rate implied by the MTBF of the rest of the system. It will be the responsibility of the P896 committee to design the standard so that designers can take maximum benefit of techniques to catch errors induced by metastable states or other means; collision detection on the arbiter, and parity and CRC checks on the data, are such techniques.
Protocol
The protocol is probably the most difficult area in which P896 decisions have to be made. The main result so far, and one of the few decisions which has been unanimous, is to make P896 multiplexed. The savings in pin count and in driver/receiver devices for a system with 32-bit addresses and 32-bit data are simply too great to consider a nonmultiplexed bus. Interestingly, the combined function of the bus, and the way in which the rest of the protocols may be implemented, can make the time penalty of a multiplexed bus insignificant. All operations in which the address is already known to the slave do not require the transmission of the address. Examples are the second cycle of a read/modify/write transaction and block transfers (where the address is simply incremented in the slave). Considering that P896 is intended to support more advanced systems of the future, the following derived requirements all indicate a strong tendency toward the more efficient transfer of blocks of information in P896 systems, rather than random individual transfers in which the address/data multiplexing is more of an overhead:
• P896 is to be optimized for communication between processors rather than as an instruction fetch highway. Such communication implies the passing of complete messages rather than individual bytes or words of information.
• P896 will provide support for cache memories. Indeed, if systems using P896 are to support up to 32 cooperating processors on the same bus, then either cache or some other method of reducing bus utilization factors, and hence bus bottlenecks, will need to be used (cache memories operate on individual transfers to the local processor, with block transfers over the bus).
• Virtual memory systems operate by splitting a fast backing store, such as a disc, into pages which fit neatly into segments of the same size within memory. By keeping a working set of most recently used pages in memory, the system can appear to offer a very large virtual memory to the programmer, using a relatively small amount of physical memory and with only a minor increase in average access time. If a processor requests a page not currently in memory, the least recently used page may be written out to the disc, and the new page brought in. This again implies the need for a fast and efficient block-transfer mechanism.
• New developments in dynamic RAMs, specifically the Inmos 64k x 1 devices, provide a 'nibble-mode' whereby the access time of the device depends on the address presented to the internal organization (16k x 4). Addressing a word on a modulo-4 boundary allows the internal access of four bits at once, and the external access by clocking the four bits out at high speed from the internal shift register. This implies that a significant improvement in apparent memory access time may be gained by using block transfers to and from memory modules which use these devices.
Figure 6. Arbitration circuitry (* = open collector)
Split cycle protocols
An aspect of the protocol being explored by the P896 committee is the possibility of using a write-only or split-cycle protocol. Split cycle transfers are an attempt to retrieve the bus bandwidth wasted when waiting for the access time of a slave. The bus master requesting the data transmits its own address to the slave during an initial transaction and simply gets off the bus during the rest of the slave access time. When the data are ready to be transferred, the slave then becomes a pseudomaster and addresses the original requestor as if it were a slave. In this way a split cycle protocol can accommodate slaves of varying speeds, such as memories with different cycle times, and provide an effectively higher bandwidth capacity for the system. The problem of supporting task synchronization primitives with a completely split-cycle protocol has not yet been resolved within the P896 committee. However, the requirement to support communication to remote buses through store and forward repeaters suggests that the split cycle protocol will be a considerable advantage, even if special allowances have to be made for task synchronization primitives in some other way.
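The bandwidth argument for split cycles can be illustrated with a small model. The Python sketch below is only a sketch under stated assumptions: the `Transaction` record, the function names, the one-cycle address and data phases and the example access times are all hypothetical, and the model ignores any arbitration overhead the slave incurs when it becomes a pseudomaster.

```python
from dataclasses import dataclass

ADDRESS_CYCLES = 1  # hypothetical length of the request transaction
DATA_CYCLES = 1     # hypothetical length of the reply transaction


@dataclass(frozen=True)
class Transaction:
    driver: str   # module driving the bus during this transaction
    target: str   # module being addressed
    payload: str


def split_cycle_read(master, slave, location):
    """Return the two bus transactions that make up one split-cycle read."""
    # The master passes on its own address so the slave can reply later.
    request = Transaction(master, slave, f"read {location}, reply to {master}")
    # The slave acts as pseudomaster and addresses the original requestor.
    reply = Transaction(slave, master, f"data from {location}")
    return [request, reply]


def held_bus_cycles(access_times):
    """Bus cycles consumed when the master holds the bus during each access."""
    return sum(ADDRESS_CYCLES + t + DATA_CYCLES for t in access_times)


def split_bus_cycles(access_times):
    """Bus cycles consumed when the bus is released during each access."""
    return sum(ADDRESS_CYCLES + DATA_CYCLES for _ in access_times)


access_times = [2, 5, 9]  # slave access times in bus cycles (hypothetical)
for transaction in split_cycle_read("M1", "S3", "0x100"):
    print(transaction)
print("bus held:", held_bus_cycles(access_times), "cycles")
print("split cycle:", split_bus_cycles(access_times), "cycles")
```

In this model the bus occupancy of a split-cycle read no longer depends on the slave access time, so slow and fast slaves cost the bus the same, and the released cycles are available to other masters.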
Synchronous versus asynchronous
The advantages of a synchronous protocol, where all processors and devices are run coherently with a single backplane clock (meaning that each processor clock is derived by dividing down or multiplying up the common backplane clock), are said to be as follows:
• the transfer speeds are improved because there are no delays associated with the finite propagation time of the signal through the system and the return of an acknowledge signal
• reliability is improved by the apparent elimination of metastable states
However, those advantages do not materialize in real systems, since the gain in potential speed of the bus might not be directly usable as system speed, due to all processors having to be constrained by the backplane clock, and since the actual rate of occurrence of metastable states in a typical system is unknown. Using a synchronous protocol in a bus standard could have a disastrous impact on the flexibility and extensibility of systems. Compatibility between new generations of products over a 10-year lifespan would be assured only for homogeneous multiprocessors, and then only by running the system at the speed of the slowest module in the system. An asynchronous protocol, by contrast, can dynamically adapt its speed to that of the current bus master, instead of to the slowest potential master in the system.
Two interesting ideas have been proposed to the P896 committee to overcome the handshake delay in an asynchronous protocol.
Protocol pipelining may be defined as the temporal disassociation of the sender synchronization strobe from the receiver acknowledge strobe. Fastbus [8] already has this facility, known as the NH (no handshake) mode. This mode does not, as might first be thought, dispense entirely with the acknowledge strobes; it simply allows the sender to continue sending data as fast as it expects the receiver to be able to accept it, without waiting for the receiver acknowledge before continuing with the next cycle. Throughout the NH block transfer (the main use of no handshake mode is for block transfers), the sender counts the returned acknowledges and, at the end of the transfer, compares this count with the number of cycles of data it has sent. In this way, the primary advantage of the acknowledge, namely that of 'confirmation of receipt' of the data, is kept, but the delays introduced into the protocol by the distance between the two devices on the bus are eliminated. Such a mode might also be considered as an 'anticipatory acknowledge' in the terminology of the Intel BxP bus [9].
Signal pipelining is a technique not unlike that by which a synchronous bus achieves its performance. The technique, put simply, is the removal of data from the data lines a HOLD time after the synchronization strobe has been asserted, rather than changing or removing those data only when the acknowledge arrives. However, while the sender may then immediately set up some new data on these lines, it does not proceed further until the previous acknowledge arrives. As will be appreciated, the effect of this technique is to introduce a degree of concurrency into the protocol, enabling the skew times to be masked by the travel times of the signals down the backplane and through the receivers. Essentially, this is a very similar advantage to that gained by using an entirely synchronous (clocked) protocol, but without any of the undesirable features of a fixed frequency clock. Typically, a 30 per cent improvement in throughput may be obtained by the use of signal pipelining on a backplane bus using an asynchronous protocol.
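The acknowledge counting used in protocol pipelining can be sketched in a few lines. The following Python model is an illustration only and is not taken from the Fastbus or P896 documents: the function names, the `accept` callback, the unit send time and the three-cycle acknowledge delay are all assumptions made for the example.

```python
from collections import deque


def nh_block_transfer(words, accept, ack_delay=3):
    """Send every word back to back and count acknowledges as they return.

    accept(word) models the receiver: it returns True if the word is taken,
    in which case an acknowledge arrives ack_delay cycles after the send.
    Returns (all_confirmed, elapsed_cycles).
    """
    words = list(words)
    in_flight = deque()  # arrival times of acknowledges still on their way
    acks = 0
    t = 0
    for word in words:
        # Count acknowledges that have arrived, without ever stalling.
        while in_flight and in_flight[0] <= t:
            in_flight.popleft()
            acks += 1
        if accept(word):
            in_flight.append(t + ack_delay)
        t += 1  # one cycle per word sent
    # End of block: wait for the outstanding acknowledges, then compare.
    if in_flight:
        t = max(t, in_flight[-1])
    acks += len(in_flight)
    return acks == len(words), t


def handshake_transfer_time(n_words, ack_delay=3):
    """Cycles needed when each send waits for the previous acknowledge."""
    return n_words * ack_delay


confirmed, elapsed = nh_block_transfer(range(8), lambda word: True)
print("NH mode:", confirmed, elapsed, "cycles")
print("full handshake:", handshake_transfer_time(8), "cycles")
```

A receiver that misses a word returns one acknowledge too few, so the final comparison fails and the sender learns that the block was not fully received, even though it never waited for an individual acknowledge.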
ADDITIONAL PROBLEMS
Power supplies
The cost of providing for specialized power supplies, in both economic and pin-count terms, is greater than the advantage they give. Now that -5 V and +12 V are much less used than they were in the past, and that onboard hybrid and monolithic voltage converters are available, the P896 committee are likely to opt for a single rail 5 V-only system, using the virtually standard allocation for DC power on the DIN connector (and associated backplanes) of pins 1 abc and 32 abc as 0 V, and pins 2 abc and 31 abc as 5 V. Additional 'signal ground' returns will be provided among the signals in order to reduce the transmission line discontinuity for signals passing through the connector, and to provide additional ground returns for the high-current drivers near the connector.
Mechanical formats
The choice of a single optimal card size for the P896 standard is an impossible one. Most probably, a small set of sizes will be 'recommended' from the full Eurocard family (figure shown in the STD bus paper in this issue), while 'allowing' the use of any of them.
Synchronization primitives
Synchronization between cooperating sequential processes is the cornerstone of concurrent programming control in a multiprocessor system. Semaphores are a software technique for synchronizing processes running in a multiprocessor system, and involve the management of the task queue.
Most probably, there will be no need to provide hardware support in P896 for semaphores, since there is such a variety of ways of implementing them in software. Lock variables, however, are important to the backplane bus protocol. LOCK(v) and UNLOCK(v) are used to create a critical section of the semaphore code, in order to provide mutually exclusive access for one processor at a time. Lock variables are also used to provide controlled access to shared regions of data. Lock variable operations must be indivisible, but a processor holding onto a bus for the duration of the operation does not guarantee indivisibility in multiple bus systems. Also, the traditional methods of implementing lock variables with a binary switch mechanism such as test-and-set, compare, compare-and-swap, swap, etc, are vulnerable to processors which fail while in their critical sections. The P896 committee are investigating several methods of overcoming these problems, and will provide support in the final standard for all those which prove to be necessary to the efficient control of fault-tolerant systems.
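The LOCK(v) and UNLOCK(v) operations, and the weakness of a simple binary lock variable when its holder fails, can be sketched in Python as follows. This is an illustrative model only, not a mechanism proposed by the P896 committee. The class `LockVariable` and the functions `worker` and `run` are hypothetical names, and the internal `threading.Lock` merely stands in for whatever makes the test-and-set indivisible in hardware.

```python
import threading
import time


class LockVariable:
    """A lock variable v with an indivisible test-and-set (hypothetical model)."""

    def __init__(self):
        self._flag = False
        # Assumption: this stands in for the indivisible bus operation.
        self._indivisible = threading.Lock()

    def test_and_set(self):
        """Set the flag and return its previous value as one indivisible step."""
        with self._indivisible:
            old = self._flag
            self._flag = True
            return old

    def clear(self):
        with self._indivisible:
            self._flag = False


def LOCK(v):
    # Spin until test-and-set returns False, meaning this caller set the flag.
    while v.test_and_set():
        time.sleep(0)  # give other threads a chance to run


def UNLOCK(v):
    v.clear()


def worker(v, shared, n):
    for _ in range(n):
        LOCK(v)
        try:
            shared["count"] += 1  # critical section on shared data
        finally:
            UNLOCK(v)


def run(n_workers=4, n_increments=200):
    """Run several 'processors' that update one shared counter under the lock."""
    v = LockVariable()
    shared = {"count": 0}
    threads = [threading.Thread(target=worker, args=(v, shared, n_increments))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


if __name__ == "__main__":
    print(run())  # every increment is accounted for

    # The failure case from the text: a holder that stops inside its
    # critical section never calls UNLOCK, so the flag stays set.
    v = LockVariable()
    LOCK(v)
    print(v.test_and_set())  # True: any other processor would spin forever
```

The last lines show why a plain binary switch is not enough for fault-tolerant systems: nothing in the lock variable itself records who holds it or allows another processor to recover it.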
CONCLUSIONS
The P896 Futurebus project is intended to provide a 'significant' step forward in the performance and facilities offered to the multiprocessor system designer. By tackling the fundamental problems of bus design, the P896 committee will be creating a specification which will provide a level of device, manufacturer and technology independence hitherto unavailable in a backplane bus standard. By exploring the needs of future multiprocessor systems while constraining the specification to be implementable with the best current technology, the committee intend the P896 bus to be unique in its ability to satisfy the growing need for sophistication and performance in computer systems for a full decade. Information on the current activity of the P896 committee may be obtained from the chairman, Bob Davis, of Summit Computer Systems*.
REFERENCES
1 Catt, I, Walton, D and Davidson, M Digital Hardware Design CAM Publishing Vols 1 and 2 (1978)
2 Metzger, G and Vabre, J P Transmission Lines with Pulse Excitation Academic Press (1969)
3 Matick, R E Transmission Lines for Digital and Communication Networks McGraw Hill (1969)
4 Thurber, K J, Jensen, E D, Jack, L A, Kinney, L L, Patton, P C and Anderson, L C 'A systematic approach to the design of digital bussing structures' Fall Joint Comput. Conf. (1972) pp 719-740
5 Muehlemann, K 'Arbiters, priority access conflicts and the glitch problem' Proc. Euromicro North Holland (1979)
6 Chaney, T J 'Anomalous behavior of synchroniser and arbiter circuits' IEEE Trans. Comput. (April 1973)
7 Marino, L R 'General theory of metastable operation' IEEE Trans. Comput. (February 1981)
8 Gustavson, D B 'Fastbus status from a designers point of view' IEEE Trans. Nucl. Sci. Vol NS-28 No 5 (October 1981) pp 3786-3800
9 Papenberg, R L and Rydhan, M 'Versatile memory bus handles mixed memories compatibly' Electron. Des. (1980)
*IEEE P896 committee, c/o Summit Computer Systems, 22685 Summit Road, Los Gatos, CA 95030, USA