Broadcast service implementation on a packet-switched network

Ken Weaving discusses the implementation of a network of minicomputers at the CEC's Joint Research Centre
The implementation of a broadcast service as a basic service of the JRC packet-switched network is discussed, and the implementation of the network itself is described briefly. The network already contains novel features, such as the establishment of logical paths between the nodes using a multiline protocol. Algorithms used to implement the service itself are introduced and the way in which the service is used by the network is described. The interface that the service presents to applications programs is described and its current use by applications programs is discussed. Future plans for the exploitation of this service are considered.

Keywords: computer networks, broadcast service, packet-switching, batch-processing

During 1980-81, an internal network (INET) was constructed at the Ispra Establishment of the Commission of the European Communities' Joint Research Centre. This development, which took place as part of a research project, is now being used as a production system with six computers interconnected; a seventh is at present being installed in Brussels. Each nodal computer has an identical architecture, performing the following functions:
• packet-switching and routing,
• terminal concentration,
Joint Research Centre, Commission of the European Communities, I-21020 Ispra (VA), Italy
0140-3664/83/040178-07 $03.00
• application packages and programming languages.

These computers are minis rather than micros, having on average 300 kbyte of direct access memory and 70 Mbyte of disc memory. Each computer has its own set of functions to perform (see Figure 1), and sometimes the same function is performed by more than one computer.

The resulting network is a packet-switching network with a routing algorithm that makes no assumption about the nature of the transmission media. The topology is deduced independently by each node on the basis of the incoming traffic.

It could be considered as a 'recursive' system for several reasons. One reason is shown in Figure 2: two Solar machines (the nodal computers) can be connected by a point-to-point line, by virtual links established over a public network (as will be the case for Solar 7 in Brussels, which will be accessed via Euronet), or even by a satellite subchannel, and still be capable of providing the same communications services. The recursion is demonstrated by the fact that, within the communications system, the X25 protocol can be implemented at different levels, i.e. at the user level, and also at the line handler level when the line in question is an X25 connection to a public network.

A second example of the recursive nature of the system is the possibility of implementing LAN protocols such as basic block, byte stream, etc. on top of the broadcast service. Although this is not to be recommended for normal use, it could provide a useful tool for protocol research and development.
© 1983 Butterworth & Co. (Publishers) Ltd. computer communications
NETWORK OVERVIEW

Interactive computing facilities

In its production environment, the network provides internal users and external users (from Euronet) with a set of interactive computing facilities. Each Solar 16 has its own time-sharing system (TSF), which on one machine is used as a software development system and on others supports different language processors: APL, Fortran and Basic. Access to the time-sharing system of the computer centre mainframe, an Amdahl 470/V8, is provided by emulating the 3270 terminal concentrator and using a BSC connection to the communications controller. As this particular service is used frequently, it is available on two of the Solars.

All these facilities are implemented on top of the X29 protocol, following the philosophy of using international standards where they exist. This has been of enormous benefit, as the terminals attached to the network are handled by software that supports the X3 and X28 counterparts of X29. These terminals can therefore access not only the local computing facilities but also all the facilities of Euronet. In this way, all terminals have access to all interactive applications on the network. Commercial terminal concentrators (PADs) that are totally compatible with the existing hardware and software, and that can be introduced without disrupting normal operations, are now being added to the network.

Batch-processing facilities

Although the Solar does provide batch-processing facilities, this function has been suppressed. Instead, 'remote job entry' connections have been established between the Amdahl mainframe and the network. The various RJE connections are accessed using the network-independent file transfer protocol (NIFTP), which is also used to route the resulting listings to the appropriate printer. The same protocol is used for transferring source code and binary files across the network. This facility is available to users through the extension of certain commands in the time-sharing system of the Solar.

Remote job-entry stations (card readers and printers) are connected to the nodal computers and placed at strategic points in the establishment. This gives the traditional batch users of the mainframe a 'self-service' access that is more convenient than having to go to the computer centre each time they want to run a job. Even the TSO users are able to run background jobs and have the results printed on a convenient printer.

Special applications

Several special applications have been written that run as tasks in the Solar and that interface with the network software. In particular, there are programs for testing protocol implementations [1-3] that form part of the Euronet reference and test centre for higher level protocols. These are driven interactively by remote terminals using the ESP20 (or 'triple X') protocol, and allow the user to control a DEVT or FTP connection. In connection with this work, a small mailbox service has also been produced that allows its users to exchange messages and comments between themselves.

In summary, the facilities that are available to the users of the network are as follows:
• Services on Amdahl
  o remote access to the batch system,
  o access to the time-sharing service (TSO),
  o access to databases ECDIN and EUROCOPI
• Services on network
  o reference and test centre for high level protocols,
  o electronic mail service,
  o Solar time-sharing facilities (TSF) that include BASIC, FORTRAN, APL and PASCAL

Figure 1. Network configuration. RTC: reference and test centre for high-level protocols; RJE and TSO: remote job entry and time-sharing option, respectively (both on Amdahl mainframe)

Figure 2. Software architecture (levels shown: higher level protocols, packet level, broadcast, frame level, topology, line level)

vol 6 no 4 august 1983
COMMUNICATIONS SYSTEMS

Within the framework of this presentation, the communications system can be considered as being levels 1-3 of the ISO architecture. The higher level protocols that are used, i.e. ESP20 [4], DEVT [5] and RPP [6], all interface directly with the network. The network (Figure 2) may be divided into the following components:
• packet level: managing the interface with its user programs and following X25/3 specifications (sequencing, flow control, etc.),
• frame level: managing the logical lines with the remote nodes using the multiline protocol,
• topology: defining the network structure (access paths to nodes),
• router: selecting the best physical line to reach a node,
• line handlers: sending and receiving frames on physical lines.

A virtual call service is provided by the network; it has end-to-end significance because it is built on top of the logical line concept. By using the Transpac multiline protocol [8], end-to-end error control and flow control are guaranteed between each pair of nodes. In this way, an error-free logical line is established between each node and all the others, which insulates the higher levels from the peculiarities of the physical topology.

Packet level

The packet level of the network, which interfaces to its user processes, to the frame level, and logically with the remote packet levels, has been implemented following X25/3 specifications. The interface that it presents to its user processes gives all the functionality of the X25 virtual circuit service, and is identical to the interface of the actual X25/3 implementations used for driving the X25 connections to the external networks.

In the initial state, all frame levels are down, but the packet level is open and remains permanently in an open state in order to process local communications. This is necessary as some higher level protocol implementations do not perform any switching. Even for local connections, where a terminal connects to an application program on the same computer, the connection passes through the packet level. The packet level refuses requests for virtual calls on logical lines that are not in the up state, and will close any existing virtual calls when a logical line goes down.
At the frame level, an instance of the multiline protocol automaton exists for each active remote node. Each node is linked to another node by one or more paths through the network; each path is analogous to a physical line in the multiline protocol. The use of variable routing in the physical network means that different frames of data on the same virtual connection can take different paths through the network, thereby completing the analogy with the multiline situation.

The service provided to the packet level is a connection-oriented service that gives no loss of packets and no duplication, maintains sequencing, and gives some control over the flow of data. In order to provide this service, the frames are numbered when sending. Frames that are received must be acknowledged; otherwise, after some time, the sender will resend the unacknowledged frames. A sender window represents the number of frames that the correspondent is able to receive, and a receiver window is used to order the frames, eliminating those that are duplicated, etc. Acknowledgements are sent inside the header of the data frames where possible.

Initially, all frame levels are down, and RAZ frames are sent on each logical line at regular time intervals. When a node receives a RAZ it replies with CRAZ (confirmation). The exchange of these control frames brings a logical line to the up state. The logical line will go down either on a frame-reception timeout or on excessive retransmissions of the same frame. The existence of these logical lines is extracted from the topology tables.
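The role of the receiver window can be illustrated with a minimal Python sketch. It is not the Transpac multiline procedure itself: the class name, the window size and the use of unbounded frame numbers (a real protocol would number frames modulo some value) are assumptions made for illustration only.

```python
class ReceiverWindow:
    """Reorders numbered frames and drops duplicates (illustrative only)."""

    def __init__(self, size):
        self.size = size          # how many frames ahead we are willing to buffer
        self.next_expected = 0    # lowest frame number not yet delivered
        self.buffer = {}          # out-of-order frames waiting for their turn

    def receive(self, number, payload):
        """Accept one frame; return the payloads that can now be delivered in order."""
        if number < self.next_expected or number in self.buffer:
            return []             # duplicate: already delivered or already buffered
        if number >= self.next_expected + self.size:
            return []             # outside the window: ignored, the sender will resend
        self.buffer[number] = payload
        delivered = []
        while self.next_expected in self.buffer:
            delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        return delivered


window = ReceiverWindow(size=4)
first = window.receive(1, "b")    # arrived early over a faster path: held back
second = window.receive(0, "a")   # fills the gap: both frames are released in order
third = window.receive(1, "b")    # retransmitted duplicate: dropped
```

Because frames on one logical line may travel over different physical paths, the receiver has to tolerate both reordering and duplication, which is why the buffer is keyed by frame number.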
Topology

Topology is a service that gives each node the information it needs to take its routing decisions. Initially, a node is not aware of the existence of any other nodes, but it is aware of the physical lines emanating from it. It therefore sends a command frame on each of these lines to indicate that the node can be accessed on these lines. When a node receives such a message, it broadcasts it on the remaining lines if the received message contains new information. As all nodes perform these actions, after some time each node will know the network configuration.

These command messages are sent by each node when the configuration has changed, but also at regular time intervals. If a node stops receiving this regular message from another node, it concludes that the other node has gone down.

On the basis of this information, topology maintains two tables. One, called status, contains the numbers of the nodes that are to be crossed in order to reach a node by the shortest path. The other, the routing table, contains the line number leading to each node by the shortest path. The tables assume a maximum of 16 nodes in the network, which is a design decision and not a constraint.
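The flooding of announcements can be sketched as follows. This is a simplified model under stated assumptions: the announcement is assumed to carry a hop count, "new information" is taken to mean a strictly shorter path, and the names `TopologyNode`, `connect`, `hops` and `route` are hypothetical. Periodic re-announcement and the detection of dead nodes are left out.

```python
class TopologyNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.lines = {}               # local line number -> (neighbour, neighbour's line number)
        self.hops = {node_id: 0}      # destination -> shortest known hop count
        self.route = {}               # destination -> local line on the shortest path

    def announce(self):
        """Tell every neighbour that this node can be reached over the shared line."""
        for neighbour, remote_line in self.lines.values():
            neighbour.receive(self.node_id, 1, remote_line)

    def receive(self, origin, hop_count, line):
        """Record the announcement and flood it onwards only if it is new information."""
        if hop_count >= self.hops.get(origin, float("inf")):
            return                    # nothing new: the flood stops here
        self.hops[origin] = hop_count
        self.route[origin] = line
        for local_line, (neighbour, remote_line) in self.lines.items():
            if local_line != line:
                neighbour.receive(origin, hop_count + 1, remote_line)


def connect(a, b):
    """Wire a point-to-point line between two nodes (test scaffolding)."""
    line_a, line_b = len(a.lines), len(b.lines)
    a.lines[line_a] = (b, line_b)
    b.lines[line_b] = (a, line_a)


nodes = [TopologyNode(i) for i in range(4)]
connect(nodes[0], nodes[1])
connect(nodes[1], nodes[2])
connect(nodes[2], nodes[3])
connect(nodes[0], nodes[3])           # closes the ring 0-1-2-3-0
for node in nodes:
    node.announce()
```

After every node has announced itself once, each node holds a hop count and an outgoing line for every other node, without any node having been told the topology in advance.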
Frame level

As previously stated, the network's frame level is an implementation of the Transpac multiline protocol.

Router

The router process is responsible for putting 'level 2' frames into the destination queue of the proper line handler. It also checks that the maximum transit time of a frame in the network has not been exceeded. It discards those frames that are too old, knowing that the error recovery will be performed by level 2. It will also drop a frame if it finds that no line leads to the destination node, a situation that can occur during a topological update. Once again, the recovery from the lost frame is performed by level 2.

For each frame to be sent, the topology tables are consulted, and the physical line is chosen that gives the shortest path to the destination node. Before putting the frame into the queue for this line, however, the queue length is checked to see how many frames are already waiting to be sent on this line. If the result is less than a given value (an implementation parameter), then the frame is put into that queue. Otherwise, the queue is chosen that gives the smallest value for a function of the path length and queue length. If the other lines would give an even longer delay, then the frame is discarded.
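The queue selection rule can be sketched in Python as below. The text does not give the cost function, so the linear combination used here, the `weight` parameter and the per-line path lengths in `routes` are all assumptions; the final discard rule is one reading of "if the other lines would give an even longer delay".

```python
def choose_line(routes, queue_lengths, destination, threshold, weight=1):
    """Return the line on which to queue a frame, or None to discard it.

    routes:        destination -> {line: path length via that line}
    queue_lengths: line -> number of frames already waiting
    """
    candidates = routes.get(destination)
    if not candidates:
        return None                       # no line leads to the node: drop the frame
    shortest = min(candidates, key=candidates.get)
    if queue_lengths[shortest] < threshold:
        return shortest                   # normal case: shortest path, short queue

    def cost(line):
        # Hypothetical cost combining path length and queue length.
        return candidates[line] + weight * queue_lengths[line]

    best = min(candidates, key=cost)
    if best == shortest:
        return None                       # every alternative is slower still: discard
    return best


routes = {5: {0: 1, 1: 2}}                # node 5: 1 hop via line 0, 2 hops via line 1
normal = choose_line(routes, {0: 2, 1: 0}, 5, threshold=4)
detour = choose_line(routes, {0: 6, 1: 1}, 5, threshold=4)
dropped = choose_line(routes, {0: 4, 1: 9}, 5, threshold=4)
unknown = choose_line(routes, {0: 0, 1: 0}, 9, threshold=4)
```

Discarding is safe in this design because, as the text notes, recovery from a lost frame is performed by level 2.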
Line level

The line level is a set of drivers for the physical lines that connect, directly or indirectly, the various Solars that constitute the network. In most cases it is simply an HDLC line driver, but it could be an implementation of X25 in the case of nodes that are interlinked by an X25 network. This latter example is at present being implemented in order to develop a computer in Brussels which logically is part of the JRC network.
Gateways

Gateways exist at the interface between the network and external networks in order to perform address mapping and possibly protocol conversion, although the latter is not necessary at present. A gateway is not required where the external network is used to interconnect nodes as described above, but only where the external network can be seen as a separate entity at the user level. Each gateway has a unique number, so that call requests that begin with a 0 will always be passed to gateway 0 by the network. As one digit (currently 8) is reserved as the first digit of the network node addresses, this limits the network to not more than nine gateways.
BROADCAST SERVICE

Function

Broadcast is a subservice of the network that is complementary to the classical communications system described. The unit of communication in this service is a 'page' containing 128 bytes of information. The network can contain up to 128 of these pages, which are identified by a number in the range 0-127. For each page, there should be a 'page manager' that is responsible for generating and updating the contents of the page. The broadcast service distributes the contents of a page to all nodal computers each time the page manager modifies the contents.

Although a page manager is the only process that should update a page, any program in any node may read it. Broadcast does not guarantee, however, that a page just read by a process is the latest version of that page in the network. It can only declare that it is the latest version of which it is aware. This is particularly true at the time when a new version of a page is being distributed throughout the network. For this reason, the service provided cannot be likened to a common memory between the computers.
User interface

The user interface is extremely simple, consisting of three user commands and a single response. The commands are:
• update page, which updates a page and advises broadcast that the issuing process is the page manager; it is ignored if another process is already manager for that page,
• read page, which requests the contents of a particular page,
• stop being page manager, which advises broadcast that the issuing program is no longer manager for the indicated page.

The single response is the reply to 'read page', in which the contents of the page are given if the page exists; otherwise an indication is given that the page does not exist.

From these commands, it can be seen that there are periods when uncertainty could arise in the management of a page, particularly when a manager relinquishes responsibility for a page. In these situations, broadcast considers it the responsibility of a higher level protocol to synchronize the various processes involved.
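A minimal Python sketch of the local side of this interface is given below. It models only the three commands and the single response on one node; distribution to other nodes is not modelled. The class and method names, the process identifiers and the boolean returned by `update_page` are assumptions added for illustration (the service as described returns a response only to 'read page').

```python
PAGE_COUNT = 128   # pages are numbered 0-127
PAGE_SIZE = 128    # bytes per page


class BroadcastInterface:
    """Local view of the three commands; distribution is not modelled."""

    def __init__(self):
        self.pages = {}      # page number -> contents (bytes)
        self.managers = {}   # page number -> identifier of the managing process

    def update_page(self, process, page_no, contents):
        if not 0 <= page_no < PAGE_COUNT or len(contents) > PAGE_SIZE:
            raise ValueError("invalid page number or page too long")
        manager = self.managers.setdefault(page_no, process)
        if manager != process:
            return False     # another process already manages this page: ignored
        self.pages[page_no] = bytes(contents)
        return True

    def read_page(self, page_no):
        """Return the latest contents known locally, or None if the page does not exist."""
        return self.pages.get(page_no)

    def stop_being_manager(self, process, page_no):
        if self.managers.get(page_no) == process:
            del self.managers[page_no]


service = BroadcastInterface()
accepted = service.update_page("operator", 16, b"timeout=30")
ignored = service.update_page("intruder", 16, b"timeout=1")
service.stop_being_manager("operator", 16)
taken_over = service.update_page("intruder", 16, b"timeout=1")
```

The last two calls show the window of uncertainty mentioned in the text: once a manager relinquishes a page, any process may claim it, so coordination has to come from a higher level protocol.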
Implementation

The implementation of the broadcast service is the same in all nodes, although the behaviour of the process in a particular instance depends on whether or not it is the 'page manager' of the page in question. For each page, the following information is memorized in each node:
• PNO: the page number,
• PVE: its version number (modulo 256), by which an old version can be distinguished from a new version,
• P: contents of the page,
• PM: an indicator to show if this node is manager of this page.

The implementation is based on the node being aware of its physical lines to other nodes, and therefore a bit file ACK(L, PNO) is kept with a bit per line per page. These bits record the fact that the node at the other end of the line knows the current version of the page. When necessary, a node must inform its neighbouring nodes of the state of its broadcast pages. This is done by sending messages M(PNO), a message giving information about one page and including PNO, P(PNO) and PVE(PNO).
A received message must be acknowledged. This is done by returning an acknowledgement packet that has the same fields as the page message without the contents (the page number and version are significant).

Page manager algorithms

Update

When an update command is received from the process that is the manager for a page, the broadcast mechanism must be initiated. The action is:
• update P(PNO)
• increment PVE
• for all L: set ACK(L, PNO) = 0 and send M(PNO)
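The update action can be sketched in Python as follows. The field names follow the notation of the text (PNO, P, PVE, ACK); the `ManagedPage` class, the `send` callback and the tuple used for M(PNO) are hypothetical stand-ins for the real message handling.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedPage:
    pno: int                                   # PNO: page number
    contents: bytes = b""                      # P: page contents
    pve: int = 0                               # PVE: version number, modulo 256
    ack: dict = field(default_factory=dict)    # ACK(L, PNO): line -> 0 or 1


def update(page, new_contents, lines, send):
    """Apply the manager's update action and start the broadcast."""
    page.contents = new_contents               # update P(PNO)
    page.pve = (page.pve + 1) % 256            # increment PVE
    for line in lines:
        page.ack[line] = 0                     # no neighbour has confirmed this version yet
        send(line, (page.pno, page.pve, page.contents))   # M(PNO)


sent = []
page = ManagedPage(pno=16, pve=255)
update(page, b"timeout=30", lines=[0, 1],
       send=lambda line, message: sent.append((line, message)))
```

Starting from version 255 shows the modulo 256 wrap: the new version number is 0, which is why the receiving side needs a special case for it.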
Refresh

It is possible, because of the inaccessibility of nodes in certain situations, that not all nodes will consistently have the same version numbers of a page. For this reason, those pages for which the node is responsible are regularly broadcast on a time-out basis. This is referred to as the refresh mechanism.
Message received

A message M(PNO) has been received on line L and this node is the manager of page PNO. The action taken depends on the version number PVE(M) received.

If PVE(M) = PVE, the other end of the line knows the latest version, therefore:
  ACK(L, PNO) = 1
  send back acknowledgement

If PVE(M) < PVE, the line knows an old version and the new one is sent to it:
  ACK(L, PNO) = 0
  send M(PNO, PVE, P) on L
Repeat

Some packets (messages or acknowledgements) may get lost. At a regular interval (less than the refresh timeout) after an update, the message M(PNO, PVE, P) is sent to those lines that have not yet acknowledged this PVE, i.e. lines where ACK(L, PNO) = 0.

Receive acknowledgement

If PVE(ACK) = PVE then ACK(L, PNO) = 1, else ignore packet.
Figure 3. Algorithms of the broadcast function, message received: (a) page manager; (b) not page manager

Not page manager

Receive acknowledgement

If PVE(ACK) = PVE then ACK(L, PNO) = 1, else ignore packet.
Message received

When a message is received on line L:

If (PVE(M) = 0 and PVE = 'FF) or PVE(M) > PVE, then it is a new version of the page:
  update tables with M(PNO)
  for all L' not equal to L:
    ACK(L', PNO) = 0
    send M(PNO)
  end
  ACK(L, PNO) = 1
  send acknowledgement to L

If PVE(M) = PVE, then it is a known version of the page:
  ACK(L, PNO) = 1
  send acknowledgement to L

If PVE(M) < PVE, then L does not know the latest version:
  ACK(L, PNO) = 0
  send M(PNO)
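The three cases can be written out as a short Python sketch. The comparison follows the rule exactly as stated, so only the single wrap from 'FF to 0 is treated specially. The `state` dictionary and the `send_message` and `send_ack` callbacks are hypothetical; they stand in for the node's page tables and line handlers.

```python
def on_message(state, line, message, lines, send_message, send_ack):
    """Handle M(PNO) arriving on `line` at a node that is not the page manager.

    state:   dict with keys "pve", "contents" and "ack" (line -> 0 or 1)
    message: tuple (pno, pve, contents)
    """
    pno, pve_m, contents = message
    wrapped = pve_m == 0 and state["pve"] == 0xFF
    if wrapped or pve_m > state["pve"]:
        # New version: store it and pass it on to every other line.
        state["pve"], state["contents"] = pve_m, contents
        for other in lines:
            if other != line:
                state["ack"][other] = 0
                send_message(other, (pno, pve_m, contents))
        state["ack"][line] = 1
        send_ack(line, (pno, pve_m))
    elif pve_m == state["pve"]:
        # Known version: the neighbour on this line is up to date.
        state["ack"][line] = 1
        send_ack(line, (pno, pve_m))
    else:
        # Older version: the neighbour is behind, so send it ours.
        state["ack"][line] = 0
        send_message(line, (pno, state["pve"], state["contents"]))


messages, acks = [], []
record_message = lambda line, message: messages.append((line, message))
record_ack = lambda line, ack: acks.append((line, ack))

state = {"pve": 0xFF, "contents": b"old", "ack": {0: 1, 1: 1, 2: 1}}
on_message(state, 0, (3, 0, b"new"), [0, 1, 2], record_message, record_ack)

stale = {"pve": 5, "contents": b"v5", "ack": {0: 1, 1: 1}}
on_message(stale, 1, (3, 4, b"v4"), [0, 1], record_message, record_ack)
```

In the first call the node holds version 'FF and receives version 0, so the page is accepted as new and forwarded on lines 1 and 2; in the second call the neighbour is one version behind and is sent the current page instead of an acknowledgement.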
Repeat
This is the same function as the repeat section of the page manager algorithms.
Refresh
It is known that a page manager sends the latest version of a page at fixed time intervals. If a node does not receive this refresh for three consecutive time intervals it deletes that page from its memory, considering that either the page manager is dead, or that it has become inaccessible. In either case, it would be dangerous to maintain that particular page.
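The expiry rule can be sketched as a counter of missed refresh intervals per page. The class below is a simplified model: the names are hypothetical, and it assumes that some timer calls `on_interval_elapsed` once per refresh interval.

```python
class PageCache:
    """Drops pages whose manager has been silent for too long (illustrative)."""

    MAX_MISSED = 3   # consecutive refresh intervals without a message

    def __init__(self):
        self.pages = {}    # page number -> contents
        self.missed = {}   # page number -> consecutive intervals without a refresh

    def on_refresh(self, page_no, contents):
        self.pages[page_no] = contents
        self.missed[page_no] = 0

    def on_interval_elapsed(self):
        """Call once per refresh interval; returns the page numbers deleted."""
        expired = []
        for page_no in list(self.pages):
            self.missed[page_no] += 1
            if self.missed[page_no] >= self.MAX_MISSED:
                expired.append(page_no)
                del self.pages[page_no], self.missed[page_no]
        return expired


cache = PageCache()
cache.on_refresh(3, b"services on node 3")
cache.on_refresh(4, b"services on node 4")
after_one = cache.on_interval_elapsed()
after_two = cache.on_interval_elapsed()
cache.on_refresh(4, b"services on node 4")   # node 4's manager is still alive
after_three = cache.on_interval_elapsed()
```

Page 3 disappears after three silent intervals, while page 4 survives because its refresh reset the counter.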
Current uses of the broadcast service

Use by the network

The network uses pages 0-16 for its own purposes. Page 16 is used for giving operational parameters, timeout values, etc. When each node is initialized it takes default values that are defined at system generation time. When page 16 is received, however, these values are updated. In this way, an operator task can update this page, and hence the values throughout the network, so as to tune the whole system.

Pages 0-15 are allocated one to each possible node, and are used mainly to broadcast the names of services available on that node. When a service (time-sharing, for example) is activated, it declares its availability to the network, which adds the service name to the node's page. This update causes the page to be broadcast so that all nodes, by reading pages 0-15, are aware of the services that are available at any point in time, and on which nodal computer. This removes the need for a 'name server', and is a much more flexible and dynamic solution to the same problem. For example, a service may be temporarily moved from one computer to another for operational reasons. There are no tables to be updated, as the service declares itself on the new node, and all other nodes are informed. These reconfigurations can be performed without interfering with the network's operation.

In this way, a user is usually unaware of working on a network: he simply declares the name of the service he requires, and the network, working from pages 0-15, finds the node concerned and connects him. This is also valid for external users, whose call requests arrive at a gateway. The gateway extracts the name of the service required from the call packet, and makes its own call request to the network for the service. The external user need not worry about the addressing structure, and the gateway need not worry about address mapping. A gateway is considered as a service (of a special type) in that a gateway will also declare itself to the network, giving its gateway number.

Database access

One of the biggest groups of users of the network, at present, consists of those who access the time-sharing system on the Amdahl mainframe, from the network itself and from Euronet. Because of this heavy use, there are several access points from the network to TSO, all of which are declared to the network as having the same service name. This can lead to problems if a connection is interrupted by the network or Euronet during a work session. To avoid such problems, each physical TSO connection is managed by an application program (ISTSO) that allocates each user a particular TSO port for his work session. Even if the user's network connection is broken, the port is kept open for a fixed length of time, giving the user a chance to reconnect and continue his session. It cannot be guaranteed, however, that with several access points available on the network a user will reconnect to the same access point. For this reason, each ISTSO program uses a broadcast page to inform the other ISTSO programs of the users that currently have ports open through its access point.
In this way, a check is made when a user opens a connection with ISTSO, that he does not already have a port open at another access point. If he does have a port open, then the user is told to connect to the other system (the address is given to him) where he can continue his previous session. It would also be possible, using the same information from the broadcast pages, to redirect a user and get him to use an access point where more ports are available. Although this is not being done at present, it is intended to introduce this kind of load distribution mechanism in the near future.
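The check made by ISTSO, and the load distribution that is planned, can be sketched as below. The layout of the ISTSO broadcast pages is not described in the text, so the `pages` mapping (access point address to the set of users with open ports), the function names and the port counts are assumptions for illustration only.

```python
def find_open_port(user, local_access_point, pages):
    """Return the access point where `user` already holds a port, or None.

    pages: access point address -> set of users with open ports, as read
           from the broadcast pages published by the ISTSO programs.
    """
    for access_point, users in pages.items():
        if access_point != local_access_point and user in users:
            return access_point
    return None


def least_loaded(pages, ports_per_access_point):
    """Possible load distribution: pick the access point with the most free ports."""
    return max(pages, key=lambda ap: ports_per_access_point[ap] - len(pages[ap]))


pages = {"A": {"smith"}, "B": {"jones", "brown"}}
redirect = find_open_port("smith", "B", pages)      # smith reconnects at B
no_redirect = find_open_port("white", "B", pages)   # white has no session anywhere
suggested = least_loaded(pages, {"A": 4, "B": 4})
```

As with any read of a broadcast page, the information may be slightly out of date, so a redirect based on it is advisory rather than guaranteed.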
CONCLUSIONS

The broadcast service, as implemented on the network, promises to be a useful and powerful tool. Given the proper control and synchronization primitives, it could easily be the basis for a 'network memory' system, which is necessary to support distributed applications. Much work still needs to be performed in this area, and the multicast (N to M) aspects of local area and satellite networks are waiting for imaginative ideas in order to be properly exploited. It is intended to investigate not only the implementation of multicast communications systems, but also their use in such areas as document distribution, teleconferencing, process control and, of course, distributed systems.
REFERENCES
1 Weaving, K 'Euronet reference and test centre' Comput. Commun. Vol 3 No 5 (October 1980)
2 Weaving, K 'The DEVT test facility - user manual' JRC Internal Report (April 1980)
3 Rey, J C 'The RPP test facility - reference manual' JRC Internal Report (September 1981)
4 Facchin, Jegu and Thomas 'Guidelines for the implementation of the X29 part of ESP20 by Euronet host operators' CEC DG XIII Document (August 1978)
5 'Data entry virtual terminal for Euronet' CEC DG XIII Document (November 1978)
6 'Remote printing protocol for Euronet' CEC DG XIII Document (February 1978)
7 Couderc and Curioni 'INET reference manual' TITN Study No 2/EUR 83/11/01 (February 1981)
8 'Protocole de ligne en procedure multiligne' Specifications techniques d'utilisation du réseau Transpac 3/5.2 and Annexe P3 (December 1977)