A UNIX-Based Gateway to Distributed Database Systems

W. J. Barr
Bell Communications Research, Morristown, New Jersey

During the 1960s and 1970s, corporate productivity was significantly enhanced by the development of database systems to meet immediate corporate needs. During the 1980s, the computing focus has been on integrating these systems into corporate networks and removing manual functions by initiating host-to-host communications. Where all involved systems are provided by the same vendor, this vendor typically provides a networking solution to meet these needs. In those cases where database systems have been developed on hosts from multiple vendors, the corporate network is more difficult to implement. This is especially true when minicomputers are involved. This paper describes a gateway system which is designed to allow dissimilar systems to establish host-to-host communications. The key capability in this system is the ability to perform protocol translation across all seven layers of the ISO reference model as opposed to the traditional link and physical layers.

1. INTRODUCTION

In the 1960s and 1970s, the focus of computer system developers was to implement database systems that improved corporate productivity. These systems were typically implemented on large mainframes using vendor-supplied Data Base Management Systems (DBMSs) and were a major factor in improving corporate productivity during these years. As we entered the 1980s, a new focus began to emerge: the integration of these database systems into corporate networks. For those corporations whose systems ran on hardware from a single vendor, the vendor networking solutions provided the ability to initiate host-to-host communications and eliminate manual transaction entry into multiple systems. For those corporations whose systems were implemented on mainframes from multiple vendors, the problem was and is more complex. A further complication results from the popularity of UNIX [1] (UNIX is a trademark of AT&T Bell Laboratories) and its use on minicomputers from a wide variety of vendors. One of the strong advantages of UNIX is that it is very easy to implement useful functions quickly. In some cases, unplanned systems have been deployed because an aggressive developer has built something useful and found a customer base. In other cases, UNIX-based database systems have been specified to take advantage of their attractive development environment and meet critical operational dates. In either case, the focus in many corporations today is integrating a heterogeneous network. This focus can be expected to continue as other specialized network components, such as LISP machines, are created.

This paper describes an approach to providing host-to-host communications between dissimilar and incompatible systems. First, the functionality required of such a gateway is described and placed in the perspective of a deployed heterogeneous network system called FACS (described below). Second, a software architecture is proposed which implements this functionality. Finally, the Work Manager (WM), a system that implements this functionality for FACS, is described.

Address correspondence to W. J. Barr, Bell Communications Research, 435 South St., Morristown, NJ 07960.

2. BASIC FUNCTIONALITY

The concept of the Work Manager (WM) was created as part of the development of Release 10 of the Business Information System Customer Services Facility Assignment and Control System (BISCUS/FACS), or FACS for short, an inventory and assignment system for the Bell Operating Companies [2, 3]. FACS was initially developed by Bell Laboratories and was transferred to Bell Communications Research as a part of the AT&T Divestiture.

Figure 1. The FACS architecture. (Shown: a locally developed SOP; SOAC and LFACS on Sperry 1100 hardware; the BANCS THP; the Work Manager on an AT&T 3B20S; and COSMOS systems on DEC PDP-11/70 or AT&T 3B20S hardware.)

The distributed architecture of FACS is shown in Figure 1. The Service Order Analysis and Control (SOAC) and Loop Facility Assignment and Control System (LFACS) components of FACS are implemented on Sperry 1100 hardware using the Sperry Transaction Processing (TIP) software package. The Computer System for Mainframe Operations (COSMOS) component is a UNIX-based system which is available on DEC 11/70 or AT&T 3B20S hardware. These components communicate via the WM, which is implemented on the AT&T 3B20S. The WM communicates with the Sperry processor through a Bell Administrative Network Control System (BANCS) Terminal Handling Processor (THP) or equivalent message switch [4, 5]. The THP is implemented on Control Data hardware using software provided by AT&T Bell Laboratories. Finally, to complete this heterogeneous network, the Sperry processor communicates with a system generically referred to as a Service Order Processor (SOP), which is developed by the local Bell Operating Companies. In various parts of the country, the SOP is implemented on IBM, Sperry, Honeywell, Burroughs, or other equipment. The basic function of the WM is to provide gateway services between the synchronous mainframe world and the asynchronous UNIX-based world of the minicomputers. This section focuses on the functions required to provide these gateway services.

2.1 Role in the Architecture

The role of a gateway processor in a heterogeneous network is to provide all communications processing to allow host-to-host communications between any two hosts in the network. Clearly, where two hosts are immediately capable of communicating directly, they should do so. Equally clearly, the gateway should, as much as possible, concentrate on communications-oriented functions. That is, the role of the gateway should be limited to getting a message from a source to a destination without modifying message content. A second major role arises from its architectural position at the center of a many-to-many terminal access network. In our FACS example, the THP provides a terminal sharing capability for all terminals on its network to all hosts on its network. However, the COSMOS hosts are not on the network; they are connected to the network indirectly via the WM. The WM provides terminal sharing capabilities among multiple COSMOS systems by supporting transaction entry using logical destinations. In fact, the logical destinations used for COSMOS addressing are the same as are required for transaction processing on the Sperry host. Thus, the multiple minicomputers supported by the WM appear to the terminal user, the THP, and other hosts as a mainframe.

2.2 Host-to-Host Communications

In order to provide reliable host-to-host communications between the hosts on the network, the gateway must provide the following functionality:

1. Protocol Conversion. This is the key capability of the gateway. Protocol conversion must be carried out on all seven layers of the ISO reference model (see Figure 2).

Figure 2. Seven-layer ISO reference model. (Peer-to-peer protocols connect the corresponding layers of Host A and Host B: application (7), presentation (6), session (5), transport (4), network (3), data link (2), and physical (1). An intermediate node participates at the network, data link, and physical layers.)

Conversion of less than all seven layers places protocol compatibility requirements on the hosts that counteract the purpose of the gateway. The functional boundary for protocol translation has, somewhat arbitrarily, been drawn at the application layer; performing syntactic or semantic translations on the content of the messages has been excluded. This philosophical distinction allows us to maintain the gateway as a sophisticated message switch rather than a language translation system.

2. Store and Forward Message Switching. The gateway must be a Store and Forward Message Switch with powerful message recovery capabilities. This allows the gateway to serve hosts that do not have message recovery capability. In particular, reliable message recovery is difficult to implement in UNIX-based systems.

3. Priority Queue Management. The gateway must also provide queue management capabilities for hosts that either do not provide adequate queue management or react badly under extreme load. Priorities are important in networks where the various hosts are mismatched in terms of performance capacity. UNIX-based minicomputer systems tend to react poorly under flooding conditions, and the gateway can act as a "sponge" to protect them from network overload conditions. This is a logical but not obvious extension of the store and forward capability (a sketch of such a queueing discipline follows this list).

4. Message Recovery. Although this function can be considered to be part of the Store and Forward functionality, in this environment there are stronger recovery demands than are typically found in a store and forward switch. The gateway, and the collection of minicomputer hosts it serves, is viewed by its mainframe partners as a peer with mainframe-equivalent queueing and recovery capabilities. It is the role of the gateway to compensate for any weakness in these capabilities in the minicomputer hosts. Failure condition audits with automatic message recovery are specified to ensure that a host does not lose a message as a result of an outage. Assigning recovery responsibilities to the gateway is a key architectural decision. In light of the queueing responsibilities of the gateway, however, placing the recovery responsibilities in the gateway is more robust and easier to administer than distributing recovery responsibilities among the multiple hosts. Centralized recovery is easier to administer than distributed recovery.
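The queueing behavior described in item 3 can be made concrete with a short sketch. The following C fragment is illustrative only; it is not WM source, and all names and sizes are invented for this example. It shows a fixed-level priority queue in which low-priority traffic simply accumulates (the "sponge") while urgent work is drained first.

    /* Illustrative sketch only, not WM source: a fixed-level priority
     * queue of the kind a store-and-forward gateway can use to absorb
     * bursts on behalf of a slower minicomputer host.                */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NPRIO 4                 /* priority levels, 0 = most urgent */

    struct msg {
        long id;                    /* switch-assigned message id */
        char dest[16];              /* logical destination        */
        char body[256];
        struct msg *next;
    };

    static struct msg *head[NPRIO], *tail[NPRIO];

    /* Append a message to the queue for its priority level; in a real
     * gateway this would happen only after the message is safe-stored. */
    void enqueue(struct msg *m, int prio)
    {
        m->next = NULL;
        if (tail[prio]) tail[prio]->next = m;
        else            head[prio] = m;
        tail[prio] = m;
    }

    /* Dequeue the oldest message of the most urgent nonempty level;
     * low-priority traffic waits until the destination has capacity. */
    struct msg *dequeue(void)
    {
        int p;
        for (p = 0; p < NPRIO; p++)
            if (head[p]) {
                struct msg *m = head[p];
                head[p] = m->next;
                if (head[p] == NULL) tail[p] = NULL;
                return m;
            }
        return NULL;
    }

    int main(void)
    {
        struct msg *m = calloc(1, sizeof *m);
        m->id = 1;
        strcpy(m->dest, "WC001");
        strcpy(m->body, "sample transaction");
        enqueue(m, NPRIO - 1);          /* bulk traffic: lowest level */
        m = dequeue();
        printf("forward msg %ld to %s\n", m->id, m->dest);
        free(m);
        return 0;
    }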

2.3 Terminal Sharing Capabilities

What happens to a terminal user in this type of "system of systems"? Regardless of the way the databases are implemented, some users on the network will need to access more than one of the database systems, preferably from the same terminal. Regardless of the physical distribution of the databases, the user would like to "see" a single system. The role of the gateway in this architecture is to act as a surrogate host, performing session management and other valuable services for the user and distributing messages to appropriate hosts as desired. This role allows terminal sharing among the hosts served by the gateway, hiding, to some degree, the distributed architecture from the terminal user. In order to provide interfacing services to terminal users as well as hosts on the network, the gateway must perform the following functions:

1. Protocol Conversion, Priority Queueing, and Store and Forward Switching as in the host-to-host case.

2. Session Management to "convert" between the terminal/user definition of a "message" and the host definition of a message. The hosts served by the gateway should be independent of the device characteristics and limitations of the user terminals. For example, it is the gateway's function to paginate host messages into 24-line terminal screens (a sketch follows this list). Functions involved in session management include the ability for the user to create a "file" for storing and editing multiscreen input transactions and the ability to assist the user in viewing multiscreen output messages (such as page-forward and page-back).

3. Print server services to allow messages from all hosts served by the gateway to be printed on the same printer.

4. Message routing based on logical destination. If we want to present the user with an integrated view of this distributed system, we must deal with logical rather than physical entities.
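As an illustration of the pagination point in item 2, the sketch below (not WM code; the function names are invented, and the 24-line screen size is taken from the text) splits a host output message into screens the way a gateway must when the host knows nothing about terminal geometry. A real gateway would hold the remaining screens and deliver them on page-forward and page-back requests rather than printing them all at once.

    /* Illustrative sketch only, not WM source: paginate a host output
     * message into 24-line terminal screens on behalf of a host that
     * is independent of terminal device characteristics.            */
    #include <stdio.h>

    #define SCREEN_LINES 24

    void paginate(const char *msg)
    {
        const char *p;
        int line = 0, screen = 1;

        printf("--- screen %d ---\n", screen);
        for (p = msg; *p; p++) {
            putchar(*p);
            if (*p == '\n' && ++line == SCREEN_LINES) {
                line = 0;
                printf("--- screen %d ---\n", ++screen);
            }
        }
    }

    int main(void)
    {
        char buf[4096];
        int i, n = 0;

        for (i = 1; i <= 60; i++)       /* build a 60-line host message */
            n += sprintf(buf + n, "output line %d\n", i);
        paginate(buf);                  /* emits three screens          */
        return 0;
    }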

2.4 Forward Looking Functionality

One of the most obvious aspects of data communications today is that networks are changing rapidly. The corporate networks of the past, largely BISYNC or asynchronous, will evolve into one or more of the following:

Packet Networks (X.25)
Local Area Networks (LANs)
Metropolitan Area Networks (MANs) and Wide Area Networks (WANs)
Virtual Circuit Switches (VCSs)

Managing the evolution provides two opportunities for a gateway. First of all, it is likely that the network of the future will contain a combination of transport media. In particular, we are likely to see a combination of LANs, MANs, or VCSs for short-haul traffic and Packet Networks for long-haul traffic. Providing the host-to-host services and integrated user views of such a network is an ideal role for the gateway. The second opportunity is to assist in the transition from the old to the new communication network(s). Large corporate networks are not likely to "flash cut" anything. Yet increasing load and obsolete hardware will require the network to change. A gateway that can switch traffic between a BISYNC network and, for example, an X.25 network, allows management to stay in control of the transition.

3. SOFTWARE ARCHITECTURE OF A GATEWAY MACHINE

Figure 3 describes the software architecture of a gateway machine. In this section, each of the software modules mentioned will be discussed according to the functionality it provides. The gateway machine is a sophisticated message switch. The decomposition of the functions into software modules cannot lose sight of this identity. Major implications of this identity include:

1. Protocol layering. The ISO Reference Model (Figure 2) provides a model of protocol layering. The software architecture must support this layering.

2. Common message format. Since a major function of the message switch is full protocol conversion between dissimilar hosts, all messages must be converted to a common format which allows straightforward translation into any supported protocol.

These two implications are complementary. Proper layering of the protocol software facilitates a filtering approach to translating a message into a common format. As each layer of the protocol software is traversed, an incoming message is gradually transformed into the common format (one possible shape is sketched below).
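To make the common message format concrete, here is one possible shape for the normalized message. This is a sketch under our own assumptions, not the actual WM format; every field name is invented for illustration.

    /* Illustrative sketch only, not the actual WM format: a normalized
     * internal message. Every network handler converts incoming traffic
     * to this one shape, so the switch can queue, safe-store, and route
     * without knowing any network's native protocol.                   */
    #include <time.h>
    #include <stddef.h>

    struct gw_msg {
        long   id;           /* switch-assigned, unique per message     */
        char   src[16];      /* logical source (host or terminal)       */
        char   dest[16];     /* logical destination, not a line number  */
        int    priority;     /* queueing level                          */
        time_t received;     /* time stamps kept for audit and recovery */
        time_t forwarded;
        int    acked;        /* nonzero once application-level ack seen */
        size_t len;          /* body length; the body is carried        */
        char   body[4096];   /* verbatim: switched, never parsed        */
    };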

Figure 3. Software architecture of the gateway.


3.1 Link Level Protocol Machines

The first two levels of the ISO protocol model are the physical and link layers. These protocol layers can be implemented in outboarded Link Level Protocol Machines (LLPMs) with a Direct Memory Access (DMA) interface to the gateway software. This approach enforces layering by placing a physical boundary between the link and network layers. In addition, there are performance benefits associated with putting as much of the protocol on an outboard processor as possible. The selection of the link level as the highest level to be outboarded is a pragmatic assessment of the state of the art and an expectation that this is the most cost-effective boundary given today's minicomputer technology. There is no need to stop at the link level; the more software that can be outboarded, the better. In fact, the state of the art is moving rapidly in this arena. Network interfaces are increasingly being implemented in hardware or firmware. It is not unreasonable to expect this trend to continue and expect implementations of, for example, the X.25 packet layer to become available on vendor-provided boards.

3.2 Message Communications Module

The Message Communications Module (MCM) is the low-level switch in the gateway. It performs any mappings between the physical network of the LLPMs and the logical network used by the rest of the gateway software. In addition, any intermediate message buffering which needs to be performed in support of the LLPMs is performed here. This is the module at the other end of the DMA interface to the LLPMs. As a message enters the gateway from an LLPM, the MCM switches the message to the network handler responsible for that network. Similarly, as a message is sent out of the gateway, the MCM initiates the DMA transfer to the LLPM responsible for the destination network. Note that the switching of a message which requires no protocol conversion above the link layer and has no store and forward or queueing requirements need be processed only by the MCM and the LLPMs.

3.3 Network Handlers

The heart of the gateway is the collection of network handlers. Conceptually, there is one network handler for each type of network, independent of the number or type of hosts on the network or the number of lines serving that type of network. The primary function of these network handlers is to create a normalized message that has no relationship to the network that delivered the message and is independent of network considerations. The network handlers are responsible for the higher level protocol implementations (ISO levels 4-7) for the network. The functions of a network handler are dependent upon the network. There are two basic types of network handlers:

Generic Network Handler. A Generic Network Handler interfaces with networks that support standard protocols. It is designed such that any host on the network meeting the standard can talk to any other host on the network meeting the standard. Generic syntactic translations such as header format changes between standard networks are performed in the network handlers.

Application Specific Network Handler. Application Specific Network Handlers interface to nonstandard hosts. They perform all the translations of high-level protocol to allow nonstandard applications to communicate with applications on the standard network(s).

The network handlers can perform a number of functions or services for the network or hosts on the network.

Message Routing. This service is most commonly a many-to-one logical-to-physical mapping. This allows applications on the networks to decompose their entities in a convenient manner and rely on the gateway to map logical entities to their physical hosts (a sketch appears at the end of this section). In this software architecture, the function of the network handler is to notify Switch Control of routing changes in its network.

Priority Queueing. Most network switches are FIFO switches. That is, messages are forwarded by the switch in the sequence received. In heterogeneous networks, however, it is possible for power mismatches to exist. For example, an application on a mainframe host could, on a burst basis, flood an application on a minicomputer host. In this case, the gateway offers priority queueing services to allow the gateway to absorb low-priority traffic until the minicomputer host has resources available to process it.

Load Control. These services are related to priority queueing. In addition to absorbing low-priority traffic for hosts, the gateway offers load control capabilities. These services are implemented using application level acknowledgments and window sizes in addition to link level acknowledgments.

One risk of building a gateway like this is that the potential for "creeping featurism" is unbounded. Because of the nature of this gateway, a great deal of pressure can be created to incorporate additional services, such as application level data translation, in the gateway. In fact, this can be a very cost-effective way to resolve interface design problems. In order to avoid the need to pervert the network handlers when these issues arise, the software architecture provides a place for these features as application translators. This decomposition exists to place all of the volatile ugly stuff in a single isolated code segment. The application translators are not switch-through processes. Rather, the translators are "called" by the control process. The resulting message that is returned is then forwarded through the switch without additional changes.
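The many-to-one routing service lends itself to a small sketch. The fragment below is illustrative only, not WM source; the table contents and function name are invented.

    /* Illustrative sketch only, not WM source: many-to-one logical-to-
     * physical routing. Applications address logical entities; the
     * gateway maps each to the physical host that currently owns it. */
    #include <stdio.h>
    #include <string.h>

    struct route { char logical[16]; char host[16]; };

    /* Maintained by the network handlers, consulted when switching. */
    static struct route table[] = {
        { "WC001", "cosmos-a" },
        { "WC002", "cosmos-a" },        /* many logical names, one host */
        { "WC003", "cosmos-b" },
    };

    const char *route_lookup(const char *logical)
    {
        size_t i;
        for (i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(table[i].logical, logical) == 0)
                return table[i].host;
        return NULL;                    /* unknown destination: nak it */
    }

    int main(void)
    {
        printf("WC002 -> %s\n", route_lookup("WC002"));  /* cosmos-a */
        return 0;
    }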

3.4 Switch Control

The module in control of the whole process is "Switch Control," or the "Switch." It is this module that actually implements the Store and Forward Message Switch. Its primary functions are to switch messages between the network handlers and to implement all message recovery functions, including safe storage of all messages to disk and, if desired, to tape. A subtle, but important, decomposition exists here. When a message is received by the gateway, the Switch determines that the destination of the message is "legal" in the sense that a network handler exists for that destination. If not, a negative acknowledgment (nak) is returned to the originating network handler, and the message is either nak'd back to the originating host or placed in an error hold queue depending on the specific network handler involved. Legal messages are safely stored by the Switch, and a positive acknowledgment (ack) is given to the originating network handler for return at the application level (ISO level 7) to the originating host (this decision is sketched at the end of this section).

The Switch is also responsible for queue management. It identifies messages to be dequeued and transmitted and makes message status updates to the disk to include statistics such as the time stamp at which the message was actually forwarded to the destination. As the implementation point of the Store and Forward Message Switch, routing services are implemented in the Switch. The ability to support user-friendly local aliases (as opposed to hard-to-remember network unique IDs), user-transparent dynamic routing changes, and ambiguous destination names (by, for example, taking into account the message source) are all important gateway services.

In addition to its function as a message switch, Switch Control implements system administration tools. These tools include capabilities to replay transmissions to or from any device or host and inquire about the status of any message or any queue. Additional tools are provided to manipulate the queues and initiate or terminate conversations with any device or host.
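The acceptance decision described above, a nak for unknown destinations and an ack only after safe store, reduces to a few lines. The sketch below is illustrative only, not WM source; handler_exists and safe_store are stubs standing in for the routing-table check and the disk write.

    /* Illustrative sketch only, not WM source: Switch Control's
     * acceptance decision. A message is acked to the originating
     * handler only after its destination is known to be legal and the
     * message has reached safe store; never ack what might be lost.  */
    #include <stdio.h>

    enum verdict { ACK, NAK };

    /* Stubs standing in for the routing-table check and disk write. */
    static int handler_exists(const char *dest) { return dest[0] != '\0'; }
    static int safe_store(const char *body)     { return body != 0; }

    enum verdict accept_message(const char *dest, const char *body)
    {
        if (!handler_exists(dest))
            return NAK;   /* nak'd to the host or put on an error hold queue */
        if (!safe_store(body))
            return NAK;
        return ACK;       /* returned at ISO level 7 to the originator */
    }

    int main(void)
    {
        printf("%s\n", accept_message("WC001", "msg") == ACK ? "ack" : "nak");
        return 0;
    }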

4. THE WORK MANAGER: A UNIX-BASED GATEWAY

Figure 4 describes the architecture of the Work Manager (WM), an implementation of the architecture described above. The WM is currently operational in 24 sites across the country providing the services described in this section. While the generic architecture has been followed in spirit, implementation considerations have caused some compromises. Where appropriate, these are discussed along with implications and plans for changing the implementation.

Figure 4. Software architecture of the WM, a UNIX-based gateway. (Shown: Work Manager Control; the BANCS, Terminal, and COSMOS Handlers; the High/Low Level Communications Module; and Bisync, DDCMP, and X.25 link interfaces.)

4.1 Hardware and Software Environment

The WM is implemented on the AT&T 3B20S minicomputer. This "super-mini" is configured with 3 MB of memory and 2 removable 300 MB disks. The operating system is AT&T UNIX System V.

4.2 Outboard Communications Processors

An important consideration in choosing the 3B20S was the availability of programmable communications boards. The 3B20S is available with the UN53 synchronous communications board which operates under the control of the TN82 Data Link Peripheral Controller. The UN53 provides 128K of Random Access Memory and four Universal Synchronous/Asynchronous Receiver/Transmitters (USARTs) providing RS-232 interfaces for communications. Each USART has a Polynomial Generator Checker (PGC) to perform block check calculations. Two link level communications processors have been written for the UN53: Bisync and DDCMP (Digital Data Communications Message Protocol, developed by Digital Equipment Corporation) [6]. Work is in progress to also support X.25 on the TN82 standing alone without the UN53. Other link level protocols, such as a block mode asynchronous protocol, are under consideration to expand the range of hosts with which WM can interface.

4.3 High/Low Level Communications Module

As described above, level two of Bisync was not initially implemented at the board level. The level 2 functions of Bisync were implemented in the High and Low Level Communications Modules (H/LCM). As intended, the H/LCM also implements level 3 functions for all supported protocols, including:

1. Multipoint BISYNC (for IBM 3270/TTY 4540 Terminals and Printers)
2. Point-to-Point BISYNC (Host-to-Host Interface to the BANCS Network)
3. DDCMP (for the COSMOS Interface)

Note that these level 3 services are frequently provided by operating systems. Under UNIX, however, the level 3 services are not provided by the operating system. Additionally, DDCMP is a "foreign" protocol on the 3B20S, so one would not expect it to be supported at all. Finally, the System V UNIX kernel does support the X.25 packet layer. We expect to be able to utilize the kernel software to simplify the X.25 implementation. This will also give us some experience with a "pure" link level interface to the H/LCM.

The H/LCM also performs the functions indicated in the architecture described previously; it translates line/device numbers to Logical Terminal identifiers (LTERMs) and provides the interface between the outboard processors and the Network Handlers. The H/LCM was also chosen as the implementation point to manage the system console.

Attached to the H/LCM is the WM Simulator. Initially developed as a testing tool, the simulator has proven to be a useful capability for on-line diagnostics. Standard simulator scripts delivered with WM include internal loop-around transactions useful for checking software sanity and external loop-around transactions for checking out communications hardware, including data sets and communications lines.

4.4 Network Handlers

The WM currently supports three types of networks: a BANCS network, a COSMOS network, and a Bisync Terminal Network. In general, these handlers are intended to implement all communication functions above level 3 for the type of networks they support. Each of these networks contains features of its own and is discussed separately.

4.4.1 BANCS Handler. The BANCS handler is responsible for message communications with the BANCS network and all hosts on the BANCS network. The BANCS network is a standard network supporting a wide variety of hosts, each of which has implemented standard protocols. The BANCS handler establishes network connectivity with the BANCS network, sends and receives BANCS system messages (to establish connectivity and sessions between applications), and formats the BANCS network header to ensure correct message routing through the network. It provides all services for application-to-application traffic to hosts supporting the following high-level protocols:

MAC (MSC Application Control) Protocol (originally developed by South Central Bell)
TOP (Transaction Oriented Protocol) [7]
RFS (Report Forwarding System, a standard BANCS protocol)

Terminals on the BANCS network can also access the WM through the THP. The BANCS handler is the access point for this terminal-to-host traffic as well as host-to-host traffic. The WM supports dynamic terminal mapping. That is, the WM will allow any terminal on the BANCS network to log on and enter a transaction. Security is provided by passwords forwarded to the COSMOS hosts. An alternative implementation, static terminal mapping, would require too much memory and table lookup time. A partial static terminal map has been implemented for a limited number of terminals (<50) to allow frequent terminal users freedom from passwords. This also allows "super-user" commands to be restricted to particular physical terminals if desired, rather than relying on passwords for the security of such critical commands (a sketch follows).
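The partial static terminal map can be pictured with a short fragment. It is illustrative only, not WM source; the terminal names and structure are invented for this example.

    /* Illustrative sketch only, not WM source: a partial static terminal
     * map. A small table (< 50 entries) of trusted physical terminals
     * bypasses passwords and may carry super-user rights; every other
     * terminal is mapped dynamically and must present a password, which
     * is forwarded to the COSMOS host.                                 */
    #include <string.h>

    struct static_map { char lterm[8]; int superuser; };

    static struct static_map trusted[] = {
        { "T001", 1 },                  /* super-user allowed here */
        { "T017", 0 },
    };

    /* Returns 1 if the terminal may skip the password check. */
    int statically_mapped(const char *lterm, int *superuser)
    {
        size_t i;
        for (i = 0; i < sizeof trusted / sizeof trusted[0]; i++)
            if (strcmp(trusted[i].lterm, lterm) == 0) {
                *superuser = trusted[i].superuser;
                return 1;
            }
        return 0;                       /* fall back to dynamic mapping */
    }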

4.4.2 COSMOS Handler. The COSMOS Handler is an example of a network specific handler. In designing the WM-COSMOS interface protocol, a number of unique functions were determined to be desirable because of COSMOS capabilities and operational considerations.

1. Routing Identification Passed as Part of the LOG Sequence. Each COSMOS machine contains one or more (up to as many as 150) logical database entities known as Wire Centers. As part of the LOG sequence at link initialization time, the COSMOS machine identifies those wire centers currently located on that machine. The COSMOS handler then constructs the routing table for Work Manager Control based on this data. This allows a couple of nice side capabilities: Wire Centers can be moved from one COSMOS to another for load balance reasons without manually changing any tables on the WM, and entire COSMOS databases can be moved without manually changing any WM tables. This is useful in the event of fatal hardware problems on a COSMOS machine requiring the database to be moved to a hot spare. It is also useful in the event of hardware port problems on the WM; a COSMOS machine can be plugged into a different port and the WM is notified of the change in routing through the LOG sequence. This has an interesting negative consequence as well. If, for some reason, a duplicate database is brought up on another COSMOS (for example, for training reasons), it can "steal" the wire center routing from the "correct" COSMOS. System Administrators have to take care that this does not happen. However, if it does, messages can be recovered from the WM store and forward database even after they have been acknowledged as "successfully processed" by the host.

2. Automatic Message Recovery and Load Control. In order to implement load control, a careful distinction has been drawn between link level message acknowledgments and application level message acknowledgments. Link level acknowledgments refer strictly to message transmission. The WM-COSMOS interface operates with a link level window size of one. The application level window size is typically six. By convention, an application level acknowledgment is not sent from COSMOS to WM until the COSMOS application has successfully processed the message and all output messages have been successfully transmitted to WM. This provides an effective means of controlling the number of active WM messages being processed by a COSMOS system. Under conditions of heavy load in COSMOS from other sources, this window size can be reduced. Likewise, although this seldom happens, it is possible to increase the window size to accommodate heavy WM traffic under conditions of light load from other sources. By using this strict definition, message recovery between WM and COSMOS takes on a slightly new meaning. As part of the LOG sequence, a message auditing function takes place. WM and COSMOS exchange knowledge of any active messages which have not been acknowledged at the application level. If COSMOS does not indicate knowledge of a message which WM shows as transmitted but not acknowledged, the message is simply retransmitted. This allows COSMOS to "lose" a message during an outage without any negative impact on database accuracy or user functions. This partially compensates for the lack of sophisticated recovery in this UNIX-based system. (A sketch of this discipline follows this list.)

3. A third unique capability that has been created is an application level resend. COSMOS can ask the WM to place a message back on its input queue. When this request is received, WM places the message at the end of the appropriate queue. This implements a "wait for a while and send it again" capability which is currently useful for resolving timing issues between two messages (such as "you can't modify an order until I am done processing it; requeue the modification request") or could potentially be used to resolve deadlock conditions.

4. Finally, COSMOS has the ability to tell WM to place a message on a hold queue and notify system administration of a serious error condition. This is useful since the WM has more sophisticated console monitoring and alarm sounding capabilities, and a WM alarm is more likely to get immediate attention than a COSMOS alarm.
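The two-level acknowledgment discipline in item 2 can be sketched as follows. The fragment is illustrative only, not WM source; the window size of six is the typical value quoted above, and the function names are invented.

    /* Illustrative sketch only, not WM source: application-level
     * windowing on the WM-COSMOS interface. The link window is one;
     * this application window (typically six) bounds how many messages
     * COSMOS may hold unacknowledged, which is how WM throttles load. */
    #define APP_WINDOW 6   /* tunable: lowered when COSMOS is heavily loaded */

    static long sent[APP_WINDOW];   /* transmitted, awaiting app-level ack */
    static int  outstanding;

    int can_send(void) { return outstanding < APP_WINDOW; }

    void on_transmit(long id) { sent[outstanding++] = id; }  /* caller checks can_send() */

    /* COSMOS has fully processed id and returned all its output. */
    void on_app_ack(long id)
    {
        int i;
        for (i = 0; i < outstanding; i++)
            if (sent[i] == id) {
                sent[i] = sent[--outstanding];
                return;
            }
    }

    /* LOG-time audit: any id WM shows as transmitted but COSMOS does
     * not know about was lost in an outage; simply retransmit it.    */
    void audit(const long *cosmos_ids, int n, void (*retransmit)(long))
    {
        int i, j, known;
        for (i = 0; i < outstanding; i++) {
            known = 0;
            for (j = 0; j < n; j++)
                if (cosmos_ids[j] == sent[i]) { known = 1; break; }
            if (!known)
                retransmit(sent[i]);
        }
    }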

4.4.3 Terminal Handler. The Terminal Handler (TH) exists to provide access between Bisync terminals, which can be either directly connected to WM or connected to BANCS, and the COSMOS systems. Since COSMOS is a UNIX-based system, its normal mode of terminal access is via asynchronous terminals. The WM TH provides a synchronous terminal access capability that is virtually transparent to COSMOS, since the WM manages the terminal "sessions" required for synchronous terminals. In addition, a Mask Service (MS) has been developed which provides a flexible way to define screen masks and develop software to process the masks. The philosophy behind the TH is to minimize the data translation done in the WM by limiting the fixed portion of the mask to a header containing the data which WM needs, such as Wire Center and password. The remaining portion of the screen is free format, allowing the user to input the COSMOS transaction in COSMOS format, thus eliminating any knowledge of message content in WM. COSMOS is free to add, delete, or change message content without WM impact.

1. Mask Service (MS) Functions
Provides Device Interfaces to TTY 4014, TTY 4540, or IBM 3270 Terminals
Processes Screen Masks and Screen Data
Provides IBM 3270/TTY 4540 Printer Interface

2. Terminal Handler Functions
Processes Terminal Input and Output, including management of multiscreen output messages.
Provides Screen Edit Capability. This allows the user to create a "file" to store a (typically long) input transaction. The file can then be executed (sent to COSMOS) without destroying the input file. In response to the output message, the user can then reedit the input message to, for example, correct an input error. This is also useful for creating templates for complex user transactions.
Security. TH forwards translated password data to COSMOS; WM does not take security responsibility for COSMOS. WM will, however, maintain a password table for forwarding to COSMOS (see the static terminal map discussion above).

4.5 Work Manager Control

WM Control implements the Switch Control functionality. Its implementation is faithful to the architecture presented in Section 3. It performs the following functions:

1. Provides overall control of store and forward message switching processing. All writes to safe-store media are done here. In addition, WM Control manages recovery when necessary.

2. Manages WM queues. Incoming messages are placed on the appropriate queue according to relative priority and destination. While specific data about a destination is generally hidden from WM Control, it does know about the number of queues that serve a destination. When a message is identified as being next for transmission to a destination, WM Control dequeues the message and notifies the appropriate handler to initiate transmission.

3. Provides application level acknowledgments. Since WM Control safe-stores the message, application level acknowledgments must originate here. A result of this is that WM Control provides the load management capability described in the discussion of the COSMOS Handler.

4. Provides Message Routing Services (a sketch follows this list):
Locally unique two-character aliases for user-friendly access. With this capability, terminal users are not required to specify the long network-unique form of the destination but can specify the destination using a two-character alias that need be unique only to the local WM.
Ambiguous Destination Resolution using a Secondary Field. This feature allows a host to send a message to WM with a nonunique destination; WM will use a secondary field in the Message Header to resolve the ambiguity.
Ambiguous Destination Resolution using Message Source. This feature allows a host to send a message to WM using a destination which is not unique. WM will resolve this ambiguity by using the source of the message to determine which instance of the ambiguous destination is the actual intended destination. This capability is useful where a mainframe system expands to cover two or more physical systems without changing the interface specification (both systems call themselves by the same logical name), and no other field exists in the message to resolve the ambiguity.

5. Cleans up the Safe Store Disk Region When Full. This "garbage collection" function reallocates disk storage by "purging" old messages, thus making the space available for safe storage of new messages.

6. Handles Initialization and Recovery. When the WM is booted, system initialization and recovery options are managed by WM Control.
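The alias expansion and source-based disambiguation in item 4 reduce to table lookups. The following sketch is illustrative only, not WM source; all table contents and names are invented for this example.

    /* Illustrative sketch only, not WM source: destination resolution
     * in WM Control. A two-character alias expands to a network-unique
     * name; if the expanded name is ambiguous, the message source
     * selects the intended instance.                                  */
    #include <stdio.h>
    #include <string.h>

    struct alias { char alias[3]; char dest[16]; };
    struct split { char dest[16]; char src[16]; char instance[16]; };

    static struct alias aliases[] = { { "c1", "COSMOS1" } };

    /* Two hosts sharing one logical name, told apart by source. */
    static struct split splits[] = {
        { "SOP", "SOAC-EAST", "SOP-EAST" },
        { "SOP", "SOAC-WEST", "SOP-WEST" },
    };

    const char *resolve(const char *dest, const char *src)
    {
        static char name[16];
        size_t i;

        strcpy(name, dest);
        for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
            if (strcmp(aliases[i].alias, dest) == 0)
                strcpy(name, aliases[i].dest);   /* expand the alias */
        for (i = 0; i < sizeof splits / sizeof splits[0]; i++)
            if (strcmp(splits[i].dest, name) == 0 &&
                strcmp(splits[i].src, src) == 0)
                return splits[i].instance;       /* disambiguate by source */
        return name;
    }

    int main(void)
    {
        printf("%s\n", resolve("SOP", "SOAC-WEST"));  /* SOP-WEST */
        return 0;
    }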

5. CONCLUSIONS

The architecture and system described in this paper constitute an actual working system. The WM may or may not be unique. It is certainly unusual in its function as a production gateway to a minicomputer-based distributed database system. We have acquired a great deal of experience in developing, maintaining, enhancing, and supporting WM in the field. In this section, some of the conclusions drawn and lessons learned from this experience will be described.

The first conclusion that can be drawn from this effort is that developing a minicomputer-based distributed database system for reliable production use is feasible. This conclusion is not as trivial as it sounds. The academic and research communities have made great progress over the last decade in developing the underlying technology for distributed database systems. However, implementations involving incompatible components, especially combining minicomputers and mainframes, are rare in the field, where totally mainframe implementations dominate. Many of the advantages we had hoped to attain in a network of minicomputers are being realized. These include:

Availability. Availability was a major concern of the user community during initial design. Availability in this type of network is difficult to determine since we are trading off compound availability of a large number of systems (relative to a single mainframe) with the fact that a single outage impacts only a portion of the user community. Our availability experience has been very good. Total system availability of the network of minicomputers compares very favorably to the mainframe systems. We use a hot spare backup strategy to minimize the mean time to repair from a user perspective. Major outages are extremely rare and are usually due to environmental factors such as power.

Flexibility. Each of the systems in the network is independent. New software or ideas can be tried on one system and, if successful, propagated to the rest. Examples of this include new software release installation, performance tuning, and activating dormant features. In addition, logical databases can be moved from one system to another to accommodate unusual activity, in particular abnormally high demand.

Incremental Expansion. Because the involved systems are minicomputers, adding a new system to provide incremental performance capacity is both cheap and easy.

A second lesson, one we have learned the hard way, is that network management in this environment is both difficult and critically important. Providing powerful facilities for monitoring alarms, tracking message queue and host status, and troubleshooting problem conditions was an afterthought in the original implementation. These capabilities have proven to be critical to the ability of the user community to operate the network effectively. Problems in such a complex network can be difficult to isolate in the best of circumstances and virtually impossible without the correct tools. Much of our effort since initial deployment has gone into increasing the power and utility of our diagnostic and administration tools.

The selection of UNIX as our operating system has been significant. While we have, with regret, found it necessary to enhance Standard UNIX to meet our performance objectives, we have made significant use of the power and flexibility of UNIX to provide a system that is easy to administer and support. WM has provided an interesting opportunity to compare two implementations of the same product. The initial pilot implementation of WM used BOS-11F, a "real-time" operating system. While BOS appeared, on the surface, to be more appropriate for a message switch than UNIX, development and support considerations have led both developers and users to prefer the UNIX implementation. Admittedly, it was more difficult to meet our performance objectives under UNIX, but the ability to work in a powerful, friendly software environment has translated directly into reduced development, support, and administration costs.

Another difficult lesson is that providing peer-to-peer communications between minicomputers and mainframes is not simply a matter of implementing a suitable protocol on the minicomputer. Mainframe transaction processing systems offer sophisticated recovery capabilities to applications. A mainframe system assumes that its partner in a peer-to-peer communication has similar backing power. The implications of this assumption are subtle but most frequently manifest themselves in problems associated with finding and recovering a "lost" message. Failure to provide mainframe-equivalent recovery power results in an "it left here OK" response to problems. If one wishes to participate in a peer-to-peer relationship, one must act like a peer. The decomposition of these capabilities into a separate system has been an effective architectural approach to providing the needed capabilities.

Finally, the emphasis we have placed on proper protocol layering has been well rewarded. We are now faced with replacing the transport services for most of our protocols. Because of our emphasis on layering, replacing transport services, for example, Bisync with BX.25 for host-to-host communications, is proving much easier than a less structured approach would have allowed.

6. ACKNOWLEDGMENTS

A large number of people have contributed to the concept, design, and implementation of the Work Manager. At the risk of leaving someone out, I would like to acknowledge the contributions of Jim Holtman, Charlie Rifenberg, and Jeff Edelman, now with AT&T Bell Laboratories, and Bell Communications Research staff members Roberta Bernstein, Joe Cloutier, Eleanor Dash, Ron Fleming, Joe Kaminski, Kurt Kovach, Sheila Manischewitz, George Mathew, Bob Martin, Earl Souders, Michelle Tripp, Jay Walker, and Ron Underwood.

REFERENCES

1. D. M. Ritchie and K. Thompson, The UNIX Time-Sharing System, Bell System Technical J. 57, 1905-1929 (Aug. 1978).


2. P. S. Boggs and C. E. Stenard, Integrating Loop Operations Systems: Two Giants Working Together, Bell Laboratories Record 56, 186-190 (1978).
3. J. M. Goett, G. A. Schweickert, Jr., and A. R. Ephrath, Human Performance Engineering of a Telephone Company Software System, Proceedings of the International Conference on Cybernetics and Society, 386-389 (1981).
4. S. Ka-Lai Leung, The Concept and Implementation of BANCS, Computer 13, 80-92 (1980).

5. D. F. Lee and A. J. Pasqua, BANCS: A New Outlook on Telco Internal Data Communications, Telephone Engineering and Management 84, 84-86, 88, 91 (1980).
6. DDCMP Specification Version 4.0, Digital Equipment Corporation, 1978.
7. Operations Systems Network Communications Protocol Specification BX.25, AT&T Technical Publication 54001, Issue 3 (1982).