Distributed processing for the solution of engineering problems*

George R. Cameron
Floating Point Systems, Inc.

* Paper taken from the proceedings of the Second International Conference and Exhibition on Engineering Software, April 24th to 26th, 1981.
INTRODUCTION

The solution of scientific and engineering problems using multiple-processor computer systems has been an evolutionary process beginning in the late 1960's. Processors are being linked together to form computer systems that utilize different levels of processing capability based on the requirements of the application program. The effect has been a reduction in the cost of research and design projects, and the trend is towards a cost-effective, real-time design environment.

Large scientific and engineering problems may generally be segmented into a preprocessing phase, a processing phase, and a postprocessing phase. The preprocessing phase includes data generation and modification, and graphic display capabilities. The postprocessing phase includes the manipulation and graphic display of result files. These phases tend to require significant interaction between the scientist or engineer and the computer system for data manipulation. They are not computationally intensive, and are best processed in a friendly, interactive environment. The processing phase, on the other hand, is generally computationally intensive, with little or no interactive requirement. This phase is best processed in a fast batch environment. Since the processing requirements differ between these stages of problem solving, it is reasonable to use different processors within the computer system that together offer a minimum computational cost and effectively utilize existing manpower resources.

AVAILABLE PROCESSORS FOR A DISTRIBUTED PROCESSING SYSTEM

The computer hardware and operating system software available for use in a distributed processing computer system ranges from microprocessors at one end of the scale of processing capabilities through minicomputers, super-minicomputers, mainframe computers, and finally supercomputers at the other end of the scale. Within this range lie the new attached processors, developed for processing computationally intensive engineering problems. The method of linking these processors together is a critical area that is undergoing extensive examination at this time.
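As a purely illustrative recap of the phase segmentation described in the introduction, the matching of each phase to its preferred processing environment might be written as the short Python sketch below; the data structure and function are hypothetical, not part of any particular system, though the phase characteristics follow the text.

```python
# Illustrative sketch only: the three problem-solving phases described above,
# tagged with the processing character the text attributes to each and the
# environment it is best processed in.

PHASES = {
    "preprocessing":  {"computationally_intensive": False, "interactive": True,
                       "environment": "friendly, interactive"},
    "processing":     {"computationally_intensive": True,  "interactive": False,
                       "environment": "fast batch"},
    "postprocessing": {"computationally_intensive": False, "interactive": True,
                       "environment": "friendly, interactive"},
}

def best_environment(phase: str) -> str:
    """Return the processing environment suggested for a given phase."""
    return PHASES[phase]["environment"]

if __name__ == "__main__":
    for name in PHASES:
        print(f"{name:>14}: {best_environment(name)} environment")
```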
Mainframe computers. The solution of large scientific and engineering problems has traditionally utilized mainframe computers such as the CDC CYBER computers and the larger IBM and Univac computers. These computers have relatively fast instruction cycle times, usually in the 100 nanosecond range, and operate in a sequential manner. Their operating systems are fully developed, so that they may handle a mixture of interactive and batch processing. The performance of these computers, processing a mixture of engineering work, is in the one-to-five megaflop (million floating point operations per second) range. The capital cost of these mainframes is in the area of one to five million dollars which, when combined with the maintenance costs, usually means that they are used for diverse activities such as accounting and forecasting, as well as engineering and research. With each function competing for the limited computer resources, the large scientific and engineering jobs are usually relegated to evening and weekend processing, or a premium is paid for prime-time access to the processor.
Supercomputers. The introduction of the supercomputers in the early to mid-1970's addressed the need for lower cost per computation and higher job throughput required for large scientific and engineering problems. This class of computers includes the CDC STAR 100, the CYBER 200 series, the CRAY 1, the Illiac 4, the Burroughs BSP, and the TI ASC computers. Operating system limitations of these computers generally require that they be attached to a front-end computer that handles system functions such as terminal communications and interaction with other low-speed peripherals. They have high-speed local storage and are batch oriented. Processing a mixture of engineering work, they operate in the ten-to-fifty megaflop range. This high computational throughput comes from fast instruction cycle times (in the 12.5 to 50 nanosecond range), from a parallel architecture (memories, register banks, data paths, and arithmetic elements), and from pipelined elements (memories, register banks, and arithmetic elements). The capital cost of these computers, around five to ten plus million dollars, combined with the maintenance costs, means that they are usually used by a variety of users.

Minicomputers. Minicomputers were introduced in the late 1960's to early 1970's. They are now widely used throughout scientific and engineering problem solving. Minicomputers such as the Digital PDP-11 series, the Prime computers, the Data General computers, etc., are regularly used in signal processing applications, small simulation problems, and in conjunction with large mainframes for the non-computationally intensive portions of large scientific and engineering problems. These computers are relatively slow, due to instruction cycle times in the microsecond range and their limited serial instruction sets. They have a relatively low capital cost, ten to one hundred thousand dollars. In large problem solving, they may be used for dedicated interactive tasks such as input file generation and manipulation, and as an aid in the display of result files.

Super-minicomputers. The super-minicomputers were introduced in the late 1970's. This class of computers includes the Digital VAX-11 series, the Prime 750, the Perkin Elmer 3200 series, the Data General MV/8000, etc. These computers have serial instruction sets with instruction cycle times ranging from a few hundred nanoseconds to a microsecond. The performance of these computers is in the 0.1 megaflop range for large problems requiring double precision. These processors may be used for computationally intensive portions of small to medium size problems (those that generate up to about five thousand sparsely-populated simultaneous equations), as well as handling input and result file manipulation of large scientific and engineering problems (those that generate up to one hundred thousand sparsely-populated simultaneous equations). The cost of a super-minicomputer system is in the two-to-three-hundred-thousand dollar range. Based on the capital cost of the super-minicomputer and its sustained computational rate, the cost per computation is less than that of mainframe computers; however, the throughput on large scientific and engineering problems is usually considerably slower.

Microcomputers. Microcomputers will be used for interfacing various processors and system peripherals. Microcomputer manufacturers are involved in the definition of protocols for information transfer between processors. Microcomputers are used for signal processing applications, and for portions of large scientific and engineering programs that are not computationally intensive. Processors such as the Intel 8086 and the Motorola 68000 have processing capability for small scientific and engineering problems. The design of problem solution systems based on configurations of microprocessors is being studied; however, the operating system complexity of controlling large numbers of processors has yet to be solved.

Special purpose attached processors. Attached processors are special-purpose arithmetic processing units, with local memory, designed specifically to process scientific and engineering problems. They are used for the computationally intensive portions of problems ranging from signal processing applications to large mathematical modelling applications. Their performance is similar to that of a mainframe computer, in the one-to-three megaflop range. The hardware architecture is similar to that of the supercomputers, with parallel memories, register banks, data paths, and arithmetic elements. They use pipelined arithmetic elements and memories, and have relatively fast instruction cycle times in the one-hundred to two-hundred nanosecond range. The cost of these special-purpose attached processors is similar to that of the super-minicomputers. They attach to a variety of super-minicomputers and mainframe computers and are used for off-loading the host computer and processing the computationally intensive portions of application programs.
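Taking rough midpoints of the capital-cost and throughput ranges quoted above, capital cost per sustained megaflop can be compared across the processor classes. The Python sketch below is a back-of-envelope illustration only; the midpoint values are assumptions drawn from the ranges in the text, not measured figures.

```python
# Back-of-envelope comparison of capital cost per sustained megaflop, using
# rough midpoints of the ranges quoted in the text. Midpoints are assumed.

processors = {
    #                      capital cost ($)   sustained Mflop/s
    "mainframe":          (3_000_000,         3.0),
    "supercomputer":      (7_500_000,         30.0),
    "super-minicomputer": (250_000,           0.1),
    "attached processor": (250_000,           2.0),
}

for name, (cost, mflops) in processors.items():
    print(f"{name:>18}: ${cost / mflops:>12,.0f} per sustained Mflop/s")
```

Under these assumptions the supercomputer and the attached processor offer the lowest capital cost per computation, which is the point the following sections develop.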
INTERCONNECTION OF PROCESSORS

The variety of computer hardware and operating system software presents a large number of possibilities for developing distributed processing systems. Computer manufacturers that offer various sizes and configurations of computers generally provide hardware and operating system software so that they may be configured into a multiple processor system. These computers have a certain element of compatibility, so that the input/output bus adapters to the interconnect may not be required to handle complex compatibility problems. Manufacturers of high-speed interconnect buses for computer systems with processors from various computer manufacturers must address the compatibility problems between the various host computers.

The Hyperchannel from Network Systems Corporation, and the new version of Ethernet being developed by Xerox, Intel, and Digital Equipment Corporation, are examples of local networking of processors and peripherals. An Institute of Electrical and Electronic Engineers (IEEE) subcommittee is developing a standard protocol for local area networking. The physical link between multiple processors and peripherals is usually coaxial cable, and the transmission is bit serial at a rate of ten to fifty Mbps (megabits per second). The network adapters contain buffer memory for asynchronous transmission and the necessary firmware to control access to the multiplexed bus, as well as to carry out the format conversion between the host computer and the network. The information transfer rate, however, is a function not only of the network transfer rate, but also of the file management system within the operating systems of the processors involved in the information transfer.
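The last point can be illustrated with a rough model in which the time to move a file between processors is the serial transmission time plus a per-block file management overhead on each host. All figures in the sketch below are assumed values chosen for illustration, not measurements of any particular network or operating system.

```python
# Rough model of file transfer time over a 10-50 Mbit/s serial link:
# wire transmission time plus a per-block file-system overhead on each host.
# Block size and per-block overhead are assumed values for illustration only.

def transfer_seconds(file_bytes: int,
                     link_mbps: float = 10.0,       # network bit rate
                     block_bytes: int = 4096,       # file system block size (assumed)
                     per_block_overhead_s: float = 0.005):  # host file management cost per block (assumed)
    wire_time = file_bytes * 8 / (link_mbps * 1e6)
    blocks = -(-file_bytes // block_bytes)           # ceiling division
    host_time = 2 * blocks * per_block_overhead_s    # sending host plus receiving host
    return wire_time + host_time

# For a 1-Mbyte model file under these assumptions, the host file management
# time (about 2.5 s) exceeds the wire time (0.8 s).
print(f"{transfer_seconds(1_000_000):.2f} s")
```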
APPLICATION PROGRAMS IN A DISTRIBUTED PROCESSING ENVIRONMENT

Application programs must be modified to execute in a distributed processing computer environment. The pre- and post-processing portions of large scientific and engineering programs may run on one processor system, while the processing portion of the program runs on a second processor system. This program modification will require close co-operation between the application program developer and computer system programmers familiar with the various processors comprising the system, as well as with the peculiarities of the link between the processors.

Scientific and engineering programs usually require an iterative procedure between the mathematical model representing the physical entity under investigation and the results of the program analysis. It involves the modification of the data base in the preprocessing phase, based on the results from the postprocessing phase. The cost of this iterative process is closely related to manpower costs, in that as much as eighty percent of the project cost may be labor cost, with the remaining twenty percent being computer processing costs. It is crucial, then, that efforts to reduce project costs translate into effective use of manpower on the project. Interactive pre- and post-processing is essential in reducing idle manpower.

The preprocessing phase of a program involves the generation, manipulation, and display of input data. The result of this phase is a formatted, permanent file which will be used by the processing phase as the model definition. It involves interaction between the program user and the computer system so that refinements to the model, and hence the data base, may be made. Three possible modes of preprocessing may be used: a batch operation on the processing computer, an interactive operation on the processing computer, or an interactive operation on a separate, dedicated computer linked to the processing computer. It can be demonstrated that the time required by the program user to generate a model description using interactive techniques is about one-half the time required to do the same job using batch techniques. By using a dedicated system, this time is further reduced by a factor of one-half. This reduction in the time required for the preprocessing phase is directly related to a reduction in the project cost.

A separate, dedicated pre- and post-processing system may consist of a microcomputer, a minicomputer, or a super-minicomputer system. The data base that is to be manipulated would be stored either on a storage device local to the pre- and post-processing computer, downloaded from the processing computer's local storage device, or loaded from a system storage device directly accessible to both processors. These files are permanent, formatted files that are relatively small compared to the scratch files generated in the processing phase of the program. The information transfer between the computer systems, or between the storage device and the pre- and post-processing computer, is relatively small, even though it is formatted, when compared to the scratch data transfer between the processing computer and its storage device.

The processing phase of large scientific and engineering application programs typically involves the solution of simultaneous equations that were developed as an approximation to events described by ordinary or partial differential equations. For example, the computer time involved in the processing phase of a finite element, linear-static analysis may be separated into about five percent element set-up, twenty to thirty percent element generation, sixty to seventy percent in the solution of the resulting simultaneous equations, and five percent in the element stress recovery. These values vary depending on the elements used and the computer architecture. The computer cost for the element generation and the solution of the simultaneous equations is, therefore, directly related to the cost of computing. The lower the cost per computation, the lower the project cost, provided all other costs remain unchanged.
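Using the time breakdown above, a simple Amdahl-style estimate shows how much of the processing-phase time can be recovered by moving only the equation solution to a faster processor. The five-fold solver speedup used in the sketch below is an assumed figure for illustration, not a measured ratio.

```python
# Amdahl-style estimate of processing-phase speedup when only the solution of
# the simultaneous equations is moved to a faster processor. The fractions are
# mid-range figures quoted above; the 5x solver speedup is an assumption.

fractions = {
    "element set-up":     0.05,
    "element generation": 0.25,
    "equation solution":  0.65,
    "stress recovery":    0.05,
}
solver_speedup = 5.0  # assumed ratio between host and faster processor

new_time = sum(f / (solver_speedup if task == "equation solution" else 1.0)
               for task, f in fractions.items())
print(f"overall speedup: {1.0 / new_time:.2f}x")   # about 2.1x under these assumptions
```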
The processing phase of the application program may be modified to run on a mainframe computer, a supercomputer, or an attached processor. The processing program may reside on a disk local to the processing computer, or it may be staged from the separate system storage device to the processing computer. The data files are staged from the preprocessing computer or from the system storage device to the processing computer. The relatively large number of scratch files typically generated in the processing phase are stored on the disk system local to the processing computer. These are generally unformatted files that are left unprotected on the local disk system at the conclusion of the job. If restart files are generated, however, they must be staged back to the permanent file storage area.

To remove the burden of using a distributed processing system from the program user, some form of job control command stream must be set up which regulates program and data transfers, as well as providing execution control.
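A minimal sketch of such a job control stream is given below: it stages the model file to the processing computer, runs the batch analysis there, and stages the result file back for interactive post-processing. The host names, file names, solver command, and use of remote copy and remote shell utilities are hypothetical choices for illustration; a real installation would substitute its own staging and remote-execution commands.

```python
# Hypothetical job control stream for a two-processor system. All names are
# invented for illustration.

import subprocess

PRE_POST_HOST = "prepost"    # interactive pre/post-processing computer (assumed)
PROCESS_HOST  = "cruncher"   # batch processing computer (assumed)

def run(cmd):
    """Run one command of the job stream, stopping the stream on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def job_stream():
    run(["rcp", "model.dat", f"{PROCESS_HOST}:/scratch/model.dat"])          # stage input data
    run(["rsh", PROCESS_HOST, "solver", "/scratch/model.dat",
         "/scratch/results.dat"])                                            # batch execution
    run(["rcp", f"{PROCESS_HOST}:/scratch/results.dat", "results.dat"])      # stage results back

if __name__ == "__main__":
    job_stream()
```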
In addition, the pre- and post-processing and processing phases of the application program must be compiled and linked on the system on which they will execute. Cross-compilers and linkers may be used if resident ones are not available. Assemblers may be used if portions of the program are written in assembly language to utilize the particular computer architecture more fully; however, this inhibits maintenance of the program and its transportability from processor to processor.

SUMMARY

For large scientific and engineering application programs that enjoy a relatively large distribution or usage, it is reasonable to expect that a distributed processing system may be developed that offers minimum computational cost, as well as providing maximum utilization of manpower. Interactive pre- and post-processing would utilize one computer system incorporating hardware and operating system features matched to that type of processing, while the batch-oriented processing phase would utilize a second computer system with features matched to that type of processing. The processors would be linked by a high-speed data channel. Each processor would have access to a local disk storage system so that files could be staged between them. Job control software would be used to control the processing environment.

Although distributed processing systems are used in the large national laboratories and research centers, the concept is applicable to most computer centers engaged in large scientific and engineering projects. As manpower costs continue to climb and the cost of computation continues to fall, multiple processor computer systems, which address the specific needs of the various phases of application programs, become a very attractive alternative to the shared, centralized computer system.