Computing Systems in Engineering Vol. 3, Nos 1-4, pp. 401-411, 1992
INTERACTIVE NUMERICAL SIMULATION AND GRID ADAPTION FOR HYPERSONIC FLOW

E. LAURIEN, H. HOLTHOFF and J. WIESBAUM
Technical University of Braunschweig, Institute for Fluid Mechanics, Bienroder Weg 3, D-3300 Braunschweig, Germany

Abstract--The numerical simulation of hypersonic flows requires the use of supercomputers and graphic workstations in an efficient manner. The present contribution reports on an implementation of a finite element Navier-Stokes method with grid adaption, by which user interaction is used to optimize the efficiency of high performance computing in a supercomputer/workstation network. Scaling laws are derived to predict the feasibility of this technique on other computers and networks with different CPU and data transfer performance. The High Performance Computer Network of Lower Saxony is used as an example for the application of these scaling laws. We find that the storage and CPU performance of this example, as well as of many other supercomputer sites, is sufficient for fine adaptive grids to be used, but data transfer performance must be increased. High speed data links between supercomputers and workstations can increase the efficiency of high performance computing at relatively low cost.

NOMENCLATURE

e          computational effort in megaflops
n_Δt       number of operations necessary to perform a time step
n_cycle    number of iterations, multigrid cycles or time steps
N          number of grid points
P^CPU      CPU performance of a computer/code combination in megaflops per second
P^trans    data transfer performance of a network/transfer-protocol in megabytes per second
s          computer storage necessary to run a computer code in megabytes
T^CPU      computer time in seconds
T^trans    time needed to transfer simulation results in seconds
α          exponent of scaling law
1. INTRODUCTION

Computational fluid dynamics (CFD) is among those disciplines which require powerful and expensive computer resources in order to achieve numerical accuracies which are useful for design engineers. It is therefore important to maximize the efficiency of large application runs with fine numerical grids on supercomputers. Usually, before those runs are made, the numerical method is tested using coarse grids. In the past few years renewed interest in hypersonic flows has led to the development of new large computer codes. The present group has developed a Navier-Stokes code for chemically reacting re-entry flows [1,2] using the Taylor-Galerkin finite element method [3,4] on the basis of a concept of distributed computing. From the beginning this code has been designed to be applied on supercomputers in a data network, such as shown schematically in Fig. 1. The technique of adaptive grid refinement with unstructured numerical grids has been implemented. This technique, although applied successfully in structural mechanics, has been optimized for aerodynamic methods only recently. We follow the concept of distributed computing in the sense that production of data and postprocessing are performed on different machines, in particular supercomputers and workstations [5,6].

Although very simple, in the past this concept has rarely been applied efficiently in practical applications of CFD codes. The reasons have often been the incompatibility of operating systems, the lack of high performance data links, insufficient main storage capacity of workstations, or the absence of efficient visualization software. This has led to the unpleasant situation that the visualization of results often takes far more time than their production on a supercomputer. The situation has changed during the past few years with the availability of powerful workstation computers and the integration of supercomputers into high performance data networks. Many operating systems are now UNIX based, e.g. UNICOS (CRAY), CONVEX-OS, SINIX (Siemens) and ULTRIX (VAX), and therefore compatible with today's workstation standard. The TCP/IP transfer protocol and the ethernet IEEE 802.3 have begun to bridge the communication gap with a net transfer rate of approximately 0.1 Mbyte/s. High performance networks such as FDDI and ULTRANET are available [7] and will further increase the accessibility of results produced on supercomputers. Some professional interactive visualization tools are also available [8,9].

Although these appreciable changes in compatibility, accessibility and visibility of data open up entirely new possibilities of distributed computing and visualization, care must be taken. It is not clear what amount of data, produced for example by a large simulation code on a supercomputer, has to be transferred, what data reduction can do, how efficient interactive use of supercomputers is, and what the performance of a data network must be to increase the efficiency of a system consisting of supercomputer, CFD code, data link, transfer protocol, workstation and visualization software. A maximum of storage and performance is not always the optimum. Consider two examples: very large storage capacity, e.g. 2 Gbyte, is worth nothing when the CPU performance is poor. On the other hand, a very high performance supercomputer cannot be efficiently utilized when postprocessing is inefficient. A theory should take all aspects of hardware and software into account and be able to predict future needs of CFD in terms of parameters. Some parameters are well known, e.g. how fine a grid should be to simulate a certain flow. From this the storage can be estimated, then the CPU time on a certain computer, and so on. Also the reverse estimation should be possible: what kind of flow can be simulated on a certain computer/network?

In this paper the implementation of a technique for interactive numerical simulation in a supercomputer network is described. Then we derive the basis for a theory to predict the efficiency with which existing and future supercomputer networks can be utilized by using our technique. The theory is expressed in the form of scaling laws, which can be interpreted in a logarithmic diagram. These laws are applied to typical examples of hypersonic flow simulations. From these examples the hardware requirements for an efficient application of interactive numerical simulation and grid adaption are derived. Then the example of an existing computer network (the High Performance Computer Network of Lower Saxony) is examined under these aspects. Our scaling laws give quantitative measures as conceptual guidelines for CFD code development or supercomputer installations.

Fig. 1. Principle of distributed numerical simulation and postprocessing: (a) supercomputer; (b) mini-supercomputer for code development using coarse grids; (c) graphic workstation for interactive postprocessing of result data.
2. GRID ADAPTION TO HYPERSONIC FLOWS
2.1. Local phenomena in hypersonic flows
Hypersonic flows are among the most complicated flows in the aerodynamics of flight vehicles. This is because high temperature regions, strong shock waves, boundary layers and free shear layers occur and interact with each other in a three-dimensional manner. Some examples of hypersonic flows of present technical interest are: flows around capsules or vehicles under the conditions of re-entry into the earth's atmosphere; the aerodynamics of hypersonic airplanes; and hypersonic airbreathing propulsion systems. In this context the European HERMES project, the German SÄNGER concept [10] and the German-Japanese cooperation EXPRESS [11] are mentioned. An increasing amount of research and development work on these and other projects is done numerically.

A characteristic of hypersonic flows is that a greater number of physical processes are involved compared with conventional aerodynamics, such as convection and diffusion of mass, momentum and energy, and chemical and thermodynamical relaxation [12]. Each of these processes is associated with some characteristic time scale or length scale on which it takes place. The great number of scales involved in hypersonic flows leads to phenomena which physically take place in only a small or narrow region within the flow field. In order to resolve such local phenomena numerically, the computational grid must be locally refined. Because in most cases the exact area of required high resolution is not known a priori, the refinement must be done by an adaption algorithm, i.e. the location of grid refinement depends on the solution on a coarser grid.

Consider a blunt body in a hypersonic re-entry flow (Fig. 2). A strong shock wave, the so-called bow shock, forms. Its shape and width depend on the inflow density, temperature and velocity and on the body shape and size. The local phenomenon of this bow shock governs the entire flow field, which can only be predicted accurately when the bow shock is well "captured" (we will not consider "shock fitting" methods here). At a closer look the shock triggers various physical mechanisms of energy exchange, each associated with a certain time scale: translational (Brownian) and rotational energy, vibration energy of molecules, and chemical energy. The translational and rotational energy exchange is very fast, so the exchange process itself cannot be resolved. Those processes are then assumed to be "in equilibrium" and the shock appears as a jump in the flow field. This jump is approximated by only a few grid points by modern high resolution schemes.
Fig. 2. Blunt body in a hypersonic flow with bow shock, interferogram taken in a gun tunnel.
Behind this jump large gradients of the flow variables may occur within vibrational and chemical relaxation areas. To capture the shock sharply and to resolve those relaxation processes, finer numerical grids are necessary than in the rest of the flow field. Grid adaption becomes necessary.

2.2. Fully automatic adaption techniques

An adaptive technique which is able to capture the local phenomena in a fully automatic manner is certainly desirable. Two concepts shall be discussed here:

• adaption based on a parameterized geometric "refinement area", denoted parametric adaption; and
• adaption based on a local indicator function or an error estimator and a threshold value for refinement, denoted local indicator adaption.

The first method needs some a priori knowledge about the geometrical shape and position of the required grid refinement. In some engineering applications this knowledge can be obtained. As an example, the shape of the bow shock in front of a sphere for a perfect gas in Cartesian coordinates x and y is approximately given by [12]

x = x_d + R cot^2(β) [ (1 + y^2 tan^2(β) / R^2)^{1/2} - 1 ],    (1)

with the shock standoff distance x_d, the maximum curvature radius of the shock R and an angle β as free parameters. These parameters can be adapted to the actual flow using an automatic optimization algorithm and a global refinement criterion. This technique works in principle, but often the automatic adaption algorithm runs into local extrema without physical meaning. This is immediately recognized after plotting the grid.
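To make the parametric adaption concept concrete, the following sketch (in Python, for illustration only; the original code is a FORTRAN finite element program) flags coarse grid points lying within a band around the analytic shock shape of Eq. (1). The band width, the point-flagging strategy and all names are illustrative assumptions, not taken from the actual implementation.

```python
# Illustrative sketch of parametric adaption: flag grid points near the
# analytic bow-shock shape of Eq. (1). Band width and sample values are
# made up for demonstration; they are not taken from the original code.
import math

def shock_x(y, x_d, R, beta):
    """Bow-shock shape of Eq. (1): x(y) for standoff x_d, radius R, angle beta."""
    t = math.tan(beta)
    return x_d + (R / t**2) * (math.sqrt(1.0 + (y * t / R)**2) - 1.0)

def flag_refinement_area(points, x_d, R, beta, band=0.05):
    """Return indices of points whose distance in x from the shock curve
    is smaller than the band width; these define the refinement area."""
    return [i for i, (x, y) in enumerate(points)
            if abs(x - shock_x(y, x_d, R, beta)) < band]

# Example: a uniform 20 x 20 point cloud and made-up shock parameters.
pts = [(0.1 * i, 0.05 * j) for i in range(20) for j in range(20)]
print(len(flag_refinement_area(pts, x_d=0.1, R=1.2, beta=0.4)))
```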
The second method is more general and has been applied by several authors [13]. A wide variety of indicator variables have been used, depending on the phenomenon at which the refinement is aimed. For hypersonic flows let us distinguish between three types of adaption [14]:

• Adaption to shocks: a shock is a discontinuity in the flow field which is numerically captured over a few grid points. Note that the indicators must detect weak and strong shocks simultaneously.
• Viscous adaption: usually velocity, density or temperature gradients due to viscosity or heat conductivity are located near walls, but free shear layers may also occur.
• Adaption to relaxation areas: here high resolution is required because large gradients of the species concentrations due to chemical relaxation occur.

An example of adaptive grid refinement to the bow shock and the boundary layer of a blunt body flow at a Mach number of 20 and a Reynolds number of 10,000 is shown in Figs 3 and 4. The refinement is based on the local Mach number of the corresponding coarse grid solution with the threshold values given in the figure captions. When the threshold value is changed the grid varies drastically, see Fig. 5.
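As an illustration of local indicator adaption, the following sketch (again Python, with hypothetical data structures; the indicator form is an assumption suggested by the description above) marks a triangular element for refinement when the local Mach number varies across its nodes by more than a threshold, as used for Figs 3-5.

```python
# Illustrative sketch of local indicator adaption on an unstructured grid:
# an element is marked when the Mach number variation across its nodes
# exceeds a threshold. Data structures and values are hypothetical.

def mark_elements(triangles, mach, threshold=0.5):
    """triangles: list of node-index triples; mach: nodal Mach numbers."""
    marked = []
    for e, (i, j, k) in enumerate(triangles):
        local = (mach[i], mach[j], mach[k])
        if max(local) - min(local) > threshold:   # indicator jump over element
            marked.append(e)
    return marked

# Two elements sharing an edge; the made-up nodal values mimic a strong shock.
tris = [(0, 1, 2), (1, 2, 3)]
mach = [20.0, 19.8, 5.0, 4.9]
print(mark_elements(tris, mach, threshold=0.5))   # -> [0, 1]
```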
An important fact that we have discovered is that no adaption algorithm developed so far works completely automatically for hypersonic flows. The local indicator functions and the threshold values are not reliable enough to capture local phenomena fully automatically. Error estimators have not yet been developed for the governing equations considered here. A result leading to an unacceptably irregular grid is shown in Fig. 6. Such irregularities cause inaccurate, non-smooth fine grid solutions.
Fig. 3. Example of grid adaption to the bow shock and the boundary layer of a blunt body hypersonic flow. A threshold value of 0.5 was used.

Fig. 4. Detail in the nose region of Fig. 3.

Fig. 5. Same as Fig. 3 with a threshold value of 1.0.

Fig. 6. Strong irregularity in a refined grid caused by using an inappropriate refinement criterion.
2.3. An interactive adaption technique

Our approach for overcoming the difficulties with fully automatic grid adaption is to run the computer code in an interactive manner. This concept has been applied successfully to numerical grid generation before [15,16]. Here the term "interactive" is used in the sense that, before the grid refinement is actually done by the algorithm, a refined grid is displayed on a workstation screen. This grid is regarded as a "proposal" made by the computer to an engineer or a physical analyst, who either accepts or rejects that proposal. If it is rejected, the refinement parameters mentioned above can be modified and a new proposal made.

Our interactive technique has been implemented into the numerical simulation code as follows: the code, written in FORTRAN, runs on a CONVEX-C2 under UNIX. This code only performs computational steps when certain keywords are entered on the keyboard or input from a graphical input device (the "mouse"). Otherwise it remains in an idle state waiting for commands (this does not necessarily mean that the computer runs idle). The following groups of commands can be distinguished.

Commands related to grid generation and adaption:

• generate an initial coarse grid using Delaunay triangulation, initialize the flow field;
• select or modify the current grid adaption method and its parameters, e.g. change the threshold for local indicator adaption;
• refine the current grid, interpolate all flow quantities to the new nodes using the current (coarse grid) solution.

Commands related to the computation of the flow:

• perform a given number of time steps;
• modify current numerical parameters, e.g. parameters for stabilization of the method such as "numerical dissipation";
• save or load the current flow to or from a data file (note that this may take a very long time or may even not be possible at all, e.g. for data on fine grids).

Commands related to graphical output:

• select a method to visualize the flow and a cross-section (three-dimensional), transfer those reduced data to a workstation and plot the picture on the screen;
• transfer the grid to a workstation and plot the picture on the screen;
• create a proposal for a finer grid, transfer a plot to the workstation and plot the picture.

The program can also run in a batch mode, where it follows a command sequence defined in advance. Then, however, failure of the automatic grid adaption often leads to unacceptable fine grids. The computer time used to perform iterations on these fine grids is of course wasted.
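The following sketch illustrates the kind of keyword-driven command loop described above. It is written in Python purely for illustration; the actual implementation is a FORTRAN code on the CONVEX-C2, and the command names, handlers and state fields shown here are hypothetical stand-ins for the three command groups.

```python
# Illustrative command loop: the code idles and reacts only to keywords,
# as described above. All command names, handlers and state fields are
# hypothetical; they merely mimic the three command groups.

def make_handlers(state):
    return {
        "coarse":    lambda *a: state.update(grid="initial Delaunay grid"),
        "threshold": lambda v: state.update(threshold=float(v)),
        "refine":    lambda *a: state.update(proposal="refined grid (displayed)"),
        "accept":    lambda *a: state.update(grid=state.pop("proposal", state["grid"])),
        "steps":     lambda n: state.update(steps=state.get("steps", 0) + int(n)),
        "plot":      lambda *a: print("transfer reduced data, plot:", state["grid"]),
    }

def interactive_loop(commands):
    """Process keywords until 'quit'; unknown keywords are simply reported."""
    state = {"grid": None, "steps": 0}
    handlers = make_handlers(state)
    for line in commands:                     # stands in for keyboard/mouse input
        keyword, *args = line.split()
        if keyword == "quit":
            break
        handlers.get(keyword, lambda *a: print("unknown command:", keyword))(*args)
    return state

# A session that builds a coarse grid, accepts one refinement proposal,
# performs 500 time steps and plots the result.
print(interactive_loop(["coarse", "threshold 0.5", "refine", "accept",
                        "steps 500", "plot", "quit"]))
```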
3. CFD SCALING LAWS FOR SUPERCOMPUTER/WORKSTATION NETWORKS
3.1. General

In this section we propose a theory for parameter estimation in order to obtain the most efficient utilization of supercomputers and simulation algorithms using adaptive grids. The theory covers various numerical methods used in CFD, such as finite volume, finite difference and finite element methods, and contains parameters which have to be estimated. We stress here that the efficiency (or cost) of a numerical simulation can only be measured for the combination of computer, data link and algorithm, not for one component alone. Therefore neither the peak performance of the computer nor the
convergence rate of the simulation algorithm alone is an appropriate measure for the efficiency of the combination. The emphasis of our investigation is on the utilization of supercomputer/workstation networks, in contrast to application runs on one computer only. We assume that the numerical work is done on a supercomputer and that the numerical results are then transferred via the network to a workstation, where the graphical postprocessing is done. This concept of distributed numerical simulation and postprocessing of data is widely accepted in CFD.

3.2. Derivation of scaling laws

Assume that a computer code has been developed using coarse grids on a relatively low performance machine, e.g. a workstation or a mini-supercomputer. The purpose of our scaling laws is to estimate how the same application will perform on a supercomputer using fine grids. The number of grid points used in a numerical simulation shall be denoted by N. The computer storage (main memory) s necessary to run the code is then

s ∝ N.    (2)
This relation holds for nearly all types of numerical methods, such as finite difference, finite volume and finite element methods, in which no sparse matrix of order N^2 must be stored. Storage of such matrices requires computers with extremely large memory [17]. The number of operations n_Δt necessary to perform a time step is

n_Δt ∝ N    (3)

for explicit time discretization. We only consider stationary flows here. Thus the time coordinate is used to iterate from an initial distribution into the steady state, generally denoted as convergence of the method. Convergence acceleration such as local time stepping and multigrid techniques may be used to make a code efficient. In an ideal multigrid procedure the number of cycles n_cycle to achieve steady state is of the order of the number of points in each coordinate direction. The number of cycles for typical (optimized) simulation codes is

n_cycle ∝ N^α,    (4)
with an exponent α which depends on the type of method. The most efficient methods for the numerical simulation of stationary flows are multigrid methods. A one-dimensional multigrid method typically needs a number of cycles proportional to the number of points. This property, extended to multidimensional problems, yields α = 1/2 in two dimensions and α = 1/3 in three dimensions. For flows with vibrational or chemical relaxation, multigrid methods have so far not been developed. Conventional (single grid) methods typically need a number of cycles proportional to the square of the number of points in each coordinate direction, i.e. α = 1 in two dimensions and α = 2/3 in three dimensions.
With these assumptions the computational effort (measured in megaflops) to achieve steady state is

e ∝ n_cycle n_Δt ∝ N^{1+α}.    (5)
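As a direct consequence of Eqs (2) and (5), the effort of a fine grid run follows from the storage ratio alone. For example, with α = 1/2 (the two-dimensional equilibrium case considered in Section 4.1), a doubling of the storage gives

e / e_ref = (s / s_ref)^{1+α} = 2^{3/2} ≈ 2.83,

so on the same machine the CPU time grows by the same factor; this is the step from 300 s to 848 s in Table 1.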
Our considerations so far have been based on explicit methods. With increasing separation of the characteristic length scales to be resolved, implicit methods become more efficient than explicit methods [18]. However, for the purpose of our theory this difference in numerical effort is unimportant, and in principle the above relation holds for implicit methods as well.

We now introduce the performance of a simulation code on a particular computer, P^CPU (measured in megaflops per second), and the performance of a data transfer protocol on a particular data network, P^trans (measured in megabytes per second; 1 Mbyte s^-1 = 8 Mbit s^-1). Then the time (in seconds) needed for a simulation is

T^CPU = e / P^CPU    (6)

and the time (in seconds) to transfer the associated memory contents to a workstation is

T^trans = s / P^trans,    (7)

where P^trans is the performance of the transfer protocol used by this code on a particular network. These quantities must be measured; the nominal values given by the manufacturer are only upper limits and can deviate from the actual values by orders of magnitude. It becomes obvious that the relationship between the parameters P^CPU, P^trans, T^CPU and T^trans is the key to the scaling laws sought. If, for example, a certain application (a reference job) is run interactively on a low performance computer/network, we will be able to determine the performance parameters required to make it run interactively with the refined grid.

For our scaling laws we normalize all parameters with those of the reference application: e_ref, s_ref, the reference performances P^CPU_ref and P^trans_ref, and the reference times T^CPU_ref and T^trans_ref. An expression for the CPU-time ratio is then

T^CPU / T^CPU_ref = (s / s_ref)^{1+α} (P^CPU_ref / P^CPU)    (8)

and the expression for the transfer-time ratio is

T^trans / T^trans_ref = (s / s_ref) (P^trans_ref / P^trans).    (9)

A particular interactive technique which has been used for a reference job is only applicable on a high performance computer/network when the ratio T^CPU/T^trans is kept constant. Therefore

T^CPU / T^CPU_ref = T^trans / T^trans_ref.    (10)
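A minimal computational sketch of Eqs (8)-(10) is given below (Python, for illustration; function and variable names are our own). Given the reference job data and the storage s of a finer grid, it estimates times and required performances for the three cases discussed in Section 3.3. Plugging in the reference values of Table 1 reproduces, for example, the 848 s CPU time of case (a) and the 34 Mflop s^-1 requirement of case (c); other entries of the published tables can differ somewhat from these estimates owing to rounding of the reference values.

```python
# Sketch of the scaling laws, Eqs (8)-(10): estimate resources for a fine
# grid application from a coarse grid reference job. Names are illustrative.

def scale_reference_job(s_ref, s, alpha, p_cpu_ref, p_trans_ref,
                        t_cpu_ref, t_trans_ref):
    r_s = s / s_ref                    # storage ratio, proportional to N (Eq. 2)
    r_e = r_s ** (1.0 + alpha)         # effort ratio (Eq. 5)

    # (a) same computer: P_cpu fixed; Eq. (8) gives T_cpu, Eq. (10) fixes
    #     T_trans, and Eq. (9) the network performance then required.
    case_a = dict(t_cpu=t_cpu_ref * r_e, t_trans=t_trans_ref * r_e,
                  p_cpu=p_cpu_ref, p_trans=p_trans_ref * r_s / r_e)

    # (b) same network: P_trans fixed; Eq. (9) gives T_trans, Eq. (10) fixes
    #     T_cpu, and Eq. (8) the CPU performance then required.
    case_b = dict(t_cpu=t_cpu_ref * r_s, t_trans=t_trans_ref * r_s,
                  p_cpu=p_cpu_ref * r_e / r_s, p_trans=p_trans_ref)

    # (c) same turnaround time: T_cpu and T_trans keep their reference
    #     values, so Eqs (8) and (9) give the required performances directly.
    case_c = dict(t_cpu=t_cpu_ref, t_trans=t_trans_ref,
                  p_cpu=p_cpu_ref * r_e, p_trans=p_trans_ref * r_s)

    return {"a": case_a, "b": case_b, "c": case_c}

# Reference job I of Section 4.1: 40 Mbyte, alpha = 0.5, 12 Mflop/s,
# 0.29 Mbyte/s, T_cpu = 300 s, T_trans = 30 s; application run: 80 Mbyte.
for case, values in scale_reference_job(40, 80, 0.5, 12, 0.29, 300, 30).items():
    print(case, {k: round(v, 2) for k, v in values.items()})
```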
The scaling laws can be visualized in a log-log diagram, Fig. 7, here for α = 1/2. The CPU-time ratio and the transfer-time ratio are plotted vs the storage ratio s/s_ref for constant P^CPU/P^CPU_ref and constant P^trans/P^trans_ref, respectively. The point (1,1) is the reference job. With all parameters of a reference job given, a new s may be chosen, corresponding to a finer grid. The scaling laws are then represented as vertical lines at s/s_ref.

Fig. 7. Graphical interpretation of our scaling laws. Solid lines: P^CPU/P^CPU_ref = const.; dashed lines: P^trans/P^trans_ref = const. Special cases are: production run and reference job are performed (a) on the same computer; (b) on the same network; and (c) with the same turnaround time.

3.3. Interpretation of scaling laws

Let us assume that we have run a low resolution reference job on a particular computer/network. We may then focus our attention on three principal problems:

(a) How long will the computer time T^CPU be on the same computer when the grid is refined? What is the required network performance P^trans to keep T^CPU/T^trans constant?
(b) How large will the transfer time T^trans on the same network be when the grid is refined? What is the required computer performance P^CPU to keep T^CPU/T^trans constant?
(c) What must the performance of a high performance computer P^CPU and of a high performance network P^trans be to keep the turnaround time T^CPU + T^trans constant with a refined grid?

The answers to these three questions correspond to the three solid lines a, b and c in the graph, Fig. 7. Problem (a) corresponds to concepts where code development and applications are done on the same computer, line a. Because both T^CPU and T^trans increase with s, turnaround times rise drastically. This situation is common at many CFD sites and leads to inefficient interactive work for large applications. Question (b) is important when various computers are available within the same network. It is then most efficient to choose the one which matches P^CPU on line b. This situation is an exception in today's CFD, because due to the high costs of supercomputers it is not appropriate to install machines which match certain network capabilities. We include this case only for completeness and consistency of our theory. Line c corresponds to our interactive technique. Computer and network performance must increase when the turnaround time is to be kept constant. This is in fact the case at most CFD sites, where code development is done on workstations or "mini" supercomputers.

4. APPLICATION OF SCALING LAWS

4.1. Reference job examples
We consider three reference job examples which are typical in the numerical simulation of hypersonic flows. For each reference job the required main storage and the CPU performance on a low performance computer are determined using coarse grids, in our case 12 Mflop s^-1 on a CONVEX-C2. The grid and the solution must be monitored every 300 s, and the ratio T^CPU/T^trans = 10 is held constant. In our present implementation transfer time and transfer rate are fictitious, because the data reduction is done on the same machine as the computation. Only a fraction of the required main storage consists of simulation results such as nodal values of flow variables and node coordinates. These data must be transferred to a workstation in order to be visualized. This fraction may vary among the methods; it ranges from 50% for some finite difference methods to about 20% for an adaptive finite element scheme [19]. We will assume 25% in the following.

(I) Two-dimensional Navier-Stokes simulation of reacting air in equilibrium: typical coarse grid simulations can be done using 40 Mbyte of main memory. An application run needs 80 Mbyte of storage. For the cases (a) (same computer), (b) (same network) and (c) (same turnaround time) we can derive all other parameters using our scaling laws. The result is shown in Table 1, with α = 0.5 assumed.

(II) Two-dimensional Navier-Stokes simulation with reacting air in chemical non-equilibrium: the reference job needs 80 Mbyte and the application run 240 Mbyte. With α = 1, the results of our theory are shown in Table 2.

(III) Three-dimensional Navier-Stokes simulation with reacting air in chemical non-equilibrium: the reference job needs 160 Mbyte and the application run eight times as much (1280 Mbyte). Here α = 2/3. Results of our theory are shown in Table 3. A typical coarse grid (here without resolution of the boundary layer near the body surface [20]) is shown in Fig. 8.
Table 1. Scaling results for reference job I: two-dimensional Navier-Stokes simulation of reacting air in equilibrium

                                 s        P^CPU          P^trans         T^CPU    T^trans
                                 (Mbyte)  (Mflop s^-1)   (Mbyte s^-1)    (s)      (s)
Reference job (CONVEX-C210)      40       12             0.29            300      30
Application job
  (a) same computer              80       12             0.23            848      85
  (b) same network               80       15             0.29            670      67
  (c) same turnaround time       80       34             0.67            300      30

The parameters s (required memory) and P^CPU (CPU performance of our code on the reference computer CONVEX-C210) are measured for the reference run. A ratio T^CPU/T^trans = 10 is assumed. All other parameters follow from our scaling laws.
As expected, application runs cannot be performed on the 12 Mflop s^-1 machine on which the reference jobs have been run, in our case a CONVEX-C210, because turnaround times rise to the order of hours (except for case I). However, a high performance computer can only be efficiently utilized up to the CPU performance given in the P^CPU column of case (b), namely 14, 64 and 45 Mflop s^-1. These values are much lower than the performance of today's supercomputers and simulation codes. The necessary network performance is 0.29, 0.67 and 1.39 Mbyte s^-1, which is much higher than, for example, ethernet can deliver. Cases (a) and (b) correspond to hardware configurations not suitable for interactive techniques. Case (c) shows the hardware requirements for interactive work. Of course, CPU and transfer performance must increase. However, all numbers are well within the reach of today's computer and network technology.
4.2. A hardware example: the High Performance Computer Network of Lower Saxony

The State of Lower Saxony in Germany has established a considerable amount of computer resources within a short time interval of the order of a year. Between February 1990 and November 1991 an IBM 3090 with six processors and vector facilities and a CONVEX-C2 were installed at the Technical University of Braunschweig, and a Siemens/Fujitsu S400/40 at the Regional Computer Center (RRZN) in Hannover, 60 km from Braunschweig. These three computers form the basis of the High Performance Computer Network of Lower Saxony (Niedersächsischer Höchstleistungsrechnerverbund), denoted NHRV; Fig. 9 shows a schematic overview.
It was realized early on that for the NHRV computer power alone does not provide a basis for successful engineering work in numerical simulation. From the beginning, fast data links and workstation pools were installed as well. Therefore the NHRV is an excellent example of a realization of the proposed concept. In general the NHRV may be a typical example of future supercomputer networks at universities or research institutions in Europe. In the following we will use the NHRV to explain the application of our technique in the framework of future supercomputer networks.

The reference cases I, II and III are now used to estimate CPU and transfer times for two computers, the IBM 3090 and the S400/40, with various possible network hardware. Data for the IBM 3090 are given in Table 4. In a parallelized and vectorized version our code performs about 100 Mflop s^-1 using six processors. From this follows the CPU time T^CPU for the three reference cases. Then transfer times can be estimated for various networks. Our current configuration is an ULTRANET 1000 system with a VME adapter on the CONVEX side (16 Mbyte s^-1) and a BMC adapter (4.5 Mbyte s^-1) on the IBM side. Those values are upper limits, and 1 Mbyte s^-1 is taken as an average performance (network A). We observe that the network performance is still too low to guarantee the 10/1 ratio of CPU and transfer time for cases I and II. For case III the CPU performance is too low and the necessary storage cannot be furnished. A future configuration may consist of HIPPI adapters on both the CONVEX and IBM sides with a maximum transfer rate of 100 Mbyte s^-1. Assume that one third of this rate can be delivered by the transfer protocol (network B).
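The entries of Table 4 follow from simple arithmetic on the application jobs of Section 4.1, as sketched below (Python, for illustration; the result-data volume is taken as 25% of the application storage, and the computational effort as the CONVEX performance times the case (a) CPU time).

```python
# Sketch of how the Table 4 entries follow from the Sec. 4.1 application
# jobs; names and the 25% result-data fraction follow the text above.

P_IBM = 100.0                 # measured code performance on the IBM 3090, Mflop/s
RATE_A = 1.0                  # network A, assumed average net rate, Mbyte/s
RATE_B = 100.0 / 3.0          # network B, one third of the 100 Mbyte/s HIPPI rate

# (effort in Mflop, application storage in Mbyte), effort taken from the
# CONVEX reference runs: 12 Mflop/s times the case (a) CPU time.
jobs = {"I": (12 * 848, 80), "II": (12 * 4800, 240), "III": (12 * 9598, 1280)}

for name, (effort, storage) in jobs.items():
    t_cpu = effort / P_IBM
    data = 0.25 * storage                      # transferred result data, Mbyte
    print(name, round(t_cpu), round(data / RATE_A), round(data / RATE_B, 1))
# Gives roughly 102, 576 and 1152 s CPU time and transfer times of
# 20, 60, 320 s (network A) and 0.6, 1.8, 9.6 s (network B), in line
# with Table 4.
```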
Table 2. Scaling results for reference job II: two-dimensional Navier-Stokes simulation of reacting air in chemical non-equilibrium; for further explanation see Table 1

                                 s        P^CPU          P^trans         T^CPU    T^trans
                                 (Mbyte)  (Mflop s^-1)   (Mbyte s^-1)    (s)      (s)
Reference job (CONVEX-C210)      80       12             0.67            300      30
Application job
  (a) same computer              240      12             0.13            4800     480
  (b) same network               240      64             0.67            900      90
  (c) same turnaround time       240      192            2.0             300      30
Table 3. Scaling results for reference job III: three-dimensional Navier-Stokes simulation of reacting air in chemical non-equilibrium; for further explanation see Table 1

                                 s        P^CPU          P^trans         T^CPU    T^trans
                                 (Mbyte)  (Mflop s^-1)   (Mbyte s^-1)    (s)      (s)
Reference job (CONVEX-C210)      160      12             1.33            300      30
Application job
  (a) same computer              1280     12             0.33            9598     960
  (b) same network               1280     45             1.33            2434     243
  (c) same turnaround time       1280     372            10.6            300      30
Table 4. Scaling results for the IBM 3090

                       T^CPU    T^trans (s)    T^trans (s)
                       (s)      Network A      Network B
Reference job I        102      20             0.6
Reference job II       576      60             1.8
Reference job III      1116     320            9.6

Network A consists of a VME adapter on the CONVEX side and a BMC adapter on the IBM side. Network B assumes HIPPI adapters on both sides.
Fig. 8. Three-dimensional grid example for a blunt body.
Data for the S400/40 are given in Table 5. Our code performs about 500 Mflop s^-1, which yields the CPU times given in the first column for cases I, II and III. A 2 Mbit s^-1 data link with 0.025 Mbyte s^-1 net performance (network C) and the 140 Mbit s^-1 (net rate approximately 15 Mbyte s^-1) VBN high performance long-distance link of the German Telekom (network D) are considered. For the relatively low performance network C, the present configuration, the transfer times are much larger than the time needed for the numerical simulation, which is unacceptable. Our technique can only be applied with network D, a possible future configuration. Here the transfer times are roughly 10% of the computation time.
Fig. 9. Sketch of the High Performance Computer Network of Lower Saxony.

Table 5. Scaling results for the S400/40

                       T^CPU    T^trans (s)    T^trans (s)
                       (s)      Network C      Network D
Reference job I        20       800            1.3
Reference job II       115      2400           4
Reference job III      223      12,800         21.3

Network C consists of a 2 Mbit s^-1 maximum performance data link and network D of the 140 Mbit s^-1 VBN link of the German Telekom.

5. CONCLUSIONS

The concept of distributed computing can be realized by distributing numerical simulation and interactive graphical postprocessing of result data to high performance computers and graphic workstations. Because of the large amount of result data in computational fluid mechanics (nodal values of flow variables within a computational grid), certain hardware requirements must be satisfied for efficient computer utilization. In particular, the combination of the three parameters storage, CPU performance and transfer performance must meet the requirements of a certain CFD application. When grid adaption is used, those needs can be determined for a fine grid application from the requirements known from a coarse grid reference job. Because the numerical effort of the methods used in CFD is approximately known as a function of the number of grid points, the requirements for fine grids can be extrapolated using scaling laws. We have, under appropriate assumptions, derived such scaling laws and applied them to three typical
reference jobs from hypersonic aerodynamics. The results show that storage and CPU performance are well within the range of today's high performance computers. Data transfer requirements, however, are beyond the capabilities of most networks existing at today's CFD sites. In particular, it becomes clear that ethernet data links do not provide sufficient transfer performance. The availability of high-speed data links between supercomputers and workstations, however, makes the present concept applicable. Assuming a turnaround time of 330 s (10% of it being the data transfer time), a two-dimensional Navier-Stokes simulation with reacting air in equilibrium can be done interactively on a supercomputer/code combination which performs 34 Mflop s^-1 and a network/transfer-protocol combination which performs 0.67 Mbyte s^-1. For a two- or three-dimensional simulation with non-equilibrium reacting air the requirements are 192 or 372 Mflop s^-1 and 2 or 10.6 Mbyte s^-1, respectively.

The High Performance Computer Network of Lower Saxony is taken as a hardware example for the realization of our concept of distributed numerical simulation and postprocessing. The present configuration is a VME adapter at the CONVEX-C210, a BMC adapter at the IBM 3090 and a temporary 2 Mbit s^-1 data link between CONVEX and IBM (networks A and C). Postprocessing is done on the CONVEX. The transfer times on the IBM are acceptable for the two-dimensional reference cases. Three-dimensional non-equilibrium reacting air simulations cannot be done on the IBM 3090. For the S400 the present network is unacceptable for interactive work because of the extremely high transfer times. However, a future configuration with HIPPI adapters and the VBN link is well within the range of efficient high performance computing.

A further increase in the CPU performance of future supercomputers can be expected. Especially for three-dimensional hypersonic flow simulations this increase is absolutely necessary. The question arises, however, how the amount of data produced by, say, a 100 Mflop machine can be handled. Furthermore, disk space and disk throughput are limited, and fast and efficient data reduction must therefore be made during run time. The increase in CPU performance of future machines will most probably come from vectorization and parallelization. On such machines graphical data reduction software is inefficient or does not run at all. The concept of distributed production and postprocessing of data will therefore become even more important in the future than it is today.

We conclude that a major drawback of CFD high performance computing sites today is the lack of fast data links between supercomputers and workstations. Compared with the costs of supercomputers, the high-speed links available today are relatively inexpensive (on the order of only a few per cent of the
total costs). However, efficiency of high performance computing could be greatly increased using interactive techniques according to a concept of distributed numerical simulation and postprocessing. Acknowledgement--We thank Prof. H. Oertel, Director of the Institute of Fluid Mechanics, for the conceptual ideas which led to this study.
REFERENCES
1. E. Laurien, M. Böhle, H. Holthoff, J. Wiesbaum and A. Lieseberg, "Finite-element algorithm for chemically reacting hypersonic flows," AIAA Paper No. 92-0745, 1992.
2. J. Wiesbaum, H. Holthoff and E. Laurien, "Experiences with the Taylor-Galerkin finite-element method for hypersonic aerothermodynamics," in Flow Simulation with High-Performance Computers, Notes on Numerical Fluid Mechanics (edited by E. H. Hirschel and E. Krause), Vieweg, 1992.
3. R. Löhner, K. Morgan and O. C. Zienkiewicz, "An adaptive finite element procedure for compressible high speed flows," Computer Methods in Applied Mechanics and Engineering 51, 441-465 (1985).
4. R. Löhner, "An adaptive finite element scheme for transient problems in CFD," Computer Methods in Applied Mechanics and Engineering 61, 323-338 (1987).
5. P. G. Buning and J. L. Steger, "Graphics and flow visualization in computational fluid dynamics," AIAA Paper No. 85-1507, 1985.
6. H. Oertel, Jr, "Nutzung von Vektorrechnern für die Numerische Aerodynamik," Praxis der Informationsverarbeitung und Kommunikation 11, 164-170 (1988).
7. K. P. Görtz and K. Schmidt, "Fast access to supercomputer applications," in Supercomputers and Chemistry (edited by U. Harms), Vol. 2, pp. 35-45, Springer, Heidelberg, 1991.
8. D. S. Dyer, "A dataflow toolkit for visualization," IEEE Computer Graphics and Applications, July, 60-69 (1990).
9. D. E. Edwards, "Scientific visualization: current trends and future directions," AIAA Paper No. 92-0068, 1992.
10. D. E. Koelle, "SÄNGER II, a hypersonic flight and space transportation system," ICAS-88-1.5.1, 1988.
11. H. Oertel, Jr, "The reentry validation experiment EXPRESS," in Proceedings of the 3rd Aerospace Symposium, Braunschweig (edited by H. Oertel and H. Körner), Springer, Berlin, 1992.
12. J. D. Anderson, Hypersonic and High Temperature Gasdynamics, McGraw-Hill, New York, 1989.
13. A. S. Arcilla, J. Häuser, P. R. Eiseman and J. F. Thompson, Numerical Grid Generation in Computational Fluid Dynamics and Related Fields, North-Holland, Amsterdam, 1991.
14. E. Laurien, "Application of finite-element methods to the computation of external flows," Space-Course, Aachen, 51.1-51.28, 1991.
15. H. G. Pagendarm, E. Laurien and H. Sobieczky, "Interactive geometry definition and grid generation for applied aerodynamics," AIAA Paper No. 88-2515, 1988.
16. C. Dener and Ch. Hirsch, "IGG--an interactive 3D surface modelling and grid generation system," AIAA Paper No. 92-0073, 1992.
17. W. Schönauer, Scientific Computing on Vector Computers, North-Holland, Amsterdam, 1987.
18. D. Hänel, "Computational fluid dynamics," VKI Lecture Series 1989-04.
19. R. A. Shapiro, "Adaptive finite element solution algorithm for the Euler equations," Notes on Numerical Fluid Mechanics 32, Vieweg, Braunschweig, 1991.
20. E. Laurien, M. Böhle, H. Holthoff and J. Wiesbaum, "Reentry aerothermodynamic simulations using the Taylor-Galerkin finite-element method," Proceedings of the 13th IMACS World Congress on Computation and Applied Mathematics, Dublin, Ireland, 1991.