Atmospheric Environment Vol. 27A, No. 4, pp. 625-631, 1993.
0004-6981/93 $6.00+0.00 © 1993 Pergamon Press Ltd
Printed in Great Britain.
TECHNICAL NOTE

ON THE PARALLELIZATION OF A COMPREHENSIVE REGIONAL-SCALE AIR QUALITY MODEL

RICK D. SAYLOR* and RYAN I. FERNANDES†
Center for Applied Energy Research, University of Kentucky, Lexington, KY 40511-8433, U.S.A.

(First received 28 June 1992 and in final form 31 August 1992)

* Present address: Atmospheric Sciences Department, Battelle Pacific Northwest Laboratories, Richland, WA 99352, U.S.A.
† Present address: Center for Computational Sciences, University of Kentucky, Lexington, KY 40506, U.S.A.
Abstract--The complexity and nonlinearity of atmospheric flow dynamics, coupled with the nonlinear and multiphase character of atmospheric chemistry, make air quality modeling a particularly challenging problem. During the last two decades, comprehensive regional-scale air quality models have been developed which provide relatively thorough descriptions of the physics and chemistry of the atmosphere; however, a common weakness of these models is the coarse horizontal spatial resolution that must be employed to allow for reasonable simulation turnaround times. Typical horizontal grid spacings of 20-120 km do not provide adequate resolution for many important atmospheric phenomena such as clouds and smaller-scale turbulence. One way to achieve finer grid resolutions and maintain reasonable simulation turnaround times is to use parallelization. This work reports on the parallelization of the STEM-II regional-scale acid deposition and photochemical oxidant model on an IBM 3090-600J mainframe using IBM Parallel FORTRAN. A parallel speed-up of 4.65 (using five processors) has been achieved by parallelizing the code's transport and atmospheric chemistry calculations employing a locally one-dimensional time-splitting algorithm. A parallelized fraction of 0.98 has been estimated from multiple processor timings, which according to Amdahl's law would allow us to approach an ultimate speed-up of near 50 on similarly configured massively parallel machines.

Key word index: Atmospheric chemistry, convection-diffusion equation, locally one-dimensional, time-splitting, speed-up.
1. INTRODUCTION

Recently, air quality modeling was identified as a Grand Challenge problem by Levin (1989). The complexity and nonlinearity of atmospheric flow dynamics, coupled with the nonlinear and multiphase character of atmospheric chemistry, make air quality modeling a computationally challenging problem. Several regional-scale air quality models have been developed over the past decade (Chang, 1985; Venkatram et al., 1988; Lamb, 1983; Carmichael et al., 1986, 1991) that attempt to be comprehensive in the sense that all relevant and significant physical and chemical atmospheric processes are included in the model formulation. A typical horizontal grid resolution in a regional-scale air quality simulation performed by one of these models is on the order of 20-120 km. This rather coarse resolution is largely determined by restrictions imposed by currently available computing resources. Many atmospheric phenomena such as clouds and small-scale turbulence occur on spatial scales much smaller than those resolved by current models. These so-called "sub-grid scale" effects are a major source of uncertainty in present-day air quality modeling. To include sub-grid scale effects more rigorously would require the use of much finer horizontal grid resolutions, possibly on the order of 1-2 km; however, on present-day serial processing computers, comprehensive air quality simulations at this level of grid resolution would require impractically long turnaround times, exceeding a one-to-one correspondence between turnaround time and
simulation time. One way to achieve finer regional-scale grid resolutions and still maintain reasonable turnaround times is to employ parallelization. This work describes the parallelization of one of these comprehensive air quality models, the STEM-II acid deposition and photochemical oxidant model of Carmichael et al. (1991), on an IBM 3090-600J using IBM Parallel FORTRAN (Bogdonoff, 1989). Previous work by Carmichael et al. (1989) described the parallelization of the STEM-II code on a single-instruction multiple-data (SIMD) computer. In contrast, the IBM 3090-600J can be classified as a multiple-instruction multiple-data (MIMD) computer with shared virtual memory. From the user's perspective, the main difference between the SIMD and MIMD computer paradigms lies in ease of portability and programming. A virtual shared memory MIMD machine requires little or no explicit programmer intervention to route message-passing pathways between discrete processors, but instead relies on a common memory area that is shared by all processors. For an existing computer code that has inherent parallelization possibilities, such as a typical atmospheric transport and chemistry code, porting to a computer with a virtual shared memory MIMD architecture requires only a modest investment in programming effort. According to Bell (1992), a virtual shared memory multiprocessor "is the most likely blueprint for future massively parallel computers." The purpose of the present work is to report on the parallelization of STEM-II on a MIMD machine and thereby illustrate the utility of multiprocessor computing in air pollution and atmospheric science. The present work was performed at the University of Kentucky Computing Center (UKCC), where the computing environment is based on an IBM 3090-600J mainframe. The IBM 3090-600J has six vector processors and is capable of a peak performance of 827.6 MFlops (CNSF, 1991). Under the VM/XA operating environment, each user can access up to 1
GByte of virtual memory in either interactive or batch processing modes. Both ANSI-standard FORTRAN-77 and IBM Parallel FORTRAN extensions are supported for user applications.
2. PARALLELIZATION APPROACH

STEM-II, an Eulerian numerical model which accounts for the transport, chemical conversion and deposition of atmospheric pollutants, is a second-generation model which evolved from the gas-phase model STEM-I (Carmichael and Peters, 1984a, b; Dronamraju et al., 1988). In its current form, STEM-II is three-dimensional and time-dependent and includes all of the known significant atmospheric processes that describe trace gas cycles in both cloudy and cloud-free regional-scale environments. Details of the STEM-II air quality model can be obtained from Carmichael et al. (1986, 1991).

2.1. Numerics of the STEM-II time integration

The STEM-II governing equations are integrated using the concept of operator-splitting employing the locally one-dimensional finite element method (LOD-FEM) of Yanenko (1971). In this procedure, the full three-dimensional conservation equations are split into several lower-dimensional equations which are solved sequentially. For example, the governing single-species gas-phase equation in expanded three-dimensional form in rectangular coordinates is given by
$$\frac{\partial C}{\partial t} + \frac{\partial (uC)}{\partial x} + \frac{\partial (vC)}{\partial y} + \frac{\partial (wC)}{\partial z} = \frac{\partial}{\partial x}\!\left(K_x \frac{\partial C}{\partial x}\right) + \frac{\partial}{\partial y}\!\left(K_y \frac{\partial C}{\partial y}\right) + \frac{\partial}{\partial z}\!\left(K_z \frac{\partial C}{\partial z}\right) + S + R + G. \qquad (1)$$

After application of the time-splitting operator, the equivalent system

$$\frac{\partial C^{*}}{\partial t} + \frac{\partial}{\partial x}(uC^{*}) = \frac{\partial}{\partial x}\!\left(K_x \frac{\partial C^{*}}{\partial x}\right) \qquad (2)$$

$$\frac{\partial C^{**}}{\partial t} + \frac{\partial}{\partial y}(vC^{**}) = \frac{\partial}{\partial y}\!\left(K_y \frac{\partial C^{**}}{\partial y}\right) \qquad (3)$$

$$\frac{\partial C^{***}}{\partial t} + \frac{\partial}{\partial z}(wC^{***}) = \frac{\partial}{\partial z}\!\left(K_z \frac{\partial C^{***}}{\partial z}\right) + S \qquad (4)$$

$$\frac{\partial C^{n+1}}{\partial t} = R + G \qquad (5)$$
is solved over each transport time step. The split equations are solved sequentially, beginning with initial concentration values $C^{n}$ at time step $n$. Equation (2) is first solved for $C^{*}$ values given $C^{n}$ values as initial conditions. Equation (3) is then integrated to obtain $C^{**}$ values using $C^{*}$ values as initial conditions. Similarly, Equation (4) is solved for $C^{***}$ employing $C^{**}$ values as initial conditions. Finally, Equation (5), the chemical and interphase transport portion of the general equation, is solved for $C^{n+1}$ concentrations using $C^{***}$ values to begin the integration. Each of the transport equations (2)-(4) is integrated using a Crank-Nicolson Petrov-Galerkin finite element method using linear basis functions and quadratic asymmetric upwind weighting functions. This solution procedure is termed the locally one-dimensional finite element method (LOD-FEM) as suggested by Yanenko (1971) and utilized by Carmichael et al. (1986). The numerical techniques used to integrate the chemistry and interphase mass transport, Equation (5), are based on the semi-implicit Euler method of Preussner and Brand (1981) and a pseudo-analytic method as described in Carmichael et al. (1986). In a typical STEM-II simulation, approximately 92-93% of the total simulation CPU time is spent integrating Equation (5).

2.2. Schematic structure of the STEM-II code

A simplified schematic of the structure of the STEM-II code is presented in Fig. 1. After initialization of various parameters, the code enters a time loop which advances the integration of the governing equations from some given initial condition through the entire simulation period. This time loop consists of five major code sections. The first section reads any time-dependent data into appropriate arrays. These data typically consist of time-dependent meteorological data such as wind, temperature, pressure and humidity fields. The code then sequentially calls three subroutines which perform the integration of the model's governing equations over one transport time step. Subroutine HORX integrates the x-component portion of the transport equation of the form given by Equation (2). Likewise, subroutine HORY integrates the y-component equation of the form given by Equation (3). The call to subroutine VERTLQ accomplishes the integration of the vertical transport and chemistry portions of the governing equations of the form given in Equations (4) and (5), followed by the output of intermediate simulation results. As shown in Fig. 2, subroutine VERTLQ is composed of a single loop which calculates the z-component integration of the transport, Equation (4), followed by the integration of the chemistry and interphase transport, Equation (5). As illustrated in Fig. 3, the loop in subroutine VERTLQ operates sequentially on each discretized vertical column within the modeled domain. The horizontal transport routines, HORX and HORY, are constructed in a similar manner with integrations proceeding on east-west and north-south rows, respectively. Timing tests of the STEM-II code have shown that 93-94% of the total simulation CPU time is consumed within subroutine VERTLQ, with the majority of that time spent integrating Equation (5), the chemistry and interphase transport. The next largest fraction of total CPU time, 4-5%, is spent in subroutines HORX and HORY, performing horizontal transport calculations. The remaining CPU time (1-3%) of a typical STEM-II simulation is spent in input/output operations and miscellaneous computations. The present parallelization effort has focused on the speed-up of the largest and most easily parallelized fractions of the code, namely, the calculations in subroutines HORX, HORY and VERTLQ.
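To make the loop structure implied by Figs 1-3 concrete, the following is a minimal serial sketch of the time-split driver just described; it is not the actual STEM-II source. The routine names mirror the schematic figures, the loop organization is condensed into a single driver, the grid dimensions IT, IYZ, IXZ and IXY are placeholders, and the subroutine bodies are empty stubs standing in for the integrations of Equations (2)-(5).

      PROGRAM LODDRV
C     Illustrative loop bounds only: number of transport steps,
C     east-west rows, north-south rows and vertical columns.
      INTEGER IT, IYZ, IXZ, IXY, I, N
      PARAMETER (IT = 24, IYZ = 20, IXZ = 30, IXY = 81)
      DO 10 I = 1, IT
C        x-direction transport, Equation (2), row by row
         DO 1 N = 1, IYZ
            CALL HORX(N)
    1    CONTINUE
C        y-direction transport, Equation (3), row by row
         DO 2 N = 1, IXZ
            CALL HORY(N)
    2    CONTINUE
C        vertical transport and chemistry, Equations (4) and (5),
C        column by column
         DO 3 N = 1, IXY
            CALL VERTCL(N)
            CALL RXN(N)
    3    CONTINUE
   10 CONTINUE
      STOP
      END
C     Stub routines standing in for the split-equation integrations.
      SUBROUTINE HORX(N)
      INTEGER N
      RETURN
      END
      SUBROUTINE HORY(N)
      INTEGER N
      RETURN
      END
      SUBROUTINE VERTCL(N)
      INTEGER N
      RETURN
      END
      SUBROUTINE RXN(N)
      INTEGER N
      RETURN
      END

Because each row and each column in this sketch is handled by an independent call, the three inner loops are the natural targets for the task-level parallelization described below.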
[Flow chart: parameter initialization; time loop (read time-dependent data; CALL HORX; CALL HORY; CALL VERTLQ; print intermediate output); 10 CONTINUE; end.]
Fig. 1. Schematic flow chart of the STEM-II air quality numerical model.
      DO 10 N = 1, IXY
         CALL VERTCL
         CALL RXN
   10 CONTINUE
Fig. 2. Schematic flow chart of SUBROUTINE VERTLQ.
2.3. Implementation of IBM Parallel FORTRAN

The IBM 3090-600J consists of six vector processors with a shared virtual memory architecture. IBM Parallel FORTRAN (Bogdonoff, 1989) allows the construction of multiple tasks within a single FORTRAN application, effectively allowing independent portions of the code to be executed simultaneously on a user-defined number of processors. The implementation of IBM Parallel FORTRAN consists of inserting a set of FORTRAN-like statements into the serial code that the user wishes to parallelize. The application of these extended FORTRAN statements to the STEM-II code is illustrated schematically in Figs 4-6. Figure 4 is a schematic code version of the parallelized STEM-II main program. The number of virtual processors, NPS, is specified at execution time by the user and is passed to the code via a call to the compiler subroutine NPROCS(). Initially, in addition to the sequential root task, the DO 5 loop creates one parallel task on each of the NPS virtual processors.
Fig. 3. Illustration of the operation of SUBROUTINE VERTLQ. During one transport time step gas- and aqueous-phase transport, scavenging and chemistry are calculated as follows. The regional model's domain is discretized (a) from ground level to the free troposphere. The STEM-II governing equations are time split in a manner which allows consideration of vertical transport and chemistry on a column-by-column basis (b). Vertical pollutant profiles are fed into the vertical transport and chemistry modules (c), which calculate modified pollutant vertical profiles (d), which are then passed back to the horizontal transport modules as initial conditions for the next time step. (Adapted from Berkowitz et al., 1989.)
C     ....  Schematic Main Program of the Parallelized Version
C     ....  of the STEM-II Acid Deposition and Photochemical
C     ....  Oxidant Model
C
C     ....  Common Block Definitions, Array Dimensioning, and
C     ....  Parameter Definitions
C
      NPS = NPROCS()
      WRITE(6,*) 'Number of processors used =', NPS
C
C     ....  Originating tasks
C
      DO 5 I = 1, NPS
         ORIGINATE ANY TASK IDTASK
         IDSAV(I) = IDTASK
         SCHEDULE ANY TASK IDTASK, CALLING NOWORK
    5 CONTINUE
C
C     ....  Read time-independent data
C
C     ....  Time integration loop
C
      DO 10 I = 1, IT
C
C        ....  Read time-dependent data
C
C        ....  Integrate the governing equations
C
         CALL HORIZ
         CALL VERTLQ
C
C        ....  Write intermediate output
C
   10 CONTINUE
C
C     ....  Terminate all parallel activity
C
      WAIT FOR ALL TASKS
      DO 20 I = 1, NPS
         IDTASK = IDSAV(I)
         TERMINATE TASK IDTASK
   20 CONTINUE
C
      STOP
      END
Fig. 4. Schematic code of the main program of the STEM-II air quality model illustrating the implementation of IBM Parallel FORTRAN.

This is accomplished via the ORIGINATE ANY TASK statement, which creates a Parallel FORTRAN task and returns the task identifier, which is then saved in the array IDSAV(I). The SCHEDULE ANY TASK statement then assigns a subroutine (in this case subroutine NOWORK) to
execute a specific task. In this task origination operation, the dummy subroutine NOWORK is called to prime the NPS tasks and ready them for future useful work. At the end of execution of the STEM-II code, each of the NPS tasks must be deleted before normal program termination can occur.
C     ....  Schematic Code of SUBROUTINE VERTLQ in Parallelized
C     ....  Version of STEM-II
C
C     ....  Common Block Definitions, Array Dimensioning, and
C     ....  Parameter Definitions
C
C     ....  Parallel FORTRAN loop structure
C
      DO 10 N = 1, IXY
         WAIT FOR ANY TASK IDTASK
         ID(N) = N
         SCHEDULE TASK IDTASK,
     *      SHARING(BLCK2,BLCK3,BLCK4,BLCK5),
     *      CALLING VTCRXN(ID(N))
   10 CONTINUE
C
C     ....  Schedule tasks in preparation for next VERTLQ call
C
      WAIT FOR ALL TASKS
      DO 20 I = 1, NPS
         IDTASK = IDSAV(I)
         SCHEDULE ANY TASK IDTASK, CALLING NOWORK
   20 CONTINUE
C
      RETURN
      END

Fig. 5. Schematic code of SUBROUTINE VERTLQ illustrating the implementation of IBM Parallel FORTRAN.
This termination of parallel activity is illustrated in Fig. 4 in the DO 20 loop construct. The WAIT FOR ALL TASKS statement stops program execution until all previously defined tasks are free. The TERMINATE TASK statement then deletes each specified parallel task, leaving only the sequential root task active. Figure 5 presents a schematic code version of the parallelized subroutine VERTLQ. As illustrated in Fig. 2, the original DO 10 loop only contained calls to the subroutines VERTCL and RXN. In the parallelized code, a new subroutine VTCRXN was created which simply makes the appropriate calls to VERTCL and RXN. In order to parallelize this loop, several Parallel FORTRAN statements must be inserted directly into the loop's structure. The WAIT FOR ANY TASK statement stops program execution until any previously defined task is free. The SCHEDULE TASK statement then assigns the subroutine VTCRXN(N) to execute on the newly available processor. The SHARING portion of the SCHEDULE statement allows the stated COMMON blocks (in this case BLCK2, BLCK3, BLCK4 and BLCK5) to be shared among all NPS tasks and the root task. This explicit definition of shared memory blocks is necessary since Parallel FORTRAN assumes all variables to be private to each task unless they are explicitly shared or passed as arguments in the CALLING subroutine portion of the statement. At the end of subroutine VERTLQ, a WAIT FOR ALL TASKS statement is included to resynchronize all NPS tasks. These tasks are then again primed for further work by the SCHEDULE ANY TASK IDTASK, CALLING NOWORK statement. This priming ensures that during the next pass through the time loop, all NPS tasks will be ready to perform useful work when the next parallel section is encountered. As shown in Fig. 6, subroutines HORX and HORY are parallelized in a similar manner.
The most difficult aspect of using Parallel FORTRAN to parallelize an existing FORTRAN code lies in determining which variables should be shared by all tasks and which should remain private to each task. Parallel FORTRAN passes the responsibility of ensuring computational independence between tasks to the programmer. If computational independence between tasks has not been fully accomplished, Parallel FORTRAN will not recognize the situation and the code will run without any apparent errors. In practice, without complete independence between parallel tasks, each separate program execution will normally produce different computational results due to the erroneous sharing of memory locations that should have remained private to each task.
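The same shared-versus-private bookkeeping arises in any task-parallel dialect. As a rough present-day analogue (an assumption of this note, not IBM Parallel FORTRAN and not part of the original code), the sketch below expresses a column-by-column update with OpenMP directives. The array WORK plays the role of per-task scratch storage and must be declared PRIVATE; if it were left shared, concurrent tasks would overwrite each other's scratch values and successive runs could silently differ, which is exactly the failure mode described above. The array sizes and names are hypothetical.

      PROGRAM SHRPRV
C     Hypothetical grid sizes and arrays, for illustration only.
      INTEGER NZ, IXY
      PARAMETER (NZ = 8, IXY = 16)
      REAL CONC(NZ,IXY), WORK(NZ)
      INTEGER N, K
      DO 5 N = 1, IXY
         DO 4 K = 1, NZ
            CONC(K,N) = 1.0
    4    CONTINUE
    5 CONTINUE
C     Each column N is an independent task; WORK is per-task scratch
C     and must be PRIVATE, while the shared array CONC is only ever
C     written in the column owned by the task that computed it.
C$OMP PARALLEL DO PRIVATE(N, K, WORK) SHARED(CONC)
      DO 10 N = 1, IXY
         DO 20 K = 1, NZ
            WORK(K) = 2.0*CONC(K,N)
   20    CONTINUE
         DO 30 K = 1, NZ
            CONC(K,N) = WORK(K)
   30    CONTINUE
   10 CONTINUE
C$OMP END PARALLEL DO
      WRITE(6,*) 'CONC(1,1) =', CONC(1,1)
      STOP
      END

The PRIVATE and SHARED clauses here correspond, loosely, to the per-task default privacy and the explicit SHARING list of COMMON blocks in the Parallel FORTRAN figures.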
3. COMPUTATIONAL RESULTS

To test the performance of the STEM-II code employing Parallel FORTRAN, several simulations were performed for a typical air quality simulation condition. The domain of the test region and the prevailing meteorological conditions during the simulation period are described in detail in Saylor et al. (1991). The discretized model domain consisted of 2268 gridpoints encompassing the lower 4 km of the troposphere and covering a sub-regional area in the lower Ohio River Valley. The model was run for an actual simulation period of 24 h. Table 1 presents the results of these test runs employing various numbers of processors. All of these test runs were performed on the UKCC IBM 3090-600J in interactive processing mode during periods of off-peak processor loading. For the IBM 3090-600J running under the VM/XA operating system at the UKCC, up to five so-called "virtual" processors can be created in addition to the main "root" processor.
C     ....  Schematic Code of SUBROUTINE HORIZ which Calls
C     ....  SUBROUTINES HORX and HORY in the Parallelized
C     ....  STEM-II Version
C
C     ....  Common Block Definitions, Array Dimensioning, and
C     ....  Parameter Definitions
C
C     ....  Parallel FORTRAN loop structure for x-direction transport
C
      DO 10 N = 1, IYZ
         WAIT FOR ANY TASK IDTASK
         ID(N) = N
         SCHEDULE TASK IDTASK,
     *      SHARING(BLCK2,BLCK3,BLCK4,BLCK5),
     *      CALLING HORX(ID(N))
   10 CONTINUE
C
C     ....  Schedule tasks in preparation for next parallel activity
C
      WAIT FOR ALL TASKS
      DO 20 I = 1, NPS
         IDTASK = IDSAV(I)
         SCHEDULE ANY TASK IDTASK, CALLING NOWORK
   20 CONTINUE
C
C     ....  Parallel FORTRAN loop structure for y-direction transport
C
      DO 30 N = 1, IXZ
         WAIT FOR ANY TASK IDTASK
         ID(N) = N
         SCHEDULE TASK IDTASK,
     *      SHARING(BLCK2,BLCK3,BLCK4,BLCK5),
     *      CALLING HORY(ID(N))
   30 CONTINUE
C
C     ....  Schedule tasks in preparation for next parallel activity
C
      WAIT FOR ALL TASKS
      DO 40 I = 1, NPS
         IDTASK = IDSAV(I)
         SCHEDULE ANY TASK IDTASK, CALLING NOWORK
   40 CONTINUE
C
      RETURN
      END
Fig. 6. Schematic code of parallelized calls to SUBROUTINES HORX and HORY illustrating the implementation of IBM Parallel FORTRAN.

This configuration allows users to test and perform timing runs for parallelized codes without actually having total access to all six physical processors on the mainframe. The timings generated from runs with these virtual processors can be expressed as "virtual wallclock" timings. This can be defined as the elapsed wallclock time that would be measured if the user had sole access to the requested number of processors. Table 1 presents virtual wallclock timings along with associated parallel speed-ups for various numbers of processors. The virtual wallclock time for the 24-h air quality simulation decreased from 3494 s for serial execution to only 752 s
for parallel execution employing five processors, resulting in a speed-up of 4.65. According to Amdahl's law (Amdahl, 1967), the speed-up, $s$, that can be achieved on a parallel processing machine with $N$ processors is given by

$$s = \frac{1}{(1-p) + p/N}, \qquad (6)$$
where $p$ is the fractional amount of time spent on portions of the program that have been parallelized. Using the speed-ups of Table 1 and Equation (6), the parallelized fraction, $p$, can be back-calculated for each processor configuration. As shown in Table 1, a fairly consistent result of approximately 0.98 is obtained in this way.
Table 1. Parallel performance of the STEM-II air quality model on the IBM 3090-600J

  Number of processors                 1      2      3      4      5
  Virtual wallclock time (s)        3494   1779   1200    920    752
  Speed-up                            --   1.96   2.91   3.80   4.65
  Estimated parallelized fraction     --  0.982  0.985  0.982  0.981
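As a worked illustration of this back-calculation, the following minimal sketch (not part of the STEM-II code) re-derives the five-processor entry of Table 1 by inverting Equation (6).

      PROGRAM AMDAHL
C     Back-calculate the parallelized fraction p from a measured
C     speed-up on N processors by inverting Equation (6):
C         s = 1/((1-p) + p/N)   =>   p = (1 - 1/s)/(1 - 1/N),
C     then evaluate the limiting speed-up 1/(1-p) as N grows large.
C     T1 and TN are the serial and five-processor virtual wallclock
C     times from Table 1.
      REAL T1, TN, S, P, SMAX
      INTEGER N
      T1 = 3494.0
      TN = 752.0
      N  = 5
      S  = T1/TN
      P  = (1.0 - 1.0/S)/(1.0 - 1.0/REAL(N))
      SMAX = 1.0/(1.0 - P)
      WRITE(6,*) 'Speed-up              =', S
      WRITE(6,*) 'Parallelized fraction =', P
      WRITE(6,*) 'Limiting speed-up     =', SMAX
      STOP
      END

With the Table 1 timings this reproduces the measured speed-up of 4.65 and a parallelized fraction of approximately 0.98, with a limiting speed-up on the order of 50.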
As stated in Section 2.2, previous timing studies have shown that between 97 and 99% of the total CPU time of a STEM-II simulation is taken up by calculations in the subroutines HORX, HORY and VERTLQ. A parallelized fraction of 0.98 thus agrees very well with these independent serial timing studies. According to Amdahl's law, as the number of processors, N, approaches infinity, the best speed-up achievable for a code with p = 0.98 is s = 50. Consequently, for a massively parallel machine configured similarly to the IBM 3090, a speed-up approaching 50 may be achievable.

4. CONCLUSION

The STEM-II acid deposition and photochemical oxidant model has been parallelized for use on IBM 3090-600J mainframe computers. By concentrating the parallelization effort on the transport and chemistry portions of the code, a parallel speed-up of 4.65 using five processors has been achieved. It has been demonstrated that the parallelized version of the STEM-II code has a parallelized fraction of 0.98, which according to Amdahl's law would result in an ideal speed-up of 50. Future work on the parallelization of STEM-II will focus on the migration of the code to massively parallel platforms, with the goal of approaching a turnaround time speed-up near 50. Speed-ups of this magnitude will allow much finer resolution of the dynamics and chemistry of future air quality simulations, hopefully leading to a deeper understanding of the effects of human activities on atmospheric air quality. The techniques employed in parallelizing the STEM-II code could easily be extended to other air quality models that utilize the locally one-dimensional time-splitting algorithm.
Acknowledgements--The authors are grateful to Dr Karin R. Bennett and Ms Anne Leigh for extensive and very helpful discussions on the use of IBM Parallel FORTRAN. This work was supported by the Center for Computational Sciences and the Center for Applied Energy Research at the University of Kentucky and by the National Aeronautics and Space Administration through NASA grant NAGW-2292.

REFERENCES
Amdahl G. M. (1967) Validity of the single processor approach to achieving large scale computer capabilities. In Proc. AFIPS Spring Joint Computer Conf. 30, pp. 483-485. AFIPS Press, Reston, VA.
Bell G. (1992) Ultracomputers: a teraflop before its time. Science 256, 64.
Berkowitz C. M., Easter R. C. and Scott B. C. (1989) Theory and results from a quasi-steady-state precipitation scavenging model. Atmospheric Environment 23, 1555-1571.
Bogdonoff P. (1989) Parallel FORTRAN: an overview and description of how to use Parallel FORTRAN at the CNSF. Cornell National Supercomputing Facility, Ithaca, NY.
Carmichael G. R. and Peters L. K. (1984a) An Eulerian transport/transformation/removal model for SO2 and sulfate--I. Model development. Atmospheric Environment 18, 937-952.
Carmichael G. R. and Peters L. K. (1984b) An Eulerian transport/transformation/removal model for SO2 and sulfate--II. Model calculation of SOx transport in the eastern United States. Atmospheric Environment 18, 953-967.
Carmichael G. R., Peters L. K. and Kitada T. (1986) A second generation model for regional-scale transport/chemistry/deposition. Atmospheric Environment 20, 173-188.
Carmichael G. R., Cohen D. M., Cho S.-Y. and Oguztuzun M. H. (1989) Coupled transport/chemistry calculations on the massively parallel processor computer. Comput. Chem. Engng 13, 1065-1073.
Carmichael G. R., Peters L. K. and Saylor R. D. (1991) The STEM-II regional-scale acid deposition and photochemical oxidant model--I. An overview of model development and applications. Atmospheric Environment 25A, 2077-2090.
Chang J. (1985) NCAR Eulerian Regional Acid Deposition Model. NCAR/TN-256+STR, National Center for Atmospheric Research.
CNSF (Cornell National Supercomputer Facility) (1991) The Cornell Theory Center's Cornell National Supercomputer Facility--System Highlights.
Dronamraju M., Peters L. K., Carmichael G. R., Kasibhatla P. S. and Cho S.-Y. (1988) An Eulerian transport/transformation/removal model for SO2 and sulfate--III. Comparison with the July 1974 SURE data base. Atmospheric Environment 22, 2003-2011.
Lamb R. J. (1983) A regional scale (1000 km) model of photochemical air pollution--I. Theoretical formulation. EPA-600/3-83-035, U.S. Environmental Protection Agency, Research Triangle Park, NC.
Levin E. (1989) Grand challenges to computational science. Comm. ACM 32, 1456-1457.
Preussner P. R. and Brand K. P. (1981) Application of a semi-implicit Euler method to mass action kinetics. Chem. Engng Sci. 10, 1633-1641.
Saylor R. D., Peters L. K. and Mathur R. (1991) The STEM-II regional-scale acid deposition and photochemical oxidant model--III. A study of mesoscale acid deposition in the lower Ohio River Valley. Atmospheric Environment 25A, 2873-2894.
Venkatram A., Karamchandani P. K. and Misra P. K. (1988) Testing a comprehensive acid deposition model. Atmospheric Environment 22, 737-747.
Yanenko N. N. (1971) The Method of Fractional Steps. Springer, New York.