Parallel weather modeling with the advanced regional prediction system

Parallel Computing 23 (1997) 2243-2256

A. Sathye a,*,2, M. Xue b,3, G. Bassett b,4, K. Droegemeier b,5

a NOAA/Forecast Systems Laboratory, R/EF/FSS, 325 Broadway, Boulder, CO 80303, USA
b Center for Analysis and Prediction of Storms, University of Oklahoma, Norman, OK, USA

* Corresponding author. E-mail: [email protected].
1 Jointly affiliated with the Cooperative Institute for Research in the Atmosphere (CIRA), Colorado State University, Ft. Collins, Colorado, USA.
2 This work was done when the author was at CAPS.
3 E-mail: [email protected].
4 E-mail: [email protected].
5 E-mail: [email protected].

Received 6 May 1997; revised 18 July 1997

Abstract

The Center for Analysis and Prediction of Storms has developed a regional weather prediction model called the advanced regional prediction system. The massively parallel implementation of the model has been tested in an operational setting each spring since the spring of 1995. The model has been quite successful in predicting individual storms and storm clusters during these real-time operations, which were made possible by the use of massively parallel machines. © 1997 Elsevier Science B.V.

Keywords: Regional weather modeling; Parallel computing; Severe storms; Software portability; Adaptive mesh refinement

1. Introduction

The Center for Analysis and Prediction of Storms (CAPS) at the University of Oklahoma (OU) is a National Science Foundation (NSF) Science and Technology Center, one of the first 11 such centers to be established in 1988. The center's mission is to develop techniques for the practical prediction of weather phenomena ranging


from individual storms to storm complexes, and to demonstrate the practicability of storm-scale weather prediction for operational, commercial and research applications. The center's ultimate goal is to develop and test a fully functioning, multi-season storm-scale numerical weather prediction (NWP) system around the turn of the century [2,10].

Operational NWP is now conducted almost exclusively at the national or multinational level, wherein a centralized facility collects and processes all relevant observational data, runs a suite of numerical models ranging in scale from regional to global, and generates and disseminates forecasts and related products, often targeted to specific needs such as aviation. Because the associated models have spatial resolutions on the order of several tens to hundreds of kilometers, they are usually unable to represent explicitly individual thunderstorms or storm complexes. The desire to predict such events with a lead time of a few hours is obvious, as thunderstorms and their related weather are principally responsible for flash floods, damaging surface winds, hail, tornadoes and the low-level turbulence that poses a threat to aircraft.

Three-dimensional modeling (simulation) of atmospheric convection started in the mid-1970s, and these simulation studies significantly advanced our understanding of convective storm dynamics as well as of other small-scale meteorological phenomena. However, work on the storm scale remained in the simulation mode for much of the last two decades.

The ability of CAPS to accomplish its mission depends critically upon the effective use of high performance computing and communications systems. Because thunderstorms have relatively short lifetimes (a few hours) compared to larger-scale weather systems (a few days), their associated numerical forecasts must be generated and disseminated to the public very quickly (5-10 times faster than the weather evolves) to be of practical value. Considering the pre-forecast data assimilation procedure, which involves running the prediction model before the actual forecast to arrive at an appropriate set of initial conditions, along with the fact that not one but several forecasts will likely be made during each prediction cycle in order to evaluate forecast variability, it becomes clear that computers having sustained teraflop performance, gigabyte central memories and parallel input/output will be needed if operational storm-scale prediction is to be successful.

2. The advanced regional prediction system

Central to achieving the primary goal of CAPS is a three-dimensional, nonhydrostatic model system known as the advanced regional prediction system (ARPS). This system has been under development for the past several years, and its various earlier versions were described in [5,14,18,21]. The model has been used by many groups around the world in applications including idealized studies and numerical simulations of density currents, squall lines, thunderstorms, tornadogenesis, mountain flows, land-sea breezes, drainage flows, fog formation, heavy rainfall events, drylines and frontogenesis. It is also being used as a classroom tool and for research in parallel computing. Furthermore, foreign governments are evaluating the model as a potential mesoscale operational forecasting model.


CAPS and the University of Oklahoma have entered into a three-year collaborative research and development partnership with AMR Corp./American Airlines to adapt small-scale numerical weather prediction technology to commercial airline operations. This research project affords CAPS a unique opportunity to demonstrate the feasibility of storm-scale numerical weather prediction within the context of the forecasting requirements of the private sector, and in particular the aviation industry. At the same time, AA is presented an opportunity to take part in developing a capability that could save its operations revenue presently lost as a consequence of weather.

2.1. Design philosophy

Before its development began, the ARPS was required to meet a number of criteria:
- It had to accommodate, through various assimilation strategies, new data of higher temporal and spatial density than has traditionally been available, with Doppler radar data being a special example.
- The model must also serve as an effective tool for studying the dynamics and predictability of storm-scale weather in both idealized and more sophisticated settings.
- The model should be able to handle atmospheric phenomena on the larger regional scales as well as the smaller micro-scales, as these are known to have profoundly important interactions with storm-scale phenomena.
- The model should have a flexible and general dynamic framework. Part of the solution to the scale interactions can be achieved through the use of interactive grid nesting and adaptive grid refinement.
- The model should be extensively documented to facilitate ease of learning and modification.
- The model should take full advantage of the power of massively parallel processors so that operational prediction can be carried out in a timely manner. At the same time, the model should have maximum portability across computing platforms, so that minimum effort is required to implement the system in diverse regional operational centers or even in National Weather Service (NWS) forecasting offices.

In short, we intended to develop a model system which can be used effectively for both basic scientific research and operational numerical weather prediction, on scales ranging from the regional scale to the micro-scale. It is according to these criteria that the ARPS was developed.

Currently the ARPS system contains about two hundred thousand lines of computer code, excluding its adjoint. The code has been developed under a stringent set of rules and conventions. For example, no implicit variable typing is used; all array data are passed through subroutine argument lists, with the arrays and variables used as input listed first, followed by the output and then the work arrays. All variables and arrays have their definitions clearly stated inside the subroutines. Variables and subroutines are named according to their meaning and function in a way that facilitates global searching. Capital letters are used for control statements, and a fixed number of indentations is used for control structures. Readability, maintainability and portability of the code have been high priorities during the model development.

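As a purely illustrative example of these conventions (the routine below is hypothetical and is not taken from the ARPS source), a subroutine written in this style might look as follows, with explicit typing, input arguments listed before output, and every variable documented:

      SUBROUTINE advects(nx,ny,nz, dx,dy,dz, u,v,w, sclr, sadv)
c
c#######################################################################
c     PURPOSE: compute a simple second-order centered advection
c     tendency of the scalar sclr.  Illustrative only; not ARPS code.
c
c     INPUT : nx,ny,nz   grid dimensions
c             dx,dy,dz   grid spacings (m)
c             u,v,w      Cartesian velocity components (m/s)
c             sclr       advected scalar
c     OUTPUT: sadv       advection tendency of sclr
c#######################################################################
c
      IMPLICIT NONE
      INTEGER nx,ny,nz
      REAL dx,dy,dz
      REAL u(nx,ny,nz), v(nx,ny,nz), w(nx,ny,nz)
      REAL sclr(nx,ny,nz)
      REAL sadv(nx,ny,nz)
      INTEGER i,j,k
c
      DO 100 k = 2, nz-1
        DO 100 j = 2, ny-1
          DO 100 i = 2, nx-1
            sadv(i,j,k) =
     :        -( u(i,j,k)*(sclr(i+1,j,k)-sclr(i-1,j,k))/(2.0*dx)
     :          +v(i,j,k)*(sclr(i,j+1,k)-sclr(i,j-1,k))/(2.0*dy)
     :          +w(i,j,k)*(sclr(i,j,k+1)-sclr(i,j,k-1))/(2.0*dz) )
  100 CONTINUE
c
      RETURN
      END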

These virtues, together with extensive internal and external documentation, are perhaps what set this code apart from most other application codes. The highly modular design and the clearly defined module interfaces greatly ease the process of code modification and the addition of new packages. The clean and uniform coding style throughout the model, as well as the external documentation, has proved to be extremely beneficial to both novice and experienced users alike. The former also makes the porting of the code to a variety of MPP platforms straightforward [4,7,13].

2.2. Relevant model features

The governing equations of the atmospheric model component of the ARPS [15] include momentum, heat (potential temperature), mass (pressure), water constituents and the equation of state. The ARPS solves prognostic equations for u, v, w, θ′, p′ and q_ψ, which are, respectively, the x, y and z components of the Cartesian velocity, the perturbation potential temperature, the perturbation pressure and the six categories of water constituents (water vapor, cloud water, rainwater, cloud ice, snow and hail). The equation of state for an atmosphere containing water constituents is given by

\[
\rho \;=\; \frac{p}{R_d T}\,
\frac{1 + q_v + q_{\mathrm{liquid+ice}}}{1 + q_v/\varepsilon},
\]

where T is the air temperature, R_d the gas constant for dry air and ε = R_d/R_v = 0.622 the ratio of the gas constants for dry air and water vapor; q_{liquid+ice} represents the total liquid and ice water content of the air. The conservation equations are, respectively,

\[
\frac{\partial(\bar{\rho}u)}{\partial t} = -\mathrm{ADV}(u)
  - \frac{\partial (p' - \alpha\,\mathrm{Div})}{\partial x}
  + \bar{\rho} f v - \bar{\rho}\tilde{f} w + D_u,
\]
\[
\frac{\partial(\bar{\rho}v)}{\partial t} = -\mathrm{ADV}(v)
  - \frac{\partial (p' - \alpha\,\mathrm{Div})}{\partial y}
  - \bar{\rho} f u + D_v,
\]
\[
\frac{\partial(\bar{\rho}w)}{\partial t} = -\mathrm{ADV}(w)
  - \frac{\partial (p' - \alpha\,\mathrm{Div})}{\partial z}
  + \bar{\rho} B + \bar{\rho}\tilde{f} u + D_w,
\]
\[
\frac{\partial(\bar{\rho}\theta')}{\partial t} = -\mathrm{ADV}(\theta')
  - \bar{\rho} w \frac{\partial \bar{\theta}}{\partial z} + D_\theta + S_\theta,
\]
\[
\frac{\partial p'}{\partial t} =
  -\Bigl( u\frac{\partial p'}{\partial x} + v\frac{\partial p'}{\partial y}
        + w\frac{\partial p'}{\partial z} \Bigr)
  + \bar{\rho} g w
  - \bar{\rho}\,\bar{c}_s^{\,2}\Bigl( \frac{\partial u}{\partial x}
        + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} \Bigr),
\]
\[
\frac{\partial(\bar{\rho}q_\psi)}{\partial t} = -\mathrm{ADV}(q_\psi)
  + \frac{\partial(\bar{\rho} V_\psi q_\psi)}{\partial z}
  + D_{q_\psi} + S_{q_\psi},
\]

where an overbar denotes the base state, Div = ∂u/∂x + ∂v/∂y + ∂w/∂z is the velocity divergence, α a divergence damping coefficient, B the buoyancy, f and f̃ the Coriolis parameters, c̄_s the acoustic wave speed, V_ψ the terminal fall speed of the precipitating water categories, D the subgrid-scale mixing terms, S the source and sink terms, and

\[
\mathrm{ADV}(\phi) = \bar{\rho} u \frac{\partial \phi}{\partial x}
  + \bar{\rho} v \frac{\partial \phi}{\partial y}
  + \bar{\rho} w \frac{\partial \phi}{\partial z}.
\]

These equations are represented in a curvilinear coordinate system projected onto a plane tangent to or intercepting the earth's surface. This coordinate system is orthogonal in the horizontal, with the coordinate surface at the lower boundary defined to follow the terrain. In addition to the above equations, the ARPS also solves prognostic equations for 5 variables in a coupled two-layer soil model. Microphysical processes are represented by the Kessler warm rain parameterization or by a scheme that includes three ice phases. Sub-grid scale processes are parameterized using the Smagorinsky/Lilly first-order closure scheme, a 1.5-order sub-grid scale turbulent kinetic energy-based parameterization, or the Germano dynamic closure scheme. The model also includes a careful treatment of the surface and boundary layer processes. A radiation package that includes cloud-radiation interaction, based on that of the NASA/Goddard Space Flight Center, is available in the ARPS.

The continuous equations are solved using finite difference methods on a staggered Arakawa C-grid (Fig. 1) [15]. The mode-splitting time integration technique of Klemp and Wilhelmson [8] is employed. The large time step integration uses a leap-frog time differencing scheme. The advection terms are discretized using second- or fourth-order centered differencing, while most other terms are discretized using second-order differencing.

Fig. 1. ARPS' implementation of a staggered Arakawa C grid.
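As a schematic illustration of the split-explicit idea just described (a simplified sketch, not the exact ARPS discretization), the terms of each prognostic equation can be divided into slowly varying forcings F_slow (advection, buoyancy, mixing, physics) and acoustically active terms F_fast (the pressure-gradient and divergence terms). Within one large leap-frog step the solution is carried from t^{n-1} to t^{n+1} with many small steps Δτ ≪ Δt,

\[
\frac{\phi^{\tau+\Delta\tau}-\phi^{\tau}}{\Delta\tau}
  = F_{\mathrm{fast}}\bigl(\phi^{\tau}\bigr)
  + F_{\mathrm{slow}}\bigl(\phi^{n}\bigr),
\qquad \tau = t^{\,n-1},\; t^{\,n-1}+\Delta\tau,\; \ldots,\; t^{\,n+1},
\]

so that the expensive slow forcing is evaluated only once per large step, while the sound-wave terms are integrated stably with the small step; treating their vertical parts implicitly (the Crank-Nicolson step described below) removes the vertical sound-wave restriction on Δτ.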

Additional choices of monotonic advection schemes are also available for scalars. The small time step integration uses a Crank-Nicolson scheme which solves the w and p equations implicitly in the vertical direction. In addition, second-order and fourth-order numerical diffusion are included in the model to damp small-scale noise.

The vertical implicit schemes used to handle the sound wave modes, as well as the vertical subgrid-scale mixing, render the computations non-local in the vertical direction. Furthermore, many of the physical processes, such as radiation and cumulus parameterization, require columnwise computations that are non-local in the vertical direction. Therefore, a domain decomposition strategy can usually only be carried out efficiently in the horizontal direction. With the fourth-order advection and/or numerical diffusion, five grid points are involved in a single time step in each horizontal direction. However, we chose to implement these calculations in two steps, each involving only three grid points, so that only one 'fake' zone is needed at subdomain boundaries. Of course, data in the single 'fake' zone have to be updated after each of these two steps, implying communication between processors (a sketch illustrating this two-pass approach is given at the end of this subsection).

Rigid wall, periodic, zero-gradient, wave-radiating open boundary and externally forced boundary condition options are available in the ARPS. Optional upper-level Rayleigh damping can be used to control top-boundary wave reflection. A wave-radiating top boundary after Klemp and Durran [9] is also available. The latter requires the use of an FFT in the horizontal direction on a single level at the top boundary, which is the only non-local operation in the horizontal direction of the ARPS model. An efficient version of the FFT routine is currently being developed at CAPS.

The ARPS has a full adaptive grid refinement capability, which provides multiple levels of two-way interactive grid nesting and options for adding, removing and moving grids in response to the flow evolution. The current implementation of the nested grid portion only works on shared memory platforms. Future effort will be put into the implementation of a parallel version.
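The sketch below illustrates this two-pass treatment for the fourth-order horizontal diffusion term in x. It is not the actual ARPS routine, and the halo-update call exchange_halo_x is hypothetical; the point is that each pass uses only a three-point stencil, so a single fake zone per subdomain boundary, refreshed between the passes, is sufficient. The fourth-order advection terms can be handled in the same way.

      SUBROUTINE diff4x(nx,ny,nz, cdif4, phi, dif, tem)
c
c#######################################################################
c     Illustrative sketch only (not the actual ARPS code): fourth-order
c     horizontal diffusion in x computed as two three-point passes, so
c     that a single fake zone per subdomain boundary is sufficient.
c     The halo-update routine exchange_halo_x is hypothetical.
c
c     INPUT : nx,ny,nz   local (subdomain) grid dimensions
c             cdif4      fourth-order diffusion coefficient
c             phi        field to be damped (fake zones already updated)
c     OUTPUT: dif        fourth-order diffusion term added to d(phi)/dt
c     WORK  : tem        temporary array for the intermediate pass
c#######################################################################
c
      IMPLICIT NONE
      INTEGER nx,ny,nz
      REAL cdif4
      REAL phi(nx,ny,nz), dif(nx,ny,nz), tem(nx,ny,nz)
      INTEGER i,j,k
c
c     Pass 1: second difference of phi (3-point stencil).
c
      DO 10 k = 1, nz
        DO 10 j = 1, ny
          DO 10 i = 2, nx-1
            tem(i,j,k) = phi(i+1,j,k) - 2.0*phi(i,j,k) + phi(i-1,j,k)
   10 CONTINUE
c
c     Update the single fake zone of the intermediate field before the
c     second pass (message passing with the neighbouring subdomains).
c
      CALL exchange_halo_x(tem, nx, ny, nz)
c
c     Pass 2: second difference of the intermediate field gives the
c     (damping) fourth-order term, again with only a 3-point stencil.
c
      DO 20 k = 1, nz
        DO 20 j = 1, ny
          DO 20 i = 2, nx-1
            dif(i,j,k) = -cdif4*( tem(i+1,j,k) - 2.0*tem(i,j,k)
     :                           + tem(i-1,j,k) )
   20 CONTINUE
c
      RETURN
      END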

3. Parallel implementation

In keeping with the characteristics of the ARPS model, the message-passing version uses domain decomposition with one subdomain per processor; each processor runs the full model code on its own subdomain. Each processor performs its own I/O to read in and write out the data it is responsible for.

The primary goal of CAPS is to build a portable storm-scale weather model which can produce a weather forecast 5-10 times faster than the evolving weather phenomena. Traditional methods of supercomputing, i.e. shared-memory vector computers and automatically parallelizing compilers, though sufficient for research and prototype testing, will fall far short of the computing power needed to produce a reliable storm-scale forecast. CAPS researchers estimate that a domain on the order of 1000 × 1000 × 16 km³ will be required for a storm-scale forecast. Such a model would require approximately 5000 floating point operations per grid point per time step, with basic physics.


The model will have to compute at a rate of about 130 Gflops (5000 × 1000 × 1000 × 32 × 5/6 floating point operations per second) to run 5 times faster than real time, with 1 km resolution in the horizontal, 0.5 km resolution in the vertical and a six-second time step.
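Spelling out the arithmetic behind that figure (the 32 levels come from the 16 km depth at 0.5 km vertical spacing):

\[
\frac{5000 \ \text{flops/point} \times (1000 \times 1000 \times 32) \ \text{points} \times 5}
     {6 \ \mathrm{s}}
\;\approx\; 1.3 \times 10^{11} \ \text{flops/s}
\;\approx\; 130 \ \text{Gflops}.
\]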

3.1. Translation tools

In order to conform with emerging standards such as HPF and MPI, to aid rapid prototyping and to avoid vendor-specific software, translation tools were developed which preserve the original source form while maintaining performance. Initially, a translator was developed to convert Fortran 77 source code to array syntax and insert data distribution directives for a data-parallel port. The translator supported both the Cray Adaptive Fortran (CRAFT) and HPF environments. A simple translator was then written to implement message passing by converting comment-line 'markers' into library calls. The initial implementation of these 'markers' was a set of simple descriptions of the actions to be taken at a given source line, for example 'send buffer X to processor on the left'; these were written using a simple syntax that could be translated with awk into Fortran 77 code. Additional code modifications, such as message-passing library initialization, were done by hand.

In the current version of ARPS, the translator parses the source code for comment lines starting with the string 'cMP' and then alters the code immediately following according to the specific marker type. Markers for inserting, removing and modifying the original source code have been defined. For example, source code following 'cMP insert' is inserted directly into the program. Typically, the insert marker would be used for message-passing library initialization or global reduction type operations. The 'cMP remove' marker instructs the translator to remove the next source line from the program. A 'cMP if' marker identifies 'if statements' which must be altered so that they are executed only at global domain boundaries and not at internal subdomain boundaries. Finally, a 'cMP bc' marker identifies statements associated with global boundary conditions (e.g. rigid wall, periodic) where an exchange of data between processors might be required.

cMP bc 2d real !Message passing marker
      IF (wbc .eq. 2) THEN
        DO 50 j = 1, ny-1
          DO 50 k = 2, nz-2
            tem2(1,j,k) = tem2(nx-2,j,k)
   50   CONTINUE
      ENDIF

The above 'cMP bc 2d real' directive instructs the translator that the following 'if statement' implements a boundary condition involving the exchange of a two-dimensional real array. The translator parses the 'if statement', recognizes the exchange direction (in this case the western boundary, wbc) and then generates the necessary message-passing calls to implement the data exchange, along with additional do loops to copy the exchanged data in and out of temporary arrays. Following translation, the above code fragment becomes:

cMP bc 2d real !Message passing marker
      call inctag
c
c     Send data:
c
      IF (loc_x .ne. nproc_x) THEN
        DO j = 1, ny-1
          DO k = 2, nz-2
            tems2dw(j,k) = tem2(nx-2,j,k)
          ENDDO
        ENDDO
        call mpi_send(tems2dw, NY_MX*NZ_MX, MPI_REAL,
     :                proc(loc_x+1,loc_y), gentag+tag_2dw,
     :                MPI_COMM_WORLD, imstat)
      ELSEIF (wbc .eq. 2) THEN
        DO j = 1, ny-1
          DO k = 2, nz-2
            tems2dw(j,k) = tem2(nx-2,j,k)
          ENDDO
        ENDDO
        call mpi_send(tems2dw, NY_MX*NZ_MX, MPI_REAL,
     :                proc(1,loc_y), gentag+tag_2dw,
     :                MPI_COMM_WORLD, imstat)
      ENDIF
c
c     Receive data:
c
      IF (loc_x .ne. 1) THEN
        call mpi_recv(temr2dw, NY_MX*NZ_MX, MPI_REAL,
     :                proc(loc_x-1,loc_y), gentag+tag_2dw,
     :                MPI_COMM_WORLD, mpi_status, imstat)
        DO j = 1, ny-1
          DO k = 2, nz-2
            tem2(1,j,k) = temr2dw(j,k)
          ENDDO
        ENDDO
      ELSEIF (wbc .eq. 2) THEN
        call mpi_recv(temr2dw, NY_MX*NZ_MX, MPI_REAL,
     :                proc(nproc_x,loc_y), gentag+tag_2dw,
     :                MPI_COMM_WORLD, mpi_status, imstat)
        DO 50 j = 1, ny-1
          DO 50 k = 2, nz-2
c           tem2(1,j,k) = tem2(nx-2,j,k)
            tem2(1,j,k) = temr2dw(j,k)
   50   CONTINUE
      ENDIF

In the above example, the translator generated calls to the portable Message Passing Interface (MPI) library, but it can also generate calls to the Parallel Virtual Machine (PVM) message-passing library. The variables loc_x and loc_y specify the location of a given processor in a 2D logical processor mesh having dimensions nproc_x × nproc_y. The subroutine call to inctag generates a unique tag for each message, where identical tags are employed by both the sender and the receiver. Storage for the send and receive buffers (tems2dw and temr2dw above) is allocated by the user at compile time and is declared in global common for compatibility with the Cray shmem distributed shared-memory library specification. In addition, the translator can generate code to exchange one- and two-dimensional real and integer data between processors in any of the four horizontal directions, including 'corners' in the overlap 'halo' region.
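As an illustration of the bookkeeping that the generated calls rely on, the sketch below shows one plausible way to build the proc(i,j) table and the loc_x, loc_y indices from an MPI rank. It is not the actual ARPS initialization code; the routine name and the row-by-row rank ordering are assumptions.

      SUBROUTINE initmesh(nproc_x, nproc_y, loc_x, loc_y, proc)
c
c#######################################################################
c     Hypothetical set-up of the 2D logical processor mesh assumed by
c     the generated message-passing code (illustrative only).
c     MPI is assumed to have been initialized by the caller.
c#######################################################################
c
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER nproc_x, nproc_y
      INTEGER loc_x, loc_y
      INTEGER proc(nproc_x, nproc_y)
      INTEGER myrank, ierr, i, j
c
      CALL mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
c
c     proc(i,j) holds the MPI rank of the processor at mesh position
c     (i,j); ranks are assumed to be laid out row by row.
c
      DO 10 j = 1, nproc_y
        DO 10 i = 1, nproc_x
          proc(i,j) = (j-1)*nproc_x + (i-1)
   10 CONTINUE
c
c     1-based position of this processor in the logical mesh.
c
      loc_x = mod(myrank, nproc_x) + 1
      loc_y = myrank/nproc_x + 1
c
      RETURN
      END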

3.2. Code portability

The translator currently generates calls to the Parallel Virtual Machine (PVM) or MPI. As a result, the model can be run on all platforms which support these environments, including the Cray T3D, the IBM SP-2 and networks of workstations. Since most parallel platforms lack parallel I/O support, each processor manages its own file read/write operations. We have created tools that split and merge the input and output data files; these tools carry out their operations based on the parameters specified in the model input file.

A Unix script, makearps, written to handle the entire translation, compilation and linking process, presents a common interface across all platforms. It can be controlled by user-supplied options which specify compilation options, external libraries to link, etc. Generally, a user can modify the Fortran code freely without much knowledge of the message-passing paradigm and practices. If the user is planning a parallel run, makearps builds the tools that split the input data files and merge the output data files, as well as the translator. The translator then creates the parallel version of the code, which is compiled and returned to the user. In the case that the modifications do affect boundary communication, the user can easily learn to add the necessary directives.
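To make the split/merge idea concrete, here is a minimal sketch of what such a splitting tool might do for a single 2D field. It is not the actual ARPS utility; the file names, the 2 × 2 processor mesh and the overlap convention (local size nxl = (nxg-3)/nproc_x + 3, consistent with the 67-point and 9-point subdomain sizes quoted elsewhere in this paper) are assumptions.

      PROGRAM splitfld
c
c#######################################################################
c     Hypothetical example: split a global 2D field into one file per
c     subdomain of a 2 x 2 logical processor mesh, keeping the fake
c     (overlap) zones so each processor can read its file directly.
c#######################################################################
c
      IMPLICIT NONE
      INTEGER nxg, nyg, npx, npy, nxl, nyl
      PARAMETER (nxg=67, nyg=67, npx=2, npy=2)
      PARAMETER (nxl=(nxg-3)/npx+3, nyl=(nyg-3)/npy+3)
      REAL globe(nxg,nyg), local(nxl,nyl)
      CHARACTER*16 fname
      INTEGER ip, jp, i, j
c
      OPEN (10, FILE='field.global', FORM='unformatted')
      READ (10) globe
      CLOSE (10)
c
      DO 30 jp = 1, npy
        DO 30 ip = 1, npx
          DO 20 j = 1, nyl
            DO 20 i = 1, nxl
              local(i,j) = globe((ip-1)*(nxl-3)+i, (jp-1)*(nyl-3)+j)
   20     CONTINUE
          WRITE (fname,'(A,I2.2,A,I2.2)') 'field.', ip, '_', jp
          OPEN (11, FILE=fname, FORM='unformatted')
          WRITE (11) local
          CLOSE (11)
   30 CONTINUE
c
      END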

and scalabilit_,

The performance of the message passing version of the ARPS model has been investigated by running on several platforms: a Cray J90, an IBM SP2 (thin node2), a SGI Power Challenge, and a Cray T3D. On all platforms a distributed memory paradigm was used (even on the shared memory J90 and Power Challenge), with PVM message passing. The only optimization applied was through compiler options. A real domain of 64 X 64 X 16 km with a 1 km horizontal and 500 m vertical resolution equated to a 67 X 67 X 35 computational domain, including fake zones. The time step was 6 s and the simulation ran for 1 h of simulated time. The results are summarized in Table 1. For the J90 and SGI the wallclock time was proportional to N-o-9 (where N is the number of processors) for 1 to 8 processors, but the performance flattened off greatly for 16 processors. This is likely due to the SGI’s limit on memory bus bandwidth and the fact that the message passing used network IO on the J90 ;qstead of more efficient internal communication. The wallclock time for the SP2 and T3u scaled as N-O.* over the range tested. The performance for one processor was 83, 65 and 61 Mflops for the J90, SP2

A! Sathye et al. / Parullel

2252

Compuring

23 (1997) 2243-2256
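The quoted exponents follow directly from the end-point timings in Table 1 below; for example,

\[
\alpha_{\mathrm{SP2}} \approx \frac{\ln(7200/777)}{\ln 16} \approx 0.80,
\qquad
\alpha_{\mathrm{T3D}} \approx \frac{\ln(9300/990)}{\ln(64/4)} \approx 0.81,
\]

i.e. the wallclock time on these two machines falls off roughly as N^{-0.8} over the processor counts tested.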

Table 1
ARPS performance on parallel machines: wallclock time (s)

                  Cray J90    IBM SP2 (thin node 2)    SGI Power Challenge    Cray T3D
1 processor         5700              7200                    7700                -
4 processors          -                 -                       -               9300
8 processors         960              1200                    1200               5200
16 processors        930               777                     960               2900
32 processors         -                 -                       -               1800
64 processors         -                 -                       -                990

4. Operational weather prediction

CAPS started yearly evaluation and testing of the ARPS in 1993. These tests, termed the cooperative regional assimilation and forecast test (CRAFT), were a means of involving the operational community with CAPS' development efforts [7]. The tests in 1993 and 1994 were patterned after project STORMTIPE [1], while in 1995 and 1996 the ARPS was tested in a full NWP mode. The primary goal of these experiments was to gain experience in evaluating storm-scale models and in the logistics of integrating them into an operational environment, as a prototype for future forecast offices. The experiments were very useful in presenting the constraints and challenges of an operational environment.

The operational runs during the Spring of 1995 and 1996 were closer to the storm-scale prediction concept envisioned by CAPS. The model runs were more comprehensive than in previous years, and an MPP was used, for the first time ever, in operational severe weather prediction [3,5,14,16,17]. A 3 km resolution storm-scale forecast was one-way nested inside a 15 km (9 km during the Spring of 1996) resolution large-domain run. These model runs included surface parameterization (soil type, vegetation type) and terrain, and were initialized from observations and a National Centers for Environmental Prediction (NCEP) forecast. The 1995 experiment was conducted in collaboration with the verification of the origins of rotation in tornadoes experiment (VORTEX) [11], while the Spring 1996 tests were conducted in collaboration with the National Weather Service Forecast Office and the National Storm Prediction Center in Norman, OK.

The Spring 1996 tests built on the 1995 experiments and included the use of NIDS (level III) digital wind and reflectivity data; the application of the new ARPS data analysis system (ADAS), a continuously operated data ingest, quality control and objective analysis package designed to provide the background state for Doppler radar data; the use of real-time Oklahoma Mesonet data; and the application of the ARPS single-Doppler velocity retrieval and forward-variational data assimilation system in non-real time. An ARPS forecast for 24 May 1996, made about 5 h before the event, and the actual radar image for that time period (Fig. 2) demonstrate the encouraging results achieved from the operational tests: the model produced generally the same storm type and motion as the actual storms.


Computing resources for these experiments were provided by the Pittsburgh Supercomputing Center (PSC). They provided dedicated time on two (six during the Spring of 1996) processors of their Cray C90 and a dedicated 256-node partition on their Cray T3D MPP system, which was used to demonstrate the feasibility of MPP machines for operational weather prediction. A 6 h forecast on a 300 km wide domain, at 3 km resolution and with a 6 s time step, took approximately 74 min on the T3D, five times faster than real time. On each processor there were 9 × 9 × 35 computational zones, but only 6 × 6 × 32 real zones. The performance with respect to actual computations, 2.9 Gflops (or about 11 Mflops per processor), compares quite well with the timing runs discussed above, but because of the small problem size on each processor, nearly half of the computation is done in the fake zones.

The joint efforts made by CAPS and the PSC in the development of the ARPS and in the operational forecasting of severe storms won CAPS and PSC two major prizes in 1997: the Discovery Magazine Award for Technical Innovation in the software category and the Computerworld Smithsonian Award in the science category. Both awards recognize the achievements of CAPS and collaborating institutions in developing and applying the Advanced Regional Prediction System to the prediction of severe weather.

5. Future plans

The ARPS Version 4.0 was officially released to the public for a variety of applications in September 1995, together with a comprehensive user's guide [15]. Since then many updates have been released. We will continue to develop and improve the ARPS in the coming years, through improvements to both the model numerics and the physics. We will explore the alternative of semi-implicit, semi-Lagrangian time integration strategies and their impact on the parallel implementation of the model. Continued emphasis will also be given to the testing of various parallel processing strategies, in particular techniques for handling two-way interactive grid nesting. Because of the past emphasis on code portability and readability, code performance has potential for improvement, and this will be pursued. In the area of model physics, emphasis will be given to improving the capabilities of the model's surface energy budget so as to accommodate snow cover, urban areas and snow melt. A microphysics scheme that also predicts the number concentration of hydrometeor species will be implemented. Furthermore, we will apply the parallel strategies used in the forward model of the ARPS to the data analysis and assimilation components of the ARPS, which include the full adjoint of the model.

6. Conclusion

The ARPS is a regional weather model designed to execute on a variety of computer systems, including MPPs. Since the spring of 1993, it has been tested each spring


(traditionally the storm season in and around Oklahoma) in a quasi-operational mode. During the past two seasons part of these runs were made on MPPs to test their suitability in a production environment. Because of the research-tool nature of the ARPS and its emphasis on code readability, the code performance has not been optimal, especially during these experiments. Future versions of the ARPS will include rewrites of certain core routines to improve cache re-use and performance.

Acknowledgements

CAPS is supported by Grant ATM91-20009 from the NSF and by a supplemental award through the NSF from the Federal Aviation Administration. Computer resources were provided by the Pittsburgh Supercomputing Center, which is also sponsored by the NSF. We would like to thank the operations staff at the Pittsburgh Supercomputing Center for providing us with exceptional computing facilities and support since CAPS began these experiments in the spring of 1993. We are also grateful for the continued support of our development and research efforts. Among the PSC staff, we would like to especially thank Ken McLain, Chad Vizino, David O'Neal and Sergiu Sanielevici. We would also like to thank Mike Tuttle of Silicon Graphics/Cray Research and the support staff at the Maui High Performance Computing Center.

References

[1] H.E. Brooks, C.A. Doswell III, L.J. Wicker, STORMTIPE: A forecasting experiment using a three-dimensional cloud model, Weather Forecast. 8 (1993) 352-362.
[2] K. Droegemeier, Toward a science of storm-scale prediction, Preprints, 16th Conference on Severe Local Storms, Kananaskis Park, Alberta, Canada, 1990.
[3] K. Droegemeier, M. Xue, A. Sathye, K. Brewster, G. Bassett, J. Zhang, Y. Liu, M. Zou, A. Crook, V. Wong, R. Carpenter, C. Mattocks, Realtime numerical prediction of storm scale weather during VORTEX-95: Goals and methodology, Preprints, 18th Conference on Severe Local Storms, 19-23 February, American Meteorological Society, San Francisco, CA.
[4] K. Droegemeier, M. Xue, K. Johnson, M. O'Keefe, A. Sawdey, G. Sabot, S. Wholey, N. Lin, K. Mills, Weather prediction: A scalable storm-scale model, in: G. Sabot (Ed.), High Performance Computing, Addison-Wesley, Reading, MA, 1995.
[5] K. Droegemeier, M. Xue, K. Brewster, Y. Liu, S. Park, F. Carr, J. Mewes, J. Zong, A. Sathye, G. Bassett, M. Zou, R. Carpenter, D. McCarthy, D. Andra, P. Janish, R. Graham, S. Sanielevici, J. Brown, B. Loftis, K. McLain, The 1996 CAPS spring operational forecasting period: Realtime storm-scale NWP. Part I: Goals and methodology, Preprints, 11th Conference on Numerical Weather Prediction, Norfolk, VA, 1996.
[6] P.R. Janish, K.K. Droegemeier, M. Xue, K. Brewster, J. Levit, Evaluation of the advanced regional prediction system (ARPS) for storm scale operational forecasting, Preprints, 14th Conf. on Weather Analysis and Forecasting, 15-20 January, American Meteorological Society, Dallas, TX, pp. 224-229.
[7] K.W. Johnson, J. Bauer, G.A. Riccardi, K.K. Droegemeier, M. Xue, Distributed processing of a regional weather model, Mon. Weather Rev. 122 (1994) 2558-2572.
[8] J. Klemp, R. Wilhelmson, The simulation of three-dimensional convective storm dynamics, J. Atmos. Sci. 35 (1978) 1070-1096.
[9] J. Klemp, D. Durran, An upper boundary condition permitting internal gravity wave radiation in numerical mesoscale models, Mon. Weather Rev. 111 (1983) 430-444.
[10] D. Lilly, Numerical prediction of thunderstorms: Has its time come?, Q. J. R. Meteorol. Soc. 116 (1990) 779-798.
[11] E.N. Rasmussen, J.M. Straka, R. Davies-Jones, C.A. Doswell III, F.H. Carr, M.D. Eilts, D.R. MacGorman, Verification of the origins of rotation in tornadoes experiment, Bull. Am. Meteorol. Soc. 75 (1994) 995-1005.
[12] G. Sabot, S. Wholey, J. Berlin, P. Oppenheimer, Parallel execution of a Fortran 77 weather prediction model, Proceedings of the Supercomputing '93 Conference, IEEE, Piscataway, NJ, 1993.
[13] A. Sathye, G. Bassett, K. Droegemeier, M. Xue, Towards operational severe weather prediction using massively parallel processors, Preprints, Proceedings of the International Conference on High Performance Computing, Tata McGraw Hill, India, 1995.
[14] A. Sathye, G. Bassett, K. Droegemeier, M. Xue, K. Brewster, Experiences using high performance computing for operational storm scale weather prediction, Concurr. Pract. Exper. 8 (10) (1996) 731-740.
[15] M. Xue, K. Droegemeier, V. Wong, A. Shapiro, K. Brewster, ARPS Version 4.0 User Guide, Center for Analysis and Prediction of Storms, University of Oklahoma, 1995, available at http://www.caps.ou.edu/ARPS/ARPS4.guide.html.
[16] M. Xue, K. Brewster, F. Carr, K. Droegemeier, V. Wong, Y. Liu, A. Sathye, G. Bassett, P. Janish, J. Levit, P. Bothwell, Realtime numerical prediction of storm scale weather during VORTEX-95. Part II: Operations summary and example predictions, Preprints, 18th Conference on Severe Local Storms, 19-23 February, American Meteorological Society, San Francisco, CA.
[17] M. Xue, K. Brewster, K. Droegemeier, V. Wong, D. Wang, F. Carr, A. Shapiro, L. Zhao, S. Weygandt, D. Andra, P. Janish, The 1996 CAPS spring operational forecasting period: Real time storm-scale NWP. Part II: Operational summary and examples, Preprints, 11th AMS Conference on Numerical Weather Prediction, American Meteorological Society, Norfolk, VA, 1996, pp. 169-173.
[18] M. Xue, K. Droegemeier, The Advanced Regional Prediction System (ARPS), a multiscale nonhydrostatic atmospheric model: Model dynamics, Mon. Weather Rev., submitted.