Computer Physics Communications 37 (1985) 357-361
North-Holland, Amsterdam

PARALLEL PROCESSING APPLIED TO DIGITAL TERRAIN MATRICES

A. RUFFHEAD
Mapping and Charting Establishment RE, Elmwood Av., Feltham, Middx TW13 7AF, UK

In the production of geographic data covering regions of the world, a typical project requirement is a set of matrices totalling 100 million heights interpolated from contour data. An investigation on a sample area using the DAP shows a 10 to 1 advantage in speed, even when based on a method originally designed for a serial computer. The use of the DAP to manipulate and re-arrange data with inconvenient structures is also demonstrated.

1. Introduction

Digital terrain modelling consists of determining height values at matrix points over a region of the earth's surface. Each constituent set of heights, called a "digital terrain matrix", covers the whole or a fraction of a geographic degree square. In all cases considered in this paper, the matrix interval corresponds to about 30 m on the ground. In production work at the Mapping & Charting Establishment, this is one of 2 interval sizes used, the other being 100 m for a lower density of coverage. A typical project requirement in terrain modelling is a set of around 150 terrain matrices totalling over 100 million points on the earth's surface. One way of generating them is to interpolate the required heights from contour (and other associated) data.

To assess the feasibility of parallel processing in this field, the problem selected for study was interpolation within a single terrain matrix, comprising about 0.4 million points.

2. The model problem

The input data is taken to be already organised into "profiles", which correspond to rows of the matrix ("X-profiles") or columns ("Y-profiles"). In either case, the input file consists of "control cuts", which are basically the intercepts made by the profiles with the contours. It also includes the height values at the start and finish of each profile (as a result of preliminary interpolations along the terrain matrix boundaries). The form of the contour data and control cuts is illustrated in figs. 1 and 2; these also show the matrix points, or "nodes", where interpolated heights are required.


Fig. 1. Part of a contour map with superimposed matrix nodes.


Fig. 2. Section drawing of part of a profile. (Control cuts are ringed; nodes are indicated by arrows.)

As the diagrams show, not all control cuts come from contours. Some are spot heights (where the gradient will be zero), some are lake edges (where there is a zero gradient from one side but not necessarily from the other) and some are cliff edges (where heights are undefined). The input data therefore includes "feature type", a numerical field which identifies the type of each control cut.

The interpolation itself is done by a piecewise cubic function fitted along each profile. Except at the edges of cliffs and lakes, each function and its first derivative are continuous. In general, interpolation along the X-profiles will generate a different matrix of heights from interpolation along the Y-profiles. The final terrain matrix is derived as a weighted mean of these 2 components. For each node the appropriate weights are a measure of proximity to the nearest control cut along the orthogonal profiles (such as the inverse square of the distance).
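The paper describes this blend only in words. As a sketch, writing h_X and h_Y for the heights interpolated along the X- and Y-profiles at a node, and d_X, d_Y for the corresponding proximity measures (distance to the nearest control cut, as described above), the inverse-square example quoted in the text amounts to something of the form:

```latex
% Illustrative only: h_X, h_Y, d_X, d_Y are not the paper's own notation.
\[
  h \;=\; \frac{w_X\,h_X + w_Y\,h_Y}{w_X + w_Y},
  \qquad
  w_X = \frac{1}{d_X^{\,2}}, \quad w_Y = \frac{1}{d_Y^{\,2}} .
\]
```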

3. Defining the problem for the DAP

Height interpolation at millions of points is a task tailor-made for the ICL Distributed Array Processor. The DAP actually used in this investigation is at London University Queen Mary College (QMC) and has 64 x 64 processors. Since each individual terrain matrix comprises hundreds of thousands of nodes, it has to be generated in sub-units with not more than 4096 nodes in each. One obvious option is a "patchwork" partition whereby the DAP generates submatrices up to 64 x 64 in size. Alternatively, the terrain matrix can be produced in "long strips", each consisting of a small number of complete profiles whose combined total of nodes does not exceed 4096; if there are (say) 621 nodes per profile, then the DAP can process up to 6 profiles simultaneously.

The "patchwork" method has an advantage in that when a submatrix is on the DAP, interpolation can be done in orthogonal directions and a submatrix of weighted means can be generated at the same time. With "long strips", the component matrices have to be generated in separate calls to the DAP, although the DAP can still be used to mean them. One drawback of "patchwork" partitioning is that the input file of control cuts has to be broken down into sub-units that overlap rather messily (in order that the edges of the "patches" blend together). In any case, with one profile of control cuts following another, the contour data was already structured in a form suitable for the "long strips" approach, so that method was adopted (more for convenience than proven superiority). The DAP was therefore employed as a 4096-element vector processor.

In the "long strips" approach, the half-dozen (or so) profiles are fed to the DAP to be treated as a "composite profile". For the purpose of numbering the nodes, each profile is disguised as a continuation of the previous one, and the intercept values of the control cuts are transformed accordingly. Logical masks are used to cover the control cuts which open and close a profile, so that (for example) the height at the end of one profile is not used in the estimation of the gradient at the start of the next.
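As an illustration of this "composite profile" bookkeeping, the following sketch (modern Python/NumPy rather than the DAP FORTRAN actually used; the array layout and names are invented for illustration) offsets each profile's control-cut intercepts so that it appears as a continuation of the previous profile, and flags the cuts that open and close each profile for later masking:

```python
import numpy as np

def make_composite(profiles, nodes_per_profile):
    """Treat several profiles as one long 'composite profile'.

    profiles: list of 1-D arrays of control-cut intercepts, each expressed
              in node units local to its own profile (assumed structure).
    Returns the concatenated intercepts, shifted so that profile k reads as
    a continuation of profile k-1, plus a boolean mask marking the cuts that
    open or close a profile (the ones the paper hides behind logical masks
    when gradients are estimated across profile boundaries).
    """
    intercepts = []
    boundary = []
    for k, cuts in enumerate(profiles):
        shifted = np.asarray(cuts, dtype=float) + k * nodes_per_profile
        intercepts.append(shifted)
        flags = np.zeros(len(cuts), dtype=bool)
        flags[0] = flags[-1] = True          # opening and closing cuts
        boundary.append(flags)
    return np.concatenate(intercepts), np.concatenate(boundary)

# Example: 6 profiles of 621 nodes each fit within the 4096-element DAP vector.
profiles = [np.sort(np.random.uniform(0, 621, size=50)) for _ in range(6)]
composite_cuts, boundary_mask = make_composite(profiles, nodes_per_profile=621)
```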


This simulation of a single composite profile is to be done on the DAP itself. When the half-dozen profiles are to be processed on the DAP, only the control cuts along those particular profiles are needed as input. There could be as few as 300 control cuts compared with over 3600 nodes on those profiles. Nevertheless, it is expedient to allow for 4096 control cuts, in the knowledge that (a) unassigned array elements can easily be given dummy values once they are vector elements in the DAP, and (b) logical masks can be applied to dummy control cuts wherever necessary. The requirement can now be specified as a DAP subroutine which interpolates heights at the nodes on a composite profile, using the corresponding control cuts (augmented by dummies to make a total of 4096).
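A minimal sketch of that padding convention, again in Python/NumPy rather than DAP FORTRAN (the sentinel value and names are assumptions, not the paper's):

```python
import numpy as np

DAP_SIZE = 64 * 64          # 4096 processing elements
DUMMY = -9999.0             # assumed sentinel for unassigned elements

def pad_to_dap_vector(values):
    """Pad a short vector of control-cut values out to 4096 elements.

    Returns the padded vector and a logical mask that is True only for the
    real control cuts, so that later parallel steps can ignore the dummies.
    """
    values = np.asarray(values, dtype=float)
    padded = np.full(DAP_SIZE, DUMMY)
    padded[: values.size] = values
    valid = np.zeros(DAP_SIZE, dtype=bool)
    valid[: values.size] = True
    return padded, valid

# e.g. only ~300 real control cuts on the profiles being processed
heights, height_mask = pad_to_dap_vector(np.random.uniform(0.0, 500.0, size=300))
```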

4. Structure of the interpolation subroutine

The DAP subroutine can be summarized in terms of 5 stages. The first and last of these (outlined in subsections 4.1 and 4.5 below) are concerned with the transfer of data between the DAP and the host program written in 2900 FORTRAN. The body of the subroutine is concerned not only with the interpolation (4.4 below), but with how to go from a structure of one-control-cut-to-one-processor to a structure of one-node-to-one-processor. The re-arrangement of data is performed in the third stage (4.3 below) after the preparatory work (4.2 below). The QMC DAP Support Unit provided 4 utility routines to overcome data-shifting problems, including one that arose in the initial phase.

4.1. Converting input into DAP mode

The host program reads input records serially, each one corresponding to a control cut and consisting of 4 numbers a_n, b_n, c_n, d_n. (The numbers are actually profile number, intercept value, feature type and height.) Due to the limitations of FORTRAN, efficiency demands that they be read into 4 interwoven arrays (a1, b1, c1, d1, a2, b2, c2, d2, ..., a4096, b4096, c4096, d4096). To access the standard routines for converting arrays from 2900 mode into DAP mode, they would have to be stored as separate arrays (a1, a2, ..., a4096, b1, b2, ..., b4096, c1, c2, ..., c4096, d1, d2, ..., d4096). The QMC utility routine "CONV FT M4" avoids what would have been a transposition problem in the host program and converts the 4 interwoven arrays into 4 long vectors. The effect of this is to put the nth set of numbers (a_n, b_n, c_n, d_n) into store element n; that is, into the column of store below the nth processor. This routine uses the parallelism of the DAP to perform the necessary data movements. It is sufficiently general to convert 2, 4, 8 or 16 interwoven arrays into DAP mode, as a corresponding number of long vectors.
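The effect of "CONV FT M4" can be pictured with a small NumPy sketch. This is only an analogy for the data rearrangement it performs, not the QMC routine, and the record layout shown is an assumption:

```python
import numpy as np

N = 4096                                   # one record per processing element
# Interwoven input as read by the host program: a1, b1, c1, d1, a2, b2, ...
interwoven = np.random.uniform(size=4 * N)

# Split into 4 long vectors so that record n (profile number, intercept,
# feature type, height) lands in "store element" n of each vector.
profile_no, intercept, feature_type, height = interwoven.reshape(N, 4).T
```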

4.2. Preparing intervals for interpolation

Each processor has one control cut, and by a simple shift can be given the data of the next control cut. It now has an interval A ≤ x < B which could contain any number of nodes (possibly none). In this instance the 4 parameters chosen to determine the cubic "piece" in each interval are the heights and gradients at A and B. Gradients are calculated for the "A" points in parallel, then for the "B" points in parallel. In each case they are calculated using techniques developed at the Mapping & Charting Establishment for contour map data. Where these techniques require data at neighbouring control cuts, it is accessed (in parallel) via simple shift functions. Each control cut is, in fact, assigned a gradient twice (once for the interval it starts, once for the interval it ends). Gradient discontinuities can be introduced wherever appropriate (see fig. 3), but for each interval the gradients at A and B are unique.

Fig. 3. Example of a discontinuity in gradient. (The gradient at P is zero for the interval containing nodes 69-76, but positive for the interval containing the nodes from 77 onwards.)

When gradients are being calculated, some loss of parallelism is inevitable because of special cases. These can, however, be grouped into categories. One such category is where the height at the next control cut is not available for use in gradient estimation (due to unsuitable feature types or the end of a profile) while the heights at the 2 preceding cuts are available. Within each category of cases, gradients are calculated in parallel. Each interval now has sufficient data for height interpolation at the nodes in that interval, although the data is not yet suitably located for parallel interpolation.
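The shift-and-mask pattern for one such category can be sketched as follows. Python again stands in for the DAP FORTRAN, and the gradient formulae shown are plain finite differences, not the Mapping & Charting Establishment techniques (which the paper does not spell out); the values are invented:

```python
import numpy as np

# One control cut per (virtual) processor along the composite profile:
# position x and height h at each cut.
x = np.sort(np.random.uniform(0.0, 3726.0, size=4096))
h = np.random.uniform(0.0, 500.0, size=4096)
usable = np.random.rand(4096) > 0.1   # False where a cut's height may not be used

# A simple shift gives each processor the data of the next control cut,
# i.e. the other end of the interval A <= x < B that it owns.
dx = np.roll(x, -1) - x
dh = np.roll(h, -1) - h

# Default category: forward difference towards the next cut.
grad_A = dh / dx

# One special-case category from the text: the next cut is unusable while the
# two preceding cuts are available, so fall back on a backward difference.
category = ~np.roll(usable, -1) & np.roll(usable, 1) & np.roll(usable, 2)
grad_back = (h - np.roll(h, 1)) / (x - np.roll(x, 1))
grad_A = np.where(category, grad_back, grad_A)
```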

4.3. Giving each processor the correct interval data

To interpolate heights at the nodes in parallel, nodes must be assigned to different processors, and the intervals must be relocated accordingly. For example, if the 5th interval contains nodes 77-82, the data for that interval is needed by store elements 77-82.

Firstly, the intervals which enclose no nodes are squeezed out by the QMC utility routine "PACK LEFT". It is similar to "Gather" and "Compress" routines written for vector processors. Another QMC utility routine, "DISPERSE RIGHT" (similar to the "Scatter" routines written for vector processors), is used to move the remaining intervals to the store elements of their largest nodes. For example, data for the interval containing nodes 77-82 goes into store element 82; it now needs to be copied into store elements 77-81. This last requirement is met by "REPLICATE LEFT", a QMC utility routine that copies data in specified store elements into the preceding gaps. All these routines use recursive doubling, a technique well-known to DAP users. The way it works in the case of "REPLICATE LEFT" is illustrated below:

[Illustration: the original string, with the shifted and merged strings after the 1st, 2nd and 3rd iterations of "REPLICATE LEFT".]

(On the nth iteration, the string is shifted left 2^(n-1) places. Note the disappearance, shown by asterisks but achieved by logical masks, of shifted elements which would have moved into spaces already filled. The string and its shifted version are merged each time, the process ending when all spaces are filled.)
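The same recursive-doubling idea can be rendered as a short Python/NumPy sketch. The routine name is the paper's; the implementation below is an illustrative rendering of the technique, not the QMC code:

```python
import numpy as np

def replicate_left(data, filled):
    """Copy each filled store element's value into the gaps to its left,
    by recursive doubling (shift left 1, 2, 4, ... elements and merge).

    data   : 1-D float array, one value per store element
    filled : boolean array, True where a value is present
    """
    data = data.copy()
    filled = filled.copy()
    n = len(data)
    shift = 1
    while shift < n and not filled.all():
        # Shifted copy of the string; positions beyond the end stay empty.
        moved_data = np.empty_like(data)
        moved_filled = np.zeros_like(filled)
        moved_data[:n - shift] = data[shift:]
        moved_filled[:n - shift] = filled[shift:]
        moved_data[n - shift:] = 0.0
        # Merge: shifted elements may only land in spaces not already filled
        # (the "disappearing" elements of the illustration, done by a mask).
        take = ~filled & moved_filled
        data[take] = moved_data[take]
        filled |= take
        shift *= 2
    return data

# Example: interval data sits at the store element of its largest node
# (cf. element 82 for the interval containing nodes 77-82) and is copied
# back over the preceding gaps.
vals = np.zeros(12)
mask = np.zeros(12, dtype=bool)
vals[[4, 9, 11]] = [10.0, 20.0, 30.0]
mask[[4, 9, 11]] = True
print(replicate_left(vals, mask))
# -> [10. 10. 10. 10. 10. 20. 20. 20. 20. 20. 30. 30.]
```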

4.4. Interpolating at the nodes

Each processor now corresponds to one node and has access to the correct interval data. Since the heights and gradients at the ends of an interval uniquely determine a cubic polynomial, all that remains is the application of a suitable formula. Interpolated heights at all nodes are generated in parallel.
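The "suitable formula" is not quoted in the paper. For reference, the standard cubic Hermite form on an interval [A, B], with L = B - A, t = (x - A)/L, end heights H_A, H_B and end gradients G_A, G_B, is given below; this is not necessarily the exact expression used in the DAP subroutine:

```latex
\[
  H(x) = (2t^{3}-3t^{2}+1)\,H_A + (t^{3}-2t^{2}+t)\,L\,G_A
       + (-2t^{3}+3t^{2})\,H_B + (t^{3}-t^{2})\,L\,G_B .
\]
```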

4.5. Converting output into 2900 mode

The heights are returned to the host program by standard storage mode conversion subroutines.


5. Performance and implications

The chosen test problem was height interpolation at 373,221 nodes on 601 profiles, from 28,371 control cuts. On the 64 x 64 DAP linked to the QMC 2980 computer, the task required 29 s of mill time, of which DAP time was 5 s; elapsed time was under 4 min. These were substantially less than the corresponding times on a serial machine (an ICL 2956 computer), which were 393 s (mill) and 44 min (elapsed).

It is expected that future DAP capability will be offered by ICL in the form of DAPs with 32 x 32 processors. The 4:1 reduction in parallelism is likely to be partially offset by the greater speed of the smaller DAP. (A threefold increase in DAP time would increase the total mill time from 29 to 39 s on the test problem.) Advice on the characteristics and anticipated performance of these mini-DAPs suggests that they could provide very cost-effective facilities for this type of production work. An additional advantage is that they would also meet the needs of production staff for interactive working in order to carry out their overall task more efficiently.

Acknowledgements

The author is grateful to the staff at the DAP Support Unit, Queen Mary College, for their specialist help and for access to the QMC DAP. Thanks are also due to the author's colleagues at the Mapping & Charting Establishment for their suggestions and advice.