Computer Physics Communications 38 (1985) 211-219 North-Holland, Amsterdam
LANGUAGES AND SOFTWARE DEVELOPMENT TOOLS FOR SUPERCOMPUTERS

Hiroshi INA, Sachio KAMIYA and Jiro MIKAMI
Software Division, FUJITSU Ltd., 140 Miyamoto, Numazu-shi, Shizuoka 410-03, Japan
This paper first surveys programming languages for supercomputers and examines the advantages and disadvantages of each type of language. It stresses that an efficient auto-vectorizing Fortran compiler and related tools are most helpful to current users. The FORTRAN77/VP software system developed by Fujitsu for the FACOM VP-100/VP-200 is then presented, and the various techniques employed are described together with performance results.
1. Introduction

The advent of supercomputers in the past several years has introduced a new era. Extensive studies have already been conducted in the area of vectorization techniques to fully exploit the potential power of the hardware architectures. On the other hand, it has also been a pressing issue how to lead users without sufficient knowledge of these machines to use them efficiently and easily. It is the purpose of this paper to discuss various approaches to assist engineers and scientists in developing their own software, from the viewpoint of programming languages and related tools. Section 2 gives a survey of past and current programming languages for supercomputers, ranging from conventional Fortran to problem-oriented languages. Sections 3 through 5 fully describe Fujitsu's approach to the programming language. First, the FORTRAN77/VP compiler for the FACOM VP-100/VP-200 [1], a vector processor with pipelines, is introduced together with the techniques employed for vectorization. We demonstrate the power of the compiler by using sample Fortran programs taken from actual applications. Secondly, INTERACTIVE VECTORIZER, a Fortran tuning tool that helps vectorization with the use of full-screen display terminals, is described. Finally, SSL II/VP, a general-purpose mathematical library for the VP aimed at reducing the users' burden in developing their programs, is introduced together with its performance.

2. Programming languages for supercomputers

In scientific computations the Fortran language is widely used. Therefore the majority of programming languages which have been available for supercomputers such as vector processors are Fortran-based. Attempts have also been made to introduce higher-level languages aimed at the exploitation of problem parallelism as well as high productivity. One possible way to classify programming languages for supercomputers is shown in fig. 1.

2.1. The standard Fortran with auto-vectorizing compiler
The programmer constructs the problem solution in a sequential programming language, usually Fortran, and the compiler attempts to detect the inherent parallelism of the program. This type of compiler has a relatively long history; examples are the IVTRAN for the Illiac IV and the CFT for the Cray. Recently the FORTRAN77/VP (Fujitsu, Japan) and the FORT77/HAP (Hitachi, Japan) have become available. The advantages of this type of language are as follows: i) the user can use supercomputers through a language familiar to him, ii) existing Fortran programs can be moved straightforwardly to supercomputers, iii) it enjoys compatibility and portability of the programs. The only disadvantage will be that all the inherent parallelisms of a
0010-4655/85/$03.30 © Elsevier Science Publishers B.V. (North-Holland Physics Publishing Division)
[Fig. 1. Classification of languages for supercomputers: procedure-oriented languages, comprising standard Fortran with auto-vectorizing compiler, building blocks, and Fortran-extended languages (vector (array) processing and architecture-oriented); and problem-oriented languages.]
program are not always detected successfully; this depends upon the capabilities of the compiler, and the compiler may give poor results.

2.2. Building block approach

Matrix (or vector) manipulations and typical numerical algorithms are often necessary to construct application programs in scientific computations. This approach therefore consists of building a variety of computational blocks of algorithms highly suited to the hardware architecture and then providing the user with such blocks as a library. The user can invoke the library by means of Fortran CALL statements, for example. Examples are SSL II/VP (Fujitsu), CRAYPACK (BCS company) [3], APMATH (FPS Ltd.) [4] and MATRIX/HAP (Hitachi). Some of the libraries cover only relatively fundamental functions such as matrix or vector operations, while others cover an extensive scope of applications including FFT, sorting, linear equations, algebraic eigenvalue problems, and differential equations. When using these libraries, however, the user is often required to adjust his program to the library interfaces. This seems to be the only disadvantage.

2.3. Vector (or array) processing languages
Languages in this approach allow the direct expression of vector or matrix processing, where the units of variables or data are vectors or matrices rather than scalars. Examples are BSP-FORTRAN (Burroughs) [5] and VECTRAN (IBM) [2]. It is to be noted that the ANSI standard of Fortran 8X, now under investigation, also includes expressions for array processing. The languages belonging to this category are required to include the following specification items; the expressions shown below are examples.

i) Expression of vector and array data:
      total body of array         A(*)
      a vector in the array       B(1,*)
      constant-strided data       C(2*IX)
      randomly located data       D(IL(*))

ii) Expression of vector operations:
      arithmetic/compare/logical operation   C(*) = V(*) + U(*)
      summation/inner product,
      maximum/minimum search                 X = VSUM(A(*,*))
      edit of vector data (gathering)        C(*) = GAT(L(*),D(*))

The "*" in the expressions indicates the complete set of elements, the length of which has been declared prior to execution. IX, called an index variable, indicates a subset of location indices for array elements. VSUM and GAT, called vector intrinsic functions, respectively compute the sum of elements, and gather elements to edit another vector according to the locations of input elements given via the logical vector L(*). The advantages are that this type of language enables direct expression of the inherent parallelisms without failure of detection by compilers, and that this specification gives high productivity. At present, however, there exists no international standard for these expressions, which gives rise to a portability problem.
2.4. Architecture-oriented languages
These languages are aimed at exploiting the maximum computing power of a specific hardware architecture. The structure of these languages is very similar to that of the vector and array processing languages mentioned above; CFD (Illiac IV) [6] and DAP-FORTRAN (ICL) [2], for example, are widely known. The advantage of a language like CFD is that the syntax enables the generation of object code best matched to the underlying architecture. On the other hand, the syntax forces the user to take account of the specific architecture, which can add complexity to the program development.

2.5. Problem-oriented languages
The objective of these languages is to enable users to formulate their problems in a straightforward way by means of widely accepted mathematical notation, without breaking them down into conventional languages. These languages also considerably accelerate the process of solving mathematical problems. The typical problem which can be dealt with by this type of language is the solution of partial differential equations (PDE). Examples are ELLPACK (Purdue Univ.) [7] and DEQSOL (Hitachi). In ELLPACK, for example, the user communicates with the system by means of a special input language. First, he defines the problem to be solved (PDE,
domain, boundary conditions, and rectangular mesh). Then, in the execution phase, he specifies the sequence of modules (or numerical algorithms) to be used; see the sample coding in fig. 2. The advantage is gained at the cost of flexibility, since developing a program with the special language interface unavoidably imposes certain restrictions on the use of the language. This can be understood in view of the fact that PDE problems allow a considerable freedom of formulation, and therefore certain features of PDE problems require individual treatment. So far we have discussed different types of programming languages for supercomputers. Needless to say, the Fortran language is at present the most widely used. This fact has led us to believe that it is of great importance above all to provide the user with a truly effective auto-vectorizing Fortran compiler and Fortran-based software tools. This is what we shall discuss in the following sections.
3. FORTRAN 77/VP compiler

This section introduces the FORTRAN 77/VP compiler [8], which is capable of automatic vectorization, and describes the current status of vectorization techniques [12].

3.1. Vectorization techniques — principles
The vectorization means that for a DO loop the
compiler issues vector instructions when it successfully establishes that the results of
vector operations do not differ from those of scalar operations. A vector operation here means the type of SIMD operation in which a single instruction is applied to multiple data. The possibility of vectorization can be determined by examining the "data dependency". The data dependency can be expressed by using the arrow "→": we use the notation "f → g", with f and g being operands or statements, to show that f must be computed before g because of the data dependency. If a DO loop does not yield any cyclic path we can conclude that the DO loop can in principle be vectorized. For example the DO loop shown on
[Fig. 2. Example of ELLPACK interface [7]: a sample input for a two-dimensional constant-coefficient problem, consisting of the EQUATION, BOUNDARY, GRID, INDEXING, SOLUTION, OUTPUT, OPTIONS and END segments.]
the left in fig. 3, in spite of being relatively complex, is such a case. As a result, the loop can be vectorized by rearranging the order of execution.

[Fig. 3. Vectorization by rearranging statements: the statements of a relatively complex DO loop are reordered according to their data-dependency graph so that vector execution gives the same results.]

3.2. Vectorization with macro operations

Scientific computations are likely to require the inner product of two vectors, the algebraic sum of elements, or the maximum/minimum element of a vector. This kind of operation, which we call a macro operation, differs in nature from arithmetic operations between two vectors. To illustrate one way of vectorizing macro operations, let us consider the inner product:

      DO 100 I = 1, N
        X = X + A(I) * B(I)
  100 CONTINUE

In this case we require the evaluation of a sum. The FACOM VP hardware provides macro instructions for such operations (fig. 4 illustrates the mechanism of the macro instruction for the vectorization of summation). The FORTRAN 77/VP compiler semantically detects such expressions and issues the macro instructions.

3.3. Vectorization of conditional statements

The FORTRAN 77/VP compiler also tries to vectorize, as far as possible, a DO loop which contains IF statements. The basic concept is that a change of control by an IF statement defines both a subset of elements to be processed and its complement set not to be processed. The example in fig. 5 illustrates this. In this example the set {m_i} is called a mask, which is a vector with logical values. If a certain m_i is true then the corresponding elements of B and A should be altered; otherwise the old values should be retained. Therefore, the subset of elements whose corresponding m_i is true can be processed in parallel. This kind of operation is called a masked operation. The FACOM VP hardware provides vector arithmetic instructions with mask. In addition to the masked operation, the FORTRAN 77/VP compiler also incorporates other techniques for the vectorization of conditional statements: gathering/scattering functions and list vector functions. The compiler attempts to select the most appropriate of the three for a given DO loop, according to the number of vector elements to be actually altered. For details see ref. [12].
[Fig. 4. Vectorization of summation: the partial terms t1, t2, ..., tn are accumulated by a vector macro instruction.]

3.4. Other vectorization techniques

The FORTRAN 77/VP compiler also has the following functions:
— partial vectorization, in which the operations in a DO loop are separated into vectorizable parts and others, and the former are then vectorized,
— vectorization with respect to the outer loops as well; see the examples in fig. 6,
— inclusion of the user's external procedures when these are used in a loop, to enhance vectorization.

Table 1 shows the execution rates of well-known kernels from the Lawrence
Fig. 5. Vectorization of conditional loop. The loop

      DO 100 I=1,N
        IF(A(I).GE.EPS) THEN
          B(I)=A(I)*C(I)
          A(I)=B(I)-D(I)
        ENDIF
  100 CONTINUE

is executed as the masked vector operations

      m_i = A_i .GE. EPS,       i = 1, 2, ..., N
      B_i = A_i * C_i  : m_i,   i = 1, 2, ..., N
      A_i = B_i - D_i  : m_i,   i = 1, 2, ..., N
[Fig. 6. Vectorization of outer loops: a doubly nested DO loop in which the outer loop over I is vectorized together with the inner loop over J. Note: the indicators 'V', 'M' and 'S' printed by the compiler to the left of the statements mean that the corresponding statement has been fully vectorized, partially vectorized or not vectorized, respectively.]

Table 2
Benchmarks on the VP-200 (times in s)

                        Scalar    Vector    Ratio
  VORTEX (500 steps)     217.2      35.4      6.1
  EULER (Niter = 1000)     6.3       4.8      1.3
  2DMHD                   43.4       2.6     16.7
  SHEAR                  164.4      83.6      2.0
  BARO                  1107.8      41.1     27.0
Livermore National Laboratory on the FACOM VP-100/VP-200. Table 2 is quoted from Raul H. Mendez [13].

Table 1
LIVERMORE loops (MFLOPS)

  No.        M380    VP-100    VP-200    VP-200/M380
   1         10.0     187.1     331.4       33.1
   2         11.3     104.7     180.4       16.0
   3          7.7     174.7     338.2       43.7
   4          5.8      73.9      88.1       15.3
   5          9.6      10.0      10.0        1.0
   6          9.3       9.5       9.5        1.0
   7         13.9     189.2     331.0       23.8
   8         12.4      86.1      90.4        7.3
   9         12.6     160.8     260.8       20.8
  10          7.9      50.0      85.9       10.9
  11          4.6       4.8       4.8        1.0
  12          4.7      59.1     115.3       24.5
  13          2.4       6.0       6.2        2.6
  14          5.7      12.9      13.8        2.4
  Average     8.4      80.6     133.3       15.8

Note: The M380 is a scalar machine whose performance is almost the same as that of the scalar unit of the VP-100/VP-200.

4. Tuning facilities

To use the vector processor efficiently, it is sometimes advisable for the user himself to rearrange the program. This is partly because the
possibility of vectorization sometimes depends upon parameters determined only at execution time. In addition, such rearrangements can often result in more effective vectorization. The FORTRAN 77/VP system provides some facilities which help the user to tune his code.

4.1. Optimization control lines — VOCL
These are a kind of compiler directive that provides information the compiler cannot foresee, in order to enhance vectorization further. The information that the user can provide includes:
— the fact that the elements of a vector have no dependency on each other,
— the repeat count of the DO loop,
— the true ratio of the IF statement.
Fig. 7 shows an example of the VOCL, where the control line indicates that the elements of IXL never coincide.

Fig. 7. Example of the VOCL:

*VOCL LOOP,NOVREC(DAFG)
      DO 100 I=1,N
        VMIXD = VMIX(IXL(I))*25.
        DAFGD = DAFG(IXL(I))*75.
        DAFG(IXL(I)) = (DAFGD+VMIXD)/100.
  100 CONTINUE

4.2. Interactive vectorizer [9]

This is a Fortran tuning tool that promotes vectorization by use of full-screen display
terminals. With this tool, the user can extract useful information for achieving a higher level of vectorization, and he is told what to do. He can then modify the program interactively according to the suggestions. Figs. 8 and 9 show examples displayed by this tool. The upper half of the screen contains tuning messages, while the lower half shows the part of the source code to which the user can make modifications according to the messages. The source field also includes other information, such as the execution count and the estimated execution cost of each statement in both the scalar and the vector computational modes.

[Fig. 8. Example of screen no. 1: the upper half lists vectorization messages (e.g. that a loop cannot be vectorized because a recursive reference may take place through an array accessed via a list vector), and the lower half shows the corresponding source lines. Note: more detailed information about a specific message can be obtained by specifying 'S'; the next screen then appears as shown in fig. 9.]

[Fig. 9. Example of screen no. 2: detailed tuning information advising the user to specify NOVREC if there is no recursive reference to the array TF0001 and the indices accessed via the list vector are all different, together with a tuning example.]
5. SSL II/VP — Scientific Subroutine Library
The SSL II/VP is an extensive library of Fortran-coded subroutines which are intended to be used as computational building blocks for scientific users' applications on the FACOM VP. The package originates from the SSL II [10,11], which has been provided for scalar-computer users. The two packages cover exactly the same areas of application, consisting of about 400 subroutines and including the following: matrix manipulations, simultaneous linear equations, algebraic eigenvalue problems, single or simultaneous nonlinear equations, minimization of functions, interpolation and curve fitting, transforms (Fourier and Laplace), quadrature, ordinary differential equations, special functions, and random numbers. In developing the vector version from the SSL II, many sophisticated techniques for efficient vectorization have been employed in the majority of the subroutines, especially in the areas of linear algebra and Fourier transforms, which has led to near-optimal computation rates on the FACOM VP. As an example of the techniques of the SSL II/VP we present here the construction of the multiplication of a matrix by a vector, which has wide applications in other computations such as the LU factorization of
dense matrices. Let us write the multiplication as c = Av, where A denotes an m by n matrix, v a vector of order n and c the resultant vector of order m. On scalar machines, the most conventional Fortran code for the multiplication is (I) below. We refer to (I) as the inner product scheme. As has often been observed, a significant improvement can be made on vector processors by rewriting (I) as (II). This is mainly because (I) generates a vector macro instruction while (II) is free from it. Also, (II) allows the vector c to reside in a vector register until the loop DO 30 completes. We refer to (II) as the outer product scheme.

      DO 20 I = 1, M
        C(I) = 0.0
        DO 10 J = 1, N
   10     C(I) = C(I) + A(I,J) * V(J)        (I)
   20 CONTINUE

      DO 10 I = 1, M
   10   C(I) = 0.0
      DO 30 J = 1, N
        DO 20 I = 1, M
   20     C(I) = C(I) + A(I,J) * V(J)        (II)
   30 CONTINUE
Now we propose several techniques to improve (II) further.

i) Unrolling with respect to the outer loop. The loop DO 30 in (II) can be written as

      DO 30 J = 1, N-1, 2
        DO 20 I = 1, M
   20     C(I) = C(I) + A(I,J) * V(J)
     *                + A(I,J+1) * V(J+1)    (III)
   30 CONTINUE

(plus the loop DO 20 of (II) with J = N, if N is odd).
This gives the further advantage that the pipeline units can be utilized more efficiently than in (II).

ii) Loop segmentation. When m, the number of rows of the matrix A, is greater than a certain number, say l, which is the maximum vector length attainable with one vector instruction, A should be divided into submatrices, each of which has n columns and at most l rows. The unrolling technique i) should then be applied to the submatrices one by one. This technique has
been introduced to overcome the problem that, when m > l, the code (III) as it stands would require vector store and load instructions for the vector c every time J changes.

iii) Switching between the outer product and inner product schemes. Suppose that m = 1 ≪ n. Is it wise to apply the outer product scheme in this case? The answer is of course no. From this typical example it can be seen that we should switch adaptively between the two schemes according to the ratio of m to n: if m ≪ n the inner product scheme should be employed, and otherwise the outer product scheme.

These three techniques have also been incorporated in the subroutines for linear equations and eigenvalue problems with real dense coefficient matrices. We now show the performance of the SSL II/VP on the FACOM VP-200 in the areas of linear equations with a real dense nonsymmetric or symmetric coefficient matrix and of the radix-2 complex FFT in tables 3 and 4, respectively. It should be noted that all the algorithms have been written in the standard Fortran language without any use of assembler coding. The FFT employs a so-called not-in-place and self-sorting algorithm (or an isogeometric algorithm for the single-precision routine, to overcome memory conflicts) [11].

Table 3
Linear equations for dense matrices

          Nonsymmetric (ms)              Positive symmetric (ms)
  N        M380     VP-200    Ratio       M380     VP-200    Ratio
   50        20.6      3.5      5.5         8.6       1.9      4.5
  100       135.9     10.1     13.5        61.0       5.5     11.1
  250      2109.0     53.0     39.8      1022.4      28.4     36.0
  500     19530.0    266.5     73.3      8088.6     132.6     61.0

Note: 1) "Ratio" = M380/VP-200. 2) The test matrices for the nonsymmetric version are A = (aij), aij = SQRT(2/(N+1)) * SIN(ij*pi/(N+1)); those for the symmetric version are Frank matrices. 3) The nonsymmetric version uses partial pivoting.

Table 4
Radix-2 complex FFT

    N       M380     VP-200    Ratio
    64      391.0      51.0      7.7
   128      759.0      57.0     13.3
   256     1869.0      91.0     20.5
  1024     6940.0     217.0     32.0

Note: "Ratio" = M380/VP-200.

6. Conclusion

The first point to be made is that we have to recognize that the standard Fortran language absolutely dominates the computing community at the present time, and that this situation will not change in the near future. The most important and practical way to assist Fortran users in transferring their programs inexpensively to supercomputers is therefore to provide a sophisticated auto-vectorizing Fortran compiler which tries to exploit parallelism by every possible means. Also important is a collection of computational building blocks which are widely applicable and designed to extract the potential speed of the hardware. We would like to say that the FORTRAN 77/VP system, consisting of the FORTRAN 77/VP compiler, the INTERACTIVE VECTORIZER and the SSL II/VP, has satisfied these demands and achieved high performance. Tables 1 and 2 in section 3 show that actual application programs can be executed over 10 times faster on
an average, without any modification, than on the corresponding scalar machine. Also, tables 3 and 4 in section 5 show that commonly needed numerical solutions can be computed 20 to 70 times faster for typical problem sizes by highly tuned Fortran coding. Thus it can be said that the majority of existing Fortran programs in scientific
applications can extract the potential power of the FACOM VP hardware by utilizing the above-mentioned FORTRAN 77/VP system. Secondly, the vector (array) processing languages and the problem-oriented languages are attractive because they can express parallelism directly and can also contribute to high productivity. We believe that conventional Fortran will be replaced by the vector (array) processing languages in the future, and that the forthcoming ANSI standard of Fortran 8X will reinforce this trend.
Acknowledgements

We would like to thank Dr. G.H.F. Diercksen of the Max Planck Institute for Physics and Astrophysics for inviting us to this conference. Our thanks are also due to Dr. Kenichi Miura, Project Manager at Fujitsu, for reviewing the manuscript and for many useful suggestions.
References

[1] K. Miura and K. Uchida, FACOM Vector Processor System: VP-100/VP-200, in: Proc. NATO Advanced Research Workshop on High Speed Computing, NATO ASI Series, vol. F7 (Springer-Verlag, Berlin, 1984) pp. 127-138.
[2] R.W. Hockney and C.R. Jesshope, Parallel Computers (Adam Hilger, Bristol, 1981).
[3] MAINSTREAM-EKS/VSP CRAYPACK Supplement to BCSLIB Users Manual (Boeing Computer Services Company, 1982).
[4] AP-120B Math Library, Publication 7288-02, Floating Point Systems Inc. (1976).
[5] J.H. Austin Jr., The Burroughs Scientific Processor, Infotech State of the Art Reports: Supercomputers, Infotech International Limited (1979).
[6] CFD - A Fortran-based language for ILLIAC IV, Computational Fluid Dynamics Branch, NASA Ames Research Center (1974).
[7] J.R. Rice, ELLPACK 77 User's Guide, CSD-TR 289, Computer Science Department, Purdue University (1980).
[8] FACOM OSIV/F4 MSP FORTRAN77/VP User's Guide, 78SP5680E-1, Fujitsu Ltd. (1984).
[9] FACOM OSIV/F4 MSP Interactive Vectorizer User's Guide, 78SP5690E-1, Fujitsu Ltd. (to appear).
[10] FACOM FORTRAN SSL II User's Guide, 99SP0050E-5, Fujitsu Ltd. (1982).
[11] FACOM FORTRAN SSL II Extended Capabilities User's Guide, 99SP0060E-1, Fujitsu Ltd. (to appear).
[12] S. Kamiya et al., Practical vectorization techniques for the FACOM VP, in: Information Processing 83, IFIP (Elsevier Science Publishers, Amsterdam, 1983).
[13] R.H. Mendez, Supercomputer benchmarks give edge to Fujitsu, SIAM News, vol. 17, no. 2 (1984).