CHAPTER 4
ADVANCES IN HIGH PERFORMANCE PROCESSING OF SEISMIC DATA
by ERNST L. LEISS
Department of Computer Science, Research Computation Laboratory, University of Houston
and OLIN G. JOHNSON
Department of Computer Science, University of Houston and the Houston Area Research Center
1. INTRODUCTION
Advances in geophysical processing are dependent on advances in computer hardware and software. Hence, it is important for geophysicists to be aware of research efforts and new products in computer design, I/O devices, algorithms, and programs. Here we survey these areas. Section two addresses advances in hardware. Many research projects in new computer architectures are reviewed. Some of these have already been used successfully in geophysical modeling or processing. I/O advances are also covered. Section three addresses software advances in languages and compilers. Section four considers the problems of implementing geophysical applications in these newer systems. The realities and pitfalls of the implementation process are briefly discussed. The subject of in-core programming versus out-of-core programming is considered in some detail. Finally, implementing vector and parallel programming is discussed.
2. HARDWARE ADVANCES
The traditional von Neumann computer consists of a memory, a processor, and a bus between them. Data and instructions are stored in the memory, and the processor controls and performs the computations, that is, it generates addresses for data and instructions, fetches them, and computes on data. The bus is the most frequently used component of the system. To avoid a potential bottleneck, von Neumann machines often include a small fast local storage (local memory and/or cache) which is accessed more frequently by the processor. The von Neumann computer is a control flow computer where the flow of control causes the execution of instructions. Central to the von Neumann machine is the concept of the stored program, the principle that instructions and data are to be stored together intermixed in a single, uniform storage medium rather than separately. The ambiguity of the interpretation of an element in storage is resolved only temporarily when it is fetched and either executed as an instruction or operated on as data. A datum, created as a result of some operations in the ALU (arithmetic logic unit), might possibly be placed in storage like any other datum, but then fetched and executed as an instruction either deliberately by program design or by error. Another concept central to the von Neumann machine is the program counter, a register that is used to indicate the location of the next instruction to be executed and which is automatically incremented by each instruction fetch.
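The stored-program principle and the role of the program counter can be illustrated with a small sketch. The instruction format, the opcode names (LOAD, ADD, STORE, HALT) and the function `run` are assumptions made purely for illustration; they do not describe any particular machine.

```python
def run(memory):
    """Execute a toy stored program held in `memory` (hypothetical format).

    Instructions are (opcode, address) tuples and data are plain integers.
    Both kinds of element share one list, as in the stored-program concept.
    """
    pc = 0    # program counter: location of the next instruction
    acc = 0   # single accumulator register
    while True:
        instruction = memory[pc]   # fetch: the element is treated as an instruction
        pc += 1                    # the counter is incremented by each fetch
        opcode, address = instruction
        if opcode == "LOAD":
            acc = memory[address]  # here the element is treated as data
        elif opcode == "ADD":
            acc += memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


# Cells 0-3 hold instructions, cells 4-6 hold data, all in one storage medium.
program = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2,
    3,
    0,
]
result = run(program)
print(result[6])  # 2 + 3 = 5
```

Nothing in the storage itself marks a cell as instruction or data; only the way the processor fetches it decides the interpretation.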
2.1 New Architectures
The study of architectures that utilize various types of concurrency is motivated by the need to increase the performance of computers. The new machines which will supersede the von Neumann model will have greater performance and may use very large scale integration (VLSI) to implement the concurrent architectures. The advanced computers studied here have been classified as multiprocessors, dataflow computers, array processors, pipelined computers, supercomputers, systolic arrays, very large instruction word (VLIW) machines, and uniprocessors based on the reduced instruction set computer (RISC) architecture. This classification is based on the mode of execution of the processors, the performance and size of memory, the control mechanism, and any specialized architecture like VLIW and RISC.
2.1.1 Pipelined Computers
Pipelining speeds up single-threaded code. Instruction execution is broken into its components (levels) such as instruction fetch, opcode decoding, operand address calculation, operand fetch, and execution, each of which can be executed independently with simultaneous computations on different sets of data. A floating add can be pipelined as follows: sign control, exponent compare, mantissa shift, mantissa add, exponent adjust, and normalization. The EXPRESSION PROCESSOR at University of Washington, PIPE at University of Wisconsin-Madison, and TIP from Japan fall in this category.
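The benefit of overlapping the stages can be sketched with a simple cycle count. The sketch assumes that every stage takes exactly one cycle and that the pipeline never stalls; the function names are hypothetical. Under these assumptions, n operations through k stages need k + n - 1 cycles instead of n * k.

```python
# Stage names taken from the floating add decomposition in the text.
FLOATING_ADD_STAGES = [
    "sign control", "exponent compare", "mantissa shift",
    "mantissa add", "exponent adjust", "normalization",
]


def sequential_cycles(n_operations, n_stages):
    """Cycles needed when each operation finishes before the next one starts."""
    return n_operations * n_stages


def pipelined_cycles(n_operations, n_stages):
    """Cycles needed when a new operation enters the pipeline every cycle."""
    if n_operations == 0:
        return 0
    return n_stages + n_operations - 1


def pipeline_schedule(n_operations, stages):
    """For every cycle, list the (stage, operation index) pairs that are busy."""
    schedule = []
    for cycle in range(pipelined_cycles(n_operations, len(stages))):
        active = []
        for s, stage in enumerate(stages):
            op = cycle - s          # operation currently held by stage s
            if 0 <= op < n_operations:
                active.append((stage, op))
        schedule.append(active)
    return schedule


print(sequential_cycles(100, 6), pipelined_cycles(100, 6))  # 600 105
for cycle, active in enumerate(pipeline_schedule(3, FLOATING_ADD_STAGES)):
    print(cycle, active)
```

For long streams of independent operations the pipelined count approaches one result per cycle, which is where the speedup comes from.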
2.1.2 Array Processors
Array processors obtain concurrency by performing identical operations on different portions of data, that is, they are SIMD (single instruction stream, multiple data stream). They act as fast coprocessors which offload many of the repetitive calculations needed in scientific applications. They are connected to and controlled by a host. The host provides the mechanisms for communications and control between the array processor and the outside world. It also performs the tasks of data management, compilation, and resource allocation/control functions commonly associated with a general-purpose operating system. Although array processors are high performance machines, they are burdened with several problems. First, structured data that are vectors of irregular strides are difficult to handle because of memory conflicts. Secondly, programs do not consist only of vector instructions. The ADAPTIVE ARRAY PROCESSOR from Japan, PARALLEL IMAGE NEIGHBORHOOD PROCESSOR at University of Missouri, MULTIPLE PARALLEL PROCESSOR at Goodyear Aerospace Corporation, RICE PROCESSOR at Rice University, and VERY FAST PARALLEL ARRAY PROCESSOR at Columbia University are some of the current array processor projects.

A binary array processor is a parallel matrix processor in which each processing element is constrained to bit serial operations. A parallel matrix processor is a SIMD machine that has a set of processing elements (PE's) organized as a two-dimensional matrix such that data may only be transferred between adjacent PE's. Data interconnections between PE's are one bit wide. Binary array processors process picture data, conventionally represented by a large two-dimensional array of picture elements called pixels. BASE at Purdue University and CLIP from England are binary array processors.

The WAVEFRONT ARRAY PROCESSOR at the University of Southern California is a specialized array processor based on the wavefront concept. The wavefront notion drastically reduces the complexity in the description of parallel algorithms. The mechanism provided for this description is a special-purpose, wavefront-oriented language. Rather than requiring a program for each processor in the array, this language allows the programmer to address an entire front of processors. The wavefront architecture can provide asynchronous waiting capability and consequently can cope with timing uncertainties such as local clocking, random delay in communications, and fluctuations of computing times. In short, the wavefront notion lends itself to an (asynchronous) dataflow computing structure that conforms well with the constraints of VLSI. The integration of the wavefront concept, the wavefront language, and the wavefront architecture leads to a programmable computer network called the wavefront array processor (WAP). The WAP is in a sense an optimal tradeoff between the globally synchronized and dedicated systolic array and the general-purpose dataflow multiprocessor. It is mainly aimed at incorporating the vast VLSI computational capability into modern signal processing applications.
2.1.3 Dataflow Computers
In a dataflow computer the availability of input operands triggers the execution of the instruction which consumes the inputs. It is associated with single-assignment languages in which data flows from one statement to another, execution of statements is data-driven, and identifiers obey the single-assignment rule. A node is said to be firable (enabled) if a token arrives on each of the incoming arcs representing the necessary operands for the node, and if no tokens are present on the outgoing arcs where the resulting tokens are to be emitted. To hold the database of a large scale computation, the dataflow computer has array memories. The processing elements consist of two kinds of units: cell blocks and functional units. Cell blocks hold the instructions and perform the basic function of recognizing which instructions are ready for execution. The functional units perform the execution of enabled instructions.

Dataflow machines can be static or dynamic (tagged), based on the method by which they pass tokens from node to node. A static dataflow machine allows only one token on an arc at a time. A program, as stored in the computer's memory, consists of instructions linked together. Each instruction has an operation code, spaces for holding operand values as they arrive, and destination fields that indicate what is to be done with the results of instruction execution. The routing network provides pathways needed to send result packets to instructions residing in other processing elements. If a processor has many independent activities waiting for its attention, then delay can be tolerated in the interconnection network. MULTIUSER DATAFLOW MACHINE from Canada, DENNIS DATAFLOW MACHINE at MIT, DATA DRIVEN MACHINE #1 at the University of Utah, CHICAGO DATAFLOW MACHINE at the University of Chicago, HUGHES DATAFLOW MULTIPROCESSOR at Hughes Aircraft Company, IRVINE DATAFLOW MACHINE at University of California, Irvine, and PIECEWISE DATAFLOW MACHINE at Lawrence Livermore National Laboratory are some of the static dataflow projects.

In a dynamic dataflow computer, multiple tokens on an arc at a time are allowed. Tokens carry distinguishing tags which identify their individual context. This method allows for maximum parallelism in execution of programs. ARVIND DATAFLOW MACHINE at MIT, DATAFLOW COMPUTER FOR SIGNAL PROCESSING at the University of North Carolina, MANCHESTER DATAFLOW MACHINE from England, and PROGRAMMABLE MODULAR SIGNAL PROCESSOR at RCA Government Systems Division are some of the dynamic dataflow projects.
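The static firing rule described above (a token on every incoming arc, no token on any outgoing arc) can be sketched as a small interpreter. The graph representation, the arc names, and the function `run_dataflow` are assumptions chosen for illustration; a real dataflow machine distributes this work over cell blocks and functional units rather than scanning nodes in a loop.

```python
import operator


def run_dataflow(nodes, arcs):
    """Fire enabled nodes until none is enabled; return the firing order.

    nodes: name -> (function, input arc names, output arc names)
    arcs:  arc name -> token value, or None when the arc is empty
           (static dataflow: at most one token per arc)
    """
    fired = []
    progress = True
    while progress:
        progress = False
        for name, (func, inputs, outputs) in nodes.items():
            ready = all(arcs[a] is not None for a in inputs)   # operands present
            free = all(arcs[a] is None for a in outputs)       # output arcs empty
            if ready and free:
                operands = [arcs[a] for a in inputs]
                for a in inputs:
                    arcs[a] = None          # consume the input tokens
                result = func(*operands)
                for a in outputs:
                    arcs[a] = result        # emit the result token
                fired.append(name)
                progress = True
    return fired


# Graph for (x + y) * (x - y) with x = 5 and y = 3. Each operand is copied
# onto two arcs because each arc carries one token to one consumer.
arcs = {"x1": 5, "y1": 3, "x2": 5, "y2": 3, "s": None, "d": None, "out": None}
nodes = {
    "mul": (operator.mul, ["s", "d"], ["out"]),   # listed first, fires last
    "sub": (operator.sub, ["x2", "y2"], ["d"]),
    "add": (operator.add, ["x1", "y1"], ["s"]),
}
fired = run_dataflow(nodes, arcs)
print(fired, arcs["out"])
```

The multiply node is listed first but fires last, because execution order is determined by the arrival of operands and not by the order in which the instructions are stored.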
2.1.4 Multiprocessors
Most of the present architecture research projects are multiprocessors, either shared-memory or message-passing. Multiprocessors use several processors (homogeneous or heterogeneous) concurrently to solve one or more problems. The early development of multiprocessor hardware and the operating systems necessary to make it effective in applications were largely oriented toward increased system throughput over single processor systems. Multiprocessors have the most flexible computer architecture in exploiting arbitrarily structured parallelism. Multiprocessor systems have multiple instruction streams over a set of interactive processors with shared resources such as memories and databases, or over a set of autonomous processors with no shared resources, but with an inter-processor communication network. Multiprocessors offer another dimension of parallelism, namely multitasking (capability of a system to support two or more active tasks simultaneously) in addition to vectorization (the process of replacing sequential code by vector instructions). There are mainly two types of multiprocessors, shared-memory and message-passing.

In the shared-memory model, the data is in preallocated locations in the shared memory where it can be accessed by each processor and operated upon without interruptions from other processors. These machines are structured with a switching network, either a crossbar connection of buses or a multistage network between processors and memory. Processor-memory communication can also be via a multiported memory. An interleaved memory is very suitable for shared-memory multiprocessors to avoid some of the memory contentions. Communication between processes running concurrently in different processors occurs through shared variables and common access to one large address space. An advantage of shared-memory multiprocessors is the memory space saving since one copy of the operating system suffices. There is a limit on the number of processors in a shared-memory multiprocessor due to the memory contentions that increase with an increasing number of processors. Some of the shared-memory multiprocessor projects are BUTTERFLY at Bolt, Beranek, and Newman, CEDAR at the University of Illinois at Urbana-Champaign, C.MMP and CM* at Carnegie-Mellon University, CONCERT at MIT (Massachusetts Institute of Technology), HOMOGENEOUS MULTIPROCESSOR from Canada, GIGA COMPUTER at Argonne National Laboratory, MIDAS at the University of California at Berkeley, PUMPS at Purdue University and Rice University, REMPS at the University of Southern California, TAMIPS from Japan, TRAC at the University of Texas at Austin, and ULTRA at New York University. CEDAR has processor clusters where a processor can access its own local memory or the local memory of other processors in the cluster. CEDAR combines the control mechanism of dataflow architecture and the storage mechanism of von Neumann machines. DIRECT, a multiprocessor developed at the University of Wisconsin, has an associative memory. An associative memory is a content addressable storage, that is, cells in memory are addressed not by location, but by content. TRAC has a special property called varistructurability which means that an n-byte operand can be processed by one or more byte-wide processors. The opcode that directs these operations must be independent of the physical structure of the machine.

The message-passing multiprocessors do not have any globally shared memory. Each processor has a local memory, and the processors communicate through an interprocessor connection network. The advantage of the message-passing model is that data is passed only once through the connection network while two passes (write and read) are needed for the shared-memory model unless the data is in the local storage. Yet another advantage is that for data-driven computation, data is passed through the network at generation time and not when it is needed. Thus longer delays through the network can be tolerated in the case when data is not used immediately after its generation.
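How an interleaved memory spreads accesses over banks, and why some strides still collide, can be sketched as follows. The sketch assumes low-order interleaving (address modulo number of banks), which is one common scheme and is not specified in the text; the function names are hypothetical.

```python
from math import gcd


def bank_of(address, n_banks):
    """Low-order interleaving (assumed): consecutive addresses go to consecutive banks."""
    return address % n_banks


def banks_touched(start, stride, count, n_banks):
    """Banks hit by `count` accesses beginning at `start` with a constant stride."""
    return [bank_of(start + i * stride, n_banks) for i in range(count)]


def distinct_banks(stride, n_banks):
    """Number of different banks a long constant-stride access stream can use."""
    return n_banks // gcd(stride, n_banks)


print(banks_touched(0, 1, 8, 8))   # stride 1 uses all eight banks in turn
print(banks_touched(0, 8, 8, 8))   # stride 8 hits the same bank every time
print(distinct_banks(3, 8), distinct_banks(4, 8))
```

Strides that share a large common factor with the number of banks concentrate the accesses on a few banks, which is the kind of memory conflict that limits both strided vector access and the number of processors a shared memory can serve.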
These machines can have a very large number of processors, thus potentially having a very high performance. Message-passing multiprocessors are difficult to program since a programmer must know the code executed by each processor in order to pass the data between processors correctly. Some of the message-passing multiprocessor projects are CHIP at the University of Washington and Purdue University, CONNECTION MACHINE at MIT and Thinking Machines, Inc., COSMIC CUBE at California Institute of Technology, DADO at Columbia University, DON from Japan, MANIP at Purdue University, MU6V from England, and ZMOB at the University of Maryland. PASM is a message-passing multiprocessor at Purdue University with a partitionable SIMD/MIMD architecture. A partitionable SIMD/MIMD system is a parallel processing system which can be structured as one or more independent SIMD and/or MIMD machines of various sizes. FAIM-1 at Fairchild Laboratory for Artificial Intelligence has a number of processors where each processor is a fanatically reduced instruction set computer (FRISC).
The FRISC uniprocessor supports low-level symbol processing in ways similar to Lisp machines: tagged-memory architecture, stack caches, and a tailored instruction set. The WAFER SCALE INTEGRATED MULTIPROCESSOR at the University of Illinois at Urbana-Champaign has the multiprocessor placed on a wafer. A wafer scale integrated multiprocessor is a macro-circuit consisting of a rectangular array of interconnected modules arranged on a large piece of silicon. Each of these modules could be as complex as the very large scale integrated (VLSI) multiprocessor. These modules are not separately manufactured, tested, and then assembled as VLSI chips are. They are fabricated as a single unit, the VLSI wafer.

RP3 at IBM, T.J. Watson Research Center, CHOPP at Columbia University, HM2P at Rensselaer Polytechnic Institute, and MULTI PROCESSOR/COMPUTER at Princeton University have an organizational duality of shared-memory multiprocessors and message-passing multiprocessors. They incorporate the advantages of both models and hence serve more applications. ULTRA and RP3 have a special switch feature called combining. In this process, memory requests aimed at the same memory location are combined into one request at the switch they are passing by.

FFPM at the University of North Carolina, MULTIPROCESSOR REDUCTION MACHINE from England, SERFRE from France, and REDIFLOW at the University of Utah are all reduction multiprocessors. In a reduction computer, the requirement for a result triggers the execution of the instruction that will generate the value. It is associated with applicative (reduction or functional) languages. The reduction computer maps the functional language expressions onto hardware storage dynamically. This is a machine-wide process which involves interrupting computations in the machine, determining where resources are available or needed, and finally redistributing the available resources. The reduction languages attempt to relieve the programming problems, such as explicitly specifying flows of control and managing memory cells, normally associated with conventional computers. The style of programming is strictly functional, based on a few elementary mathematical constructs featuring a binary tree structure, from which complex expressions are built up by recursive application.
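The demand-driven rule of a reduction computer, in which a request for a result triggers the work that produces it, can be contrasted with the data-driven rule by a short sketch. The representation of definitions as binary (operator, left, right) tuples and the function `demand` are assumptions for illustration only; sharing of already reduced values is also an assumption, not a feature attributed to any of the machines named above.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def demand(item, definitions, values, log):
    """Return the value of `item`, reducing definitions only when demanded.

    item:        a number (already a value) or the name of a definition
    definitions: name -> (operator symbol, left item, right item)
    values:      name -> value for definitions that were already reduced
    log:         names in the order in which they were reduced
    """
    if not isinstance(item, str):
        return item                       # a literal needs no reduction
    if item not in values:
        op, left, right = definitions[item]
        left_value = demand(left, definitions, values, log)
        right_value = demand(right, definitions, values, log)
        values[item] = OPS[op](left_value, right_value)
        log.append(item)
    return values[item]


definitions = {
    "a": ("+", 1, 2),
    "b": ("*", "a", "a"),
    "unused": ("+", "a", 100),   # never demanded, so never reduced
}
values, log = {}, []
print(demand("b", definitions, values, log), log)
```

Only the definitions needed for the demanded result are reduced; the definition named `unused` is never touched.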
2.1.5 Supercomputers
Supercomputers are computers with colossal computational speeds, large memory, and high cost. Based on today's technology, a computer is considered to be a supercomputer if it can perform hundreds of millions of floating point operations per second (100 MFlops) with a word length of approximately 64 bits and a main memory capacity of millions of bytes. Supercomputers are structured in three architectural classes: pipelined computers, array processors, and multiprocessors. A supercomputer is implemented using the fastest and most sophisticated circuits available and it is also architecturally balanced for the highest economy of throughput. A supercomputer's usefulness is not entirely determined by its hardware capabilities. In fact, the efficiency relies to a large extent on the availability of "super-software" that is easy to use and can obtain the maximum parallelism from the hardware. The process of replacing a block of sequential code by a few vector instructions is called vectorization. The portion of the compiler that regenerates this parallelism is known as the vectorizer. A vectorizing compiler regenerates the parallelism lost by using sequential languages.

NON-VON at Columbia University, EMSY from the Federal Republic of Germany, NAVIER-STOKES COMPUTER at Princeton University, GF11 at IBM, T.J. Watson Research Center, PAX from Japan, and S1 at Lawrence Livermore National Laboratory and Stanford University are some of the newer supercomputer projects being pursued. All these are message-passing multiprocessors. They have a very large number (up to 1,000,000) of processors communicating via an efficient communication network. NAVIER-STOKES COMPUTER is expected to have a speed of 60 GFlops and PAX, a speed of 100 GFlops. Commercial supercomputers include the Cray X-MP, Cray 2, NEC SX Series, ETA-10, Fujitsu Facom Series, and Hitachi 800 Series.
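The vectorization step described above, replacing a block of sequential code by a few vector instructions, can be sketched for an element-wise addition. The vector length of 64 elements, the stand-in function `vector_add`, and the strip-mining strategy are assumptions made for illustration; they are not taken from any particular compiler or machine.

```python
def scalar_loop(a, b):
    """The sequential form: one addition per loop iteration."""
    c = []
    for i in range(len(a)):
        c.append(a[i] + b[i])
    return c


def vector_add(a, b):
    """Stand-in for a single hardware vector instruction on short vectors."""
    return [x + y for x, y in zip(a, b)]


def strip_mined(a, b, vector_length=64):
    """Vectorized form: the loop is cut into strips of at most `vector_length`
    elements, and each strip is handled by one vector instruction."""
    c = []
    instructions = 0
    for start in range(0, len(a), vector_length):
        stop = start + vector_length
        c.extend(vector_add(a[start:stop], b[start:stop]))
        instructions += 1
    return c, instructions


a = list(range(200))
b = [2 * x for x in a]
c, instructions = strip_mined(a, b)
print(c == scalar_loop(a, b), instructions)  # True 4
```

Two hundred scalar additions collapse into four vector instructions under these assumptions, and the results are identical to those of the sequential loop.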
2.1.6 Systolic Arrays
WARP at Carnegie-Mellon University and GE and SYSTOLIC PROCESSOR at ESL Incorporated (TRW subsidiaries) are systolic array projects. The systolic array is an array of processing elements (cells) of the same type, except that the boundary cells may be different. Simultaneous computations that are short and execute synchronously are said to be systolic. Every processor pumps data in and out, each time performing some short computation, so that a regular flow of data is kept up in the network. Communication is between adjacent processing elements and external communication is via the boundary processing elements. Processors are attached to a host. The systolic array processor executes computation intensive, but regular routines, and the host runs the main application programs. The cells are programmable so that the processor array can implement different algorithms. Each data item can be used a number of times once it is accessed, and thus a high computation throughput can be achieved with only modest bandwidth. These processors are especially suited to algorithms with regular data movement patterns.
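The rhythmic data movement of a systolic array can be sketched with a linear array that computes a matrix-vector product. The particular design (each cell keeps one vector element, partial sums move one cell to the right per beat, matrix entries arrive skewed in time) is one textbook arrangement chosen as an assumption for illustration; it is not a description of WARP or of the ESL processor.

```python
def systolic_matvec(A, x):
    """Simulate a linear systolic array computing y = A x.

    Cell j permanently holds x[j]. On beat t it receives the partial sum of
    row i = t - j from its left neighbour, adds A[i][j] * x[j], and passes
    the sum to its right neighbour. The last cell emits finished results.
    """
    n_rows, n_cells = len(A), len(x)
    latch = [None] * n_cells          # value each cell hands to its neighbour
    results = []
    for t in range(n_rows + n_cells - 1):
        new_latch = [None] * n_cells
        for j in range(n_cells):
            i = t - j                 # row whose partial sum reaches cell j now
            if 0 <= i < n_rows:
                incoming = 0 if j == 0 else latch[j - 1]
                new_latch[j] = incoming + A[i][j] * x[j]
        latch = new_latch             # all cells advance together on the beat
        if latch[-1] is not None:
            results.append(latch[-1])  # output leaves through the boundary cell
    return results


A = [[1, 2], [3, 4], [5, 6]]
x = [10, 1]
print(systolic_matvec(A, x))  # [12, 34, 56]
```

Each element of `x` is loaded once and reused for every row, which is the reuse that lets a systolic array reach high throughput with modest memory bandwidth.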
2.1.7 Very Large Instruction Word (VLIW) Machines
ELI-512, designed at Yale University, is a Very Long Instruction Word (VLIW) machine. VLIW machines are highly parallel architectures that offer an alternative to multiprocessors and array processors. They resemble ordinary multiprocessors but have a tightly coupled, single-flow control mechanism. Programs for VLIWs must specify fine-grained hardware control. It is effectively impossible to hand code VLIW machines.

VLIW machines have one central control unit issuing a single wide instruction per cycle. Each wide instruction consists of many independent operations. Each operation requires a small, statically predictable number of cycles to execute. Operations are pipelined. The underlying sequential architecture is invariably a reduced instruction set computer. The instructions at the underlying RISC level are called operations, while the term instruction is reserved for the very long instruction words, which are collections of operations. The instructions are in a single flow of control. Thus a single long instruction word is fetched, and all the processors perform their individual operations, which differ from processor to processor. After an instruction is executed, the next instruction is chosen and fetched. The instruction word completely controls all communications among the processors. Data transfers and their timings are completely choreographed in the code.

Compaction is the process of generating very long instructions from some sequential source. A compacting compiler is a compiler that takes some sequential high-level source and generates compacted code. A compiler (Bulldog) exists at Yale that can produce highly parallel code from a broad range of ordinary sequential programs. This compiler uses a technique called Trace Scheduling. Trace scheduling is a complex procedure. To handle conditional jumps in a program, a trace scheduling compiler uses information about the dynamic behavior of the program to do greedy scheduling of operations. The compiler can make good guesses when jumps are weighted heavily towards one leg, because in this case it is productive to be greedy. Otherwise VLIWs are probably the wrong architecture to use.
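The idea of compaction can be illustrated with a minimal sketch. It is not trace scheduling and not the Bulldog compiler's method; it is a plain greedy packing of a single jump-free sequence, assuming every operation takes one cycle. The operation names, the dependency table, and the function `compact` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

def compact(ops: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack operations into wide instructions.

    ops maps each operation name to the set of operations it depends on.
    An operation may be issued only in a later instruction than all of
    its dependencies (assumption: every operation takes one cycle).
    """
    done: Set[str] = set()
    remaining = list(ops)          # keeps the sequential source order
    schedule: List[List[str]] = []
    while remaining:
        # Operations whose inputs were all produced by earlier instructions.
        ready = [op for op in remaining if ops[op] <= done]
        if not ready:
            raise ValueError("cyclic dependency among operations")
        word = ready[:width]       # fill at most `width` slots this cycle
        schedule.append(word)
        done.update(word)
        remaining = [op for op in remaining if op not in word]
    return schedule

# Hypothetical sequential source: e = (a + b) * (c + d); f = e + a
source = {
    "load_a": set(), "load_b": set(), "load_c": set(), "load_d": set(),
    "add_ab": {"load_a", "load_b"},
    "add_cd": {"load_c", "load_d"},
    "mul_e": {"add_ab", "add_cd"},
    "add_f": {"mul_e", "load_a"},
}
wide_instructions = compact(source, width=4)
```

With four slots per instruction the eight operations fit into four wide instructions; with `width=1` the same routine degenerates to the original sequential order. A real compacting compiler must additionally deal with multi-cycle operations, resource conflicts, and conditional jumps, which is where trace scheduling comes in.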
2.1.8 Reduced Instruction Set Computer (RISC) Uniprocessors
RISC at the University of California at Berkeley and MIPS at Stanford University are uniprocessors based on a Reduced Instruction Set Computer (RISC) architecture. RISC architecture features a simple, regular instruction set which allows a combination of instructions to be executed faster than the equivalent complex instructions. A traditional complex instruction set computer relies on hundreds of specialized instructions, dozens of addressing modes, and several high-level languages implemented in hardware. In such a computer the compiler must consider the many possibilities inherent in a complex instruction and perform a number of memory transfers to execute it. This requires identifying the ideal addressing mode and the shortest instruction format to add the operands in memory. Yet only a small number of instruction types take up most of a computer's execution time. Load, call and branch instructions are found in compiled code more often than any other instruction type. Complex operations can actually be executed faster by breaking each one down into a series of simple instructions that move data between registers and memory. This is the principle behind the RISC approach.

Some salient features of a RISC-based machine are register-to-register operations, which allow compilers to optimize through reuse of operands, and instruction formats and addressing modes that permit instructions to be decoded in a single machine cycle. Memory reference instructions consisting only of load and store operations are also typical. A RISC machine has a high performance memory hierarchy including general purpose registers and cache. One of the advantages of the RISC approach is the potential to reuse any result without recomputing it.
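The load/store principle can be made concrete with a toy register machine. This is only a sketch under stated assumptions: the three mnemonics (`LOAD`, `ADD`, `STORE`), the register names, and the interpreter `run` are hypothetical and do not correspond to the Berkeley RISC or Stanford MIPS instruction sets.

```python
def run(program, memory):
    """Execute a list of simple load/store-style operations on a toy machine."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":            # LOAD reg, address
            reg, addr = args
            regs[reg] = memory[addr]
        elif op == "ADD":           # ADD dest, src1, src2 (registers only)
            dest, a, b = args
            regs[dest] = regs[a] + regs[b]
        elif op == "STORE":         # STORE reg, address
            reg, addr = args
            memory[addr] = regs[reg]
        else:
            raise ValueError(f"unknown operation {op}")
    return regs

# A memory-to-memory add, mem[2] = mem[0] + mem[1], followed by
# mem[3] = mem[2] + mem[0], expressed as simple operations.
program = [
    ("LOAD", "r1", 0),
    ("LOAD", "r2", 1),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", 2),
    ("ADD", "r4", "r3", "r1"),   # reuses r3 and r1; no reload from memory
    ("STORE", "r4", 3),
]
memory = [5, 7, 0, 0]
registers = run(program, memory)
```

The second addition touches memory only for its final store, because both of its operands are still held in registers. This is the kind of operand reuse that register-to-register operations make available to a compiler.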
2.2 I/O Advances
Seismic processing is indisputably one of the most data intensive applications to be found. Western Geophysical often claimed that its tape library was second only to that of the U.S. government in size. Data collection, processing and storage are thus matters of considerable importance. Clearly, a computer with the fastest of processors is unequal to the task of commercial seismic processing if its I/O components are inadequate. The seismic industry has not been just a consumer of I/O devices. It has, instead, been a primary motivating force in the development of new devices. It has long been standard operating procedure for I/O manufacturers to arrange early experiments and tests of their equipment in a seismic environment.

I/O advances have occurred in many types of hardware: channels, cartridge tapes, optical disks, hyperdisks, solid state devices, rasterizers, plotters and CRT graphic displays. It is possible only to summarize the latest status of these types of devices without large chapters of technical detail.
CHANNELS
It should be mentioned that mainframes and supercomputers use channels whereas minicomputers use busses. The essential difference is that a bus handles all data traffic between units of a computer system whereas channels handle only the traffic to and from specific I/O controllers and memory. The standard channel speed over the past several years for IBM-like systems has been 3 Mbytes/sec with a maximum of 32 channels. Recently, IBM, Amdahl and three others have announced 4.5 Mbyte/sec channels. CDC-like systems (CDC, Cray, ETA) have allowed only 16 channels but at essentially twice the speed. Cray pioneered the development of 100 Mbyte/sec channels between memory and I/O subsystems which in effect are computers in their own right. There are some similarities in this idea with the earlier "directly coupled system" developed by IBM for NASA. The I/O subsystems in turn have special channels for high performance disk units, "hyperdisks," such as the Ibis and Hydra drives. Cray also developed a 1.25 Gbyte/sec channel for data transfers between its Solid State Device (SSD) and memory on its X-MP series. The following figure shows relative speeds in Mbytes/sec for the various data paths in a typical modern supercomputer.
TAPE (1.25) -> CHANNEL (1.8) -> MEMORY (11000) -> CPU
DISK (3.0) -> CHANNEL (3.0) -> MEMORY (11000) -> CPU
HYPERDISK (10.0) -> SUBSYSTEM (100) -> MEMORY (11000) -> CPU
SSD (1300) -> CHANNEL (1300) -> MEMORY (11000) -> CPU
Tape channels, though slower and cheaper than disk channels, are usually rated at a higher speed than the tapes themselves. Perhaps faster tapes are to be expected shortly.

Computers of different vendors can also be connected by high speed devices such as Network Systems' HYPERchannel and CDC's Loosely Coupled Network, which operate at 50 Mbits/sec (6.25 Mbytes/sec or less). By comparison, the speed of a DEC Unibus is essentially 1 Mbyte/sec and an Ethernet is 10 Mbits/sec (1.25 Mbytes/sec or less). Wide area networks operate at 56 Kbits/sec and user terminals at no more than 19.2 Kbits/sec. A few networks now operate at T1 speeds of 1.54 Mbits/sec.
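The conversions between line rates and byte rates quoted above are simple arithmetic, sketched below. The 100-Mbyte payload is a hypothetical figure chosen for illustration, the helper names are ours, and the computed times are ideal values that ignore protocol overhead and contention (which is why the text says "or less").

```python
def mbits_to_mbytes(mbits_per_sec: float) -> float:
    """Convert a raw line rate in Mbits/sec to Mbytes/sec (8 bits per byte)."""
    return mbits_per_sec / 8.0

def transfer_seconds(size_mbytes: float, rate_mbytes_per_sec: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and contention."""
    return size_mbytes / rate_mbytes_per_sec

# Peak rates quoted in the text, converted to Mbytes/sec.
rates = {
    "HYPERchannel": mbits_to_mbytes(50),               # 6.25
    "Ethernet": mbits_to_mbytes(10),                   # 1.25
    "T1": mbits_to_mbytes(1.54),
    "wide area (56 Kbits/sec)": mbits_to_mbytes(0.056),
}

# Hypothetical payload: one 100-Mbyte file.
times = {name: transfer_seconds(100.0, rate) for name, rate in rates.items()}
```

Even at these ideal rates the spread between the fastest and slowest links covers about three orders of magnitude, which is why the choice of data path matters so much for seismic volumes.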
TAPES
Tape advances have not shown the same magnitude of improvement as one finds in computation. The following table summarizes the relative performance rates at the beginning of each of the last three decades in tape technology and in computational performance.
TABLE 1.
Year    Tape Speed (in/sec)    Tape Density (bpi)    MIPS
1960    75                     800                   1
1970    125                    1600                  20
1980    200                    6250                  200
Thus, tape throughput (speed times density) is about 20 times what it was in 1960, whereas computers are 200 times as fast. The present decade has widened this difference, with computers operating at one gigaflop (approximately the equivalent of 3000 MIPS) and no substantial improvement in tape I/O. Fortunately, the computing is more sophisticated now, with more arithmetic per unit of I/O.
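The factor of 20 for tape follows from Table 1 if tape throughput is taken as the product of tape speed and recording density. That reading is our assumption; the table itself lists only the two factors. A short check:

```python
# (tape speed in in/sec, density in bpi, MIPS) from Table 1
table = {
    1960: (75, 800, 1),
    1970: (125, 1600, 20),
    1980: (200, 6250, 200),
}

def tape_rate(year: int) -> int:
    """Raw tape throughput, taken here as inches/sec times density per inch."""
    speed, density, _ = table[year]
    return speed * density

tape_gain = tape_rate(1980) / tape_rate(1960)      # roughly 20
compute_gain = table[1980][2] / table[1960][2]     # 200.0
```

Under this assumption the tape rate grows from 60,000 to 1,250,000 units per second, a gain of about 20, while the MIPS column grows by a factor of 200.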
The recent cartridge tapes represent improvements in the handling and storage of tape archives. Not only are they smaller and loaded automatically, but they can also store up to 3 Gbytes of data, which rivals the capacity of optical disks.
OPTICAL DISKS
Optical storage technology is gradually becoming more important. Its characteristics make it an interesting alternative to conventional magnetic storage technology, especially magnetic tape. Optical storage was first used commercially for video and audio compact disks. Whereas in a magnetic medium information is recorded by changing, and read by sensing, magnetic properties, optical storage technology uses tiny solid-state lasers to create (write) and sense (read) microscopic pits in the disk's surface. Typically, the disk is coated with a reflective material; writing then consists of burning a pit into that surface material using the laser at a higher power setting, while reading is done by measuring the reflectivity of a particular position. Thus, high reflectivity (no pit) might represent a 0 and low reflectivity (pit) a 1. This set-up is the basis for all of the currently (1987) commercially available laser disks; it follows from this that information can be recorded only once, but read many times, giving rise to the acronym WORM ("write once, read many"). This indicates the major disadvantage of current optical storage technology: it is generally not possible to change information stored on such a laser disk. (Strictly speaking, this is not quite true; if one uses certain non-standard codes to record information, a certain number of changes of information recorded on a WORM laser disk are possible. For a discussion of this issue and how to guarantee that such changes can be prevented, see [LEISS84]. However, since this would require changes in the recording software and firmware, this possibility is ignored here.)

The ability to rewrite information seems crucial, mainly because one is accustomed to it. However, upon examining the requirements of seismic data storage (as well as those of many other types of information), it should be obvious that the WORM medium laser disk is quite acceptable, especially since it has
several interesting features that are quite attractive for storage of seismic data:

1. Permanence and Robustness: Compared with magnetic media, information stored on laser disks is far less affected by environmental factors. A laser disk can be removed and stored much like a magnetic tape but unlike a magnetic disk. Magnetic fields, heat, humidity, and within limits dust do not affect a laser disk that is stored for long periods of time in an office or a warehouse. Magnetic tape on the other hand must be stored in a very controlled environment if it is to survive reliably for even only five years.
2. Information Density: Because information is optically recorded, the information density is significantly higher than that of magnetic media. For example, a single one of the ubiquitous audio compact disks holds 540 Megabytes or 4.32 Gigabits of information (about 300,000 pages of double-spaced copy). Kodak recently introduced a system that stores one trillion bytes (8 Terabits) on four 14-inch disks [HECH87].
3. Elimination of Head Crashes: The technical set-up allows a distance on the order of millimeters between head and disk; thus the dreaded head crashes of magnetic storage media, where the distance is one order of magnitude smaller, are eliminated. (Head crashes occur when dust particles are caught between the head and the disk surface; they destroy the disk and the head, but even more damaging, they irretrievably erase the data. They can be avoided by keeping the environment dust free.)
4. Fast Access: Compared with magnetic tape, which is perhaps the most comparable storage medium, laser disks provide much faster access to individual portions of the data. This is due to the fact that laser disks allow direct access to tracks simply by moving the read/write head. In this, they behave just like magnetic disks. Magnetic tape on the other hand provides only sequential access.
5. Removability: Laser disks containing sensitive data can be removed from the disk drives; they are moreover small enough to fit into safes.

There are other advantages that are not directly relevant to seismic data storage, in particular the fact that prerecorded compact disks are cheap to mass produce. It can cost between $3000 and $5000 to create a master disk of a conventional audio compact disk, but copies from it can be manufactured for less than $5 per copy [MATT87]. Encyclopedias are already being distributed in this way.

Among the current main players in laser disks (for information storage for use with computers) are Laser Magnetic Storage Technology (LMS), a joint venture between N.V. Philips (Netherlands) and Control Data (Colorado); Kodak; and Toshiba. A significant number of companies are also manufacturing laser disk drives for personal computers and workstations, with prices for the drives starting around $2500 and the 5 1/4 inch disks costing on the order of $100 [HECH87].
Erasable optical disks have been announced every year since at least 1984, always for the next year. They are expected to use a magneto-optic technology whereby a laser is used to change the configuration of a magnetic field on the recording surface [MATT87]. The major problem so far seems to be that the number of phase changes (changes of the structure of the alloy on the recording surface) that the materials permit is not high enough to yield truly erasable laser disks. Another problem is related to the information density that can be achieved in this way. At present (1987), no erasable optical disks are commercially available [HECH87].

For these reasons, we expect laser disks of WORM type to be phased in gradually and in some cases to replace magnetic tapes for the storage of seismic data. While laser disks are technologically superior to magnetic tape, the large investment in both magnetic tape drives and, even more so, in magnetic tapes (all of which would have to be copied to laser disks, were one to change over completely to optical storage) will slow this development.
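The write-once property of the WORM disks discussed in this section can be captured in a toy model: a pit, once burned, cannot be removed, so a stored bit can change from 0 to 1 but never back. The class below is a hypothetical illustration under the pit = 1 convention mentioned earlier; it is not a model of any actual drive or of the codes discussed in [LEISS84].

```python
class WormSector:
    """Toy model of a write-once sector: a pit (1) can be burned, never removed."""

    def __init__(self, nbits: int):
        self.bits = [0] * nbits          # blank surface: no pits

    def can_write(self, new_bits) -> bool:
        # A pattern is reachable only if it keeps every existing pit.
        return all(new >= old for old, new in zip(self.bits, new_bits))

    def write(self, new_bits) -> None:
        if len(new_bits) != len(self.bits):
            raise ValueError("pattern length does not match sector size")
        if not self.can_write(new_bits):
            raise ValueError("write would require removing a pit")
        self.bits = list(new_bits)

sector = WormSector(4)
sector.write([0, 1, 0, 1])                      # first recording
overwrite_ok = sector.can_write([1, 1, 0, 1])   # only adds a pit
overwrite_bad = sector.can_write([0, 0, 0, 1])  # would erase a pit
```

The model also shows why the medium is not strictly unchangeable: a pattern that only adds pits is physically reachable, so whether a later change is possible depends on how information is encoded.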
HYPERDISKS
The standard high performance disks for the CDC and Cray systems have been manufactured by CDC. The DD-29 series transfers data at 4 Mbytes/sec and has a capacity of 0.6 Gbytes. The newer DD-49 series has a speed of 10 Mbytes/sec and a capacity of 1.2 Gbytes. Since 1982, Ibis Systems of Westlake, California has produced a parallel-transfer disk drive made with a proprietary 14-inch thin film medium. Its first product, the Model 1400, has a 12 Mbyte/sec data transfer rate and a 1.4 Gbyte storage capacity. In order to make these disks useful to industry in general, Ibis has developed two industry standard interfaces, Ibis-I and Ibis-II. Both of these interfaces satisfy the requirements of the Intelligent Standard Interface (ISI). Ibis has shipped over 1000 of these units to Cray, its single largest customer.

In order to use these disks even more effectively than simply relying on their inherent speed, the concept of disk striping has arisen. In this technique, sequential elements of a file are divided into small groups so that one group occupies one track of a disk. Sequential groups are stored across the disk units so that several groups can be read in parallel. Using a multidimensional variation of this technique along with other programming techniques, Lhemann [LHEM85] was able to convert an I/O bound three dimensional migration algorithm into a compute bound program.
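The one-dimensional form of disk striping described above can be sketched as a round-robin assignment of groups to disk units. The function names, the group size, and the number of disks are hypothetical; the multidimensional variation used in [LHEM85] is not reproduced here.

```python
def stripe(records, group_size, num_disks):
    """Split a file into groups and deal the groups round-robin across disks."""
    groups = [records[i:i + group_size]
              for i in range(0, len(records), group_size)]
    disks = [[] for _ in range(num_disks)]
    for g, group in enumerate(groups):
        disks[g % num_disks].append(group)   # consecutive groups, different units
    return disks

def read_rounds(disks):
    """Yield the groups that could be read in parallel, one per disk unit."""
    depth = max(len(d) for d in disks)
    for level in range(depth):
        yield [d[level] for d in disks if level < len(d)]

# Hypothetical file of 12 records, groups of 2 (one group per track), 3 disks.
disks = stripe(list(range(12)), group_size=2, num_disks=3)
rounds = list(read_rounds(disks))
```

In this example the six groups are read in two rounds instead of six sequential accesses, because each round fetches one group from every disk unit at once.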
RASTERIZERS AND PLOTTERS
Rasterizers, such as the Houston Scientific HSR series, are hardware devices which convert pictures stored in the form of vector move/draw files into display files called rasters. In these rasters, each pixel is represented by as little as one bit of data up to several bytes. Often there is one byte for black and white rasters and up to three for color. Seismic software vendors are split as to whether it is better to rasterize with the software of a supercomputer or to use the rasterizer boxes and be tied to one vendor. It is now common practice to provide both alternatives and let the user select.
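What rasterizing in software amounts to can be sketched with a one-bit-per-pixel raster and Bresenham's line algorithm. The command format (`"move"` and `"draw"` with integer pixel coordinates) and the function `rasterize` are assumptions made for illustration; they are not the file format of any particular rasterizer, and no bounds checking is done.

```python
def rasterize(commands, width, height):
    """Convert move/draw vector commands into a one-bit-per-pixel raster."""
    raster = [[0] * width for _ in range(height)]
    x0 = y0 = 0                                  # current pen position
    for cmd, x1, y1 in commands:
        if cmd == "draw":
            # Integer line stepping (Bresenham's algorithm).
            dx, dy = abs(x1 - x0), -abs(y1 - y0)
            sx = 1 if x0 < x1 else -1
            sy = 1 if y0 < y1 else -1
            err = dx + dy
            x, y = x0, y0
            while True:
                raster[y][x] = 1
                if x == x1 and y == y1:
                    break
                e2 = 2 * err
                if e2 >= dy:
                    err += dy
                    x += sx
                if e2 <= dx:
                    err += dx
                    y += sy
        elif cmd != "move":
            raise ValueError(f"unknown command {cmd}")
        x0, y0 = x1, y1                          # both commands move the pen
    return raster

# Hypothetical vector file: one diagonal and one horizontal stroke.
picture = rasterize(
    [("move", 0, 0), ("draw", 3, 3), ("move", 0, 4), ("draw", 4, 4)],
    width=5, height=5,
)
```

A color raster would store one to three bytes per pixel instead of a single bit, but the conversion from strokes to pixels is the same.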
3. ADVANCES IN SOFTWARE
3.1 Languages and Extensions
Fortran remains the most commonly used programming language for scientific computing. While other languages are being used (Pascal, C, Ada), they should not present major challenges to Fortran's domination (stranglehold?) of this field in the near future. Of importance, however, is the fact that Cray seems intent on phasing in UNIX as its main operating system; this should give C a certain advantage. The emphasis placed by the US Department of Defense (DoD) on Ada does not seem to be shared by the manufacturers of high-performance computing equipment nor their software suppliers, mainly because DoD has not (yet) materialized as a major buyer.
On the other hand, the proposed Fortran Standard, hopefully called Fortran 8X (the X to be replaced by either 8 or 9; this is where the hope comes in: if final adoption does not take place in this decade, it will be Fortran 9X!), will incorporate certain language features that will aid in utilizing vector, and to a lesser extent, parallel, computers. Fortran is highly suitable for vector processing because its main program structure is the DO-loop, and this is precisely the construct that vectorizes best automatically. The proposed SEG seismic subroutines (Seismic Subroutine Standard) are basically a library of subroutines which facilitates seismic processing; they are formulated language-independently but are clearly aimed at Fortran.

Fortran, however, although excellent for vectorization, is a poor vehicle for parallel computations. For this reason, various languages have been designed with the aim of facilitating the use of parallelism that is available in the hardware; they enable the programmer to control parallelism explicitly. None of them, however, has reached a level of acceptance that promises significant prospects for becoming a standard (or even merely dominant).
3.2 Compilers
There are two kinds of compilers of interest: compilers that automatically produce vectorized code (V-compilers) and compilers that automatically produce parallelized code (P-compilers). In both cases, the source program is written in some standard language, usually Fortran.

V-compilers have been in use for a number of years; they are the major reason for the roaring success of vector computers. Their main advantage is that they automatically transform standard language into vectorized code, with relatively little programmer interaction. Initially (six to eight years ago), V-compilers were rather simple-minded and primitive; now, there are fairly sophisticated V-compilers available for all major machines which come reasonably close to hand vectorization and are therefore highly cost-effective. Vectorization is the alpha and the omega of seismic processing and will remain so for quite some time.

P-compilers (compilers that automatically detect parallelism and generate code to take advantage of this) are an entirely different story. To date, most of the parallelization must be done by hand; in other words the programmer must explicitly code for parallelism. Automatic parallelization to date is limited to individual loops [FERR85]; parallelism at a higher language construct level must still be specified by the programmer [KARP87]. Several projects, in academia and in industry, are underway, but the problem of detecting inherent parallelism in a program is substantially more difficult than vectorization. Even a rather primitive P-compiler is still relatively far away. On the other hand, it is questionable whether parallel computer systems will ever be viable without a reasonably smart P-compiler; the cost of recoding existing application programs for parallelism by hand is simply too high.
4. IMPLEMENTATION: REALITIES AND PITFALLS
Problems in seismic data processing are characterized by huge data sets, occurring both as input and as output. For example, a 3D migration program may have as input a data set consisting of 240 traces on 240 lines, with each trace containing 3000 samples (SALNOR7; see Nelson, 1982). Consequently, the input file contains 172.8 million numbers; if each number (word) has 32 bits, the input file is of size 5.5 Gigabits, with the output file being of the same order of magnitude. Therefore, processing realistic seismic data sets is very likely to at least severely strain, if not exceed, the capacity of most current computer systems. Three issues are of major importance in this context:
- The amount of primary or main memory available for processing
- The availability of vector processing
- The possibility of utilizing parallelism, especially macroparallelism.
In the following sections, we discuss each of these issues and outline their implications for the present and the future of seismic data processing.
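The size of the example data set quoted above follows directly from its dimensions. The short calculation below reproduces the figures in the text; the conversion to Mbytes is our addition and uses decimal units.

```python
traces_per_line = 240
lines = 240
samples_per_trace = 3000
bits_per_word = 32

samples = traces_per_line * lines * samples_per_trace   # 172,800,000 numbers
size_bits = samples * bits_per_word                     # 5,529,600,000 bits
size_gigabits = size_bits / 1e9                         # about 5.5
size_mbytes = size_bits / 8 / 1e6                       # about 691
```

At roughly 690 Mbytes for the input alone, the data set has to be compared against the main memory actually available, which is the first of the three issues listed above.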
4.1 In-Core and Out-of-Core Programming
A program whose data in their entirety can be read into main memory from secondary storage devices (disks, tapes) is called in-core. In contrast, an out-of-core program requires that the operations performed by the program be grouped together into program parts in such a way that the data set can be partitioned into subsets with the following properties:
- Each subset fits into the available main memory
- The operations in one program part require only the data in the corresponding data subset.
Therefore, at different times during the execution of the program, different data subsets will reside in main memory.

With the exception of the Cray 2, currently available computer systems are unable to accommodate in main memory data sets of size in excess of 5 Gigabits; therefore in-core programs are not feasible. This leaves two alternatives, namely out-of-core programming and virtual memory management.

A virtual memory environment provides automatic paging; this means that the data set is uniformly subdivided into relatively small portions (in the VAX, 512 words), called pages. These pages initially reside on disk. Whenever a data item is needed during execution, the operating system determines automatically in which page the item resides and reads that page from disk into main memory. While this is done, the program waits. The retrieval of a page from disk may take two or more orders of magnitude longer than the operation that is eventually performed on the requested item. Since the number of pages that fit into main memory is limited, the request for another page may necessitate the removal of a page currently in main memory. Also, the same page may have to be retrieved again, even if a different data item is requested, because many different items reside in the same page. If the page has been removed in the meantime, it will have to be read from disk again in this case. As an illustration consider the following two
functionally identical Fortran loops:

      DO 10 I = 1,512
      DO 20 J = 1,512
      A(I,J) = B(I,J) + C(I,J)
   20 CONTINUE
   10 CONTINUE
Loops (L1)

      DO 10 J = 1,512
      DO 20 I = 1,512
      A(I,J) = B(I,J) + C(I,J)
   20 CONTINUE
   10 CONTINUE
Loops (L2)
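The effect of loop order on paging can be checked with a small model. It is a sketch under simplifying assumptions: a single 512-by-512 array stored column by column, one page holding exactly 512 elements (one column), and only one page frame available for the array. The function name and the least-recently-used replacement policy are ours; real paging systems keep more frames and differ in detail.

```python
def page_retrievals(n: int, page_size: int, outer_is_row: bool, frames: int = 1) -> int:
    """Count page retrievals for one n-by-n column-major array.

    frames is the number of pages kept in memory (the least recently
    used page is evicted). outer_is_row=True mimics (L1): the outer
    loop runs over the first index and the inner loop over the second.
    """
    resident = []                      # most recently used page is last
    faults = 0
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if outer_is_row else (inner, outer)
            page = (j * n + i) // page_size   # column-major linear address
            if page in resident:
                resident.remove(page)
            else:
                faults += 1
                if len(resident) == frames:
                    resident.pop(0)
            resident.append(page)
    return faults

l1 = page_retrievals(512, 512, outer_is_row=True)    # inner loop across columns
l2 = page_retrievals(512, 512, outer_is_row=False)   # inner loop down a column
```

In this model the (L1) order touches a different page on every access, while the (L2) order stays within one page for a whole column, so the two counts differ by a factor equal to the column length.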
If we assume that 512 array elements fit into one page, then (L1) performs over a quarter of a million page retrievals, whereas in (L2) only 512 page retrievals are necessary because arrays in Fortran are stored in columns. Running the two programs on a VAX-11/780 yields the following timings: (L1) requires 293 sec, (L2) requires 9 sec.

Virtual memory is not at all the same as out-of-core programming: in an out-of-core version, the emphasis is at least as much on partitioning the operations of the program as it is on partitioning the data; in fact the two have to be very well coordinated. In a virtual memory environment, no attention is paid at all to the partitioning of the operations, and as the example above shows, vastly different data transfer requirements and consequently vastly different timings may result. In a virtual memory environment the programmer is less able to control precisely the flow of input and output; this may result in inefficient use of the computer resources. For this reason, virtual memory has not been preferred for high-performance data processing. Indeed, supercomputers such as the Cray systems at present do not support virtual memory management; instead the programmer is required to partition data and operations explicitly. This results in a tradeoff between savings in computer resources (at the cost of additional programmer effort) and savings in people resources (at the cost of computer time). At
present, out-of-core programming is still necessary in realistic seismic processing.

To give a concrete example of the amount of computer time that can be saved by intelligently restructuring data and instructions in a coordinated way, consider an implementation of the 3D Phase Shift migration of the SALNOR 7 model on the Cray X-MP [LHEM85]. A perfectly competent initial implementation has an estimated CPU time of 130 sec for lines of 256 traces, each trace with 2048 samples; however, closer inspection indicated that the I/O waiting time (the time the program spends waiting unproductively for requested data to be transferred) was approximately 2800 sec! This was due to the fact that the initial implementation required the transfer of approximately 4 million disk sectors (a sector is similar to a page). Restructuring the algorithm resulted in the same CPU time, but the number of disk sectors that had to be transferred was now reduced to 250,000, resulting in an I/O waiting time of only 175 sec.

In general, a careful analysis of the data transfers should be made, with special emphasis on the fact that items occur in blocks (sectors, pages) and that it is the block which contains an item that is transferred, not the individual item. As a rule of thumb, any program requiring that items (i.e., the blocks that contain them) be transferred more than once from secondary storage to main memory or more than once from main memory to secondary storage must be considered a candidate for restructuring.

Several supercomputers have superfast large secondary storage (e.g., the Cray X-MP has the SSD, the Solid-State Storage Device; the NEC SX has the XMU, the Extended Memory Unit). This storage is typically significantly larger than the main memory and access time to it is much shorter than that to disk. The intent is to load all data required for the program into that storage (from disk or tape) and then use it, instead of the disk or tape, as secondary storage medium. While the access time to this superfast secondary storage is less than that to disk, a data transfer analysis is still advisable since access timings and type of access are still closer to those of disk than of main memory. (Clearly, the transfer from disk or tape to this device should occur only once; similarly for the transfer to disk or tape.)
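The figures quoted above for the Phase Shift migration can be checked with a few lines of arithmetic. The sketch below is in Python and uses only the numbers given in the text. The function name is hypothetical, and the elapsed time is computed under the simplifying assumption that CPU work and I/O waiting do not overlap.

```python
def io_summary(cpu_sec, io_wait_sec, sectors):
    """Summarize one run from its CPU time, I/O waiting time and sector count."""
    return {
        # Assumption of this sketch: CPU time and I/O waiting time simply add up.
        "elapsed_sec": cpu_sec + io_wait_sec,
        # Average waiting time attributable to one transferred sector.
        "wait_per_sector_ms": 1000.0 * io_wait_sec / sectors,
    }

initial = io_summary(cpu_sec=130, io_wait_sec=2800, sectors=4_000_000)
restructured = io_summary(cpu_sec=130, io_wait_sec=175, sectors=250_000)
ratio = initial["elapsed_sec"] / restructured["elapsed_sec"]

print(initial, restructured, round(ratio, 1))
```

Both runs work out to about 0.7 msec of waiting per sector, so the saving comes entirely from moving fewer blocks rather than from moving them faster. Under the no-overlap assumption, the total time drops from 2930 sec to 305 sec, a factor of roughly 9.6.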
4.2 Vector Processing
Vector processing is currently the mainstay of all serious seismic data processing. This is due to the following observation: Any Fortran program that:

- uses large amounts of memory,
- has large input and output data sets, and
- performs at least 10^12 operations

can be vectorized with a rather modest amount of effort, to such an extent that a speed-up of at least one order of magnitude is achieved.
Speed-up is defined as the CPU time of the scalar version divided by the CPU time of the vectorized version (everything else unchanged). Modest amount of effort means 5% or less of the time required to develop the (scalar version of the) program. Indeed with today's vectorizers it is possible to submit a scalar version of a (Fortran 77) program and obtain a program that is substantially vectorized; for certain vectorizers (Convex Fortran Vectorizing Compiler), it is claimed that the resulting code approaches 90% of the efficiency of hand-coded vector code. Moreover, those parts that cannot be vectorized by the software tool can be flagged so that the programmer may attempt to restructure the code according to well understood rules. There are "catalogues" of these rules which can be applied without great
difficulty.

To give a concrete example, a 2D PSPI algorithm was run based on that described in [MAJO86] where the velocity varies only in the x-direction, from 4000 ft/sec to 5800 ft/sec at the midpoint and then back to 4000 ft/sec (linearly). The synthetic time section consists of a row of 1's at the 10th row; the size is 512 x 512. This program was run in two versions on a VAX-11/780, one version using the VAX alone, with the FFTs in scalar mode, the other version using one FPS 100 as vector processor. The vector processor was only used for the FFTs involved in the vectorized PSPI version; the remainder of that program was unchanged, i.e., not vectorized. The I/O waiting times are identical for the two versions, but the CPU timings are not: the scalar version took approximately 42,670 sec (11:51:09.15), whereas the vectorized version took about 2670 sec (0:44:27.38). Consequently, the speed-up obtained by using a library routine that uses the FPS 100 for the FFTs only is 16! This clearly constitutes a significant performance increase at a rather modest increase in cost.
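The speed-up of 16 follows directly from the definition given above (scalar CPU time divided by vectorized CPU time). As a small illustration in Python, using the two CPU timings quoted in the text; the helper names are hypothetical:

```python
def to_seconds(hms):
    """Convert a timing written as h:mm:ss.ss into seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def speed_up(scalar_cpu_sec, vector_cpu_sec):
    """Speed-up: CPU time of the scalar version over CPU time of the vectorized version."""
    return scalar_cpu_sec / vector_cpu_sec

scalar_sec = to_seconds("11:51:09.15")   # VAX-11/780 alone, FFTs in scalar mode
vector_sec = to_seconds("0:44:27.38")    # FFTs executed on the FPS 100

print(round(scalar_sec, 2), round(vector_sec, 2),
      round(speed_up(scalar_sec, vector_sec), 1))
```

The two timings convert to 42,669.15 sec and 2,667.38 sec, and their ratio is approximately 16.0.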
4.3 Parallelism
At the hardware level, parallelism denotes the presence of several processors, each with its own instruction stream and under its own control. Each processor may use a shared memory (common memory) and/or have its own private memory. Since there are several independent agents, provisions must exist for the communication between processors. This may be achieved through common memory or by message passing. In the former case, the system is called tightly-coupled (an example is the Cray X-MP/4 where up to four processors use the same large main memory); in the latter case the system is called loosely-coupled (an example is provided by the Intel Hypercube). The underlying idea is to provide N processors and thereby to achieve a speed-up of N; this is clearly also the theoretical upper bound on any speed-up.

In contrast to vector processing where one vector instruction acts on many data items, in parallel systems each processor executes independently. Therefore, in contrast to vector processing, where most of the vectorization is done automatically, in order to exploit parallelism efficiently one must specify explicitly which portion of the program is to be executed on which processor using which portion of the data. Software tools comparable to vectorizers, which allow the user to submit scalar code and perform the rewriting necessary to utilize the vector capabilities of the target machine, do not exist yet for automatically parallelizing code. In addition, some questions have been raised as to whether the currently available loosely-coupled systems are suitable for processing seismic data because of their limitations on interprocessor communication and I/O [KAOL87].
Implementations on the Cray X-MP/4 of migration algorithms such as PSPI [AMES87] and finite difference methods [TERK87] indicate that a speed-up of 3.5 is quite attainable; this is close to the theoretical upper bound of 4. However, four processors are still manageable for the programmer so that the code for these applications can be carefully hand-coded. For more processors, we would expect the actual speed-up to be significantly less than 80% of the theoretical upper bound. Also unclear is how one might achieve similar results automatically, i.e., with a software tool akin to a vectorizer.

At the present time, loosely-coupled systems do not appear competitive for production processing of seismic data. No software that would automatically parallelize uniprocessor code is commercially available. The lack of parallelizers is particularly damaging because debugging parallel code is significantly harder than debugging uniprocessor code. The existing processing software, almost exclusively written in Fortran (unless a lower-level language is used), is written for uniprocessors and will not be allowed to become obsolete with the arrival of new processing hardware. Fortran is a poor vehicle for parallel programming (in contrast to vectorizing, for which it is very well suited since the only data structure it supports is the array).

Proposals have been advanced of systems that are specifically designed for seismic processing but do not serve any other purpose. For example, it is technologically feasible to design and manufacture a chip for migration. It is safe to expect that a chip can be designed that will beat any software implementation of migration. There are however two major problems with this approach. One is obviously cost: since the market for such a system is quite restricted, the development cost per sold unit might be prohibitive. Also, such a system would severely stifle work on new processing methods, since a chip containing a migration algorithm will render unattractive work on improved migration methods. The field is not mature (stagnant?) enough that any one company could make a decision to use one processing method, and one only, for the next decade or so.
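The multiprocessor figures quoted earlier in this section (a speed-up of 3.5 on the four processors of the Cray X-MP/4, against an upper bound of 4) can be put in terms of efficiency, i.e., speed-up divided by the number of processors. The Python sketch below also extrapolates to more processors using Amdahl's law. That model is not used in the text and is introduced here purely as an illustrative assumption; it ignores communication and I/O costs, and all function names are hypothetical.

```python
def efficiency(speed_up, processors):
    """Fraction of the theoretical upper bound (N) that is actually achieved."""
    return speed_up / processors

def serial_fraction(speed_up, processors):
    """Invert Amdahl's law  S = 1 / (f + (1 - f) / N)  to obtain f."""
    return (processors / speed_up - 1.0) / (processors - 1.0)

def amdahl_speed_up(serial, processors):
    """Speed-up predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial + (1.0 - serial) / processors)

# Serial fraction implied by the observed speed-up of 3.5 on 4 processors.
f = serial_fraction(3.5, 4)

for n in (4, 8, 64):
    s = amdahl_speed_up(f, n)
    print(n, round(s, 2), round(efficiency(s, n), 3))
```

A speed-up of 3.5 on four processors corresponds to an efficiency of 87.5%. Under the Amdahl assumption the implied serial fraction is 1/21, which would give a speed-up of 6 (75% efficiency) on eight processors. This is in line with the expectation stated above that efficiency falls below 80% as processors are added, although it is a model projection and not a measurement.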
5. CONCLUSION

High-performance processing of seismic data must clearly start with an efficient algorithm. There is a host of efficient methods that can be tailored to a given situation. Most applications use vector processing, and with very good reason: at present, this is the single most important factor in the performance of a competently written application program. However, in realistic implementations, questions such as the I/O behavior and the inherent parallelism of a program become of concern since they can very seriously affect the performance of the program if they are not properly considered. At present, I/O analysis and detection of parallelism must be carried out manually. We expect that in the next few years, software tools will become available that assist in these tasks. However, the actual restructuring of the code will require knowledge of the application and therefore it is highly unlikely that restructuring can be fully automated, in the near or in the long-term future. Therefore, programming the new machines will place a significant burden on the programmers. The reason why vectorization is such a success is that it can be done syntactically, i.e., without any understanding of the underlying application. This is not the case for the restructuring of a program in order to improve its I/O behavior
or to exploit inherent parallelism.

In particular, there are two major problems associated with parallelism at the hardware level, one related to hardware, the other related to software. The hardware problem is one exclusively associated with loosely-coupled systems, while the software problem is common to both loosely- and tightly-coupled systems. The hardware problem is that of interprocessor communication; at present the bandwidth is simply too small for realistic seismic processing. While the remedy is obvious, it is also costly and may seriously affect the price/performance ratio of the resulting systems. Nevertheless, improvements here are expected as soon as the manufacturers realize that interprocessor communication bandwidth is a major bottleneck. This should be in the near future; indeed there are indications that the Connection Machine has addressed this problem. The software problem is one that cannot be solved that fast. The objective is software tools that parallelize uniprocessor code automatically; this implies that the parallelization must be based on purely syntactic considerations. While this appears feasible, the first reasonably efficient parallelizer is probably several years away. Until then, parallelization will have to be done by hand, which is time consuming, not least of all because debugging parallel code is at least one order of magnitude harder than debugging uniprocessor code. Also, the larger the number of processors, the more difficult it will be to design efficient parallel code; this again favors the tightly-coupled systems, which typically have fewer processors (four for the Cray X-MP/4; eight for the ETA-10 for the time being), over the loosely-coupled systems, which may have up to 65000 processors.
REFERENCES

Amestoy, P., Larsonneur, J. L., Leiss, E. L., and Gardner, G. H. F., 1987, Prestack Migration with Phase Shift Methods on the Cray X-MP: Research Computation Laboratory, Annual Progress Review, 3, 80-129.
Ashton-Tate, 1984, The dBase III Reference Guide: Ashton-Tate.
Basart, E., 1985, RISC design streamlines high power CPU's: Computer Design, July Issue.
Date, C. J., 1981, An Introduction to Database Systems: Addison-Wesley Publication.
Dettmer, R., 1985, Chip architecture for Parallel Processing: Electronics and Power, March Issue.
Fathi, E. T. and Krieger, M., 1983, Multiple Microprocessor Systems: What, Why, and When: IEEE Computer, March Issue.
Ferrante, M. W., 1985, Taking Parallel Processors to the scientific community: Computer Design, December Issue.
Fisher, J. A., Donnel, J. O., 1984, VLIW machines: multiprocessors we can actually program: Spring Compcon.
Folger, D., 1985, RISC architecture as an alternative to parallel processing: Computer Design, August Issue.
Gajski, D. D., Parallel Processing: Problems and solutions: University of Illinois at Urbana-Champaign, Technical Report.
Hecht, J., 1987, Optical Memories Vie for Data Storages: High Technology, August Issue, pp. 43-47.
Hennessey, J., 1985, VLSI RISC processors: VLSI Systems Design, October Issue.
Hwang, K., 1985, Multiprocessor Supercomputers for scientific/engineering applications: IEEE Computer, June Issue.
Kao, S. T. and Leiss, E. L., 1987, An Experimental Implementation of Migration Algorithms on the Intel Hypercube: Research Computation Laboratory, Annual Progress Review, 3; The International Journal of Supercomputer Applications, Vol. 1, No. 2, 1987, pp. 75-99.
Karp, A. H., 1987, Programming for Parallelism: IEEE Computer, May Issue, pp. 43-57.
Kuck, D. J., Supercomputers: Encyclopedia of Computer Science, Second edition, Van Nostrand Reinhold, Inc.
Leiss, E. L., 1984, Data Integrity in Digital Optical Disks: IEEE Transactions on Computers, Sept. Issue, Vol. C-33, No. 9, pp. 818-827.
Lhemann, O., 1985, A 3D PSPI Migration: Research Computation Laboratory, Annual Progress Review, 1, 86-108.
Ma, H. H. and Johnson, O. G., 1986, Implementation of PSPI and Prestack Migration on the CYBER 205: Research Computation Laboratory, Annual Progress Review, 2, 148-170.
Matthews, M., 1987, A Permanent Record: Logic, Vol. 2, No. 2, Summer Issue, pp. 8-13.
Nelson, H. R., Jr., 1982, SALNOR North Sea Model: Building, Data Acquisition and Interpretation: Seismic Acoustics Laboratory, Semiannual Progress Review, 9, 321-360.
Patton, P. C., 1985, Multiprocessors: Architectures and Applications: IEEE Computer, June Issue.
Polavarapu, U. R. and Johnson, O. G., 1986, A Database on Advanced Computer Research Projects: Research Computation Laboratory, pp. 289-307.
Raguskus, A. G., 1985, I/O computer supercharges minisystems: Computer Design, July Issue.
Sashti, J., Johnson, O. G., and Leiss, 1986, From Superminis to Supercomputers: A Survey: Research Computation Laboratory, pp. 213-238.
Schwartz, J., 1983, A taxonomic table of parallel computers based on 55 designs: New York University note #69, November Issue.
Siewiorek, D. P., Anzelmo, T., and Moore, R., 1985, Multiprocessor computers expand user vistas: Computer Design, August Issue.
Terki-Hassaine, O. and Leiss, E. L., 1987, A Multitasking Implementation of 3D Forward Modeling using High-Order Finite Difference Methods on the Cray X-MP/416: Research Computation Laboratory, Annual Progress Review, 3, 190-216; The International Journal of Supercomputer Applications (to appear).
Treleaven, P. C., 1984, Control-driven, data-driven, and demand-driven computer architecture: IEEE Computer, March Issue.
Wallich, P., 1985, Toward simpler faster computers: IEEE Spectrum, August Issue.
Wilson, A., 1985, Array Processors - Increasing speed by MIPS, MOPS, FLOPS, and GOPS: Computer Design, August Issue.