A model of scientific data bank and its applications to geological data


Computers & Geosciences, Vol. 2, pp. 279-291. Pergamon Press, 1976. Printed in Great Britain



FRANÇOIS BOUILLÉ
Institut de Programmation, Université Pierre et Marie Curie, 4 Place Jussieu, 75230 Paris Cedex 05, France

(Received 27 November 1975)

Abstract. This paper presents a new model for a scientific data bank applicable to geological data. It is composed of four concentric rings, described from outside to inside as: (1) an optional conversational module for graphic display, (2) a data structure (DS), (3) an interface between data structure and storage structure, and (4) a storage structure, which is a file system. Users do not need to know the nature of (3) and (4). In this paper, I emphasize the DS and the principles for building it correctly: it must never be created to answer a particular problem; it depends only on the analysis of the nature of geological phenomena and of the relations among them. The DS is composed of four abstract data types and consists of an arborescence with links. The programming language of the entire system is SIMULA 67, which is ideal for data structuring and for geological processing. The result is a powerful DS which may resolve almost all problems.

Two applications in geology are presented. The first example concerns geological, mining, and topographical cartography. The graphic elements are considered as belonging to oriented multigraphs, according to a method based on graph theory and used originally to digitize maps. Among the problems which may be resolved are selective two- and three-dimensional drawings with isolines and boundaries, estimates of dips, surfaces and volumes, correlations, etc. This system establishes a base for geological simulation and modeling. The second example, discussed briefly, relates to the paleontology collections of the Université P. et M. Curie. This project is in progress and shows potential for wide application in the natural sciences. As part of the overall objective, the concept of data bank developed here is intended to allow for scientific and industrial applications in which display and simulation of natural and physical processes may be necessary. At present, many aspects of this research are experimental.

Key Words: Arborescence, Attribute, Automatic cartography, Boundaries, Class, Collections storage and retrieval, Concentric abstractions, Data bank, Data structure, Graph theory, Hypergraph, Isolines, Link, Object, Paleontology, SIMULA 67, Simulation of geological processes.

INTRODUCTION

The new model of a data bank described in this paper initially was not my principal objective. I wanted to simulate geologic processes such as folding, erosion, and sedimentation. To do this, four tools were necessary:
(1) A method to digitize maps. (I have designed one based on graph theory; Bouillé, 1973, 1974a, 1974b.)
(2) A system of data storage and retrieval for archiving geological data (which is the subject of this paper).
(3) A language for simulation of natural and physical phenomena. I have chosen SIMULA 67, which is a high-level language well suited for simulation.
(4) An interactive graphic system to display and update the models.

Graph theory and several concepts contained in SIMULA 67 have strongly influenced the data bank model that I present, which covers the important requirements of a complete geologic data processing sequence: capture, management, and display, especially in cartographic applications. A new model for a data bank is proposed because previous models, although well suited for business (Cabanes, 1975; CODASYL, 1971; Codd, 1970), are unsatisfactory for complex scientific structures. In the first part of this paper, I present the model and describe specifically the data structuring, and in the second part, I give two examples of applications in geology.

MODEL

Purposes and principles
Before modeling the data bank, I would like to emphasize some basic principles:
(1) Building a data bank for geological data does not restrict its applicability to other scientific data. All the most powerful tools in computer science may be required: graph theory, automata, semaphores, concentric abstractions, structured programming, Petri nets, and schemata.
(2) The specific objectives of a system must be known before beginning to write algorithms. Generally, data banks are built first; then people try to make them shareable and portable, and by that time it is too late. The system proposed here was conceived from the start to be protected, with the data simultaneously shareable, and to work in batch processing, time sharing, or real time, on sequential or parallel computers. The data and the system are potentially portable. I will not describe the mechanisms which enable us to accomplish these tasks because they do not concern geology.

A data bank must not be built for a particular language, as is proposed in the CODASYL (1971) report, which refers to COBOL. General principles of data structuring are taken into account to establish a methodology; only later may one or several languages be chosen. Algorithms are written without using a programming language. The data structure is built upon the analysis of the phenomena, never upon the analysis of users' problems, because not all the relationships may be known when the structure is described. This last point will be developed later.

The difference between a file system and a data bank must be emphasized. In a file system, users know the files and express their problems using the concepts of "file", "record", "read", "write", and so on; in a data bank, users work on data and should not have to know where the data are stored.

General structure of the data bank (DB)
The DB is built on the principle of concentric abstractions stated by Dijkstra (1968); there are four, from outside to inside (Fig. 1): (1) an optional module for interactive graphic display (IDM), (2) the data structure (DS), (3) an interface between data structure and storage structure (IDSSS), and (4) a storage structure (SS), which is a file system. The most important concept is the DS, so I will describe in detail those aspects of it that the users must see. I then will give some explanations of the nature of the IDSSS. I will say nothing about the SS, because almost any file system may be used. (It is not necessary for the files to be shareable; the data are nevertheless simultaneously shareable. This property is ensured by a powerful, hidden mechanism operating directly in the DS.) The IDM will not be described here, for it concerns the display of the data more than their orderly storage; it uses specific tools, namely "questionnaires" (Picard, 1972) and state and super-state diagrams (Newman and Sproull, 1973).

Figure 1. Four concentric abstractions of the DB.

Data structure: DS
What are the principles used to build the DS? Users work on data, which are discrete values of continuous or discrete natural and physical phenomena composing the real world. There is no general agreement on a methodology for building a data structure. The task usually is approached by studying the problems that we want to resolve with the DS. The DS then describes some aspects of the structure of the phenomena relating to those particular problems, but only those; if there is a new problem, it becomes necessary to modify the DS. Comparing business and geology gives an insight into data structure modeling: phenomena which have a complex structure must be described in terms of the structure which exists, without taking into account users' problems. Furthermore, the structure does not depend upon processing by a computer. The usual method of starting from specific problems may be a fatal error, because it leads to something which may be named a "problem structure", but never a "data structure". What must be done is to study the nature and structure of the scientific phenomena. The DS must describe them completely; only then can it provide the potential to answer most problems. If the DS cannot resolve a problem directly, there are three possible reasons:
(1) The problem is not properly defined; it is a user's error.

(2) Some aspects of the nature or structure of geological phenomena have been omitted. This may be a serious error, indicating that the analysis was poorly done.
(3) Some aspects of the structure are not yet known. They may be discovered later during further research.

There are three levels of conception, giving successively three models. In the mathematical model, we define components based upon graphs and hypergraphs; in the algorithmic model, we write algorithms for all types of processing; in the programming model, we translate the algorithms into a programming language.

Mathematical model of DS. Conceptually, the model of the DS comprises two concentric abstractions: the Outer DS (ODS) and the Inner DS (IDS). The ODS is seen by the users; the IDS works behind the scenes to ensure processing of the ODS. Because the IDS corresponds to hidden aspects of the relations between elements of the ODS, it is highly developed and complex, but it is discussed only briefly in this paper.

Outer data structure: ODS. Our concept of data is based upon two principal ideas: the definition of a set given by N. Bourbaki, "A set is made of elements which have properties and may present relations between them" (Bourbaki, 1954), and the concept of "abstract data type" stated by Liskov and Zilles (1974). Data are considered as objects of classes. A class $C_i$ is an entity having a name and one or several attributes $A_{ip}$, corresponding to properties of the class, which may be qualitative or quantitative. The $A_{ip}$ are named "attributes of the class $C_i$". We may generate objects of a class; an object is indicated here by $O_i^m$; it also has attributes, indicated by $a_{ip}^m$, which are values of the corresponding attributes of the class (Fig. 2). Objects may be considered as copies of a class, but with their own values. In the example of Figure 3, there are three objects of a class having three attributes: depth, color, shape.

Figure 2. Class, objects, and attributes.

Now it is possible to build a hierarchical structure of classes. Figure 4 gives an example of an arborescence (that is, a rooted tree rather than a general tree) which provides the skeleton of the DS. Each class has its own attributes and also, implicitly, the attributes of all the upper classes, up to the root. The relation between a class and its objects is based upon the concept of "hypergraph" (Berge, 1970a, 1970b). In Figure 4, the objects of a class are the vertices enclosed by one edge of the hypergraph; such an edge always is drawn as a "cloud" around the objects. The number of connected components of the hypergraph associated with the DS is equal to the number of classes. Some objects belonging to the same class may have attributes with the same values. Such objects may belong to a group in the hierarchy, and the structure so built becomes a "forest" (Fig. 5).

Between classes, there are many relations, such as topological relations, which are not necessarily hierarchical. These allow the establishment of links between classes. A link between classes $C_i$ and $C_j$ is indicated by ${}_{C_i}L_{C_j}$. The figure now is an arborescence with links, each link having a reciprocal one. A class may have several links with several classes (Fig. 6) or with the same class (Fig. 7), so that the links belong to a "multigraph". A link may be a loop (Fig. 8).

These links correspond to relations among classes; the objects of these classes themselves may or may not verify these relations. For example, on a map a geologic boundary may or may not intersect an isoline. If it does, there is a link between an object of class "boundary" and an object of class "isoline". A link between an object $O_i^m$ of class $C_i$ and an object $O_j^n$ of class $C_j$ is indicated by ${}_{C_i}^{O_i^m}L_{C_j}^{O_j^n}$. In Figure 9, a link is indicated between objects, together with the corresponding link between their classes. An object may be linked with one or several objects of one class, corresponding to one type of relation, or with objects belonging to different classes, corresponding to different types of relations (Fig. 10). Two objects may be linked by one or several relations (Fig. 11). Objects of the same class also may be linked (Fig. 12), forming a loop-like link.

Now we may give a definition of the concept "data". It may be summed up in four categories:

⟨data⟩ ::= ⟨class⟩ | ⟨object⟩ | ⟨attribute⟩ | ⟨link⟩
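The class, object, and attribute concepts described above can be illustrated with a minimal Python sketch (the system itself is written in SIMULA 67, and nothing below reproduces its code). The names `DataClass`, `DataObject`, `all_attributes`, and `new_object`, as well as the two-level hierarchy "unit" and "sample", are assumptions introduced only for this illustration; the attribute names follow Figure 3.

```python
class DataClass:
    """A class of the DS: a name, its own attributes, and an optional upper class."""

    def __init__(self, name, attributes=(), parent=None):
        self.name = name
        self.attributes = tuple(attributes)
        self.parent = parent      # arborescence: a single predecessor, up to the root
        self.objects = []         # the objects of this class (one hypergraph edge)

    def all_attributes(self):
        """Own attributes plus, implicitly, those of all upper classes."""
        inherited = self.parent.all_attributes() if self.parent else ()
        return inherited + self.attributes

    def new_object(self, **values):
        """Generate an object carrying its own values of the class attributes."""
        unknown = set(values) - set(self.all_attributes())
        if unknown:
            raise ValueError(f"not attributes of {self.name}: {sorted(unknown)}")
        obj = DataObject(self, values)
        self.objects.append(obj)
        return obj


class DataObject:
    """An object: a copy of its class, but with its own attribute values."""

    def __init__(self, cls, values):
        self.cls = cls
        self.values = dict(values)


# Hypothetical two-level arborescence; attribute names follow Figure 3.
unit = DataClass("unit", ["depth"])
sample = DataClass("sample", ["color", "shape"], parent=unit)
first = sample.new_object(depth=3.14, color="green", shape="square")
print(sample.all_attributes())   # ('depth', 'color', 'shape')
```

In this sketch the lower class "sample" answers for the attribute "depth" without declaring it, which mirrors the rule that a class implicitly has the attributes of all upper classes up to the root.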

Inner data structure: IDS. With the visible part of the DS summarized, only a few words will be said about the Inner DS, because it operates behind the scenes. This structure is more complex than the visible outer one, but it depends not on geology but only on the theory of data bank modeling. A link has been seen as a relation between objects or classes. Behind the scenes, however, the link is processed by the system as a special object or a special class which is hidden from the users. A link between classes becomes a class; a link between objects becomes an object of the corresponding class of links (Fig. 13). In this example we see six links between objects of three classes, corresponding to two links between classes; these occur as objects of four classes of links (four and not two, because of reciprocal links). The link ${}_{C_i}L_{C_j}$ is considered by the system as a hidden class, indicated by $L_{ij}$, and ${}_{C_i}^{O_i^m}L_{C_j}^{O_j^n}$ becomes an object belonging to $L_{ij}$. The reciprocal link ${}_{C_j}L_{C_i}$ is represented by a class $L_{ji}$, and the reciprocal of ${}_{C_i}^{O_i^m}L_{C_j}^{O_j^n}$, which is ${}_{C_j}^{O_j^n}L_{C_i}^{O_i^m}$, is an object of $L_{ji}$. Only the principal elements of the mathematical model of the DS are given here; the many details are treated in Bouillé (1976a). A problem which is often discussed is the following: which is better to represent a data structure, a tree or a network?

Attributes of the class ($A_{ip}$):    depth     color    shape
Object 1 ($a_{ip}^m$):                 3.14      green    square
Object 2:                              0.        white    round
Object 3:                              2.7172    blue     shapeless

Figure 3. Example: values of the attributes of a class, giving the attributes of its objects.

Figure 4. Arborescence of classes with connected components of the hypergraph, based upon the objects of each class.

Written out in full, the definition of "data" is:

⟨data⟩ ::= ⟨class⟩ | ⟨object⟩ | ⟨attribute of class⟩ | ⟨attribute of object⟩ | ⟨link between classes⟩ | ⟨link between objects⟩
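The treatment of links as hidden classes, described for the Inner DS, can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `LinkStore`, its methods, the relation name "intersects", and the object names F1, T50, and T60 are hypothetical and do not come from the system described in the paper.

```python
class LinkStore:
    """Hypothetical store in which every link is reified as a hidden class L_ij."""

    def __init__(self):
        # hidden link classes: (class_i, class_j, relation) -> set of (object_i, object_j)
        self.hidden = {}

    def link_classes(self, ci, cj, relation):
        """Declare a link between two classes; the reciprocal class is created too."""
        self.hidden.setdefault((ci, cj, relation), set())
        # When ci == cj the link is a loop and both keys coincide (no reciprocal).
        self.hidden.setdefault((cj, ci, relation), set())

    def link_objects(self, ci, oi, cj, oj, relation):
        """Objects may be linked only if their classes are linked."""
        if (ci, cj, relation) not in self.hidden:
            raise KeyError(f"no link {relation!r} between classes {ci} and {cj}")
        self.hidden[(ci, cj, relation)].add((oi, oj))   # object of L_ij
        self.hidden[(cj, ci, relation)].add((oj, oi))   # reciprocal, object of L_ji

    def linked(self, ci, oi, cj, relation):
        """Objects of class cj linked to object oi of class ci."""
        pairs = self.hidden.get((ci, cj, relation), set())
        return sorted(b for a, b in pairs if a == oi)


store = LinkStore()
store.link_classes("boundary", "isoline", "intersects")
store.link_objects("boundary", "F1", "isoline", "T60", "intersects")
store.link_objects("boundary", "F1", "isoline", "T50", "intersects")
print(store.linked("boundary", "F1", "isoline", "intersects"))   # ['T50', 'T60']
print(store.linked("isoline", "T60", "boundary", "intersects"))  # ['F1']
```

One declared link between two distinct classes yields two hidden link classes here, which matches the count given in the text (two links between classes occurring as four classes of links).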

Figure 5. Hierarchical structure of objects: forest A; in this example, A is an arborescence (only one connected component). S is the arc giving the successor and P the arc giving the predecessor.


Figure 8. A link may be a loop; evidently, a loop has no reciprocal link.


Figure 6. Class $C_i$ linked with two classes, $C_j$ and $C_k$.


Figure 7. Classes $C_i$ and $C_j$ linked by two links, expressing two relations. In this example, the degree of multiplicity is 2.

The answer is neither a tree nor a network. To begin with, there are many other concepts to be considered, especially the following three points:
(1) The concept of tree is not completely accurate. One must distinguish arborescence (rooted tree), lattice or "dedekindien".

(2) A "network" of a DS may be broken down into several particular graphs (arborescences or bipartite graphs); this is demonstrated in Bouillé (1976a).
(3) The concept of a "hypergraph" (Berge, 1970a, 1970b) is fundamental when structuring data. This is not usually considered in most models of data structure.

Figure 9. Link between two objects, and the corresponding link between their classes.

Figure 10. An object may be linked with objects belonging to different classes (reciprocal links are not drawn, so that the figure remains readable).

Figure 11. Two objects may be linked more than once; they verify several relations expressed by several links between their classes. In this example, the degree of multiplicity is 2. The multiplicity of objects is less than or equal to the multiplicity of classes.

Figure 12. Objects of the same class may be linked; the corresponding relation is represented by a link on the class, which is a loop.

Algorithmic model of DS. The algorithmic model is written completely without using any programming language. It is based upon mathematical notations used in structured programming (Arsac, 1972; Dijkstra, 1968) and allows for the possibility of parallel calculations. This "synchronism" is possible by using Petri nets and schemata (Luconi, 1968; Patil, 1970; Petri, 1962, 1973).

Programming model of DS. The data do not depend upon their storage as file, record, etc., so that they may be portable. But the system also must be portable. To achieve this, it cannot be written in assembly language; instead, a high-level language must be chosen. This approach, which is not new, has many advantages: it is less error prone, easier to read and modify, portable, and not appreciably less efficient. I have chosen SIMULA 67 (Birtwistle and others, 1973; Bouillé, 1975b, 1976b; Dahl, Myhrhaug, and Nygaard, 1971; Rohlfing, 1973; Vaucher, 1973) for the following reasons:
- it is the most suitable language for the modeling of data structures;
- the language is self-extensible and allows the definition and use of abstract data types. In this manner it is possible to describe structured objects such as those with which we generally work in geology;
- SIMULA 67 is one of the best languages and is considered to be as good as ALGOL 68;
- SIMULA 67 is implemented on many large computers: CDC (3000/6000/Cyber), IBM (360/370), PDP 10, UNIDATA (CII/Iris), UNIVAC (1100), and, in the near future, ICL 4 and 2900;
- compatibility of compilers: by comparison, there are eleven PL/1 compilers for the IBM 360/370 computers, whereas there is only one SIMULA 67 compiler for these computers;
- for geological simulations, it is easier to access a DB through SIMULA. This can be demonstrated in a major project named GEOSIM, in which many geologic processes, working in quasi-parallelism, must access the DB;
- interfaces must be written for people who prefer another language (FORTRAN, COBOL, ALGOL, ...). It is possible to write the interface between SIMULA and one of these languages in SIMULA;
- the final reason is more subjective: the more I use SIMULA, the more I consider it to be especially useful for geological data.

The idea of using SIMULA for a data bank is not new (Kirkerud, 1974; Mäkilä, 1975), but this is the first time it has been used for other than CODASYL data banks.

Figure 13. Hidden components: classes of links and their objects. For example, a link between classes $C_i$ and $C_j$ gives classes $L_{ij}$ and $L_{ji}$; links between objects of these classes become objects of $L_{ij}$ and $L_{ji}$.

Interface between DS and SS: IDSSS
This interface is illustrated by Figure 14. It comprises five concentric abstractions; the four inner ones are independent of the user's language. The fifth corresponds to the user's language, which here is SIMULA 67, and is indicated as SOG (SIMULA Object Generator). To the user, all data appear to be in core memory (this is an abstraction). To access data, the user gives a request to SOG. If the data are not in the sets controlled by SOG, then GDC is requested to obtain a descriptor and to build a SIMULA object. GDC in turn asks CPAD for the corresponding predescriptor. If CPAD has it, it gives it to GDC. If not, CPAD gives a request to CDF, whose task is to build predescriptors from the strings given by CASS. CDF therefore requests CASS, which contacts the file system (the storage structure) to obtain a record and to extract the string to be given to CDF. When new data are created, they are decomposed and coded by the same method for storage in the SS. Thus the user need never know the concepts of file, record, read, write, etc., which are typical of file systems but may remain invisible in a DB. The reader should note that there are two levels of "virtual" presence of data in core memory:
- in the sets managed by SOG,
- in the list of predescriptors managed by CPAD.
The system controlling the priority of data to stay in core memory is not given in this paper.

APPLICATIONS

Figure 14. Five concentric abstractions of the interface between storage structure and data structure (IDSSS). Labels partly legible in the figure: SIMULA objects generator (SOG), graph descriptor correspondence (GDC), control presence/absence descriptor (CPAD), code and decode field (CDF), control access storage structure (CASS), storage structure (SS).

Geological cartography
Only two types of maps are presented here, composed of:
- isolines, corresponding to topography, gravimetry, etc.,
- boundaries, corresponding to geology, mining, geomorphology, etc.
The DS is deduced from the nature of the phenomena and not by asking what type of problem we want to solve.

Isolines. In Figure 15 a simplified topographical map is shown. An arc is associated with each isoline. The dual of this graph is built and automatically transformed to give the graph in Figure 16. Each vertex corresponds to an isoline and the arcs describe several types of relations, namely:
- a relation with the higher curve (there may be one, several, or none),
- another with the lower curve (idem),
- another with each neighboring curve of the same height (idem).
The graph allows an automatic recognition of other phenomena such as saddles, peaks, and other types of morphological patterns.

Figure 15. Simplified map with isolines (for example, topography); an arc is associated with each isoline, and the dual of this graph is built.

Figure 16. Graph obtained by transformation of the dual; several types of arcs give, for each isoline, three categories of neighbors: of upper height, of lower height, and of the same height.

Figure 17 is the partial structure of the topography, with the skeleton of classes on the left. One can see a hierarchical structure of the objects, with links expressing predecessors and successors, neighbors, and so on. This structure is used to allow access to a particular curve or to a set of isolines of the same value. It also is used to detect directly all isolines surrounding a given one. This is especially convenient for some calculations on localized data. The structure expresses topological properties without taking distances into account. There also are partial structures corresponding to maxima and minima (Bouillé, 1975a; not discussed in this paper) which are used, among other applications, for simulating natural processes such as erosion and sedimentation. (Erosion has the greatest effect on "maxima" and sedimentation on "minima", which must be directly accessible in the DS. These relations are used in the project GEOSIM.)

Figure 17. Hierarchical structure of objects corresponding to isolines. S indicates "successor", L indicates "links"; these links are those which are drawn on Figure 22. On the left, the skeleton of the DS (arborescence of classes) with links on the class, which are loops.

Natural or physical phenomena which are drawn on a map are interrupted by the limits of the map. There are common relations among adjacent maps (Fig. 18) and also among elements of these maps. By this method, each map is linked with eight adjacent maps (the general situation). The cartographic elements belonging to a map and intersecting its border are linked with the corresponding elements of the adjacent map, so that there are no more artificial limits.

Figure 18. Links allowing each cartographic element of a map to be connected directly to the corresponding element on the neighboring map.

With isolines, many other calculations may be done with these structures, e.g. surfaces or volumes of hills. One important graphic application is a powerful algorithm which allows drawing three-dimensional views with hidden lines eliminated, even if there are eight or ten thousand lines on a map.

Boundaries. For boundaries, only some examples of the most useful links will be indicated. All are established without any previous idea of their future use. The reader is reminded of the method consisting of the association of an oriented multigraph with the map, which was used to digitize maps (Bouillé, 1973, 1974a, 1974b). Figure 19 shows a simplified geological map in which i and j are the numbers of the layers on the left and right, whereas n and m are used only to distinguish the cartographic units belonging to the same layers. What are the relations to be taken into account for each arc of this multigraph?
- both adjacent cartographic units,
- generally, two vertices (if the arc is not a loop),
- the arcs belonging to the cocycles of each vertex of this arc,
- the other arcs around both adjacent layers.
All this implies several types of links, which are drawn in Figure 20. The partial structure associated with the boundaries is indicated by F (for "frontier"); the partial structure associated with cartographic units is indicated by D. The first (F) has four levels and the second (D) only three. With this entire structure, we may access selectively one cartographic element, all elements of a layer or of a set of layers, cut the stratigraphy into new intervals, and compute surfaces, dips, and volumes. For many types of calculations it is possible to determine directly the elements to be considered.

Figure 19. Associating an arc with a geological boundary ($N_p$, $N_q$: vertices; $D_{im}$, $D_{jn}$: cartographic units; $F_{ijk}$: arc). Each arc is linked with several types of cartographic elements.

Figure 20. Arc $F_{ijk}$ is an object of structure F and is linked with $D_{im}$ and $D_{jn}$, which are objects of structure D (cartographic units); $D_{im}$ and $D_{jn}$ also are linked; this link expresses the relation between two adjacent cartographic units.

In the same way, the topological nature of a map with faults is constructed. It is homeomorphic to a generalized torus with holes (Bouillé, 1973). A fault is considered as composed of two arcs having opposite orientations (top of Fig. 21). Faults give a partial structure (Fig. 21) in which they are linked to several cartographic units, and which also leads to the appearance of special links expressing the synonymous nature of the phenomena. (In this situation faults may be both cartographic boundaries and specific features with their own attributes.) In Figure 21, F is the partial structure for geological boundaries, D is the partial structure for cartographic units, and FE is the partial structure for faults. Note that a fault also may be a geological boundary, belonging to a particular type. The synonymical (identical) links, indicated by L', correspond to this double aspect.

Figure 21. Geological boundary corresponding to a fault; two arcs are associated with it (see the rectangular frame); these arcs belong to the class of stratigraphic boundaries (expressed by structure F); they also belong to another structure corresponding to faults, indicated by FE; a synonymical link is thus created (indicated by L').

Association of isolines with boundaries. The most interesting association perhaps is the union of topography and geology. Several links allow the user to know which isoline arcs intersect a boundary or which boundaries intersect an isoline (Fig. 22, on the left): F is the partial structure for geological boundaries, and TOPO is the partial structure for topographical isolines. These relations are useful in project GEOSIM for representing differential erosion. They also are used for three-dimensional views, when drawing topography and geology together, so that hidden lines are eliminated even for geologic boundaries hidden behind hills.
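Returning to the isoline graph of Figure 16, its three kinds of arcs (higher, lower, and same-height neighbors) can be sketched in Python as follows. This is a minimal illustration, not the author's algorithm: the class `IsolineGraph`, its method names, and the isoline names are hypothetical, neighboring pairs are supplied by hand rather than derived from the dual graph, and treating "no higher neighbor" as a summit candidate is a simplifying assumption.

```python
class IsolineGraph:
    """Hypothetical sketch: one vertex per isoline, arcs typed by relative height."""

    def __init__(self):
        self.height = {}
        self.arcs = {}   # isoline -> {"higher": set, "lower": set, "same": set}

    def add_isoline(self, name, height):
        self.height[name] = height
        self.arcs[name] = {"higher": set(), "lower": set(), "same": set()}

    def add_neighbors(self, a, b):
        """Record that isolines a and b are adjacent on the map."""
        ha, hb = self.height[a], self.height[b]
        if ha == hb:
            self.arcs[a]["same"].add(b)
            self.arcs[b]["same"].add(a)
        elif ha < hb:
            self.arcs[a]["higher"].add(b)
            self.arcs[b]["lower"].add(a)
        else:
            self.arcs[a]["lower"].add(b)
            self.arcs[b]["higher"].add(a)

    def surrounding(self, name):
        """All isolines adjacent to the given one, found without any search."""
        rel = self.arcs[name]
        return rel["higher"] | rel["lower"] | rel["same"]

    def summit_candidates(self):
        """Isolines with no higher neighbor (assumed here to mark possible peaks)."""
        return sorted(n for n in self.height if not self.arcs[n]["higher"])


g = IsolineGraph()
for name, h in [("40", 40), ("50-1", 50), ("60-1", 60), ("60-2", 60)]:
    g.add_isoline(name, h)
g.add_neighbors("40", "50-1")
g.add_neighbors("50-1", "60-1")
g.add_neighbors("50-1", "60-2")
g.add_neighbors("60-1", "60-2")
print(g.summit_candidates())   # ['60-1', '60-2']
```

Only topological adjacency is stored, with no distances, in keeping with the remark that the structure expresses topological properties alone.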

Figure 22. Associating isolines (for example, topography) with geological boundaries. On the left, in the upper corner, two boundaries (objects of F) intersect two isolines (objects of TOPO). TOPO is the structure for topographical isolines and F the structure for geological boundaries; objects of these classes are linked; these links express two relations: either an isoline is intersected by boundaries, or a boundary is intersected by isolines.

Paleontological collections
A second example in geology concerns the archiving of the paleontological collections of the Université P. et M. Curie in Paris (Bouillé, 1975c). Each sample is defined by its name, a code (given by the manager of the collections), its complete taxonomy, its stratigraphy, the location where it was found, the location where it is stored, the author of the report on the specimen, the name of the collection it belongs to, and many other parameters which will not be described in this paper. I have built the DS without taking into account the questions which will be put to it. Note that this DS (Fig. 23) has some powerful links. The power of a link is measured by the length of the path which is replaced by the link; it is indicated by a number inside a circle on the figure. A link with a power of 16 may be considered as giving access by a path sixteen times shorter. Moreover, because links allow data to be accessed selectively, without searching for them among many others, access may be up to a thousand times quicker. If we study this DS, we discover that it answers twenty-two questions directly. For example, here are five which were among those wanted by the managers of the collections:
- the place where a sample is stored,
- the samples stored in a drawer,
- those which occur in a bed of a given locality,
- the samples belonging to a taxonomic group, and
- the samples belonging to a taxonomic group, coming from a bed, and belonging to a given stage.

The concept of a link in a DS replaces the concept of an inverted file in a file system. In a data bank an inverted file cannot be seen by the user. Why then are inverted files used? To indicate which records contain data having a given property; it is better instead to consider directly the properties expressed in the DS. For example, a granite has a property which is hardness; all samples of granite with the same value of hardness belong to a group of granites defined by this property. Why should this group be considered as a possible record in an inverted file? In my opinion the geologist need not know of the existence of an inverted file. Another example would be the contact between two cartographic units. The contact expresses a relation (adjacency) which is defined by the two objects, namely the cartographic units. What is more important for the geologist: this contact relationship, or the file in which it is stored?
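The way such questions are answered by following links, rather than by naming files or records, can be sketched in Python as follows. This is an assumption-laden illustration and not the structure of Figure 23: the class `LinkedCollection`, the relation names, and all sample, bed, drawer, and taxon values are hypothetical, and the dictionaries used internally are only a detail of the sketch that the user of a data bank would never see.

```python
class LinkedCollection:
    """Hypothetical sketch: samples are reached through links, not through files."""

    def __init__(self):
        self.links = {}   # relation -> target -> set of sample codes

    def add_sample(self, code, **relations):
        """Link a sample to each target (drawer, bed, taxon, ...) it relates to."""
        for relation, target in relations.items():
            self.links.setdefault(relation, {}).setdefault(target, set()).add(code)

    def follow(self, relation, target):
        """Samples linked to `target` by `relation`, without scanning other samples."""
        return set(self.links.get(relation, {}).get(target, set()))

    def select(self, **criteria):
        """Samples satisfying every criterion, by intersecting the linked sets."""
        found = [self.follow(r, t) for r, t in criteria.items()]
        return sorted(set.intersection(*found)) if found else []


coll = LinkedCollection()
coll.add_sample("S1", taxon="Ammonoidea", bed="B3", stage="Toarcian", drawer="D12")
coll.add_sample("S2", taxon="Ammonoidea", bed="B7", stage="Toarcian", drawer="D12")
coll.add_sample("S3", taxon="Brachiopoda", bed="B3", stage="Aalenian", drawer="D40")

print(coll.select(drawer="D12"))                                       # ['S1', 'S2']
print(coll.select(taxon="Ammonoidea", bed="B3", stage="Toarcian"))     # ['S1']
```

The two calls correspond to the second and fifth questions in the list above; both are phrased in terms of properties of the samples, and neither mentions a file or a record.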

CONCLUSIONS

The most important part of the DB is the DS; it is seen by the users, and is built by studying the nature and structure of geological phenomena, and not the users' problems. The system allows users to consider that all data are in core memory; they need not know the files and records. The DS is based upon graphs and "hypergraphs" and is neither a tree nor a network. Four principal abstract data types are defined: class, object, attribute, and link. The system is programmed in SIMULA 67, because I consider it the best language for DS description and for processing both geological data and data structures. Almost any language may be used once an interface is written in SIMULA (Bouillé, 1976a). The model presented in this paper will be fully operational in two years. We have been using the structures for more than one year, and a complete automatic cartography system is based upon them (Bouillé, 1974c, 1975a).


[Fig. 23. Data structure (DS) for the paleontological collections; the circled numbers give the power of each link. Figure not reproduced.]


Acknowledgements--The author wishes to thank W. W. Hutchison for assistance in writing the English version of this paper.

REFERENCES

Arsac, J., 1972, Un langage de programmation sans branchements: RIRO, 6ème année, B2, p. 3-24.
Berge, C., 1970a, Graphes et hypergraphes: Dunod, Paris, 516 p.
Berge, C., 1970b, Hypergraphs generalizing bipartite graphs, in Integer and non-linear programming: North-Holland Publ. Co., Abadie, p. 507-509.
Birtwistle, G. M., Dahl, O. J., Myhrhaug, B., and Nygaard, K., 1973, SIMULA Begin: Studentlitteratur, Auerbach Publ. Inc., Philadelphia, Pennsylvania, 391 p.
Bouillé, F., 1973, Contribution à la digitalisation et à l'analyse des cartes géologiques: Thèse 3c., Univ. Paris, 285 p.
Bouillé, F., 1974a, Application de théorie des graphes au traitement de la carte géologique: Revue de l'IFP, v. 29, no. 2, p. 173-216.
Bouillé, F., 1974b, Numérisation des cartes géologiques par application de la théorie des graphes: 38ème séance du CFC, 10 p.
Bouillé, F., 1974c, Création et utilisation d'une banque de données de cartes géologiques: 7ème Conf. Int. de Cartographie, Madrid, 54 p.
Bouillé, F., 1975a, Structuration et saisie des données cartographiques: journée d'étude sur acquisition et structuration de l'information graphique, Paris, 32 p.
Bouillé, F., 1975b, Initiation au langage SIMULA 67: Support de cours, 320 p.
Bouillé, F., 1975c, Projet de création d'une banque de données concernant les collections de paléontologie de l'Université P. et M. Curie: Rapport préliminaire, 61 p.
Bouillé, F., 1976a, Un modèle universel de banque de données protégée, simultanément partageable et portable: thèse d'Etat, in press.
Bouillé, F., 1976b, Le langage SIMULA 67: (in prep.) 450 p.
Bourbaki, N., 1954, Théorie des ensembles: Livre 1, Paris.
Cabanes, A., 1975, Cours Banque de données: CNAM, 100 p.
CODASYL, 1971, Programming languages: ACM Data Base Task Group (DBTG), New York, 269 p.
Codd, E. F., 1970, A relational model of data for large shared data banks: CACM, v. 13, no. 6, p. 377-387.
Dahl, O. J., Myhrhaug, B., and Nygaard, K., 1971, SIMULA 67 Common base language: Norwegian Computing Center, Publ. S-22, 145 p.
Dijkstra, E. W., 1968, The structure of "the" multiprogramming system: CACM, v. 11, no. 5, p. 341-346.
Dijkstra, E. W., 1969, Notes on structured programming: University of Maryland, College Park, 84 p.
Kirkerud, B. R., 1974, SIMBAS a simple data base system: Proc. 2nd SIMULA Users' Conf., Monte-Carlo, 17 p.
Liskov, B., and Zilles, S., 1974, Programming with abstract data types: Proc. ACM Sigplan Symp. on very high-level languages, Santa Monica, p. 50-59.
Luconi, F. L., 1968, Asynchronous computational structures: unpubl. doctoral dissertation, MIT.
Mäkilä, K., 1975, A CODASYL-type DBMS system in SIMULA: FOA Report C-10038-M3 (ES), Proc. 3rd SIMULA Users' Conf., Brighton, 33 p.
Newman, W. M., and Sproull, R. F., 1973, Principles of interactive computer graphics: McGraw-Hill, Inc., New York, 607 p.
Patil, S. S., 1970, Coordination of asynchronous events: unpubl. doctoral dissertation, MIT.
Petri, C. A., 1962, Kommunikation mit Automaten: unpubl. doctoral dissertation, Univ. Bonn, Germany.
Petri, C. A., 1973, Concepts of net theory: Proc. of the Symp. on Mathematical Foundations of Computer Sciences, High Tatras, p. 137-146.
Picard, C. F., 1972, Graphes et questionnaires, Tome 2: Gauthier-Villars, Paris.
Rohlfing, H., 1973, SIMULA eine Einführung: Bibliographisches Institut Mannheim, B. I. Wissenschaftsverlag, 243 p.
Vaucher, J. G., 1973, La programmation avec SIMULA 67: Librairie de l'Université de Montréal, 171 p.