THE NATURE OF THE ENVIRONMENTAL INFORMATION: CONSEQUENCES FOR DATA COLLECTION, BANKING, PROCESSING AND DISPLAY IN INFORMATION SYSTEMS

Ph. Grandclaude

Centre de Recherches Petrographiques et Geochimiques du C.N.R.S., C.O. No. 1 - 54500 Vandoeuvre Les Nancy (France)

I INTRODUCTION
The need for efficient environmental information systems, able to provide decision-makers, among others, with accurate and up-to-date information, is increasingly emphasized (see for instance Kaldwell and the French "Groupe interministeriel d'evaluation de l'environnement"). From a logical point of view, the principal difficulty in building these systems lies in the diversity, and often in the lack of "social objectivity" (1), of the information to be gathered, processed and displayed. As a matter of fact, one can consider that the environment, rather than being a new field of knowledge, leads to the consideration of new dimensions in various fields of human activity and knowledge, for instance agriculture, mineral resources, urban planning and so on. Correspondingly, environmental information systems may be "sets" of information systems devoted to particular fields of human activity and knowledge, with two possibilities: networks of other systems oriented towards environmental purposes, or integration, specific to the environment, of various data in an environmental data base. Furthermore, the environmental dimensions or points of view are very broad and depend on criteria liable to vary in time and space (individual and social subjectivity) (1). Lastly, the information systems built must be very flexible and "adhocratic".

In such a variable and difficult context, where a very pragmatic approach is a necessity and where actual use is the best criterion for distinguishing "bad" from "good" systems, I shall not try to give general solutions. The reflections written here are those of a geologist dealing with a geochemical data bank and trying to extend his own experience and conclusions to the more general field of environmental information systems (Grandclaude, 1974, 1976).

The first point will be a review of the general notion of datum, aiming at the definition and use of a formalism able to remedy some aggravating situations, especially the rediscovery by each system or D.B.M.S. (2) designer of more or less similar methodologies and concepts expressed in very different terminologies. This variety later restrains exchanges of ideas and experiences, as well as of data, and masks often imaginary differences and relative advantages and drawbacks.

The second point will deal with more specific aspects of the environmental data and describe some practical consequences for their collection, banking, processing and display.

II COMPONENTS OF A DATUM AND INFORMATIONAL MODEL OF THE REAL WORLD

1 - Main components of a datum

General information studies oriented towards data banks are now being carried out (Codd, 1970; Codasyl, 1971; Langefors, 1974; Lindgreen, 1974; Benci et al., 1976). These studies aim at defining methods for describing a set of information and its evolution, independently of any access structure and software. The result is an image of the phenomena and of the "object system" which will be represented by a data bank.

The roles assigned to the corresponding conceptual organization are the following (Benci et al., 1976):
- informational model of the real world,
- reference model for the system analysis and design for the data base implementation,
- documentation tool of the data base and reference element for the users at the time of operation of the information system.

(1) i.e. the fact that people belonging to the same social group (or the same individual), placed in the same conditions, obtain (and display) the same data. The degree of objectivity decreases from primary to theoretical data, from quantitative to qualitative data, etc., and more generally from low to high observation and decision levels.

(2) Data Base Management System.
[Figure 1: an object, defined and named, carries observed or measured characters (Character 1 to Character 4, coded CAR 1 to CAR 4). Each character has its own set of values (Set 1 to Set 4). Attributing one or several values, chosen in the set of values, to each observed or measured character gives Datum 1 to Datum 4; together they form the logical record (set of data relative to an object).]

Fig. 1. Collection of the information and components of the data.
Our immediate experience of the perceived real world detects objects and relationships linking them. The objects are described by properties and designated by a label. Each property is made of a character and of the value(s) assigned to that character, through observations or measurements, for a given object. Thus a datum may be written as

C(O) =: {V}

where C is the name of a character, O is the label of the object and {V} is a part of the set of "signifiants" (words, numbers, etc.) a priori associated with the character C.

Such a decomposition of a datum into three major components may be shown in an inductive manner, with examples, or in a more deductive one (Lindgreen, 1974). In general, O and C designate well-determined entities (example: population density in a geographic area). But in some cases a value of a character may be obtained without knowing to what "natural" object it is linked. Such data are then recorded, organized and handled after definition of "artificial" objects, most often points identified by coordinates. Thus a file, i.e. a set of data related to a class of objects liable to be processed by the same methods, may be seen as represented by Fig. 1.
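The three components of a datum, C(O) =: {V}, can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names (`Datum`, `LogicalRecord`, `artificial_object_label`) and the sample value set are hypothetical and do not come from the paper or from any particular D.B.M.S.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datum:
    """One datum C(O) =: {V}."""
    character: str      # C, the name of the character
    obj: str            # O, the label of the object
    values: frozenset   # {V}, a part of the set of admissible "signifiants"


@dataclass
class LogicalRecord:
    """Set of data relative to one object (hypothetical structure)."""
    label: str
    data: dict = field(default_factory=dict)  # character name -> Datum

    def assign(self, character, values, admissible):
        """Attribute values to a character, checking that they belong to its set."""
        chosen = frozenset(values)
        unknown = chosen - frozenset(admissible)
        if unknown:
            raise ValueError(f"{sorted(unknown)} not admissible for {character!r}")
        self.data[character] = Datum(character, self.label, chosen)
        return self.data[character]


def artificial_object_label(x, y):
    """Label of an "artificial" object: a point identified by its coordinates."""
    return f"point({x}, {y})"


# Hypothetical example: a value observed at a point with no "natural" object.
ROCK_TYPES = {"granite", "gneiss", "basalt"}
record = LogicalRecord(label=artificial_object_label(870.5, 112.0))
datum = record.assign("rock type", ["granite"], ROCK_TYPES)
```

Keeping the admissible set outside the datum mirrors the text: the set of "signifiants" is associated a priori with the character, not with any one object.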
Lastly, besides the basic elements of a datum described above, some attributes of the data, liable to be formalized, may be proposed, such as space location, time location, duration, dimensions and units of the values for quantitative characters, etc. But these attributes correspond in fact to particular data (extrinsic or "de conjoncture" (3) data), which specify respectively the relationships of the observed or measured object with others, or the conditions, in the broad sense (s.l.), of the observations or measurements (4).

[Figure 2: entities (objects, records, file, working-file) and operations (data collection and preparation, selection, extraction). The objects give Records 1 to 5 of the file; selection marks each record as relevant or not relevant; extraction keeps the relevant data of the relevant records in the working-file.]

Fig. 2. Building a file and extracting a working-file.
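The two operations of Fig. 2, selection of relevant records and extraction of the needed data into a working-file, can be sketched as follows. The record layout (a dictionary with a `label` and a `data` mapping), the function names and the sample values are assumptions made for the illustration only.

```python
def select_records(data_file, is_relevant):
    """Selection: keep the records that satisfy the selection criterion."""
    return [record for record in data_file if is_relevant(record)]


def extract(records, characters):
    """Extraction: keep, for each record, only the requested characters."""
    working_file = []
    for record in records:
        row = {"label": record["label"]}
        for character in characters:
            if character in record["data"]:
                row[character] = record["data"][character]
        working_file.append(row)
    return working_file


# Hypothetical file: one logical record per object.
data_file = [
    {"label": "S1", "data": {"rock": "granite", "Zn_ppm": 48, "year": 1974}},
    {"label": "S2", "data": {"rock": "basalt", "Zn_ppm": 110, "year": 1975}},
    {"label": "S3", "data": {"rock": "granite", "Zn_ppm": 95}},
]

relevant = select_records(data_file, lambda r: r["data"].get("rock") == "granite")
working_file = extract(relevant, ["Zn_ppm", "year"])
```

A character missing from a record is simply left out of the working-file row rather than filled with a default, so that "no value assigned" is not confused with an observed value.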
(3) Data describing the conditions of observation or measurement.

(4) The "integrity constraints" (allowing validation of the data, based on internal and cross consistency, confidentiality and so on) and the "evolution rules" (referring to updating and any external action on the information system) have to be taken into account in a comprehensive conceptual model. But they are beyond the scope of this paper.

2 - Assignment value(s) - character and forms of the data

In assigning one or several values to a character, the three following situations can be encountered:
1) the observation or the measurement gives rise to the assignment of a value to the character,
2) one tried without success to assign a value,
3) no value is assigned (generally for lack of observation or measurement).

In the case of discrete characters, or of continuous characters discretized in classes, the following situations are in practice to be considered:
C =: Vi
C =: Vi or Vj or ... (hesitancies)
C =: Vi and Vj and ... (repetitive data)
these three first cases with, possibly, qualifiers of probability,
C =: studied not determined (cf. situation 2 above),
C =: not studied (cf. situation 3 above).
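The forms listed above for discrete characters can be made explicit in a data structure, so that "studied not determined" and "not studied" are never confused with an observed value. The sketch below is one possible encoding, not the author's: the names `Form` and `Assignment`, the optional `probability` field and the three-valued answer of `may_be` are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Form(Enum):
    SINGLE = "single value"                    # C =: Vi
    HESITANCY = "or"                           # C =: Vi or Vj or ...
    REPETITIVE = "and"                         # C =: Vi and Vj and ...
    NOT_DETERMINED = "studied not determined"  # situation 2
    NOT_STUDIED = "not studied"                # situation 3


@dataclass(frozen=True)
class Assignment:
    form: Form
    values: Tuple[str, ...] = ()
    probability: Optional[float] = None        # optional qualifier of probability

    def __post_init__(self):
        n = len(self.values)
        if self.form is Form.SINGLE and n != 1:
            raise ValueError("a single-value assignment needs exactly one value")
        if self.form in (Form.HESITANCY, Form.REPETITIVE) and n < 2:
            raise ValueError("hesitancies and repetitive data need at least two values")
        if self.form in (Form.NOT_DETERMINED, Form.NOT_STUDIED) and n != 0:
            raise ValueError("no value may be given when none was assigned")

    def may_be(self, value):
        """True or False if the value is or is not among those recorded, None if unknown."""
        if self.form in (Form.NOT_DETERMINED, Form.NOT_STUDIED):
            return None
        return value in self.values


# Hypothetical examples.
colour = Assignment(Form.HESITANCY, ("grey", "green"))
minerals = Assignment(Form.REPETITIVE, ("quartz", "feldspar"))
texture = Assignment(Form.NOT_STUDIED)
```

Returning `None` for the two "no value" forms keeps a selection criterion from silently treating missing observations as negative answers.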
In the case of continuous characters with numeric values, the following situations are to be considered:
C =: Vi
C =: Vi, Vn (range), with the particular cases C < Vi and C > Vi
C =: studied not determined
C =: not studied
All this with different possibilities for the notation of the accuracy, among which the "de conjoncture" data are the most used.

3 - Structure of the data

From the analysis given for a datum, we can draw a scheme of building and using a file (Fig. 2):
1) a set of objects for which the data are grouped in logical records identified by labels,
2) from a file, the records relevant to selection criteria are selected; from these records the data necessary for the realization of the user's projects are extracted and then stored in a working file and/or displayed.

The efficiency of record selection and data extraction requires that the relationships existing in the human mind which are the most adequate, in consideration of the nature of the most frequent questions, be identified and taken into account. Often represented as hierarchical or associative, these relationships link concepts used as characters or as values of characters. In the discourse, they are implicit ("paradigmatic") and contrast with the explicit ones ("syntagmatic"). But the distinction between both bears more on their use than on their nature. One of the tasks for people designing an information system is to establish, according to the purposes of the system, the frontier between, on the one hand, what is to be "taught" to the machine and, on the other hand, what is to be requested from the data suppliers and expressed as data. The syntagmatic relations chiefly link objects and are often represented as associative linkages between records.

Lastly, it is worth noting that, at the present time, the trend in some D.B.M.S. is towards the suppression of the notions of file, group of characters and so on, and towards the definition of linkages joining other entities, especially the data themselves, in complex networks which are then implemented and used in the machine.

III THE "SPECIFICITY" OF THE ENVIRONMENTAL DATA
1 - Characteristics of the environmental data

If we consider Table 1, we can find examples of environmental data in each distinguished category, from "data which can be measured repeatedly" to "symbolic data". Thus the first specific characteristic of the environmental data is an extreme diversity in their nature and their very large number. This diversity reflects principally the diversity in nature and in scale of the objects, and also the diversity of the points of view under which these objects are defined or considered (hydrology, hydrogeology, soil science, agronomy, human health, mineral resources, town planning, etc.).

The second characteristic of the environmental data is the fact that some corresponding objects have hazy outlines. This lack of sharpness concerns not only the objects to be described but also the categories of objects, i.e. the divisions of classifications, and hence the references given to the objects and the names given to these categories.

The third characteristic of all the environmental data is their location dependence.

The other characteristics given by this table are not common to all environmental data. One can find environmental data which can be measured repeatedly or not; primary observational, combined or theoretical data; determinable or stochastic data (but chiefly stochastic); quantitative, semiquantitative or qualitative data; data presented as numerical values, graphs, models or symbols. Of course these characteristics are not mutually exclusive and, furthermore, it would be possible and desirable to add other characteristics such as time dependence (data managed by systems in which the description of the evolution of phenomena is taken into account), the objectivity, social or individual, of the data, and so on.

2 - Location of the data and graphic devices

Let us now deal with the characteristics common to all the environmental data and see how the third characteristic can help to palliate the drawbacks caused by the first and second ones. As said above (second characteristic), the objects treated in an environmental system may be more or less well defined. Well defined are the "artificial" objects (e.g. points defined by coordinates), created or used as support of some data not primarily or directly referred to "natural" objects. Badly defined may be some "natural" objects, designated by words the meaning of which is liable to variations, for instance in their extent. We are coming to the notion of social subjectivity and of its growth with increasing level of observation or decision, mentioned in footnote (1). In fact, objects are distinguished in the real world whose classifications present the following characteristics:
TABLE 1 - Varieties of categories of data (after Codata, 1974)

CATEGORIES OF DATA | CHEMISTRY/PHYSICS | GEO-/ASTRO-SCIENCES | BIOSCIENCES
Data which can be measured repeatedly | Most data | Geol. structures, rock... |
Data which can be measured only once | | | Rare specimens, fossils
Location-independent | Most data | Minerals, global tectonics | Most data, excluding extraterrestrial
Location-dependent | | Rocks, fossils, astronomical data, meteorol. data | Rare specimens, fossils
Primary observational or experimental data | Optical spectra, crystallographic F-values | Seismographic records, weather charts | Physiological data (e.g., respiration rates), biochemical data (e.g., composition of organs)
Combinations of primary data with the aid of a theoretical model | Fundamental constants, crystal structures | Fossil zoning, temp. distribution in Sun | Genetic code, body surface area, model of vascular bed
Data derived by theoretical calculation | Molecular properties calculated by quantum mechanics | Solar eclipses predicted by celestial mechanics | Prediction of phenotypic expression from genotypes
Determinable data | Most macroscopic data | Elements of planetary orbits | Gene loci, chromosome numbers
Stochastic data | Polymer data, structure-sensitive properties | Soil, rock composit., solar flares, freq. of visible meteors per unit interval | Most data
Quantitative data | Most data | Seismic data, meteorol. data | Physiological data, biochemical data
Semiquantitative data | Mohs hardness scale | Wind force scale |
Qualitative data | Chemical struct. formulae, properties of nuclides | Rock classif., classif. of stellar spectra, fossil shapes | Amino acid sequences, taxonomic classif. of organisms
Data presented as numerical values | | Meteorol. data | Physiological data, biochemical data
Data presented as graphs or models | Phase diagrams, molecular diagrams and models | Geological maps, weather maps, sky mapping at a particular radio freq., lithology in bore hole data | Metabolic pathways, electrocardiograms, electroencephal.
Symbolic data | | |

Note: A given group of data can be categorized simultaneously by several "facets".
[Figure 3: map of an area cut by two superposed divisions, Division {A, B, C} and a second division.]

Fig. 3. Example of map generated by two divisions of the reality.

[Figure 4: axes x and y; labels: Mapped object O (OM); Object O in user's mind (OU); Data from "O"; Silence and noise; No silence, no noise.]

Fig. 4. Digitizing contours in order to deliver an objective answer.

[Figure 5: clouds of points shown in several spaces (axes x, Vi, Vj, t); labels: hypothesis, new hypothesis.]

Fig. 5. Testing hypotheses with an interactive graphic terminal.
1) the classifications are established according to various points of view,
2) for a given point of view, the classifications may be established according to various hierarchies of characters,
3) for given points of view and hierarchies of characters, and given the continuity of reality, they may differ in the positions of the divisions,
4) the classifications are not usable at all scales and do not concern all the observation levels.

Finally, for one or another of the above reasons, the scheme of Fig. 3, where we observe the superposition of two divisions, is a realistic one. How can the corresponding data be merged in an environmental data base, and how can several kinds of specialists be enabled to communicate through an information system? In such a situation we must refer to a language known and used by everybody and as accurate as possible, i.e. the geographic coordinates. This common language can be used at two levels in an environmental system:
1) collection, by digitizing the coordinates of the contours of the objects,
2) retrieval in the data base with criteria defined not in terms of words, the meaning of which might be variable, but in terms of contours defined by coordinates, which allows selections to be made in total agreement with the divisions used by the user of the information system (Fig. 4).

Thus adequate D.B.M.S. used for environmental data have to offer some facilities as far as the location of the data by coordinates is concerned: digitizing and management of coordinates, transformation (at the stage of input into or output from the system) of the coordinates from one system into another (for instance Lambert and UTM coordinates to and from elliptical coordinates), procedures for searching the data corresponding to objects whose contours have previously been digitized, etc. With regard to the hardware, it is important that automatic digitizing and graphic display devices (plotter, interactive graphic terminal...) be used. As a matter of fact, one can imagine, and obtain, a great increase in the "accuracy" of selection from data banks or working-files by using the possibilities offered by graphical display and selection devices. In particular, interactive graphic terminals allow one to perform, very quickly, not only divisions tailored to the meaning of the user's concepts in a given space, as said above, but also projections of the corresponding clouds of points into other spaces (Fig. 5). One can thus test hypotheses and arrive by successive approximations at "solutions", or prepare the final edition (scale, limits, suppression of aberrant points, etc.) of maps and diagrams, by means for instance of photography or plotter.
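Retrieval by a digitized contour, followed by projection of the selected points into another space, can be sketched as below. The paper does not say how such a search procedure is implemented; the ray-casting point-in-polygon test used here is a common technique chosen for the illustration. Planar coordinates are assumed, points lying exactly on the contour are not treated specially, and all names and sample values are hypothetical.

```python
def point_in_contour(x, y, contour):
    """Ray-casting test: is (x, y) inside the closed polygon `contour`?"""
    inside = False
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # Abscissa at which the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_by_contour(records, contour):
    """Keep the records whose location falls inside the user's contour."""
    return [r for r in records if point_in_contour(r["x"], r["y"], contour)]


def project(records, char_i, char_j):
    """Cloud of points of the selected records in the (char_i, char_j) space."""
    return [(r[char_i], r[char_j]) for r in records if char_i in r and char_j in r]


# Hypothetical samples located by planar coordinates.
samples = [
    {"label": "S1", "x": 1.0, "y": 1.0, "Zn": 48, "Pb": 12},
    {"label": "S2", "x": 5.0, "y": 1.0, "Zn": 110, "Pb": 30},
    {"label": "S3", "x": 2.0, "y": 2.5, "Zn": 95, "Pb": 21},
]
user_contour = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)]

selected = select_by_contour(samples, user_contour)
cloud = project(selected, "Zn", "Pb")
```

Because the criterion is the user's own contour and not a name, two specialists who draw different limits for the "same" object each get the selection that matches their own division.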
IV CONCLUSION
To sum up, the review presented here of the so-called environmental data, short and very general (but perhaps partial), has stressed three major points:
1) the need for a formalism and for a kind of metalanguage able to describe the data to be managed,
2) the extreme complexity of the mutual relationships linking objects and concepts which are sometimes poorly defined,
3) the necessity of using computer devices and software able to assist the users of environmental data banks and other information systems.

But computer assistance is of relatively little help with respect to the amount of organizational, economic and technical problems raised by the size and breadth of the scope of the "environmental point of view". Undoubtedly the main problems at the present time lie not in the limits of the technology but in the lack of standards between all the partners involved, at various levels and in various domains, in the collection, processing and communication of environmental information. The challenge is to find ways of studying the possibility of defining, and the convenience of applying, such standards.
REFERENCES

BENCI G., BODART F., BOGAERT H. and CABANES A. (1976) - Concepts for the design of a conceptual scheme. IRIA - Rocquencourt, Club Banques de Donnees. Unpub. rept., 50 p.

CODASYL (1971) - Features analysis of generalized data base management systems. Association for Computing Machinery, New York, 518 p.

CODATA Task Group on accessibility and dissemination of data (1975) - Study on the problems of accessibility and dissemination of data for Science and Technology. Codata Bulletin no. 16, 31 p.

CODD E.F. (1970) - A relational model for large shared data banks. Communications of the A.C.M., vol. 13, no. 6, pp. 78-81.

GRANDCLAUDE Ph. (1974) - Contribution a la methodologie d'un systeme d'information en geologie - Application a la geochimie. Sci. de la Terre, Nancy, Serie Informatique Geologique, mem. no. 2, 273 p.

GRANDCLAUDE Ph. (1976) - Design and use of a geochemical data bank. Computers and Geosciences, Pergamon Press, vol. 2, pp. 163-170.

GRANDCLAUDE [...] structure of [...] files and data [...]. Sciences [...] Terre, Nancy, [...].

Groupe interministeriel d'evaluation de l'environnement (1975) - Rapport annuel 1974. Documentation française, Paris, 258 p.

HUBAUX A. (1971) - Are there critically evaluated data in Geology? Geosciences documentation, London, 3, 1, pp. 3-5.

KALDWELL L.K. (1975) - Problemes que posent l'organisation et l'administration de l'environnement aux niveaux local, national et international. In "Organisation et administration des programmes relatifs a l'environnement". Nations Unies, New York, pp. 15-50.

LANGEFORS B. (1974) - Information systems. I.F.I.P. 1974, North Holland Publ. Co, pp. 937-945.

LEYMARIE P., ISNARD P., de BEAUCOURT F. (1975) - Le traitement automatique des donnees geochimiques - Methodes utilisees au Centre de Recherches Petrographiques et Geochimiques. Sci. de la Terre, Nancy, Serie Informatique Geologique, mem. no. 6, 69 p.

LINDGREEN P. (1974) - Basic operations on informations as a basis for data base design. I.F.I.P. 1974, North Holland Publ. Co, pp. 993-997.