Inform. Systems Vol. 10, No. 2, pp. 139-148, 1985. Printed in the U.S.A.
NONDETERMINISTIC DEPENDENCIES IN RELATIONS: AN EXTENSION OF THE CONCEPT OF FUNCTIONAL DEPENDENCY

R. HAUX
Institute of Medical Documentation, Statistics and Computer Science, University of Heidelberg, Im Neuenheimer Feld 325, D-6900 Heidelberg, F.R. Germany

and

U. ECKERT
Medical Computing Center, University of Heidelberg, Im Neuenheimer Feld 293, D-6900 Heidelberg, F.R. Germany

Abstract. In the present article, some special semantic integrity constraints, so-called nondeterministic dependencies, are proposed. These dependencies can be regarded as stochastic extensions of functional dependencies. After some basic definitions, the concept of nondeterministic dependency is introduced. Examples are given and an implementation for a statistical analysis system is described. Some properties are discussed.
Keywords: nondeterministic dependencies, functional dependencies, relational data model, data- and methodbase systems, statistical analysis systems, database systems.

1. INTRODUCTION
After studying the example relations in the database systems literature, a reader might well conclude that the only real-world mappings are those of employees and departments, suppliers and salesmen, or students and classes. The relationships between attributes in such data structures are mostly specified in a deterministic manner, e.g. "an employee is working in exactly one department" or "a student belongs to exactly one class". However, other kinds of relationships exist as well, although they are not as strict. As will be demonstrated in the present article, there is a class of constraints which cannot be specified in the usual deterministic manner, but only by pointing out stochastic properties. By describing these stochastic properties, we obtain the opportunity to state facts which could not be stated by the commonly used types of constraints. These constraints will be called nondeterministic dependencies, or briefly NDs. As will be pointed out, nondeterministic dependencies are a subset of all semantic integrity constraints, and this subset includes the set of functional dependencies (Fig. 1).

NDs can be applied to data stored in databases. The database can be part of a database system or of related systems such as statistical analysis systems. Before pointing out how to construct such nondeterministic dependencies, some basic definitions will be given in the next section.

2. CONCEPTUAL DATA MODEL AND BASIC DEFINITIONS
In this section we consider some properties of data- and methodbase systems which are based on the relational data model (e.g. [1], [2], [3]); the notation concerning relations is based on the terminology of [4].

Let $DMBS := (DB, MB, DMBMS)$ denote a data- and methodbase system (DMBS). A DMBS consists of a database (DB) and a methodbase (MB). Both are created and managed by a data- and methodbase management system (DMBMS). The DMBMS can activate methods of MB and access data stored in DB. In addition it represents the interface to a user. In this context, a method is defined as a program module which performs a certain task. If the methods in MB mainly deal with data analysis based on statistical methods, the data- and methodbase system is called a statistical analysis system (a survey of such systems can be found in [5]). If MB is empty, or if MB does not exist, a DMBS is called a database system (DBS) and its management component a database management system (DBMS). Like a statistical analysis system, a DBS can be regarded as a special case of a DMBS.

In the sequel we will focus on the database and its management, especially the data structures. More detailed definitions concerning the methodbase and its management can be found in [6]. We will assume that the database of the DMBS is, at the conceptual level, based on the relational model. Following [4], we define the terms attribute, relation and database as variables specified by their names and types (schemes), and an extension of a variable as a value (realisation).

Fig. 1. Relationships between facts, constraints and dependencies. (Diagram not reproduced; its labels include FACTS IN REALITY and SEMANTIC INTEGRITY CONSTRAINTS.)

A data structure type relation is defined as

$$\rho := rel(A \mid \Sigma_A),$$

where $\Sigma_A$ denotes a set of semantic integrity constraints (SICs) concerning the set of attributes $A$. A SIC $\sigma$, say, is a fact about the reality which is either 'true' (fulfilled) or 'false' (violated). It will be examined at some predefined times and, if violated, will cause a specified reaction. Each attribute named $A_k \in A := \{A_1, \dots, A_K\}$, with pairwise disjoint $A_k$, $k = 1, \dots, K$, is defined by its data type, mainly by the data type's value set (domain) $VS(A_k)$, or briefly, if the attribute's name is not of importance, $VS(A)$. The values of $A$ will be denoted as $a_A$, where $\forall A \in A : a_A \in VS(A)$.

A $K$-tuple $a$, say, can be described as

$$a := (a_A \mid A \in A)$$

with $VS(A) := \{(a_A \mid A \in A)\}$. Now we can define the (valid) value set of a relation variable named $R$, say, $R : rel(A \mid \Sigma_A)$, as

$$VS(R) := \{\, r \subseteq VS(A) \mid \forall \sigma \in \Sigma_A : \sigma = \text{'true'} \,\}.$$

I.e. a valid realisation $r$ of $R$ is a subset of the value set of $A$ in which no SIC is violated. A type database is defined as

$$\delta := db(R \mid \Sigma_R)$$

where $R$ denotes the set of relations in a variable DB, say,

$$DB : db(R \mid \Sigma_R),$$
$R := \{R_1, \dots, R_I\}$ with pairwise disjoint $R_i$, $i = 1, \dots, I$. $\Sigma_R$ denotes the set of interrelational SICs.

Relations are usually visualized as tables, the columns corresponding to attributes and the rows corresponding to tuples. As an example let us take a database with one relation, named CARRIER, and let us regard a realisation of this relation (Fig. 2). CARRIER contains data of HBSAG-carriers (HBSAG: hepatitis B surface antigen) taken from a multicentre study of blood donors who are healthy HBSAG-carriers [7]. For data security purposes, the patients' identifications have been changed and the dates of birth have been shortened. The attributes of the CARRIER relation are:

PATIENT (patient's name, unique for each patient),
DOE (date of examination, in years),
DOB (date of birth),
SEX,
WEIGHT (in kg),
SGPT (alanine aminotransferase, in units/litre),
HBSAG (concentration of HBSAG, in microgram/millilitre).

CARRIER

PATIENT      DOE  DOB   SEX     WEIGHT  SGPT  HBSAG
alice          0  1950  female      57    19   38.0
alice          1  1950  female      56    13   40.0
alice          2  1950  female      55    13   26.0
alice          3  1950  female      55    11   33.0
alice          4  1950  female      56    13   44.0
white queen    0  1942  female      53    17    6.0
white queen    1  1942  female      54    12    4.8
white queen    2  1942  female      55    10    5.4
white queen    3  1942  female      58    14    3.1
white queen    4  1942  female      57     7    5.2
red queen      0  1955  female      66    15   40.0
red queen      1  1955  female      64    11   39.0
red queen      2  1955  female      64    12   34.0
red queen      3  1955  female      62    18   34.0
red queen      4  1955  female      64    10   30.0
tweedledee     0  1932  male        71    17   17.0
tweedledee     1  1932  male        70    15   12.0
tweedledee     2  1932  male        71    10   14.0
tweedledee     3  1932  male        71     2   16.0
tweedledee     4  1932  male        72    21   13.0
tweedledum     0  1935  male        91    22    2.5
tweedledum     1  1935  male        81    30    1.8
tweedledum     2  1935  male        81    30    2.5
tweedledum     3  1935  male        83    17    1.9
tweedledum     4  1935  male        84    20    1.2
knight         0  1952  male        62    18   25.0
knight         1  1952  male        65    17   16.0
knight         2  1952  male        67    12   19.0
knight         3  1952  male        65    12   14.0
knight         4  1952  male        70    15   12.0

Fig. 2. Relation of HBSAG-carriers ($db^1$).

Let us restrict the set of SICs $\Sigma$ to the set of functional dependencies $S$, say, $S \subseteq \Sigma$. Furthermore, let $r^t \subseteq VS(A)$ denote a subset of the value set of $A$ at time $t$. In a relation $R : rel(A \mid S)$ a set of attributes $Y \subseteq A$ is functionally dependent on a set of attributes $X \subseteq A$ iff

$$\forall t : \forall v, w \in r^t : v.X = w.X \Rightarrow v.Y = w.Y.$$

Here

$$v.X := (v_A \mid A \in X \subseteq A)$$

is the projection of the tuple $v$ onto the attributes $X$ ($w.X$, $v.Y$ and $w.Y$ are, of course, defined in the same way). A functional dependency (FD) named $FD_i$, say, will be designated as

$$FD_i : X \rightarrow Y.$$

Usually, the cardinalities of $X$ and $Y$ are defined as $1 \le |X| \le |A|$, $1 \le |Y| \le |A|$. Because of the decomposition/union rule (see e.g. [8])

$$X \rightarrow Z_1 \cup Z_2 \iff X \rightarrow Z_1 \wedge X \rightarrow Z_2$$

for some arbitrary $Z_1, Z_2 \subseteq A$ we can, for simplicity, restrict the cardinality of $Y$ to $|Y| := 1$. In the CARRIER relation we can specify 5 functional dependencies:

$FD_1$ : PATIENT $\rightarrow$ DOB
$FD_2$ : PATIENT $\rightarrow$ SEX
$FD_3$ : PATIENT DOE $\rightarrow$ WEIGHT
$FD_4$ : PATIENT DOE $\rightarrow$ SGPT
$FD_5$ : PATIENT DOE $\rightarrow$ HBSAG.

Thus, the present realisation $db^1$, say, of DB can be defined as

$$db^1 \in VS(DB),$$
$$DB : db(\{\text{CARRIER}\} \mid \emptyset),$$
$$\text{CARRIER} : rel(\{\text{PATIENT}, \text{DOE}, \text{DOB}, \text{SEX}, \text{WEIGHT}, \text{SGPT}, \text{HBSAG}\} \mid \{FD_1 : \text{PATIENT} \rightarrow \text{DOB},\ FD_2 : \text{PATIENT} \rightarrow \text{SEX},\ FD_3 : \text{PATIENT DOE} \rightarrow \text{WEIGHT},\ FD_4 : \text{PATIENT DOE} \rightarrow \text{SGPT},\ FD_5 : \text{PATIENT DOE} \rightarrow \text{HBSAG}\}).$$
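The functional dependency definition above can be checked mechanically on a realisation. The following is a minimal sketch in Python, not part of the original article: the function name `fd_violations` and the representation of tuples as dictionaries are assumptions made for illustration, and only a few tuples of the CARRIER relation are used.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the X-values for which more than one Y-value occurs.

    rows: iterable of dicts (tuples of the relation, keyed by attribute name)
    lhs:  tuple of attribute names forming X
    rhs:  a single attribute name Y (|Y| = 1, as in the text)
    The name and signature are hypothetical, chosen for this sketch.
    """
    seen = defaultdict(set)
    for row in rows:
        x_value = tuple(row[a] for a in lhs)  # projection v.X
        seen[x_value].add(row[rhs])           # Y-values observed for this X-value
    # The FD X -> Y holds iff every X-value maps to exactly one Y-value.
    return {x: ys for x, ys in seen.items() if len(ys) > 1}


# A few tuples of the CARRIER relation (attribute subset only).
carrier = [
    {"PATIENT": "alice", "DOE": 0, "DOB": 1950, "SEX": "female", "WEIGHT": 57},
    {"PATIENT": "alice", "DOE": 1, "DOB": 1950, "SEX": "female", "WEIGHT": 56},
    {"PATIENT": "knight", "DOE": 0, "DOB": 1952, "SEX": "male", "WEIGHT": 62},
    {"PATIENT": "knight", "DOE": 1, "DOB": 1952, "SEX": "male", "WEIGHT": 65},
]

fd1_ok = not fd_violations(carrier, ("PATIENT",), "DOB")
fd3_ok = not fd_violations(carrier, ("PATIENT", "DOE"), "WEIGHT")
patient_weight = fd_violations(carrier, ("PATIENT",), "WEIGHT")
```

On these tuples the two listed dependencies hold, while PATIENT alone does not determine WEIGHT: `patient_weight` collects the differing weights per patient.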
3. NONDETERMINISTIC DEPENDENCIES
In stating a priori knowledge in functional dependencies we achieve two purposes: (1) FDs can serve as criteria to map a set of relations into a set of 'well-structured' relations. Criteria for well-structuredness could be, e.g., that all relations are in (Codd) third normal form or in improved third normal form [3]. Additionally, as stated in section 2, (2) FDs can be regarded as a special class of semantic integrity constraints which cause a reaction if they are violated. Because of (1) we can, e.g., split the CARRIER relation into the two third normal form relations shown in Fig. 3.

CARRIER 1

PATIENT      DOB   SEX
alice        1950  female
white queen  1942  female
red queen    1955  female
tweedledee   1932  male
tweedledum   1935  male
knight       1952  male

CARRIER 2

PATIENT      DOE  WEIGHT  SGPT  HBSAG
alice          0      57    19   38.0
alice          1      56    13   40.0
alice          2      55    13   26.0
alice          3      55    11   33.0
alice          4      56    13   44.0
white queen    0      53    17    6.0
white queen    1      54    12    4.8
white queen    2      55    10    5.4
white queen    3      58    14    3.1
white queen    4      57     7    5.2
red queen      0      66    15   40.0
red queen      1      64    11   39.0
red queen      2      64    12   34.0
red queen      3      62    18   34.0
red queen      4      64    10   30.0
tweedledee     0      71    17   17.0
tweedledee     1      70    15   12.0
tweedledee     2      71    10   14.0
tweedledee     3      71     2   16.0
tweedledee     4      72    21   13.0
tweedledum     0      91    22    2.5
tweedledum     1      81    30    1.8
tweedledum     2      81    30    2.5
tweedledum     3      83    17    1.9
tweedledum     4      84    20    1.2
knight         0      62    18   25.0
knight         1      65    17   16.0
knight         2      67    12   19.0
knight         3      65    12   14.0
knight         4      70    15   12.0

Fig. 3. Relation of HBSAG-carriers ($db^2$).

Let us now focus on the relationship of WEIGHT and PATIENT. Obviously there does not exist a functional dependency PATIENT $\rightarrow$ WEIGHT, because we do have varying values of weight in a realisation of DB. We have varying values because each patient's weight was recorded at several examination dates (additionally, the variation could be caused by errors in repeated measurement). However, if we regard the patients' empirical weight distributions, we can see that each patient has his or her typical characteristic concerning weight (Fig. 4). Although the weight of all HBSAG-carriers ranges between about 50 and 100 kg, we could hardly believe that Alice weighs 70 kg, but no one would doubt that Tweedledee is of that weight.

Relationships of the kind mentioned above can be regarded as semantic integrity constraints and as a special class of dependencies $S'$, say. Yet these dependencies are not functional or deterministic. To describe such a dependency, we have to interpret each patient's weight as a random variable. The distribution of the weight depends on the values of PATIENT. Such a dependency will be called a nondeterministic dependency (ND).

In a relation $R : rel(A \mid S')$ an attribute $Y \in A$ is nondeterministically dependent (named $ND_j$) on a set of attributes $X \subseteq A$, $|X| \ge 1$, based on a distribution law $\mathcal{L}_{x \in VS(X)}(Y)$, denoted

$$ND_j : X \rightarrow \mathcal{L}(Y),$$

iff

$$\forall t : \forall v, w \in r^t : v.X = w.X \Rightarrow \mathcal{L}_{v.X}(Y) = \mathcal{L}_{w.X}(Y).$$

This means: the distribution of the $Y$-values depends on the $X$-values, but in a stochastic manner.
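To make the idea of a distribution per X-value concrete, the sketch below groups the Y-values of a relation by their X-value and evaluates the empirical cumulative distribution function of each group. It is only an illustration in Python: the helper names `group_values` and `ecdf` are assumptions, and the weights are those of Alice and Tweedledee from the CARRIER data.

```python
from collections import defaultdict


def group_values(rows, x_attrs, y_attr):
    """Collect the Y-values observed for each X-value (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in x_attrs)].append(row[y_attr])
    return dict(groups)


def ecdf(sample, y):
    """Empirical distribution function: share of sample values <= y."""
    return sum(1 for value in sample if value <= y) / len(sample)


# WEIGHT values of two patients at the examination dates 0..4.
rows = [
    {"PATIENT": name, "DOE": doe, "WEIGHT": weight}
    for name, weights in (
        ("alice", [57, 56, 55, 55, 56]),
        ("tweedledee", [71, 70, 71, 71, 72]),
    )
    for doe, weight in enumerate(weights)
]

by_patient = group_values(rows, ("PATIENT",), "WEIGHT")
alice_at_70 = ecdf(by_patient[("alice",)], 70)            # all of Alice's weights lie below 70
tweedledee_at_70 = ecdf(by_patient[("tweedledee",)], 70)  # 70 is Tweedledee's lowest weight
```

The two empirical distribution functions differ markedly at 70 kg, which is the kind of patient-specific characteristic the ND is meant to capture.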
Fig. 4. Empirical cumulative distribution function (ECDF) and 0, 25, 50, 75, 100% quantiles of WEIGHT for each patient. (Plot not reproduced; legend: 1 = ALICE, 2 = WHITE QUEEN, 3 = RED QUEEN, 4 = TWEEDLEDEE, 5 = TWEEDLEDUM, 6 = KNIGHT; horizontal axis: WEIGHT.)

Fig. 5. Empirical cumulative distribution function (ECDF) and 0, 25, 50, 75, 100% quantiles of SGPT for each patient. (Plot not reproduced; legend: 1 = ALICE, 2 = WHITE QUEEN, 3 = RED QUEEN, 4 = TWEEDLEDEE, 5 = TWEEDLEDUM, 6 = KNIGHT; horizontal axis: SGPT.)
Obviously there exists a nondeterministic dependency between any pair $X \in A$, $Y \in A$ in $R$. The specification of an ND in order to describe a semantic integrity constraint is useful when specific stochastic properties of some well-distinct objects (e.g. patients) exist. In our example we can now specify the following FDs and NDs:

$FD_1$ : PATIENT $\rightarrow$ DOB
$FD_2$ : PATIENT $\rightarrow$ SEX
$FD_3$ : PATIENT DOE $\rightarrow$ WEIGHT
$FD_4$ : PATIENT DOE $\rightarrow$ SGPT
$FD_5$ : PATIENT DOE $\rightarrow$ HBSAG
$ND_1$ : PATIENT $\rightarrow \mathcal{L}$(WEIGHT)
$ND_2$ : PATIENT $\rightarrow \mathcal{L}$(SGPT)
$ND_3$ : PATIENT $\rightarrow \mathcal{L}$(HBSAG).

The empirical distribution functions of SGPT and HBSAG are shown in Figs. 5 and 6. The present realisation $db^2$, say, of DB can now be summarized as

$$db^2 \in VS(DB),$$
$$DB : db(\{\text{CARRIER1}, \text{CARRIER2}\} \mid \emptyset),$$
$$\text{CARRIER1} : rel(\{\text{PATIENT}, \text{DOB}, \text{SEX}\} \mid \{FD_1 : \text{PATIENT} \rightarrow \text{DOB},\ FD_2 : \text{PATIENT} \rightarrow \text{SEX}\}),$$
$$\text{CARRIER2} : rel(\{\text{PATIENT}, \text{DOE}, \text{WEIGHT}, \text{SGPT}, \text{HBSAG}\} \mid \{FD_1 : \text{PATIENT DOE} \rightarrow \text{WEIGHT},\ FD_2 : \text{PATIENT DOE} \rightarrow \text{SGPT},\ FD_3 : \text{PATIENT DOE} \rightarrow \text{HBSAG},\ ND_1 : \text{PATIENT} \rightarrow \mathcal{L}(\text{WEIGHT}),\ ND_2 : \text{PATIENT} \rightarrow \mathcal{L}(\text{SGPT}),\ ND_3 : \text{PATIENT} \rightarrow \mathcal{L}(\text{HBSAG})\}).$$

Up to now the ND specification is still incomplete: the definition of the assumed distribution law is missing. Let us in the sequel assume that $VS(Y)$ is ordered. If for some ND the distribution law $\mathcal{L}(Y)$, e.g. defined as $F_x(y)$ for all $x \in VS(X)$, $y \in VS(Y)$, is known, or if the distribution parameters can be estimated, we could investigate violations by

(1) checking tolerance intervals such that, for a given probability $\alpha$,

$$P(l_x < Y < u_x) = 1 - \alpha,$$

where $l_x$ and $u_x$ denote the lower and upper limit for some $X$-value, or by

(2) comparing the empirical distribution $\hat{F}_x(y)$ with $F_x(y)$, e.g. by using a Kolmogorov-Smirnov test statistic [9] based on

$$\sup_y |\hat{F}_x(y) - F_x(y)|,$$

or by

(3) assuming that, for each $x$-value, we obtain a time series, and then checking some properties of the time series (see e.g. [9] and [10], who describe an application for public health services).

Possibility (3) is appropriate, e.g., if a dependency between the date of examination and $F_x(y)$ is assumed. This would be the case for the weight distributions of babies, where the mean and the variance for each baby depend on the examination date.

Unfortunately $F_x(y)$ is usually unknown. In addition, for small sample sizes such as $n = 5$ for each patient in the example, it is doubtful whether one should proceed in the ways mentioned above. As a simple and easy-to-use violation criterion we could choose the range and define that a nondeterministic dependency is violated if some $Y$-value is outside the interval

$$(\hat{\theta}_x - \text{range}/2,\ \hat{\theta}_x + \text{range}/2).$$

$\hat{\theta}_x$ denotes the sample median of the $Y$-values for a given $X$-value. This procedure seems to be especially useful if, although $F_x(y)$ is unknown, we can assume a shift model

$$F_x(y) := F(y + \theta_x)$$

for all $x \in VS(X)$ and estimate the unknown $\theta_x$ by $\hat{\theta}_x$. E.g. for $ND_1$ a shift model can be used. The ND specification can then be stated completely by

$$ND_j : X \rightarrow \mathcal{L}_\theta(Y)\ \text{range}.$$
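The median-range criterion described above can be sketched in a few lines of Python. This is an illustration only, not the authors' SAS procedure: the function name `nd_violations` is hypothetical, values lying exactly on the interval boundary are treated as valid (an assumption, since the text does not settle this case), the range of 20 for WEIGHT is used as an illustrative value, and the sixth Alice tuple with 70 kg is invented to provoke a violation.

```python
from collections import defaultdict
from statistics import median


def nd_violations(rows, x_attrs, y_attr, value_range):
    """Return the tuples whose Y-value lies outside median +/- range/2.

    The median is taken over all Y-values sharing the same X-value.
    Boundary values count as inside the interval (assumption of this sketch).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in x_attrs)].append(row)

    violations = []
    for members in groups.values():
        theta_hat = median(r[y_attr] for r in members)  # sample median per X-value
        low = theta_hat - value_range / 2
        high = theta_hat + value_range / 2
        violations.extend(r for r in members if not low <= r[y_attr] <= high)
    return violations


# Weights of two patients from the CARRIER data ...
rows = [
    {"PATIENT": name, "DOE": doe, "WEIGHT": weight}
    for name, weights in (
        ("alice", [57, 56, 55, 55, 56]),
        ("tweedledum", [91, 81, 81, 83, 84]),
    )
    for doe, weight in enumerate(weights)
]
# ... plus one hypothetical tuple that is NOT part of the study data.
rows.append({"PATIENT": "alice", "DOE": 5, "WEIGHT": 70})

flagged = nd_violations(rows, ("PATIENT",), "WEIGHT", 20)
flagged_keys = [(r["PATIENT"], r["DOE"]) for r in flagged]
```

Only the invented 70 kg tuple is flagged; Tweedledum's first weight of 91 kg stays within 10 kg of his median of 83 kg and is accepted.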
For SGPT and especially for HBSAG we can see that the shift model is not appropriate. Clinicians have observed that carriers with a high level of HBSAG have a greater dispersion in their HBSAG values, whereas carriers with a low level of HBSAG do not vary so strongly in their HBSAG values. Here we can assume a scale model

$$F_x(y) := F(y \cdot \exp(\theta_x))$$

for all $x \in VS(X)$. As is known, such scale models can be reduced to shift models by taking logarithms of the $Y$-values. Here we obtain

$$F_x(\log(y)) := F(\log(y) + \theta_x).$$

Therefore, we only have to compute log(HBSAG) and log(SGPT) and specify $ND_2$ and $ND_3$ by using the shift model for the logarithmic $Y$-values (see Fig. 7 for a relation which contains the logarithms, and Fig. 8 for the empirical distribution function of log(HBSAG)).

Of course, besides the median-range method, other procedures to check NDs can be imagined, too. However, unlike FDs, NDs can only be used to check SICs, not to find well-structured relations.

Fig. 6. Empirical cumulative distribution function (ECDF) and 0, 25, 50, 75, 100% quantiles of HBSAG for each patient. (Plot not reproduced; legend: 1 = ALICE, 2 = WHITE QUEEN, 3 = RED QUEEN, 4 = TWEEDLEDEE, 5 = TWEEDLEDUM, 6 = KNIGHT; horizontal axis: HBSAG.)

CARRIER

PATIENT      DOE  WEIGHT  SGPT  LOGSGPT  HBSAG  LOGHBSAG
alice          0      57    19  2.94444   38.0   3.63759
alice          1      56    13  2.56495   40.0   3.68888
alice          2      55    13  2.56495   26.0   3.25810
alice          3      55    11  2.39790   33.0   3.49651
alice          4      56    13  2.56495   44.0   3.78419
white queen    0      53    17  2.83321    6.0   1.79176
white queen    1      54    12  2.48491    4.8   1.56862
white queen    2      55    10  2.30259    5.4   1.68640
white queen    3      58    14  2.63906    3.1   1.13140
white queen    4      57     7  1.94591    5.2   1.64866
red queen      0      66    15  2.70805   40.0   3.68888
red queen      1      64    11  2.39790   39.0   3.66356
red queen      2      64    12  2.48491   34.0   3.52636
red queen      3      62    18  2.89037   34.0   3.52636
red queen      4      64    10  2.30259   30.0   3.40120
tweedledee     0      71    17  2.83321   17.0   2.83321
tweedledee     1      70    15  2.70805   12.0   2.48491
tweedledee     2      71    10  2.30259   14.0   2.63906
tweedledee     3      71     2  0.69315   16.0   2.77259
tweedledee     4      72    21  3.04452   13.0   2.56495
tweedledum     0      91    22  3.09104    2.5   0.91629
tweedledum     1      81    30  3.40120    1.8   0.58779
tweedledum     2      81    30  3.40120    2.5   0.91629
tweedledum     3      83    17  2.83321    1.9   0.64185
tweedledum     4      84    20  2.99573    1.2   0.18232
knight         0      62    18  2.89037   25.0   3.21888
knight         1      65    17  2.83321   16.0   2.77259
knight         2      67    12  2.48491   19.0   2.94444
knight         3      65    12  2.48491   14.0   2.63906
knight         4      70    15  2.70805   12.0   2.48491

Fig. 7. Relation including log(SGPT) and log(HBSAG) values.

Fig. 8. Empirical cumulative distribution function (ECDF) and 0, 25, 50, 75, 100% quantiles of log(HBSAG) for each patient. (Plot not reproduced; horizontal axis: LOG(HBSAG).)

4. DEPENDENCY RELATIONSHIPS

As will be shown, FDs are special NDs, hence $S \subset S' \subset \Sigma$. A nondeterministic dependency degenerates to a functional dependency if the variance $VAR_x(Y)$ of the dependent variable is zero. Let us again assume that there is a relation $R : rel(A \mid \Sigma_A)$, that $X \subseteq A$, $|X| \ge 1$, $Y \in A$, $a \in VS(A)$, and that the subsets of $VS(A)$ are denoted as $r^t$.

Lemma: A nondeterministic dependency $ND_j : X \rightarrow \mathcal{L}(Y)$ is a functional dependency $FD_j : X \rightarrow Y$ iff $VAR_x(Y) = 0$ for all $x \in VS(X)$.

Proof: Recall the ND definition:

$$\forall t : \forall v, w \in r^t : v.X = w.X \Rightarrow F_{v.X}(y) = F_{w.X}(y).$$

If $VAR_x(Y) = 0$ then every distribution function $F_{v.X}(y)$ or $F_{w.X}(y)$ reduces to a one-point distribution and so $Y$ to a constant $y(v.X)$ or $y(w.X)$, say. If $F_{v.X}(y) = F_{w.X}(y)$, i.e. if the law is identical, then obviously the constants are identical. Thus, for all $v.X = w.X$, we obtain the same $Y$-value in the $v$- and the $w$-tuple; therefore $v.Y = w.Y$. So we obtain the FD definition that, with probability 1,

$$\forall t : \forall v, w \in r^t : v.X = w.X \Rightarrow v.Y = w.Y.$$

On the other hand, if $VAR_x(Y) > 0$, then $F_{v.X}(y)$ and $F_{w.X}(y)$ do not reduce to a one-point distribution. Therefore we cannot assume that $v.Y$ and $w.Y$ are identical. Q.E.D.

5. FUNCTIONAL AND NONDETERMINISTIC DEPENDENCIES IMPLEMENTED

The HBSAG-carriers data have been collected for statistical data analysis. Therefore they were stored in the database of the statistical analysis system SAS (for references see [5]). For checking the validity of the data before starting the analysis, nondeterministic and functional dependencies have been specified. However, in SAS there exists no possibility to add such semantic integrity constraints to the data structures' type definitions. So we implemented an additional method (a SAS procedure) in the SAS methodbase. By this method NDs can be specified and checked. The simplified syntax is given in Fig. 9.

Fig. 9. Syntax diagram of SAS procedure ND. (Diagram not reproduced.) Attribute, database, data structure and ND-name are names in SAS convention. ND-name has to be unique. Interval is either a real number or an integer.

In a first version we implemented the checking by defining a range, assuming a shift model. Functional dependencies can be stated by specifying no range value. In this case range 0 is expected by the method and therefore a "distribution" with variance 0. As we have proved in section 4, an ND with a variance 0 distribution is an FD. A complete documentation of the method's syntax and semantics can be found in [12].

The stated nondeterministic and functional dependencies of the HBSAG-carriers data in CARRIER3 can be found in Fig. 10. The procedure classifies the tuple of Tweedledee's 3-year examination date as invalid because of the very low SGPT value.

    PROCEDURE ND DATA = CARRIER3;
    ND FD1: PATIENT DOE -> WEIGHT,
       FD2: PATIENT DOE -> SGPT,
       FD3: PATIENT DOE -> HBSAG,
       ND1: PATIENT -> WEIGHT 20,
       ND2: PATIENT -> LOGSGPT 1.5,
       ND3: PATIENT -> LOGHBSAG 1.5;

Fig. 10. Procedure call for ND.

6. DISCUSSION

Relations are database representations for objects of the real world. To limit the number of extensions of such relations, we often want to add some facts about the excerpt of reality under consideration to the data structure definitions or to the database definition. Usually in database theory facts are specified which are either true (fulfilled) or false (violated): the so-called semantic integrity constraints. Well-known semantic integrity constraints are the functional dependencies (recall Fig. 1). But often, facts about the real world are more of a stochastic nature than of a deterministic one.

In this article we have defined a class of semantic integrity constraints in which not only deterministic, functional dependencies can be stated, but also nondeterministic ones. In data used for statistical analysis, for example, such dependencies appear frequently, but cannot be expressed by FDs or other deterministic dependencies such as multivalued dependencies. The concept of nondeterministic dependency, which can be regarded as an extension of the concept of functional dependency, enables a user to specify more a priori knowledge about the reality considered.

Of course NDs cannot be used to describe appropriately all relationships between any two attributes. However, they are useful for specific applications in which stochastic relationships can be observed for some well-distinct objects, but which would disappear if the data structure as a whole were under investigation. Examples of such relationships are persons and persons' data, clinics and clinics' physical examination methods, or populations and populations' properties.

Acknowledgements. The authors would like to thank M. Alle, Heidelberg, F. J. Leven, Heilbronn and W. Stucky, Karlsruhe for their comments and corrections, which led to an improvement of a previous version of this article.

REFERENCES
[1] E. F. Codd: Extending the database relational model to capture more meaning, ACM Trans. Database Syst. 4, 397-434 (1979).
[2] E. F. Codd: Relational database: a practical foundation for productivity, Comm. ACM 25, 109-117 (1982).
[3] T.-W. Ling, F. W. Tompa and T. Kameda: An improved third normal form for relational databases, ACM Trans. Database Syst. 6, 329-346 (1981).
[4] G. Schlageter and W. Stucky: Datenbanksysteme: Konzepte und Modelle. 2nd edition, Teubner, Stuttgart (1983).
[5] I. Francis: Statistical Software: A Comparative Review, North Holland, New York (1981).
[6] R. Haux: Die Verwendung komplexer Datenstrukturen in Statistischen Auswertungssystemen, Doctoral thesis, Univ. of Ulm (1983).
[7] U. Kaboth et al. (DFG study group 'viral hepatitis'): Kooperative prospektive Studie 'Klinisch gesunder HBsAg-Träger' (DFG). Verh. dt. Ges. f. innere Med. 86, 749-756 (1980).
[8] W. W. Armstrong: Dependency structures in data base relationships. J. L. Rosenfeld (Ed.): Information Processing 74, 580-583, North Holland, Amsterdam (1974).
[9] J. Hajek and Z. Sidak: Theory of Rank Tests, Academic Press, New York (1967).
[10] T. Yasaka: Health control data-base system and subject-specific normal range. Med. Inform. 1, 105-132 (1976).
[11] K. Nakano, T. Atobe, Y. Hiraki and T. Yasaka: Estimation of subject-specific normal ranges based on some statistical models of an individual's physiological variations. Med. Inform. 6, 195-205 (1981).
[12] M. Alle and U. Eckert: SAS Procedure ND. Techn. Report, Institute of Medical Documentation, Statistics and Computer Science, University of Heidelberg (1983).