Nondeterministic dependencies in relations: An extension of the concept of functional dependency

Inform. Systems Vol. 10, No. 2, pp. 139-148, 1985. Printed in the U.S.A.

R. HAUX
Institute of Medical Documentation, Statistics and Computer Science, University of Heidelberg, Im Neuenheimer Feld 325, D-6900 Heidelberg, F.R. Germany

and

U. ECKERT
Medical Computing Center, University of Heidelberg, Im Neuenheimer Feld 293, D-6900 Heidelberg, F.R. Germany

Abstract: In the present article, some special semantic integrity constraints, so-called nondeterministic dependencies, are proposed. These dependencies can be regarded as stochastic extensions of functional dependencies. After some basic definitions, the concept of nondeterministic dependency is introduced. Examples are given and an implementation for a statistical analysis system is described. Some properties are discussed.

Keywords: nondeterministic dependencies, functional dependencies, relational data model, data- and methodbase systems, statistical analysis systems, database systems.

1. INTRODUCTION

After having studied examples of relations in the database systems literature, a reader may well believe that the real world consists only of mappings between employees and departments, suppliers and salesmen, or students and classes. The relationships between attributes in such data structures are mostly specified in a deterministic manner, e.g. "an employee works in exactly one department" or "a student belongs to exactly one class". However, other, less strict kinds of relationships also exist. As will be demonstrated in the present article, a class of constraints can be described which cannot be specified in the usual deterministic manner, but only by pointing out stochastic properties. By describing these stochastic properties, we obtain the opportunity to state facts which could not be stated by the commonly used types of constraints. These constraints will be called nondeterministic dependencies, or briefly NDs. As will be pointed out, nondeterministic dependencies are a subset of all semantic integrity constraints, and this subset includes the set of functional dependencies (Fig. 1).

NDs can be applied to data stored in databases. The database can be part of a database system or of related systems such as statistical analysis systems. Before pointing out how to construct such nondeterministic dependencies, some basic definitions will be given in the next section.

2. CONCEPTUAL DATA MODEL AND BASIC DEFINITIONS

In this section we consider some properties of data- and methodbase systems which are based on the relational data model (e.g. [1], [2], [3]); the notation concerning relations is based on the terminology of [4].

Let $DMBS := (DB, MB, DMBMS)$ denote a data- and methodbase system (DMBS). A DMBS consists of a database (DB) and a methodbase (MB). Both are created and managed by a data- and methodbase management system (DMBMS). The DMBMS can activate methods of MB and access data stored in DB. In addition, it represents the interface to a user. In this context, a method is defined as a program module which produces a certain performance.

If the methods in MB mainly deal with data analysis based on statistical methods, the data- and methodbase system is called a statistical analysis system (a survey of such systems can be found in [5]). If MB is empty, or if MB does not exist, a DMBS is called a database system (DBS) and its management component a database management system (DBMS). Like a statistical analysis system, a DBS can be regarded as a special case of a DMBS.

In the sequel we will focus on the database and its management, especially the data structures. More detailed definitions concerning the methodbase and its management can be found in [6].

We will assume that the database of the DMBS is, at the conceptual level, based on the relational model. Following [4], we define the terms attribute, relation and database as variables specified by their names and types (schemes), and an extension of a variable as a value (realisation). A data structure type relation is defined as

$$\rho := rel(\mathbf{A} \mid \Sigma_{\mathbf{A}}),$$

where $\Sigma_{\mathbf{A}}$ denotes a set of semantic integrity constraints (SICs) concerning the set of attributes $\mathbf{A}$. A SIC $\sigma$, say, is a fact about the reality which is either 'true' (fulfilled) or 'false' (violated). It will be examined at some predefined times and, if violated, will cause a specified reaction.

Fig. 1. Relationships between facts, constraints and dependencies (diagram labels: FACTS IN REALITY, SEMANTIC INTEGRITY CONSTRAINTS).

Each attribute named $A_k \in \mathbf{A} := \{A_1, \ldots, A_K\}$, with pairwise disjoint $A_k$, $k = 1, \ldots, K$, is defined by its data type, mainly by the data type's value set (domain) $VS(A_k)$, or briefly, if the attribute's name is not of importance, $VS(A)$. The values of $A$ will be denoted as $a_A$, where

$$\forall A \in \mathbf{A} : a_A \in VS(A).$$

A K-tuple $a$, say, can be described as

$$a := (a_A \mid A \in \mathbf{A})$$

with $VS(\mathbf{A}) := \{(a_A \mid A \in \mathbf{A})\}$. Now we can define the (valid) value set of a relation variable named $R$, say, $R : rel(\mathbf{A} \mid \Sigma_{\mathbf{A}})$, as

$$VS(R) := \{ r \subset VS(\mathbf{A}) \mid \forall \sigma \in \Sigma_{\mathbf{A}} : \sigma = \text{'true'} \},$$

i.e. a valid realisation $r$ of $R$ is a subset of the value set of $\mathbf{A}$ in which no SIC is violated. A type database is defined as

$$\delta := db(\mathbf{R} \mid \Sigma_{\mathbf{R}}),$$

where $\mathbf{R}$ denotes the set of relations in a variable DB, say,

$$DB : db(\mathbf{R} \mid \Sigma_{\mathbf{R}}),$$

$\mathbf{R} := \{R_1, \ldots, R_I\}$ with pairwise disjoint $R_i$, $i = 1, \ldots, I$. $\Sigma_{\mathbf{R}}$ denotes the set of interrelational SICs.

Relations are usually visualized as tables, the columns corresponding to attributes, the rows corresponding to tuples. As an example, let us take a database with one relation, named CARRIER, and let us regard a realisation of this relation (Fig. 2). CARRIER contains data of HBSAG-carriers (HBSAG: hepatitis B surface antigen) taken from a multicentre study of blood donors who are healthy HBSAG-carriers [7]. For data security purposes, the patients' identifications have been changed and the dates of birth have been shortened. The attributes of the HBSAG-carriers relation are:

PATIENT (patient's name, unique for each patient),
DOE (date of examination, in years),
DOB (date of birth),
SEX,
WEIGHT (in kg),
SGPT (alanine aminotransferase, in units/litre),
HBSAG (concentration of HBSAG in microgram/millilitre).

CARRIER
PATIENT      DOE  DOB   SEX     WEIGHT  SGPT  HBSAG
alice        0    1950  female  57      19    38.0
alice        1    1950  female  56      13    40.0
alice        2    1950  female  55      13    26.0
alice        3    1950  female  55      11    33.0
alice        4    1950  female  56      13    44.0
white queen  0    1942  female  53      17    6.0
white queen  1    1942  female  54      12    4.8
white queen  2    1942  female  55      10    5.4
white queen  3    1942  female  58      14    3.1
white queen  4    1942  female  57      7     5.2
red queen    0    1955  female  66      15    40.0
red queen    1    1955  female  64      11    39.0
red queen    2    1955  female  64      12    34.0
red queen    3    1955  female  62      18    34.0
red queen    4    1955  female  64      10    30.0
tweedledee   0    1932  male    71      17    17.0
tweedledee   1    1932  male    70      15    12.0
tweedledee   2    1932  male    71      10    14.0
tweedledee   3    1932  male    71      2     16.0
tweedledee   4    1932  male    72      21    13.0
tweedledum   0    1935  male    91      22    2.5
tweedledum   1    1935  male    81      30    1.8
tweedledum   2    1935  male    81      30    2.5
tweedledum   3    1935  male    83      17    1.9
tweedledum   4    1935  male    84      20    1.2
knight       0    1952  male    62      18    25.0
knight       1    1952  male    65      17    16.0
knight       2    1952  male    67      12    19.0
knight       3    1952  male    65      12    14.0
knight       4    1952  male    70      15    12.0

Fig. 2. Relation of HBSAG-carriers ($db^1$).

Let us restrict the set of SICs $\Sigma$ to the set of functional dependencies $S$, say, $S \subset \Sigma$. Furthermore, let $r^t \subset VS(\mathbf{A})$ denote a subset of the value set of $\mathbf{A}$ at time $t$. In a relation $R : rel(\mathbf{A} \mid S)$ a set of attributes $Y \subset \mathbf{A}$ is functionally dependent on a set of attributes $X \subset \mathbf{A}$ iff

$$\forall t : \forall v, w \in r^t : v.X = w.X \Rightarrow v.Y = w.Y.$$

Here

$$v.X := (v_A \mid A \in X \subset \mathbf{A})$$

is the projection of the tuple $v$ onto the attributes $X$ (and likewise for $w.X$, $v.Y$, $w.Y$). A functional dependency (FD) named $FD_i$, say, will be designated as

$$FD_i : X \to Y.$$

Usually, the cardinalities of $X$ and $Y$ are defined as $1 \le |X| \le |\mathbf{A}|$, $1 \le |Y| \le |\mathbf{A}|$. Because of the decomposition/union rule (see e.g. [8])

$$X \to Z_1 \cup Z_2 \Leftrightarrow X \to Z_1 \wedge X \to Z_2$$

for arbitrary $Z_1, Z_2 \subset \mathbf{A}$ we can, for simplicity, restrict the cardinality of $Y$ to $|Y| := 1$. In the CARRIER relation we can specify 5 functional dependencies:

$FD_1$ : PATIENT $\to$ DOB
$FD_2$ : PATIENT $\to$ SEX
$FD_3$ : PATIENT DOE $\to$ WEIGHT
$FD_4$ : PATIENT DOE $\to$ SGPT
$FD_5$ : PATIENT DOE $\to$ HBSAG.

Thus, the present realisation $db^1$, say, of DB can be defined as

$db^1 \in VS(DB)$,
DB : db({CARRIER} | $\emptyset$),
CARRIER : rel({PATIENT, DOE, DOB, SEX, WEIGHT, SGPT, HBSAG} | {$FD_1$ : PATIENT $\to$ DOB, $FD_2$ : PATIENT $\to$ SEX, $FD_3$ : PATIENT DOE $\to$ WEIGHT, $FD_4$ : PATIENT DOE $\to$ SGPT, $FD_5$ : PATIENT DOE $\to$ HBSAG}).
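The FD definition above can be checked mechanically on a realisation. The following is a minimal sketch in Python, not part of the original article: the function name `fd_violations` and the dictionary representation of tuples are assumptions made for illustration, and only four tuples of Fig. 2 are used.

```python
from collections import defaultdict

def fd_violations(rows, x_attrs, y_attr):
    """Return {x_value: set of y_values} for every X-value that maps to
    more than one Y-value, i.e. where X -> Y does not hold."""
    seen = defaultdict(set)
    for row in rows:
        x_value = tuple(row[a] for a in x_attrs)  # projection v.X
        seen[x_value].add(row[y_attr])            # collect v.Y
    return {x: ys for x, ys in seen.items() if len(ys) > 1}

# Four tuples taken from Fig. 2 (hypothetical in-memory representation).
rows = [
    {"PATIENT": "alice", "DOE": 0, "DOB": 1950, "WEIGHT": 57},
    {"PATIENT": "alice", "DOE": 1, "DOB": 1950, "WEIGHT": 56},
    {"PATIENT": "knight", "DOE": 0, "DOB": 1952, "WEIGHT": 62},
    {"PATIENT": "knight", "DOE": 1, "DOB": 1952, "WEIGHT": 65},
]

fd1 = fd_violations(rows, ["PATIENT"], "DOB")            # FD1 holds
fd3 = fd_violations(rows, ["PATIENT", "DOE"], "WEIGHT")  # FD3 holds
no_fd = fd_violations(rows, ["PATIENT"], "WEIGHT")       # not an FD
print(fd1, fd3, no_fd)
```

An empty result means the dependency holds on the given tuples. PATIENT alone does not determine WEIGHT, which is the situation the next section addresses.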

3. NONDETERMINISTIC DEPENDENCIES

In stating a-priori knowledge in functional dependencies we achieve two purposes: (1) FDs can serve as criteria to map a set of relations into a set of 'well-structured' relations. Criteria for well-structuredness could be, e.g., that all relations are in (Codd) third normal form or in improved third normal form [3]. Additionally, as stated in section 2, (2) FDs can be regarded as a special class of semantic integrity constraints which cause a reaction if they are violated. Because of (1) we can, e.g., split the CARRIER relation into the two third normal form relations shown in Fig. 3.

CARRIER1
PATIENT      DOB   SEX
alice        1950  female
white queen  1942  female
red queen    1955  female
tweedledee   1932  male
tweedledum   1935  male
knight       1952  male

CARRIER2
PATIENT      DOE  WEIGHT  SGPT  HBSAG
alice        0    57      19    38.0
alice        1    56      13    40.0
alice        2    55      13    26.0
alice        3    55      11    33.0
alice        4    56      13    44.0
white queen  0    53      17    6.0
white queen  1    54      12    4.8
white queen  2    55      10    5.4
white queen  3    58      14    3.1
white queen  4    57      7     5.2
red queen    0    66      15    40.0
red queen    1    64      11    39.0
red queen    2    64      12    34.0
red queen    3    62      18    34.0
red queen    4    64      10    30.0
tweedledee   0    71      17    17.0
tweedledee   1    70      15    12.0
tweedledee   2    71      10    14.0
tweedledee   3    71      2     16.0
tweedledee   4    72      21    13.0
tweedledum   0    91      22    2.5
tweedledum   1    81      30    1.8
tweedledum   2    81      30    2.5
tweedledum   3    83      17    1.9
tweedledum   4    84      20    1.2
knight       0    62      18    25.0
knight       1    65      17    16.0
knight       2    67      12    19.0
knight       3    65      12    14.0
knight       4    70      15    12.0

Fig. 3. Relation of HBSAG-carriers ($db^2$).

Let us now focus on the relationship of WEIGHT and PATIENT. Obviously there does not exist a functional dependency PATIENT $\to$ WEIGHT, because we do have varying values of weight in a realisation of DB. We have varying values because each patient's weight was recorded at several examination dates (additionally, the variation could be caused by errors in (repeated) measurement). However, if we regard the patients' empirical weight distributions, we can see that each patient has a typical characteristic concerning her or his weight (Fig. 4). Although the weight of all HBSAG-carriers ranges between about 50 and 100 kg, we could hardly believe that Alice weighs 70 kg, but no one would doubt it if Tweedledee were of that weight.

Kinds of relationships as mentioned above can be regarded as semantic integrity constraints and as a special class of dependencies $S'$, say. Yet these dependencies are not functional or deterministic. To describe such a dependency, we have to interpret each patient's weight as a random variable.

The distribution of the weight depends on the values of PATIENT. Such a dependency will be called a nondeterministic dependency (ND). In a relation $R : rel(\mathbf{A} \mid S')$ an attribute $Y \in \mathbf{A}$ is nondeterministically dependent (named $ND_j$) on a set of attributes $X \subset \mathbf{A}$, $|X| \ge 1$, based on a distribution law $\mathcal{L}_{x \in VS(X)}(Y)$, denoted

$$ND_j : X \to \mathcal{L}(Y),$$

iff

$$\forall t : \forall v, w \in r^t : v.X = w.X \Rightarrow \mathcal{L}_{v.X}(Y) = \mathcal{L}_{w.X}(Y).$$

This means: the distribution of the Y-values depends on the X-values, but in a stochastic manner.
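To make the idea of one distribution per X-value concrete, the sketch below groups Y-values by X-value and builds an empirical cumulative distribution function (ECDF) for each group. It is an illustration only: the helper `ecdf_by_x` is a hypothetical name, and the WEIGHT values are those recorded for Alice and Tweedledee in Fig. 2.

```python
from collections import defaultdict

def ecdf_by_x(rows, x_attr, y_attr):
    """Map each X-value to the ECDF of its Y-values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[x_attr]].append(row[y_attr])

    def make_ecdf(values):
        n = len(values)
        # ECDF(y) = share of observed values that are <= y
        return lambda y: sum(1 for v in values if v <= y) / n

    return {x: make_ecdf(ys) for x, ys in groups.items()}

weights = {
    "alice": [57, 56, 55, 55, 56],
    "tweedledee": [71, 70, 71, 71, 72],
}
rows = [{"PATIENT": p, "WEIGHT": w} for p, ws in weights.items() for w in ws]

ecdf = ecdf_by_x(rows, "PATIENT", "WEIGHT")
print(ecdf["alice"](56), ecdf["tweedledee"](56))
print(ecdf["alice"](70), ecdf["tweedledee"](70))
```

The two ECDFs barely overlap: at 70 kg Alice's ECDF has already reached 1, while Tweedledee's has only just left 0, which matches the remark about a weight of 70 kg for these two patients.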

Fig. 4. Empirical cumulative distribution function (ECDF) and 0, 25, 50, 75, 100% quantiles of WEIGHT for each patient (1 = ALICE, 2 = WHITE QUEEN, 3 = RED QUEEN, 4 = TWEEDLEDEE, 5 = TWEEDLEDUM, 6 = KNIGHT; horizontal axis: WEIGHT).

Fig. 5. Empirical cumulative distribution function (ECDF) and 0, 25, 50, 75, 100% quantiles of SGPT for each patient (1 = ALICE, 2 = WHITE QUEEN, 3 = RED QUEEN, 4 = TWEEDLEDEE, 5 = TWEEDLEDUM, 6 = KNIGHT; horizontal axis: SGPT).

Obviously there exists a nondeterministic dependency between any pair $X \in \mathbf{A}$, $Y \in \mathbf{A}$ in $R$. The specification of an ND in order to describe a semantic integrity constraint is useful when specific stochastic properties of some well-distinct objects (e.g. patients) exist. In our example we can now specify the following FDs and NDs:

$FD_1$ : PATIENT $\to$ DOB
$FD_2$ : PATIENT $\to$ SEX
$FD_3$ : PATIENT DOE $\to$ WEIGHT
$FD_4$ : PATIENT DOE $\to$ SGPT
$FD_5$ : PATIENT DOE $\to$ HBSAG
$ND_1$ : PATIENT $\to \mathcal{L}$(WEIGHT)
$ND_2$ : PATIENT $\to \mathcal{L}$(SGPT)
$ND_3$ : PATIENT $\to \mathcal{L}$(HBSAG).

The empirical distribution functions of SGPT and HBSAG are shown in Figs. 5 and 6. The present realisation $db^2$, say, of DB can now be summarized as

$db^2 \in VS(DB)$,
DB : db({CARRIER1, CARRIER2} | $\emptyset$),
CARRIER1 : rel({PATIENT, DOB, SEX} | {$FD_1$ : PATIENT $\to$ DOB, $FD_2$ : PATIENT $\to$ SEX}),
CARRIER2 : rel({PATIENT, DOE, WEIGHT, SGPT, HBSAG} | {$FD_1$ : PATIENT DOE $\to$ WEIGHT, $FD_2$ : PATIENT DOE $\to$ SGPT, $FD_3$ : PATIENT DOE $\to$ HBSAG, $ND_1$ : PATIENT $\to \mathcal{L}$(WEIGHT), $ND_2$ : PATIENT $\to \mathcal{L}$(SGPT), $ND_3$ : PATIENT $\to \mathcal{L}$(HBSAG)}).

Up to now the ND specification is still incomplete: the definition of the assumed distribution law is missing. Let us in the sequel assume that $VS(Y)$ is ordered. If for some ND the distribution law $\mathcal{L}(Y)$, e.g. defined as $F_x(y)$ for all $x \in VS(X)$, $y \in VS(Y)$, is known, or if the distribution parameters can be estimated, we could investigate violations by

(1) checking tolerance intervals so that, for a given probability $\alpha$,

$$P(l_x < Y < u_x) = 1 - \alpha,$$

where $l_x$ and $u_x$ denote the lower and upper limit for some X-value, or by

(2) comparing the empirical distribution $\hat{F}_x(y)$ with $F_x(y)$, e.g. by using a Kolmogorov-Smirnov test statistic [9] based on the maximal deviation

$$\sup_y |\hat{F}_x(y) - F_x(y)|,$$

or by

(3) assuming that, for each x-value, we obtain a time series, and then checking some properties of the time series (see e.g. [9] and [10], who describe an application for public health services).

Possibility (3) is appropriate, e.g., if a dependency between the date of examination and $F_x(y)$ is assumed. This would be the case for the weight distributions of babies, where the mean and the variance for each baby depend on the examination date.

Unfortunately, $F_x(y)$ is usually unknown. In addition, for small sample sizes such as $n = 5$ for each patient in the example, it is doubtful whether to proceed in the ways mentioned above. As a simple and easy-to-use violation criterion we could choose the range and define that a nondeterministic dependency is violated if some Y-value lies outside the interval $(\hat{\theta}_x - range/2, \hat{\theta}_x + range/2)$, where $\hat{\theta}_x$ denotes the sample median of the Y-values for a given X-value. This procedure seems especially useful if, although $F_x(y)$ is unknown, we can assume a shift model

$$F_x(y) := F(y + \theta_x)$$

for all $x \in VS(X)$ and estimate the unknown $\theta_x$ by $\hat{\theta}_x$. E.g. for $ND_1$ a shift model can be used. The ND specification can then be stated completely by

$$ND_j : X \to \mathcal{L}_\theta(Y)\ range.$$
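The median-range criterion can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' SAS implementation: the function name `nd_violations` is hypothetical, the range values 20 and 10 are chosen for the example, and values lying exactly on the interval boundary are treated as violations because the interval in the text is open.

```python
from collections import defaultdict
from statistics import median

def nd_violations(rows, x_attr, y_attr, value_range):
    """Return the tuples whose Y-value lies outside the open interval
    (median - range/2, median + range/2) of their own X-group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[x_attr]].append(row)
    half = value_range / 2
    violations = []
    for group in groups.values():
        theta_hat = median(r[y_attr] for r in group)  # sample median per X-value
        for r in group:
            if not (theta_hat - half < r[y_attr] < theta_hat + half):
                violations.append(r)
    return violations

# Tweedledum's WEIGHT values from Fig. 2; the median is 83.
rows = [{"PATIENT": "tweedledum", "DOE": doe, "WEIGHT": w}
        for doe, w in enumerate([91, 81, 81, 83, 84])]

wide = nd_violations(rows, "PATIENT", "WEIGHT", 20)    # interval (73, 93)
narrow = nd_violations(rows, "PATIENT", "WEIGHT", 10)  # interval (78, 88)
print(wide, narrow)
```

With a range of 20 no tuple is flagged, while the narrower illustrative range of 10 flags the first examination (91 kg). The choice of range therefore decides how much within-patient variation is tolerated.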

For SGPT and especially for HBSAG we can see that the shift model is not appropriate. Clinicians have observed that carriers with high level of HBSAG have a greater dispersion in their HBSAG-

Fig. 6. Empirical cumulative distribution function (ECDF) and 0, 25, 50, 75, 100% quantiles of HBSAG for each patient (1 = ALICE, 2 = WHITE QUEEN, 3 = RED QUEEN, 4 = TWEEDLEDEE, 5 = TWEEDLEDUM, 6 = KNIGHT; horizontal axis: HBSAG).

CARRIER
PATIENT      DOE  WEIGHT  SGPT  LOGSGPT  HBSAG  LOGHBSAG
alice        0    57      19    2.94444  38.0   3.63759
alice        1    56      13    2.56495  40.0   3.68888
alice        2    55      13    2.56495  26.0   3.25810
alice        3    55      11    2.39790  33.0   3.49651
alice        4    56      13    2.56495  44.0   3.78419
white queen  0    53      17    2.83321  6.0    1.79176
white queen  1    54      12    2.48491  4.8    1.56862
white queen  2    55      10    2.30259  5.4    1.68640
white queen  3    58      14    2.63906  3.1    1.13140
white queen  4    57      7     1.94591  5.2    1.64866
red queen    0    66      15    2.70805  40.0   3.68888
red queen    1    64      11    2.39790  39.0   3.66356
red queen    2    64      12    2.48491  34.0   3.52636
red queen    3    62      18    2.89037  34.0   3.52636
red queen    4    64      10    2.30259  30.0   3.40120
tweedledee   0    71      17    2.83321  17.0   2.83321
tweedledee   1    70      15    2.70805  12.0   2.48491
tweedledee   2    71      10    2.30259  14.0   2.63906
tweedledee   3    71      2     0.69315  16.0   2.77259
tweedledee   4    72      21    3.04452  13.0   2.56495
tweedledum   0    91      22    3.09104  2.5    0.91629
tweedledum   1    81      30    3.40120  1.8    0.58779
tweedledum   2    81      30    3.40120  2.5    0.91629
tweedledum   3    83      17    2.83321  1.9    0.64185
tweedledum   4    84      20    2.99573  1.2    0.18232
knight       0    62      18    2.89037  25.0   3.21888
knight       1    65      17    2.83321  16.0   2.77259
knight       2    67      12    2.48491  19.0   2.94444
knight       3    65      12    2.48491  14.0   2.63906
knight       4    70      15    2.70805  12.0   2.48491

Fig. 7. Relation including log(SGPT) and log(HBSAG) values.
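The LOGSGPT and LOGHBSAG columns of Fig. 7 are consistent with natural logarithms of SGPT and HBSAG. The sketch below recomputes LOGSGPT from the SGPT values and applies a median-range check per patient. It is not the SAS procedure ND itself: the helper names are hypothetical, and the range 1.85 is an assumption of this sketch, taken as our reading of the LOGSGPT entry in the procedure call of Fig. 10.

```python
import math
from statistics import median

# SGPT values per patient for DOE = 0..4, as listed in Fig. 7.
SGPT = {
    "alice": [19, 13, 13, 11, 13],
    "white queen": [17, 12, 10, 14, 7],
    "red queen": [15, 11, 12, 18, 10],
    "tweedledee": [17, 15, 10, 2, 21],
    "tweedledum": [22, 30, 30, 17, 20],
    "knight": [18, 17, 12, 12, 15],
}

def log_column(values):
    """Natural logarithm of each value (the LOG... columns of Fig. 7)."""
    return [math.log(v) for v in values]

def out_of_range(values, value_range):
    """Positions of values outside (median - range/2, median + range/2)."""
    theta_hat = median(values)
    half = value_range / 2
    return [i for i, y in enumerate(values)
            if not (theta_hat - half < y < theta_hat + half)]

RANGE_LOGSGPT = 1.85  # assumed reading of the range given in Fig. 10

# The list position equals DOE because values are stored in DOE order.
flagged = [(patient, doe)
           for patient, values in SGPT.items()
           for doe in out_of_range(log_column(values), RANGE_LOGSGPT)]

print(round(math.log(19), 5))
print(flagged)
```

Under these assumptions the only flagged tuple is Tweedledee's examination at DOE = 3, where SGPT drops to 2.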

Fig. 8. Empirical cumulative distribution function (ECDF) and 0, 25, 50, 75, 100% quantiles of log(HBSAG) for each patient (horizontal axis: LOG(HBSAG)).

Fig. 9. Syntax diagram of SAS procedure ND. Attribute, database, data structure and ND-name are names in SAS convention. ND-name has to be unique. Interval is either a real number or an integer.

values, whereas carriers with a low level of HBSAG do not vary so strongly in their HBSAG values. Here we can assume a scale model

$$F_x(y) := F(y \cdot \exp(\theta_x))$$

for all $x \in VS(X)$. As is known, such scale models can be reduced to shift models by taking the logarithm of $Y$. Here we obtain

$$F_x(\log(y)) := F(\log(y) + \theta_x).$$

Therefore, we only have to compute log(HBSAG) and log(SGPT) and specify $ND_2$ and $ND_3$ by using the shift model for the logarithmic Y-values (see Fig. 7 for a relation which contains the logarithms, and Fig. 8 for the empirical distribution function of log(HBSAG)). Of course, besides the median-range method other procedures to check NDs can be imagined, too. However, NDs can only be used to check SICs, not to find well-structured relations, as FDs can.

4. DEPENDENCY RELATIONSHIPS

As will be shown, FDs are special NDs, hence $S \subset S' \subset \Sigma$. A nondeterministic dependency degenerates to a functional dependency if the variance $VAR_x(Y)$ of the dependent variable is zero. Let us again assume that there is a relation $R : rel(\mathbf{A} \mid \Sigma_{\mathbf{A}})$, that $X \subset \mathbf{A}$, $|X| \ge 1$, $Y \in \mathbf{A}$, $a \in VS(\mathbf{A})$, and that the subsets of $VS(\mathbf{A})$ are denoted as $r^t$.

Lemma: A nondeterministic dependency $ND_j : X \to \mathcal{L}(Y)$ is a functional dependency $FD_j : X \to Y$ iff $VAR_x(Y) = 0$ for all $x \in VS(X)$.

Proof: Recall the ND-definition, with the distribution law expressed by its distribution function:

$$\forall t : \forall v, w \in r^t : v.X = w.X \Rightarrow F_{v.X}(y) = F_{w.X}(y).$$

If $VAR_x(Y) = 0$, then every distribution function $F_{v.X}(y)$ or $F_{w.X}(y)$ reduces to a one-point distribution, and so $Y$ to a constant $y(v.X)$ or $y(w.X)$, say. If $F_{v.X}(y) = F_{w.X}(y)$, i.e. if the law is identical, then obviously the constants are identical. Thus, for all $v.X = w.X$, we obtain the same Y-value in the v- and w-tuple: therefore $v.Y = w.Y$. So we obtain the FD-definition that, with probability 1,

$$\forall t : \forall v, w \in r^t : v.X = w.X \Rightarrow v.Y = w.Y.$$

On the other hand, if $VAR_x(Y) > 0$, then $F_{v.X}(y)$ and $F_{w.X}(y)$ do not reduce to a one-point distribution. Therefore we cannot assume that $v.Y$ and $w.Y$ are identical. Q.E.D.

5. FUNCTIONAL AND NONDETERMINISTIC DEPENDENCIES IMPLEMENTED

The HBSAG-carriers data have been collected for statistical data analysis. Therefore they were stored in the database of the statistical analysis system SAS (for references see [5]). To check the validity of the data before starting the analysis, nondeterministic and functional dependencies have been specified. However, in SAS there is no possibility of adding such semantic integrity constraints to the data structures' type definitions. So we implemented an additional method (a SAS procedure) in the SAS methodbase. By this method NDs can be specified and checked. The simplified syntax is given in Fig. 9. In a first version we implemented the checking by defining a range, assuming a shift model. Functional dependencies can be stated by specifying no range value. In this case range 0 is assumed by the method, and therefore a "distribution" with variance 0. As we have proved in section 4, an ND with a variance 0 distribution is an FD. A complete documentation of the method's syntax and semantics can be found in [12].

The stated nondeterministic and functional dependencies of the HBSAG-carriers data in CARRIER3 can be found in Fig. 10. The procedure classifies the tuple of Tweedledee's 3-year examination date as invalid because of the very low SGPT value.

PROCEDURE ND DATA = CARRIER3;
ND FD1: PATIENT DOE -> WEIGHT,
   FD2: PATIENT DOE -> SGPT,
   FD3: PATIENT DOE -> HBSAG,
   ND1: PATIENT -> WEIGHT 20,
   ND2: PATIENT -> LOGSGPT 1.85,
   ND3: PATIENT -> LOGHBSAG 1.5;

Fig. 10. Procedure call for ND.

6. DISCUSSION

Relations are database representations for objects of the real world. To limit the number of extensions of such relations, we often want to add some facts about the excerpt of reality under consideration to the data structure definitions or to the database definition. Usually in database theory, facts are specified which are either true (fulfilled) or false (violated): the so-called semantic integrity constraints. Well-known semantic integrity constraints are the functional dependencies (recall Fig. 1). But often, facts about the real world are more of a stochastic nature than of a deterministic one. In this article we have defined a class of semantic integrity constraints where not only deterministic, functional dependencies can be stated, but also nondeterministic ones. In data used for statistical analysis, such dependencies appear frequently, but cannot be expressed by FDs or other deterministic dependencies like multivalued dependencies, etc. The concept of nondeterministic dependency, which can be regarded as an extension of the concept of functional dependency, enables a user to specify more a-priori knowledge about the reality considered.

Of course NDs cannot be used to describe appropriately all relationships between any two attributes. However, they are useful for specific applications in which stochastic relationships can be observed for some well-distinct objects, but which would disappear if the data structure as a whole were under investigation. Examples of such relationships are persons and persons' data, clinics and clinics' physical examination methods, or populations and populations' properties.

Acknowledgements. The authors would like to thank M. Alle, Heidelberg, F. J. Leven, Heilbronn and W. Stucky, Karlsruhe for their comments and corrections, which led to an improvement of a previous version of this article.

REFERENCES

[1] E. F. Codd: Extending the database relational model to capture more meaning. ACM Trans. Database Syst. 4, 397-434 (1979).

[2] E. F. Codd: Relational database: a practical foundation for productivity. Comm. ACM 25, 109-117 (1982).

[3] T.-W. Ling, F. W. Tompa and T. Kameda: An improved third normal form for relational databases. ACM Trans. Database Syst. 6, 329-346 (1981).

[4] G. Schlageter and W. Stucky: Datenbanksysteme: Konzepte und Modelle. 2nd edition, Teubner, Stuttgart (1983).

[5] I. Francis: Statistical Software: A Comparative Review. North Holland, New York (1981).

[6] R. Haux: Die Verwendung komplexer Datenstrukturen in Statistischen Auswertungssystemen. Doctoral thesis, Univ. of Ulm (1983).

[7] U. Kaboth et al. (DFG study group 'viral hepatitis'): Kooperative prospektive Studie 'Klinisch gesunder HBsAg-Träger' (DFG). Verh. dt. Ges. f. innere Med. 86, 749-756 (1980).

[8] W. W. Armstrong: Dependency structures in data base relationships. In J. L. Rosenfeld (Ed.): Information Processing 74, 580-583, North Holland, Amsterdam (1974).

[9] J. Hajek and Z. Sidak: Theory of Rank Tests. Academic Press, New York (1967).

[10] T. Yasaka: Health control data-base system and subject-specific normal range. Med. Inform. 1, 105-132 (1976).

[11] K. Nakano, T. Atobe, Y. Hiraki and T. Yasaka: Estimation of subject-specific normal ranges based on some statistical models of an individual's physiological variations. Med. Inform. 6, 195-205 (1981).

[12] M. Alle and U. Eckert: SAS Procedure ND. Techn. Report, Institute of Medical Documentation, Statistics and Computer Science, University of Heidelberg (1983).