Redundancy in functional databases


0306-4379/86 $3.00 + 0.00 Pergamon Journals Ltd

REDUNDANCY IN FUNCTIONAL DATABASES

LEVENT ORMAN
Graduate School of Management, Cornell University, Ithaca, NY 14853, U.S.A.

Abstract-A set of axioms is presented to algebraically derive implied redundancies in functional databases. The axioms are proved to be sound with respect to the definitions of the operators. The functional granularity concept is introduced and shown to be crucial in derivations.

Key words: Database design, non-redundancy, implied redundancy, functional database.

1. INTRODUCTION

A functional database consists of functions defined on data sets [1, 2]. Each function is a set valued mapping from its domain to its range, and it is characterized by a function name in addition to the names of the data sets constituting its domain and its range. A data set is a named set of database objects, where real world entities, in addition to the character strings and numbers used to describe those entities, constitute the database objects.

Example 1.1: A university environment can be modelled using the following functions, where a function $f$ from $D_1, \ldots, D_n$ to $R$ is denoted by $f(D_1, \ldots, D_n) \to R$ and referred to either as $f$ or $f(D_1, \ldots, D_n)$:

COURSE#(COURSE) $\to$ COURSE#
TEXT(COURSE) $\to$ TEXT
STUDENT(COURSE) $\to$ STUDENT
NAME(INSTRUCTOR) $\to$ NAME
DEPARTMENT(INSTRUCTOR) $\to$ DEPARTMENT
COURSE(INSTRUCTOR) $\to$ COURSE
TEXT(INSTRUCTOR) $\to$ TEXT
TEXT(COURSE, INSTRUCTOR) $\to$ TEXT

A good functional database is expected to be non-redundant [3]. In other words, no function in the model should be derivable from the other functions in the model using the database operators. However, detecting all redundancies in a shared multi-purpose system is not a trivial task, especially because a given set of redundancies usually implies additional redundancies. The major objective of this article is to discover a set of inference rules to derive implied redundancies from a given collection of redundancies. A formal description of the functional model is presented in Section 2 and the three major database operators are defined. The behavior of these operators is characterized by five sets of axioms in Section 3. Implied redundancies are theorems of the algebraic system defined by the three operators and the five sets of axioms. The derivations are shown to require the existence of another algebraic system for functional granularity. The granularity concept is introduced in Section 4 and a set of axioms for deriving all granularity statements is developed. Some shortcomings of the system are discussed in Section 5.

The interest in functional databases stems from three major observations:

a. The functional model operates at a different level than the relational model. It is basically an access path model since functions assign a direction to each data relationship. Consequently, the functional model is important not only as a possible alternative to the relational model but also as a coexisting lower level model.

b. Functional design procedures contribute to the existing theory of schema design by eliminating the need for the controversial universal relation assumption and related issues such as the use of null values in the universal relation [4]. Moreover, the concept of redundancy in functional databases corresponds closely to the concept of join dependency in relational databases [5], and the axiom system developed in this article suggests a corresponding axiom system in relational systems. Similar axiomatic systems have been developed specifically for the relational systems [6].

c. The functional data model in conjunction with functional programming languages is promising in providing a superior application development environment based only on functions, and even a comprehensive theory of information where both the data and procedures are expressed in terms of functions and treated uniformly [7].

The interest in an axiomatic system follows from both theoretical and practical reasons. An axiomatic approach ensures the soundness and completeness of the functional database theory. On the practical side, the algebraic manipulation of redundancies to derive new redundancies is a most appropriate technique for human database designers to determine a complete list of redundancies. Considering that the overwhelming majority of databases are still designed manually, the importance of such algebraic tools cannot be overemphasized.

2. FUNCTIONAL DATA MODEL

Each function $f(D) \to R$ of a functional database is a set valued mapping with the "domain basis" $D$ and the "range basis" $R$ ("domain" and "range" when there is no ambiguity). The domain basis $D$ and the range basis $R$ are collections of zero, one or more data set names. Let $D = \{D_1, \ldots, D_n\}$ and $R = \{R_1, \ldots, R_m\}$ be collections of data set names with $V_1, \ldots, V_n$ and $W_1, \ldots, W_m$ as corresponding sets of values. An argument $d$ defined on $D$ is itself a single-valued function from $D$ to $V_1 \cup \cdots \cup V_n$ where $d(D_i) \in V_i$. In other words, $d$ contains a value from each data set in $D$. The function $f$ takes each argument $d$ defined on $D$ to a set of arguments $r_1, \ldots, r_k$, all defined on $R$.

Example 2.1: The function TEXT(COURSE, INSTRUCTOR) $\to$ TEXT¹ assigns a subset of TEXT to each pair $(c, i)$ where $c$ is a course and $i$ is an instructor.

For completeness, and to guarantee the correctness of the axioms under the boundary conditions, functions with null domain or range bases are allowed. A function with a null domain basis is viewed as a mapping from the database name $\phi$ to its range. Similarly, a function with a null range basis is a mapping from its domain to $\phi$, and a null function $\phi$ is an identity mapping from $\phi$ to $\phi$. All data sets can be viewed as functions with null domain bases. Functions with null range bases are useful only as intermediate results in computations.

A function $f_1$ is said to be equivalent to another function $f_2$ if and only if (iff) they are defined on the same arguments and $f_1(x) = f_2(x)$ for every argument $x$. A function $f_1$ is said to be finer (coarser) than another function $f_2$ iff they are defined on the same arguments and $f_1(x) \subseteq f_2(x)$ ($f_1(x) \supseteq f_2(x)$) for every argument $x$. A function $f_1$ is equivalent to another function $f_2$ ($f_1 = f_2$) iff $f_1$ is finer than $f_2$ ($f_1 \le f_2$) and $f_2$ is finer than $f_1$ ($f_2 \le f_1$).

Three primitive operators are defined on functions. The inversion operator $I$ is used to invert functions; the aggregation operator $\Sigma$ is used to aggregate by union; and the product operator $\cdot$ is used to combine two functions into a composite function. The operators are defined formally utilizing the notation $[\ ]$ to indicate a restricted argument. Given an argument $d$ defined on domain $D$, $d[A]$, where $A \subseteq D$, indicates the restriction of this argument to the data sets of $A$.

Given a function $f(D) \to R$ with $P = D \cup R$, the inverse of $f$ with respect to a collection of data sets $X$ is $I_X f(P - X) \to X \cap P$ where

$$I_X f(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f(r[D]) \}$$

with arguments $d$ and $r$ defined on $P$.

The aggregate of $f$ with respect to a collection of data sets $X$ is $\Sigma_X f(D - X) \to R - X$ where

$$\Sigma_X f(d[D - X]) = \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f(r[D]) \}$$

Given two functions $f_1(D_1) \to R_1$ and $f_2(D_2) \to R_2$ with $P_1 = D_1 \cup R_1$, $P_2 = D_2 \cup R_2$, $D = D_1 \cup D_2$, $R = R_1 \cup R_2$ and $P = D \cup R$, the product function is $f_1 \cdot f_2(D) \to R - D$ where

$$f_1 \cdot f_2(d[D]) = \{ r[R - D] : r[D] = d[D],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}$$

Example 2.2: Given the function TEXT(COURSE, INSTRUCTOR) $\to$ TEXT, the inverse of TEXT with respect to COURSE is $I_{\text{COURSE}}\,\text{TEXT(TEXT, INSTRUCTOR)} \to \text{COURSE}$ where

$$I_{\text{COURSE}}\,\text{TEXT}(t, i) = \{ c \in \text{COURSE} : t \in \text{TEXT}(c, i) \}$$

The aggregate of TEXT with respect to COURSE is $\Sigma_{\text{COURSE}}\,\text{TEXT(INSTRUCTOR)} \to \text{TEXT}$ where

$$\Sigma_{\text{COURSE}}\,\text{TEXT}(i) = \{ t \in \text{TEXT} : \exists c \in \text{COURSE},\ t \in \text{TEXT}(c, i) \}$$

Also, given the function COURSE(STUDENT) $\to$ COURSE, the product function is $\text{TEXT} \cdot \text{COURSE(STUDENT, COURSE, INSTRUCTOR)} \to \text{TEXT}$ where

$$\text{TEXT} \cdot \text{COURSE}(s, c, i) = \{ t \in \text{TEXT} : t \in \text{TEXT}(c, i),\ c \in \text{COURSE}(s) \}$$

The unary operators $I$ and $\Sigma$ have precedence over the binary operator $\cdot$. Parentheses will be used to change the order of application, as is common practice in algebra.

¹ The brackets and quotes around name sets are dropped whenever there is no ambiguity. Consequently, the string STUDENT may represent the data set containing students, or the name of that set, "STUDENT".
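The three operator definitions of Section 2 can be tried out on a small instance. The following Python sketch is illustrative only and is not part of the original paper: it assumes that a function $f(D) \to R$ may be stored as the set of arguments $r$ on $P = D \cup R$ that satisfy $r[R] \in f(r[D])$. The names `arg`, `restrict`, `Fn`, `inverse`, `aggregate` and `product`, as well as the sample data, are hypothetical.

```python
def arg(**values):
    """An argument: an immutable assignment of one value to each named data set."""
    return frozenset(values.items())


def restrict(r, names):
    """r[A]: the restriction of argument r to the data sets in `names`."""
    return frozenset((k, v) for k, v in r if k in names)


class Fn:
    """f(D) -> R, stored as the arguments r on P = D u R with r[R] in f(r[D]).

    This storage choice is an assumption of the sketch, not the paper's."""

    def __init__(self, dom, rng, rows):
        self.dom, self.rng = frozenset(dom), frozenset(rng)
        self.rows = frozenset(rows)

    def __call__(self, d):
        """f(d): the set of range arguments assigned to the domain argument d."""
        return {restrict(r, self.rng) for r in self.rows
                if restrict(r, self.dom) == d}


def inverse(f, x):
    """I_X f: same satisfying arguments, with domain P - X and range X n P."""
    p, x = f.dom | f.rng, frozenset(x)
    return Fn(p - x, p & x, f.rows)


def aggregate(f, x):
    """Sigma_X f: drop the data sets in X, taking the union of the results."""
    p, x = f.dom | f.rng, frozenset(x)
    return Fn(f.dom - x, f.rng - x, {restrict(r, p - x) for r in f.rows})


def product(f1, f2):
    """f1 . f2: arguments on P1 u P2 that satisfy both f1 and f2."""
    dom = f1.dom | f2.dom
    rows = set()
    for a in f1.rows:
        for b in f2.rows:
            da, db = dict(a), dict(b)
            # Combine only arguments that agree on every shared data set.
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                rows.add(a | b)
    return Fn(dom, (f1.rng | f2.rng) - dom, rows)


# Hypothetical instances of TEXT(COURSE, INSTRUCTOR) -> TEXT
# and COURSE(STUDENT) -> COURSE, mirroring Example 2.2.
TEXT = Fn({"COURSE", "INSTRUCTOR"}, {"TEXT"}, {
    arg(COURSE="db", INSTRUCTOR="ann", TEXT="ullman"),
    arg(COURSE="db", INSTRUCTOR="bob", TEXT="date"),
    arg(COURSE="os", INSTRUCTOR="ann", TEXT="tanenbaum"),
})
COURSE = Fn({"STUDENT"}, {"COURSE"}, {arg(STUDENT="sam", COURSE="db")})

courses_using = inverse(TEXT, {"COURSE"})(arg(TEXT="ullman", INSTRUCTOR="ann"))
texts_of_ann = aggregate(TEXT, {"COURSE"})(arg(INSTRUCTOR="ann"))
texts_for_sam = product(TEXT, COURSE)(
    arg(STUDENT="sam", COURSE="db", INSTRUCTOR="ann"))
```

Under this assumed representation, inversion only changes which data sets are treated as the domain, aggregation discards data sets, and the product combines arguments that agree on shared data sets.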

3. CHARACTERIZATION OF OPERATORS

The behavior of the three major database operators can be captured by five sets of rules. These rules are viewed as the axioms of an algebraic system for functional databases. They are classified as commutativity, associativity, distributivity, identity and substitution rules. These rules establish the transformations under which the value of an expression remains unchanged. Commutativity involves changing the sequence of the operands of a binary operator. Associativity and distributivity involve changing the sequence of application of operators. The former involves two applications of the same operator, and the latter involves two distinct operators. The identity rule involves the application of an operator to an expression without changing its value, and substitution involves replacing one expression with another. In the following subsections these rules are stated and their soundness proved.

3.1 Commutativity

The commutativity rule establishes that the operands of a product operator can be exchanged without changing the value of the expression.

$$f_1 \cdot f_2 = f_2 \cdot f_1$$

Proof: Commutativity follows immediately from the symmetry of the definition of product.

3.2 Associativity

The associativity rules establish that two consecutive product operators can be exchanged without changing the value of the expression. The same is also true for aggregation but not for inversion, since a second inversion operator always cancels out the effect of the first.

a. $f_1 \cdot (f_2 \cdot f_3) = (f_1 \cdot f_2) \cdot f_3$

b. $\Sigma_X \Sigma_Y f = \Sigma_{X \cup Y} f$

c. $I_X I_Y f = I_X f$

Proof: Given $f_1(D_1) \to R_1$, $f_2(D_2) \to R_2$, $f_3(D_3) \to R_3$ and $f(D) \to R$, with $D = D_1 \cup D_2 \cup D_3$ and $R = R_1 \cup R_2 \cup R_3$, and $d$ and $r$ as arguments defined on $P = D \cup R$:

a. By using the definition of product twice:

$$f_1 \cdot (f_2 \cdot f_3)(d[D]) = \{ r[R - D] : r[D] = d[D],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]),\ r[R_3] \in f_3(r[D_3]) \} = (f_1 \cdot f_2) \cdot f_3(d[D])$$

b. By using the definition of aggregation twice:

$$\Sigma_X \Sigma_Y f(d[D - X - Y]) = \{ r[R - X - Y] : r[D - X - Y] = d[D - X - Y],\ r[R] \in f(r[D]) \} = \Sigma_{X \cup Y} f(d[D - X - Y])$$

c. By using the definition of inversion twice:

$$I_X I_Y f(d[P - X]) = I_X f(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f(r[D]) \}$$

3.3 Distributivity

The distributivity rules establish the conditions under which two consecutive but distinct operators can be exchanged. The inversion operation is shown to be distributive. The aggregation operation is always distributive over inversion, but distributive over product only when the aggregated data sets are not common to both operands of the product.

a. $I_X (f_1 \cdot f_2) = I_X f_1 \cdot I_X f_2$

b. $\Sigma_X (f_1 \cdot f_2) = \Sigma_X f_1 \cdot \Sigma_X f_2$ iff $X$ is not common to $f_1$ and $f_2$.

c. $I_X \Sigma_Y f = \Sigma_Y I_X f$

To prove distributivity three lemmas are needed. Throughout this subsection $f(D) \to R$, $f_1(D_1) \to R_1$ and $f_2(D_2) \to R_2$ are given, with $D = D_1 \cup D_2$, $R = R_1 \cup R_2$, $P_1 = D_1 \cup R_1$, $P_2 = D_2 \cup R_2$ and $P = D \cup R$; and $d$, $r$ and $r'$ are arguments defined on $P$.

Lemma 3.1: An argument $r$ defined on $P$ satisfies $I_X f$ (i.e. $r[X \cap P] \in I_X f(r[P - X])$) iff $r$ satisfies $f$ itself (i.e. $r[R] \in f(r[D])$).

Proof: Using the definition of inversion, $r[X \cap P] \in I_X f(r[P - X])$

iff $r[X \cap P] \in \{ r'[X \cap P] : r'[P - X] = r[P - X],\ r'[R] \in f(r'[D]) \}$

iff $\exists r'\ r'[X \cap P] = r[X \cap P],\ r'[P - X] = r[P - X],\ r'[R] \in f(r'[D])$

iff $\exists r'\ r' = r,\ r'[R] \in f(r'[D])$

iff $r[R] \in f(r[D])$.

Lemma 3.2: An argument $r$ defined on $P$ satisfies $f_1 \cdot f_2$ (i.e. $r[R - D] \in f_1 \cdot f_2(r[D])$) iff $r$ satisfies both $f_1$ and $f_2$ (i.e. $r[R_1] \in f_1(r[D_1])$ and $r[R_2] \in f_2(r[D_2])$).

Proof: Using the definition of product, $r[R - D] \in f_1 \cdot f_2(r[D])$

iff $r[R - D] \in \{ r'[R - D] : r'[D] = r[D],\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2]) \}$

iff $\exists r'\ r'[R - D] = r[R - D],\ r'[D] = r[D],\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2])$

iff $\exists r'\ r' = r,\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2])$

iff $r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2])$.

Lemma 3.3: An argument $r$ defined on $P$ satisfies $\Sigma_X f$ (i.e. $r[R - X] \in \Sigma_X f(r[D - X])$) iff there is an argument $r'$ satisfying $f$ and matching the $P - X$ values of $r$ (i.e. $\exists r'\ r'[P - X] = r[P - X],\ r'[R] \in f(r'[D])$).

Proof: Using the definition of aggregation, $r[R - X] \in \Sigma_X f(r[D - X])$

iff $r[R - X] \in \{ r'[R - X] : r'[D - X] = r[D - X],\ r'[R] \in f(r'[D]) \}$

iff $\exists r'\ r'[R - X] = r[R - X],\ r'[D - X] = r[D - X],\ r'[R] \in f(r'[D])$

iff $\exists r'\ r'[P - X] = r[P - X],\ r'[R] \in f(r'[D])$.

a. Distributivity of inversion over product:

$$I_X (f_1 \cdot f_2) = I_X f_1 \cdot I_X f_2$$

Proof: By substituting in the definition of inversion $f_1 \cdot f_2$ for $f$, and $R - D$ for $R$:

$$I_X (f_1 \cdot f_2)(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R - D] \in f_1 \cdot f_2(r[D]) \}$$

Using Lemma 3.2,

$$I_X (f_1 \cdot f_2)(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}$$

On the other hand, by substituting in the definition of product $I_X f_1$ for $f_1$, $I_X f_2$ for $f_2$, $P_1 - X$ for $D_1$, $P_2 - X$ for $D_2$, $X \cap P_1$ for $R_1$ and $X \cap P_2$ for $R_2$, which imply the substitution of $P - X$ for $D$ and $X \cap P$ for $R$:

$$I_X f_1 \cdot I_X f_2(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[X \cap P_1] \in I_X f_1(r[P_1 - X]),\ r[X \cap P_2] \in I_X f_2(r[P_2 - X]) \}$$

Using Lemma 3.1,

$$I_X f_1 \cdot I_X f_2(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}$$

b. Distributivity of aggregation over product:

$$\Sigma_X (f_1 \cdot f_2) = \Sigma_X f_1 \cdot \Sigma_X f_2 \quad \text{iff either } f_1 \text{ or } f_2 \text{ is independent of } X$$

Proof of the if part: By substituting in the definition of aggregation $f_1 \cdot f_2$ for $f$, and $R - D$ for $R$:

$$\Sigma_X (f_1 \cdot f_2)(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ r[R - D] \in f_1 \cdot f_2(r[D]) \}$$

Using Lemma 3.2,

$$\Sigma_X (f_1 \cdot f_2)(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \} \qquad (3.1)$$

On the other hand, by substituting in the definition of product $\Sigma_X f_1$ for $f_1$, $\Sigma_X f_2$ for $f_2$, $D_1 - X$ for $D_1$, $R_1 - X$ for $R_1$, $D_2 - X$ for $D_2$, and $R_2 - X$ for $R_2$:

$$\Sigma_X f_1 \cdot \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ r[R_1 - X] \in \Sigma_X f_1(r[D_1 - X]),\ r[R_2 - X] \in \Sigma_X f_2(r[D_2 - X]) \}$$

Using Lemma 3.3,

$$\Sigma_X f_1 \cdot \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ \exists r'\ r'[P_1 - X] = r[P_1 - X],\ r'[R_1] \in f_1(r'[D_1]),\ \exists r''\ r''[P_2 - X] = r[P_2 - X],\ r''[R_2] \in f_2(r''[D_2]) \}$$

Using the fact that $f_1$ is independent of $P_2 - P_1$ and $f_2$ is independent of $P_1 - P_2$,

$$\Sigma_X f_1 \cdot \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ \exists r'\ r'[P - X] = r[P - X],\ r'[R_1] \in f_1(r'[D_1]),\ \exists r''\ r''[P - X] = r[P - X],\ r''[R_2] \in f_2(r''[D_2]) \} \qquad (3.2)$$

If $f_2$ is independent of $X$ then $D_2 \subseteq P - X$ and $R_2 \subseteq P - X$, and consequently

$$\Sigma_X f_1 \cdot \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ \exists r'\ r'[P - X] = r[P - X],\ r'[R_1] \in f_1(r'[D_1]),\ \exists r''\ r''[P - X] = r[P - X],\ r[R_2] \in f_2(r[D_2]) \}$$

Replacing all occurrences of $r$ with $r'$, since $r'[P - X] = r[P - X]$ and all occurrences of $r$ are independent of $X$,

$$\Sigma_X f_1 \cdot \Sigma_X f_2(d[D - X]) = \{ r'[R - D - X] : r'[D - X] = d[D - X],\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2]),\ \exists r''\ r''[P - X] = r'[P - X] \}$$

By renaming $r'$ as $r$ and noting that $\exists r''\ r''[P - X] = r'[P - X]$ is always true since $r'' = r'$ satisfies it,

$$\Sigma_X f_1 \cdot \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}$$

which is equation (3.1). The case where $f_1$ is independent of $X$ is symmetric and can be proved similarly.

Proof of the only if part: Assume that $X$ is common to $f_1$ and $f_2$. We will show that there are instances of $f_1$ and $f_2$ for which $\Sigma_X (f_1 \cdot f_2) \ne \Sigma_X f_1 \cdot \Sigma_X f_2$. The proof is by construction. For each argument $r$ defined on $P$, there are instances of $f_1$ and $f_2$ for which $r$ satisfies $\Sigma_X f_1 \cdot \Sigma_X f_2$ but not $\Sigma_X (f_1 \cdot f_2)$. Let $r'$ be an argument with $r'[P - X] = r[P - X]$ but $r'[X] \ne r[X]$. (This obviously requires an $X$ with at least two elements, but we only have to show the existence of an instance for which the equality does not hold.) Let $f_1$ be a function mapping $r'[D_1]$ to $r'[R_1]$ and null for all other arguments; let $f_2$ be a function mapping $r'[D_2]$ to $r'[R_2]$ and null for all other arguments. In other words, $f_1$ and $f_2$ are satisfied only by $r'$ (and obviously not by $r$). We claim that $r$ satisfies $\Sigma_X f_1 \cdot \Sigma_X f_2$ but not $\Sigma_X (f_1 \cdot f_2)$. Using equation (3.2),

$$r[R - D - X] \in \Sigma_X f_1 \cdot \Sigma_X f_2(r[D - X])$$

since $\exists r'\ r'[P - X] = r[P - X],\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2])$. On the other hand, using equation (3.1),

$$r[R - D - X] \notin \Sigma_X (f_1 \cdot f_2)(r[D - X])$$

since $f_1$ and $f_2$ are satisfied only by $r'$.

c. Distributivity of inversion over aggregation:

$$I_X \Sigma_Y f = \Sigma_Y I_X f$$

Proof: By substituting in the definition of inversion $\Sigma_Y f$ for $f$, $D - Y$ for $D$ and $R - Y$ for $R$:

$$I_X \Sigma_Y f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ r[R - Y] \in \Sigma_Y f(r[D - Y]) \}$$

Using Lemma 3.3,

$$I_X \Sigma_Y f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ \exists r'\ r'[P - Y] = r[P - Y],\ r'[R] \in f(r'[D]) \}$$

Replacing all occurrences of $r$ with $r'$, since $r'[P - Y] = r[P - Y]$ and all occurrences of $r$ are independent of $Y$,

$$I_X \Sigma_Y f(d[P - X - Y]) = \{ r'[X \cap P - Y] : r'[P - X - Y] = d[P - X - Y],\ r'[R] \in f(r'[D]) \}$$

By renaming $r'$ as $r$,

$$I_X \Sigma_Y f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ r[R] \in f(r[D]) \} \qquad (3.3)$$

On the other hand, by substituting in the definition of aggregation $I_X f$ for $f$, $P - X$ for $D$ and $X \cap P$ for $R$:

$$\Sigma_Y I_X f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ r[X \cap P] \in I_X f(r[P - X]) \}$$

Using Lemma 3.1,

$$\Sigma_Y I_X f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ r[R] \in f(r[D]) \}$$

which is equation (3.3).

3.4 Identity

The identity rules establish the conditions under which the application of an operator to an expression leaves the value of the expression unchanged. Given the functions $f_1(D_1) \to R_1$, $f_2(D_2) \to R_2$ and $f(D) \to R$ with $P_1 = D_1 \cup R_1$, $P_2 = D_2 \cup R_2$, $P = D \cup R$, $D = D_1 \cup D_2$ and $R = R_1 \cup R_2$:

a. $\Sigma_X f = f$ iff $f$ is independent of $X$, i.e. $X \cap P = \phi$ (null set).

b. $I_X f = f$ iff $X \cap P$ is the range of $f$, i.e. $X \cap P = R$.

c. $f_1 \cdot f_2 = f_1$ iff $f_2 \ge \Sigma_X I_Y f_1$ for some $X$, $Y$ such that $R_1 \subseteq X \cup Y$.

Proof:

a. From definitions,

$$f(d[D]) = \{ r[R] : r[D] = d[D],\ r[R] \in f(r[D]) \}$$

$$\Sigma_X f(d[D - X]) = \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f(r[D]) \}$$

Consequently, $\Sigma_X f = f$ iff $D - X = D$ and $R - X = R$ iff $P - X = P$ iff $P \cap X = \phi$.

b. From definitions,

$$f(d[D]) = \{ r[R] : r[D] = d[D],\ r[R] \in f(r[D]) \}$$

$$I_X f(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f(r[D]) \}$$

Consequently, $I_X f = f$ iff $D = P - X$ and $R = X \cap P$ iff $R = X \cap P$, since $D = P - R$ and because of the uniqueness of data set names.

c. From definitions,

$$f_1(d[D_1]) = \{ r[R_1] : r[D_1] = d[D_1],\ r[R_1] \in f_1(r[D_1]) \}$$

$$f_1 \cdot f_2(d[D]) = \{ r[R - D] : r[D] = d[D],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}$$

Consequently, $f_1 = f_1 \cdot f_2$ iff (1) $D_1 = D$, (2) $R_1 = R - D$, and (3) $r[R_1] \in f_1(r[D_1])$ implies $r[R_2] \in f_2(r[D_2])$.

Condition (3) holds iff $f_2 \ge \Sigma_X I_Y f_1$ for some $X$ and $Y$, with no loss of generality because of distributivity. $f_2 \ge \Sigma_X I_Y f_1$ implies that $R_2 = Y \cap P_1 - X$ and $D_2 = P_1 - Y - X$. Substituting the definitions of $R_2$ and $D_2$ in condition (1):

$D_1 = D$ iff $D_1 = D_1 \cup (P_1 - Y - X)$ iff $P_1 - Y - X \subseteq D_1$ iff $P_1 \subseteq D_1 \cup Y \cup X$ iff $R_1 \subseteq Y \cup X$, since $P_1 = D_1 \cup R_1$.

Similarly, for condition (2):

$$R_1 = R - D = (R_1 - D) \cup (R_2 - D) = (R_1 - D_2) \cup (R_2 - D_1)$$

since $R_1 \cap D_1 = R_2 \cap D_2 = \phi$. Substituting the definitions of $R_2$ and $D_2$:

$$R_1 = (R_1 - (P_1 - Y - X)) \cup (Y \cap P_1 - X - D_1) = (R_1 \cap (Y \cup X)) \cup (Y \cap P_1 - X - D_1)$$

since $R_1 \subseteq P_1$. Substituting $R_1 \cup D_1$ for $P_1$:

$$R_1 = (R_1 \cap (Y \cup X)) \cup (Y \cap R_1 - X - D_1) = R_1 \cap (Y \cup X)$$

since $Y \cap R_1 - X - D_1 \subseteq (Y \cup X) \cap R_1$. Finally, $R_1 = R_1 \cap (Y \cup X)$ iff $R_1 \subseteq Y \cup X$.

3.5 Substitution

Equivalent functions can be substituted for each other.

Proof: It follows immediately from the definition of equality.

3.6 Examples

The five sets of rules stated above can be used to test if a redundancy is implied by a given set of redundancies. Two examples will be given to demonstrate the proof procedure, given the rules as axioms and the implied redundancy as a theorem.

Example 3.1: Given the functions TEXT(COURSE, INSTRUCTOR), TEXT(COURSE), and COURSE(INSTRUCTOR), and the following redundancies:

$$\text{TEXT(COURSE, INSTRUCTOR)} = \text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)}$$

$$\Sigma_{\text{INSTRUCTOR}}\,\text{COURSE(INSTRUCTOR)} = \text{COURSE}$$

i.e. all courses have instructors, then it follows that

$$\text{TEXT(COURSE)} = \Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE, INSTRUCTOR)}$$

Proof: $\text{TEXT(COURSE, INSTRUCTOR)} = \text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)}$ is given. By using substitution,

$$\Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE, INSTRUCTOR)} = \Sigma_{\text{INSTRUCTOR}}\,(\text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)})$$

By using distributivity of aggregation over product,

$$\Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE, INSTRUCTOR)} = \Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE)} \cdot \Sigma_{\text{INSTRUCTOR}}\,\text{COURSE(INSTRUCTOR)}$$

By using the given redundancy,

$$\Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE, INSTRUCTOR)} = \Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE)} \cdot \text{COURSE}$$

By using identity under aggregation and product,

$$\Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE, INSTRUCTOR)} = \text{TEXT(COURSE)}$$
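Example 3.1 and the side condition of distributivity rule 3.3.b can be checked on a concrete instance. The Python sketch below is illustrative and not from the paper: it assumes each function is represented only by the set of arguments satisfying it, so that aggregation drops data sets and the product combines arguments that agree on shared data sets, and it compares functions by those argument sets alone. The helper names `row`, `project_away` and `join` and all sample data are hypothetical.

```python
def row(**values):
    """An argument: one value for each named data set."""
    return frozenset(values.items())


def project_away(rows, x):
    """Arguments satisfying Sigma_X f: drop the data sets in X."""
    return {frozenset((k, v) for k, v in r if k not in x) for r in rows}


def join(rows1, rows2):
    """Arguments satisfying f1 . f2: pairs that agree on shared data sets."""
    out = set()
    for a in rows1:
        for b in rows2:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(a | b)
    return out


# Hypothetical instance of TEXT(COURSE), COURSE(INSTRUCTOR) and the data set COURSE.
text_c = {row(COURSE="db", TEXT="ullman"), row(COURSE="os", TEXT="tanenbaum")}
course_i = {row(INSTRUCTOR="ann", COURSE="db"),
            row(INSTRUCTOR="bob", COURSE="os"),
            row(INSTRUCTOR="ann", COURSE="os")}
course = {row(COURSE="db"), row(COURSE="os")}

# First given redundancy: TEXT(COURSE, INSTRUCTOR) is defined as the product.
text_ci = join(text_c, course_i)
# Second given redundancy: all courses have instructors.
second_given = project_away(course_i, {"INSTRUCTOR"}) == course
# Implied redundancy of Example 3.1.
conclusion = project_away(text_ci, {"INSTRUCTOR"}) == text_c

# Rule 3.3.b: INSTRUCTOR occurs in only one operand, COURSE occurs in both.
x, y = {"INSTRUCTOR"}, {"COURSE"}
dist_not_common = (project_away(join(text_c, course_i), x)
                   == join(project_away(text_c, x), project_away(course_i, x)))
dist_common = (project_away(join(text_c, course_i), y)
               == join(project_away(text_c, y), project_away(course_i, y)))
```

On this instance the implied redundancy holds, aggregation over INSTRUCTOR distributes over the product, and aggregation over the shared data set COURSE does not: the right-hand side pairs every text with every instructor.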

Example 3.2: Given the functions TEXT(COURSE, INSTRUCTOR), TEXT(COURSE), COURSE(INSTRUCTOR), TEXT(INSTRUCTOR) and the following redundancies:

$$\text{TEXT(COURSE, INSTRUCTOR)} = \text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)} \cdot \text{TEXT(INSTRUCTOR)}$$

$$\text{TEXT(INSTRUCTOR)} = \Sigma_{\text{COURSE}}\,(\text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)})$$

then it follows that

$$\text{TEXT(COURSE, INSTRUCTOR)} = \text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)}$$

Proof: $\text{TEXT(COURSE, INSTRUCTOR)} = \text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)} \cdot \text{TEXT(INSTRUCTOR)}$ is given. By using substitution,

$$\text{TEXT(COURSE, INSTRUCTOR)} = \text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)} \cdot \Sigma_{\text{COURSE}}\,(\text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)})$$

By using identity under product,

$$\text{TEXT(COURSE, INSTRUCTOR)} = \text{TEXT(COURSE)} \cdot \text{COURSE(INSTRUCTOR)}$$

4. GRANULARITY OF FUNCTIONS

A function $f_1$ is said to be finer than another function $f_2$ ($f_1 \le f_2$) iff they are defined on the same arguments and $f_1(x) \subseteq f_2(x)$ for every argument $x$; $f_2$ is then coarser than $f_1$ ($f_2 \ge f_1$). Each statement of the form $E_1 \le E_2$, where $E_1$ and $E_2$ are expressions, is referred to as a granularity statement. A complete collection of granularity statements is necessary to derive all implied redundancies, since axiom 3.4.c requires testing if a function $f_1$ is finer than another function $f_2$. A given collection of granularity statements may imply further statements. These implied granularity statements can be derived using the redundancy axioms of Section 3 and the following four additional rules. The four granularity rules are referred to as the redundancy, substitution, maximality of data sets, and minimality of functions rules. These rules can be viewed as the axioms of an algebraic system for granularity.

4.1 Redundancy

$$f_1 = f_2 \quad \text{iff} \quad f_1 \le f_2 \text{ and } f_2 \le f_1$$

Proof: It follows immediately from the definition of equivalence in Section 2.

4.2 Substitution

$f_1 \le f_2$ implies $E_1 \le E_2$, where the expression $E_2$ is obtained by substituting $f_2$ for an occurrence of $f_1$ in expression $E_1$.

Proof: The substitution rule can be proved in three steps by showing:

a. $f_1 \le f_2$ implies $\Sigma_X f_1 \le \Sigma_X f_2$ for every $X$.

b. $f_1 \le f_2$ implies $I_X f_1 \le I_X f_2$ for every $X$.

c. $f_1 \le f_2$ implies $f_1 \cdot f_3 \le f_2 \cdot f_3$ for every $f_3$.

a. $f_1 \le f_2$ iff $D_1 = D_2 = D$, $R_1 = R_2 = R$ and $f_1(d[D]) \subseteq f_2(d[D])$ for every $d$ defined on $D$. Using these in the definition of $\Sigma_X f_1$,

$$\Sigma_X f_1(d[D_1 - X]) = \{ r[R_1 - X] : r[D_1 - X] = d[D_1 - X],\ r[R_1] \in f_1(r[D_1]) \}$$

$$= \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f_1(r[D]) \}$$

$$\subseteq \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f_2(r[D]) \} = \Sigma_X f_2(d[D_2 - X])$$

b. $f_1 \le f_2$ iff $D_1 = D_2 = D$, $R_1 = R_2 = R$, and $f_1(d[D]) \subseteq f_2(d[D])$ for every $d$ defined on $P$. Using these in the definition of $I_X f_1$,

$$I_X f_1(d[P_1 - X]) = \{ r[X \cap P_1] : r[P_1 - X] = d[P_1 - X],\ r[R_1] \in f_1(r[D_1]) \}$$

$$= \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f_1(r[D]) \}$$

$$\subseteq \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f_2(r[D]) \} = I_X f_2(d[P_2 - X])$$

c. $f_1 \le f_2$ iff $D_1 = D_2$, $R_1 = R_2$ and $f_1(d[D_1]) \subseteq f_2(d[D_2])$ for every $d$ defined on $P = P_1 \cup P_2$. Let $D = D_1 \cup D_3 = D_2 \cup D_3$ and $R = R_1 \cup R_3 = R_2 \cup R_3$. Using these in the definition of $f_1 \cdot f_3$,

$$f_1 \cdot f_3(d[D]) = \{ r[R - D] : r[D] = d[D],\ r[R_1] \in f_1(r[D_1]),\ r[R_3] \in f_3(r[D_3]) \}$$

$$\subseteq \{ r[R - D] : r[D] = d[D],\ r[R_2] \in f_2(r[D_2]),\ r[R_3] \in f_3(r[D_3]) \} = f_2 \cdot f_3(d[D])$$

4.3 Maximality of data sets

$$\Sigma_{P - X} I_X f \le X$$

i.e. all $X$-type elements participating in a function are contained in the data set $X$.

Proof: From the definitions of inversion and aggregation,

$$\Sigma_{P - X} I_X f(d[\phi]) = \{ r[X \cap P] : r[R] \in f(r[D]) \}$$

which is $\subseteq X$ if $X$ is in $P$, and $= \phi$ if $X$ is not in $P$, since $r[X] \in X$ for every $r$ defined on $P$ if $X$ is in $P$.

4.4 Minimality of functions

$$f \le \Sigma_X f \cdot \Sigma_Y f \quad \text{iff } X \text{ and } Y \text{ are disjoint}$$

i.e. every function is finer than the product of its components.

Proof: From definitions,

$$f(d[D]) = \{ r[R] : r[D] = d[D],\ r[R] \in f(r[D]) \}$$

$$\Sigma_X f(d[D - X]) = \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f(r[D]) \}$$

Using the definition of product, and defining $W = X \cap Y$,

$$\Sigma_X f \cdot \Sigma_Y f(d[D - W]) = \{ r[R - W] : r[D - W] = d[D - W],\ r[R - X] \in \Sigma_X f(r[D - X]),\ r[R - Y] \in \Sigma_Y f(r[D - Y]) \}$$

Assuming $X$ and $Y$ are disjoint and using the definition of aggregation,

$$\Sigma_X f \cdot \Sigma_Y f(d[D]) = \{ r[R] : r[D] = d[D],\ \exists r'\ r'[D - X] = r[D - X],\ r'[R] \in f(r'[D]),\ \exists r''\ r''[D - Y] = r[D - Y],\ r''[R] \in f(r''[D]) \}$$

$f(d[D]) \subseteq \Sigma_X f \cdot \Sigma_Y f(d[D])$ follows immediately by picking $r' = r'' = r$.

Assuming $X$ and $Y$ are not disjoint implies $f \not\le \Sigma_X f \cdot \Sigma_Y f$, since $P \ne (P - X) \cup (P - Y) = P - (X \cap Y)$, which implies that the functions $f$ and $\Sigma_X f \cdot \Sigma_Y f$ are not compatible, i.e. not defined on the same domain and range.

4.5 Examples

Example 4.1: Given the functions TEXT(COURSE, INSTRUCTOR), TEXT(COURSE) and TEXT(INSTRUCTOR), and the redundancies

$$\text{TEXT(COURSE)} = \Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE, INSTRUCTOR)}$$

$$\text{TEXT(INSTRUCTOR)} = \Sigma_{\text{COURSE}}\,\text{TEXT(COURSE, INSTRUCTOR)}$$

then

$$\text{TEXT(COURSE, INSTRUCTOR)} \le \text{TEXT(COURSE)} \cdot \text{TEXT(INSTRUCTOR)}$$

Using minimality of functions,

$$\text{TEXT(COURSE, INSTRUCTOR)} \le \Sigma_{\text{INSTRUCTOR}}\,\text{TEXT(COURSE, INSTRUCTOR)} \cdot \Sigma_{\text{COURSE}}\,\text{TEXT(COURSE, INSTRUCTOR)}$$

Using substitution,

$$\text{TEXT(COURSE, INSTRUCTOR)} \le \text{TEXT(COURSE)} \cdot \text{TEXT(INSTRUCTOR)}$$

Example 4.2: Given the functions TEXT(COURSE, INSTRUCTOR) and COURSE(INSTRUCTOR) and no redundancies,

$$\text{TEXT(COURSE, INSTRUCTOR)} \ge \text{TEXT(COURSE, INSTRUCTOR)} \cdot \Sigma_{\text{INSTRUCTOR}}\,\text{COURSE(INSTRUCTOR)}$$

Using maximality of data sets,

$$\Sigma_{\text{INSTRUCTOR}}\,\text{COURSE(INSTRUCTOR)} \le \text{COURSE}$$

Using identity axiom c,

$$\text{TEXT(COURSE, INSTRUCTOR)} \cdot \text{COURSE} = \text{TEXT(COURSE, INSTRUCTOR)}$$

Using substitution,

$$\text{TEXT(COURSE, INSTRUCTOR)} \cdot \Sigma_{\text{INSTRUCTOR}}\,\text{COURSE(INSTRUCTOR)} \le \text{TEXT(COURSE, INSTRUCTOR)}$$
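The granularity statement of Example 4.1 can be illustrated on a small instance. The Python sketch below is not from the paper: it assumes two functions with the same domain and range bases are compared through the sets of arguments satisfying them, so that "finer" becomes a subset test. The helper names `row`, `project_away`, `join` and `finer`, and the sample data, are hypothetical.

```python
def row(**values):
    """An argument: one value for each named data set."""
    return frozenset(values.items())


def project_away(rows, x):
    """Arguments satisfying Sigma_X f: drop the data sets in X."""
    return {frozenset((k, v) for k, v in r if k not in x) for r in rows}


def join(rows1, rows2):
    """Arguments satisfying f1 . f2: pairs that agree on shared data sets."""
    out = set()
    for a in rows1:
        for b in rows2:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(a | b)
    return out


def finer(rows1, rows2):
    """f1 <= f2, assuming both functions share the same domain and range bases."""
    return rows1 <= rows2


# Hypothetical instance of TEXT(COURSE, INSTRUCTOR): two instructors
# use the same text in different courses.
text_ci = {row(COURSE="db", INSTRUCTOR="ann", TEXT="ullman"),
           row(COURSE="os", INSTRUCTOR="bob", TEXT="ullman")}

# The two given redundancies of Example 4.1 define the component functions.
text_c = project_away(text_ci, {"INSTRUCTOR"})  # TEXT(COURSE)
text_i = project_away(text_ci, {"COURSE"})      # TEXT(INSTRUCTOR)

# Product of the components: TEXT(COURSE) . TEXT(INSTRUCTOR).
components = join(text_c, text_i)
```

Here the original function is finer than the product of its components, but not the other way round: the product also pairs ann with os and bob with db, so the two are not equivalent on this instance.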

5. SHORTCOMINGS

The algebraic approach used to derive implied redundancies is not complete, i.e. not all possible redundancies can be derived algebraically, unless non-algebraic manipulation through renaming of data sets is also allowed. Fortunately, the redundancies that are not algebraically derivable are only of theoretical interest, since they are quite complex and not likely to appear in real world databases. Nevertheless, an additional axiom involving non-algebraic manipulation through renaming of data sets will be introduced in this section for theoretical completeness. A typical example of a redundancy which is not algebraically derivable from the axioms is the following equation. Given a function $f(X, Y) \to Z$,

The inability to derive an equation of this type follows from the restriction that all data sets defining a function have to be uniquely named. Consequently, given two functions $f_1(X) \to Y$ and $f_2(X) \to Y$, it is impossible to multiply them along $Y$ and get a function $(X, X) \to Y$ where each $X$ corresponds to a different role played by $X$. This problem can be remedied by allowing renaming of data sets. If $f_1/X'$ denotes the function $f_1(X') \to Y$ obtained by renaming $X$ as $X'$, then a new axiom can be used to effectively rename attributes without changing the value of expressions: $E = E/X'$, where $E/X'$ is an expression obtained by renaming all occurrences of $X$ in $E$ as $X'$. The new axiom can be used to derive the above equation since

6. CONCLUSIONS

An algebraic approach has been presented to derive implied redundancies in functional databases. Given the three major operators and the axioms, the implied redundancies are the theorems of the algebraic system for functional databases. The advantages of the algebraic approach are its theoretical basis and its completeness in real-life situations. The commutativity, associativity, distributivity and substitution rules are familiar from elementary algebra, and the algebraic approach to deriving redundancies appears to be more appropriate for human use than the identification of multivalued and join dependencies advocated by the relational theory. A practitioners' guide to the use of the algebraic approach is in preparation.

REFERENCES

[1] L. Orman. A familial model of data for multilevel schema framework. Inform. Systems 7(4) (1982).
[2] D. Shipman. The functional data model and the data language DAPLEX. ACM Trans. Database Systems 6(1) (1981).
[3] L. Orman. Design criteria for functional databases. Inform. Systems 10(2) (1985).
[4] D. Maier, J. D. Ullman and M. Y. Vardi. On the foundations of the universal relation model. ACM Trans. Database Systems 9(2) (1984).
[5] A. V. Aho, C. Beeri and J. D. Ullman. The theory of joins in relational databases. ACM Trans. Database Systems 4(3) (1979).
[6] E. Sciore. A complete axiomatization of full join dependencies. J. ACM 29(2) (1982).
[7] L. Orman. A Familial specification language for database application systems. Comput. Lang. 8(3) (1983).
[8] C. Beeri and M. Y. Vardi. On the properties of join dependencies. In Advances in Database Theory (Edited by H. Gallaire, J. Minker and J. M. Nicolas). Plenum Press, New York (1981).
[9] J. Grant. Constraint preserving and lossless database transformations. Inform. Systems 9(2) (1984).
[10] D. Maier, A. O. Mendelzon and Y. Sagiv. Testing implications of data dependencies. ACM Trans. Database Systems 4(3) (1979).
[11] M. Y. Vardi. Inferring multivalued dependencies from functional and join dependencies. Acta Informatica 19(4) (1983).
[12] D. Vermeir and G. M. Nijssen. A procedure to define the object type structure of a conceptual schema. Inform. Systems 7(4) (1982).