0306-4379/86 $3.00 + 0.00 Pergamon Journals Ltd
REDUNDANCY IN FUNCTIONAL DATABASES
LEVENT ORMAN Graduate School of Management, Cornell University, Ithaca, NY 14853, U.S.A.
Abstract-A set of axioms is presented to algebraically derive implied redundancies in functional databases. The axioms are proved to be sound with respect to the definitions of the operators. The concept of functional granularity is introduced and shown to be crucial in derivations.
Keywords: Database design, non-redundancy, implied redundancy, functional

1. INTRODUCTION

A functional database consists of functions defined on data sets [1, 2]. Each function is a set valued mapping from its domain to its range and it is characterized by a function name in addition to the names of the data sets constituting its domain and its range. A data set is a named set of database objects, where real world entities in addition to the character strings and numbers used to describe those entities constitute the database objects.

Example 1.1: A university environment can be modelled using the following functions, where a function f from D1, ..., Dn to R is denoted by f(D1, ..., Dn) → R and referred to either as f or f(D1, ..., Dn):

COURSE#(COURSE) → COURSE#
TEXT(COURSE) → TEXT
STUDENT(COURSE) → STUDENT
NAME(INSTRUCTOR) → NAME
DEPARTMENT(INSTRUCTOR) → DEPARTMENT
COURSE(INSTRUCTOR) → COURSE
TEXT(INSTRUCTOR) → TEXT
TEXT(COURSE, INSTRUCTOR) → TEXT

A good functional database is expected to be non-redundant [3]. In other words, no function in the model should be derivable from the other functions in the model using the database operators. However, detecting all redundancies in a shared multi-purpose system is not a trivial task, especially because a given set of redundancies usually implies additional redundancies. The major objective of this article is to discover a set of inference rules to derive implied redundancies from a given collection of redundancies. A formal description of the functional model is presented in Section 2 and the three major database operators are defined. The behavior of these operators is characterized by five sets of axioms in Section 3. Implied redundancies are theorems of the algebraic system defined by the three operators and the five sets of axioms. The derivations are shown to require the existence of another algebraic system for functional granularity. The granularity concept is introduced in Section 4 and a set of axioms for deriving all granularity statements is developed. Some shortcomings of the system are discussed in Section 5.

The interest in functional databases stems from three major observations:

a. The functional model operates at a different level than the relational model. It is basically an access path model since functions assign a direction to each data relationship. Consequently, the functional model is important not only as a possible alternative to the relational model but also as a coexisting lower level model.

b. Functional design procedures contribute to the existing theory of schema design by eliminating the need for the controversial universal relation assumption and related issues such as the use of null values in the universal relation [4]. Moreover, the concept of redundancy in functional databases corresponds closely to the concept of join dependency in relational databases [5], and the axiom system developed in this article suggests a corresponding axiom system in relational systems. Similar axiomatic systems have been developed specifically for the relational systems [6].

c. The functional data model in conjunction with functional programming languages is promising in providing a superior application development environment based only on functions, and even a comprehensive theory of information where both the data and procedures are expressed in terms of functions and treated uniformly [7].

The interest in an axiomatic system follows from both theoretical and practical reasons. An axiomatic approach ensures the soundness and completeness of the functional database theory. On the practical side, the algebraic manipulation of redundancies to derive new redundancies is a most appropriate technique for human database designers to determine a complete list of redundancies. Considering that the overwhelming majority of databases are still designed manually, the importance of such algebraic tools cannot be overemphasized.
2. FUNCTIONAL DATA MODEL

Each function f(D) → R of a functional database is a set valued mapping with the "domain basis" D and the "range basis" R ("domain" and "range" when there is no ambiguity). The domain basis D and the range basis R are collections of zero, one or more data set names. Let $D = \{D_1, \ldots, D_n\}$ and $R = \{R_1, \ldots, R_m\}$ be collections of data set names with $V_1, \ldots, V_n$ and $W_1, \ldots, W_m$ as corresponding sets of values. An argument d defined on D is itself a single-valued function from D to $V_1 \cup \cdots \cup V_n$ where $d(D_i) \in V_i$. In other words, d contains a value from each data set in D. The function f takes each argument d defined on D to a set of arguments $r_1, \ldots, r_k$, all defined on R.

Example 2.1: The function TEXT(COURSE, INSTRUCTOR) → TEXT¹ assigns a subset of TEXT to each pair (c, i) where c is a course and i is an instructor.

For completeness and to guarantee the correctness of the axioms under the boundary conditions, functions with null domain or range bases are allowed. A function with a null domain basis is viewed as a mapping from the database name $\delta$ to its range. Similarly, a function with a null range basis is a mapping from its domain to $\delta$, and a null function $\delta$ is an identity mapping from $\delta$ to $\delta$. All data sets can be viewed as functions with null domain bases. Functions with null range bases are useful only as intermediate results in computations.

A function $f_1$ is said to be equivalent to another function $f_2$ if and only if (iff) they are defined on the same arguments and $f_1(x) = f_2(x)$ for every argument x. A function $f_1$ is said to be finer (coarser) than another function $f_2$ iff they are defined on the same arguments and $f_1(x) \subseteq f_2(x)$ ($f_1(x) \supseteq f_2(x)$) for every argument x. A function $f_1$ is equivalent to another function $f_2$ ($f_1 = f_2$) iff $f_1$ is finer than $f_2$ ($f_1 \le f_2$) and $f_2$ is finer than $f_1$ ($f_2 \le f_1$).

Three primitive operators are defined on functions. The inversion operator I is used to invert functions; the aggregation operator $\Sigma$ is used to aggregate by union; and the product operator . is used to combine two functions into a composite function. The operators are defined formally utilizing the notation [ ] to indicate a restricted argument. Given an argument d defined on domain D, d[A] where $A \subseteq D$ indicates the restriction of this argument to the data sets of A.

Given a function f(D) → R with $P = D \cup R$, the inverse of f with respect to a collection of data sets X is $I_X f(P - X) \to X \cap P$ where

$$I_X f(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f(r[D]) \}$$

with arguments d and r defined on P.

The aggregate of f with respect to a collection of data sets X is $\Sigma_X f(D - X) \to R - X$ where

$$\Sigma_X f(d[D - X]) = \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f(r[D]) \}$$

Given two functions $f_1(D_1) \to R_1$ and $f_2(D_2) \to R_2$ with $P_1 = D_1 \cup R_1$, $P_2 = D_2 \cup R_2$, $D = D_1 \cup D_2$, $R = R_1 \cup R_2$ and $P = D \cup R$, the product function is $f_1 . f_2(D) \to R$ where

$$f_1 . f_2(d[D]) = \{ r[R - D] : r[D] = d[D],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}$$

Example 2.2: Given the function TEXT(COURSE, INSTRUCTOR) → TEXT, the inverse of TEXT with respect to COURSE is

$$I_{\mathrm{COURSE}}\,\mathrm{TEXT(TEXT, INSTRUCTOR)} \to \mathrm{COURSE}$$

where

$$I_{\mathrm{COURSE}}\,\mathrm{TEXT}(t, i) = \{ c \in \mathrm{COURSE} : t \in \mathrm{TEXT}(c, i) \}.$$

The aggregate of TEXT with respect to COURSE is

$$\Sigma_{\mathrm{COURSE}}\,\mathrm{TEXT}(i) = \{ t \in \mathrm{TEXT} : \exists c \in \mathrm{COURSE},\ t \in \mathrm{TEXT}(c, i) \}.$$

Also given the function COURSE(STUDENT) → COURSE, the product function is TEXT.COURSE(STUDENT, COURSE, INSTRUCTOR) → TEXT where

$$\mathrm{TEXT.COURSE}(s, c, i) = \{ t \in \mathrm{TEXT} : t \in \mathrm{TEXT}(c, i),\ c \in \mathrm{COURSE}(s) \}.$$

The unary operators I and $\Sigma$ have precedence over the binary operator '.'. Parentheses will be used to change the order of application as is common practice in algebra.

¹ The brackets and quotes around name sets are dropped whenever there is no ambiguity. Consequently, the string STUDENT may represent the data set containing students, or the name of that set "STUDENT".
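The three operator definitions can be tried out on small finite instances. The following is a minimal sketch, not the author's implementation: it assumes a function f(D) → R can be stored as the finite set of arguments r on P = D ∪ R that satisfy it (so arguments mapped to the empty set are not represented). The names `Fn`, `fn`, `restrict`, `inverse`, `aggregate`, `product` and `image`, and all the sample values, are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fn:
    """A function f(D) -> R stored as the arguments r on P = D ∪ R with r[R] ∈ f(r[D])."""
    dom: frozenset   # domain basis D (data set names)
    rng: frozenset   # range basis R (data set names)
    rows: frozenset  # each row is a frozenset of (data set name, value) pairs

def fn(dom, rng, rows):
    """Build an Fn from plain Python sets and dicts."""
    return Fn(frozenset(dom), frozenset(rng),
              frozenset(frozenset(r.items()) for r in rows))

def restrict(row, names):
    """r[A]: restriction of an argument to the data sets in `names`."""
    return frozenset((k, v) for k, v in row if k in names)

def inverse(f, X):
    """I_X f: domain P - X, range X ∩ P, satisfied by the same arguments."""
    P, X = f.dom | f.rng, frozenset(X)
    return Fn(P - X, P & X, f.rows)

def aggregate(f, X):
    """Σ_X f: domain D - X, range R - X, arguments projected onto P - X."""
    X = frozenset(X)
    keep = (f.dom | f.rng) - X
    return Fn(f.dom - X, f.rng - X, frozenset(restrict(r, keep) for r in f.rows))

def product(f1, f2):
    """f1.f2: domain D1 ∪ D2, range (R1 ∪ R2) - D, arguments satisfying both."""
    D = f1.dom | f2.dom
    rows = set()
    for a in f1.rows:
        for b in f2.rows:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                rows.add(frozenset({**da, **db}.items()))
    return Fn(D, (f1.rng | f2.rng) - D, frozenset(rows))

def image(f, **d):
    """f(d): the range tuples assigned to argument d (range names in sorted order)."""
    key, names = frozenset(d.items()), sorted(f.rng)
    return {tuple(dict(r)[n] for n in names)
            for r in f.rows if restrict(r, f.dom) == key}

# Hypothetical instances of TEXT(COURSE, INSTRUCTOR) -> TEXT and COURSE(STUDENT) -> COURSE.
TEXT = fn({"COURSE", "INSTRUCTOR"}, {"TEXT"}, [
    {"COURSE": "c1", "INSTRUCTOR": "i1", "TEXT": "t1"},
    {"COURSE": "c1", "INSTRUCTOR": "i1", "TEXT": "t2"},
    {"COURSE": "c2", "INSTRUCTOR": "i1", "TEXT": "t3"},
    {"COURSE": "c2", "INSTRUCTOR": "i2", "TEXT": "t3"},
])
COURSE = fn({"STUDENT"}, {"COURSE"}, [
    {"STUDENT": "s1", "COURSE": "c1"},
    {"STUDENT": "s2", "COURSE": "c2"},
])

inv = inverse(TEXT, {"COURSE"})     # I_COURSE TEXT(TEXT, INSTRUCTOR) -> COURSE
agg = aggregate(TEXT, {"COURSE"})   # Σ_COURSE TEXT(INSTRUCTOR) -> TEXT
prod = product(TEXT, COURSE)        # TEXT.COURSE(STUDENT, COURSE, INSTRUCTOR) -> TEXT
```

Under this representation inversion only relabels which data sets form the domain and the range, aggregation is a projection, and the product is a join on shared data set names, which mirrors the three set-builder definitions of this section.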
3. CHARACTERIZATION OF OPERATORS

The behavior of the three major database operators can be captured by five sets of rules. These rules are viewed as the axioms of an algebraic system for functional databases. They are classified as commutativity, associativity, distributivity, identity and substitution rules. These rules establish the transformations under which the value of an expression remains unchanged. Commutativity involves changing the sequence of the operands of a binary operator. Associativity and distributivity involve changing the sequence of application of operators. The former involves two applications of the same operator, and the latter involves two distinct operators. The identity rule involves the application of an operator to an expression without changing its value, and substitution involves replacing one expression with another. In the following subsections these rules are stated and their soundness proved.

3.1 Commutativity

The commutativity rule establishes that the operands of a product operator can be exchanged without changing the value of the expression.

$$f_1 . f_2 = f_2 . f_1$$

Proof: Commutativity follows immediately from the symmetry of the definition of product.

3.2 Associativity

The associativity rules establish that two consecutive product operators can be exchanged without changing the value of the expression. The same is also true for aggregation but not for inversion, since a second inversion operator always cancels out the effect of the first.

a. $(f_1 . f_2) . f_3 = f_1 . (f_2 . f_3)$

b. $\Sigma_X \Sigma_Y f = \Sigma_Y \Sigma_X f$

c. $I_X I_Y f = I_X f$

Proof: Given $f_1(D_1) \to R_1$, $f_2(D_2) \to R_2$, $f_3(D_3) \to R_3$ and $f(D) \to R$, with $D = D_1 \cup D_2 \cup D_3$ and $R = R_1 \cup R_2 \cup R_3$, and d and r as arguments defined on $P = D \cup R$:

a. By using the definition of product twice:

$$f_1 . (f_2 . f_3)(d[D]) = \{ r[R - D] : r[D] = d[D],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]),\ r[R_3] \in f_3(r[D_3]) \} = (f_1 . f_2) . f_3(d[D])$$

b. By using the definition of aggregation twice:

$$\Sigma_X \Sigma_Y f(d[D - X - Y]) = \{ r[R - X - Y] : r[D - X - Y] = d[D - X - Y],\ r[R] \in f(r[D]) \} = \Sigma_Y \Sigma_X f(d[D - X - Y])$$

c. By using the definition of inversion twice:

$$I_X I_Y f(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f(r[D]) \} = I_X f(d[P - X])$$

3.3 Distributivity

The distributivity rules establish the conditions under which two consecutive but distinct operators can be exchanged. The inversion operation is shown to be distributive over product. The aggregation operation is always distributive over inversion, but distributive over product only when the aggregated data sets are not common to both operands of the product.

a. $I_X(f_1 . f_2) = I_X f_1 . I_X f_2$

b. $\Sigma_X(f_1 . f_2) = \Sigma_X f_1 . \Sigma_X f_2$ iff X is not common to $f_1$ and $f_2$.

c. $I_X \Sigma_Y f = \Sigma_Y I_X f$

To prove distributivity three lemmas are needed. Throughout this subsection $f(D) \to R$, $f_1(D_1) \to R_1$, $f_2(D_2) \to R_2$ are given, and $D = D_1 \cup D_2$, $R = R_1 \cup R_2$, $P_1 = D_1 \cup R_1$, $P_2 = D_2 \cup R_2$ and $P = D \cup R$; and d, r and r' are arguments defined on P.

Lemma 3.1: An argument r defined on P satisfies $I_X f$ (i.e. $r[X \cap P] \in I_X f(r[P - X])$) iff r satisfies f itself (i.e. $r[R] \in f(r[D])$).

Proof: Using the definition of inversion, $r[X \cap P] \in I_X f(r[P - X])$

iff $r[X \cap P] \in \{ r'[X \cap P] : r'[P - X] = r[P - X],\ r'[R] \in f(r'[D]) \}$

iff $\exists r'\ r'[X \cap P] = r[X \cap P],\ r'[P - X] = r[P - X],\ r'[R] \in f(r'[D])$

iff $\exists r'\ r' = r,\ r'[R] \in f(r'[D])$

iff $r[R] \in f(r[D])$.

Lemma 3.2: An argument r defined on P satisfies $f_1 . f_2$ (i.e. $r[R - D] \in f_1 . f_2(r[D])$) iff r satisfies both $f_1$ and $f_2$ (i.e. $r[R_1] \in f_1(r[D_1])$ and $r[R_2] \in f_2(r[D_2])$).

Proof: Using the definition of product, $r[R - D] \in f_1 . f_2(r[D])$

iff $r[R - D] \in \{ r'[R - D] : r'[D] = r[D],\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2]) \}$

iff $\exists r'\ r'[R - D] = r[R - D],\ r'[D] = r[D],\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2])$

iff $\exists r'\ r' = r,\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2])$

iff $r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2])$.

Lemma 3.3: An argument r defined on P satisfies $\Sigma_X f$ (i.e. $r[R - X] \in \Sigma_X f(r[D - X])$) iff there is an argument r' satisfying f and matching the P − X values of r (i.e. $\exists r'\ r'[P - X] = r[P - X],\ r'[R] \in f(r'[D])$).

Proof: Using the definition of aggregation, $r[R - X] \in \Sigma_X f(r[D - X])$

iff $r[R - X] \in \{ r'[R - X] : r'[D - X] = r[D - X],\ r'[R] \in f(r'[D]) \}$

iff $\exists r'\ r'[R - X] = r[R - X],\ r'[D - X] = r[D - X],\ r'[R] \in f(r'[D])$

iff $\exists r'\ r'[P - X] = r[P - X],\ r'[R] \in f(r'[D])$.

a. Distributivity of inversion over product:

$$I_X(f_1 . f_2) = I_X f_1 . I_X f_2$$

Proof: By substituting in the definition of inversion, $f_1 . f_2$ for f, and R − D for R:

$$I_X(f_1 . f_2)(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R - D] \in f_1 . f_2(r[D]) \}.$$

Using Lemma 3.2,

$$I_X(f_1 . f_2)(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}.$$

On the other hand, by substituting in the definition of product, $I_X f_1$ for $f_1$, $I_X f_2$ for $f_2$, $P_1 - X$ for $D_1$, $P_2 - X$ for $D_2$, $X \cap P_1$ for $R_1$ and $X \cap P_2$ for $R_2$, which imply the substitution of P − X for D and $X \cap P$ for R:

$$I_X f_1 . I_X f_2(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[X \cap P_1] \in I_X f_1(r[P_1 - X]),\ r[X \cap P_2] \in I_X f_2(r[P_2 - X]) \}.$$

Using Lemma 3.1,

$$I_X f_1 . I_X f_2(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}.$$

b. Distributivity of aggregation over product:

$\Sigma_X(f_1 . f_2) = \Sigma_X f_1 . \Sigma_X f_2$ iff either $f_1$ or $f_2$ is independent of X.

Proof of the if part: By substituting in the definition of aggregation, $f_1 . f_2$ for f, and R − D for R:

$$\Sigma_X(f_1 . f_2)(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ r[R - D] \in f_1 . f_2(r[D]) \}.$$

Using Lemma 3.2,

$$\Sigma_X(f_1 . f_2)(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}. \qquad (3.1)$$

On the other hand, by substituting in the definition of product, $\Sigma_X f_1$ for $f_1$, $\Sigma_X f_2$ for $f_2$, $D_1 - X$ for $D_1$, $R_1 - X$ for $R_1$, $D_2 - X$ for $D_2$, and $R_2 - X$ for $R_2$:

$$\Sigma_X f_1 . \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ r[R_1 - X] \in \Sigma_X f_1(r[D_1 - X]),\ r[R_2 - X] \in \Sigma_X f_2(r[D_2 - X]) \}.$$

Using Lemma 3.3,

$$\Sigma_X f_1 . \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ \exists r'\ r'[P_1 - X] = r[P_1 - X],\ r'[R_1] \in f_1(r'[D_1]),\ \exists r''\ r''[P_2 - X] = r[P_2 - X],\ r''[R_2] \in f_2(r''[D_2]) \}.$$

Using the fact that $f_1$ is independent of $P_2 - P_1$ and $f_2$ is independent of $P_1 - P_2$,

$$\Sigma_X f_1 . \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ \exists r'\ r'[P - X] = r[P - X],\ r'[R_1] \in f_1(r'[D_1]),\ \exists r''\ r''[P - X] = r[P - X],\ r''[R_2] \in f_2(r''[D_2]) \}. \qquad (3.2)$$

If $f_2$ is independent of X then $D_2 \subseteq P - X$ and $R_2 \subseteq P - X$, and consequently

$$\Sigma_X f_1 . \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ \exists r'\ r'[P - X] = r[P - X],\ r'[R_1] \in f_1(r'[D_1]),\ r[R_2] \in f_2(r[D_2]) \}.$$

Replacing all occurrences of r with r', since $r'[P - X] = r[P - X]$ and all occurrences of r are independent of X,

$$\Sigma_X f_1 . \Sigma_X f_2(d[D - X]) = \{ r'[R - D - X] : r'[D - X] = d[D - X],\ r'[R_1] \in f_1(r'[D_1]),\ r'[R_2] \in f_2(r'[D_2]),\ \exists r''\ r''[P - X] = r'[P - X] \}.$$

By renaming r' as r and noting that $\exists r''\ r''[P - X] = r'[P - X]$ is always true since r'' = r' satisfies it,

$$\Sigma_X f_1 . \Sigma_X f_2(d[D - X]) = \{ r[R - D - X] : r[D - X] = d[D - X],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \},$$

which is the right-hand side of equation (3.1). The case where $f_1$ is independent of X is symmetric and can be proved similarly.

Proof of the only if part: Assume that X is common to $f_1$ and $f_2$. We will show that there are instances of $f_1$ and $f_2$ for which $\Sigma_X(f_1 . f_2) \ne \Sigma_X f_1 . \Sigma_X f_2$. Proof is by construction. For each argument r defined on P, there are instances of $f_1$ and $f_2$ for which r satisfies $\Sigma_X f_1 . \Sigma_X f_2$ but not $\Sigma_X(f_1 . f_2)$. Let r' be an argument with $r'[P - X] = r[P - X]$ but $r'[X] \ne r[X]$ on a data set of X that is common to $P_1$ and $P_2$. (This obviously requires an X with at least two elements, but we only have to show the existence of an instance for which the equality does not hold.) Let $f_1$ be a function mapping $r'[D_1]$ to $r'[R_1]$ and null for all other arguments; let $f_2$ be a function mapping $r[D_2]$ to $r[R_2]$ and null for all other arguments. In other words, $f_1$ is satisfied only by arguments that agree with r' on $P_1$, and $f_2$ only by arguments that agree with r on $P_2$; since r and r' differ on a common data set of X, no argument satisfies both. We claim that r satisfies $\Sigma_X f_1 . \Sigma_X f_2$ but not $\Sigma_X(f_1 . f_2)$. Using equation (3.2),

$$r[R - D - X] \in \Sigma_X f_1 . \Sigma_X f_2(r[D - X])$$

since $r'[P - X] = r[P - X]$ with $r'[R_1] \in f_1(r'[D_1])$, and r'' = r gives $r''[P - X] = r[P - X]$ with $r''[R_2] \in f_2(r''[D_2])$. On the other hand, using equation (3.1),

$$r[R - D - X] \notin \Sigma_X(f_1 . f_2)(r[D - X])$$

since no argument satisfies both $f_1$ and $f_2$.

c. Distributivity of inversion over aggregation:

$$I_X \Sigma_Y f = \Sigma_Y I_X f$$

Proof: By substituting in the definition of inversion, $\Sigma_Y f$ for f, D − Y for D and R − Y for R:

$$I_X \Sigma_Y f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ r[R - Y] \in \Sigma_Y f(r[D - Y]) \}.$$

Using Lemma 3.3,

$$I_X \Sigma_Y f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ \exists r'\ r'[P - Y] = r[P - Y],\ r'[R] \in f(r'[D]) \}.$$

Replacing all occurrences of r with r', since $r'[P - Y] = r[P - Y]$ and all occurrences of r are independent of Y,

$$I_X \Sigma_Y f(d[P - X - Y]) = \{ r'[X \cap P - Y] : r'[P - X - Y] = d[P - X - Y],\ r'[R] \in f(r'[D]) \}.$$

By renaming r' as r,

$$I_X \Sigma_Y f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ r[R] \in f(r[D]) \}. \qquad (3.3)$$

On the other hand, by substituting in the definition of aggregation, $I_X f$ for f, P − X for D and $X \cap P$ for R:

$$\Sigma_Y I_X f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ r[X \cap P] \in I_X f(r[P - X]) \}.$$

Using Lemma 3.1,

$$\Sigma_Y I_X f(d[P - X - Y]) = \{ r[X \cap P - Y] : r[P - X - Y] = d[P - X - Y],\ r[R] \in f(r[D]) \},$$

which is the right-hand side of equation (3.3).
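The commutativity, associativity and distributivity rules can be spot-checked on a small instance. This is a minimal sketch under the assumption that a function is stored as the finite set of arguments on P that satisfy it; the class `Fn`, the helper names and the sample data are hypothetical, and a check on one instance illustrates the rules without proving them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fn:
    dom: frozenset   # domain basis D
    rng: frozenset   # range basis R
    rows: frozenset  # arguments on P = D ∪ R that satisfy the function

def fn(dom, rng, rows):
    return Fn(frozenset(dom), frozenset(rng),
              frozenset(frozenset(r.items()) for r in rows))

def inverse(f, X):                       # I_X f
    P, X = f.dom | f.rng, frozenset(X)
    return Fn(P - X, P & X, f.rows)

def aggregate(f, X):                     # Σ_X f
    X = frozenset(X)
    return Fn(f.dom - X, f.rng - X,
              frozenset(frozenset(p for p in r if p[0] not in X) for r in f.rows))

def product(f1, f2):                     # f1.f2
    D = f1.dom | f2.dom
    rows = set()
    for a in f1.rows:
        for b in f2.rows:
            merged = dict(a)
            if all(merged.get(k, v) == v for k, v in b):  # agree on shared names
                merged.update(b)
                rows.add(frozenset(merged.items()))
    return Fn(D, (f1.rng | f2.rng) - D, frozenset(rows))

# Hypothetical instances of TEXT(COURSE), COURSE(INSTRUCTOR) and NAME(INSTRUCTOR).
f1 = fn({"COURSE"}, {"TEXT"}, [{"COURSE": "c1", "TEXT": "t1"},
                               {"COURSE": "c2", "TEXT": "t2"}])
f2 = fn({"INSTRUCTOR"}, {"COURSE"}, [{"INSTRUCTOR": "i1", "COURSE": "c1"},
                                     {"INSTRUCTOR": "i2", "COURSE": "c2"}])
f3 = fn({"INSTRUCTOR"}, {"NAME"}, [{"INSTRUCTOR": "i1", "NAME": "n1"},
                                   {"INSTRUCTOR": "i2", "NAME": "n2"}])
g = product(f1, f2)
C, I = {"COURSE"}, {"INSTRUCTOR"}

checks = {
    "3.1 commutativity": product(f1, f2) == product(f2, f1),
    "3.2a product": product(product(f1, f2), f3) == product(f1, product(f2, f3)),
    "3.2b aggregation": aggregate(aggregate(g, C), I) == aggregate(aggregate(g, I), C),
    "3.2c inversion": inverse(inverse(g, I), C) == inverse(g, C),
    "3.3a": inverse(g, C) == product(inverse(f1, C), inverse(f2, C)),
    # f1 is independent of INSTRUCTOR, so aggregation distributes over the product.
    "3.3b independent": aggregate(g, I) == product(aggregate(f1, I), aggregate(f2, I)),
    # COURSE is common to f1 and f2, so distribution is expected to fail here.
    "3.3b common": aggregate(g, C) != product(aggregate(f1, C), aggregate(f2, C)),
    "3.3c": inverse(aggregate(g, I), C) == aggregate(inverse(g, C), I),
}
```

In the `"3.3b common"` case the left side keeps only the (instructor, text) pairs linked through a shared course, while the right side pairs every instructor with every text, which is the kind of instance the only if part of the proof constructs.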
3.4 Identity

The identity rules establish the conditions under which the application of an operator to an expression leaves the value of the expression unchanged.

Given the functions $f_1(D_1) \to R_1$, $f_2(D_2) \to R_2$ and $f(D) \to R$ with $P_1 = D_1 \cup R_1$, $P_2 = D_2 \cup R_2$, $P = D \cup R$, $D = D_1 \cup D_2$ and $R = R_1 \cup R_2$:

a. $\Sigma_X f = f$ iff f is independent of X, i.e. $X \cap P = \emptyset$ (null set).

b. $I_X f = f$ iff $X \cap P$ is the range of f, i.e. $X \cap P = R$.

c. $f_1 . f_2 = f_1$ iff $f_2 \ge \Sigma_X I_Y f_1$ for some X, Y such that $R_1 \subseteq X \cup Y$.

Proof:

a. From definitions,

$$f(d[D]) = \{ r[R] : r[D] = d[D],\ r[R] \in f(r[D]) \}$$

$$\Sigma_X f(d[D - X]) = \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f(r[D]) \}.$$

Consequently, $\Sigma_X f = f$ iff D − X = D and R − X = R iff P − X = P iff $P \cap X = \emptyset$.

b. From definitions,

$$f(d[D]) = \{ r[R] : r[D] = d[D],\ r[R] \in f(r[D]) \}$$

$$I_X f(d[P - X]) = \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f(r[D]) \}.$$

Consequently, $I_X f = f$ iff D = P − X and $R = X \cap P$ iff $R = X \cap P$, since D = P − R from uniqueness of data set names.

c. From definitions,

$$f_1(d[D_1]) = \{ r[R_1] : r[D_1] = d[D_1],\ r[R_1] \in f_1(r[D_1]) \}$$

$$f_1 . f_2(d[D]) = \{ r[R - D] : r[D] = d[D],\ r[R_1] \in f_1(r[D_1]),\ r[R_2] \in f_2(r[D_2]) \}.$$

Consequently, $f_1 = f_1 . f_2$ iff (1) $D_1 = D$, (2) $R_1 = R - D$, and (3) $r[R_1] \in f_1(r[D_1])$ implies $r[R_2] \in f_2(r[D_2])$.

$r[R_1] \in f_1(r[D_1])$ implies $r[R_2] \in f_2(r[D_2])$ iff $f_2 \ge \Sigma_X I_Y f_1$ for some X and Y, with no loss of generality because of distributivity. $f_2 \ge \Sigma_X I_Y f_1$ implies that $R_2 = Y \cap P_1 - X$ and $D_2 = P_1 - Y - X$. Substituting $R_2$ and $D_2$ in condition (1):

$D_1 = D$ iff $D_1 = D_1 \cup (P_1 - Y - X)$ iff $P_1 - Y - X \subseteq D_1$ iff $P_1 \subseteq D_1 \cup Y \cup X$ iff $R_1 \subseteq Y \cup X$, since $P_1 = D_1 \cup R_1$.

Similarly, for condition (2):

$R_1 = R - D = (R_1 - D) \cup (R_2 - D) = (R_1 - D_2) \cup (R_2 - D_1)$, since $R_1 \cap D_1 = R_2 \cap D_2 = \emptyset$.

Substituting the definitions of $R_2$ and $D_2$:

$R_1 = (R_1 - (P_1 - Y - X)) \cup (Y \cap P_1 - X - D_1) = (R_1 \cap (Y \cup X)) \cup (Y \cap P_1 - X - D_1)$, since $R_1 \subseteq P_1$.

Substituting $R_1 \cup D_1$ for $P_1$:

$R_1 = (R_1 \cap (Y \cup X)) \cup (Y \cap R_1 - X - D_1) = R_1 \cap (Y \cup X)$, since $Y \cap R_1 - X - D_1 \subseteq (Y \cup X) \cap R_1$.

Finally, $R_1 = R_1 \cap (Y \cup X)$ iff $R_1 \subseteq Y \cup X$.

3.5 Substitution

Equivalent functions can be substituted for each other.

Proof: It follows immediately from the definition of equality.
3.6 Examples

The five sets of rules stated above can be used to test if a redundancy is implied by a given set of redundancies. Two examples will be given to demonstrate the proof procedure, given the rules as axioms and the implied redundancy as a theorem.

Example 3.1: Given the functions TEXT(COURSE, INSTRUCTOR), TEXT(COURSE), and COURSE(INSTRUCTOR), and the following redundancies:

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} = \mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)}$$

$$\Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{COURSE(INSTRUCTOR)} = \mathrm{COURSE}$$

i.e. all courses have instructors, then it follows that

$$\mathrm{TEXT(COURSE)} = \Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)}$$

Proof: $\mathrm{TEXT(COURSE, INSTRUCTOR)} = \mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)}$ is given.

By using substitution,

$$\Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)} = \Sigma_{\mathrm{INSTRUCTOR}}\,(\mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)})$$

By using distributivity of aggregation over product,

$$\Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)} = \Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE)} . \Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{COURSE(INSTRUCTOR)}$$

By using the given redundancy,

$$\Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)} = \Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE)} . \mathrm{COURSE}$$

By using identity under aggregation and product,

$$\Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)} = \mathrm{TEXT(COURSE)}$$

Example 3.2: Given the functions TEXT(COURSE, INSTRUCTOR), TEXT(COURSE), COURSE(INSTRUCTOR), TEXT(INSTRUCTOR) and the following redundancies:

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} = \mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)} . \mathrm{TEXT(INSTRUCTOR)}$$

$$\mathrm{TEXT(INSTRUCTOR)} = \Sigma_{\mathrm{COURSE}}\,(\mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)})$$

then it follows that

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} = \mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)}$$

Proof: $\mathrm{TEXT(COURSE, INSTRUCTOR)} = \mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)} . \mathrm{TEXT(INSTRUCTOR)}$ is given.

By using substitution,

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} = \mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)} . \Sigma_{\mathrm{COURSE}}\,(\mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)})$$

By using identity under product,

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} = \mathrm{TEXT(COURSE)} . \mathrm{COURSE(INSTRUCTOR)}$$

4. GRANULARITY OF FUNCTIONS
A function $f_1$ is said to be finer than another function $f_2$ ($f_1 \le f_2$) iff they are defined on the same arguments and $f_1(x) \subseteq f_2(x)$ for every argument x, and coarser than $f_2$ ($f_1 \ge f_2$) iff $f_2$ is finer than $f_1$. Each statement of the form $E_1 \le E_2$, where $E_1$ and $E_2$ are expressions, is referred to as a granularity statement. A complete collection of granularity statements is necessary to derive all implied redundancies, since axiom 3.4.c requires testing if a function $f_1$ is finer than another function $f_2$. A given collection of granularity statements may imply further statements. These implied granularity statements can be derived using the redundancy axioms of Section 3 and the following four additional rules. The four granularity rules are referred to as redundancy, substitution, maximality of data sets, and minimality of functions. These rules can be viewed as the axioms of an algebraic system for granularity.

4.1 Redundancy

$f_1 = f_2$ iff $f_1 \le f_2$ and $f_2 \le f_1$.

Proof: It follows immediately from the definition of equivalence.

4.2 Substitution

$f_1 \le f_2$ implies $E_1 \le E_2$, where the expression $E_2$ is obtained by substituting $f_2$ for an occurrence of $f_1$ in expression $E_1$.

Proof: The substitution rule can be proved in three steps by showing:

a. $f_1 \le f_2$ implies $\Sigma_X f_1 \le \Sigma_X f_2$ for every X.

b. $f_1 \le f_2$ implies $I_X f_1 \le I_X f_2$ for every X.

c. $f_1 \le f_2$ implies $f_1 . f_3 \le f_2 . f_3$ for every $f_3$.

a. $f_1 \le f_2$ iff $D_1 = D_2 = D$, $R_1 = R_2 = R$ and $f_1(d[D]) \subseteq f_2(d[D])$ for every d defined on D. Using these in the definition of $\Sigma_X f_1$,

$$\Sigma_X f_1(d[D_1 - X]) = \{ r[R_1 - X] : r[D_1 - X] = d[D_1 - X],\ r[R_1] \in f_1(r[D_1]) \}$$

$$= \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f_1(r[D]) \}$$

$$\subseteq \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f_2(r[D]) \} = \Sigma_X f_2(d[D_2 - X]).$$

b. $f_1 \le f_2$ iff $D_1 = D_2 = D$, $R_1 = R_2 = R$ and $f_1(d[D]) \subseteq f_2(d[D])$ for every d defined on P. Using these in the definition of $I_X f_1$,

$$I_X f_1(d[P_1 - X]) = \{ r[X \cap P_1] : r[P_1 - X] = d[P_1 - X],\ r[R_1] \in f_1(r[D_1]) \}$$

$$= \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f_1(r[D]) \}$$

$$\subseteq \{ r[X \cap P] : r[P - X] = d[P - X],\ r[R] \in f_2(r[D]) \} = I_X f_2(d[P_2 - X]).$$

c. $f_1 \le f_2$ iff $D_1 = D_2$, $R_1 = R_2$ and $f_1(d[D_1]) \subseteq f_2(d[D_2])$ for every d defined on $P = P_1 \cup P_2$. Let $D = D_1 \cup D_3 = D_2 \cup D_3$ and $R = R_1 \cup R_3$. Using these in the definition of $f_1 . f_3$,

$$f_1 . f_3(d[D]) = \{ r[R - D] : r[D] = d[D],\ r[R_1] \in f_1(r[D_1]),\ r[R_3] \in f_3(r[D_3]) \}$$

$$\subseteq \{ r[R - D] : r[D] = d[D],\ r[R_2] \in f_2(r[D_2]),\ r[R_3] \in f_3(r[D_3]) \} = f_2 . f_3(d[D]).$$

4.3 Maximality of data sets

$$I_X \Sigma_{P - X} f \le X$$

i.e. all X-type elements participating in a function are contained in the data set X.

Proof: From the definitions of inversion and aggregation,

$$I_X \Sigma_{P - X} f = \{ r[X \cap P] : r[R] \in f(r[D]) \}$$

which is contained in X if X is in P, and is $\emptyset$ if X is not in P, since $r[X] \in X$ for every r defined on P if X is in P.

4.4 Minimality of functions

$f \le \Sigma_X f . \Sigma_Y f$ iff X and Y are disjoint, i.e. every function is finer than the product of its components.

Proof: From definitions,

$$f(d[D]) = \{ r[R] : r[D] = d[D],\ r[R] \in f(r[D]) \}$$

$$\Sigma_X f(d[D - X]) = \{ r[R - X] : r[D - X] = d[D - X],\ r[R] \in f(r[D]) \}$$

and, defining $W = X \cap Y$,

$$\Sigma_X f . \Sigma_Y f(d[D - W]) = \{ r[R - W] : r[D - W] = d[D - W],\ r[R - X] \in \Sigma_X f(r[D - X]),\ r[R - Y] \in \Sigma_Y f(r[D - Y]) \}.$$

Assuming X and Y are disjoint and using the definition of aggregation,

$$\Sigma_X f . \Sigma_Y f(d[D]) = \{ r[R] : r[D] = d[D],\ \exists r'\ r'[P - X] = r[P - X],\ r'[R] \in f(r'[D]),\ \exists r''\ r''[P - Y] = r[P - Y],\ r''[R] \in f(r''[D]) \}.$$

$f(d[D]) \subseteq \Sigma_X f . \Sigma_Y f(d[D])$ follows immediately by picking r' = r'' = r.

Assuming X and Y are not disjoint implies $f \not\le \Sigma_X f . \Sigma_Y f$, since $P \ne (P - X) \cup (P - Y) = P - (X \cap Y)$, which implies that the functions f and $\Sigma_X f . \Sigma_Y f$ are not compatible, i.e. not defined on the same domain and range.
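The maximality and minimality rules, and step a of the substitution rule, can be observed on a small instance. This is a sketch under the same finite-instance assumption used for the operators (a function is the set of arguments on P that satisfy it); `Fn`, `finer` and the sample values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fn:
    dom: frozenset   # domain basis D
    rng: frozenset   # range basis R
    rows: frozenset  # arguments on P = D ∪ R that satisfy the function

def fn(dom, rng, rows):
    return Fn(frozenset(dom), frozenset(rng),
              frozenset(frozenset(r.items()) for r in rows))

def inverse(f, X):                       # I_X f
    P, X = f.dom | f.rng, frozenset(X)
    return Fn(P - X, P & X, f.rows)

def aggregate(f, X):                     # Σ_X f
    X = frozenset(X)
    return Fn(f.dom - X, f.rng - X,
              frozenset(frozenset(p for p in r if p[0] not in X) for r in f.rows))

def product(f1, f2):                     # f1.f2
    D = f1.dom | f2.dom
    rows = set()
    for a in f1.rows:
        for b in f2.rows:
            merged = dict(a)
            if all(merged.get(k, v) == v for k, v in b):
                merged.update(b)
                rows.add(frozenset(merged.items()))
    return Fn(D, (f1.rng | f2.rng) - D, frozenset(rows))

def finer(f, g):
    """f ≤ g: same domain and range bases, and f(x) ⊆ g(x) for every argument."""
    return f.dom == g.dom and f.rng == g.rng and f.rows <= g.rows

# Hypothetical TEXT(COURSE, INSTRUCTOR) -> TEXT and COURSE data set.
TEXT = fn({"COURSE", "INSTRUCTOR"}, {"TEXT"}, [
    {"COURSE": "c1", "INSTRUCTOR": "i1", "TEXT": "t1"},
    {"COURSE": "c2", "INSTRUCTOR": "i2", "TEXT": "t1"},
])
COURSE_SET = fn(set(), {"COURSE"}, [{"COURSE": c} for c in ("c1", "c2", "c3")])

# 4.4 with disjoint X = {INSTRUCTOR}, Y = {COURSE}: f ≤ Σ_X f . Σ_Y f.
components = product(aggregate(TEXT, {"INSTRUCTOR"}), aggregate(TEXT, {"COURSE"}))
# 4.4 with X = Y = {INSTRUCTOR}: the product is not defined on the same bases as f.
overlapping = product(aggregate(TEXT, {"INSTRUCTOR"}), aggregate(TEXT, {"INSTRUCTOR"}))

# 4.3 with X = {COURSE}: I_X Σ_{P-X} f collects the courses participating in TEXT.
P = TEXT.dom | TEXT.rng
participating = inverse(aggregate(TEXT, P - {"COURSE"}), {"COURSE"})

# 4.2a: TEXT ≤ components should carry over to their aggregates.
step_a = finer(aggregate(TEXT, {"COURSE"}), aggregate(components, {"COURSE"}))
```

Here `components` contains four arguments while `TEXT` contains two, so the function is strictly finer than the product of its components: the extra arguments pair each course with the other course's instructor because both share the text t1.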
4.5 Examples

Example 4.1: Given the functions TEXT(COURSE, INSTRUCTOR), TEXT(COURSE) and TEXT(INSTRUCTOR) and the redundancies

$$\mathrm{TEXT(COURSE)} = \Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)}$$

$$\mathrm{TEXT(INSTRUCTOR)} = \Sigma_{\mathrm{COURSE}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)};$$

then

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} \le \mathrm{TEXT(COURSE)} . \mathrm{TEXT(INSTRUCTOR)}.$$

Using minimality of functions,

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} \le \Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)} . \Sigma_{\mathrm{COURSE}}\,\mathrm{TEXT(COURSE, INSTRUCTOR)}.$$

Using substitution,

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} \le \mathrm{TEXT(COURSE)} . \mathrm{TEXT(INSTRUCTOR)}.$$

Example 4.2: Given the functions TEXT(COURSE, INSTRUCTOR) and COURSE(INSTRUCTOR) and no redundancies;

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} \ge \mathrm{TEXT(COURSE, INSTRUCTOR)} . \Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{COURSE(INSTRUCTOR)}.$$

Using maximality of data sets,

$$\Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{COURSE(INSTRUCTOR)} \le \mathrm{COURSE}.$$

Using identity axiom c,

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} . \mathrm{COURSE} = \mathrm{TEXT(COURSE, INSTRUCTOR)}.$$

Using substitution,

$$\mathrm{TEXT(COURSE, INSTRUCTOR)} . \Sigma_{\mathrm{INSTRUCTOR}}\,\mathrm{COURSE(INSTRUCTOR)} \le \mathrm{TEXT(COURSE, INSTRUCTOR)}.$$
5. SHORTCOMINGS

The algebraic approach used to derive implied redundancies is not complete, i.e. not all possible redundancies can be derived algebraically, unless non-algebraic manipulation through renaming of data sets is also allowed. Fortunately, the redundancies that are not algebraically derivable are only of theoretical interest since they are quite complex and not likely to appear in real world databases. Nevertheless, an additional axiom involving non-algebraic manipulation through renaming of data sets will be introduced in this section for theoretical completeness. A typical example of a redundancy which is not algebraically derivable from the axioms is the following equation. Given a function $f(X, Y) \to Z$,

The inability to derive an equation of this type follows from the restriction that all data sets defining a function have to be uniquely named. Consequently, given two functions $f_1(X) \to Y$ and $f_2(X) \to Y$, it is impossible to multiply them along Y and get a function $f(X, X) \to Y$ where each X corresponds to a different role played by X. This problem can be remedied by allowing renaming of data sets. If $f_1/X'$ denotes the function $f_1(X') \to Y$ obtained by renaming X as X', then a new axiom can be used to effectively rename attributes without changing the value of expressions:

$$E = E/X'$$

where $E/X'$ is an expression obtained by renaming all occurrences of X in E as X'. The new axiom can be used to derive the above equation since
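The effect of unique data set names on the product, and the way renaming works around it, can be sketched on a finite instance. The representation (a function as the set of arguments that satisfy it), the helper `rename` and the sample values are assumptions made for illustration only; `rename` plays the role of the notation f/X'.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fn:
    dom: frozenset   # domain basis D
    rng: frozenset   # range basis R
    rows: frozenset  # arguments on P = D ∪ R that satisfy the function

def fn(dom, rng, rows):
    return Fn(frozenset(dom), frozenset(rng),
              frozenset(frozenset(r.items()) for r in rows))

def product(f1, f2):                     # f1.f2
    D = f1.dom | f2.dom
    rows = set()
    for a in f1.rows:
        for b in f2.rows:
            merged = dict(a)
            if all(merged.get(k, v) == v for k, v in b):
                merged.update(b)
                rows.add(frozenset(merged.items()))
    return Fn(D, (f1.rng | f2.rng) - D, frozenset(rows))

def rename(f, old, new):
    """f/new: the same function with data set name `old` replaced by `new`."""
    swap = lambda name: new if name == old else name
    return Fn(frozenset(map(swap, f.dom)), frozenset(map(swap, f.rng)),
              frozenset(frozenset((swap(k), v) for k, v in r) for r in f.rows))

# Hypothetical f1(X) -> Y and f2(X) -> Y.
f1 = fn({"X"}, {"Y"}, [{"X": "x1", "Y": "y1"}, {"X": "x2", "Y": "y1"}])
f2 = fn({"X"}, {"Y"}, [{"X": "x1", "Y": "y1"}, {"X": "x3", "Y": "y1"}])

same_names = product(f1, f2)                    # X is forced to play a single role
two_roles = product(f1, rename(f2, "X", "X'"))  # X and X' play separate roles
```

With identical names the product keeps only the arguments on which both functions agree, so its domain is still {X}. After renaming, the product is defined on {X, X'} and relates every X value of `f1` to every X' value of `f2` that shares a Y value.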
6. CONCLUSIONS

An algebraic approach has been presented to derive implied redundancies in functional databases. Given the three major operators and the axioms, the implied redundancies are the theorems of the algebraic system for functional databases. The advantages of the algebraic approach are its theoretical basis and its completeness in real life situations. The commutativity, associativity, distributivity and substitution rules are familiar from elementary algebra, and the algebraic approach to deriving redundancies appears to be more appropriate for human use than the identification of multivalued and join dependencies advocated by the relational theory. A practitioners' guide to the use of the algebraic approach is in preparation.
REFERENCES

[1] L. Orman. A familial model of data for multilevel schema framework. Inform. Systems 7(4) (1982).
[2] D. Shipman. The functional data model and the data language DAPLEX. ACM Trans. Database Systems 6(1) (1981).
[3] L. Orman. Design criteria for functional databases. Inform. Systems 10(2) (1985).
[4] D. Maier, J. D. Ullman and M. Y. Vardi. On the foundations of the universal relation model. ACM Trans. Database Systems 9(2) (1984).
[5] A. V. Aho, C. Beeri and J. D. Ullman. The theory of joins in relational databases. ACM Trans. Database Systems 4(3) (1979).
[6] E. Sciore. A complete axiomatization of full join dependencies. J. ACM 29(2) (1982).
[7] L. Orman. A familial specification language for database application systems. Comput. Lang. 8(3) (1983).
[8] C. Beeri and M. Y. Vardi. On the properties of join dependencies. In Advances in Database Theory (Edited by H. Gallaire, J. Minker and J. M. Nicolas). Plenum Press, New York (1981).
[9] J. Grant. Constraint preserving and lossless database transformations. Inform. Systems 9(2) (1984).
[10] D. Maier, A. O. Mendelzon and Y. Sagiv. Testing implications of data dependencies. ACM Trans. Database Systems 4(3) (1979).
[11] M. Y. Vardi. Inferring multivalued dependencies from functional and join dependencies. Acta Informatica 19(4) (1983).
[12] D. Vermeir and G. M. Nijssen. A procedure to define the object type structure of a conceptual schema. Inform. Systems 7(4) (1982).