Computers & Structures Vol. 21, No. 5, pp. 1047-1057. Printed in Great Britain. Pergamon Press Ltd.

ON DESIGNING A DATABASE MANAGEMENT SYSTEM FOR A COMPUTER-AIDED ENGINEERING SOFTWARE SYSTEM

S. D. RAJAN
Arizona State University, Tempe, AZ, U.S.A.

M. A. BHATTI
University of Iowa, Iowa City, IA, U.S.A.

(Received 22 September 1983; in revised form 6 March 1984)
Abstract-The renaissance of engineering design can be attributed to the phenomenal growth in the microprocessor industry. The basic difference between the new and the traditional approach is one of outlook. Complex design situations involve many disciplines. However, this has made the task of integrating and transferring design information among these disciplines very difficult. Database management systems (DBMS) provide attractive solutions to these problems. A centralized control of data ensures that redundancy can be reduced, inconsistency can be avoided, data can be shared, standards can be enforced, and conflicting requirements can be handled. This paper takes the reader through the steps required to configure an engineering DBMS. The different data models are explained and their usage in CAD/CAM settings is illustrated. The importance of data independence, modularity, efficiency, and portability is emphasized.

1. INTRODUCTION
Engineering software has developed far beyond the "computational activity" that characterized programs of the 1970s. Computers are used extensively in the engineering field from the time product development is initiated right through the final steps of the design and manufacturing processes. Each step is usually connected to a specialized program that implements the procedural steps. Input to a certain program results in some output which in turn may be input to another (or several) programs; only a fraction of this data is global in nature (Fig. 1). To bring order to such a situation, database management systems (DBMS) are indispensable today. There are quite a few sophisticated DBMS being used for engineering applications. Three representative examples are the POLO-FINITE, TORNADO, and IPAD systems. They illustrate the current state of the art and the trend in engineering DBMS. The POLO-FINITE system [1] was developed with structural analysis in mind. The POLO DBMS has three parts: the data definition language, the host language interface, and run-time support. In order to use the DBMS, the application programmer defines the data space and codes the algorithm in the host language. The host language has its own grammar, and by interpreting the grammar the DBMS is able to call the appropriate FORTRAN routines for carrying out the "number crunching." The trend is towards more sophisticated DBMS that tend to integrate the components of the design process. The TORNADO DBMS [2] is one such
system. It has been designed for technical information systems and is suitable for handling complex network data structures. The data description language is easy to use, simplifying application programming. The DBMS is being used in the Scandinavian computer aided design (CAD) project GPM (geometric product models). The IPAD [3] integration prototype is an executive that controls several subsystems operating directly under it. The subsystems interact with various components such as an automated design drafting system; a high-speed network; the RIM data management system; IPAD geometry canonical forms; IPAD graphics; finite-element modeling; the Atlas and SPAR structural analysis systems; and query of the above data within the data manager. Several aspects of database management are illustrated by looking at this list of components. The high-speed network facilitates transfer of information between different computing systems; it implies the existence of a distributed database system. It also suggests that several "patch" subsystems exist that transfer data between local (subsystem) and global (IPAD) databases. The POLO system is an example of a highly application-oriented system. The advantage of such a system is that the end user can code his requirements quite easily. Programs need not be written in a high-level language and linked to the DBMS. However, the end user is restricted by the capabilities of the host language. The evolution of a design language written by the system developer in a high-level language is no trivial task, especially in a language like FORTRAN. On the other hand, the TORNADO system is a package of FORTRAN routines that can be called by the application program.
Fig. 1. Modular interaction in an engineering software system.
It has powerful capabilities for an information-based system and is suitable for multidisciplinary usage of a global database. Conceptually, the system is adequate for building components around the DBMS. However, there are several requirements that the system fails to meet at the current level of development. The IPAD system is still in a developmental stage. It is an example of a large database where the DBMS could be spending an appreciable amount of time transferring data between the local and global databases, among other bookkeeping chores. This paper is divided into four parts. The first part addresses the requirements of an engineering system and outlines the procedure to evolve database structure and schemes around context-free database management systems [4]. Section 2 describes the components of such a system. Section 3 deals with database implementation. The final part shows the importance of linking CAD (design information) to CAM (manufacturing), CAP (process control), and drafting. The paper is a sequel to the ideas presented by the authors in Ref. [5] and summarizes the experience gained during the development of the SADDLE [6] system.

2. EVOLUTION OF AN APPLICATION PROGRAM
We will attempt to describe the needs of an engineering software system with a series of examples. The needs basically deal with manipulating information. We will call such a manipulation a transaction against the database. The transactions can be categorized as (i) creating a piece of information; (ii) inserting a piece of information; (iii) deleting a piece of information; (iv) updating a piece of information; and (v) reading a piece of information. The process of writing the information to the database is implicit in transactions (i), (ii), and (iv).
Scenario 1
Let us look at the end user's interaction with an interactive finite element program. To enter the nodal coordinates and boundary conditions into the database, the program may accept commands of the form: POINT node number, x, y, z, boundary conditions. This is an example of category (i). The program may also have a command DELEP node number, illustrating category (iii). If the POINT command is reissued for the same node, then quite clearly the nodal information is being updated [categories (iv) and (v)]. Let us assume that there is no restriction on the sequence of nodal information [i.e. information on node k may be entered before node (k - 1)]. In this case, category (ii) transactions could occur quite frequently. This scenario leads to the first question: "How should the information be stored to facilitate the steps involved in carrying out a transaction?"
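To make the five transaction categories concrete, the following minimal sketch (in present-day Python, with all names invented for illustration; the paper's system was FORTRAN-based) stores the nodal entity set as a table keyed by node number, so that out-of-sequence insertion, update, and deletion are all simple keyed transactions:

# Illustrative sketch only: a node table keyed by node number,
# supporting the five transaction categories of Sec. 2.
class NodeTable:
    def __init__(self):
        self._nodes = {}          # node number -> attribute record

    def point(self, node_no, x, y, z, bc):
        """POINT command: creates the entity if new (categories i/ii),
        otherwise updates it (category iv)."""
        self._nodes[node_no] = {"coords": (x, y, z), "bc": bc}

    def delete(self, node_no):
        """DELEP-style command: category (iii)."""
        self._nodes.pop(node_no, None)

    def read(self, node_no):
        """Category (v): read a piece of information."""
        return self._nodes.get(node_no)

table = NodeTable()
table.point(2, 0.0, 1.0, 0.0, "fixed")   # node 2 entered before node 1
table.point(1, 0.0, 0.0, 0.0, "free")
table.point(2, 0.0, 1.5, 0.0, "fixed")   # reissue: an update, not an insert
table.delete(1)

Because the store is keyed rather than sequential, entering node k before node k - 1 costs nothing, which is one possible answer to the storage question above.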
Scenario 2
Let us consider the possibility of a graphics program that uses the information created by the finite element program. To draw the stress levels for a particular load case, the first step of the algorithm may proceed as

begin
  for element := 1 to no_of_elements do
  begin
    find connectivity information;
    find nodal coordinates;
    update viewing box dimensions;
    find components of stress tensor;
    compute failure level (stress as a fraction of yield stress);
    update element color information based on the stress level menu;
  end;
end.

Note the chaining of information beginning with the root, an element. The element points to the nodes that comprise it and hence to the nodal coordinates. It also points to the stress tensor that describes its stress state and to the material property that defines the yield stress value. Information is not only being read but is also being modified simultaneously. This leads to the second question: "How should the information be organized to minimize access and update times?"
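The chaining can be mimicked with keyed lookups. The sketch below (hypothetical Python; the names and data are invented, and only the access pattern matters) follows one element from its connectivity to the nodal coordinates, the stress tensor, and the yield stress:

# Illustrative chained reads for scenario 2 (assumed names and data).
elements  = {1: {"nodes": (1, 2), "stress_no": 1, "mat_no": 1}}
nodes     = {1: (0.0, 0.0), 2: (1.0, 0.0)}
stresses  = {1: (120.0, 35.0, 10.0)}            # stress tensor components
materials = {1: {"yield_stress": 250.0}}

def failure_level(elem_no):
    elem = elements[elem_no]                     # root of the chain
    coords = [nodes[n] for n in elem["nodes"]]   # element -> nodes -> coords
    sigma = max(stresses[elem["stress_no"]])     # element -> stress tensor
    yield_stress = materials[elem["mat_no"]]["yield_stress"]
    return sigma / yield_stress                  # fraction of yield stress

print(failure_level(1))                          # drives the element color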
Scenario 3
Let us assume now that a drafting program is used to create production drawings using the finite element data. Figure 2 shows a finite element model of an automobile rear suspension torque arm. Membrane elements have been used for the purpose of analysis and design. There are two problems with this model. First, the finite element program treats the connecting rod as a two-dimensional problem. Information along the thickness is ignored. Second, with the thickness changing between different regions, no attempt is made to maintain continuity of the slopes at the intersection of the different regions. The drafting program has to pick up representative points from the finite element data and pass curves of an appropriate nature to simulate the desired effect. Consequently, additional data is most likely to be created around the existing data. This leads to the third question: "If the same data exists in different forms, then how can this redundancy be minimized? Also, how can the integrity of the information be maintained?"

Fig. 2. Finite element model of an automobile torque arm.

Scenario 4
Let us assume the data is being used in an engineering company comprising various departments: design, drafting, and manufacturing. Let us assume that two persons, one in the design office and the other in the drafting office, want to look at the project data at the same time. Is it possible? What if one of them wants to make changes to the data? Who will be affected by the changes? Can anyone make these changes? What if the system crashes while the changes are taking place? How will the DBMS recover from the crash? These questions are very realistic, and provisions for including such capabilities should be accounted for in the initial DBMS design.

It should be noted at the outset that in any engineering environment: (1) software may be developed locally; (2) the engineering system is used by both technical and nontechnical staff; (3) the system is also used by people of widely varying experience and background; and (4) the needs change with time and therefore software must be updated constantly to reflect the changing needs. Under these circumstances, the steps needed to configure a database management system can be listed as: (a) Identify all the different pieces of information that describe the total system. In DBMS terminology, an entity is a quantity that exists and is distinguishable [7]. A group of similar entities forms an entity set. The different entity sets form the different pieces of information describing the database. In scenario 1, each node is an entity and all nodes that describe the structure form the nodal entity set. (b) For every entity set, identify the attributes, which associate a value from a domain of values for that attribute with each entity in the set [7]. In scenario 1, each node has attributes like nodal coordinates and boundary conditions that describe the properties of the node. (c) Go through a normalization process in which the entity sets and the attributes may have to be refined and redefined in order to: (i) minimize redundant information; (ii) avoid update, insertion, and deletion anomalies; and (iii) minimize the access and update times. (d) Design a language that would enable translation of the conceptual database into physical storage (and vice versa). (e) Implement the DBMS in a high-level language keeping in mind efficiency, portability, and data independence. We shall examine each of the five steps in the following sections.

3. DATABASE STRUCTURE

Most of the concepts prevalent in engineering DBMS have been borrowed from the business community. However, there are remarkable differences. In the case of engineering computing, programs operate on large local databases (the global database is comparatively small) and have large main storage requirements [4]. Engineering computations center around two data types, namely tables and matrices [5]. Hence it would be logical to develop the data structure and schemas to handle matrices and tables within the same DBMS.

3.1 Schemas and query language
The first step is to identify all the entity sets and their attributes. In a finite element program, the set of all elements is an example of an entity set (ELEMENTS). Each element is then an entity. If a piece of data describes the entity, then it is an attribute of the entity. The type of element (truss, beam) and the connectivity information are attributes of the entity element. An attribute or a set of attributes whose value uniquely identifies each entity in an entity set is called a key for that entity set. The element numbers are the keys to the entity sets in the above
examples. Having identified the entity sets, the next step is to define the relationships between the entity sets so as to form a conceptual layout of the data. Note that a conceptual format does not (and should not) concern the physical layout (the manner in which the information is stored in memory and/or on an external device). The overall conceptual database description is referred to as the schema [8]. There are instances of very complex relationships between entity sets. In engineering applications, three common types of relationships are one-to-one, many-to-one, and many-to-many. In finite-element analysis, an example of a one-to-one relationship is the correspondence between loading cases and nodal deflections. An example of many-to-one is the relationship between the entity sets ELEMENTS and MATERIAL_GROUPS (many elements may have the same material properties but no element can belong to more than one material group). The relationship between NODES and ELEMENTS is an example of many-to-many because a node may belong to several elements and an element may address several nodes. The reason why these concepts are necessary is that queries are frequently made against the database to get particular information. In order to be able to express a query, each DBMS has a grammar for its query language: given an entity set E, its attributes A, and the attribute values V, how can the desired quantities be expressed [8]? In a design environment, queries are usually of the form: (1) A(E) = ?, "What is the value of attribute A for entity E?" (What is the cost per unit volume of material M?); (2) A(?) = V, "Which entity has a value V of attribute A?" (Which nodes have deflections greater than or equal to D units?); (3) A(E) = ? for all entities E, "What are the values of attribute A for all entities?" (What are the colors of all elements?). There are other forms that are used less frequently (if at all), such as ?(E) = ? or ?(?) = V. Combined queries are also possible, e.g. find the elements in a design group that have stress values greater than one-half the yield stress (optimization). This example illustrates the fact that the types of queries and the subsequent computations greatly influence the conceptual scheme. Figures 3 and 4 illustrate two entity-relationship schemes. In Fig. 3, the entity set DESIGN_GROUP points to the entity set ELEMENTS. The entity set ELEMENTS further points to NODES and STRESSES to get the necessary information to complete the query. In Fig. 4, the design group information is stored in the entity set ELEMENTS as an attribute. The advantage is that the query addresses a single entity set. However, in order to answer the query, the logic must
Fig. 3. Use of primary and secondary keys.
locate the elements in the requested design group. Then, is the first scheme superior? The answer is yes, based on the nature of the query. However, if the query is posed differently, as: find the elements that have stress values greater than one-half the yield stress and create a new entity set STRESS_SUM which stores the sum of the stress values of all elements in a group, then clearly the second scheme is superior. In order to refine the conceptual layout, we will categorize the different forms of relations between entity sets. This will enable the DBMS designer to implement step (c) (normalization) mentioned at the end of Sec. 2.
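The trade-off between the two schemes can be sketched as follows (invented Python names; not the paper's implementation). Scheme 1 keeps DESIGN_GROUP as the entry point, while scheme 2 stores the group as an attribute of ELEMENTS and must scan the entity set for membership:

# Illustrative contrast of the two entity-relationship schemes.
# Scheme 1: the group points to its elements (fast group-rooted queries).
design_groups = {"G1": [1, 2], "G2": [3]}

# Scheme 2: the group is an attribute of ELEMENTS (one entity set, but
# group membership must be found by scanning).
elements = {1: {"group": "G1", "stress": 140.0},
            2: {"group": "G1", "stress": 90.0},
            3: {"group": "G2", "stress": 60.0}}

def overstressed_scheme1(group, limit):
    return [e for e in design_groups[group] if elements[e]["stress"] > limit]

def overstressed_scheme2(group, limit):
    # locate the group's elements by scanning the whole entity set
    return [e for e, rec in elements.items()
            if rec["group"] == group and rec["stress"] > limit]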
3.2 Data models
A data model provides an interpretation of how data are related. There are three widely used models: hierarchical, network, and relational. The salient features of each model will be illustrated using an example, the formation of the global stiffness matrix. The algorithm is as follows:

begin
  for element := 1 to no_of_elements do
  begin
    (1) obtain the connectivity information;
    (2) obtain the nodal coordinates of the nodes forming the element;
    (3) obtain the material properties;
    (4) obtain the element properties;
    (5) form the local stiffness matrix;
    (6) form the local-to-global transformation matrix;
    (7) form the global stiffness matrix as the product of the matrices in steps (5) and (6);
    (8) impose the appropriate boundary conditions;
  end;
end.
Fig. 4. Modified relationship scheme of Fig. 3.
Fig. 5. Hierarchical model for the structural database.
A hierarchical data model for the structural database in question is shown in Fig. 5. The data model is like a tree where the functional arcs always point towards the leaves and away from the root. Each leaf is referred to as a "node". In order to avoid confusion between a data model node and a finite element NODE, the finite element related entities will always be written in capital letters. With the data model shown in Fig. 5, steps (2)-(4) of the above algorithm involve going down from the node ELEMENTS to the subordinate nodes NODES, MATERIAL_PROPERTIES, and ELEMENT_PROPERTIES. The rest of the steps involve computations based on the previously accessed values, with the boundary condition information used in step (8). The hierarchical data model has two important constraints. The first is that no node (in the data model sense) farther away from the root may be linked to more than one node closer to the root. With reference to Fig. 5, this implies that an ELEMENT can have several NODES but a NODE may be connected to only one ELEMENT! Because of this functional restriction, it is not possible to represent many-to-many relationship types without building unwanted redundancy into the model. The second constraint is that the data be structured according to a tree. The implication is that one-to-many and many-to-many relationships must now be represented by unconnected hierarchical trees as shown in Fig. 6, where the entity NODES in the first tree is described only by a pointer that points to the occurrence of the NODE in the second tree, thus avoiding storage of the information twice.
Fig. 6. Unconnected hierarchical trees for the structural database.
The data manipulation language that describes the rules specifying the action to be taken during a particular transaction becomes very complex. For example, does deletion of an ELEMENT imply deletion of all MATERIAL_PROPERTIES information? In order to overcome some of these difficulties, the network data model can be used instead, with all relations restricted to binary, many-to-one relationships (links). However, to represent arbitrary relationships among entity sets E1, E2, ..., Ek, a new logical record type T is created whose records identify which entities (of record types T1, T2, ..., Tk) are a part of the relationship. This enables the creation of links L1, L2, ..., Lk, where Li is the link from record type T to the record type Ti for entity set Ei [7]. The relationship that is represented by each link is recorded in the logical record type T. Figure 7 shows a network model for the same algorithm. Consider the many-to-many relationship between ELEMENTS and NODES. To represent this relationship, a new record type E_TO_N is created that maps the relationship between elements and nodes (Fig. 7). While this technique ensures that many-to-many relationships can be represented, the rules specifying transactions are more complex.
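The link-record construction can be sketched as follows (Python used purely for illustration; the record type E_TO_N comes from the paper's Fig. 7, everything else is an assumption). Each E_TO_N record participates in two many-to-one links, one toward ELEMENTS and one toward NODES, which together carry the many-to-many relationship:

# Illustrative link records for the network model.
elements = {1: "truss", 2: "truss"}
nodes    = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0)}

# Each E_TO_N record owns one (element, node) pairing; the link toward
# ELEMENTS and the link toward NODES are both many-to-one.
e_to_n = [(1, 1), (1, 2), (2, 2), (2, 3)]

def nodes_of(element_no):
    return [n for (e, n) in e_to_n if e == element_no]

def elements_at(node_no):
    return [e for (e, n) in e_to_n if n == node_no]

print(nodes_of(1))      # [1, 2]
print(elements_at(2))   # [1, 2]: node 2 belongs to two elements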
Fig. 7. Network model for the structural database.
With sufficiently diverse databases, the relational data model has proven to be superior to both the network and hierarchical models, at least conceptually if not in terms of efficiency [7, 9, 10]. Pictorially, the relational scheme of a single relation is like a table. Each column contains an attribute and each row is an entity. A relation scheme is expressed as R(A1, A2, ..., An), where A1, A2, ..., An are the attributes of the relation. If D1, D2, ..., Dn are the sets of values of the attributes A1, A2, ..., An, then R expresses a relation on these n sets if A1 takes values from D1, A2 from D2, and so on. The sets Di are the domains of the relation R [7]. Figure 8 shows the relational model for the structural database. In order to implement the algorithm, the element information is read in step (1). Using the connectivity information, the nodal coordinates are obtained from the entity set NODES [step (2)], the material properties using the material property number from MATERIAL_PROPERTIES [step (3)], and the element properties using the element property number from ELEMENT_PROPERTIES [step (4)]. When designing the relational data model, the user faces a dilemma since several relational schemes are possible. There are some desirable properties that a scheme should possess. In order to define these properties, it must be realized that central to the design of database schemes is the idea of data dependency [7]. For example, the attribute NODAL_COORDINATES is functionally dependent on the attribute NODE_NUMBER in the entity set NODES. Let us focus on the relation scheme ELEMENTS (element_no, type, element_properties, material_properties, connectivity_data). The first problem is redundancy. The element and material properties are repeated (perhaps) once for
each element. Updating the material properties would involve changing the appropriate attribute in all tuples. There are other anomalies. Material properties cannot be defined unless there is at least one element made of that material. Similarly, if that element is to be deleted, all material information is lost. In order to overcome these problems, it is necessary to go through a process of normalization. It essentially consists of transforming complex relations into simpler relations without loss of information. There are three well-defined normal forms: first normal form (1NF), second normal form (2NF), and third normal form (3NF) [11]. The conversion of data into 1NF is essentially the process of eliminating repeating groups and hierarchical structure. A relation is in first normal form if every attribute is a simple attribute [11]. In the preceding example, the attributes ELEMENT_PROPERTIES and MATERIAL_PROPERTIES are composite. To put this relation into 1NF, we can describe the relation as

ELEMENTS (element_no, type, element_property_no, property_code, property_values, material_no, material_code, material_values, connectivity_data),

where all composite attributes have been replaced by their constituent parts.
A key of a relation is a set of attributes that uniquely determines a tuple in an entity set. In the ELEMENTS relation scheme (1NF), the element number is the key. An attribute of the relation is prime if it participates in a key [11]. An attribute
Fig. 8. Relational scheme for the structural database.
(or a group of attributes) B is functionally dependent on A if each value of A never has more than one value of B associated with it. B is fully functionally dependent upon a group of attributes A if B depends upon all the components of A and not on any subset of A. In order to ensure that a relation already in 1NF is in 2NF, each attribute must be fully dependent on its primary key. In the ELEMENTS relation (1NF), connectivity_data depends upon the element number and type, whereas property_values depend upon the element property number and the property code. We can now write the 2NF of ELEMENTS as

ELEMENTS (element_no*, type*, element_property_no, material_no, connectivity_data),
ELEMENT_PROPERTIES (element_property_no*, property_code*, property_values),
MATERIAL_PROPERTIES (material_no*, material_code*, material_values),

where the attributes followed by an asterisk form the primary key. An attribute C is transitively dependent on A if at every instance C is functionally dependent upon B, B is functionally dependent upon A, but A is not functionally dependent upon B. A relation is in 3NF if it is already in 2NF and it has no transitive dependencies of nonprime attributes on keys [11]. The relation schemes discussed above are in 3NF since there are no instances of A -> B -> C, where A is the primary key of the relation. The process of normalization not only overcomes the redundancy and transaction anomaly problems but also makes it easier for the user to conceptualize the database and portions of it. There is a strong similarity between the physical database description in terms of record format, file, and record, and the relational database description in terms of relation scheme, relation, and tuple. This ready translation from concept (relation) to implementation (file) makes the relational database a powerful data model. Figure 8 shows the relational scheme of the structural database. The entity set ELEMENTS is related to the entity sets MATERIAL_PROPERTIES, ELEMENT_PROPERTIES, and NODES. The relation scheme for ELEMENTS carries attributes that are primary keys for these entity sets, but not their functionally dependent attributes.
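A small sketch of the 2NF decomposition described above (Python for illustration; the attribute names follow the text, the values are invented) shows the update anomaly disappearing:

# In the flat 1NF relation, material values repeat for every element
# that shares a material; updating them means touching every tuple.
flat_1nf = [
    # element_no, type, prop_values, mat_no, mat_values, connectivity
    (1, "truss", 2.0, 1, 200.0e9, (1, 2)),
    (2, "truss", 2.0, 1, 200.0e9, (2, 3)),   # mat_values stored again
]

# 2NF: repeated groups moved into their own relations, keyed as in the text.
ELEMENTS = {1: ("truss", 10, 1, (1, 2)),     # element_property_no 10, material_no 1
            2: ("truss", 10, 1, (2, 3))}
ELEMENT_PROPERTIES = {(10, "area"): 2.0}
MATERIAL_PROPERTIES = {(1, "youngs_modulus"): 200.0e9}

# Changing Young's modulus is now one update, not one per element:
MATERIAL_PROPERTIES[(1, "youngs_modulus")] = 210.0e9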
It is quite apparent that, for the finite element database example, the network model is an improvement over the hierarchical model and the relational model provides further improvement over the network model. Note the enormous amount of redundant data in the hierarchical model. The material and element properties are stored for all elements, ignoring the many-to-one relationship alluded to earlier. In fact, the model has already gone through a normalization procedure. In an orthodox model, the tree NODE would have been a child of ELEMENTS. The connectivity data now is of a logical type (a pointer to the tree NODE). The network model has minimal redundant data. In fact, it can be argued that the model is identical to the relational model, semantics aside. However, there is a difference. The logical record types E_TO_N, E_TO_M, and E_TO_EP are quite unnecessary. In the relational model they have been substituted by embedded pointers or secondary keys. Relational models do not provide answers to all situations. There are instances where the database relationships, for either the entire database or part of it, are strictly hierarchical, network, or relational. Consider, e.g. the task of storing matrices. We want to devise a scheme which will allow matrix operations with matrices of varying sizes. Figure 9 illustrates the conceptual scheme. Each matrix will be identified by an alphanumeric name. We will use the name as the key. The attributes include the number of superrows, the number of supercolumns, and the blocksize (BK) of all submatrices (or subvectors); since the sizes of the matrices differ, the matrices will be stored as submatrices of BK rows and BK columns. The information is tabular, indicating a relational model. However, the storage is hierarchical. The next section discusses how to define the properties of the model in a specialized language.
Fig. 9. Conceptual scheme for a matrix database.
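The Fig. 9 scheme might be sketched as follows (an assumption-laden illustration in Python with NumPy, not the actual implementation): the directory row for each matrix is relational, while the submatrices hang off it hierarchically, addressed by superrow and supercolumn:

# Illustrative blocked-matrix store keyed by matrix name.
import numpy as np

BK = 64   # blocksize shared by all submatrices

directory = {}   # name -> (n_superrows, n_supercols, blocks)

def create_matrix(name, rows, cols):
    nr, nc = -(-rows // BK), -(-cols // BK)   # ceiling division
    directory[name] = (nr, nc, {})

def put_block(name, i, j, block):
    nr, nc, blocks = directory[name]
    assert i < nr and j < nc and block.shape == (BK, BK)
    blocks[(i, j)] = block                    # one page-sized unit of I/O

def get_block(name, i, j):
    blocks = directory[name][2]
    # absent blocks are treated as zero submatrices
    return blocks.get((i, j), np.zeros((BK, BK)))

create_matrix("K", rows=100, cols=100)        # 2 x 2 superrows/supercolumns
put_block("K", 0, 0, np.eye(BK))
print(get_block("K", 1, 1)[0, 0])             # absent block reads as zero

A matrix product can then be blocked so that only a few BK x BK submatrices need to be resident in memory at any one time.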
3.3 Data definition language
The description of the database is written in a specialized language called a data definition language (DDL) [7]. Using the set of rules specified in the language, the user can express the properties of the data model. Conceptually, it is often better to split these rules into a subset that defines the structure and another subset that specifies the constraints, the relationships, and the associated values. There are different types of DDL for a relational data model. We shall discuss one such approach where the structure and constraints are expressed in terms of domains, attributes, and relations. Using the example from the preceding section, the structural database can be described using the following DDL:

domain:    id_code         fixed [1 ... 30000],
           type            char 10,
           all_values      float(15) [min ... max],
           boundary_cond   fixed bin(6),
           pos_values      float(15) [0 ... max],
           connectivity    fixed(20) [1 ... 30000],
           nodal_coor      float(3) [min ... max];

attribute: elem_no     id_code,
           mat_no      id_code,
           prop_values pos_values,
           prop_code   id_code,
           node_no     id_code,
           prop_no     id_code,
           mat_values  all_values,
           mat_code    id_code;

relation:  ELEMENTS (elem_no*, type*, connectivity, prop_no, mat_no),
           NODES (node_no*, nodal_coor, boundary_cond),
           MAT_PROP (mat_no*, mat_code*, mat_values),
           ELEM_PROP (prop_no*, prop_code*, prop_values).

In the domain description, fixed, char, float, and bin refer to integer, character, floating-point, and binary data types; these are followed by the number of elements if the variable is a vector, and by the range of permissible values. The attributes followed by an asterisk are the primary key for the relation. Once the data model is described using the DDL, the next step is to use the data manipulation language (DML), or query language, to carry out transactions against the database.

3.4 Data manipulation language
Let us examine the interaction between an application program and the DBMS. Shown below is the application program for the algorithm in scenario 2. We will use four entity sets to describe the database:

ELEMENTS (element_no*, type*, stress_no, material_property_no, connectivity)
NODES (node_no*, nodal_coor)
STRESS (stress_no*, stress_values)
GRAPH (element_no*, stress_level)

Begin
  Open database PLOT;
  Set scope ELEMENTS read;
  Set scope NODES read;
  Set scope STRESS read;
  Set scope GRAPH update;
  set viewing box dimensions;
  get stress level menu;
  for elements := 1 to no_of_elements do
  begin
    From ELEMENTS Select connectivity, stress_no
      Where "element_no" = elements;
    Update viewing box dimensions;
    for i := 1 to no_of_nodes_in_element do
      From NODES Select nodal_coor
        Where "node_no" = connectivity(i);
    end;
    From STRESS Select stress_values
      Where "stress_no" = stress_no;
    Compare stress_values with stress level menu to obtain new_stress_level;
    From GRAPH Update stress_level
      Such_that "stress_level" = new_stress_level
      Where "element_no" = elements;
  end;
  Delete scope ELEMENTS, NODES, STRESS, GRAPH;
  Close database PLOT;
end.

All DML commands begin with an upper-case letter, while the entity sets are typed entirely in upper case. The assumption made here is that the DDL has already been used to describe the data model. Note the protocol used to implement the algorithm.
The first step is to specify the database identification using the open database command. The next series of commands specifies which relation schemes are in use, followed by the type of transaction lock (read or update). In a multiuser environment this lock establishes the types of transactions that can be carried out. For example, the entity set ELEMENTS is write protected since it is open only for read operations. This also makes it possible for other users trying to access the same entity set to read the information from the same database. The DML allows operation on one set at a time with the use of the from, select, and where clauses. The nature of the algorithm shows that only one tuple is used at any time. The final steps involve releasing the entity sets using the delete scope command. New information is recorded in the database for the entity set GRAPH, while no changes are recorded for the other sets. Does the application program communicate directly with the physical database or through the data manager? In reality it may or may not, depending upon the physical implementation of the logical database. If the physical database is exactly like the logical database, then a "subschema language" is not required. The subschema language contains the description of the physical implementation. For example, the file NODE (a relational model), with the node number as primary key, looks exactly like the conceptual model. On the other hand, the file ELEM may not, simply because the number of nodes per element is not a fixed number. Physically, two records could make up a logical record for elements that have many nodes per element (a 20-node brick element).
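Such a mapping for a variable-length element record might look like the following sketch (hypothetical Python; the capacity of 12 nodes per physical record is an arbitrary assumption):

# Illustrative subschema mapping: one logical element record may span
# several fixed-size physical records.
NODES_PER_PHYSICAL_RECORD = 12

def to_physical(element_no, connectivity):
    """Split one logical element record into fixed-size physical records."""
    recs = []
    for i in range(0, len(connectivity), NODES_PER_PHYSICAL_RECORD):
        recs.append((element_no, connectivity[i:i + NODES_PER_PHYSICAL_RECORD]))
    return recs

def to_logical(physical_records):
    """Reassemble the logical record; the subschema hides this from the DML."""
    element_no = physical_records[0][0]
    connectivity = [n for _, chunk in physical_records for n in chunk]
    return element_no, connectivity

brick = list(range(1, 21))            # a 20-node brick element
physical = to_physical(7, brick)      # becomes two physical records
assert to_logical(physical) == (7, brick)

The next section discusses the implementation of the physical database.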
4. DATABASE IMPLEMENTATION
So far we have defined the components of the DBMS (Fig. 10) but have not concentrated on the implementation details. We must now concern ourselves with developing the DBMS in a computer language for a particular computer system. We must also bear in mind that we have to achieve our objectives as effectively as possible. Two aspects of this situation will now be considered: the storage on a nonvolatile medium (disk/tape) and the means of transferring the data between this medium and the high-speed memory; and the resource allocation scheme that provides the tools for implementing the database strategies effectively.
4.1 Storage on secondary devices
Physically, all data is stored in files on a disk/tape. One or more entities can be stored in one file, depending upon the application. The unfortunate part of storing information on a secondary device is that both the storage and the manner of storing (and retrieving) are machine dependent. The DBMS
designer must find the optimum way to balance portability with efficiency and the implementation strategy on a particular machine. Each file contains data items written as "records" of predetermined size. The usual practice is to transfer one page of data (i.e. a block of records) between the secondary storage and the memory at a time, instead of one record at a time. This speeds up the I/O (input/output) quite significantly. When the application program requests a record, the DBMS checks whether the record is already in memory. If not, the page containing the record is brought from the physical database into memory (the reverse procedure is carried out when writing a record). For large problems, the primary memory is usually not enough. This requires that the DBMS provide facilities for memory management.
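The request path just described can be sketched as follows (illustrative Python; the page and record sizes are assumptions):

# Illustrative page-at-a-time record access.
RECORDS_PER_PAGE = 128

class BufferPool:
    def __init__(self, disk_pages):
        self.disk = disk_pages        # page_no -> list of records (the "file")
        self.memory = {}              # pages currently resident

    def read_record(self, record_no):
        page_no = record_no // RECORDS_PER_PAGE
        if page_no not in self.memory:                  # page fault:
            self.memory[page_no] = self.disk[page_no]   # one block transfer
        return self.memory[page_no][record_no % RECORDS_PER_PAGE]

pool = BufferPool({0: list(range(128)), 1: list(range(128, 256))})
print(pool.read_record(130))   # faults page 1 into memory, then returns 130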
4.2 Memory management scheme
In order to implement a memory management scheme (MMS), decisions have to be made regarding fetch, placement, and replacement strategies for pages of data. The fetch strategy defines the policy of when to load the primary memory with data and how much of it to load at a time. A priority table establishing quotas for the entity sets involved is a necessary ingredient of an efficient fetch strategy. The placement strategy determines where in memory to load a new page (or pages) of data. Since the available memory size is predetermined and an allocation policy has been established, the placement question poses no problem. The replacement strategy deals with what to remove or swap from memory when there is not enough space available. Several schemes have been suggested to accomplish this task [12]. A "near-LRU" (near least recently used) technique is found to be very effective in computer aided engineering software [5].
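As one concrete illustration of a replacement strategy, the sketch below implements plain LRU, the family to which the near-LRU scheme belongs (the exact variant used in Ref. [5] is not specified here, so this is only a representative sketch in Python):

# Illustrative LRU replacement over a fixed-size working set.
from collections import OrderedDict

class LRUWorkingSet:
    def __init__(self, capacity_pages, fetch_page):
        self.capacity = capacity_pages   # tuning parameter: working-set size
        self.fetch_page = fetch_page     # callable: page_no -> page data
        self.pages = OrderedDict()       # insertion order tracks recency

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)      # mark as most recently used
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)   # evict least recently used
            self.pages[page_no] = self.fetch_page(page_no)
        return self.pages[page_no]

ws = LRUWorkingSet(capacity_pages=2, fetch_page=lambda p: f"page {p}")
ws.get(1); ws.get(2); ws.get(1); ws.get(3)   # page 2 is evicted, page 1 kept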
Fig. 10. Parts of a data manager.
For an efficient computer-aided engineering system, the data organization and the memory management schemes must be flexible enough to allow performance enhancements on various computer installations. One technique is to include tuning parameters in the design of the DBMS: (i) the maximum size of the working set; (ii) the page allocation policy within the working set; and (iii) the page size. Since the size of the working set is dependent on the individual sizes of the entity sets (problem dependent), it is advisable to increase the size to as large a figure as possible on nonvirtual machines (computers with nonvirtual memory operating systems). If the requirements still cannot be met, the MMS will automatically invoke the replacement strategy to handle the additional requirements. The question is more difficult to answer for virtual machines (computers with virtual memory operating systems). The real-to-virtual memory ratio, the type of computations carried out, and the system load (all installation-dependent features) dictate the size of the working set. A smaller working set is preferred when the machine is heavily loaded. The allocation policy for the distribution of the working set is another parameter. This is especially true since the MMS swaps pages in and out of the subset of the working space allocated to the quantity that is being paged. The page size is another tuning parameter. On nonvirtual machines, the page size should be close to the maximum input/output (I/O) transfer allowed by the operating system. On virtual machines, the page size should be as close to the system page size as possible. It should be noted that a small page size results in more page-turning activity and a large page table. On the other hand, a large page size requires a greater I/O time during paging. Paging works most efficiently with programs that execute sequentially without frequent references to distant memory locations. Programs characterized by such locality of reference generate a minimum of page faults. References [12, 14] discuss the use of hypermatrix storage for storing large matrices. The locality of reference that is inherent when the scheme is applied to matrix manipulations is aptly illustrated in Ref. [13]. A MMS is of course a necessity for systems with limited core. For systems with a virtual memory operating system, it may be argued that since the DBMS is doing "paging" on top of the paging being done by the operating system, it is inefficient. To answer this question, several numerical experiments were performed on a PRIME 750 minicomputer with the PRIMOS virtual memory operating system. Details are reported in Ref. [5]. It was found that PRIMOS performed relatively poorly when the system was heavily loaded, but during times of light system load PRIMOS required less disk transfer time. Also, larger page sizes, when tuned properly, fared much better in terms of page faults and disk I/O times than a smaller page size. Thus, it is the authors' opinion that if a DBMS has a MMS, it should have tuning parameters built into it to make the system as efficient as possible under different operating conditions.
5. INTEGRATION WITHIN AN ENGINEERING SYSTEM
One of the biggest advantages that accrue from having a centralized database is that redundancy can be reduced. Too often the same design data exists in more than one form, not only local to an application program but also as part of other programs that deal with the same information. This is a relatively new area that has attracted the attention of people engaged in massive amounts of information transfer. With an increase in the numerical/mathematical treatment of design and in the use of numerically controlled (NC) machines, the effective merger of CAD with CAM in the manufacturing setting and of CAD with CAP in the process control setting is essential. PHILIKON [15] is an example of an integrated CAD/CAM system where "dialogue modules for detailing mechanical parts, design of single-stage blanking and progression tools, and the programming of NC machines for spark erosion and turret punching are integrated by a common engineering database." The system provides an interesting look into the requirements of both the manufacturing and the process control industries. The database can be broadly categorized into (i) design data, which describes the geometry (or layout), material properties, and "element" properties; (ii) a catalog, which describes the standard available components the designer has chosen to use (either available locally or standardized, e.g. AISC); and (iii) specifications, which specify when to use a process or component (usually local to the company). One of the goals of integrating the design process and manufacturing (or process scheduling) is to eliminate the costly errors that occur so often. Above all, it frees the designers from cumbersome mechanical tasks and gives them more time to concentrate on evolving better design and manufacturing techniques. Two aspects of large databases are worth examining. Growth of application programs around some large databases takes place with the use of patch programs that link existing application programs. These programs merely translate data from one form to another (from one local database to another, or from a local database to the global database and back). Quite often these databases are conceptually incompatible because the assumptions on which the different pieces of software are based are different. An enormous amount of time is lost during this data transfer, especially since design proceeds iteratively and each iteration requires the use of the patch programs. The second aspect deals with data independence. The needs of a company change with time. These changes finally must be incorporated into the design system. There are three types of changes: current capabilities are updated, some of the current capabilities are no longer required, and new capabilities must be added. If these changes imply a substantial change in the database organization, then the database lacks data independence. No DBMS can guarantee complete data
independence. The initial design must account for future enhancements so as to minimize the amount of work required during updates.
6. CONCLUSIONS

The ideas presented in this paper have been successfully used in an FEM-based design system called SADDLE [6]. The system has been able to support the often conflicting requirements of finite elements, graphics, and optimization programs. Encouraging results have been obtained not only from problems of different sizes but also from problems very diverse in nature, a stumbling block for most structural synthesis software.

REFERENCES

1. L. A. Lopez, R. H. Dodds, D. R. Rehak and J. L. Urzua, Application of data management to structures. Proc. ASCE Conf. Comput. Civil Engng, Atlanta, Georgia (1978).
2. S. Ulfsby, S. Meen and J. Oian, TORNADO: a DBMS for CAD/CAM systems. CAD 13, 193-198 (1981).
3. R. P. Dube, IPAD Software Workbook. IPAD Integration Prototype Software, Boeing Aircraft Company (1979).
4. C. A. Felippa, Database management in scientific computing. I. General descriptions. Comput. Struct. 10, 119-124 (1979).
5. S. D. Rajan and M. A. Bhatti, Data management in FEM-based optimization software. Comput. Struct. 16, 317-326 (1983).
6. S. D. Rajan, SADDLE: A computer-aided structural analysis and dynamic design language. Ph.D. dissertation, Department of Civil Engineering, University of Iowa, Iowa City (1983).
7. J. D. Ullman, Principles of Database Systems. Computer Science Press, Maryland (1980).
8. J. Martin, Computer Database Organization. Prentice-Hall, Englewood Cliffs, NJ (1977).
9. Proc. ACM/SIGMOD Conference on Data Models: Data Structure Set vs Relational (Edited by R. Rustin). ACM, New York (1974).
10. Computing Surveys (Edited by E. Sibley), Vol. 8, No. 1 (1976).
11. D. C. Tsichritzis and F. H. Lochovsky, Data Models. Prentice-Hall, Englewood Cliffs, NJ (1982).
12. A. C. Shaw, The Logical Design of Operating Systems. Prentice-Hall, Englewood Cliffs, NJ (1974).
13. A. C. McKellar and E. G. Coffman, Organizing matrices and matrix operations on paged memory systems. Comm. ACM 12, 153-165 (1969).
14. P. J. Pahl, Data management in finite element analysis. Proc. Europe-US Workshop, Ruhr Universität, Bochum, Germany. Springer-Verlag, New York (1981).
15. W. E. Fischer, PHIDAS: a database management system for CAD/CAM application software. CAD 11, 146-150 (1979).