Database management system architecture

Database management system architecture

2 Database management system architecture 2.1 Introduction The previous chapter conveyed the idea of a database management system (DBMS) as providin...

843KB Sizes 0 Downloads 190 Views

2 Database management system architecture 2.1

Introduction

The previous chapter conveyed the idea of a database management system (DBMS) as providing an interface between a global view of the data and the local views of the application programs. This was essentially a two-level DBMS architecture. In the present chapter a more careful analysis of the functions of a DBMS will show that three levels of data view can usefully be distinguished.

2.2

A three-level architecture

First a definition. A data view which is specified in a language which the DBMS software is designed to understand will be called a schema. The discussion in Chapter 1 implied that a DBMS should cater for two types of schema, namely global schema and local schema. In fact, in this two-level architecture, the global schema would need to contain two essentially different types of information. There is information about what data is available in the database, and information about the way that data is stored and accessed. Taken together with the local schema there are then three schemas. For consistency with general usage, the three schemas will be referred to from now on as the conceptual, internal and external schemas. These three terms are defined below.

2,3

The conceptual schema

The conceptual schema is a description of all the data of interest to the enterprise which is to be stored in the database. It specifies the logical data content of the database and the constraints which apply to the data. For example:

(1) (2) (3) (4) (5) (6) (7) (8) (9)

The data held on customers is customerNo, name, address, . . . . The data held on products is productNo, description, price . . . . . Customers place orders for products. An employee may not work simultaneously for more than one department. The age of an employee on the payroll may not exceed 99 years. Age must be a numeric data item. Personnel data (namely...) may be accessed only by the following users . . . . A product may be a final assembly, sub-assembly or component. The distinction between a full-time and a part-time employee is that . . . .

Database management system architecture

25

This list hints at the comprehensiveness which a conceptual schema might possess. As might be expected, database management systems differ in the degree of comprehensiveness, and ease of use, of their conceptual schema facilities. As the enterprise evolves, so the conceptual schema will have to be changed. If the conceptual schema is an accurate model of the enterprise, then any change seen by the enterprise as being a straightforward development in its way of doing business should imply a straightforward change in the conceptual schema. On the other hand, no one should be surprised if a radical change to the enterprise necessitates a radical change to the conceptual schema. The conceptual schema is therefore to be thought of as a relatively stable, long-term view of the data, which is capable of evolving with the enterprise.

2.4

The external schema

An external schema describes the local view of the database required by an application program. If several programs require identical local views they may share the same external schema. Properties of the data such as the format of data items or the sequence in which data is seen (e.g. customerName sequence) may be specified by an external schema, but it cannot override any of the constraints imposed by the conceptual schema.

2.5

The internal schema

The internal schema describes how the stored data is implemented at the level of stored records, stored record formats, indexes, hashing algorithms, pointers, blocksizes, storage media, and the like.

2.6

Mapping

The DBMS software is responsible for mapping between the three types of schema. It must be capable of checking the schemas for consistency (e.g. that each external schema is capable of being derived from the conceptual schema), and must use the information in the schemas to map between each external schema and the internal schema via the conceptual schema. The relationships between the three types of schema are shown in Fig. 2.1. The conceptual schema is the hub of the architecture. Each external schema provides one, or more, programs with a local view which can be derived from the conceptual schema; the internal schema describes how the conceptual schema is physically implemented.

2.7

DBMS components

Few, if any, available DBMS systems adhere strictly to the three-level architecture illustrated in Fig. 2.1, but the diagram does indicate the components which one would

26

DataAnalysis for Database Design

Fig. 2.1 The relationship between application programs and database in a three-level DBMS architecture. (Note that programs 3 and 4 share the same external schema, and that one program is a generalised query language processor) expect to find in some shape or form in a typical DBMS. These include: (1) (2) (3) (4) (5)

User language interfaces, normally including a data manipulation l~'lguage (DML), as described in section 1.3.4. An external schema data description language. A conceptual schema data description language. An internal schema data description language. A database control system (DBCS) which will access the database in response to DML commands.

The operation of the database control system is illustrated in Fig. 2.2. Suppose application program 1 issues a DML command to store a new record in the database. The DBCS consults the application program's external schema, the external/

Database management system architecture

27

conceptual mapping, the conceptual schema, the conceptual/internal mapping, and the internal schema, in order to find out how the record should be stored. The DBCS can then command the operating system to store the record in the database. Finally, a message is returned to the application program to say whether or not the store operation has been successful. The process as represented in Fig. 2.2 is interpretive in that the DBCS consults each schema and mapping every time it receives a D M L command. A system implemented in this way would provide very good program/data independence (for example, the internal schema could be changed between successive D M L commands without the application program being aware of the change) but the performance of the system would be unacceptably slow for many applications. In practice, a DBMS is likely to compile mappings in advance, so that a more direct path is provided between application program and database. A more detailed example of the sequence of events in a store operation is shown in Fig. 2.3. Both Figs 2.2 and 2.3 are intended to be suggestive, rather than prescriptive, of the structure and operation of a DBMS. In Fig. 2.3, the external schema contains just one record type, costRecord (in general, an external schema would contain several record types and relationships between records); this external schema maps on to part of one conceptual schema record type, Inventory (in general, an external schema could

Fig. 2.2 The way in which an application program accesses the database via the database control system. The sequence in which the schemas and mappings are consulted following a DML command from program 1 is shown by the numbers in parentheses

28

Data Analysis for Database Design

be derived from several conceptual record types and relationships); the conceptual schema in turn maps on to the internal schema (once again, the mapping could be more complex than the direct correspondence between Inventory and Stored Inventory).

2.8

A d v a n t a g e s of t h r e e - l e v e l a r c h i t e c t u r e

The three-level architecture simplifies the design and management of a database system by providing a higher level of program/data independence than is possible with a one- or two-level architecture. The clear separation of external, conceptual and internal schemas offers a standard against which a DBMS may be assessed. Some of the implications of the three-level architecture with respect to program/data independence are listed below.

Change in conceptual schema A change to the conceptual schema (the addition of a data item, say) will not affect any existing application program, unless its external schema is incompatible with the new conceptual schema.

Change in internal schema Changing patterns of data usage may necessitate tuning of the database system (for example, by changing blocksizes or providing additional pointers between records). The separation of the external schemas from the internal schema means that the latter can be tuned without application programs having to be changed. Also there is no danger that a change to the internal schema will corrupt the conceptual schema.

Change in external schema The definition of a new external schema, or the alteration of an existing external schema will not affect application programs which do not use the external schema in question.

2.9

Data administration

Because a database is a shared resource it is, in practice, essential to have centralised coordination of its design, implementation and maintenance. These functions are normally the responsibility of a data administration group. The three-level architecture proposed for the DBMS suggests a possible structure for the data administration group, in which responsibility for the three schemas is divided between three administrators. The conceptual schema is administered by the enterprise administrator, the external schemas by the application administrator, and the internal schema by the database administrator. We will use the term data administrator to cover any, or all, of these functions.

2.10

M o d e l vs s c h e m a

As defined above, a schema is a specification of a data view expressed in the language used by a particular database management system. Schemas written for one DBMS

Database management system architecture 29

0

0

Q.. 9
O

> 9

o

ce)

.m

30

Data Analysis for Database Design

would not normally be intelligible to a different DBMS 1. There are therefore advantages in adopting higher-level descriptions of the data, descriptions which are independent of any DBMS software. We will refer to such descriptions as data models rather than schemas. In defining conceptual and external data models, a data administrator need not be concerned with the peculiarities of a particular DBMS, so he or she is able to concentrate exclusively on the properties of the data. The existence of data models will also make it easier to compare the merits of different database management systems for meeting an enterprise's requirements. At a subsequent stage of the design process the data models will be mapped into schemas using the languages provided by a particular DBMS. Issues concerning the design of conceptual data models will occupy a substantial part of this book. Much of this discussion will be relevant also to the design of external data models. Other sections of the book examine how conceptual and external data models may be mapped into conceptual and external schemas using representative database management systems. Some aspects of internal schema design will also be considered, but fine tuning of internal schemas is beyond the scope of this book as it requires a detailed knowledge of the specific DBMS concerned.

2.11

Terminology

In database literature, as in many other areas of computing, there is no standard terminology. In other texts you may find the term schema for our conceptual schema, subschema for our external schema, and physical description or storage schema for our internal schema. Similarly, our references to conceptual, external and internal data description languages (DDLs)have as counterparts schema DDLs, subschema DDLs and data storage description languages, respectively. Some authors use the term schema where we would use data model, or define data model to include both the structure of the data and the language(s) used to access it. The terms global and local correspond to our terms conceptual and external. Although we could have legitimately regarded a schema as being a particular kind of data model, it will be more convenient to reserve the term data model for a DBMS-independent description, and use schema to mean a DBMS-dependent description. As a final source of confusion, we will normally drop the word 'data' from phrases such as 'conceptual data model', and refer instead simply to a 'conceptual model'.

Questions 1. Could two external schemas have some data items in common? (1 min) 2. A database contains a personnel record for each employee of an enterprise. The personnel record of a new employee is to be added to the database. Which, if any, of the conceptual, external and internal schemas will need to be changed? (1 min) 3. How many conceptual, external and internal schemas would you expect to find in an installation using a DBMS? (1 min) ~ManyDBMS are based on an ISO SQL standard, but typicallyinclude variations on, and extensions to, the standard whichrestrict compatibility.

Database management system architecture

31

4. Suggest four, or more, reasons why the store operation illustrated in Fig. 2.3 might be unsuccessful. (2 min) 5. Why might there be a security risk if the DBMS and operating system software are designed independently? (1 rain) 6. Compare the merits of: (a) sharing an external schema between several application programs; and (b) defining a separate external schema for each application program. (2 min) 7. Separation of the external schemas from the internal schema allows the latter to be tuned without changes having to be made to application programs. In what way might an application program become aware of a change in the internal schema? (1 min) 8. Suppose a particular DBMS allows only complete conceptual record types to be included in an external schema, so that an application program using the external schema always sees all the data items in a record. Compare this external schema structure with one in which it is possible to specify which data items are required. Which structure offers (a) the best program/data independence, and (b) the best security. (2 min) 9. In principle, the size of a data item could be defined differently in the external and internal schemas (3 characters versus 5 characters, or 1 decimal point versus 2 decimal points, say). If this were permitted by a DBMS, what problems would need to be resolved in connection with retrieval and amendment operations? (5 min) 10. Is there such a thing as an internal data model? (3 min) 11. Is the three-level DBMS architecture the best thing since sliced bread, or is it altogether too clever? (10 min)

Assignments 1. In section 1.6 (Chapter 1) it was stated that the function of a DBMS is to overcome problems attendant on the use of a database. In this an unduly harsh assessment? Argue the case for a more charitable view, namely that a DBMS offers a consistent set of facilities which would otherwise be provided in a piecemeal fashion. 2. Summarise the claims made in advertisements for database management systems. Comment on the validity of these claims. 3. If you are familiar with a particular DBMS, classify its facilities into conceptual, external and internal schema facilities. Assess how well the DBMS conforms to the three-level architecture, and outline the advantages and disadvantages of any deviations which you have identified.

Answer pointers 1. Yes. If you are in doubt return to Chapter 1. 2. None of them if, as is normally the case, the same type of information is to be held for both new and existing employees. If the information about the new employee is to include a new type of data item, then the conceptual schema, the internal schema, and the external schemas for those application programs which will use the new data item will have to be changed. 3. One conceptual schema, one internal schema, and several external schemas, for

32

Data Analysis for Database Design

each database. Some installations may have several databases, one for each application area (see section 1.7). Occasionally, more than one DBMS is used in the same installation. An alternative answer is that, if the installation's security is adequate, you won't find any schemas unless you have the necessary authority. 4. (a) The database already contains an Inventory record with the same productNo as the record to be stored. Note that the conceptual schema forbids duplicate values of productNo. (b) The value of price is zero or non-numeric; see conceptual schema rules. (c) Disk pack F23 is not on-line. (d) Disk write error. (e) Failure of the central processing unit. (f) Power failure. (g) Database has been opened for inquiry only, not update. 5. By accessing the operating system directly, for example to dump the database, an intruder may be able to by-pass security checks in the DBMS. In practice, most database management systems have been tacked on to existing operating systems. 6. Under option (a) one can be sure that each application program has exactly the same view of the data. If the programs do not need the same view, or if their views may differ in the future, then option (a) could cause maintenance and security problems. Sharing external schemas reduces the number of external schemas and external/conceptual mappings which have to be created. If a DBMS allows an application program to use a dummy external schema name, which can be separately equated to an actual external schema name, then option (a) could easily be converted into option (b). 7. Change in response time. 8. Specification of individual data items gives best program/data independence and security. 9. How will the different data item descriptions be aligned? From the left, right, implied decimal point? Will truncation from left or right of numeric data be allowed? If an external three character value ABC amends an internal five characte; value VWXYZ, should the result be ABCYZ or ABC followed by two spaces, or what? If an internal three character data item is retrieved into an external five character data item what values should be inserted in the extra two characters? 10. On our definitions a data model is DBMS-independent, but an internal schema is highly DBMS-dependent. An internal data model would have to be a DBMSindependent description of the storage structure of a database. Such a language could form the basis of a system for automatic conversion between different database management systems. 11. Depends what you want. Consider the extremes of a completely static environment and a highly dynamic environment. A good answer should also assess the merits of sliced bread.