5.3 DATA MODELING FOR THE STRUCTURED ENVIRONMENT
The structured environment contains a great deal of complex data, and there are many ways to organize and arrange that data. In the structured environment, analysts have the opportunity to shape the data according to their needs. Given the many ways that data can be shaped, the organization needs a road map, the data model, to guide its efforts.
The Purpose of the Road Map
The road map serves several important purposes:
• It gives the organization a direction to go.
• It guides different people with different agendas who still must build a collaborative effort.
• It allows a large effort to be sustained over time.
• It guides the end users who ultimately must navigate the final product.
There are many reasons, then, why large, complex organizations need a data model. The data that is modeled is the data that sits at the heart of the business of the company. The data model is shaped around whatever is at the core of the business of the organization.
Granular Data Only
The data model is shaped around only the granular, detailed data of the organization. Bad things happen when the data modeler allows summarized or aggregated data to enter the data model:
• There is a huge amount of data to be modeled.
• The formulas for calculating the summarized data change faster than the modeler can create and change the model.
• Different people have different formulas for calculations.
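The problem of differing formulas can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the `OrderLine` fields, the sample rows, and the two revenue formulas. The point is that both summaries can always be recomputed from the granular rows, so only the rows need to be modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLine:
    """One granular fact: a single line of a single order (hypothetical fields)."""
    order_id: int
    product_id: str
    quantity: int
    unit_price: float
    discount: float  # absolute discount applied to the whole line


# Hypothetical granular data; this is what the data model describes.
ORDER_LINES = [
    OrderLine(1, "P-100", 2, 10.0, 0.0),
    OrderLine(1, "P-200", 1, 50.0, 5.0),
    OrderLine(2, "P-100", 4, 10.0, 4.0),
]


def revenue_gross(lines):
    """One group's formula: quantity times price, ignoring discounts."""
    return sum(line.quantity * line.unit_price for line in lines)


def revenue_net(lines):
    """Another group's formula: the same figure net of discounts."""
    return sum(line.quantity * line.unit_price - line.discount for line in lines)


if __name__ == "__main__":
    print("gross:", revenue_gross(ORDER_LINES))
    print("net:  ", revenue_net(ORDER_LINES))
```

Both functions read the same granular rows and disagree on the total. If "revenue" were stored in the model, the modeler would have to choose one formula and then maintain it each time the formula changed.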
182 Chapter 5.3 Data modeling for the structured environment
Figure 5.3.1
The first step in building the data model is to remove all derived data (summarized or aggregated data) from the data model, as shown in Figure 5.3.1.
After the granular data is identified, the next step is to "abstract" the data. The data is abstracted to its highest meaningful level. As a simple example of abstraction, suppose a corporation has female customers, male customers, foreign customers, corporate customers, and governmental customers. The data model creates the entity known as "customer" and wraps all of the different types of customer together. Or suppose the company produces sports cars, sedans, SUVs, and trucks. The data model abstracts the data into the entity "vehicle."
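The two steps described here, dropping derived data and then abstracting what remains, can be sketched as follows. This is a minimal illustration, not a prescribed method. The `Candidate` class, the sample attribute names, and the `ABSTRACTION` lookup table are assumptions introduced for the example; only the customer and vehicle kinds come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate data element considered for the model (hypothetical)."""
    name: str
    derived: bool  # True if the value is summarized or aggregated from other data


def granular_only(candidates):
    """Step 1: keep only the elements that are not derived."""
    return [c.name for c in candidates if not c.derived]


# Step 2: map each specific kind of thing to its most abstract meaningful entity.
ABSTRACTION = {
    "female customer": "customer",
    "male customer": "customer",
    "foreign customer": "customer",
    "corporate customer": "customer",
    "governmental customer": "customer",
    "sports car": "vehicle",
    "sedan": "vehicle",
    "SUV": "vehicle",
    "truck": "vehicle",
}


def abstract(kinds):
    """Return the distinct entities for the given kinds, in first-seen order."""
    entities = []
    for kind in kinds:
        entity = ABSTRACTION[kind]
        if entity not in entities:
            entities.append(entity)
    return entities


CANDIDATES = [
    Candidate("order quantity", derived=False),
    Candidate("monthly revenue total", derived=True),
    Candidate("unit price", derived=False),
]

if __name__ == "__main__":
    print(granular_only(CANDIDATES))
    print(abstract(ABSTRACTION))
```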
The Entity Relationship Diagram
The highest level of abstraction for the data model is called the entity relationship diagram (ERD). The ERD reflects data at its highest level of meaningful abstraction. The entities of the organization are identified, as well as the relationships between those entities. Figure 5.3.2 shows the symbols that identify the entities and relationships in an ERD. For a manufacturing company, the ERD might look like the one in Figure 5.3.3.
The ERD is important as a high-level statement of what the data model is all about. But of necessity, there is very little detail found at the ERD level.
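An ERD holds nothing more than entities and the relationships between them, which can be sketched as a small Python structure. The four entities are the ones the chapter names for the manufacturing example. The relationship labels ("places", "contains", "is fulfilled by") and the `ERD` and `Relationship` classes are assumptions made for illustration and are not taken from Figure 5.3.3.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    left: str
    right: str
    label: str


@dataclass
class ERD:
    """Entities plus the relationships between them; deliberately no attributes."""
    entities: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def relate(self, left, right, label):
        """Record a relationship; both ends must already be known entities."""
        for name in (left, right):
            if name not in self.entities:
                raise ValueError(f"unknown entity: {name}")
        self.relationships.append(Relationship(left, right, label))

    def neighbors(self, entity):
        """Return the entities directly related to the given entity."""
        found = set()
        for rel in self.relationships:
            if rel.left == entity:
                found.add(rel.right)
            elif rel.right == entity:
                found.add(rel.left)
        return found


erd = ERD(entities={"customer", "order", "product", "shipment"})
erd.relate("customer", "order", "places")          # hypothetical label
erd.relate("order", "product", "contains")         # hypothetical label
erd.relate("order", "shipment", "is fulfilled by")  # hypothetical label

if __name__ == "__main__":
    print(sorted(erd.neighbors("order")))
```

The structure carries no keys or attributes, which mirrors the observation that very little detail is found at the ERD level.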
Figure 5.3.2
Figure 5.3.3
The DIS
The next level of the data model is where much of the detail is found. This level of the data model is called the "data item set" (DIS). Each entity identified in the ERD has its own DIS. Using the simple example shown in Figure 5.3.3, there would be one DIS for customer, another DIS for order, another DIS for product, and yet another DIS for shipment.
The DIS contains keys and attributes, and the DIS shows the organization of the data. The symbol for a simple DIS is seen in Figure 5.3.4. The basic construct of a DIS is a box. In the box are the elements of data that are closely related and that belong together. The different lines between the groupings of data have meaning. A downward-pointing line indicates multiple occurrences of data. A line to the right indicates a different type of data.
As a simple example of a DIS, consider the DIS shown in Figure 5.3.5. The anchor or primary data is indicated by the box of data that is at the top left of the diagram. The anchor box indicates that the data that relates directly to the key of the box is
Figure 5.3.4
Figure 5.3.5
description, unit of measure, unit manufacturing cost, packaged size, and packaged weight. These elements of data exist once and only once for each product. Data that can occur multiple times is shown beneath the anchor box of data. One such grouping of data is component id. There can exist multiple components for each product. Another grouping of data that is independent of component id is inventory date and location. The product may have been inventoried in multiple places on different dates. The lines going to the right of the anchor box indicate different types of data. In this case a product may be used in flight or in ground support. The DIS indicates the keys, attributes, and relationships for an entity.
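The product DIS just described can be sketched as a nested structure in which downward lines become `repeating` groupings and lines to the right become `subtypes`. This is only one possible encoding. The `Grouping` class, the snake_case names, and the key name `product_id` are assumptions, since the text does not name the key of the anchor box.

```python
from dataclasses import dataclass, field


@dataclass
class Grouping:
    """One box in a DIS: elements of data that belong together."""
    name: str
    keys: list
    attributes: list = field(default_factory=list)
    repeating: list = field(default_factory=list)  # downward lines: multiple occurrences
    subtypes: list = field(default_factory=list)   # lines to the right: types of data

    def walk(self):
        """Yield this grouping, then every grouping beneath or beside it."""
        yield self
        for child in self.repeating + self.subtypes:
            yield from child.walk()


product_dis = Grouping(
    name="product",
    keys=["product_id"],  # assumed key name
    attributes=[
        "description",
        "unit_of_measure",
        "unit_manufacturing_cost",
        "packaged_size",
        "packaged_weight",
    ],
    repeating=[
        Grouping("component", keys=["product_id", "component_id"]),
        Grouping("inventory", keys=["product_id", "inventory_date", "location"]),
    ],
    subtypes=[
        Grouping("flight", keys=["product_id"]),
        Grouping("ground_support", keys=["product_id"]),
    ],
)

if __name__ == "__main__":
    for grouping in product_dis.walk():
        print(grouping.name, grouping.keys)
```

Each repeating grouping carries the anchor key plus its own key, so one product can own many components and many inventory records.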
Figure 5.3.6
Physical Database Design
Once the DIS is created, the physical design is derived from it. Each grouping of data in the DIS results in a separate database design. Figure 5.3.6 shows the database design that has resulted from a grouping of data found in the DIS.
The physical database design takes into account the physical structure of the data, the physical characteristics of the data, the specification of keys, the specification of indexes, and so forth. The result of the physical specification of the data is a database design, as shown in Figure 5.3.7. The elements of the database design include keys, attributes, records, and indexes.
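As a rough illustration of how groupings in a DIS might turn into a physical design, the sketch below uses Python's standard `sqlite3` module to create one table per grouping of the product example, with keys and one index. The table names, column types, and the choice of index are assumptions. A real physical design would depend on the target DBMS and on the workload.

```python
import sqlite3

# One table per DIS grouping; names and types are illustrative assumptions.
DDL = """
CREATE TABLE product (
    product_id              TEXT PRIMARY KEY,
    description             TEXT NOT NULL,
    unit_of_measure         TEXT NOT NULL,
    unit_manufacturing_cost REAL,
    packaged_size           TEXT,
    packaged_weight         REAL
);
CREATE TABLE product_component (
    product_id   TEXT NOT NULL REFERENCES product(product_id),
    component_id TEXT NOT NULL,
    PRIMARY KEY (product_id, component_id)
);
CREATE TABLE product_inventory (
    product_id     TEXT NOT NULL REFERENCES product(product_id),
    inventory_date TEXT NOT NULL,
    location       TEXT NOT NULL,
    PRIMARY KEY (product_id, inventory_date, location)
);
-- A secondary index chosen for illustration: look up inventory by location.
CREATE INDEX idx_inventory_location ON product_inventory (location);
"""


def build(conn):
    """Create the tables and index for the product DIS in the given database."""
    conn.executescript(DDL)


def table_names(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
build(conn)

if __name__ == "__main__":
    print(table_names(conn))
```

The composite primary keys carry the "multiple occurrences" meaning of the downward lines in the DIS: many component rows and many inventory rows can exist for one product row.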
Relating the Different Levels of the Data Model
The different levels of the data model are akin to the different levels of mapping that exist in the world. Figure 5.3.8 shows how the different levels of mapping relate to each other. The ERD is the equivalent of a globe of the world. The DIS is the equivalent of a map of Texas. And the physical database design is the equivalent of a city map of Dallas, Texas. The globe (the ERD) is complete
Figure 5.3.7
Figure 5.3.8
but not detailed. The map of Texas (the DIS) is incomplete in that you can't find your way to and from Chicago with a map of Texas. But the map of Texas has a great deal more detail than the globe. The city map of Dallas (the physical data model) is even less complete. You cannot find your way from El Paso to Midland with a city map of Dallas. But the city map of Dallas has even more detail than the state map of Texas.
An Example of the Linkage
The complete linkage of the different forms of data modeling to each other is shown in Figure 5.3.9.
Figure 5.3.9
Generic Data Models
It has been noticed that a data model created for one company often applies to other companies in the same industry. For example, a bank, ABC, creates a data model. Then one day it is discovered that the data model for bank ABC is very similar to the data models for banks BCD, CDE, and DEF.
Because of the great similarity of data models within the same industry, there are models called "generic data models." The idea behind a generic data model is that it is much less expensive and much faster to acquire a generic data model than it is to build a data model from scratch. It is true that any generic data model is going to need customization. But even with customization, using a generic data model is preferable to building the data model from scratch.
Operational Data Models and Data Warehouse Data Models
There are different types of data models: operational data models and data warehouse data models. An operational data model is one that models the day-to-day operations of the company. The data warehouse data model is one that is based on the informational needs of the organization.
The operational data model includes some information that is needed for operational processing only, such as a specific telephone number. The data warehouse data model does not contain data that is specific to operational processing. The data warehouse data model does not contain any summarized data. The data warehouse data model does contain a time stamp for every record in the model.
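The difference between the two kinds of record can be sketched with a small conversion function. It is a hedged illustration only. The field names, the `OPERATIONAL_ONLY` set, and the `snapshot_ts` column are assumptions, chosen to show the two properties stated above: operational-only data is dropped, and every warehouse record carries a time stamp.

```python
from datetime import datetime, timezone

# Fields assumed to matter only for day-to-day processing (hypothetical list).
OPERATIONAL_ONLY = {"telephone_number"}


def to_warehouse_record(operational, snapshot_time):
    """Drop operational-only fields and add a time stamp to the record."""
    record = {
        name: value
        for name, value in operational.items()
        if name not in OPERATIONAL_ONLY
    }
    record["snapshot_ts"] = snapshot_time.isoformat()
    return record


if __name__ == "__main__":
    customer = {
        "customer_id": "C-42",
        "name": "Acme Corp",
        "telephone_number": "555-0100",
    }
    print(to_warehouse_record(customer, datetime.now(timezone.utc)))
```

The function copies granular values unchanged and computes no totals or averages, in keeping with the rule that the data warehouse data model contains no summarized data.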