Domain analysis for business software systems



Information Systems Vol. 24, No. 7, pp. 555-568, 1999
© 1999 Published by Elsevier Science Ltd. All rights reserved
Printed in Great Britain
0306-4379/99 $20.00
Pergamon
PII: S0306-4379(99)00032-0

DOMAIN ANALYSIS FOR BUSINESS SOFTWARE SYSTEMS†

A.T. BERZTISS

Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, USA, and SYSLAB, University of Stockholm

(Received 27 March 1998; in final revised form 23 August 1999)

Abstract - Recent trends in the development of business software include process orientation, integration of several capabilities into the one system, internationalization, and the need for rapid response to changes in the business environment. Domain models are a means of dealing with these trends. We partition domain models into concept models and process models, discuss their structure and representation, and examine the models in relation to the new trends. We also consider how domain models can enhance reuse and assist in maintenance. © 1999 Published by Elsevier Science Ltd. All rights reserved

Key words: Conceptual Containment, Domain Model, Process Model, Reuse

1. INTRODUCTION

The term information system has lately acquired two meanings. One relates to systems that assist an enterprise, primarily a business enterprise, to achieve its strategic goals. Alternatively they support the day-to-day operation of the enterprise. The other relates to the representation and presentation of information. This refers to systems that support activities such as web browsing, or the effective utilization of technologies such as hypertext and touchscreens. Here we shall consider information systems of the first kind alone, and shall refer to them as business software systems.

In our view the three primary characteristics of the business world of today are process orientation and integration of processes, progress toward greater automation of processes, and internationalization. Moreover, a business enterprise has to make rapid responses to rapid changes in its environment. This implies that the model of an enterprise has to be based on processes, and the internationalization and rapid change aspects require that the model be easy to reconfigure.

We shall build our discussion around a flexible generic domain model that can serve all enterprises that perform similar functions. Domain modeling then provides a general framework that is to be adapted to the specialized needs of individual enterprises, with the design of the domain model and its specializations allowing for easy change. This can be regarded as an exercise in metamodeling. First, the structure and component types of the domain model constitute a metamodel. Second, by viewing a domain model as a generic framework that is to be adapted to special needs, we introduce a metamodeling aspect into the model construction process. For this we have to look at business activities as closely interrelated.
While it has been customary to partition business software systems into, for example, systems that deal with transaction processing, management information, decision support, office automation, and executive support (see, e.g., [31]), such classification has been largely superseded - in many business software systems being developed today these capabilities are intermixed. For example, in a modern order fulfillment system the processing of orders the old-fashioned way may be replaced by automatic initiation of shipments based on monitoring of client inventories, and the system may also generate periodic summaries for middle management, survey long-term trends for top management, initiate transfer of inventory between warehouses in anticipation of shortages, and so forth.

†Recommended by Kalle Lyytinen


As noted above, besides the integration of different business activities into a single system, other trends have arisen. First, organizations are being defined in terms of processes instead of managerial structures. This is the fundamental characteristic of business reengineering [16], which, however, has not yet been fully realized [5, pp. 6-10]. Also, reengineering implies that existing processes are carefully examined and redesigned, with the aim of improving their effectiveness for advancing the objectives of the business. This may result in the reorganization of business tasks into new processes. As part of the reorganization, the processes should be constructed in such a way that future changes become relatively easy to introduce. Second, internationalization of business requires the use of different natural languages in what is essentially the same business system. Third, the environment in which a business enterprise operates changes at a very rapid rate, and the enterprise has to make almost instantaneous responses to these changes [27]. A recent change is the fast growth of internet commerce, which, to consider just one of its effects, has resulted in a high degree of volatility in stock markets, which, in turn, may require reexamination of money management in an enterprise. This implies that business software systems must be easily adaptable.

Various commercial products have been developed that assist in building business software systems and integrating such systems. Examples of such products are the Baan (www5.baan.com), Peoplesoft (www.peoplesoft.com), and SAP/R3 (www.sesame.com) systems. There is a demand for the products - for example the 1998 revenues for Baan were $142 million. The commercial interest in systems for building business software emphasizes the need for a fresh examination of the foundations of conceptual modeling.
We examine domain models as an effective means for dealing with current developments, and Section 2 is an introduction to domain modeling as a metamodeling approach. In Section 3 we discuss domain modeling based on conceptual containment, and Section 4 deals with process structures. Internationalization is the concern of Section 5. Section 6 is a discussion of what scope a domain model should have. In Section 7 we discuss domain models and maintenance. Section 8 is a summary of our findings. Throughout we identify problems that are to be solved, and suggest how domain modeling can help solve the problems. Some of our suggestions refer to long-term projects.

2. DOMAIN MODELS

A domain has been defined as “a problem or task area in which multiple highly similar application systems will be developed to meet the particular requirements of several different customers” [47]. We shall see later on that this definition is not well formulated, but we recognize the usefulness of the terms problem area and task area. For example, shortage of personnel for dealing with an unexpected increase in customer enquiries at a bank may be the problem of interest. Its problem area is everything that these enquiries could relate to. The handling of a withdrawal from a bank account is an example of a task, and the totality of all banking transactions defines a task area. Actually, there is no need to define the term domain itself. Our real interest is in domain knowledge and domain models. In the broadest sense, domain knowledge relates to a discipline, such as software engineering, accounting, or insurance, and it is the entire corpus of data, rules, and processes that characterizes the discipline. Some of it is codified in standards and handbooks, or embedded in software, but most of it is distributed in the collective memory of practitioners of the discipline.
A domain model is an abstraction that consists of only those parts of the domain knowledge that are relevant to a particular purpose - domain analysis is the process of development of appropriate domain models. Here analysis refers to the examination of the domain and its problems, and appropriateness means that the domain model allows solutions to problems in a particular area of interest to be built up. After a solution has been defined, there is no longer a problem - the solution, which is a task or a sequence of tasks, has replaced it. This suggests that the distinction between problem and task areas is of little significance. In our view, whatever distinction there is, is one of specialization. If the purpose of a domain model is to arrive at task definitions, then a domain model should be a generic solution to a class of problems from which to extract a specialization, i.e., the definition of the tasks that constitute the solution of a particular problem in the class. In addition, a domain model may contain business rules [34], which we interpret as constraints that a specialization has to satisfy.

Although the importance of domain analysis is now realized, software developers are to some degree still uncertain what it means. The confusion is attributed to a lack of distinction between knowledge that relates to application domains, i.e., about problems to be solved, and knowledge that relates to implementation domains, i.e., about the tools, representations, and methods that are to help solve the problems [25].

The definition of domain cited above is symptomatic of persisting uncertainty. Thus, a domain exists even if it is uncertain that systems “will be developed” in it. Even if systems were to be developed, they do not necessarily have to be “highly similar”. Finally, even if highly similar systems were to be developed, there do not have to be “several different customers” - all the systems could be developed for a single customer. An introduction to domain analysis can be found in [36]. Different uses of domain analysis are discussed by Wartik and Prieto-Diaz [50]. A somewhat limited bibliography has been prepared by Rolling [43]. Glass and Vessey [25] survey taxonomies of domains. The early work on domain analysis did not make it clear just how a domain model is to be constructed, what representation is to be used, and how the details of a particular application system are to be derived from a domain model.

An attempt at structuring domains is the use of patterns, which has become an important part of object-oriented software development - for an introduction see, e.g., [11]. Under pattern-based software development, a design is an application-specific adaptation of a set of patterns. A collection of patterns gathered by Coplien [14] includes establishing the size of an organization, building teams by self-selection, establishing an apprenticeship program, and using scenarios to improve ultimate customer satisfaction. Whitenack’s catalog [52] includes requirements gathering, consideration of customer expectations, establishing good relationships with customers, and, at a specialized level, several patterns that deal with the development of a system of domain objects. Note that all these patterns relate to the software process, but patterns can define any application domain. Still, patterns have not found as wide an acceptance in industry as one would hope for.
This lack of success has been attributed to patterns being oversold, being hard to learn to use, and not being classified in a manner useful to practitioners [13]. Moreover, since patterns relate to structures larger than modules, procedures, or objects [15], they may be at too high a level of generality. This applies also to frameworks, which are class libraries, and even more general than patterns. Further, the confusion between application and implementation domains, which we referred to earlier, is particularly strong in work on patterns - for example, although the title of [22] seems to promise a book on patterns to be found in software, it actually deals with patterns for developing software. Viljamaa [48] sees patterns as primarily a tool for communication between people, which again indicates a high level of generality.

Detail can be introduced into the general picture by an ontological approach to the conceptualization of a business domain. Extending Mario Bunge’s work [9, 10], Wand and Weber [49] have defined 28 ontological constructs that are to act as components of domain models. Their basic construct is a thing. The other constructs relate to things, and include their properties, their states, events that change states, and a history, which is a chronologically ordered structure of states. These constructs define a metamodel for information systems modeling, but we need a construct that is more general than the thing construct. Such a construct is introduced in the next section.

An additional shortcoming of the definition in [47] is that a domain model is to relate not only to the development of application software, but also to an understanding of the application area. Therefore we need a model for the comprehension of the concepts that arise in the area, and a model or models for the representation of the processes that are to define a business enterprise. We shall therefore partition domain models into concept models and process models.
A concept model assists in the interpretation of the terminology and structure of a domain. A process model represents a generic process. The two models are related in that a concept model is needed to interpret the terms used in the process model. Of course, a separation between static and dynamic


phenomena is standard practice in information system modeling. What we are emphasizing here is that concepts are more than just things, and that the dynamic aspects should be captured in explicit process definitions. Note that process is itself a concept, which seems to imply that there should be just one model, but the general design principle of separation of concerns suggests otherwise. Earlier we referred to the confusion between application and implementation domains. An analogous confusion can arise regarding application and implementation processes. Separation of concerns suggests that an application process model be separated out from the application concept model. However, as regards implementation, where there are also a concept model and a process model, the application process becomes a component of the implementation concept model.

3. CONCEPTUAL CONTAINMENT

Disciplined domain analysis for domain comprehension can be achieved by means of conceptual containment. In a universe based on set theory everything belongs to the world of sets, and a set is so fundamental that it cannot be defined. Similarly, a concept cannot be defined. Whereas in set theory the important relation is membership, in concept theory the important relation is containment. Thus, the meaning of a concept is defined by the concepts that are contained in it. Conceptual modeling has been studied extensively at the University of Tampere. For a basic philosophical introduction see [30]. A more recent continuation of this work is to be found in [38], which has been followed up by an investigation of different kinds of conceptual containment from a philosophical point of view [39]. The importance of this work for us lies in its adaptation for modeling of business software systems by Kangassalo - an extensive outline of his approach is to be found in [29]. To distinguish the approach taken here from other approaches to conceptual modeling we shall refer to our approach as the Tampere method.

A concept is a cognitive interpretation of anything at all - a thing, a being, an action, even an emotion. All such concepts can be significant in the development of business software systems. However, only a limited number of concepts is important in a given situation, and conceptual modeling extracts from a universe, which we call the concept space, just those concepts that are relevant for a particular application. Moreover, not all the concepts that are contained in a relevant concept have relevance for the application. To express this more precisely, we introduce new terminology. Note first that conceptual containment has two guises. One we shall call descriptive containment; the other will be called restrictive containment.
Under descriptive containment the concept of a person is expressed in terms of the concepts name, date of birth, address, spouse (who is another person), etc. The fullest conceivable such description is an ideal concept that can never be realized. The realized part of an ideal concept, as in a data base, will be called a base concept. We also have to make precise what is meant by conceptual modeling. One of its purposes is to determine what base concepts are of interest to a business enterprise, and to define the concept space, which is the collection of all such base concepts, structured by containment. The other purpose is to extract from a base concept those components that are relevant for a particular application. Such abstractions of the base concept stand in the relation of restrictive containment to the base concept. Thus, if the base concept is that of a person, the abstractions may define the person as an employee, as an investor, as a member of a club, etc. We shall refer to such abstractions as specializations. The collection of specializations of different base concepts that are relevant for a particular application is a conceptual model. Of course, the specialization of a base concept is itself also a concept, and, in what follows, in the interests of brevity and literary style, we shall apply the term concept both to a base concept and to a specialization. When we talk of an instance of a concept, say that of a person, we have in mind a particular entity, i.e., a particular person in our context. We refer to this as the root entity of an instance of the concept. For every concept, we have a set of root entities, which we call the root set. As noted above, containment associates a set of contained concepts with a concept. Consider concept C. The concepts contained in C are functions or relations that map from the root set of C, but note that the targets of the mappings are themselves concepts as well. In the terminology of


the entity-relationship model [12], functions map to attributes - attributes are ground concepts in the sense that they do not contain further concepts within the framework of the conceptual model under consideration, but relations from the root set of C map to concepts that themselves contain further relevant concepts. In discussions of business software frequent use is made of the term Universe of Discourse (UoD). A domain model is the UoD, and a conceptual model based on containment is a particular representation of the UoD.

We now carry this further. With each root entity associate a set of states. A state relates to a specialization of a concept, and is defined informally by means of an example as follows. Suppose that root entity p belongs to the specialization K defined by relations rx, ry, and rz. Interpret p as the person JamesSmith, rx as the father-of relation containing the element <JamesSmith, JohnSmith>, ry as the mother-of relation containing the element <JamesSmith, MarySmith>, and rz as the sibling-of relation that maps to GeorgeSmith and GeorgetteSmith. Then a state of p with respect to K can be defined by the set {<rx, JohnSmith>, <ry, MarySmith>, <rz, {GeorgeSmith, GeorgetteSmith}>}. A state transition takes place when this set changes. In our case this would happen if JamesSmith were to acquire another sibling. The union of such sets is the state space of p, and a state transition occurs whenever p changes from one state to another. If a root entity can exist in just one state, then we refer to this entity as immutable. If it can exist in more than one state, then it is a mutable entity. Integers are immutable, but persons can be immutable or mutable, depending on the particular specialization of the base concept of person. For example, if the specialization consists of just the single function date-of-birth then the entity is immutable, but any specialization that contains the function age is mutable.
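The notions of specialization, state, and mutability can be illustrated with a minimal Python sketch. It is only one possible reading of the definitions above: the class name Specialization, the function state, the dictionary layout, the extra sibling, and the date-of-birth value are all assumptions made for the illustration and do not come from the Tampere method itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specialization:
    """A restriction of a base concept to the contained concepts it keeps."""
    name: str
    contained: frozenset  # names of the relations/functions that are kept


def state(base_record: dict, spec: Specialization) -> frozenset:
    """State of a root entity with respect to a specialization: the set of
    <relation, value> pairs for the contained concepts of that specialization."""
    return frozenset((rel, base_record[rel]) for rel in spec.contained)


# Base concept instance for the root entity JamesSmith. Values are kept
# hashable so that a state can itself be represented as a set.
james = {
    "rx": "JohnSmith",                                   # father-of
    "ry": "MarySmith",                                   # mother-of
    "rz": frozenset({"GeorgeSmith", "GeorgetteSmith"}),  # sibling-of
    "date-of-birth": "1970-01-01",                       # invented value
}

K = Specialization("K", frozenset({"rx", "ry", "rz"}))
BIRTH = Specialization("birth-only", frozenset({"date-of-birth"}))

k_before, birth_before = state(james, K), state(james, BIRTH)
james["rz"] = james["rz"] | {"GillianSmith"}  # JamesSmith acquires a sibling (hypothetical)
k_after, birth_after = state(james, K), state(james, BIRTH)

print(k_before != k_after)          # a state transition with respect to K
print(birth_before == birth_after)  # no transition with respect to BIRTH
```

The same change to the base concept is a state transition under K but not under the birth-only specialization, which is the sense in which mutability depends on the specialization rather than on the entity.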

4. PROCESS STRUCTURES

In order to fully understand a mutable entity, it has to be understood what state transitions can occur, and under what conditions they occur. We shall refer to a sequence of state transitions as a process. Processes have been represented by a variety of notations. The more important ones are data flow diagrams [18, 23], state transition diagrams - particularly statecharts [26, 21], and Petri nets [41, 42]. All these representations show that there is some kind of flow from one node to another, but they differ in the way they express the flow, and what happens at the nodes. Thus, data flow diagrams are based on an assumption that there is an actual flow of data within an information system. This is true for some workflow systems in which a document is moved from one workstation to another. But in a system based on a centralized data base the flow is of signals rather than of actual data. These signals initiate tasks. We have discussed this in some detail elsewhere [5]. The flow of signals (messages) as a device for combining tasks into processes is one of the basic characteristics of object orientation [51]. A natural correspondence between reality and representation thus results under object orientation, which is one reason for its popularity.

State transition diagrams are digraphs that show clearly what states are possible (the nodes), and what transitions between them may take place (the arcs). However, they have to be combined with other kinds of diagrams, as in UML [21], and it may become difficult to maintain a clear understanding of how the diagrams are related, i.e., of the exact nature of the complete information system. This applies also to statecharts. Tools allow a statechart representation to be transformed into an executable program. This implies that the statechart formalism is based on sound semantics, but it also implies a level of detail that may obscure the high-level structure of an information system.
Petri nets are very popular among theoreticians, but not so much among practitioners. The reasons for their popularity are their expressive power (e.g., time Petri nets are equivalent to Turing machines [3]), the many results that have been gathered regarding their properties, and the availability of a variety of tools. On the negative side, the great amount of research has resulted in a confusing profusion of different types of Petri nets. For example, in some nets tasks are represented by transitions, and states by places. A task is then viewed as an activity that effects a state transition. But, while a task takes some non-zero time, a transition is instantaneous - this leads to a representation in which tasks are represented by places (since a token may remain in a place for non-zero time, the duration of a task can be represented by the time a token spends


in the place). Also, a Petri net of even a fairly small system can become very large. The size problem can be solved by modularization - an example is the G-net [19]. One also hears that Petri nets are too difficult for practitioners, but the author’s experience with his students, who have little enthusiasm for matters theoretical, shows that they can use Petri nets competently after not much training (as long as the Petri nets are introduced by examples rather than by complex graph-theoretical definitions).

In process modeling it is necessary to consider methods and tools, information, and people. The Tampere method allows these three components to be considered together. In defining the processing of housing loan applications, say, the tasks in this process need inputs from people, there may be elaborate procedures to follow, and references may have to be made to an information base. Similarly, the metaprocess followed in defining the loan-application process would require various actions to be performed by people with different capabilities, a software process model would be followed (see, e.g., [1] for representations of such models), and process knowledge would have to be extracted from a process knowledge base. How exactly the three components of a process or a metaprocess, particularly the latter, are to be put together, is an important research topic for the future. The Tampere method provides a framework in which such an investigation can be carried out. It is most important that each element of relevance in the description of process or metaprocess be fully understood, and conceptual containment can provide the understanding. Understanding of the relevant elements is necessary if automation of a business process is to be furthered, and full understanding is necessary for complete automation.
It should also be investigated to what extent the Tampere method can make patterns and frameworks more accessible, and thus have them adopted by a larger body of practitioners. In particular, this approach may lead to the realization that patterns and frameworks are not just something appended to object-oriented design, but that they are of deep significance that is largely independent of any particular design paradigm. Indeed, good design has always implicitly depended on patterns and frameworks. What is new is that explicit definitions are facilitating their interchange.

The explicit elaboration of the actual “process nature” of a process needs a different approach. Conceptual containment allows us to regard a process as an independent concept made up of tasks, which are contained concepts. If execution of task T is to be followed by execution of tasks P and Q in parallel, this is easily modeled by having a relation is-followed-by with elements <t, p> and <t, q>, where t, p, and q are the respective root entities of the task concepts. But we also need to show under what conditions and by what agency a task is initiated, what activities then take place, the effect of these activities on the information base, and the triggers that cause the task to initiate other tasks. To deal with these needs we introduced in [6] a format for the tasks. A task is represented as a pattern, but our interpretation of a pattern puts it at a much lower level of generality than is commonly assumed, e.g., one of our patterns defines the highly specific activities associated with the reservation of a rental car. Our format for a pattern has five components, as follows:

Triggered by, which establishes what it is that initiates the activities grouped under the pattern;

Activities, which gives an outline of all activities that are part of the pattern, an indication of the conditions under which each activity takes place, and an indication of how the activities are related to each other - some of the activities will be performed by software, others by people, but, in the latter case, the software is expected to provide prompts and guidance that tell people when and how to perform the activities;

Information base changes, which indicates those parts of the information base supporting the process that are to be changed by the activities of the pattern;

Affects, which identifies all patterns that can be initiated by this pattern, and states the conditions under which these other patterns would be initiated;

Notes, which can contain any information deemed relevant by the author of the pattern, e.g., an explicit indication of what is not considered in the pattern.
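As a rough illustration, the five components can be held in a small record type. The Python sketch below is an assumption about how such a pattern might be stored and printed; the class TaskPattern, its field names, and every detail of the car-reservation example (activities, conditions, the patterns AssignCar and OfferAlternative) are hypothetical and are not taken from [6].

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskPattern:
    """One task in the five-component format (field names are assumptions)."""
    name: str
    triggered_by: List[str]
    activities: List[str]
    information_base_changes: List[str]
    affects: Dict[str, str]  # initiated pattern -> condition for initiating it
    notes: str = ""

    def render(self) -> str:
        """Return the pattern as plain text, one component per line."""
        affects = "; ".join(f"{p} (if {c})" for p, c in self.affects.items())
        return "\n".join([
            f"Pattern: {self.name}",
            "Triggered by: " + "; ".join(self.triggered_by),
            "Activities: " + "; ".join(self.activities),
            "Information base changes: " + "; ".join(self.information_base_changes),
            "Affects: " + affects,
            "Notes: " + self.notes,
        ])


# Hypothetical content for a rental-car reservation task.
reserve_car = TaskPattern(
    name="ReserveCar",
    triggered_by=["customer request entered by a clerk"],
    activities=[
        "software prompts the clerk for dates and car class",
        "software checks availability",
        "clerk confirms the reservation with the customer",
    ],
    information_base_changes=["a reservation record is added"],
    affects={
        "AssignCar": "a car of the class is available",
        "OfferAlternative": "no car of the class is available",
    },
    notes="Cancellation is not considered in this pattern.",
)

print(reserve_car.render())
```

Keeping Affects as a mapping from pattern name to condition means the links between tasks stay explicit in the data, so they can later be collected across all patterns of a process.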


This format shows clearly and explicitly how the tasks are interrelated to form a process. The Triggered by component indicates what other task (or a person, or a data base trigger) initiates this task. The Affects component tells what other tasks can be initiated by this task. The format allows easy construction of any of the graphical representations discussed above. Conversely, if a graphical representation has been developed first, the user has full knowledge of what patterns are needed, and can start filling in their text. It is important to note that the patterns contain full information for an implementation [7] - for example, if the implementation is to be object-oriented, the Activities component tells what methods are needed. By looking at the Information base changes components of all the tasks, the structure of the information base can be arrived at. We consider the relative generality of our patterns to be their principal advantage. Detailed representational issues do not have to be considered, which allows the software developer to concentrate on matters of consequence.

The effectiveness of the five-component patterns has been demonstrated by means of undergraduate team projects. For example, using this approach, a non-trivial information system for a car-rental agency was implemented and thoroughly tested in approximately 270 person-hours. This process was defined by 15 tasks [6]; for details on the student projects see [7]. The projects will be further discussed in Section 6. We believe that the success of the projects can be attributed to the understandability of the patterns. Davenport [17] quotes Tom Peters as saying that success in information management is 5% technology and 95% psychology - apparently our patterns get the psychology right.

A special instance of process is provided by workflow systems [33, 24, 44, 28]. These systems represent counterparts in the information world of industrial assembly lines.
Instead of components being fitted into a piece of machinery, say, as it moves along an assembly line, different processing steps are carried out on a document as it moves from workstation to workstation. Here it is legitimate to talk of a data flow, and the representation of data flow in terms of our five-component patterns is straightforward. A possible problem is the ambiguity of natural language, but in [8] we show that there are advantages to formulating requirements in natural language. Moreover, as the next section shows, these advantages apply also to domain models defined in natural language.
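To make the step from patterns to a graphical representation concrete, the following Python sketch derives a digraph from the Affects components alone. It is an illustration under stated assumptions: the task names are invented for a loan-application process, conditions on the arcs are omitted, and the output uses the Graphviz DOT text format simply as one convenient notation for a digraph.

```python
from typing import Dict, List, Set

# Affects components of a hypothetical loan-application process:
# each task is mapped to the tasks it can initiate.
AFFECTS: Dict[str, List[str]] = {
    "ReceiveApplication": ["CheckCredit", "AppraiseProperty"],  # run in parallel
    "CheckCredit": ["DecideLoan"],
    "AppraiseProperty": ["DecideLoan"],
    "DecideLoan": [],
}


def triggered_by(affects: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert Affects: for each task, the set of tasks that can initiate it."""
    result: Dict[str, Set[str]] = {task: set() for task in affects}
    for task, successors in affects.items():
        for successor in successors:
            result.setdefault(successor, set()).add(task)
    return result


def entry_tasks(affects: Dict[str, List[str]]) -> List[str]:
    """Tasks not initiated by another task, i.e. started by a person or trigger."""
    return sorted(t for t, preds in triggered_by(affects).items() if not preds)


def to_dot(affects: Dict[str, List[str]]) -> str:
    """Write the task digraph in DOT notation, one arc per Affects entry."""
    lines = ["digraph process {"]
    for task, successors in affects.items():
        for successor in successors:
            lines.append(f'  "{task}" -> "{successor}";')
    lines.append("}")
    return "\n".join(lines)


print(entry_tasks(AFFECTS))
print(to_dot(AFFECTS))
```

Inverting the Affects mapping also gives a consistency check on a set of patterns: each task's Triggered by component should name exactly the tasks that list it under Affects, plus any external agents.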

5. COMMUNICATION IN NATURAL LANGUAGE

The problems raised by development of software for different markets have been discussed quite extensively [20, 46, 32, 37, 53, 40]. Some relate to usage differences in different languages (English and Japanese), others to cultural differences between countries that share the same language (England and Canada, France and Canada, Portugal and Brazil). Chinese ideograms are an ingenious system of communication between peoples who speak different languages. We are now reaching a stage at which non-native speakers of English put together English words as if they were ideograms. The results do not always obey the norms of English usage. Two questions arise. First, if the meaning of a message is not compromised by the non-standard usage, does usage matter? An analogous situation existed in the Middle Ages when Latin was the language of science and of any other domain for which native languages were thought inadequate. However, this was not the “standard” Latin of Caesar or Cicero. Second, how can we prevent non-standard usage, or, when non-standard usage has altered the intended meaning of a message, how are we to restore the intended meaning? Here we shall consider the second question alone. Our concern will be to find a way in which people with different cultural and linguistic backgrounds can communicate in English without their messages losing much of their intended meaning. We consider a conceptual model based on containment as an effective means of achieving this.

Anomalies in the use of natural language belong to three main categories: spelling mistakes, syntax or sentence construction, and cultural effect. By cultural effect we mean the change of meaning a message undergoes when it is interpreted under two different frames of reference. We shall ignore spelling, look very superficially at sentence construction, and concentrate on the cultural effect.


Most communications with which information systems have to deal follow what we shall call a wh-pattern. This means that in addition to defining what their target of concern is, they are also interested in locating it in time (when) and space (where), and in identifying participants (who). Students of journalism are taught the importance of the wh-components from the first day of their training. The wh-components are equally important in business communication. Let us look at an example: “The estimated sales figure for Stockholm for 2000, as supplied by the head office, is SEK 8,000,000”. Here the what-component is “The estimated sales figure is SEK 8,000,000”, the where-component is “Stockholm”, and the when-component is “2000”. The who-component is “the head office”, and here it identifies the source of the information. In some languages the order of the wh-components is strictly fixed. In others, such as English, there are no ordering rules, and the order may be used to indicate the relative significance of the wh-components. This brings us to cultural effect. When somebody with a good sense of English carefully arranges the wh-components of a communication so that particular emphasis is put on location, say, and the message is interpreted by somebody with no ear for subtle stylistic hints, the intended emphasis is likely to be lost. This suggests that business language be explicit and robust. Explicitness requires that dependence for interpretation of a message on a particular cultural context be minimal. Thus metaphors are to be avoided. For example, red stands for danger in some cultures, but for celebration in others [35]. Robustness implies avoidance of double negatives and other potential sources of confusion. However, as much as we may wish the explicitness and robustness recommendation to be followed, reality will be different, and we have to find ways of dealing with this reality.
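The wh-pattern can be made concrete with a small data structure. The following is only a minimal sketch: the class name `WhMessage`, its field names, and the fixed rendering order are assumptions introduced for illustration and are not part of the approach described in this paper. It records the sales-figure example with its wh-components kept explicit, so that emphasis does not depend on word order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WhMessage:
    """Hypothetical container for the wh-components of a business message."""
    what: str
    where: Optional[str] = None
    when: Optional[str] = None
    who: Optional[str] = None

    def missing(self) -> List[str]:
        """Return the names of wh-components that were left unspecified."""
        return [name for name in ("where", "when", "who")
                if getattr(self, name) is None]

    def render(self) -> str:
        """Emit the components in one fixed, explicitly labeled order."""
        parts = [f"what: {self.what}"]
        for name in ("where", "when", "who"):
            value = getattr(self, name)
            if value is not None:
                parts.append(f"{name}: {value}")
        return "; ".join(parts)


# The example from the text, with each wh-component stated explicitly.
sales = WhMessage(
    what="The estimated sales figure is SEK 8,000,000",
    where="Stockholm",
    when="2000",
    who="the head office",
)
```

Because every component carries its own label, a reader with no ear for stylistic hints still sees which part is the location and which is the source.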
Although it seems that the omission or misuse of articles, and the wrong use of number and tense would create frustrating problems, this is not so. When a particular written language puts little emphasis on such matters, and somebody brought up on this language writes “Programmer debug own program,” the phrase is easily understood. I have had to deal on a daily basis with material written by Chinese and Americans. Surprisingly and counterintuitively, I have found that although the writing by the Chinese is far from standard English, there is less room for misinterpreting their writing than that produced by Americans. The “Chinese English” is more robust. We take note of this in looking for a way of assisting a writer not in full command of English syntax to arrive at “standard” prose. Our suggestion is that the writer should start by creating a conceptual structure and that a text generator is to help the writer to transform this conceptual structure into stylistically acceptable prose. The conceptual structure would be similar to the structure that results when Chinese ideograms are combined, but its exact form will be determined by the need to couple it to a text generator, and this remains a research topic. As an example of the problems to be solved, suppose one wanted to ask for mild cheese in Italian. A dictionary may suggest asking for “pious” cheese instead of the correct “sweet” cheese. This is the kind of problem that after the early enthusiasm for machine translation around the year 1950 led to disenchanted dismissal of its practicability in the later 1950s. There is now a realistic appreciation of the difficulty of natural language processing, and we understand that it requires the solving of very many highly specific problems, e.g., problems that may relate to just one particular word. By means of this step-by-step approach we have built up a repertoire of problem-and-solution patterns, which today allow natural language processing of fairly high quality.
Nevertheless, a formidable set of problems remains, and, as one problem is solved, new ones take its place. This is inevitable because the domains in which natural language is applied keep changing. This means that natural language systems have to grow in size, in particular the procedures that are to deal with cultural effects. More and more people from more and more cultures are using English for business communications. Hence the number of different use and misuse patterns for English keeps growing. The ideal solution for avoiding misinterpretation due to cultural effects is to insist on explicitness and robustness, but, particularly in prose aimed at persuasion, metaphors will be used, and they can be misinterpreted. A possible solution is to flag expressions that do not meet the explicitness and robustness requirements. A concept model could be an important tool for recognizing such expressions. To make the discussion concrete, let us consider software development as the domain. The software development process has two aspects, a managerial and a technical, and different representations deal with the two aspects. They correspond to the concept model and the process model respectively. The most widely known representative of the managerial aspect is the Capability Maturity Model of the SEI [45]. Of technical models there is a good number - some are surveyed in [1]. If now a concept model were constructed by reference to the managerial view of software development, textual expressions could be flagged for which the concept model cannot provide an interpretation. Making sure that all expressions are explicit and robust may not be enough. In some cases they will have to be translated into a different language. Our suggestion is to build isomorphic concept models in the different languages, to construct the intended process model with reference to the concept model relating to one’s native language, and then use the isomorphism to create a semantically equivalent process model in the other language. How exactly this is to be achieved remains a research topic.
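The idea of flagging expressions for which the concept model cannot provide an interpretation can be illustrated with a deliberately simple sketch. Everything in it is an assumption made for illustration: the concept model is reduced to a flat set of terms (`CONCEPT_TERMS`), function words are listed by hand, and the function name `flag_uninterpreted` is hypothetical. A real concept model based on containment would be a structured network rather than a word list.

```python
import re
from typing import List, Set

# Hypothetical vocabulary standing in for a concept model of the
# managerial view of software development (an assumption, not from the paper).
CONCEPT_TERMS: Set[str] = {
    "project", "schedule", "review", "defect", "milestone", "team",
}

# Function words that need no interpretation by the concept model.
FUNCTION_WORDS: Set[str] = {
    "the", "a", "an", "is", "of", "for", "to", "and", "in", "on", "has", "was",
}


def flag_uninterpreted(message: str) -> List[str]:
    """Return, in order of first appearance, the words of a message
    that neither the concept vocabulary nor the function-word list covers."""
    flagged: List[str] = []
    for word in re.findall(r"[a-z]+", message.lower()):
        if word in CONCEPT_TERMS or word in FUNCTION_WORDS:
            continue
        if word not in flagged:
            flagged.append(word)
    return flagged


# A metaphor leaves words that the toy model cannot interpret.
flags = flag_uninterpreted("The project schedule is on thin ice")
```

In this toy setting the metaphor "thin ice" is flagged because neither word belongs to the vocabulary. What should then be done with a flagged expression is left open here.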

6. SCOPE OF DOMAIN MODELS

Before discussing the scope of domain models, let us consider the “ownership” of domain models. Davenport [16] advocates not only that an organization be considered as a collection of processes, but that the “owner” of a process be also the “owner” of the data that the process relates to. These data are for most organizations their most valuable asset. Therefore it has to be clearly understood who is responsible for them. Of course, in terms of physical location, all process data can reside in a centralized data base, and the owner of the data relating to a particular process can delegate much authority to the data base administrator. This suggests that each process be a domain. On the other hand, a business rule may relate to more than one process. Further, a generic rental process and a generic purchase process have much in common, so that there would be much duplication if they were to be regarded as fully independent. Also, some processes, e.g., schedulers, make use of algorithms or heuristic procedures that are best obtained from some global repository. We face two challenges. Each generic process is to be defined by an independent process model, but the models are to be related. Resolution of this apparent conflict is the first challenge. We call it the dependence problem. Second, interpretation of a process model requires reference to the concept model. This we call the reference problem. For a solution to the dependence problem we turn to object orientation. It is characterized by data encapsulation, inheritance, and communication by message passing [51]. A process model can then be made to correspond to a framework. Activities of a task, the information base components to which they refer, and other tasks being initiated by the task are part of a task definition, but some of these other tasks and some components of the information base affected by this task can belong to a different process model.
Such would be the case if a purchasing process were to be developed that refers to an already existing rental process. Another instance is the reference to an algorithm that is to deliver a result needed by a task. The five-component representation of a task of Section 4 allows these external components to be referenced by (implicit) message passing. However, the five-component task definitions already define a specific process. The generic process is defined by unformatted capsule descriptions - the fifteen capsules of the generic rental process are to be found in [6]. Here we show the text of just one of them, the reservation capsule: A customer makes a reservation of a rental object for a length of time starting at an indicated date and/or time. Variants of the basic pattern include (a) group reservations, (b) indication of just the desired starting point and no indication of the length of rental, (c) no indication of a starting point, as in the case of a library book that is currently out on loan, (d) confirmed reservation, (e) overbooking, in anticipation of a cancellation, (f) no prior reservation, with the arrival of a customer interpreted as a reservation, (g) in case of shortage of rental objects, a customer may be put into a wait line, but be allowed to make a reservation if the shortage is no longer in effect, (h) for some applications, the rental site and the return site of a rental object may differ, (i) the credit-worthiness of a customer may have to be established as part of this task. The natural-language capsules are checklists from which to construct a specific application model, e.g., a rental process for cars. The hard part of software development is in most cases the consideration in advance of all the special cases and exceptions that could arise. The capsules define the normal processing steps for each task, but their main purpose is to list all known abnormal situations, and to give hints on how to deal with them. The capsules grow in time as more abnormal situations are being thought of or are being met in practice. The reference problem arises when a capsule description contains a term that the reader does not understand. The context in which the term is found in the concept model should allow it to be understood, but a convenient way of linking a process model to the concept model remains to be found. Hypertext could be an effective way of implementing the linkages. Whereas a process model defines a single process, the scope of the concept model has to be much broader. It could relate to all activities of an enterprise, or it could cover an entire industry. One reason for the broadened scope is that a business rule may refer to more than one process. Under a process model, reference can of course be made to components of other processes, so that a business rule can become part of a task definition, but the business rule has to be obtained from somewhere in the first place. The appropriate repository for business rules is the concept model. The reference problem remains significant for business processes throughout their lifetimes. The user of an algorithm does not have to know in detail how the algorithm produces a result. In a business process, however, software and people may be constantly interacting, and people may override software decisions. This means that the users of a business software system have to understand the system very thoroughly, implying that they must have access not only to its process models, but also to the concept model so that they can interpret the process models correctly. The definition of a process can also be misunderstood, and formal specification may have to be used to prevent this.
The form of the five-component pattern introduced in Section 4 allows the patterns to be easily translated into the formal specification language SF (for an introduction to SF see [5]). SF stands for Sets-Functions, and specifications of tasks are expressed in terms of sets, functions, and expressions in logic. Real-time processes are allowed for, and formal semantics for the component that assembles tasks into processes is provided by time Petri nets (for time Petri nets see [3]).
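Returning to the natural-language capsules described earlier in this section, their role as checklists can be sketched in code. This is an illustration under stated assumptions only: the `Capsule` class, the abbreviated variant wordings, and the particular choice of variants for car rental are all hypothetical. The paper defines capsules as unformatted text and does not prescribe which variants a car-rental application selects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Capsule:
    """Hypothetical structured form of a natural-language capsule."""
    name: str
    basic_pattern: str
    variants: Dict[str, str]

    def specialize(self, selected: List[str]) -> Dict[str, str]:
        """Pick the checklist items relevant to one specific application."""
        unknown = [key for key in selected if key not in self.variants]
        if unknown:
            raise KeyError(f"variants not in capsule: {unknown}")
        return {key: self.variants[key] for key in selected}


# The reservation capsule; variant texts are abbreviated from the paper.
RESERVATION = Capsule(
    name="reservation",
    basic_pattern=("A customer makes a reservation of a rental object for a "
                   "length of time starting at an indicated date and/or time."),
    variants={
        "a": "group reservations",
        "b": "starting point given, length of rental not given",
        "c": "no indication of a starting point",
        "d": "confirmed reservation",
        "e": "overbooking in anticipation of a cancellation",
        "f": "no prior reservation; arrival is interpreted as a reservation",
        "g": "wait line in case of shortage of rental objects",
        "h": "rental site and return site may differ",
        "i": "credit-worthiness of the customer to be established",
    },
)

# Assumed selection for a car-rental application (illustrative only).
car_rental = RESERVATION.specialize(["d", "e", "h", "i"])
```

Keeping the variants keyed by their letters makes it easy to add a newly discovered abnormal situation to the generic capsule without disturbing existing specializations.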

7. MAINTENANCE OF BUSINESS SOFTWARE SYSTEMS

In the introduction it was pointed out that rapid changes in the environment in which business software systems are deployed necessitate rapid adaptation of the systems to the changes. This is a major purpose of maintenance. We shall not survey maintenance as such in any detail here because we have already done so elsewhere [4], but it is essential to relate configuration management to maintenance in our context. Configuration management is to ensure that all representations and documentation of a software system are consistent. They consist of a process model, detailed design, code, maintenance manuals, etc. Strict configuration management must be practiced by an organization that aspires to a high level of capability maturity of its software development process [45]. This implies that when changes are made at the code level, corresponding changes have to be made everywhere else as well. Since this means that the process model is to be changed in any case, it is more effective to change the process model first, and then make the other changes. This approach does not require any more recoding than a bottom-up approach. The main problem for maintenance personnel is the understanding of the system to be changed. A domain model is to help in this, but very often there does not exist a domain model. When this is so, construction of a domain model is a legitimate initial maintenance task. Understanding of the system is gained by reverse engineering of the system, and the target of reverse engineering is to be a process model of the application. But it is difficult to define this application model, and then to understand it properly, unless there is a domain model. We suggest, therefore, that there always be a domain model, and that a process model for the application under the new requirements be derived from this domain model. 
When reference can be made to a domain model, reverse engineering becomes manageable, and the differences between the existing system and the required system can be readily defined. Such an approach has several advantages. First, a domain model allows the requirements for the modified system to be expressed more precisely than would be possible without the model. Second, the model may show that the building of a totally new system may be more effective than an attempt to salvage something from an existing system. Third, future maintenance efforts will be easier to define and to interpret if a domain model is in place. Since the use of the domain model is not limited to a single application, the cost of its construction can be spread out. Our two domain models, the concept model and the process model, are related in that the concept model helps to understand the process model. The relationship is particularly significant in the case of maintenance. As noted in the preceding section, the generic process model functions as a checklist from which to extract items of significance for the specific application being defined. If the generic model does not yet exist, it has to be constructed, and reverse engineering of an existing system, together with consultation of domain experts, will help in this. If there already exists a generic process model, it may have changed considerably since the specialized system under consideration was built (or was last subjected to maintenance). These changes are highly relevant for maintenance. If a concept model already exists, it, too, may have undergone changes since the specialized system was built (or last modified), and these changes have to be reflected in the new version of the system. If the concept model does not exist, it has to be constructed, and it is then to be consulted for the interpretation of the terminology, for understanding of how the different concepts of a domain are related, and for helping define the structure of the information base of an application. It should be noted that there are times when the process model can change without a change in the concept model, and vice versa. For example, if the ordering of the tasks constituting a process is changed, or a greater degree of parallelism introduced, this is unlikely to have any effect on the concept model.
Since the concept model is primarily there to aid the cognitive understanding of the domain in which processes operate, a rearrangement of the conceptual structure may be undertaken to improve such understanding, but this need not affect the process model, i.e., it may be possible to leave the definitions and ordering of tasks unchanged. The precise determination of how and when changes in one model affect the other is an important topic for further study. Basili [2] argues that maintenance and reuse are essentially the same in that an extensive maintenance project can be regarded as the construction of a new system in which much of the existing system is being reused. It is important to note, though, that the identity of maintenance and reuse does not hold in the context of a single business software system because reuse components do not usually come from just one existing system, but also from a reuse library. If now the reuse library is based on a domain model, then maintenance becomes in the first place the adaptation of the domain model. The model for a particular application is then extracted from this domain model, and code is generated with reference to this application model. Indeed, domain analysis is often seen primarily as a means of aiding reuse. We differentiate between maintenance and reuse, and, to explain the difference, introduce the terms context evolution and context switch. Context evolution relates to maintenance - it occurs when there is a change in the domain. This is to bring about a change in the domain model, but the change may affect the specializations of a generic process model differently. For example, electronic reservations can become significant for car rental but are unlikely to do so for video rental. Context switch relates to reuse. Suppose we have implemented a software system for car rental.
When the context switches to video rental, the appropriate specialization of the generic process model does not differ all that much from the specialization for car rental, so that a significant amount of code reuse can be achieved.
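The distinction between context evolution and context switch can be sketched with inheritance, in the spirit of the object-oriented frameworks mentioned in Section 6. The sketch below is an assumption-laden illustration: the class names, the task names, and the decision that video rental works without prior reservation are hypothetical and serve only to show how specializations of one generic process share most of their tasks.

```python
from typing import List


class GenericRental:
    """Hypothetical generic rental process; task names are illustrative."""

    def tasks(self) -> List[str]:
        return ["reserve", "check_out", "return_object", "bill"]


class CarRental(GenericRental):
    """Context evolution: electronic reservation becomes significant."""

    def tasks(self) -> List[str]:
        return ["reserve_electronically" if task == "reserve" else task
                for task in super().tasks()]


class VideoRental(GenericRental):
    """Context switch: assumed here to work without prior reservation."""

    def tasks(self) -> List[str]:
        return [task for task in super().tasks() if task != "reserve"]


def reused_tasks(source: GenericRental, target: GenericRental) -> List[str]:
    """Tasks of the source specialization that the target also needs."""
    target_tasks = target.tasks()
    return [task for task in source.tasks() if task in target_tasks]


shared = reused_tasks(CarRental(), VideoRental())
```

In this toy model three of the four car-rental tasks carry over to video rental, which mirrors the claim that a context switch allows a significant amount of reuse, while the electronic reservation task stays confined to the specialization affected by the evolution.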

8. CONCLUSIONS

The main contribution of this survey is the demonstration that there should be two kinds of domain models, which we call concept models and process models, and we consider conceptual containment to be an effective structuring device for concept models. We introduced some new terminology, such as descriptive and restrictive containment, and root entity and root set, which should lead to better understanding of concept models. We have had some encouraging practical experience with process models based on natural-language capsule descriptions, particularly for rental processes [7]. Because car rental belongs to everyday experience for most American students, they found no need to develop an explicit concept model for their car-rental project. However, when a domain is less familiar, a concept model is essential for capturing and preserving the knowledge of domain experts. Therefore, a convenient mechanism should be found for relating process models to concept models. This remains a research topic. Business rules create a particular problem. It is not clear whether they should be part of the concept model or the process model. Business software does not just drive business processes. Increasingly it is to assist in business communication, particularly when the communication is between people with different native languages and cultural backgrounds. We recommend that business communications be explicit and robust, and suggest that domain models can help achieve explicitness and robustness. A concept model is to help set up messages as structures similar to those that relate ideograms. These structures are to be transformed into explicit and robust text in, say, English by a language generator, but how this is to be achieved in practice remains a research topic. A more difficult topic is how to deal with messages that are not explicit and robust. A concept model is to help recognize such messages, but it is not clear what can be done with the messages to improve their understandability after the recognition. If business enterprises are to be regarded as collections of processes, then it is to be expected that most of the messages will refer to processes. This means that linkages from process models to concept models are important for improving the messages. We noted that maintenance and reuse are related, but that one relates to context evolution and the other to context switch. We have shown by means of student projects [7] that a generic process model for rentals can be reused to create specialized processes for car rental, video rental, and even the “rental” by students of places in courses (i.e., student registration). We are now examining generic manufacturing and resource scheduling processes.
Our aim is to establish what determines the complexity of a process, and, since these processes are more complex than rental or purchasing processes, comparison of the less and more complex processes should assist us with this objective.

Acknowledgements - This work was performed while the author was on sabbatical leave in Kaiserslautern. Support was provided by the Fraunhofer-Gesellschaft (Einrichtung Experimentelles Software-Engineering) and the University of Kaiserslautern (Sonderforschungsbereich 501). The support is gratefully acknowledged. Suggestions by referees and editorial comments by Kalle Lyytinen have greatly improved the presentation of the material. If anything still remains unclear, the author alone is to blame.

REFERENCES

[1] P. Armenise, S. Bandinelli, C. Ghezzi, and A. Morzenti. A survey and assessment of software process representation formalisms. International Journal Software Engineering and Knowledge Engineering, 3:401-426 (1993).
[2] V.R. Basili. Viewing maintenance as reuse-oriented software development. IEEE Software, 7(1):19-25 (1990).
[3] B. Berthomieu and M. Diaz. Modeling and verification of time dependent systems using time petri nets. IEEE Transactions on Software Engineering, 17:259-273 (1991).
[4] A.T. Berztiss. Reverse engineering, reengineering, and concurrent engineering of software. International Journal Software Engineering and Knowledge Engineering, 5:299-324 (1995).
[5] A.T. Berztiss. Software Methods for Business Reengineering. Springer-Verlag (1995).
[6] A.T. Berztiss. Domains and patterns in conceptual modeling. In Information Modelling and Knowledge Bases VIII, pp. 213-223, IOS Press (1997).
[7] A.T. Berztiss. Failproof team projects in software engineering courses. In Proc. 27th Frontiers in Education Conference, Pittsburgh, PA, pp. 1015-1019, IEEE CS Press (1997).
[8] A.T. Berztiss. Natural-language-based development of information systems. Data & Knowledge Engineering, 23:47-57 (1997).
[9] M. Bunge. Treatise on Basic Philosophy: Volume 3: Ontology I. Reidel (1977).
[10] M. Bunge. Treatise on Basic Philosophy: Volume 4: Ontology II. Reidel (1979).
[11] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal. A System of Patterns - Pattern-Oriented Software Architecture. Wiley (1996).
[12] P.P. Chen. The entity-relationship model: toward a unified view of data. ACM Transactions on Database Systems, 1:9-36 (1976).
[13] M.P. Cline. The pros and cons of adopting and applying design patterns in the real world. Communications of the ACM, 39(10):47-49 (1996).
[14] J.O. Coplien. A generative development-process pattern language. In Pattern Languages of Program Design, pp. 184-237, Addison-Wesley (1995).
[15] J.O. Coplien. Idioms and patterns as architectural literature. IEEE Software, 14(1):36-42 (1997).
[16] T.H. Davenport. Process Innovation: Reengineering Work through Information Technology. Harvard Business School Press (1993).
[17] T.H. Davenport. Information Ecology. Oxford University Press (1997).
[18] T. DeMarco. Structured Analysis and System Specification. Yourdon Press (1978).
[19] Y. Deng and S.-K. Chang. A g-net model for knowledge representation and reasoning. IEEE Transactions on Knowledge and Data Engineering, 2:295-310 (1990).
[20] Digital. Digital Guide to Developing International Software. Digital Press (1991).
[21] M. Fowler and K. Scott. UML Distilled: Applying the Standard Object Modeling Language. Addison-Wesley (1997).
[22] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns - Elements of Reusable Object-Oriented Software. Addison-Wesley (1995).
[23] C. Gane and T. Sarson. Structured Systems Analysis: Tools and Techniques. Prentice-Hall (1979).
[24] D. Georgakopoulos, M.F. Hornick, and F. Manola. Customizing transaction models and mechanisms in a programmable environment supporting reliable workflow automation. IEEE Transactions on Knowledge and Data Engineering, 8:630-649 (1996).
[25] R.L. Glass and I. Vessey. Contemporary application-domain taxonomies. IEEE Software, 12(4):63-76 (1995).
[26] D. Harel. Statecharts: a visual formalism for complex systems. Science of Computer Programming, 8:231-274 (1987).
[27] G.P. Huber. The nature and design of post-industrial organizations. Management Science, 30:928-951 (1984).
[28] M. Jackson and G. Twaddle. Business Process Implementation: Building Workflow Systems. Addison-Wesley (1997).
[29] H. Kangassalo. Comic - a system and methodology for conceptual modelling and information construction. Data & Knowledge Engineering, 9:287-319 (1992/93).
[30] R. Kauppi. Einführung in die Theorie der Begriffssysteme. Acta Universitatis Tamperensis, Ser. A, Vol. 15. University of Tampere (1967).
[31] D. Kroenke. Management Information Systems. McGraw-Hill (1992).
[32] T. Madell, C. Parsons, and J. Abegg. Developing and Localizing International Software. Prentice-Hall (1994).
[33] D.E. Mahling, N. Craven, and W.B. Croft. From office automation to intelligent workflow systems. IEEE Expert, 10(3):41-47 (1995).
[34] P. McBrien, M. Niezette, D. Panzatis, A.H. Seltveit, U. Sundin, B. Theodoulidis, G. Tziallas, and R. Wohed. A rule language to capture and model business policy specifications. In Advanced Information Systems Engineering (LNCS No. 498), pp. 307-318, Springer-Verlag (1991).
[35] K. Nakakoji. Beyond language translation: crossing the cultural divide. IEEE Software, 13(6):43-46 (1996).
[36] J.M. Neighbors. The evolution from software components to domain analysis. International Journal Software Engineering and Knowledge Engineering, 2:325-354 (1992).
[37] S.M. O’Donnell. Programming for the World: A Guide to Internationalization. Prentice-Hall (1994).
[38] J. Palomäki. From Concepts to Concept Theory. Discoveries, Connections, and Results. Acta Universitatis Tamperensis, Ser. A, Vol. 416, University of Tampere (1994).
[39] J. Palomäki. Three kinds of containment relations of concepts. In Information Modelling and Knowledge Bases VIII, pp. 261-277, IOS Press (1997).
[40] C. Paris and K. V. Linden. An interactive support tool for writing multilingual manuals. IEEE Computer, 29(7):45-56 (1996).
[41] J.L. Peterson. Petri Net Theory and the Modeling of Systems. Prentice-Hall (1981).
[42] W. Reisig. Petri nets in software engineering. In Petri Nets: Applications and Relationships to Other Models of Concurrency (LNCS No. 255), pp. 63-96, Springer-Verlag (1987).
[43] W.A. Rolling. A preliminary annotated bibliography on domain engineering. ACM SIGSOFT Software Engineering Notes, 19(3):82-84 (1994).
[44] T. Schael. Workflow Management Systems for Process Organization, 2nd edn. (LNCS No. 1096). Springer (1998).
[45] SEI. The Capability Maturity Model: Guidelines for Improving the Software Process. Addison-Wesley (1995).
[46] D. Taylor. Global Software - Developing Applications for the International Market. Springer-Verlag (1992).
[47] R.N. Taylor, W. Tracz, and L. Coglianese. Software development using domain-specific software architectures. ACM SIGSOFT Software Engineering Notes, 20(5):27-38 (1995).
[48] P. Viljamaa. The patterns business: impressions from plop-94. ACM SIGSOFT Software Engineering Notes, 20(1):74-78 (1995).
[49] Y. Wand and R. Weber. On the ontological expressiveness of information systems analysis and design grammars. Journal of Information Systems, pp. 217-237 (1993).
[50] S. Wartik and R. Prieto-Diaz. Criteria for comparing reuse-oriented domain analysis approaches. International Journal Software Engineering and Knowledge Engineering, 2:403-431 (1992).
[51] P. Wegner. Classification in object-oriented systems. ACM SIGPLAN Notices, 21(10):173-182 (1986).
[52] B. Whitenack. Rappel: a requirements-analysis-process pattern language for object-oriented development. In Pattern Languages of Program Design, pp. 259-291. Addison-Wesley (1995).
[53] C. Zhang and R. F. Walters. An abstract, shared and persistent data structure for supporting database management and multilingual natural language processing. International Journal Software Engineering and Knowledge Engineering, 3:362-382 (1993).