Information and Software Technology 38 (1996) 145-154

Software products evaluation system: quality models, metrics and processes - International Standards and Japanese Practice

Motoei Azuma
Department of Industrial and Management Systems Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169, Japan
Abstract

ISO/IEC JTC1/SC7/WG6 and JSA/INSTAC/STD/WG5 activities are introduced, followed by an outline of product quality evaluation. Then a concept of an evaluation system is proposed. Software evaluation technologies mainly derived from SC7/WG6 and INSTAC activities and some other contributions are explained based on the system concept. ISO/IEC 9126 may be used not only by developers for evaluating their products, but also by a user for selecting a product from alternatives and for many other purposes by many other people. This paper focuses on the evaluation for developers.

Keywords: Product evaluation; Quality evaluation; Quality metrics; Software measurement; Software metrics; Software quality

0950-5849/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved
SSDI 0950-5849(95)01069-6

1. Introduction

As the computer application area has expanded, so has the criticality of computer based systems. Examples are human life critical systems, social life critical systems, and security critical systems. Probably because software in itself is harmless, the public in general seem to be indifferent to software quality. However, it is the software quality that has significant influence on the system quality.

A wide variety of off-the-shelf software packages for personal computers, such as wordprocessor, spreadsheet, drawing, and presentation software, have come into use as the utilization of personal computers in the business environment has spread. In particular, the rapid spread of downsizing, driven by powerful personal computers, local area networks and the Internet, has highlighted software quality problems.

In order to develop high quality software, the following are considered to be important.

(1) To make the input to the process better, that is to say, to clarify quality requirement and development policy.
(2) To utilize good resources, such as techniques, highly skilled people and better environments.
(3) To design good development processes, measure the processes, control the processes and improve the processes.
(4) To plan and carry out software product evaluation properly for both intermediate and final products.

A software product is evaluated by the degree of satisfaction to required quality. To develop software without any goal and try to acquire quality by testing is a waste of time and effort. In order to develop a high quality software product without redundancy, it is necessary to define quality requirement clearly, and to evaluate the product at the early stage of the life-cycle concretely and qualitatively.

ISO/IEC JTC1/SC7 (International Organization for Standardization/International Electrotechnical Commission Joint Technical Committee 1/Sub-Committee 7) is working on standardization in the area of software engineering. It consists of nine working groups. SC7/WG6 (Working Group 6: Evaluation and Metrics) is developing drafts for a series of international standards on software product evaluation. ‘ISO/IEC 9126: Information technology - Software product evaluation - Quality characteristics and guides for their use’ [1] is the initial output of the series, which is now widely used and regarded as an unexpectedly successful international standard.

INSTAC/STD/WG5 (Information Technologies Research and Standardization Center) was established under the sponsorship of MITI (Ministry of International Trade and Industry) in 1987 for the purpose of supporting SC7/WG6. It has been conducting research and publishing the results as annual reports. The past work was compiled into a book and published by JSA (Japanese Standards Association) in 1994 as Software quality evaluation guide book [2].

In this paper, SC7/WG6 and INSTAC/STD/WG5 activities are introduced, followed by an outline of product quality evaluation. Then a concept of evaluation system is proposed. Software evaluation technologies mainly derived from SC7/WG6 and INSTAC activities and some other contributions are explained based on the system concept. ISO/IEC 9126 may be used not only by developers for evaluating their product, but also by a user for selecting a product from alternatives and for many other purposes by many other people. This paper focuses on the evaluation for developers.
2. Background

2.1. JTC1/SC7/WG6 activities

Working groups and their conveners are listed in Table 1. It also illustrates major projects. WG6 (Evaluation and Metrics) is responsible for developing the 9126 series and 14598 series of international standards on software product evaluation as project 7-13. The ‘ISO/IEC 9126: Information technologies - Quality characteristics and metrics’ series is a revision of the current ISO/IEC 9126 and consists of the following three parts.

Part 1 Quality characteristics
Part 2 External metrics
Part 3 Internal metrics

The purpose of the ‘ISO/IEC 14598: Information technologies - Software product evaluation’ series is to provide a set of requirements, recommendations and some guides for the product evaluation process. It consists of the following six parts.

Part 1 General overview
Part 2 Planning and management
Part 3 Process for developers
Part 4 Process for acquirers
Part 5 Process for evaluators
Part 6 Evaluation modules

In addition to these work items, WG6 has already finished the project on ‘ISO/IEC 12119: Quality requirement and testing’ [3], which was published in 1994. It defines quality requirements and testing directives for software packages, i.e. off-the-shelf software for consumers, based on each of the six quality characteristics defined by 9126. SC7/WG4 developed the new standard ‘ISO/IEC 14102: Guidelines for evaluation and selection of CASE tools’ [4]. It also gives guides based on each of the six quality characteristics defined by ISO/IEC 9126.

As software product evaluation related projects, ISO/IEC 14143: Functional size measurement (Committee Draft, WG12) gives a basic size measure which is useful not only as a scale measure but also as a base for quality metrics. The new work item ‘Measurement and rating of performance of computer-based software systems’ was approved by JTC1 ballot and assigned to WG6, which defines metrics and process for performance measurement.

Table 1
Working groups in JTC1/SC7

Working group   Title                                                      Convener
WG2             System & Software Documentation                            K. Jonson (UK)
WG4             Tools and Environment                                      T. Vallman (USA)
WG6             Evaluation and Metrics                                     M. Azuma (Japan)
WG7             Life Cycle Process                                         R. Singh (USA)
WG8             Support of Life Cycle Process                              M. Kaplan (USA)
WG9             Software Integrity                                         D. Kiang (USA)
WG10            Process Assessment                                         A. Dorling (UK)
WG11            Software Engineering Data Definition and Representation    P. Eirich (USA)
WG12            Functional Size Measurement                                H. Rehesaar (Australia)
2.2. Corresponding standards organizations in Japan

MITI supports the secretariat of international and national standardization activities. Publication of standards is the responsibility of JSA, which is a non-profit organization. Technical work, such as making or reviewing a draft or preparing comments, is entrusted to various professional societies and associations. Work in the area of JTC1 is entrusted to IPSJ/ITSCJ (Information Processing Society of Japan/Information Technologies Standardization Commission of Japan). ITSCJ has many sub-committees, each of which corresponds to a sub-committee of JTC1. Japanese SC7 is one of these sub-committees, and the author is its chairman. Its members come from universities, mainframe computer manufacturers, the software industry and representatives of users. It has working groups which correspond to those of JTC1/SC7. Therefore, Japanese national SC7/WG6 (Convener: M. Azuma) is responsible for the 9126 series and 14598 series.

In order to support standardization in information technologies, JSA established INSTAC in 1988. WG5 is one of the working groups in INSTAC and has been working on software product evaluation technologies. Its results were published annually. Some parts of the results were translated into English and submitted to SC7/WG6.
3. Concept of software evaluation

3.1. What is quality evaluation?

TQM and quality evaluation. TQM (Total Quality Management) has been applied successfully in various industries in Japan. The concept and technique of TQM were transferred to software management. SWQC (Software Quality Control), to which the author made initial contributions [5], is well known as a successful example. The PDCA (Plan Do Check Action) cycle (Fig. 1) and the quality model (quality deployment), which are accepted as effective techniques for software quality management, are key techniques of TQM and popular in the TQM community.

Fig. 1. TQM PDCA cycle model.

Quality system and quality evaluation. In this model, in order to take timely and correct actions, it is necessary to measure and assess both products and processes. If there is no information that shows which part of the product and which characteristic of the product needs to be improved, no action is possible for improvement. ISO 9001 Quality system states requirements for implementing a quality system. However, as it is a generic standard, i.e. not a software specific standard, it contains neither requirements nor recommendations concerning quality characteristics, metrics, and evaluation process. In these respects, the ISO/IEC 9126 and 14598 series and the ISO 9000 series complement each other.

Process assessment and quality evaluation. SEI (Software Engineering Institute, Pittsburgh, USA) developed CMM (Capability Maturity Model), which is widely accepted [6]. CMM specifies process maturity by five levels, namely from Level 1 to Level 5. Measurement is one of the important conditions to improve the organization's maturity from Level 2 to Level 3. The top level, that is Level 5, requires an organization to make continuous efforts to achieve quality and productivity improvements. This continuous effort is exactly what excellent Japanese companies, such as Toyota, have made and are still making to achieve excellence. JTC1/SC7/WG10 (Process Assessment, Convener: A. Dorling) is preparing working drafts for a series of international standards on process assessment with reference to CMM concepts. If technologies are available for measuring software product quality precisely, the measures are useful for process assessment too. Thus, a good cycle of process and product assessment can be generated.

3.2. Purposes of quality evaluation

Evaluation is the key to success in developing or acquiring good software. Evaluation is necessary at various stages of the software life-cycle for various target entities depending on the specific purpose. In general, the purpose of evaluation is to judge how good the target entity is for the specific objectives, which include:

(1) To select something from two or more alternatives.
(2) To estimate or predict values of a target entity.
(3) To assess the effect of a target entity when it is used.
(4) To get information on intermediate products or process for controlling and managing the process.

The evaluation purpose is clarified when the target entity is defined. The following are examples of evaluation purposes.

Supporting tool and techniques for evaluation. Evaluating a software tool is a kind of product evaluation, because it is also a software product, especially from the software tool developers' viewpoint. Therefore technologies for product evaluation are useful for this purpose. On the other hand, (a) selecting the right technology, such as a design method and CASE tools; and (b) predicting and ensuring the product quality, are important for improving quality and productivity from the developer's viewpoint. For this reason, the international standard ‘ISO/IEC 14102 Guidelines for evaluation and selection of CASE tools’ was developed by JTC1/SC7.
Intermediate product evaluation. Product evaluation may be divided into two categories, i.e. intermediate product evaluation and end product evaluation. Intermediate product evaluation is done at the end of a stage of the life-cycle as a part of formal review. Usually it is done by a developer or a reviewer in the organization. The purposes of intermediate product evaluation may be:

(a) to make a decision whether the intermediate product developed by the subcontractor has sufficient quality to accept;
(b) to make a decision at the end of a process whether the product has sufficient quality to be forwarded to the next process;
(c) to clarify a specific part or attribute which does not meet the requirement or which proves to be the cause of discrepancy;
(d) to predict end product quality; or
(e) to give data for process improvement by analysing the cause of excellent parts or bad parts.

At this stage, usually, it is not possible to execute a program for the evaluation. Therefore, the targets to be measured and evaluated are internal documents, such as specifications and source program code. The most popular tool for intermediate product evaluation is a checklist.

End product evaluation. End product evaluation is further divided into three sub-categories according to who does the evaluation, i.e. developers, acquirers and independent evaluators. Purposes and methods differ depending on the sub-category. JTC1/SC7/WG6 is preparing requirements and guides for each.

(1) Product evaluation by developers
Usually this evaluation means testing stage evaluation. Evaluation is done by executing the target program as testing. Its purpose may be either deciding release of the product, or comparing the product with competitive products.

(2) Product evaluation by acquirers
Acquirers may be either those who acquire software products from a contractor, or those who buy software packages. In the former case, the main purpose is to validate the product against the quality requirement in order to decide acceptance of the product. In the latter case, selecting a product from alternative products is the main purpose. Usually evaluation must be done by comparing the specifications. Even when testing is permitted, its nature is quite different from the former case, because only limited testing is possible. When the product is to be used as a part of a critical system, e.g. a safety critical system, an acquirer is responsible for assessing the product to see whether or not it has sufficient quality characteristics for the criticality. This must be carried out for both stated and implied needs. Quality assurance of critical software is considered to be the responsibility of both developers and acquirers, even if some quality requirement is not stated.

(3) Product evaluation by independent evaluators
This means evaluation by evaluators in testing laboratories or other evaluators in an independent organization, either in the same corporation or outside of it. There are three types of purposes of evaluation in this category: (a) evaluation by a request from a developer; (b) evaluation by a request from an acquirer; and (c) others, such as comparing products for a software magazine. Applicability of information, testing environment and other conditions are different for each case.
4. Quality evaluation system

In order to carry out quality evaluation successfully, each individual technique or tool is important, but it must also be integrated well with the others. Applying a system concept, as a Quality Evaluation System (QES), is helpful for this integration.

4.1. Conceptual models of software QES

A Quality Evaluation System is a human-computer system which receives an evaluation request and a quality requirement specification as inputs, does planning, designing, implementing and executing of evaluation as processes, and transfers results as outputs. QES may be implemented as a part of a higher level system (larger scope system), such as a software development system or software acquisition system, or it may be considered as an independent system. QES consists of such resources as follows.

QES-R = {Quality model, Metrics, Measurement tools, Evaluation techniques, Data management tools, Evaluators, Computers}

QES Process (QES-P) can be decomposed into Support Process (QES-SP) and Evaluation Process (QES-EP). Each process is decomposed into sub-processes as follows.

QES-P = {QES-EP, QES-SP}
QES-EP = {Evaluation requirement analysis, Planning and implementation, Measurement and rating, Total assessment, Evaluation management}
QES-SP = {Technologies development, Technology transfer, Technology assessment, Technologies and data (experience) management, Standardization, Evaluation support management}

These resources and processes should be integrated, defined formally, and standardized. The concept of the quality model and the metrics for the evaluation system are described in clauses 5 and 6 respectively, and the process of the evaluation system is stated in clause 7.

4.2. QES requirements

QES should satisfy, but not be limited to, the following requirements.

Repeatability. The same results are expected by repeated evaluation on the same product.

Objectivity. The same results are expected by a different evaluation done by a different person or by a different team.

Quantitativeness. Measurement is quantitative and the presentation of the results is quantitative.

Indicativeness. When some discrepancy or other problems are found by the evaluation, their causes and required actions are indicated.

Economicality. The cost required for the evaluation is relatively small, that is, cost effective. Evaluation depending on priority should be possible.

Inclusiveness. The evaluation should cover all quality characteristics.

5. Quality model and quality characteristics
5.1. Quality model

A quality model is a model which deploys quality into a set of characteristics and shows the relationships between them. It provides the basis for specifying quality requirements and evaluating the quality of a product. Quality can be represented by a set of characteristics at various levels of abstraction. ISO/IEC 9126 states: ‘A software quality characteristic may be refined into multiple levels of sub-characteristics.’ Using a higher level of abstraction is convenient for quick understanding, especially for management. However, more detailed technical work must be done based on lower levels of characteristics.

As initial input to the area, the Boehm model [7] and the McCall model [8] are popular. Quality deployment is a well known and widely used technique of TQM in Japan. M. Azuma and T. Sunazuka developed SQMAT (Software quality measurement and assessment technology) [9], which is a total quality evaluation system that includes a quality model and metrics. Some other companies in Japan, such as NTT, Fujitsu, Hitachi and Toshiba, developed quality models and evaluation methods.

A quality model is a reflection of quality from a specific view. Therefore, any specialist can propose a new model, and there is no single solution. Some models may have logical problems, and may be useless. But it is more likely that each model has a reason to exist. This situation causes a problem of communication among software related people. If two people use the same word with different meanings, they cannot communicate with each other. For example, if one person assumes that the other understands a term as he/she does, while the other understands it differently, a big problem may be caused by the misunderstanding. Therefore the quality model and the associated definitions of characteristics should be standardized, so that people can use the same terms with the same meaning.

ISO/IEC 9126 was developed to give a baseline quality model (Fig. 2) with associated quality characteristic definitions. INSTAC developed a further refinement based on 9126. Fig. 3(a), Fig. 3(b) and Fig. 3(c) show the INSTAC quality model. The INSTAC Quality Model introduced a concept of internal characteristics. Relations between (external) quality characteristics/sub-characteristics and internal characteristics are shown in this model using three different symbols ranging from strong relation to weak relation. However, as the model was developed by a kind of Delphi method, which means expert opinion with feedback, it is not proved to be the best model. More effort must be made to refine the model based on statistics and other scientific approaches. Therefore it is rather an initial input for further studies.

Quality characteristics   Subcharacteristics
Functionality             Suitability, Accuracy, Interoperability, Compliance, Security
Reliability               Maturity, Fault tolerance, Recoverability
Usability                 Understandability, Learnability, Operability
Efficiency                Time behaviour, Resource behaviour
Maintainability           Analyzability, Changeability, Stability, Testability
Portability               Adaptability, Installability, Conformance, Replaceability

Fig. 2. ISO/IEC 9126-quality characteristics.
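As a minimal sketch, the two-level model of Fig. 2 can be held in a plain mapping and used as a checklist when stating quality requirements. The names `QUALITY_MODEL` and `missing_characteristics`, and the sample requirement texts, are assumptions made for this illustration; they are not part of ISO/IEC 9126 or of the INSTAC model.

```python
# Two-level quality model transcribed from Fig. 2 (ISO/IEC 9126).
# The data structure and the function name are illustrative assumptions.
QUALITY_MODEL = {
    "Functionality": ["Suitability", "Accuracy", "Interoperability",
                      "Compliance", "Security"],
    "Reliability": ["Maturity", "Fault tolerance", "Recoverability"],
    "Usability": ["Understandability", "Learnability", "Operability"],
    "Efficiency": ["Time behaviour", "Resource behaviour"],
    "Maintainability": ["Analyzability", "Changeability", "Stability",
                        "Testability"],
    "Portability": ["Adaptability", "Installability", "Conformance",
                    "Replaceability"],
}


def missing_characteristics(stated_requirements):
    """Return the characteristics for which no requirement is stated.

    `stated_requirements` maps a characteristic name to its requirement
    text; the order of the result follows the order of QUALITY_MODEL.
    """
    return [name for name in QUALITY_MODEL if name not in stated_requirements]


if __name__ == "__main__":
    # Hypothetical requirement statement covering only two characteristics.
    stated = {
        "Functionality": "access to records is restricted to authorized users",
        "Efficiency": "a response is returned within the agreed time limit",
    }
    for name in missing_characteristics(stated):
        print("No requirement stated for:", name)
```

Used this way, the model only reports which characteristics have been overlooked; deciding what the requirement for each should be remains the analyst's task.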
Fig. 3(a). INSTAC quality model-functionality.
[Figure: relates the quality characteristic Functionality and its sub-characteristics (Suitability, Accurateness, Inter-operability, Compliance, Security) to internal characteristics; the legible internal characteristics include Access control, Access audit and Robustness.]

Fig. 3(b). INSTAC quality model-reliability.
[Figure: relates the quality characteristic Reliability and its sub-characteristics (Maturity, Fault Tolerance, Recoverability) to internal characteristics: Completeness, Traceability, Consistency, Self-descriptiveness, Access audit, Robustness, Integrity, Modularity, Simplicity, Instrumentalability, Self-containedness.]

Fig. 3(c). INSTAC quality model-usability.
[Figure: relates the quality characteristic Usability and its sub-characteristics (Understandability, Learnability, Operability, Installability, Controllability, Communicativeness) to internal characteristics: Uniformity, Expressiveness, Hierarchieness, Informativeness, Metaphorability, Well-equipmentness, Attractiveness, Timeliness, Memorability, Conciseness, Choosability, Guideability, Safety, Labor-saving, Adjustability.]
5.2. Quality characteristics

‘Quality characteristic’ is defined in ISO/IEC 9126 as ‘A set of attributes of a software product by which its quality is described and evaluated’. A quality characteristic is a specific view of quality. An attribute may be considered as ‘a measurable physical or abstract property of an entity’.

Sometimes a community takes a term which has a generic meaning and uses it with a specific meaning. ‘Set’ in the mathematics community and ‘relation’ in the database community are examples of this. When one term is used with different meanings by two or more communities it may cause confusion, and it may cause a problem especially when people in both communities have an opportunity to work jointly. This problem exists in defining a quality characteristic. ‘Usability’ is an example. It is used by information technologies people and ergonomics people with somewhat different meanings.

SC7/WG6 started to revise ISO/IEC 9126 (new 9126-1) as stated before. This new version will include subcharacteristics not as an informative annex but as the main body. But we must wait until more research on internal characteristics is done and contributed. WG6 is planning to recommend the new quality model of 9126-1 for use as the default standard model, unless there is a specific model in a specific community such as safety critical systems or usability critical systems.
6. Metrics

6.1. Concept of metrics and measurement

‘Measurement’ is a process to assign a value to an attribute of a target entity. A ‘measure’ is a value assigned as a result of measurement. A ‘metric’ is a scale and the associated rule and method to be applied for a measurement process. For example, program size may be measured by such scales as LOC (Lines of Code), FP (Function Points), the memory size which the program requires, and specification pages. A value obtained by measurement, such as a LOC or FP count, is a measure of the target program size. But, as is widely known, if there is no rule for counting LOC, the same program may have various values depending on how it is counted. The rule and method for counting LOC of a target program is a metric.

Norman categorized scales into nominal, ordinal, interval and ratio. He further categorized targets of measurement by entities and attributes. He categorized entities into products, processes and resources, and attributes into internal and external. There are two kinds of measures, i.e. direct measures and indirect measures. Norman defined them as: ‘Direct measurement of an attribute is measurement which does not depend on the measurement of any other attribute. Indirect measurement of an attribute is measurement which involves the measurement of one or more other attributes.’ [10]

INSTAC categorized quality characteristics of a software product as external and internal. When software quality is evaluated, it may be done either by testing or using the software, or by analysing the software, i.e. its documentation or source code. The former is the case of measuring and evaluating external characteristics, and the latter is the case of internal characteristics.

6.2. External metrics

External metrics are those which are used to measure external characteristics. MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair) are popular metrics for measuring reliability. INSTAC collected metrics in this category and classified them based on the quality model. Some examples are shown in Fig. 4(a), Fig. 4(b) and Fig. 4(c).

Functional Specification Change Ratio = (The Number of Functions Changed) / (The Number of Functions Implemented)
Change Request Ratio = (The Number of Users' Requests for Change) / (Product Scale (KLOC or The Number of Functions))

Fig. 4(a). Examples of INSTAC external metrics-functionality-suitability.

Mean Time To Failure = (Total Operating Time) / (The Number of Observed Failures)
Product Error Density = (The Number of Errors in Product) / (Volume of Product)

Fig. 4(b). Examples of INSTAC external metrics-reliability-maturity.

Default Value Availability Ratio = (The Number of Operation Commands with Default Value) / (Total Number of Operation Commands)
Message Term Consistency = (The Number of Standardized Terms) / (The Number of Terms in Message)

Fig. 4(c). Examples of INSTAC external metrics-usability-operability (Controllability).

6.3. Internal metrics

Internal metrics are those which are used to measure internal characteristics. Many companies have statistical reports or at least records of problems, such as bug reports. Complexity metrics, such as cyclomatic complexity [11], and modularity metrics, such as cohesion and coupling [12], are other examples of metrics in this category. INSTAC collected metrics in this category and classified them based on the quality model. Some examples are shown in Fig. 5.

6.4. Basic metrics

In many cases, when a measure is used for evaluation or selection, the measure should be normalized. For example, the number of faults and the number of failures are important measures. But they are not useful for evaluation unless they are normalized, as in MTBF and the number of faults per KLOC. As a quality characteristic is an abstract property, most measures for it are indirect measures, like a ratio, which are derived from measures of attributes. There are some measures which are frequently used for composing various indirect measures. We call this class of measures basic measures, and the corresponding metrics basic metrics. KLOC is an example of a basic metric.
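The ratio metrics of Fig. 4 and the normalization of raw counts by a basic measure can be sketched in a few lines. This is only an illustration: the function names and the sample figures are assumptions made for the example, not INSTAC definitions.

```python
def ratio(numerator, denominator):
    """Indirect measure formed by dividing one basic measure by another."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def mean_time_to_failure(total_operating_time, observed_failures):
    # Fig. 4(b): total operating time / number of observed failures.
    return ratio(total_operating_time, observed_failures)


def faults_per_kloc(faults, lines_of_code):
    # Normalizes a raw fault count by product size expressed in KLOC.
    return ratio(faults, lines_of_code / 1000)


def default_value_availability_ratio(commands_with_default, total_commands):
    # Fig. 4(c): commands offering a default value / all operation commands.
    return ratio(commands_with_default, total_commands)


if __name__ == "__main__":
    # Hypothetical measurement data, chosen only to exercise the functions.
    print(mean_time_to_failure(1200.0, 4))           # 300.0 (hours per failure)
    print(faults_per_kloc(30, 12000))                # 2.5 (faults per KLOC)
    print(default_value_availability_ratio(18, 24))  # 0.75
```

The raw count of 30 faults says little on its own; dividing by the 12 KLOC of product size is what makes the figure comparable between products of different size.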
Traceable Function Ratio of Specifications = (The Number of Functions in Specification at Current Phase) / (The Number of Functions in Specification at the Previous Phase)
Verifiable Item Ratio = (The Number of Verifiable Items) / (The Number of Items checked)

Fig. 5. Examples of INSTAC internal metrics-traceability.

7. Evaluation process

7.1. Life-cycle and software product evaluation

There are several well-known life-cycle models for software development. Implementation of the software product evaluation process in a life-cycle may differ depending on the life-cycle model. If the waterfall model is taken as an example, it is suggested to take the procedure shown by the U Shape Model in Fig. 6. This model is still under discussion in SC7/WG6 as a part of 14598-1 General Overview.

Fig. 6. U-shape model.

This approach is considered as a kind of MBO (Management By Objectives). MBO is a widely accepted management approach, which means that a well-defined goal is a necessary condition for acquiring good results. In the field of software, NEC applied MBO successfully in SWQC. Weinberg proved the importance of a goal in program development by an experiment [13]. The GQM (Goal Question Metric) paradigm proposed by Basili [14], and the OPM (Objectives Principles Metrics) Approach by Arthur and Nance [15] are two better-defined MBO approaches specific to software quality. One feature of the U shape model approach is that there are two views of quality, i.e. external and internal. Major processes in this model are as follows.

(1) Recognizing needs for new product (pre-development stage)
Developing a product is always triggered by an awareness of needs. Needs may be ambiguous or clear. A user may want to have a better product than what he/she has. Or needs may be just a complaint or dissatisfaction with a product. In the case of a software package developer, needs may be a good idea for a new product, or may be a strong wish to develop better products than a competitor's. Needs may exist in the human mind without being expressed, may be stated verbally or may be described in a document. In any case, they are mostly informal and incomplete.

(2) Defining quality requirement (requirement analysis stage)
In order to develop a good software product, the quality requirement must be clearly defined. Without a quality requirement, product evaluation is difficult and meaningless, because quality is the ‘totality of characteristics of an entity (product) that bear on its ability to satisfy stated and implied needs’ [16]. Therefore, quality requirements should be stated for every quality characteristic based on a quality model such as ISO/IEC 9126, and evaluated for each of them. The relative importance of each characteristic differs according to the mission of the system in which the software product is used. For example, time behaviour (performance) of efficiency and security of functionality are very important for a banking system which processes an enormous number of transactions within a limited time frame. Another example is wordprocessor and spreadsheet software, in which usability, interoperability and portability are important. These software products are widely used by a large number of people, most of them
not professional computer engineers, for various purposes in various environments. Though quality requirements are to be stated for every quality characteristic, state-of-the-art requirement definition technique supports only functionality and does not support most of other characteristics. Quality characteristics and their sub-characteristics in ISO/IEC 9126 are useful as a checklist for a quality requirement statement. As a result of requirement analysis, requirement specification does not always state all requirements. If it does, it is rather a rare case. A product which incorporates all stated needs as requirement is not necessarily a good product. Sometimes it is necessary to cut some minor needs to make the product consistent. On the other hand, requirement analysts should try to take implied needs into requirement specification using logical reasoning, heuristics or other design strategy. Dejining quality design goals (development stage) (3) If a developer develops a software product without explicit goal, no quality product is available. It is impossible or inefficient to try to acquire good quality only by means of testing. Quality requirement should be stated from the users’ view. However, it is suggested here that design goals are stated from a developer’s view, because how to achieve required quality is the developer’s responsibility. A developer’s view focuses on internal characteristics or internal attributes of software, such as modularity and traceability. Design goal should be as quantitative as possible by using metrics. The selection and use of metrics depends on the relative importance of each internal characteristic. By setting design goals for each internal characteristic and attribute, not only developers can have a clear guide, but also guides to design review and code inspection become clear. metrics and measurement (evaluation (4) Preparing planning stage) Some attributes need to be measured in advance. 
If there is no record, some measurements are difficult to make. For example, a metric may be defined as the number of discrepancies found during design review divided by the number of specification pages reviewed. In this case both data items should be counted and recorded in advance. Therefore, the selection of metrics and the associated measurements should be planned at this stage. This is especially important for basic data elements.
(5) Evaluating internal quality (development stage)
Formal design review is a popular method for verifying that a design specification is correct. Though the names may vary, there are two major design reviews: the preliminary design review and the critical design review. Guidelines, checklists and other such tools are vitally important for successful reviews. Design reviews without any of these tools cannot be expected
to have effective results. In order to improve the total quality of a specification, it is necessary to plan a review, to measure, and to summarize the results for each internal characteristic based on a checklist or other tools. Measuring the results in this way verifies that the design goals are properly fulfilled.
(6) Evaluating external quality (testing stage)
At this stage, particularly at the system testing stage, a program product is ready to be executed and evaluated. Testing means executing a program in order to find defects. Defects may be faults or other unsatisfactory attributes of a product. However, in many cases testing is planned and done just to find faults, and does not cover all quality characteristics. In order to confirm that the product meets all quality requirements, testing should be carefully planned, with test cases designed to cover all quality requirements. This stage is extremely important for assuring quality.
(7) Evaluating 'in use quality' (use stage)
External quality characteristics are validated against quality requirements. However, since for the various reasons stated before the quality requirements cannot be expected to equal the needs, a user may become aware of discrepancies between the users' needs and the characteristics or attributes of the product during its actual use. This is inevitable. Therefore, it is important to follow up these problems, to find a mechanism that supports doing so, and to feed the results back into the next generation development cycle.
This U-shape model should be implemented in a specific software life-cycle by customizing it properly. For example, if prototyping is used for the user interface, the usability evaluation of stage (6) should be carried out at an earlier stage of the life-cycle.
7.2. Quality evaluation process
ISO/IEC 9126 contains an evaluation process model, shown in Fig. 7. This model will be modified and transferred to the new series as ISO/IEC 14598-1 General Overview.
Fig. 7. ISO/IEC 9126 evaluation process model (one box of the diagram is labelled 'Quality requirement definition').
By taking these discussions into consideration, the outline of each process is as follows.
(1) Evaluation requirement analysis
Quality, when evaluated from the external view, is represented by the degree to which the product satisfies stated needs, so quality requirements should be stated clearly for every quality characteristic. Evaluation starts with studying the quality requirements to see whether they are stated exhaustively and clearly. If they are not, no clear goal of evaluation can be identified.
(2) Planning and implementation
Evaluation planning and preparation are based on the evaluation goals identified in the previous process. Preparation includes identifying the target configuration items, such as documentation and programs, and their attributes for measurement; planning; selecting metrics; establishing measurement and data handling procedures; and defining rules for rating and assessment.
(3) Measurement and rating
A measurement process can be refined into a primary measurement process and a secondary measurement process. The primary measurement process obtains original data by applying metrics to the attributes of the target product. It may be a random sampling process or an exhaustive measurement process, an automated process or a manual process using a checklist, and an objective measurement or a subjective judgement. The secondary process generates the required measure (value) from a set of original data by a statistical method or other defined function. ISO/IEC 9126 suggests that the measurement process be followed by the rating process.
(4) Total assessment
In this process, the obtained measures are analysed, summarized or transformed according to a predefined procedure. Judgements and interpretations of the measures are made based on assessment criteria. Then a final report is prepared.
7.3. Quality evaluation support process
Support functions, both managerial and technical, are important for effective evaluation. A questionnaire survey on software management by the author and D Mole found that many European and Japanese managers regard support functions, especially software engineering support, quality assurance support and education support, as important for successful software management [17]. Support functions include such processes as: technology development process including metrics, methodologies and tools; technology transfer process which includes education and direct support; technology assessment process both before and after evaluation; technology and data (experience) management; standardization process; and evaluation support management.
An outline of the support process and the relationship between the support process and the evaluation process is shown in Fig. 8.
Fig. 8. Supporting process and evaluation process.
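As an illustration of the measurement and rating process outlined in Section 7.2, the following minimal Python sketch uses the review metric mentioned for the evaluation planning stage (discrepancies found during design review divided by reviewed specification pages). The names `ReviewRecord`, `discrepancy_density` and `rate`, the sample data, and the rating thresholds are all hypothetical choices made for this example; neither ISO/IEC 9126 nor this paper prescribes them.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    """Primary measurement data recorded for one design review (hypothetical)."""
    discrepancies: int  # discrepancies found during the review
    pages: int          # specification pages reviewed


def discrepancy_density(records):
    """Secondary measurement: derive one measure from the original data.

    Returns total discrepancies divided by total reviewed pages.
    """
    total_pages = sum(r.pages for r in records)
    if total_pages == 0:
        raise ValueError("no reviewed pages were recorded")
    return sum(r.discrepancies for r in records) / total_pages


def rate(value, thresholds):
    """Rating: map a measured value onto a rating level.

    `thresholds` is a list of (upper_bound, label) pairs in ascending
    order; the bounds are assumptions set by the evaluator.
    """
    for upper_bound, label in thresholds:
        if value <= upper_bound:
            return label
    return "unacceptable"


# Illustrative data and thresholds only; not taken from the paper.
records = [ReviewRecord(discrepancies=3, pages=20),
           ReviewRecord(discrepancies=5, pages=30)]
thresholds = [(0.1, "excellent"), (0.2, "acceptable")]

density = discrepancy_density(records)
print(f"{density:.2f}", rate(density, thresholds))  # prints "0.16 acceptable"
```

The sketch only works if both data items are recorded at review time, which is why the selection of metrics has to be planned before the reviews are held.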
8. Conclusion
Recently, software process assessment and product evaluation have drawn the attention of many people in the field of software quality. As there are so many opinions and contributions, it is difficult to survey all the technologies. In this paper, the author has tried to clarify various concepts and technologies for quality evaluation by means of the static system model, the U-shape model, and the evaluation and support process models. These models, I believe, can be used as a framework and can define the position of most current technologies. By using these models it is also possible to identify what is missing. The work in SC7/WG6 and INSTAC is still going on. The author hopes that this paper will contribute to understanding these activities and provide an opportunity for information exchange in the field.
Acknowledgements
The author is grateful to Dr Nigel Bevan, Mr Andrew Chruscicki and Dr Shigeru Nishiyama, who contributed as project editors of 14598-1 General overview, to the other editors of the 9126 and 14598 series, and to the members of SC7/WG6 for their useful suggestions. Appreciation is also extended to the members of INSTAC/STD/WGS and the national SC7/WG6 members.
References
[1] ISO/IEC 9126, Information technology - Software product evaluation - Quality characteristics and guides for their use (1991).
[2] M Azuma (ed.), Software Quality Evaluation Guide Book, JSA (1994) (Japanese edition).
[3] ISO/IEC 12119, Information technology - Software packages - Quality requirement and testing (1994).
[4] ISO/IEC 14102, Information technology - Guidelines for evaluation and selection of CASE tools.
[5] M A Cusumano, Japan's Software Factories, Oxford (1991).
[6] W S Humphrey, Managing the Software Process, Addison-Wesley (1989).
[7] B W Boehm, J R Brown and M Lipow, Quantitative evaluation of software quality, Proc. 2nd Int. Conf. on Soft. Eng. (1976) pp 592-605.
[8] J A McCall, P K Richards and G F Walters, Factors in software quality, Rome Air Develop. Center Report, TR-77-369 (1977).
[9] T Sunazuka and M Azuma, Software quality measurement and assessment technology, Proc. 8th ICSE (1985) pp 142-148.
[10] N E Fenton, Software Metrics - a Rigorous Approach, Chapman & Hall (1991).
[11] T J McCabe, A software complexity measure, IEEE Trans. on Soft. Eng. Vol 2 No 6 (1976).
[12] G Myers, Composite Structured Design, Van Nostrand (1978).
[13] G M Weinberg and E L Schulman, Goals and performance in computer programming, Human Factors Vol 16 No 1 (1974).
[14] V R Basili and H D Rombach, The TAME project: towards improvement-oriented software environments, IEEE Trans. on Soft. Eng. Vol 14 No 6 (1988) pp 758-773.
[15] J D Arthur and R E Nance, Developing an automated procedure for evaluating software development methodologies and associated products, A final Report, Technical Report SRC-87-007, Systems Research Center and Virginia Tech (1987).
[16] ISO 8402, Quality Vocabulary (1994).
[17] M Azuma and D Mole, Software management practice and metrics: EC and Japan - some results of questionnaire survey, J. of Systems & Software Vol 26 No 5, Elsevier (June 1994).