Symbiosis of Human and Artifact
Y. Anzai, K. Ogawa and H. Mori (Editors)
© 1995 Elsevier Science B.V. All rights reserved.
A composite measure of usability for human-computer interface designs

Kay Stanney and Mansooreh Mollaghasemi

Industrial Engineering and Management Systems Department, University of Central Florida, 4000 Central Florida Blvd., Orlando, FL 32816, USA

A methodology for formulating a composite measure of interface usability is provided. The measure integrates multiple usability criteria into a single measure by which designs can be directly compared. The primary advantages of the proposed approach are the ability to consider multiple criteria and to weight the importance of these criteria according to a particular company's priorities and requirements.

1. INTRODUCTION

Usability is a term often used to describe the efficacy of an interface. It has been defined as "the extent to which (an interface) affords an effective and satisfying interaction to the intended users, performing the intended tasks within the intended environment at an acceptable cost" (Sweeney, Maguire, and Shackel, 1993). There are three primary ways of assessing usability: user-based, theory-based, and expert opinion-based assessment. This study focuses on user-based assessments, with the level of user effectiveness and satisfaction forming the basis of the assessment.

User-based usability assessments of human-computer interface designs generally involve the evaluation of multiple system and user criteria. Multiple criteria are used because it is difficult to derive a single measure that effectively characterizes the overall usability of a complex system (Kantowitz, 1992; Karat, 1988). Thus, since no single indicator is pervasive, a multitude of measures are used in system assessments. These criteria generally include performance time, errors, use of help, learning time, mental workload, and satisfaction (Eberts, 1994; Shneiderman, 1992). These measures can be quantified through direct measurement (e.g., performance time, frequency of errors, use of help), through analysis (e.g., learning time can be determined via an NGOMSL analysis (Kieras, 1988) or production rules (Kieras and Polson, 1985)), or through questionnaires (e.g., mental workload (Hart and Staveland, 1988) and satisfaction (Shneiderman, 1992)).

When a combination of usability indicators is used, evaluators often report the results of each indicator separately without providing a comprehensive assessment of the system. This makes it difficult to compare different designs. A technique is needed that integrates usability testing results, thus providing a composite measure of usability to facilitate direct comparisons between interface design options. The present study proposes the use of
the analytic hierarchy process to rank-order computer interface designs based on multiple evaluative criteria. The importance of each evaluative criterion can be weighted according to a particular company's priorities and requirements. Using these weights and the objective or subjective value of each criterion, a rank-ordering of the different interface design options can be achieved.

Alternatively, Kantowitz (1992) suggests that a statistical combination of multiple usability indicators be derived. This is the approach that Paas and Van Merrienboer (1993) used when they combined mental workload assessments with performance efficiency (i.e., accuracy) by transforming the measures into z-scores, which were then displayed and related in a cross of axes. This approach, however, is limited because it assumes a linear relationship between mental workload and performance scores, which may not always hold.

Mitta (1993) has suggested using the analytic hierarchy process (AHP) to rank-order computer interfaces based on multiple evaluative criteria. Using the AHP, Mitta (1993) evaluated and compared different computer interfaces in terms of three criteria: usability, learnability, and ease of use once mastered. This approach is one of the few in which computer interfaces have been evaluated in terms of an integrated assessment of several interface characteristics. It has several advantages over traditional approaches such as predictive modeling, expert evaluations, observational studies, surveys, and experimental studies. First, the approach allows simultaneous consideration of multiple criteria. Second, by using the AHP, one is able to provide a measure of the consistency of the decision maker's judgments. In fact, no other approach used in solving multiattribute decision problems provides the ability for such consistency checks.

Mitta's (1993) use of the AHP, however, has some drawbacks. While she used multiple criteria in her analysis, the evaluations of the criteria were based solely on subjective assessments. Multicriteria decision making inherently has subjectivity built into the process; therefore, it is not advisable to needlessly add to this subjectivity. The major disadvantage of Mitta's (1993) approach is that the implementation involves interpretation of users' abilities in judging interface characteristics. With this approach, experimenters make pairwise comparisons of users with respect to their abilities to make satisfactory judgments regarding usability and learnability. The repeatability of this assessment procedure is questionable, since any two evaluators assessing the same set of users with this technique could come up with quite different results. This paper presents an alternative to using the subjective opinions of experimenters by evaluating some of the interface attributes in terms of objective measures.

2. THE PROPOSED APPROACH
The proposed approach uses the AHP to assess the relative importance of each software usability attribute, with the objective of selecting the best interface. The AHP was originally introduced by Thomas Saaty in the mid-1970s (Saaty, 1980, 1994). Since its development, the AHP has been applied in a wide variety of practical applications, including those related to economic planning, energy policy, health, conflict resolution, project selection, and budget allocation. In fact, the AHP is one of the most popular multicriteria decision making methodologies available
today. The popularity of this method is primarily due to its flexibility, ease of use, and ability to provide a measure of the consistency of the decision maker's judgments. The use of the AHP in assessing interface usability is a natural extension. The AHP is appropriate for situations, like usability assessment, where both objective and subjective criteria must be considered.

The AHP also imposes more structure on the usability assessment process by requiring the decision-making criteria to be organized into a hierarchy. Each level of the hierarchy consists of a few critical criteria that influence the quality of the interface (e.g., user effectiveness, user satisfaction, design intuitiveness, cognitive workload). Each criterion is, in turn, decomposed into subcriteria. For example, user effectiveness could be decomposed into number of errors and performance time, whereas intuitiveness could be broken down into percent of tasks correctly completed and number of references to help. The process of decomposing the various criteria continues until the decision maker reaches the most specific elements of the problem, the alternatives (e.g., interface design options), which are listed at the bottom of the hierarchy.

Once the hierarchy has been constructed, the next step is to determine the relative importance, or priority, of each of the elements at each level of the hierarchy. While the relative importance of the criteria for any given system will depend on the objectives being met by that system, Raskin (1994) provided some insight into possible priorities. Based on the results of recent surveys of a number of usability labs, three general criteria were assessed in terms of their relative importance. These factors included user satisfaction, productivity, and intuitiveness, with assigned relative importances of 50%, 30%, and 20%, respectively (Raskin, 1994).

In general, relative importance is determined through a pairwise comparison of each pair of elements with respect to the element directly above. This comparison takes the form of "How important is element 1 when compared to element 2, with respect to the element above?" The decision maker then provides one of the following responses, in either numeric or linguistic fashion:

    Importance                               Numerical Rating
    Equally Important (preferred, likely)    1
    Weakly Important                         3
    Strongly Important                       5
    Very Strongly Important                  7
    Absolutely Important                     9
    (2, 4, 6, and 8 are intermediate values)
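To make the scale concrete, the sketch below is a minimal Python example of the priority derivation described in the next paragraph. The three criteria and the judgment values in the matrix are hypothetical illustrations, not data from this study, and the use of NumPy is an assumption of the sketch.

```python
import numpy as np

# Hypothetical Saaty-scale judgments for three criteria (illustrative only):
# satisfaction vs. productivity = 2, satisfaction vs. intuitiveness = 3,
# productivity vs. intuitiveness = 2; reciprocals fill the lower triangle.
A = np.array([
    [1.0,   2.0,   3.0],
    [1/2.0, 1.0,   2.0],
    [1/3.0, 1/2.0, 1.0],
])

# Local priorities are the normalized principal (right) eigenvector of A.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
w = eigvecs[:, k].real
w = w / w.sum()

# Consistency check: CI = (lambda_max - n) / (n - 1), CR = CI / RI,
# where RI = 0.58 is Saaty's random index for a 3 x 3 matrix.
n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)
cr = ci / 0.58

print("local priorities:", w.round(3), "consistency ratio:", round(cr, 3))
```

A consistency ratio below roughly 0.10 is conventionally taken to indicate acceptable judgmental consistency; larger values suggest that the pairwise judgments should be revisited.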
The responses of the decision maker are put in the form of a comparison matrix from which the local priorities are determined using the eigenvalue method. The local priorities at each level of the hierarchy are then synthesized to determine the cardinal ranking of the alternatives.

The selection and structuring of attributes is one of the most difficult as well as critical steps in solving multiattribute problems. For interface design, designers must identify those attributes that are most important to usability. There is really no ideal number of attributes for consideration. One must, however, be aware that too few attributes may mean that some
important design attributes have not been included, while too many attributes may mean that too much detail has been included in the design decision. Perhaps the most important restriction on the selection of the attributes is that they must be independent of one another. This condition is sometimes difficult to achieve. Therefore, one may assume that important attributes are independent originally, but be willing to reject the outcome if he or she feels that this assumption significantly affects the results. Once the attributes have been selected, it may be helpful to structure them into a hierarchy. This is particularly useful when the number of attributes becomes large.

3. METHOD

In order to determine the validity of the proposed approach, a set of interfaces was assessed. The objective of this assessment was to determine the relative usability of these interfaces using a number of evaluative criteria.

3.1 Interface Designs

The purpose of this analysis was to perform a user-based usability assessment of a realistic and an unrealistic desktop interface design. The realistic design looked much like an office would, with a picture of a desk, phone, file cabinet, Rolodex, printer, trash can, and calendar. The unrealistic design looked like a typical Windows design, with windows, icons, and menus.

3.2 Procedure

The two interface designs were assessed based on user productivity, design intuitiveness, and user satisfaction. These data were obtained from a study by Miller and Stanney (1995). The relative weights of these criteria were derived from Raskin (1994), with user satisfaction, user productivity, and intuitiveness being assigned relative importances of 50%, 30%, and 20%, respectively. Productivity was quantified in terms of performance time. Intuitiveness has been defined as a function of the percentage of tasks correctly completed and the number of help references made (Raskin, 1994); this analysis used the percentage of tasks correctly completed as a measure of design intuitiveness. User satisfaction was obtained through the users' pairwise comparison of the interfaces. Each user was asked, "In terms of satisfaction, how much better is interface 1 than interface 2?" The answers to this question provided a relative measure of how well each interface performed with respect to user satisfaction.

4. RESULTS

The two interfaces were tested by 30 users. Each user performed a set of four tasks on both interface designs (see Miller and Stanney, 1995, for details on the tasks). A relative measure of user satisfaction was obtained for each interface from the subjects' pairwise comparison of the two interfaces. Each user's performance time was traced by the computer as a measure of user productivity. The number of tasks completed was determined from a computer trace, from which an intuitiveness score was obtained. The mean performance times for the 30 subjects were calculated to be 926 seconds and 1177 seconds for the realistic and the unrealistic
interfaces, respectively. The mean percentage of tasks correctly completed for the 30 subjects was determined to be 90% and 88.4% for the realistic and the unrealistic interfaces, respectively.

The relative importance weights for the two objective measures (i.e., productivity, design intuitiveness) were then computed by equations (1) and (2), respectively. Note that equation (1) is used when lower values of the measure are preferred (e.g., cost, performance time), while equation (2) is used when higher values are favored (e.g., profit, percent of tasks correctly completed).

$$OM_k = \frac{1}{c_k \times S}, \qquad S = \sum_{k=1}^{r} \frac{1}{c_k} \qquad (1)$$

where $OM_k$ is the objective measure for alternative $k$ and $c_k$ is the actual measure obtained from alternative $k$.

$$OM_k = \frac{P_k}{\sum_{k=1}^{r} P_k} \qquad (2)$$

where $P_k$ is the actual measure obtained from alternative $k$.
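As an arithmetic check, the short Python sketch below (function names are illustrative) applies equations (1) and (2) to the mean performance times and task-completion percentages reported above.

```python
# Equation (1): used when lower values are preferred (e.g., performance time).
def objective_measure_lower(values):
    s = sum(1.0 / c for c in values)           # S = sum over k of 1/c_k
    return [1.0 / (c * s) for c in values]     # OM_k = 1 / (c_k * S)

# Equation (2): used when higher values are preferred (e.g., tasks completed).
def objective_measure_higher(values):
    total = sum(values)
    return [p / total for p in values]         # OM_k = P_k / sum of P_k

# Realistic vs. unrealistic interface, using the measures reported above.
productivity = objective_measure_lower([926.0, 1177.0])    # ~[0.56, 0.44]
intuitiveness = objective_measure_higher([90.0, 88.4])     # ~[0.51, 0.49]
print(productivity, intuitiveness)
```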
Using equation (1), the relative importance of the interfaces with respect to user productivity was determined to be 0.56 and 0.44 for the realistic and the unrealistic interfaces, respectively. The relative importance of the interfaces with respect to design intuitiveness was computed to be 0.51 and 0.49 for the realistic and the unrealistic interfaces, respectively. The users' pairwise comparison of the two interfaces with respect to user satisfaction resulted in a relative weight of 0.55 for the realistic interface and 0.45 for the unrealistic interface. These relative weights, along with Raskin's (1994) relative importance measures for the criteria, were synthesized to achieve a composite usability measure for each interface. The overall usability measure for the realistic interface was computed to be 0.55, while that of the unrealistic interface was calculated to be 0.45.

5. DISCUSSION
The results from the AHP analysis indicated that the overall usability of the realistic interface was better than that of the unrealistic interface. This was based on a composite measure of user
productivity, user satisfaction, and design intuitiveness. This composite indicator provided a common basis of comparison by integrating information from a diverse set of usability criteria. This overall measure of usability allowed for a straightforward comparison of the two interface design options and clearly indicated that the realistic design was preferable in terms of usability.
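For completeness, the synthesis reported in Section 4 can be reproduced in a few lines. The sketch below is illustrative only; it assumes Raskin's (1994) criterion weights and the local priorities reported above, and the variable names are not part of the original study.

```python
# Criterion weights (Raskin, 1994): satisfaction 50%, productivity 30%,
# intuitiveness 20%.
weights = {"satisfaction": 0.50, "productivity": 0.30, "intuitiveness": 0.20}

# Local priorities of the two alternatives (realistic, unrealistic) under each
# criterion, as reported in Section 4.
priorities = {
    "satisfaction":  [0.55, 0.45],   # users' pairwise comparisons
    "productivity":  [0.56, 0.44],   # equation (1) applied to performance times
    "intuitiveness": [0.51, 0.49],   # equation (2) applied to completion rates
}

# AHP synthesis: composite usability is the weighted sum of local priorities.
composite = [sum(weights[c] * priorities[c][i] for c in weights) for i in range(2)]
print(composite)   # ~[0.55, 0.45]: the realistic design ranks higher
```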
REFERENCES

Eberts, R. (1994). User interface design. Englewood Cliffs, NJ: Prentice Hall.

Gibson, J. (1994). How to do systems analysis. Englewood Cliffs, NJ: Prentice Hall.

Hart, S.G. and Staveland, L.E. (1988). Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In P.A. Hancock and N. Meshkati (Eds.), Human mental workload. Amsterdam, Netherlands: North-Holland.

Kantowitz, B.H. (1992). Selecting measures for human factors research. Human Factors, 34(4), 387-398.

Karat, J. (1988). Software evaluation methodologies. In M. Helander (Ed.), Handbook of human-computer interaction (pp. 891-903). Amsterdam, Netherlands: North-Holland.

Kieras, D.E. (1988). Towards a practical GOMS model methodology for user interface design. In M. Helander (Ed.), Handbook of human-computer interaction (pp. 67-85). Amsterdam, Netherlands: North-Holland.

Kieras, D.E. and Polson, P. (1985). An approach to formal analysis of user complexity. International Journal of Man-Machine Studies, 22, 365-394.

Miller, L. and Stanney, K.M. (1995). The effects of realistic versus unrealistic desktop interface designs on novice and expert users. Proceedings of the 6th International Conference on Human-Computer Interaction, Yokohama, Japan, July 9-14.

Mitta, D.A. (1993). An application of the analytical hierarchy process: a rank-ordering of computer interfaces. Human Factors, 35(1), 141-157.

Paas, F.G.W.C. and Van Merrienboer, J.J.G. (1993). The efficiency of instructional conditions: an approach to combine mental effort and performance measures. Human Factors, 35(4), 737-743.

Raskin, J. (1994). Intuitive equals familiar. Communications of the ACM, 37(9), 17-18.

Saaty, T.L. (1980). The analytic hierarchy process. New York: McGraw-Hill.

Saaty, T.L. (1994). How to make a decision: the analytic hierarchy process. Interfaces, 24(6), 19-43.

Shneiderman, B. (1992). Designing the user interface (2nd ed.). Reading, MA: Addison-Wesley.

Sweeney, M., Maguire, M., and Shackel, B. (1993). Evaluating user-computer interaction: a framework. International Journal of Man-Machine Studies, 38, 689-711.