The Journal of Systems and Software 86 (2013) 649–663
Sirius: A heuristic-based framework for measuring web usability adapted to the type of website
M. Carmen Suárez Torrente, A. Belén Martínez Prieto, Darío Alvarez Gutiérrez, M. Elena Alva de Sagastegui
Computer Science Department, University of Oviedo, C/Calvo Sotelo s/n, 33007 Oviedo, Spain
Article history: Received 9 January 2012; Received in revised form 23 October 2012; Accepted 23 October 2012; Available online 1 November 2012.
Keywords: Usability measurement; Heuristic evaluation; Usability metric; Website classification.
Abstract: The unquestionable relevance of the web in our society has led to an enormous growth of websites offering all kinds of services to users. In this context, while usability is crucial in the development of successful websites, many barely consider the recommendations of experts in order to build usable designs. Including the measurement of usability as part of the development process stands out among these recommendations. One of the most accepted methods for usability evaluation by experts is heuristic evaluation, and there is abundant literature on this method. However, there is a lack of clear and specific guidelines to be used in the development and evaluation process. This is probably an important factor contributing to the aforementioned generalized deficiency in web usability. What is missing is an evaluation method based on heuristics whose measure is adapted to the type of evaluated website. In this paper we define Sirius, an evaluation framework based on heuristics to perform expert evaluations that takes into account different types of websites. We also provide a specific set of evaluation criteria, and a usability metric that quantifies the usability level achieved by a website depending on its type. © 2012 Elsevier Inc. All rights reserved.
1. Introduction

The web and related services play an important role in our society. They provide almost unlimited access to information, and are modifying behaviour habits with respect to leisure, consumption, work, and other areas (Madden, 2006; Perallos, 2006). In the past years we have seen a huge number of websites appear for these purposes. Most of them are, a priori, very useful for the user. However, the low level of usability of many of these sites makes them responsible for the loss of time, demotivation, and frustration of the user when surfing the web (Ceaparu et al., 2004; Lazar et al., 2003). This has stimulated the launching of significant research efforts focused on usability within the development of websites (Folmer and Bosch, 2004).

1.1. The discipline of usability

The discipline of usability has evolved over the last three decades. Its origins are in the establishment of very high level principles stated as guidelines for the developers of user interfaces
(Damodaran et al., 1980; Preece, 1994; Shneiderman, 1992). Then standards endorsed by international committees were created (ISO/IEC 9126, 1991; ISO 9241-11, 1998; ISO 13407, 1999; ISO/TR 18529, 2000; ISO/TR 16982, 2002; ISO 15504, 2004; ISO/IEC 25010, 2011). Today we observe a great number of methods and tools specifically oriented towards the web that intend to be a guide and support for the evaluation of usability in that environment. The evaluation of usability, understood as a methodology for measuring usability aspects (learnability, efficiency, memorability, satisfaction and errors) of a user interface and for identifying specific problems (Nielsen, 1993a), is considered nowadays one of the most important tasks to perform when developing a user interface (Woodward, 1998). Besides, as the usability of a website is decisive for its success or failure (Griffith, 2002; Nielsen and Norman, 2000), usability must be integrated in the software engineering process. This is known as "usability engineering" (Mayhew, 1999; Rosson and Carroll, 2001), understood as the set of theoretical and methodological foundations that assure the accomplishment of the required usability levels for an application.

1.2. Definitions for usability
There are many definitions and meanings for the term “usability”. It is a word extensively used by those who analyze the factors contributing to a website being (in its simplest meaning) easy to
use (Nielsen, 2003). Many authors have tried to provide a definition of this term, usually by enumerating the different attributes or dimensions that can be used to evaluate it. Each definition depends, in the end, on the approach employed to measure usability (Folmer and Bosch, 2004).
Bevan et al. (1991) define usability as "the ease of use and acceptability of a product for a particular class of users carrying out specific tasks in a specific environment". Nielsen (1993a), a pioneer in the popularization of usability, views it as a multifaceted term, stating that a usable system should have the attributes of learnability, efficiency, memorability, errors and satisfaction. In Nielsen's model, usability is "a part of usefulness that is a part of practical acceptability and, finally, a part of system acceptability". Preece (1994), author of several usability studies and renowned books on the subject, refers to usability as "a measure of the ease with which a system can be learned or used". For Redish (1995), the term usability does not only refer to making systems simple to use, but also encompasses the comprehension of user goals, the context of their work, and the extent of knowledge and experience they have. She then directs her definition towards the goal of people working on usability, which is none other than "Usability means that the people who use the product can do so quickly and easily to accomplish their own tasks". Quesenbery (2001) extends the ISO 9241-11 (1998) definition of usability to make it more comprehensible under her criterion. She defines it with regard to the features the users should find in an interactive system: effective, efficient, engaging, error tolerant, and easy to learn. According to Brinck et al. (2002), usability is defined as "the degree to which people (users) can perform a set of required tasks". Rosson and Carroll (2001) define it as "the quality of a system with respect to ease of learning, ease of use, and user satisfaction". Krug (2006), usability consultant and author of the best-selling book "Don't Make Me Think", provides one of the most practical definitions, including a touch of his fine sense of humour: "Usability is making sure that something works well: that a person of average (or even below average) ability and experience can use the thing – whether it's a website, a fighter jet, or a revolving door – for its intended purpose without getting hopelessly frustrated".

1.3. Usability-related standards

Lately we have seen the development of standards in the field of Human–Computer Interaction within the ISO committees dealing with ergonomics, user interfaces, and software engineering. The standards related to usability basically address product use aspects, user interface and interaction, the product development process, and the capability of the organization to apply user-centred design. Some of these standards include definitions of usability. According to the ISO/IEC 9126 (1991) standard, usability refers to "the capability of the software product to be understood, learned, used and attractive to the user, when used under specified conditions". ISO 9241-11 (1998) defines usability as "the extent to which a product can be used by specific users to achieve specified goals with effectiveness, efficiency, and satisfaction in specified context of use". ISO 13407 (1999) describes user-centred design as a multidisciplinary activity involving human factors and the knowledge of ergonomics and working techniques.
The goal is to optimize efficiency and productivity, improving working conditions while neutralizing the possible adverse effects of the use of the interactive system on human health, safety and operation. The improved version of this standard, ISO/TR 18529 (2000), adds the "Usability Maturity Model": seven groups of basic tasks describing Human-Centred Design processes and their activities, and includes the system users in its lifecycle.
ISO/TR 16982 (2002) introduces evaluation methods such as observations, questionnaires, interviews, design techniques and participative evaluation, or formal methods involving final users directly. ISO 9241-151 (2008) is a web-specific standard that provides guidance on the human-centred design of software web user interfaces with the aim of increasing usability, distinguishing three domains in web development (development, evaluation, design); the standard focuses on the design domain. As reflected in recent standards, usability experts are working with the ISO/IEC JTC1/SC7 Technical Committee on Software and Systems Engineering to include usability in software quality and engineering standards. Lastly, ISO/IEC 25010 (2011), a quality model which replaces the previous standard ISO 9126-1, uses the same definition as ISO 9241-11: "The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use". This quality model also includes a broader concept of quality in use: "The degree to which a product used by specific users meets their needs to achieve specific goals with effectiveness, efficiency, safety and satisfaction in specific contexts of use" (Bevan, 2010).

1.4. Challenges in the evaluation process

Some aspects related to the usability concept (such as design and ease of use of web pages) have become decisive when providing successful services on the Internet (Flavián et al., 2006). Therefore it is necessary to guarantee the usability of websites. There are plenty of papers documenting usability-related methods and tools (Edmonds, 2003; Hartson et al., 2003; Koutsabasis et al., 2007; Matera et al., 2006; Paganelli and Paterno, 2003). However, this proliferation of methods does not seem to help developers acquire a clear model to follow in order to achieve maximum usability in their sites. This adds to the fact that the reference organization for web development (W3C, the World Wide Web Consortium) has not yet issued recommendations on the matter. Consequently, each researcher designs their own evaluation mechanisms, according to their own criteria. So we can find diverse methods that, even while focused on the same domain, consider very different evaluation elements. Besides, most of the proposals specifically addressing usability are centred on evaluating just one kind of site, such as e-commerce sites (Hasan et al., 2009; Singh and Kotzé, 2002) or education sites (Alva et al., 2010), or propose a generic evaluation model that does not take into account the type of site evaluated at all (Gónzalez et al., 2008). Be it for this absence of a consensus model for the assurance of usability, be it for the lack of awareness of many web developers with regard to the importance of usability as a success factor, or be it for a combination of both, the reality is that a number of studies prove a lack of usability in many production websites (Granollers, 2004; Nielsen et al., 2001; Vora, 1998).

1.5. Sirius: our evaluation proposal

Sirius, a new usability evaluation framework based on heuristics, is described in this paper. It is a framework for usability evaluation covering the expert evaluation phase, taking into account only the aspects an expert is able to evaluate. There are many factors influencing the usability level of a website, such as skills, aims and equipment of target groups of users, context of use, different services and tasks, etc.
These aspects, as shown in the global evaluation architecture in which Sirius is framed (Fig. 3), would be considered in a separate user evaluation phase.
Fig. 1. Pillars of the Sirius framework for expert’s usability evaluation.
We propose to check a non-vague, detailed set of criteria that not only contributes to a clear and concrete evaluation framework, but also provides a percentage measure of the usability of a website adapted to the particular type of website analyzed. In order to achieve this tuning of the measure to the type of website, a classification of websites with respect to functionality has been developed. Aspects and criteria to be considered when performing the evaluation are listed, and used as the basis for evaluation. Then the level of relevance of the non-compliance with the aspects and criteria is computed with weighting coefficients tailored to the type of website. Therefore, this level always depends on the type of website being evaluated, adapting the measure of the level of usability to the type of site. Detailing concrete evaluation criteria, taking into account the type of website, and including a usability metric are the pillars of the Sirius evaluation framework (Fig. 1). We have also developed a web tool supporting the evaluation framework (http://www.prometheus-usability.com) that facilitates the inclusion of the framework in the development lifecycle of websites.
This paper is organized as follows: Section 2 presents previous work and contributions from other authors on the areas of interest related to evaluation systems. The specification of the Sirius evaluation framework, as well as the work method used for its development, is described in Section 3. Section 4 details an empirical study carried out using the evaluation framework proposed. Finally, Section 5 includes the conclusions and future research lines.
2. Review of relevant literature

2.1. Heuristic evaluation

There are a number of classification criteria for usability evaluation methods (Bowman et al., 2002). One of them classifies methods depending on who performs the evaluation: experts or potential users of the website. Expert-centred evaluation is based on a critical inspection of the interface with reference to a set of design principles. These principles are specified as rules describing common properties of a usable interface, and help in detecting non-compliances of the interface with the principles. Heuristic evaluation is one of these inspection methods (Nielsen and Molich, 1990). It is widely accepted for diagnosing potential usability problems (Petrie and Bevan, 2009; Silva and Dix, 2007). It defines an inspection process in which expert evaluators examine the interface to judge the level of compliance with widely known usability principles called "heuristics". The goal of heuristic evaluation is to find usability problems in the design of the user interface in order to correct them in the iterative design process. It can be applied in different phases of the development cycle, detecting a good percentage of the usability problems (Nielsen and Molich, 1990).
Several authors have proposed heuristic evaluation approaches. Sometimes the heuristics are specified as general usability principles (Constantine, 1994; Instone, 1997; Nielsen and Molich, 1990; Shneiderman, 1992; Tognazzini, 2003); it is difficult to map these general principles to specific evaluable items. In other cases, the heuristics are specified as concrete items, and verifying the compliance of the interface with these items is possible by inspecting the interface, so the design of the evaluation is simpler (González et al., 2006; Hassan and Martín, 2003; Pierotti, 2005). Some authors (Olsina et al., 2001; Perallos, 2006) view the heuristic evaluation process as part of a broader process that evaluates the quality of a website, where usability is the key factor contributing to quality assurance.
Despite the existence of these proposals for heuristic evaluation, from a practical point of view there is no standard guide with reference guidelines or criteria to determine the level of usability of a website, as is the case with accessibility (Cadwell et al., 2008; Chisholm et al., 1999). With accessibility, checking a number of items gives the accessibility level of a site (Freire et al., 2008; Lopes and Carriço, 2010; Vigo and Brajnik, 2011), and developers rely on it as a reference for correcting the errors detected. Therefore, we lack standardized, clear and specific guidelines for usability, analogous to the ones existing for accessibility. These usability guidelines would be used in the development process and/or to verify compliance after development in a subsequent evaluation process.
In addition, the proposals for heuristic evaluation of usability do not take into account the type of site being evaluated when a non-compliance with a heuristic is found by the evaluator. Typically the impact of a usability fault on the global assessment of the site is computed disregarding the diversity of types of websites, and is not weighted depending on the type of site. In this way, sites of very different types receive the same level of usability when failing the same evaluation items. But it seems obvious that the same item, "low quality of images", is, for example, more important for the usability of an e-commerce site, where images are fundamental for the goals of the site, than for an online banking site, where images are not essential. Our opinion, reflected in our evaluation proposal, is that this non-compliance should have a greater impact on the usability level in the former type of site (e-commerce) than in the latter (online banking).

2.2. Evaluation metrics

Coupled to the process of usability evaluation, it is important to obtain a numerical value denoting the level of usability achieved by the evaluated site. It can be used, for example, to position the level of usability of a website within a specific predefined range of values. This is very useful, as the developer gets quick feedback with a clear indicator of the degree of usability achieved. It can also be used to monitor the level of usability of a website over time, or to establish comparisons or rankings between websites. As mentioned before, there are many methods supporting the process of usability evaluation. However, there are few providing a metric indicating the usability level accomplished. When provided, the process is oriented towards the measurement of just one kind of website, for example, education sites or
Fig. 2. Classification of websites. (The figure summarizes classifications of websites by development complexity; by functionality: informational, interactive, transaction, workflow, collaborative work environments, online communities/marketplaces, web portals, web services; by user actions; by complexity and purpose; and by complexity and evolution; with example type lists from Perallos (2006), Deshpande et al. (2002) and Coutín (2002).)
e-commerce (Alva et al., 2010; Granollers, 2004), or it is an automated measurement process in which only machine-measurable elements are considered, such as the number of broken links or font and colour changes (Ivory et al., 2001).
The metric proposed by Olsina et al. (2001) considers usability-related attributes as important factors contributing to the quality of a web artefact. Quality is deemed as having usability, functionality, reliability, and efficiency facets. The proposal evaluates a set of subheuristics, which are different depending on the type of site, grouped into 4 heuristics: global comprehensibility of the site, help mechanisms and online feedback, interface and aesthetic aspects, and miscellaneous. The measurement is computed using simple and composed aggregation functions that give the quality requirements tree. Using these values, a global quality value is constructed. In this proposal the set of subheuristics used in the evaluation of a given website, and the measuring criteria, have to be defined according to the specific goal, user profile, and domain. That is why this evaluation model has been developed only for academic sites (using 28 subheuristics) and museums (24 subheuristics), and it is not easy to customize it to new types of websites.
Few proposals consider a quantitative analysis of the results of a usability evaluation performed using the heuristic evaluation technique. Agarwal and Venkatesh (2002) define a heuristic evaluation procedure with a usability metric for the examination of the usability of websites. They propose to review 5 categories (Content, Ease of Use, Promotion, Made-for-the-Medium and Emotion) with 14 subcategories. Each subcategory has a weight adapting the value assigned by the evaluator to the subcategory. However, the weightings were specifically assigned for four sectors of the industry (Airline, Bookstore, Auto Manufacturer and Car Rental). González et al. (2006) present a heuristic evaluation process with a corresponding metric, using 25 subheuristics grouped into 4 heuristics (Design, Navigation, Content, Search). A subsequent version (Masip, 2007) has 68 subheuristics and 4 heuristics (Interface Design, Simple Navigation, Content Organization, Diverse Functionality). The usability value is computed taking into account the number of subheuristics within a heuristic to apportion the weight of each heuristic in the final formula. In this case the type of website is not considered, as there is only one weighting set and formula.
2.3. A classification of websites

One of the goals of the Sirius evaluation framework is to adapt evaluation results to the type of website, weighting the different evaluation values depending on the category of the site. As explained before, we believe that the impact of the usability errors detected in the evaluation process should depend on the type of site. Therefore, we face the problem of having a classification in which the different types of current websites are categorized. There are already several classifications of websites using different criteria. Baxley (2004) classifies websites depending on the complexity of development of the web application. Ginige and Murugesan (2001), Deshpande et al. (2002) and Coutín (2002) use the provided functionality, while Pressman (2006) relies on the actions performed by the user. Powell et al. (1998) use the complexity level and purpose, and Kappel et al. (2006) the complexity level and evolution. Fig. 2 depicts these classification methods. While there is a proliferation of classifications, there is not one receiving broad consensus. Besides, none of the examined classifications encompasses the diversity of current websites, for example institutional sites, blogs, or interactive sites (such as map sites). Thus, establishing a new classification that reflects the current variety of websites is one of the tasks needed when defining the Sirius evaluation framework.

3. Sirius evaluation framework

3.1. Global evaluation framework

Sirius provides a framework for heuristic evaluation, but it can be framed within a multi-stage global evaluation process of a website. This process starts (as shown in Fig. 3) with an accessibility evaluation before the usability evaluation. This decision is based on the results of different authors (Medina et al., 2010; Petrie and Kheir, 2007; Takayuki, 2007) who consider that increasing the accessibility of a website also increases its usability. Therefore, first, a review of accessibility is performed both automatically and manually. This review is done according to the W3C's Conformance Evaluation Method (http://www.w3.org/WAI/eval/conformance.html). Then the heuristic evaluation of usability proposed here (Sirius)
Fig. 3. Global framework proposed for website evaluation. (Accessibility phase: automatic evaluation and manual evaluation; usability phase: SIRIUS expert's evaluation and user's evaluation; the process goes from the initial website to the final website.)
would be performed. Finally, this would be completed with an evaluation model with users, considering the critical or relevant tasks, and involving users from all the target audiences of the site. This is included because skills, aims and equipment of target groups of users, context of use, different services and tasks, etc. also affect the level of usability of a site. Should the site have to be redesigned because of the results of each evaluation, the reviewing process would have to be performed again.

3.2. Description of Sirius

Sirius provides developers not only with a set of specific evaluation guidelines, but also with an indicative value of the usability level achieved by the site. This way, based on the results of the evaluation, measures to improve the usability level of the site can be taken. As an evaluation framework, Sirius can be used in the development process. In the first stages the guidelines (criteria) proposed by Sirius can be adopted as a part of the requirements of the website in development. In the prototype or production site evaluation phase it would be possible to:
• Verify the compliance with the criteria. This will provide the developer with an ordered, clear and concrete list of evaluation items.
• Obtain a quantitative measure that will indicate the level of usability achieved by the evaluated website.
• Know the list of usability errors detected on the site, ordered by priority (impact on global usability), helping the process of improvement of the site.
The Sirius evaluation framework emerged from the following tasks:
• Develop a classification of websites depending on their functionality.
• Specify the list of aspects and criteria to be considered for evaluation.
• Decide the level of importance of non-compliance for each of the evaluation criteria, depending on the type of site.
• Formulate the metric associated to the evaluation process.

3.2.1. Developing a classification of websites

Having not found a "universal" classification of websites, and as none of the classifications reviewed considers the diversity of current websites, we propose a new classification based on the general purpose of the site. We performed an iterative process to arrive at our classification, in which 118 students of Web Engineering participated in groups of 2. To begin with, each group defined a minimum of 10 types of websites. Analysing the results, we verified broad agreement on a set of types. In other cases, because the many different resulting types of sites (detailed in Suárez, 2011) had the common functionality of providing an interactive service to the user (e.g. translators, auctions, videos, docs), and with the goal of not excessively increasing the set of types of sites, they were grouped into two more general types: Non Image-based Interactive Services and Image-based Interactive Services. The final result with the types of websites considered in the Sirius framework is shown in Table 1.

Table 1. Types of websites defined in Sirius: Public Administration/Institutional; Online banking; Blog; E-Commerce; Communication/News; Corporate/Company; Downloads; Education/Training; Collaborative Environments/Wikis; Virtual Community/Internet forum; Leisure/Entertainment; Personal; Service Portal; Image-based Interactive Services; Non Image-based Interactive Services; Webmail/Mail; Hybrid.

We include as well a Hybrid type of site. This covers the need to adapt the evaluation process to any type of site that does not fit in the other categories. Hybrid sites are a combination of the other types. Assume we have a website in which different functionalities are combined; for example, a Public Administration/Institutional website may cover downloads, e-services and services portals, forums and image-based interactive services. In this case, the developer will define the type of the website as "hybrid" and assign the percentage of each type in the final configuration of the site. This configuration will be used in the evaluation process as explained in Section 3.2.6. The classification obtained has coincidences with those of other authors, but includes more types of websites (Table 2).

3.2.2. Defining the evaluation aspects and criteria

One of the bases of Sirius is the definition of the items to use in the evaluation process. We have considered the experience and knowledge in the heuristic evaluation process of several experts in usability evaluation (Serrano et al., 2002). We analysed different proposals from several authors (Hassan and Martín, 2003; Pierotti, 2005; Olsina et al., 2001; Perallos, 2006; Gónzalez et al., 2008). We found that there is a great amount of
Table 2. Comparison between website classifications. (The table indicates, for each type of website in the Sirius classification (Public Administration/Institutional, Blog, Online banking, E-Commerce, Communication/News, Corporate/Company, Education/Training, Downloads, Collaborative Environments/Wikis, Virtual Community/Internet forum, Leisure/Entertainment, Personal, Service Portal, Image-based Interactive Services, Non Image-based Interactive Services, Webmail/Mail), whether it is covered by the classifications of Deshpande et al. (2002), Coutín (2002) and Perallos (2006).)

Table 3. Different proposals for the same subheuristic.

Author                     Subheuristic
Hassan and Martín (2003)   Are there navigation elements to guide the user about where she is and how to undo her navigation?
Gónzalez et al. (2008)     Are there elements allowing the user to know exactly where she is in the site, and how to go back?
Perallos (2006)            The user has to know at any moment where he is.
documented subheuristics, but with poor uniformity (meaning that there are different enunciations for the same subheuristic) and lack of coherence (meaning that the heuristics used to measure a specific aspect of usability do not consider the same factors contributing to it, and therefore do not cover the aspect completely) between the proposals. As an example of these issues, Table 3 shows the same subheuristic as defined by the different authors that include it in their evaluation proposals. There is poor uniformity in the definition of the subheuristic between the first two proposals, and the last proposal is a subset of the former, illustrating the aforementioned lack of coherence.
It was then necessary to rewrite and homogenize the evaluation items. We selected the most comprehensive subheuristics, avoiding overlapping and using a common vocabulary of terms. The result was a single list of items that unifies the criteria used to heuristically evaluate a website. The list would be used by all experts involved in the evaluation process. These items are called criteria and are grouped into aspects, which correspond to the subheuristics and heuristics of other authors (Gónzalez et al., 2008; Nielsen, 1994a; Perallos, 2006). We chose to base our final list of aspects and criteria on the Hassan and Martín (2003) proposal as, in our opinion, it was the most exhaustive and complete. We deleted the evaluation elements related to accessibility because, as was already mentioned, we believe that accessibility validation should be an independent process, performed before the heuristic evaluation proposed here. Then we completed the initial list with several items from other authors (Krug, 2006; Pierotti, 2005) to get a first version with 87 criteria. This list was iteratively polished after performing several preliminary evaluations using the first version. In the preliminary evaluations, we found some usability errors that had no criteria covering them. For example, we added the criteria "Translation of the website is complete and correct" and "Some added value is provided by using sound", as there were no previous criteria covering these features. We also addressed unification of criteria, wording issues, and the ordering (from general to particular) in which to perform the evaluation. After this refining process, we arrived at the final list of 83 criteria grouped into 10 aspects. It was used to evaluate a grand total of 300 websites from the 16 categories of websites considered in Sirius (13% Public Administration/Institutional, 5% Banking, 4% Blog, 7% E-Commerce, 8% News, 8% Corporate, 4% Download, 5% Educational, 5% Collaborative, 7% Virtual Community, 8% Entertainment, 15% Personal, 4% Web Portal, 3% Image-based Interactive Services, 2% Non Image-based Interactive Services, 1% Webmail, 1% Hybrid). The evaluators had a computing engineer profile, had performed web development tasks, were trained and experienced in HCI, and all were familiar with the theory and practice of usability testing. They were asked not only to verify compliance with the criteria, but also to indicate usability errors detected that were not covered by the list, in order to include them in the final list. As none of the 300 evaluations performed found any usability error that could not be represented by Sirius, we considered the list used as final. Aspects and criteria
of the Sirius evaluation framework are detailed in the following sections.

3.2.2.1. Sirius aspects. The 10 aspects used in the Sirius evaluation framework are the following:
• General Aspects (GA): Elements related to the site goals, look & feel, coherence, and degree of content updating.
• Identity and Information (II): Elements related to the identity of the site and information about the provider and the owner of the contents.
• Structure and Navigation (SN): Elements related to the adequacy of the information architecture and site navigation.
• Labelling (LB): Elements related to the significance, correctness, and familiarity of content labelling.
• Layout of the page (LY): Elements related to the position and look of navigation and information elements in the interface.
• Comprehensibility and ease of Interaction (CI): Elements related to the adequacy and quality of text contents, icons, and controls of the interface.
• Control and Feedback (CF): Elements related to the freedom of navigation, and information provided to the user in the interaction process with the site.
• Multimedia Elements (ME): Elements related to the degree of adequacy of multimedia contents to the site.
• Search (SE): Elements related to the search feature implemented in the site.
• Help (HE): Elements related to the help provided to the user while browsing the site.

3.2.2.2. Sirius criteria. Each aspect is broken down into a number of evaluable criteria. The reviewer assigns a value to each criterion during the evaluation process. Sirius has 83 criteria. As an example, the criteria for the aspects General Aspects and Structure and Navigation are shown below (a full relation of Sirius criteria is in Appendix A):
• General Aspects (GA):
◦ GA1: Goals of the site are concrete and well defined.
◦ GA2: Contents and services are precise and complete.
◦ GA3: General structure of the site is user-oriented.
◦ GA4: General look & feel is aligned to the goals of the website.
◦ GA5: General design of the website is recognizable.
◦ GA6: General design of the website is coherent.
◦ GA7: User's language is used.
◦ GA8: Other languages are supported.
◦ GA9: Translation of the page is complete and correct.
◦ GA10: Website is updated regularly.
• Structure and Navigation (SN):
◦ SN1: Welcome screen is avoided.
◦ SN2: Structure and navigation are adequate.
◦ SN3: Element organization is consistent with conventions.
◦ SN4: Number of elements and terms per element is controlled in navigation menus.
◦ SN5: Depth and breadth are balanced in the case of hierarchical structure.
◦ SN6: Links are easily recognized as such.
◦ SN7: Link depiction indicates its state (visited, active).
◦ SN8: Redundant links are avoided.
◦ SN9: Broken links are avoided.
◦ SN10: Self links to the current page are avoided.
◦ SN11: Image links indicate the content to be accessed.
◦ SN12: A link to the beginning of the page is always present.
◦ SN13: Elements hinting where the user is and how to undo the navigation (breadcrumbs, coloured tabs) exist.
◦ SN14: A map of the site to directly access contents without navigation exists.

Table 4. Relevance of non-compliance of criteria.

Value  Relevance      Definition
4      Critical (CR)  The detected problem is severe. The user cannot complete the task and may leave the site.
3      Major (MA)     The user can complete the task, but with great difficulty, frustration, even performing many unnecessary steps. The user can only overcome problems after being instructed.
2      Moderate (MO)  The user can complete the task in most of the cases, using a moderate effort to overcome the problem. Some links may be needed in order to find the correct option to complete the task. Later, when returning to the site, the users will probably remember how to perform the task.
1      Minor (MI)     The problem appears intermittently and can be easily overcome, though it irritates the user. This is typically due to aesthetic problems.
3.2.3. Relevance of non-compliance

One of the goals of expert evaluation using Sirius is not only to obtain a quantitative measure of the level of usability of a site, but also to indicate a priority or urgency in improving the non-compliant criteria. We have established the severity values (Nielsen, 1994a) of the criteria depending on the types of sites defined for the Sirius evaluation framework. This severity level is an indicator of the level of importance that the non-compliance of a subheuristic (criterion) has in the global usability level of a website. In Nielsen's proposal, and as developed in the metrics proposed by Alva et al. (2010) and Gónzalez et al. (2008), the evaluator assigns this value for each reviewed subheuristic. In our proposal the evaluator quantifies the non-compliance of the subheuristic, but does not quantify the severity level, as it is predefined in Sirius depending on the type of site. In this way we aim to achieve more objectivity in evaluations performed by different evaluators on the same site, to facilitate comparisons between these evaluations, and to increase the efficiency of the evaluation process.
Sirius relevance values for non-compliance with criteria (severity in Nielsen's terminology) for each type of website were determined by a study performed with 78 participants. The participants were computing engineers enrolled in the last year of a postgraduate MSc in Web Engineering, with substantial training and experience in methods and techniques for web usability evaluation. The participants, in groups of 2, had to rate the significance of the non-compliance of each criterion included in Sirius with respect to the usability of the type of website indicated in the questionnaire. We used a discrete value scale between 1 (minimum) and 4 (maximum) corresponding to Nielsen's proposal, as shown in Table 4. Each questionnaire had the criteria for one Sirius aspect. We collected a total of 160 questionnaires: 16 types of websites × 10 aspects. After processing the data, we obtained the 1328 relevance levels for the 83 evaluation criteria times the 16 types of websites used in Sirius. These values were computed using the arithmetic mean of the values provided in the questionnaires (after deleting the outliers). Table 5 shows a subset of these values (9 of the 83 criteria for just 3 of the 16 types of website). The full list is detailed in Suárez (2011).
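As a minimal sketch of this aggregation step (ours, not taken from the paper), the snippet below averages 1-4 questionnaire ratings for one criterion and one type of website and maps the result to a relevance label. The outlier rule (values outside 1.5×IQR) and the rounding to the 1-4 scale are our assumptions, since the paper only states that outliers were deleted before taking the arithmetic mean.

```python
from statistics import mean, quantiles

RELEVANCE_LABELS = {1: "Minor", 2: "Moderate", 3: "Major", 4: "Critical"}

def relevance_from_ratings(ratings):
    """Aggregate 1-4 questionnaire ratings into a relevance level.

    Outlier removal via 1.5*IQR and rounding are illustrative assumptions;
    the paper only says that outliers were deleted before averaging.
    """
    q1, _, q3 = quantiles(ratings, n=4)
    iqr = q3 - q1
    kept = [r for r in ratings if q1 - 1.5 * iqr <= r <= q3 + 1.5 * iqr]
    level = max(1, min(4, round(mean(kept))))
    return RELEVANCE_LABELS[level]

# Hypothetical ratings collected for one criterion and one type of website
print(relevance_from_ratings([4, 3, 4, 4, 3, 1, 4]))  # -> "Critical" (the outlier 1 is dropped)
```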
Table 5. A sample subset of relevance values for 3 types of websites.

Criteria                                                Public Administration  E-Commerce  Leisure/Entertainment
GA.6: General design of the website is coherent.        Critical               Major       Moderate
GA.8: Other languages are supported.                    Major                  Major       Minor
II.5: Contact mechanisms are provided.                  Major                  Critical    Minor
SN.1: Welcome screen is avoided.                        Minor                  Moderate    Minor
LB.1: Labels are significant.                           Major                  Moderate    Minor
LY.2: Information overload is avoided.                  Moderate               Major       Minor
CI.1: Concise and clear language is used.               Critical               Critical    Major
SE.5: Simple and clear search system.                   Critical               Critical    Major
HE.3: Context help is offered for complex tasks.        Major                  Moderate    Minor
3.2.4. Evaluation values of the criteria

Once the criteria to be evaluated were defined, we established the range of values for each criterion. The scope of criteria compliance can be one of the following:
1. Global: Must be globally compliant through the whole site.
2. Page: Must be compliant in each page of the site.
The criteria in the first case are upheld or not in a global way in the site, and cannot be checked on a page-by-page basis. Examples of this type of criteria are "Goals of the website are concrete and well defined" or "Simple and clear search system". Therefore, it is only necessary to register whether the criterion is upheld or not, and to what degree. Examples of criteria for the second case are "Page text can be read easily" or "Translation of the page is complete and correct". For these page criteria, we should not only register whether the criterion is upheld, but also in which particular pages of the site the non-compliance is detected. This is the reason why there are two types of measuring values in Sirius: a numeric scale for global criteria, and a text scale for page criteria.
• Global criteria. A 0-10 scale shows the level of compliance of the criterion according to the evaluator. The value given by the metric is between 0 and 100 (with 0 for no usability and 100 for full usability). The 0-10 scale is proportional to the 0-100 value, and is often used in many education systems such as the Spanish one (Albaina and Aranda, 2001). This scale could be easily adapted to different intervals, as proposed in Suárez (2011). Example of a global criterion:
GA.1 Goals of the website are concrete and well defined    0 1 2 3 4 5 6 7 8 9 10 NA
• Page criteria. A text value that, in the case of non-compliance, also indicates to what extent the problem is detected. This would correspond to error persistence according to Nielsen's heuristic evaluation (Nielsen, 1994a). This text value is assigned by the evaluator, but internally it is converted to a 0-10 numeric representation. This mapping is done by dividing the 0-10 interval evenly between the textual evaluation values, and assigning each of the values in the interval to an evaluation element. The textual values are in ascending order of compliance degree, so for example NSP (not compliant in one or more subpages non-reachable from the home page) has a 7.5 value, which is much less severe than the 2.5 of NHP (not upheld in the home page). This mapping between the textual values and their numerical weight is shown in Table 6. An example of this case is the following:
LY.9 Page text can be read easily    NWS NML NHP NSP YES NA
Table 6 shows the values used for compliance/non-compliance of a page criterion:
Table 6. Sirius evaluation values for page criteria.

Evaluation value  Definition                                        Numerical value
0..10             0: Not compliant at all ... 10: Fully compliant   0, 1, 2, ..., 9, 10
NWS               Not compliant in the whole site                   0
NML               Not compliant in the main links                   2.5
NHP               Not upheld in the home page                       5
NSP               Not compliant in one or more subpages             7.5
YES               Fully compliant                                   10
NA                Criterion not applicable in the site              -
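As a small illustration (ours, not part of the original framework), the Table 6 mapping of page-criterion text values onto the internal 0-10 scale can be expressed as a simple lookup:

```python
# Table 6: page-criterion text values mapped evenly onto the 0-10 scale
PAGE_VALUE_TO_SV = {
    "NWS": 0.0,   # not compliant in the whole site
    "NML": 2.5,   # not compliant in the main links
    "NHP": 5.0,   # not upheld in the home page
    "NSP": 7.5,   # not compliant in one or more subpages
    "YES": 10.0,  # fully compliant
}

def sirius_value(ev):
    """Convert an evaluator's value (0-10 number, text code, or 'NA') to a Sirius value (sv)."""
    if ev == "NA":
        return None              # not applicable: excluded from the metric
    if isinstance(ev, (int, float)):
        return float(ev)         # global criterion, already on the 0-10 scale
    return PAGE_VALUE_TO_SV[ev]  # page criterion text value

print(sirius_value("NHP"))  # 5.0
print(sirius_value(7))      # 7.0
```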
Table 7. Weighing of criteria relevance.

Criteria relevance  Relevance value
Critical            8
Major               4
Moderate            2
Minor               1
3.2.5. Weighing the non-compliance of criteria

None of the heuristic evaluation processes we examined has an accompanying metric yielding a global usability value that also takes into account the diversity of types of websites. The metric we propose does take into account the type of website to calculate the global usability value of a site. This gives different values for different types of sites, even though they may have the same evaluation values for the criteria, as the effect of a criterion on global usability is not independent, but depends on the type of site.
To numerically weigh the effect of non-compliance of a criterion in the global usability value, we map the relevance of a criterion (critical, major, etc.) to a numerical relevance value, as shown in Table 7, in order to compute the global usability value. The evaluation metric uses this value. There is a table with criteria relevance assigned according to each type of website, as shown in Table 5. The relevance of non-compliance is, in our proposal, the main factor deciding the level of usability of a site. We chose these values in order to obtain significant differences in usability level depending on the importance of the non-compliance of each criterion. We performed different tests and concluded that this effect was achieved with a scale starting at 1 and doubling the value each time.

3.2.6. Evaluation metric

The last step in the Sirius evaluation framework is a quantitative value that reflects the usability level of a site (as a percentage). It is computed after the expert assesses all criteria defined to evaluate a site (see Appendix A). The experts assign a 0-10 value for global criteria, and a text value (NWS, NML, NHP, NSP, YES, NA, as shown in Table 6) for page criteria that is converted to a 0-10 scale. A weighting is applied to the value for each criterion that depends on the type of website (Table 7), in order to reflect that the impact of criteria on usability is not the same regardless of the type of website
Fig. 4. Components and process of the Sirius evaluation framework.
(Table 5). The formula used to compute the usability percentage of a website is the following:

PU = \frac{\sum_{i=1}^{nec} (wc_i \times sv_i)}{\sum_{i=1}^{nec} (wc_i \times 10)} \times 100    (1)

where:
• nec: number of evaluated criteria. It is 83 at most (the 83 criteria considered in Sirius). Some criteria cannot be applied to a given site, and therefore cannot be evaluated (NA value of Table 6).
• sv: Sirius value. Evaluation value of a criterion (between 0 and 10).
• wc: weighting coefficient. Weighting factor applied to the evaluated criterion. This is computed as follows:

wc_i = \frac{rv_i}{\sum_{j=1}^{nec} rv_j}    (2)

In Eq. (2), rv is the relevance value for a given criterion.
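As a minimal sketch of how Eqs. (1) and (2) can be computed in practice (our illustration; the data structures and the example values are hypothetical, while the relevance weights come from Table 7):

```python
# Table 7: numerical weight of each relevance level
RELEVANCE_VALUE = {"Critical": 8, "Major": 4, "Moderate": 2, "Minor": 1}

def usability_percentage(criteria):
    """Compute PU (Eq. 1) from a list of (sirius_value, relevance_label) pairs.

    Criteria marked NA (sirius_value is None) are excluded, so nec is the number
    of criteria actually evaluated. Weighting coefficients follow Eq. (2).
    """
    evaluated = [(sv, RELEVANCE_VALUE[rel]) for sv, rel in criteria if sv is not None]
    total_rv = sum(rv for _, rv in evaluated)
    weights = [rv / total_rv for _, rv in evaluated]      # Eq. (2): wc_i = rv_i / sum(rv_j)
    numerator = sum(wc * sv for wc, (sv, _) in zip(weights, evaluated))
    denominator = sum(wc * 10 for wc in weights)          # maximum attainable weighted score
    return numerator / denominator * 100                  # Eq. (1)

# Illustrative evaluation of three criteria for an E-Commerce site (last one is NA)
example = [(10.0, "Major"), (5.0, "Critical"), (None, "Minor")]
print(round(usability_percentage(example), 2))  # 66.67
```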
For Hybrid type sites made of the combination of simple types, we apply the metric for each type of site in the hybrid. Then we weigh the usability level of each facet as specified in the definition of the hybrid (see Section 3.1, for example 70% leisure/entertainment, 30% downloads) to obtain the final usability level.
If more than one evaluation has been performed for the same site, the level of usability is computed as the arithmetic mean of the values given by all the evaluators. In principle, individual evaluators can perform a heuristic evaluation of a user interface on their own, but the experience from several projects indicates that fairly poor results are achieved when relying on single evaluators (Nielsen, 1994b). Five users are typically cited as enough to find 80% of issues when conducting a heuristic evaluation (Hideki et al., 2005; Hvannberg et al., 2007; Nielsen and Landauer, 1993).

3.2.7. Global Sirius components and process

The structure of the Sirius evaluation framework, described in the former sections, can be summarized as shown in Fig. 4.

3.2.8. Evaluating usability using Sirius

To illustrate the practical use of the Sirius evaluation framework, we describe as an example the process used to obtain the level of usability of the Snap-On tool maker site (in 2006). This is one of the 43 sites included in the empirical study to validate Sirius described in Section 4 (Fig. 5).
Fig. 5. The Snap-On site in 2006.
The expert evaluator examines the website and assigns values to each of the 83 Sirius criteria. In this example, criteria from just 2 of the 10 evaluated aspects are shown: General Aspects (GA) and Structure and Navigation (SN). Table 8 gathers the evaluator values, the mapping between these values and the internal Sirius values used to compute usability, the relevance values according to the E-Commerce type of the site, the weighting coefficients for each criterion, and the final values obtained for each criterion. The aggregated value for each aspect also appears. The global usability level of the site is finally computed by aggregating the aspect values.
Table 8. A practical example of Sirius usability evaluation. (For each criterion of the General Aspects and Structure and Navigation aspects, the table lists the evaluation value (ev), the Sirius value (sv), the relevance value (rv), the weighting coefficient (wc) and the final criterion value (sv × wc), together with the final aspect value (fav) and the resulting usability percentage (fav × 10); for the Snap-On site the table reports a usability level of 82.63%.)

4. Validation of Sirius

4.1. Empirical study

Some studies point to the fact that, especially for E-Commerce applications, website usability directly influences sales (Bias and Mayhew, 2005; Black, 2002; Nielsen et al., 2001). To empirically validate the Sirius usability metric, we tried to confirm these results by measuring usability using Sirius. We performed Sirius usability measurements on a significant sample of 79 businesses for two consecutive years (tests performed in March 2010). Our goal was to determine the level of usability achieved by each business (as computed by Sirius), and to quantify the variation in usability over the two years under study. If there is a statistical correlation between the two variables (usability and sales), that would confirm the studies and would validate the Sirius metric, as it would correctly detect the variations in the usability level of the sites.
We selected the 79 businesses in the Consumer Discretionary sector that appear in the USA Nasdaq index (http://www.nasdaq.com/). The goal of these businesses is to sell products that are not necessities to consumers. Therefore, having a good website is important for their purposes (Lorca et al., 2012). We took years 2005 and 2006 to perform the usability evaluations, using the Internet Archive (http://www.archive.org) to access the websites of the time. 33 of the businesses were discarded as we found that they did not have a website online in some of the years. We finally analyzed the 43 businesses included in Table 9.
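As a hedged illustration of this step (not necessarily the exact procedure followed by the authors), archived snapshots of a company site from the study period can be located through the Internet Archive's Wayback Machine availability API before evaluating them with Sirius; the site URL and timestamp below are only examples.

```python
import json
import urllib.request

def closest_snapshot(url, timestamp):
    """Return the URL of the archived snapshot closest to the given YYYYMMDD timestamp."""
    query = f"https://archive.org/wayback/available?url={url}&timestamp={timestamp}"
    with urllib.request.urlopen(query) as response:
        data = json.load(response)
    snapshot = data.get("archived_snapshots", {}).get("closest")
    return snapshot["url"] if snapshot else None

# Example: locate the 2006 snapshot of the Snap-On site used in Section 3.2.8
print(closest_snapshot("snapon.com", "20060301"))
```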
Table 9. Nasdaq index businesses evaluated with Sirius: AMZN (Amazon.com Inc), APOL (Apollo Group Inc), AN (AutoNation Inc), BBBY (Bed Bath & Beyond Inc), BIG (Big Lots Inc), BDK (Black & Decker Corp), CCL (Carnival Corp), CTX (Centex Corp), COH (Coach Inc), CMCSA (Comcast Corp A), DRI (Darden Restaurants Inc), EK (Eastman Kodak Co), FDO (Family Dollar Stores Inc), GCI (Gannett Co Inc), GM (General Motors Corp), HAR (Harman Intl Industries Inc), DHI (Horton D.R. Inc), JCI (Johnson Controls Inc), LEG (Leggett & Platt), LEN (Lennar Corp A), LOW (Lowe's Cos Inc), MAT (Mattel Inc), MCD (McDonald's Corp), MHP (McGraw-Hill Cos Inc), MDP (Meredith Corp), NYT (New York Times Co A), OMC (Omnicom Group), JCP (Penney J.C. Inc), RSH (RadioShack Corp), SIN (Scripps Networks Interactive), SHLD (Sears Holdings Corp), SHW (Sherwin-Williams Co), SNA (Snap On Inc), SWK (Stanley Works), SBUX (Starbucks Corp), HOT (Starwood Hotel & Resort World), TGT (Target Corp), TWC (Time Warner Cable Inc), TWX (Time Warner Inc), TJX (TJX Cos Inc), DIS (Walt Disney Co), WHR (Whirlpool Corp), YUM (Yum! Brands Inc).

After reviewing the results of the Sirius usability evaluations of the websites of the businesses analyzed, we noted the following with respect to the 2005 and 2006 years:
• 63% of the businesses increased the level of usability, because of improvements introduced in the 2006 site, or because of a complete redesign of the site.
• 30% maintained the same level of usability, as there were no changes in the site between 2005 and 2006.
• 7% reduced the level of usability, mostly due to a (bad) redesign of the site.
As shown in the descriptive statistics corresponding to the Sirius aspects analyzed (detailed in Tables 10 and 11), the websites have acceptable scores in the majority of the aspects. Structure and Navigation is the aspect having better scores in both years. Search-related criteria (present in 38 of the sites) are the ones showing notable improvements between 2005 and 2006. Table 12 shows descriptive statistics for revenue and Sirius usability level for both years. The usability level is acceptable for most of the businesses, with a slight increment in the mean usability level in 2006.
To find out if there is a significant correlation between the usability level of a business website and the following year's revenue, we performed an inferential analysis computing Spearman's correlation coefficient. Table 13 shows the result. We then concluded that the correlation between revenue and the usability level is significant at the usual confidence levels. Therefore, with usability measurements done using Sirius, it is confirmed (as pointed out in the aforementioned studies) that usability indeed has a positive impact on sales in the following year. This supports the conclusion that Sirius measurements have faithfully estimated usability levels, as they corroborate the relation between usability and sales.
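For reference, the inferential step summarized in Table 13 can be reproduced with standard statistical tooling; the sketch below uses SciPy's Spearman rank correlation on placeholder vectors (the actual usability and revenue figures are summarized in Tables 12 and 13, not reproduced here).

```python
from scipy.stats import spearmanr

# Placeholder vectors: usability level of each site in year t and revenue in year t+1
usability_2005 = [81.0, 75.9, 83.5, 61.4, 90.4]
revenue_2006 = [9561, 4777, 18218, 1323, 205601]

rho, p_value = spearmanr(usability_2005, revenue_2006)
print(f"Spearman rho = {rho:.3f}, significance level = {p_value:.3f}")
```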
Table 10. Descriptive statistics for Sirius aspects (year 2005).

Aspect  Mean      Std. Dev   Minimum  Maximum  Q25        Q50        Q75
GA      8.416796  .6593531   6.6667   9.8000   8.222222   8.666667   8.888889
II      8.392442  1.2565132  5.9000   10.0000  7.500000   8.000000   10.000000
SN      6.502513  .9961787   4.2692   8.3077   5.750000   6.666667   7.307692
LB      8.299612  1.3453353  4.0000   10.0000  8.000000   8.500000   9.000000
LY      7.707442  .9929756   5.5500   9.8000   7.000000   7.750000   8.500000
CI      9.881783  .3802466   8.3333   10.0000  10.000000  10.000000  10.000000
CF      9.436822  .7331489   7.0000   10.0000  9.000000   9.500000   10.000000
ME      9.663     .6145      7.5      10.0     9.000      10.000     10.000
SE      7.734286  2.7857772  .0000    10.0000  7.200000   8.800000   9.400000
HE      10.00     .000       10       10       10.00      10.00      10.00
Table 11. Descriptive statistics for Sirius aspects (year 2006).

Aspect  Mean      Std. Dev   Minimum  Maximum  Q25        Q50      Q75
GA      8.567313  .6506477   6.6667   10.0000  8.333333   8.666667 8.888889
II      8.321512  1.3182847  5.9000   10.0000  7.400000   8.0000   9.800000
SN      6.681223  1.0150929  4.2692   8.3077   6.115385   6.78571  7.34615
LB      8.369380  1.2101747  4.0000   10.0000  8.000000   8.50000  9.50000
LY      7.758992  .9272767   5.7500   9.8000   7.100000   7.8000   8.4500
CI      9.872093  .3824649   8.3333   10.0000  10.000000  10.0000  10.0000
CF      9.483333  .6120808   7.7000   10.0000  9.000000   9.50000  10.0000
ME      9.647287  .6011555   8.0000   10.0000  9.000000   10.0000  10.0000
SE      8.133333  2.4192088  .0000    10.0000  7.725000   9.00000  9.40000
HE      10.00     .000       10       10       10.00      10.00    10.00
Table 12. Descriptive statistics for revenue and usability level.

          Revenue 2006  Revenue 2007  % Usability 2005    % Usability 2006
Mean      18679.81      18162.84      79.429242639755     80.99424082354890
Std. Dev  32454.651     29073.731     5.5207778112613     5.668585632797828
Minimum   1323          1404          61.4088397790       61.408839779006
Maximum   205601        181122        90.4473684211       90.447368421053
Q25       4777.00       4645.00       75.918918918919     77.96296296296300
Q50       9561.00       9411.00       81.000000000000     81.95767195767202
Q75       18218.00      19408.00      83.515151515151     85.58823529411770
Table 13 Correlation between revenue and the usability level.

Variables                        Correlation coefficient   Significance level
Revenue 2006 / Usability 2005    0.363                     0.017
Revenue 2007 / Usability 2006    0.325                     0.033
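To illustrate the inferential analysis summarized in Table 13, the following minimal sketch shows how a Spearman rank correlation between the Sirius usability level of a set of business websites and the next year's revenue could be computed. This is not the analysis script used in the study: the language (Python with SciPy) and the data values are assumptions chosen only for illustration.

# Illustrative sketch only: Spearman's rank correlation between usability
# level in year t and revenue in year t+1, one pair of values per business.
# The numbers below are made-up placeholders, not data from the study.
from scipy.stats import spearmanr

usability_2005 = [72.5, 80.1, 83.4, 65.2, 88.9, 79.0]    # Sirius usability level (%)
revenue_2006   = [4200, 9800, 15100, 2100, 33000, 8700]  # next-year revenue

rho, p_value = spearmanr(usability_2005, revenue_2006)
print(f"Spearman rho = {rho:.3f}, significance (p-value) = {p_value:.3f}")
# A p-value below 0.05, as in Table 13, indicates a significant positive
# association between the usability level and the following year's revenue.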
Sirius validity was also tested by applying the evaluation system to websites nominated for web awards and to websites classified as “Worst websites of 2010”. The result was a concordance between the verdicts of the panels of judges and the usability values provided by Sirius. Therefore, the Sirius usability value can be considered a good indicator of usability. These case studies are described in Suárez (2011).
Currently, Sirius is being used in the project “Coaching postgraduates for the implementation of ICT in international entrepreneurship processes of micro, small and medium businesses (MSMB) in Asturias”, financed by the Spanish Ministry of Industry and Tourism. The project is targeted at the redesign of MSMB websites.
5. Conclusions and future work
In this paper we presented Sirius, a heuristic-based usability evaluation framework for expert evaluation. Using a well-defined set of criteria, Sirius detects usability errors by means of expert reviews and computes a quantitative measurement of the usability level achieved by a site that depends on the type of website. Sirius offers a comprehensive list of aspects and criteria to be considered in the evaluation process. It integrates a new classification of websites, including a hybrid type that combines other types, and defines a new metric that quantifies the global usability level obtained after reviewing all criteria, taking the type of website into account. Sirius also produces the list of criteria to correct in the website, ordered by importance. It is extensible, as it is designed to admit new criteria and website types (a minimal, illustrative data-structure sketch is given after the appendix).
To validate Sirius we evaluated the usability of 43 Consumer Discretionary businesses listed in the Nasdaq index. The goal of these evaluations was to experimentally confirm the correlation between the usability level of a business website and the revenue of the business in the following year, which was indeed the case.
Our current work focuses on automating the Sirius usability measurements. We are analyzing which criteria can be measured automatically, and to what extent those criteria explain the usability level of a site. We already have a first version of a tool supporting this process (the Prometheus tool). We are also investigating usability analysis for specific application domains, in order to dynamically adapt the list of criteria used in the evaluation, as well as extending the Sirius framework with a model of user evaluation and the use of collaborative tools in the evaluation process.
Appendix A. Complete list of Sirius aspects, criteria, and evaluation values
• General Aspects:
GA.1 Goals of the website are concrete and well defined.
GA.2 Contents and services are precise and complete.
GA.3 General structure of the website is user-oriented.
GA.4 General look & feel is aligned to the goals, features, contents and services of the website.
GA.5 General design of the website is recognizable.
GA.6 General design of the website is coherent.
GA.7 User's language is used.
GA.8 Other languages are supported.
GA.9 Translation of the website is complete and correct.
GA.10 Website is updated regularly.
• Identity and Information:
II.1 Identity or logo is significant, identifiable and visible.
II.2 Identity of the website is present in every page.
II.3 Slogan or tagline is suited to the goal of the site.
II.4 Information about the website or company is provided.
II.5 Contact mechanisms are provided.
II.6 Information about privacy of personal data and copyright of web contents is provided.
II.7 Information about authorship, sources, creation and revision dates of articles, news and reports is provided.
• Structure and Navigation:
SN.1 Welcome screen is avoided.
SN.2 Structure and navigation are adequate.
SN.3 Element organization is consistent with conventions.
SN.4 Number of elements and terms per element is controlled in navigation menus.
SN.5 Depth and breadth are balanced in the case of hierarchical structure.
SN.6 Links are easily recognized as such.
SN.7 Link depiction indicates its state (visited, active).
SN.8 Redundant links are avoided.
SN.9 Broken links are avoided.
SN.10 Self links to the current page are avoided.
SN.11 Image links indicate the content to be accessed.
SN.12 A link to the home page is always present.
SN.13 Elements hinting where the user is and how to undo the navigation (breadcrumbs, coloured tabs) exist.
SN.14 A map of the site to directly access contents without navigation exists.
Each criterion in this appendix is evaluated either on a 0–10 numeric scale or on the textual scale NWS, NML, NHP, NPI, YES; NA is used for criteria that do not apply to the evaluated website.
• Labelling:
LB.1 Labels are significant.
LB.2 Labelling system is precise and consistent.
LB.3 Page titles are planned and correct.
LB.4 Home page URL is correct, clear, and easy to remember.
LB.5 Inner page URLs are clear.
LB.6 Inner page URLs are permanent.
• Layout of the page:
LA.1 Higher visual hierarchy areas of the page are used for relevant content.
LA.2 Information overload is avoided.
LA.3 Interface is clean, with no visual noise.
LA.4 White areas between information objects are provided for visual rest.
LA.5 Visual space on the page is used correctly.
LA.6 Visual hierarchy is correctly used to express "part of" relationships between page elements.
LA.7 Page length is under control.
LA.8 Print version of the page is correct.
LA.9 Page text can be read easily.
LA.10 Blinking/moving text is avoided.
• Comprehensibility and Ease of Interaction:
CI.1 Concise and clear language is used.
CI.2 Language is user friendly.
CI.3 Each paragraph expresses one idea.
CI.4 Interface controls are used consistently.
CI.5 Visible metaphors (e.g. icons) are recognizable and comprehensible by any user.
CI.6 Drop-down menus follow a coherent or alphabetic order.
CI.7 Available options in a user-input field can be selected instead of written.
• Control and Feedback:
CF.1 User controls the whole interface.
CF.2 User is informed about what is happening.
CF.3 User is informed about what has happened.
CF.4 Validation systems are in place to avoid errors before the user sends information.
CF.5 Clear and non-alarmist information, together with recovery actions, is provided to the user when an error has occurred.
CF.6 Response time is under control.
CF.7 Website windows cancelling or superimposing over browser windows are avoided.
CF.8 Proliferation of windows is avoided.
CF.9 User downloading of additional plugins is avoided.
CF.10 In tasks with several steps, the user is informed of the current step and the number of steps remaining to complete the task.
• Multimedia Elements:
ME.1 Images are well-cropped.
ME.2 Images are comprehensible.
ME.3 Images have the correct resolution.
ME.4 Some added value is provided by using images or animations.
ME.5 Cyclical animations are avoided.
ME.6 Some added value is provided by using sound.
• Search:
SE.1 If necessary, search is accessible from every page.
SE.2 Search is easily recognizable.
SE.3 Search is easily accessible.
SE.4 Text box width is sufficient.
SE.5 Search system is simple and clear.
SE.6 Advanced search is provided.
SE.7 Search results are comprehensible for the user.
SE.8 User is assisted in case of empty results for a given query.
• Help:
HE.1 Help link is located in a visible and standard place.
HE.2 Access to and return from the help system is easy.
HE.3 Context help is offered for complex tasks.
HE.4 FAQ question selection and wording are correct.
HE.5 FAQ answers are correct.
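The list of aspects, criteria, and evaluation values above can be read as a small data model. The sketch below is purely illustrative: the numeric equivalents assigned to the textual values (NWS, NML, NHP, NPI, YES), the per-criterion relevance weights, and the names Criterion, usability_percentage and criteria_to_correct are placeholders invented for this example, not definitions taken from the Sirius framework. It only shows how type-dependent weights could yield a global usability percentage and a prioritized list of criteria to correct, in the spirit of the metric and the extensibility discussed in the conclusions.

# Illustrative sketch only; the scale mapping and weights below are
# hypothetical placeholders, not the values defined by Sirius.
from dataclasses import dataclass
from typing import Optional, Union

# Assumed numeric equivalents of the textual evaluation values.
TEXTUAL_SCALE = {"NWS": 0.0, "NML": 2.5, "NHP": 5.0, "NPI": 7.5, "YES": 10.0}

@dataclass
class Criterion:
    code: str                           # e.g. "SN.9" (Broken links are avoided)
    aspect: str                         # e.g. "Structure and Navigation"
    score: Optional[Union[float, str]]  # 0-10 value, textual value, or None (NA)

def numeric_score(c: Criterion) -> Optional[float]:
    """Return the 0-10 value of a criterion, or None when it does not apply."""
    if c.score is None:
        return None
    return TEXTUAL_SCALE[c.score] if isinstance(c.score, str) else float(c.score)

def usability_percentage(criteria, weights):
    """Weighted usability level (%); `weights` holds a hypothetical
    relevance per criterion for the evaluated website type."""
    scored = [(c, s) for c in criteria if (s := numeric_score(c)) is not None]
    total = sum(weights[c.code] for c, _ in scored)
    return 100.0 * sum(weights[c.code] * s / 10.0 for c, s in scored) / total

def criteria_to_correct(criteria, weights):
    """Criteria scoring below the maximum, ordered by decreasing relevance."""
    failed = [c for c in criteria if (s := numeric_score(c)) is not None and s < 10.0]
    return sorted(failed, key=lambda c: weights[c.code], reverse=True)

# Tiny usage example with made-up scores and weights.
crits = [Criterion("SN.9", "Structure and Navigation", "NML"),
         Criterion("GA.1", "General Aspects", 9),
         Criterion("SE.6", "Search", None)]           # NA: criterion not applicable
weights = {"SN.9": 1.0, "GA.1": 0.5, "SE.6": 0.8}
print(round(usability_percentage(crits, weights), 1))         # 46.7
print([c.code for c in criteria_to_correct(crits, weights)])  # ['SN.9', 'GA.1']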
References

Agarwal, R., Venkatesh, V., 2002. Assessing a firm's web presence. Information Systems Research 13, 168–186.
Albaina, A.I., Aranda, J.L., 2001. La educación y el proceso autonómico: textos legales y jurisprudenciales (Education and the autonomic process: legal and jurisprudence texts), vol. XV. Ministerio de Educación, Cultura y Deporte (Spanish Ministry of Education, Culture, and Sport). Boletín Oficial del Estado.
Alva, M.E., Martínez, A.B., Suárez, M.C., Labra, J.E., Cueva, J.M., 2010. Towards the evaluation of usability in educative websites. International Journal of Technology Enhanced Learning 2 (1/2), 145–161.
Baxley, B., 2004. What is a web application? Available from: http://www.boxesandarrows.com/view/what is a web application (last accessed 01.12).
Bevan, N., Kirakowsky, J., Maissel, J., 1991. What is usability. In: Proceedings of the 4th International Conference on Human–Computer Interaction, September 1991, Elsevier, North-Holland.
Bevan, N., 2010. Extending the concept of satisfaction in ISO standards. In: International Conference on Kansei Engineering and Emotion Research 2010.
Bias, R., Mayhew, D., 2005. Cost-Justifying Usability. Morgan Kaufmann, San Francisco.
Black, J., 2002. Usability is next to profitability. Bloomberg Businessweek. Available from: http://www.businessweek.com/technology/content/dec2002/tc2002124_2181.htm (last accessed 01.12).
Bowman, D., Gabbard, J., Hix, D., 2002. A survey of usability evaluation in virtual environments: classification and comparison of methods. Presence: Teleoperators and Virtual Environments 11 (4), 404–424.
Brinck, T., Gergle, D., Wood, S.D., 2002. Usability for the Web. Morgan Kaufmann, San Francisco.
Caldwell, B., Cooper, M., Guarino, L., Vanderheiden, G., 2008. Web Content Accessibility Guidelines 2.0. W3C Recommendation. Available from: http://www.w3.org/TR/WCAG20 (last accessed 01.12).
Ceaparu, I., Lazar, J., Bessiere, K., Robinson, J., Shneiderman, B., 2004. Determining causes and severity of end-user frustration. International Journal of Human–Computer Interaction 17 (3), 333–356.
Chisholm, W., Vanderheiden, G., Jacobs, I., 1999. Web Content Accessibility Guidelines 1.0. W3C Recommendation. Available from: http://www.w3.org/TR/WCAG10/ (last accessed 01.12).
Constantine, L.L., 1994. Collaborative usability inspections for software. In: Proceedings of Software Development '94, Miller Freeman, San Francisco.
Coutín, A., 2002. Arquitectura de Información para Sitios Web (Information architecture for websites). Anaya Multimedia, Madrid, pp. 259–261.
Damodaran, L., Simpson, A., Wilson, P., 1980. Designing Systems for People. NCC National Computing Centre, Manchester, pp. 25–31.
Deshpande, Y., Murugesan, S., Ginige, A., Hansen, S., Schwabe, D., Gaedke, M., White, B., 2002. Web engineering. Journal of Web Engineering 1 (1), 3–17.
Edmonds, A., 2003. Uzilla: a new tool for web usability testing. Behavior Research Methods, Instruments & Computers 35 (2), 194–201.
Flavián, C., Guinalíu, M., Gurrea, R., 2006. The role played by perceived usability, satisfaction and consumer trust on website loyalty. Information and Management 43, 1–14.
Folmer, E., Bosch, J., 2004. Architecting for usability: a survey. Journal of Systems and Software 70 (1–2), 61–78.
Freire, A.P., Fortes, R.P.M., Turine, M.A.S., Paiva, D.M.B., 2008. An evaluation of web accessibility metrics based on their attributes. In: Proceedings of the 26th Annual ACM International Conference on Design of Communication, ACM, New York, pp. 73–80.
Ginige, A., Murugesan, S., 2001. Web engineering: an introduction. IEEE Multimedia 8 (1), 14–18.
González, M.P., Granollers, A., Pascual, A., 2006. Testing website usability in Spanish-speaking academia through heuristic evaluation and cognitive walkthroughs. Journal of Universal Computer Science 14 (9), 1513–1529.
González, M.P., Masip, L., Granollers, A., Oliva, M., 2008. Análisis cuantitativo en un experimento de evaluación heurística (Quantitative analysis in a heuristic evaluation experiment). In: IX Congreso Internacional de Interacción, Albacete, Spain.
Granollers, A., 2004. MPIu+a. Una metodología que integra la Ingeniería del Software, la Interacción Persona-Ordenador y la Accesibilidad en el contexto de equipos de desarrollo multidisciplinares (MPIu+a. A methodology integrating software engineering, human–computer interaction and accessibility in the context of multidisciplinary development teams). PhD thesis. Universitat de Lleida. Available from: http://hdl.handle.net/10803/8120 (last accessed 01.12).
Griffith, J., 2002. Online transactions rise after bank redesigns for usability. Minneapolis/St. Paul Business Journal. Available from: http://www.bizjournals.com/twincities/stories/2002/12/09/focus3.html (last accessed 01.12).
Hartson, H.R., Andre, T.S., Williges, R.C., 2003. Criteria for evaluating usability evaluation methods. International Journal of Human–Computer Interaction 15, 145–181.
Hassan, Y., Martín, F., 2003. Guía de Evaluación Heurística de Sitios Web (A guide for heuristic evaluation of websites). No Solo Usabilidad, 2. Available from: http://www.nosolousabilidad.com/articulos/heuristica.htm (last accessed 01.12).
Hasan, L., Morris, A., Probets, S., 2009. Using Google Analytics to evaluate the usability of e-commerce sites. In: Kurosu, M. (Ed.), Human Centered Design, HCII 2009, vol. 5619. Lecture Notes in Computer Science, pp. 697–706.
Instone, K., 1997. Usability engineering for the web. W3C Journal. Available from: http://web.archive.org/web/19980710023358/http://w3j.com/5/s3.instone.html (last accessed 01.12).
Hideki, E., Bim, S., Vieira, H., 2005. Comparing accessibility evaluation and usability evaluation in HagáQuê. In: CLIHC 2005, ACM International Conference Proceeding Series, vol. 124, pp. 139–147.
Hvannberg, E.T., Law, E.L.-C., Lárusdóttir, M.K., 2007. Heuristic evaluation: comparing ways of finding and reporting usability problems. Interacting with Computers 19 (2), 225–240.
ISO/IEC 9126, 1991. Software product evaluation – quality characteristics and guidelines for their use.
ISO 9241-11, 1998. Guidelines for specifying and measuring usability.
ISO 13407, 1999. Human-centred design processes for interactive systems.
ISO/TR 18529, 2000. Ergonomics of human–system interaction. Human-centred lifecycle process descriptions.
ISO/TR 16982, 2002. Ergonomics of human–system interaction. Usability methods supporting human-centred design.
ISO 15504, 2004. Information technology – process assessment.
ISO 9241-151, 2008. Software ergonomics for World Wide Web user interfaces.
ISO/IEC 25010, 2011. Systems and software engineering – Software product Quality Requirements and Evaluation (SQuaRE) – Software product quality and system quality in use models.
Ivory, M., Sinha, R., Hearst, M., 2001. Empirically validated web page design metrics. In: CHI 2001, ACM Conference on Human Factors in Computing Systems, CHI Letters 3 (1).
Kappel, G., Pröll, B., Reich, S., Retschitzegger, W., 2006. Web Engineering – The Discipline of Systematic Development of Web Applications. John Wiley & Sons, New York, NY.
Koutsabasis, P., Spyrou, T., Darzentas, J., 2007. Evaluating usability evaluation methods: criteria, method and a case study. In: HCI: Interaction Design and Usability, vol. 4550. LNCS, Springer, Heidelberg, Germany, pp. 569–578.
Krug, S., 2006. Don't Make Me Think: A Common Sense Approach to Web Usability. New Riders, Berkeley, CA.
Lazar, J., Bessiere, K., Ceaparu, I., Robinson, J., Shneiderman, B., 2003. Help! I'm lost: user frustration in web navigation. IT & Society 1 (3), 18–26.
Lopes, R., Carriço, L., 2010. Macroscopic characterisations of web accessibility. New Review of Hypermedia and Multimedia 16 (3), 221–243.
Lorca, P., De Andrés, J., Martínez, A.B., 2012. Size and culture as determinants of web policy of listed firms. The case of web accessibility in Western European countries. Journal of the American Society for Information Science and Technology 63 (2), 392–405.
Madden, M., 2006. Internet penetration and impact. Report Internet Evolution, Pew Internet and American Life Project. Available from: http://www.pewinternet.org/Reports/2006/Internet-Penetration-and-Impact.aspx (last accessed 01.12).
Masip, L., 2007. Estudi de l'estat actual de les webs d'ajuntaments de localitats de menys de mil habitants (Current state of small councils' websites). Final degree project. Departamento de Informática e Ingeniería Industrial, Universitat de Lleida. Available from: http://www.recercat.net/bitstream/handle/2072/4553/Masip.pdf?sequence=1 (last accessed 01.12).
Matera, M., Rizzo, F., Carughi, G.T., 2006. Web usability: principles and evaluation methods. In: Mendes, E., Mosley, N. (Eds.), Web Engineering. Springer, Heidelberg, Germany, pp. 143–180.
Mayhew, D., 1999. The Usability Engineering Lifecycle. Elsevier Books, Oxford.
Medina, N., Burella, J., Rossi, G., Grigera, J., Luna, E., 2010. An incremental approach for building accessible and usable web applications. In: Web Information Systems Engineering – WISE 2010. Springer, Berlin, Heidelberg, pp. 564–577.
Nielsen, J., Molich, R., 1990. Heuristic evaluation of user interfaces. In: Proceedings of the ACM CHI'90 Conference on Human Factors in Computing Systems, pp. 249–256.
Nielsen, J., 1993a. Usability Engineering. Academic Press Professional, Boston, MA.
Nielsen, J., Landauer, T.K., 1993. A mathematical model of the finding of usability problems. In: Proceedings of ACM INTERCHI'93, pp. 206–213.
Nielsen, J., 1994a. Heuristic evaluation. In: Nielsen, J., Mack, R.L. (Eds.), Usability Inspection Methods. John Wiley & Sons, New York, NY.
Nielsen, J., 1994. How to conduct a heuristic evaluation. Available from: http://www.useit.com/papers/heuristic/heuristic_evaluation.html (last accessed 06.12).
Nielsen, J., Norman, D., 2000. Usability on the web isn't a luxury. InformationWeek. The Business Value of Technology. Available from: http://www.informationweek.com/773/web.htm (last accessed 01.12).
Nielsen, J., Molich, R., Snyder, C., Farrell, S., 2001. High-Level Strategy: E-commerce User Experience. Nielsen Norman Group. Available from: http://www.consumersgo.to/usability1.pdf (last accessed 01.12).
Nielsen, J., 2003. Jakob Nielsen's Alertbox: Usability 101: Introduction to Usability. Available from: http://www.useit.com/alertbox/20030825.html (last accessed 01.12).
Olsina, L., Lafuente, G., Rossi, G., 2001. Specifying quality characteristics and attributes for web sites. In: Web Engineering. Managing Diversity and Complexity of Web Application Development, vol. 2016. LNCS, Springer-Verlag, pp. 266–278.
Paganelli, L., Paterno, F., 2003. Tools for remote usability evaluation of web applications through browser logs and task models. Behavior Research Methods, Instruments & Computers (The Psychonomic Society Publications) 35 (3), 369–378.
Perallos, A., 2006. Metodología Ágil y Adaptable al Contexto para la Evaluación Integral y Sistemática de la Calidad de Sitios Web (Agile and context-adaptable methodology for the integral and systematic evaluation of websites' quality). Ph.D. thesis. Universidad de Deusto, Facultad de Ingeniería-ESIDE. Available from: http://dialnet.unirioja.es/servlet/tesis?codigo=19163 (last accessed 01.12).
Petrie, H., Bevan, N., 2009. The evaluation of accessibility, usability and user experience. In: Stephanidis, C. (Ed.), The Universal Access Handbook. CRC Press, pp. 1–15.
Petrie, H., Kheir, O., 2007. The relationship between accessibility and usability of websites. In: Proceedings of CHI 2007.
Pierotti, D., 2005. Heuristic evaluation – a system checklist. Available from: http://www.stcsig.org/usability/topics/articles/he-checklist.html (last accessed 01.12).
Powell, T., Jones, D., Cutts, D., 1998. Web Site Engineering: Beyond Web Page Design. Prentice Hall, Englewood.
Preece, J., 1994. Human–Computer Interaction. Addison-Wesley, Reading, MA.
Pressman, R., 2006. Software Engineering. A Practitioner's Approach, sixth ed. McGraw Hill, New York, NY.
Quesenbery, W., 2001. What does usability mean: looking beyond 'ease of use'. In: Proceedings of the 18th Annual Conference of Technical Communications. Available from: http://www.wqusability.com/articles/more-than-ease-of-use.html (last accessed 01.12).
Redish, J., 1995. Are we really entering a post-usability era? ACM SIGDOC Asterisk Journal of Computer Documentation 19 (1), 18–24.
Rosson, M., Carroll, J., 2001. Usability Engineering. Morgan Kaufmann, San Francisco.
Serrano, M., Piattini, M., Calero, C., Genero, M., Miranda, D., 2002. Un método para la definición de métricas de software (A method to define software metrics). In: Proceedings of the First Workshop en Métodos de Investigación y Fundamentos Filosóficos en Ingeniería del Software y Sistemas de Información (MIFISIS'2002), El Escorial, Madrid.
Shneiderman, B., 1992. Designing the User Interface: Strategies for Effective Human–Computer Interaction. Addison-Wesley, Reading, MA.
Silva, P., Dix, A., 2007. Usability – not as we know it! In: Proceedings of the 21st BCS HCI Group Conference, September 3–7, Lancaster University, UK.
Singh, S., Kotzé, P., 2002. Towards a framework for e-commerce usability. In: Proceedings of the 2002 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on Enablement through Technology (SAICSIT'02), Republic of South Africa.
Suárez, M.C., 2011. Sirius: Sistema de Evaluación de la Usabilidad Web Orientado al Usuario y Basado en la Determinación de Tareas Críticas (Sirius: a user-oriented web usability evaluation system based on the determination of critical tasks). PhD thesis, University of Oviedo. Available from: www.di.uniovi.es/~cueva/investigacion/tesis/Sirius.pdf (last accessed 01.12).
Takayuki, W., 2007. Experimental evaluation of usability and accessibility of heading elements. In: Proceedings of the 2007 International Cross-Disciplinary Conference on Web Accessibility (W4A), May 07–08, Banff, Canada.
Tognazzini, B., 2003. First principles of interaction design. Available from: http://www.asktog.com/basics/firstPrinciples.html (last accessed 01.12).
Vigo, M., Brajnik, G., 2011. Automatic web accessibility metrics: where we are and where we can go. Interacting with Computers 23 (2), 137–155.
Vora, P., 1998. Designing for the web: a survey. Interactions 5 (3), 13–30.
Woodward, B., 1998. Evaluation methods in usability testing. Available from: http://web.archive.org/web/20030213050921/www.swt.edu/~hd01/5326/projects/bwoodward.html (last accessed 01.12).
M. Carmen Suárez Torrente is a full professor teaching Human–Computer Interaction in undergraduate and graduate computing degrees at the Department of Computing of the University of Oviedo, Spain. She received her Ph.D. in Computer Science from the University of Oviedo. Her research interests are related to web usability and accessibility, search engine optimization and object-oriented programming.
A. Belén Martínez Prieto is a Computing Engineer with a Ph.D. in Computing from the University of Oviedo, Spain. She teaches in undergraduate and graduate computing degrees at the Department of Computing of the University of Oviedo. Her research interests include issues related to the discipline of Human–Computer Interaction, specifically usability and accessibility.
Darío Alvarez Gutiérrez holds an M.S. in Computing from the University of Málaga and a Ph.D. in Computing from the University of Oviedo, where he teaches databases, programming, and the social, legal, and professional aspects of computing. His research interests include flexible VM-based systems, and security, privacy and social issues of computing.
M. Elena Alva de Sagastegui is an Industrial Engineer (Trujillo University, Peru), holds a Master in Information Systems (UPAO, Peru, and I.T.S.M., Mexico) and a Ph.D. in Computing from the University of Oviedo, Spain. She was a professor of programming languages in the Engineering Faculty of the UPAO University in Peru (1990–2005). She is a full professor in the Languages and Computer Systems area at the University of Oviedo. Her research interests include Human–Computer Interaction, web usability, web engineering and the Semantic Web.