Computers & Industrial Engineering 75 (2014) 230–238
An approach of product usability evaluation based on Web mining in feature fatigue analysis

Mingxing Wu a, Liya Wang a,*, Ming Li a, Huijun Long b
a Department of Industrial Engineering and Logistics Management, Shanghai Jiao Tong University, No. 800 Dongchuan Road, 200240 Shanghai, China
b Sino-US Global Logistics Institute, Shanghai Jiao Tong University, Shanghai, China
Article info
Article history: Received 24 May 2013; Received in revised form 1 July 2014; Accepted 3 July 2014; Available online 11 July 2014
Keywords: Web mining; Usability evaluation; Apriori algorithm; Feature fatigue

Abstract
Customers prefer to purchase products with more features, but after using the products they may become dissatisfied with their usability problems. This phenomenon is called "feature fatigue". It is therefore imperative to analyze product usability in the product definition stage. However, most traditional methods of usability evaluation are carried out using prototypes, which are not available until the later stages of product development. This paper proposes an approach based on Web mining to analyze product usability. The approach uses the massive online customer reviews on analogous products and features as its data source; these reviews are easy to obtain from the Web and reflect the most up-to-date customer opinions on product usability. Association rule mining techniques are adopted to extract customer opinions on the usability of product features. The Apriori algorithm is used for mining association rules, based on which a usability evaluation method is then presented. A case example is given to validate the proposed approach. © 2014 Elsevier Ltd. All rights reserved.
1. Introduction
"Feature fatigue" (FF) denotes the phenomenon in which customers initially prefer to choose products with more features and capability, but after use become dissatisfied with their usability problems (Thompson, Hamilton, & Rust, 2005; Wu, Wang, Li, & Long, 2013). FF leads to customer dissatisfaction and widespread negative word-of-mouth (WOM), which can severely damage the reputation of the product and even of the brand (Jokela, 2004; Rust, Thompson, & Hamilton, 2006; Thompson et al., 2005). Product usability analysis in the product definition stage is therefore needed for FF analysis. Traditionally, usability is evaluated through usability testing methods (Dumas & Redish, 1993; Li, Wang, & Wu, 2013), which are generally carried out using prototypes. However, prototypes are not available until the later stages of product development, whereas FF analysis should be performed in the product definition stage, so traditional usability testing methods are not suitable for FF analysis. Other methods of usability evaluation are based on survey data, but such data are costly and time-consuming to obtain in practice, especially for large-scale collections (Morinaga, Yamanishi, Tateishi, & Fukushima,
* Corresponding author. Tel.: +86 13818491902. E-mail addresses:
[email protected] (M. Wu),
[email protected] (L. Wang),
[email protected] (M. Li),
[email protected] (H. Long). http://dx.doi.org/10.1016/j.cie.2014.07.001 0360-8352/© 2014 Elsevier Ltd. All rights reserved.
2002). In order to save time and economic costs, this paper proposes a novel approach of usability analysis for FF evaluation. Nowadays, with the rapid development of e-commerce, more and more people purchase products online and post their reviews onto the Web (Ding, Liu, & Yu, 2008). They may post positive comments like "it is easy to use", or express complaints like "the Setup is the most frustrating thing in the world". These online reviews are one form of WOM that can significantly influence other customers' purchase decisions (Chevalier & Mayzlin, 2006; Li & Hitt, 2008), and eventually affect the brand's long-term revenue. Compared to traditional survey data, online reviews contain the most up-to-date customer opinions, reflecting usability evaluations derived from customers' actual experiences. Moreover, these massive online reviews can be collected from the Web easily and inexpensively relative to survey methods. Since most new products are evolutions of existing products (e.g., improvements to next-generation versions) (Bariani, Berti, & Lucchetta, 2004; Chen & Wang, 2008), analyzing the reviews on existing products can help designers analyze feature usability and provide decision support for improving product usability, thus alleviating FF. In this paper, an approach based on Web mining is proposed to analyze product usability, which can help designers evaluate FF. It
uses the massive online reviews on products as its data source. In practice, many product reviews are redundant or unrelated to usability, and manually scanning all of them to find useful information would be tedious and fruitless. To solve this problem, the proposed approach adopts machine learning techniques that process online reviews automatically and analyze them at the sentence level. It first uses a web crawler to collect online reviews. It then utilizes association rule mining techniques to extract customer opinions on the usability of product features: the Apriori algorithm is adopted for mining association rules, and a classifier is built to identify whether a review sentence is related to a feature's usability and to judge its semantic orientation. Features' usability is then evaluated, based on which an FF degree is defined to help designers evaluate FF. The remainder of this paper is organized as follows. The next section reviews the related work in the literature. In Section 3, the approach of usability analysis based on Web mining is proposed. Next, a case example is given to illustrate the proposed approach. Section 5 presents the results and discussions. The paper ends with a conclusion in Section 6.
2. Related work

2.1. Feature fatigue
The term "feature fatigue" was first used by Thompson et al. (2005) to describe customers' inconsistent satisfaction with high-feature products before and after use. Hamilton and Thompson (2007) used construal level theory to explain the reason for FF: indirect experience triggers more abstract mental construal and increases preference for high-desirability (capability) products, while direct experience triggers more concrete mental construal and increases preference for high-feasibility (usability) products. Thus, customers prefer high-capability products before use but high-usability ones after use, which leads to FF. Gill (2008) classified product features into a hedonic category (associated with experiential consumption, pleasure, and excitement) and a utilitarian one (related to more instrumental/practical considerations), and showed that different categories of product features have different effects on customers' perceived capability and usability. Yet none of the above studies points out how to determine which features should be integrated into a product so as to alleviate FF. There are some reports on efforts to alleviate FF. Thompson et al. (2005) proposed an analytical model of the influence of the number of features on manufacturers' long-term profit, and used it to determine a suitable number of features to integrate so as to maximize customer equity; but they focused only on the total "number" of features and considered all features homogeneous, ignoring the differences between them. Li and Wang (2011) proposed a probability-based methodology for FF analysis in which a Bayesian network was used to analyze uncertain relationships among product features and their combination effects; but this methodology cannot point out which features should be integrated into the product to alleviate FF. Li et al.
(2013) considered the feature addition problem as a multi-objective decision-making problem in which product capability and usability are two conflicting objectives, and proposed an FF multi-objective genetic algorithm (GA) for solving it. But this approach still cannot identify which specific features should be added to alleviate FF, because it gives many solutions along the Pareto-optimal frontier for designers to select from. Wu et al. (2013) proposed an approach based on the SIR epidemic model and a genetic algorithm to help designers find an optimal feature combination that maximizes customer equity.
2.2. Usability analysis
Usability is a critical dimension of product quality that affects product success (Mack & Sharples, 2009). It is defined in terms of efficiency, effectiveness, user satisfaction, and whether specific goals can be achieved in a specified context of use (ISO 9241-11, 1998; Jeng & Tzeng, 2012). Most traditional methods of usability evaluation are based on customer surveys or tests (Han, Hwan Yun, Kim, & Kwahk, 2000; Kwahk & Han, 2002; Liu, Wang, & Ding, 2010; Marshall, Case, Porter, Sims, & Gyi, 2004). For example, using quality function deployment (QFD), Jin, Ji, Choi, and Cho (2009) developed a usability evaluation model based on customer sensation. Ham et al. (2006) proposed a conceptual framework for identifying and organizing usability impact factors of mobile phones from the user, product, interaction, dynamic, and execution views separately. The problem with survey methods is that they must use prototypes, which are only available in the later stages of product development and thus extend development cycle time (Ozer & Cebeci, 2010; Yeh, Huang, & Yu, 2011). Besides, survey data are often costly and time-consuming to collect in practice, and it is difficult for them to capture the most up-to-date customer feedback about product usability. In recent years, many researchers have studied how to extract customer opinions from online product reviews, for example in the fields of opinion mining and sentiment analysis (Bhuiyan, Xu, & Josang, 2009; Kim, Ryu, Kim, & Kim, 2009; Long, Wang, & Liu, 2012). Most of these methods use opinion words or phrases to identify positive and negative reviews and then obtain preliminary summarized results (Zhan, Loh, & Liu, 2009). However, these works do not address extracting customer opinions about usability, which is an important factor for FF analysis. In this paper, a rule-based method is proposed for usability evaluation in FF analysis; it is most closely related to the work of Antonie and Zaiane (2002).
The proposed method differs from traditional usability evaluation methods in its use of massive online reviews.
3. Usability analysis based on Web mining for alleviating FF
The framework of the proposed approach is shown in Fig. 1. It consists of three modules: (1) data preparation; (2) usability analysis; (3) FF analysis. In module 1, a web crawler collects online product reviews, which are then pre-processed into sentence-level reviews; product features are also extracted in this module to establish a synonym dictionary of product features. In module 2, using the review sentences and the synonym dictionary obtained in module 1, the Apriori algorithm is adopted for mining association rules, and the usability of each product feature is then evaluated using the rules; this module is the main contribution of this paper. In module 3, FF analysis is performed using the feature usability obtained in module 2 and the feature capability obtained in capability evaluation. The details of the proposed approach are presented in the following subsections.
3.1. Module 1: data preparation
This module contains three processes: review collection, review pre-processing and product feature extraction. Review collection obtains product reviews from the Web and outputs raw reviews that are saved into the review corpus. Review pre-processing transforms the raw reviews into review sentences, which are used for association rule mining and usability analysis. Product feature extraction extracts product features and establishes a synonym dictionary of product features. This module is not the
[Fig. 1: a flow diagram of the three modules. Module 1 (data preparation): a web crawler collects reviews from the Web into the review corpus; review pre-processing outputs review sentences; product feature extraction, drawing on the Web and development documentation, outputs a synonym dictionary. Module 2 (usability analysis): association rule mining, using training/test sentences and a domain knowledge base, outputs a rule base; usability evaluation outputs feature usability. Module 3 (FF analysis): capability evaluation outputs feature capability; FF analysis combines feature usability and feature capability to output the FF index.]
Fig. 1. Framework of the proposed approach. Note: The dotted lines represent data operations (inputs and outputs).
focus of this paper, so we will introduce it briefly in the following three sub-subsections.
3.1.1. Review collection
In order to collect online product reviews, this paper utilizes a web crawler to search webpages containing reviews and extract product reviews from them. Fig. 2 shows the procedure of the web crawler. The crawler starts a new search from the unsearched URL list, accesses and downloads the corresponding webpage through HTTP, analyzes the webpage contents, and finally outputs product reviews and saves them into the review corpus. Two kinds of information contained in the webpages need to be dealt with: (1) URLs, where new URLs are added to the URL list; and (2) product reviews, where each review is saved into the review corpus as an individual datum.

[Fig. 2: flowchart of review collection. Starting from a seed URL, the crawler repeatedly gets a URL from the URL list, loads the webpage from the Web, analyzes it, and saves extracted reviews to the review corpus until no unsearched URLs remain.]
Fig. 2. Procedure of review collection. Note: The dotted lines represent data operations.

3.1.2. Review pre-processing
The product reviews obtained in review collection are raw reviews. A raw review may contain several review sentences, each of which represents a comment on one product feature. Since the proposed approach analyzes reviews at the sentence level, these raw reviews are first pre-processed to decompose them into review sentences before mining association rules. A data cleaning process is then required to remove a given list of stop words, which carry no useful information for building the rule-based classifier. Since the focus of this paper is on classification using an association rule mining algorithm, we simply use the stop list proposed by Fox (1989), excluding negation words such as "not", "no", "little" and "cannot".
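The pre-processing step above can be sketched as follows. This is our illustration, not the authors' pipeline: the stop list here is a tiny toy list (the Fox (1989) list is much longer), and the sentence splitter is a simple punctuation-based heuristic. Note that the negation words are deliberately kept.

```python
# Minimal review pre-processing sketch (illustrative, not the paper's code):
# split raw reviews into sentences, then drop stop words while keeping
# negation words as Section 3.1.2 requires.
import re

STOP_WORDS = {"the", "a", "an", "is", "it", "of", "in", "to", "and"}  # toy list
NEGATIONS = {"not", "no", "little", "cannot"}  # kept despite being common

def preprocess(raw_review):
    """Return a list of cleaned sentences, each a list of words."""
    sentences = re.split(r"[.!?]+", raw_review.lower())
    cleaned = []
    for s in sentences:
        words = [w for w in re.findall(r"[a-z']+", s)
                 if w not in STOP_WORDS or w in NEGATIONS]
        if words:
            cleaned.append(words)
    return cleaned

out = preprocess("The setup is not easy. It is frustrating!")
print(out)  # [['setup', 'not', 'easy'], ['frustrating']]
```

A production pipeline would also apply word stemming, as the case study in Section 4.3 does.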
3.1.3. Product feature extraction
To analyze the usability of product features, designers should first know what features the product has, so it is necessary to extract product features. On the websites of e-shops and
dedicated review sites, a list of specifications is usually given to show a product's features and their details, and the descriptions from manufacturers, usually in the form of advertisements, can also easily be obtained from the Web. Therefore, designers can extract product features directly from these websites using the specifications and/or descriptions. In addition, when analyzing their own existing products, manufacturers can obtain product features directly from the development documentation of the products. When customers post reviews about a product, they may mention features in their own words or phrases, often different from the standard expression; for example, customers may use the words "install", "configuration" or "wizard" to express opinions on the same feature, "Setup". Therefore, domain experts should build a synonym dictionary containing features in standard form and the related synonyms that usually appear in customer reviews. This synonym dictionary of product features is used for usability analysis in the next two modules. Of course, the feature extraction processes described above can be subjective and require domain knowledge.

3.2. Module 2: usability analysis
This module contains two processes: association rule mining and usability evaluation. Association rule mining obtains rules that identify whether a review sentence is related to a feature's usability and judge its semantic orientation. Using these rules, usability evaluation can be performed to analyze the usability of product features, helping designers analyze FF in the next module. Since this module is the main contribution of this paper, the details of these two processes are presented in the following sub-subsections.

3.2.1. Association rule mining
This paper focuses on how to evaluate the usability of product features based on a large amount of online reviews.
For a review sentence, two questions should be answered: whether it is about usability, and what its semantic orientation is, positive or negative. To achieve this task, a rule-based classifier based on the Apriori algorithm is built and used as follows.

3.2.1.1. Association rule generation. Association rule mining searches for interesting relationships among items in a transactional database. Association rules can be defined as follows. Let I = {i1, i2, ..., in} be a set of items and D = {t1, t2, ..., tm} be a set of database transactions. Each transaction is associated with an identifier TID and contains a set of items in I. An association rule is an implication of the form A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. In the rule, A and B are called the antecedent (left-hand side, LHS) and the consequent (right-hand side, RHS), respectively. Many measures can be associated with association rules; the most widely used are support and confidence. The support of the rule A ⇒ B is the percentage of transactions in D that contain A ∪ B, i.e., P(A ∪ B). The confidence of the rule A ⇒ B is the percentage of transactions in D containing A that also contain B; in other words, the confidence is the conditional probability that the consequent B holds given the antecedent A (Martínez-de-Pisón, Sanz, Martínez-de-Pisón, Jiménez, & Conti, 2012). That is,

support(A ⇒ B) = |{T | (A ∪ B) ⊆ T, T ∈ D}| / |D| = P(A ∪ B)    (1)

confidence(A ⇒ B) = |{T | (A ∪ B) ⊆ T, T ∈ D}| / |{T | A ⊆ T, T ∈ D}| = support(A ∪ B) / support(A) = P(B|A)    (2)
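Eqs. (1) and (2) can be computed directly on a toy transaction database; the sketch below is our illustration (not the paper's code), representing each transaction as a Python set of words.

```python
# Support and confidence of a rule A => B over a toy transaction set D,
# following Eqs. (1) and (2). Illustrative sketch, not the authors' code.

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    # P(B|A) = support(A ∪ B) / support(A), per Eq. (2).
    return support(lhs | rhs, transactions) / support(lhs, transactions)

D = [
    {"easy", "setup"},
    {"easy", "programming"},
    {"frustrating", "setup"},
    {"easy", "setup", "button"},
]

print(support({"easy", "setup"}, D))       # 2/4 = 0.5
print(confidence({"setup"}, {"easy"}, D))  # 0.5 / 0.75 ≈ 0.667
```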
An association rule is strong if it satisfies both a minimum support threshold and a minimum confidence threshold; these two measures are the usual criteria for pruning the rule set to a reasonable size. Many algorithms exist for generating association rules. The Apriori algorithm is one of the best-known and most influential algorithms for finding frequent itemsets and deriving significant association rules over transactional databases (Han & Kamber, 2006; Lee, Ryu, Shin, & Cho, 2012; Shahbaz, Srinivas, Harding, & Turner, 2006). The name of the algorithm reflects its use of prior knowledge. Apriori first finds the set of frequent 1-itemsets, denoted L1. Then L1 is used to find L2, the set of frequent 2-itemsets; L2 is used to find L3, and so forth, until no more frequent itemsets can be found (Agrawal & Srikant, 1994). For the problem of usability evaluation, each review sentence is given a category label. Three kinds of usability labels are used to represent the content of a sentence: PU (positive attitude about the usability), NU (negative attitude about the usability) and NT (not about usability, or not related to the concerned features). Different from previous works in the field of opinion mining, which just produce preliminary summaries of positive and negative sentences, this paper focuses on the specific problem of usability evaluation: the task is not only to find the semantic orientation of a review sentence, but also to decide whether the sentence is about usability. To this end, the Apriori algorithm is used to find association rules relating itemsets to usability labels; in this problem, an itemset is a set of words contained in review sentences. Using the generated rules, a review sentence classifier can be built for usability evaluation. The related procedures are described in Fig. 3.
Before rule mining, the training set of sentences is divided into subsets according to usability labels, giving three subsets of sentences; for each one, Apriori is used to find frequent itemsets. In Fig. 3, step (2) finds the frequent 1-itemsets, L1. In steps (3)–(11), L_{k−1} is used to generate the candidates C_k in order to find L_k. In step (4), L_{k−1} is joined with itself to generate C_k. Step (6) uses a subset function to find all the candidate itemsets that match the related sentence, and the count of each candidate is accumulated in steps (7)–(8). In step (10), only the candidates that satisfy the minimum support threshold are added to L_k. Using the frequent itemsets found by Apriori, rules can be established in the form {w1, w2, ..., wn} ⇒ UL, where UL is the usability label PU, NU or NT.

3.2.1.2. Rule set reduction. The number of strong association rules is usually extremely large, so it is important to prune the rules to make the classifier more effective and efficient (Zaïane & Antonie, 2005). Two types of rule set reduction techniques are applied in this paper, as shown in Fig. 4. First, the generated rules are ordered according to the number of words in the LHS. For example, the rule {small, button} ⇒ NU has a higher order than {very, difficult, programming} ⇒ NU. In steps (2)–(7), the less general rules are pruned: given two rules r_i: A ⇒ C and r_j: B ⇒ C, if A ⊂ B and the confidence of r_i is not less than that of r_j, then r_j is eliminated. Steps (8)–(16) apply database coverage. Step (11) uses a function to judge whether a rule matches a sentence in the training set S; if a sentence is matched correctly, the rule is marked and the sentence is deleted. In steps (17)–(19), the unmarked rules are eliminated, leaving a subset with a suitable number of rules.
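The two reduction steps just described can be sketched as below. This is our own illustrative reading of the procedure (not the authors' code); rules are represented as `(antecedent, label, confidence)` triples, and "coverage" keeps only rules that match at least one still-uncovered training sentence.

```python
# Illustrative rule-set reduction sketch: (i) drop less-general rules that
# a more general rule with the same consequent and no-lower confidence
# subsumes; (ii) database coverage over the training sentences.

def prune(rules, train_sentences):
    """rules: list of (lhs_frozenset, label, confidence) triples."""
    # Order by antecedent size: fewer words = more general = higher order.
    rules = sorted(rules, key=lambda r: len(r[0]))
    kept = []
    for lhs, label, conf in rules:
        # (i) r_j is dominated if some kept r_i has LHS ⊂ LHS(r_j),
        # the same consequent, and confidence >= conf(r_j).
        dominated = any(l2 < lhs and lab2 == label and c2 >= conf
                        for l2, lab2, c2 in kept)
        if not dominated:
            kept.append((lhs, label, conf))
    # (ii) database coverage: mark a rule if it matches an uncovered
    # sentence, deleting matched sentences; unmarked rules are dropped.
    remaining = list(train_sentences)
    covered = []
    for lhs, label, conf in kept:
        if any(lhs <= s for s in remaining):
            covered.append((lhs, label, conf))
            remaining = [s for s in remaining if not lhs <= s]
    return covered

rules = [
    (frozenset({"difficult"}), "NU", 0.9),
    (frozenset({"very", "difficult"}), "NU", 0.8),  # dominated by the rule above
    (frozenset({"easy"}), "PU", 0.85),
]
train = [{"very", "difficult", "setup"}, {"easy", "to", "use"}]
pruned = prune(rules, train)
print([sorted(r[0]) for r in pruned])  # [['difficult'], ['easy']]
```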
To measure the performance of the usability classifier, the standard definitions of precision and recall are used and expressed by the following equations:
Algorithm: Apriori. Find frequent itemsets on the training set of customer review sentences.
Input: A set of sentences S of the form s_i: {w1, w2, ..., wn}, where w_j are the words contained in the sentence; a minimum support threshold, min_sup.
Output: Frequent itemsets L.
Method:
(1) C1 = {candidate 1-itemsets and their support};
(2) L1 = {frequent 1-itemsets and their support};
(3) for (k = 2; L_{k−1} ≠ ∅; k++) {
(4)   C_k = L_{k−1} join L_{k−1};
(5)   for each sentence s ∈ S {
(6)     C_s = subset(C_k, s);
(7)     for each candidate c ∈ C_s
(8)       c.count++;
(9)   }
(10)  L_k = {c ∈ C_k | c.count ≥ min_sup};
(11) }
(12) return L = ∪_k L_k;
Fig. 3. Apriori algorithm for review mining.
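A minimal executable version of the Fig. 3 procedure is sketched below; this is our illustration rather than the authors' implementation, treating each sentence as a set of words and `min_sup` as an absolute count (matching the `c.count ≥ min_sup` test in the listing).

```python
# Minimal Apriori sketch over review sentences (illustrative, not the
# paper's code): returns {frozenset(words): count} for frequent itemsets.
from itertools import combinations

def apriori(sentences, min_sup):
    # Steps (1)-(2): count words and keep frequent 1-itemsets.
    counts = {}
    for s in sentences:
        for w in s:
            key = frozenset([w])
            counts[key] = counts.get(key, 0) + 1
    L = {k: c for k, c in counts.items() if c >= min_sup}
    frequent = dict(L)
    k = 2
    while L:
        # Step (4): candidates from joining L_{k-1} with itself.
        prev = list(L)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Steps (5)-(9): accumulate counts of candidates contained in sentences.
        counts = {c: 0 for c in candidates}
        for s in sentences:
            for c in candidates:
                if c <= s:
                    counts[c] += 1
        # Step (10): keep candidates meeting the support threshold.
        L = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(L)
        k += 1
    return frequent

sents = [{"easy", "setup"}, {"easy", "setup", "use"},
         {"frustrating", "setup"}, {"easy", "use"}]
freq = apriori(sents, min_sup=2)
print(freq[frozenset({"easy", "setup"})])  # 2
```

In the paper's setting this would be run once per label subset (PU, NU, NT), and each frequent itemset {w1, ..., wn} from a subset yields a candidate rule {w1, ..., wn} ⇒ UL.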
Algorithm: Pruning the set of association rules.
Input: The set of association rules found in the rule generation phase (R) and the set of training sentences (S).
Output: A set of rules used for the usability evaluation process.
Method:
(1) order the rules in R according to the number of words in LHS
(2) for each rule r_i ∈ R {
(3)   for each rule r_j ∈ R {
(4)     if LHS(r_i) ⊂ LHS(r_j) ∧ r_i.confidence ≥ r_j.confidence
(5)       then delete r_j
(6)   }
(7) }
(8) for each rule r_i ∈ R {
(9)   r_i.coverage = false
(10)  for each sentence s_j ∈ S {
(11)    if match(r_i, s_j) = true then {
(12)      delete s_j
(13)      r_i.coverage = true
(14)    }
(15)  }
(16) }
(17) for each rule r_i ∈ R
(18)   delete r_i where r_i.coverage ≠ true
(19) return R
Fig. 4. Pruning algorithm for rule set reduction.

precision = N(correctly classified sentences) / N(total classified sentences) × 100%    (3)

recall = N(correctly classified sentences) / N(total sentences) × 100%    (4)

where N(*) denotes the number of *.

3.2.2. Usability evaluation
Now the classifier can be used to attach usability labels to new customer review sentences. For each sentence, the classifier scans the rules to find one that matches the sentence. To keep things simple, each sentence is required to belong to exactly one category: if more than one rule matches the sentence, the first one is chosen, as the rules are organized in decreasing precedence based on their confidence and support, and its UL (the RHS of the rule) is attached to the sentence. A default rule can be added, with lowest precedence, to assign the NT class to any new sentence that is not matched by any other rule in the classifier. Though assigning unclassified sentences to NT may lose some information, this is acceptable if the precision and recall remain satisfactory. The synonym dictionary constructed in Section 3.1.3 is then used to identify which feature each sentence is talking about. Finally, the summary results of usability evaluation based on the review sentences can be presented in the form shown in Fig. 5, where "PU: 84" means there are 84 sentences containing positive attitudes about the usability of Setup, and "NU: 120" means there are 120 sentences containing negative attitudes about the usability of Setup.

Product: remote control
  Feature 1: Setup
    PU: 84
      <sentence 1>
      <sentence 2>
      ...
    NU: 120
      <sentence 1>
      <sentence 2>
      ...
  Feature 2: LCD
  ...
Fig. 5. An example of usability evaluation summary.

To improve the usability of a product, manufacturers might focus only on the NU sentences (negative attitude about the usability). But the number of NU sentences alone cannot fully reflect customers' perceived usability, as both positive and negative comments influence the probability of purchase (Arndt, 1967). To reflect the combined impact of positive and negative reviews, the usability level of each feature is defined as:

UN = αN_N − N_P    (5)

where N_N and N_P are the numbers of negative and positive review sentences, respectively, and α is the importance of a negative attitude relative to a positive one, determined by practitioners. Since negative reviews usually have a stronger impact than positive ones (Chevalier & Mayzlin, 2006), α is usually larger than one. To make the results more understandable, experts can translate the value of UN into a usability score according to the 5-point scale in Table 1. The higher the score, the worse the usability. That is, when
developing a next generation product, designers should improve the usability of the features with higher usability scores, or remove these features to alleviate FF. Of course, the strategy is determined according to both the usability and the capability of the feature.
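The classification and usability-level steps of Section 3.2.2 can be sketched as follows. This is our illustration (not the authors' code): rules are `(antecedent, label)` pairs assumed to be pre-sorted in decreasing precedence, a default NT rule catches unmatched sentences, and UN follows Eq. (5) with α = 2 as in the case study.

```python
# Illustrative sketch of the rule-based usability classifier and Eq. (5).

def classify(sentence_words, rules, default="NT"):
    # rules: (lhs_frozenset, label) pairs in decreasing precedence
    # (ordered by confidence then support); the first match wins.
    for lhs, label in rules:
        if lhs <= sentence_words:
            return label
    return default  # lowest-precedence default rule: NT

def usability_level(labels, alpha=2.0):
    # Eq. (5): UN = alpha * N_N - N_P, where alpha > 1 weights negative
    # attitudes more heavily than positive ones.
    n_neg = sum(1 for l in labels if l == "NU")
    n_pos = sum(1 for l in labels if l == "PU")
    return alpha * n_neg - n_pos

rules = [(frozenset({"frustrating", "setup"}), "NU"),
         (frozenset({"easy"}), "PU")]
sents = [{"the", "setup", "is", "frustrating"},
         {"easy", "to", "use"},
         {"great", "price"}]
labels = [classify(s, rules) for s in sents]
print(labels)                   # ['NU', 'PU', 'NT']
print(usability_level(labels))  # 2*1 - 1 = 1.0
```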
Table 2
Capability scores.
Score  Description
9      Extremely attractive
7      Very attractive
5      Attractive
3      Somewhat attractive
1      Not attractive at all

3.3. Module 3: FF analysis

3.3.1. Capability evaluation
It is necessary to analyze the capability of each feature, since FF is derived from the contradiction between product capability and usability. Because capability analysis is not the focus of this paper, a simple method is used to evaluate the capability level of product features: the capability of a feature is scored according to the 5-point scale in Table 2. These scores can be obtained through customer surveys in customer requirements analysis using AHP or Kano's model, or customers can be asked to give the scores according to their preferences at the point of purchase. If customer data are hard or costly to collect, domain experts can assign the scores according to their expertise.

3.3.2. FF degree
The ultimate aim of this paper is to analyze and alleviate FF. After obtaining the capability and usability scores, FF analysis can be performed using the approach proposed by Wu et al. (2013). As Thompson et al. (2005) suggested, to alleviate FF manufacturers should balance the capability and the usability of the product. In this paper we define an FF degree (FFD) to evaluate the FF degree of a feature:

FFD = U − C,  with  U = (FU − FU_min) / (FU_max − FU_min),  C = (FC − FC_min) / (FC_max − FC_min)    (6)

where U and C are the normalized usability and capability scores of the feature, respectively; FU is the usability score of the feature obtained according to Table 1, and FC is the capability score obtained according to Table 2. The value of FFD lies within the range [−1, 1], and lower is better. A feature is defined as an FF feature when its FFD value is larger than zero (FFD > 0). As shown in Fig. 6, the values of U and C of an FF feature lie in the lower-right half of the FF matrix (a lower U is better, while a higher C is better). An FF feature implies that its usability may be a problem for users; designers should spend more effort improving the usability of FF features to alleviate FF. To alleviate FF, designers should employ different strategies for different features. As shown in Fig. 6, a feature in quadrant II performs well because it has a high capability score and a low usability score: it is attractive and easy to use, and designers should integrate more features of this kind into the product. Features in quadrant I [especially in I(b)] have high attractiveness but poor usability; for these features designers should pay much more attention to improving usability.
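Eq. (6) can be computed as below; this is an illustrative sketch under the assumption (ours, not stated explicitly in the text) that FU_min/FU_max and FC_min/FC_max are the endpoints of the 1–9 scales in Tables 1 and 2.

```python
# FF degree per Eq. (6): normalize the usability score FU (Table 1) and
# capability score FC (Table 2) to [0, 1], then FFD = U - C.
# Assumption: min/max are the 1-9 scale endpoints.

def ffd(fu, fc, fu_range=(1, 9), fc_range=(1, 9)):
    u = (fu - fu_range[0]) / (fu_range[1] - fu_range[0])
    c = (fc - fc_range[0]) / (fc_range[1] - fc_range[0])
    # FFD lies in [-1, 1]; a feature with FFD > 0 is an FF feature.
    return u - c

print(ffd(fu=9, fc=5))  # 1.0 - 0.5 = 0.5 -> FF feature (usability problem)
print(ffd(fu=3, fc=7))  # 0.25 - 0.75 = -0.5 -> not an FF feature
```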
Features in quadrant III have low attractiveness but good usability; designers need not devote much effort to them. For features in quadrant IV, with low attractiveness and poor usability, designers may consider deleting the features to alleviate FF when needed.

4. Case example

4.1. Case description
To illustrate the proposed approach, a case example of usability analysis of a universal remote control for alleviating FF is presented in this section. The universal remote control is a typical high-feature product, as manufacturers are lured into integrating a growing number of technologies and features in the expectation of providing attractive and competitive models. Customers initially prefer these high-feature universal remote controls, but after use they find that the usability of such products becomes an increasing problem. And as more and more customers become familiar and comfortable with the Web, a large number of product reviews about universal remote controls can be obtained from the Web. In this paper the Logitech Harmony 890 is used as the case example. For this product, 587 customer reviews were crawled and downloaded from Amazon and then decomposed into review sentences, 5500 of which are used for the case example.

4.2. Feature extraction and capability evaluation
First, seven features are selected according to the specifications of the remote control on the Web, and the synonyms usually used in customer reviews are added to the related features.
Table 1
Impact index for usability evaluation.
Score  Description
9      Strong negative impact
7      Weak negative impact
5      Not apparent impact
3      Weak positive impact
1      Strong positive impact

[Fig. 6: the FF feature matrix, with capability C on the vertical axis and usability U on the horizontal axis (both normalized to [0, 1], with midlines at 0.5 dividing quadrants I–IV); features in the lower-right half of the matrix (FFD > 0) are FF features.]
Fig. 6. FF feature matrix.
After that, a feature dictionary can be built in the form of Table 3. Since capability evaluation is not the focus of this paper, the capability of each feature is evaluated by experts according to the capability scores in Table 2; the final capability scores are given in Table 6.

4.3. Usability evaluation
To illustrate the proposed usability evaluation approach, the sentences crawled from the Web are divided into three sets: 2000 sentences for training, 2000 for testing and 2000 for usability analysis. Each sentence in the training and testing sets is manually given one of the three category labels: PU (positive attitude about the usability), NU (negative attitude about the usability) and NT (not about the usability). In the training set there are 306 PU, 475 NU and 1219 NT sentences; in the testing set there are 363 PU, 396 NU and 1241 NT sentences. A data cleaning process removes a given list of stop words, and word stemming is also applied to reduce inflected or derived words to their stem, base or root form. There are in total 32,541, 30,250 and 32,348 words in the sentences of the training, testing and predicting sets, respectively, of which 23,988, 22,558 and 24,523 words are removed. The Apriori algorithm is then applied to extract rules from the sentences in the training set, after which the pruning techniques are used to reduce the number of rules. In total, 144 rules are obtained; some examples are shown in Table 4. The 2000 testing sentences are used to measure the performance of the usability classifier, reported in Table 5. If only the rules generated by Apriori are used, the precision is 77.0% and the recall is just 34.8%.
If a default rule is added that assigns the label NT to any sentence not matched by the generated rules, the precision and recall both become 75.3%, since every sentence is then classified. The precision decreases by only 1.7% while the recall increases substantially, which indicates that most of the unclassified sentences are NT ones. Since this paper focuses on PU and NU review sentences for usability evaluation, using only the generated rules is sufficient for the task.

The classifier can now be used to evaluate the usability category of the 2000 product review sentences reserved for analysis. The synonym dictionary constructed in Section 4.2 is then applied to identify which feature each sentence is talking about. The summary of the usability evaluation is shown in Table 6.

5. Results and discussion

The results of the capability and usability evaluations are used for FF analysis. In Table 6, NP and NN represent the numbers of positive and negative sentences about each feature, respectively. The value of a in Eq. (5) is set to 2, indicating that a negative sentence has twice the influence of a positive one. UN can then be calculated to reflect the combined influence of the positive and negative sentences. Using the impact index
Table 3
Synonym dictionary of product features.

Feature                    | Synonyms
Setup                      | Programming, PC, ...
LCD                        | Color screen, icons, ...
Keypad                     | Number pad, backlit, ...
Activity button (AB)       | Macro, a single button, ...
Radio frequency (RF)       | RF, radio signal, ...
Help button (HB)           | Help key, press help, ...
Rechargeable battery (RB)  | Recharge, charger, cradle, ...
Table 4
Examples of association rules for usability evaluation.

price ⇒ NT
frustrating ∧ setup ⇒ NU
programming ∧ easy ⇒ PU
push ∧ one ∧ button ⇒ PU
right ∧ worked ∧ not ⇒ NU
...
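Applying rules of the form shown in Table 4 amounts to a subset test on each sentence's token set, with the default rule as a fallback. The sketch below is illustrative (the example sentences are hypothetical), not the authors' code:

```python
# Antecedent token set => usability label, mirroring the rule forms in Table 4.
RULES = [
    ({"frustrating", "setup"}, "NU"),
    ({"programming", "easy"}, "PU"),
    ({"push", "one", "button"}, "PU"),
    ({"right", "worked", "not"}, "NU"),
    ({"price"}, "NT"),
]

def classify(sentence, rules=RULES, default=None):
    """Label a sentence with the first rule whose antecedent is contained in its
    tokens; optionally fall back to a default label (the paper's default rule is NT)."""
    tokens = set(sentence.lower().split())
    for antecedent, label in rules:
        if antecedent <= tokens:
            return label
    return default

print(classify("the setup process is frustrating"))  # NU
print(classify("great battery life", default="NT"))  # NT (default rule fires)
```

Leaving `default=None` reproduces the "without default rule" setting of Table 5, where unmatched sentences stay unclassified.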
Table 5
Precision and recall of the usability classifier.

Classifier             | Precision (%) | Recall (%)
Without default rule   | 77.0          | 34.8
With the default rule  | 75.3          | 75.3
Table 6
Results of the case example.

Feature | NP | NN  | UN  | FU | FC   | U    | C    | FFD
Setup   | 84 | 120 | 156 | 9  | 8.5  | 1.00 | 0.94 | 0.06
LCD     | 2  | 3   | 4   | 5  | 7.25 | 0.50 | 0.78 | -0.28
Keypad  | 13 | 57  | 101 | 9  | 6.25 | 1.00 | 0.66 | 0.34
AB      | 75 | 27  | -21 | 1  | 7.5  | 0.00 | 0.81 | -0.81
RF      | 17 | 15  | 13  | 7  | 2.5  | 0.75 | 0.19 | 0.56
HB      | 4  | 0   | -4  | 3  | 6.25 | 0.25 | 0.66 | -0.41
RB      | 3  | 8   | 13  | 7  | 4.75 | 0.75 | 0.47 | 0.28
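The relations underlying Table 6 can be reproduced numerically. The sketch below assumes, consistently with every row of the table, that Eq. (5) is UN = aNN − NP with a = 2 and that U = (FU − 1)/8, C = (FC − 1)/8 and Eq. (6) is FFD = U − C; the equations themselves are defined earlier in the paper and are not reproduced in this excerpt, so these forms are inferred from the reported values.

```python
# Per-feature inputs from Table 6 as (NP, NN, FU, FC); Setup and RF as worked examples.
features = {
    "Setup": (84, 120, 9, 8.5),
    "RF": (17, 15, 7, 2.5),
}

ALPHA = 2  # a = 2: a negative sentence weighs twice as much as a positive one

def ff_degree(NP, NN, FU, FC, alpha=ALPHA):
    UN = alpha * NN - NP   # net negative influence (matches Table 6's UN column)
    U = (FU - 1) / 8       # usability-problem score normalized to [0, 1]
    C = (FC - 1) / 8       # capability score normalized to [0, 1]
    return UN, U, C, U - C # last element is FFD = U - C

print(ff_degree(*features["Setup"]))  # (156, 1.0, 0.9375, 0.0625) -> Table 6: 156, 1.00, 0.94, 0.06
print(ff_degree(*features["RF"]))     # (13, 0.75, 0.1875, 0.5625) -> Table 6: 13, 0.75, 0.19, 0.56
```

A feature is then flagged as an FF feature when its FFD is positive, i.e. when its usability problems outweigh its capability appeal.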
in Table 1, the usability score FU is obtained from the UN value. The capability scores FC are the averages of four experts' evaluation scores in this case example. Finally, FFD is calculated using Eq. (6) to represent the overall effect of FF, considering both capability and usability. The results of the case example are shown in Table 6.

If manufacturers focus only on finding and solving usability problems, they may be most interested in Setup, Keypad, AB and RF, as these features have the most negative sentences. But the influence of positive comments cannot be ignored. For example, although the AB feature has 27 negative sentences, it also has 75 positive ones. Even accounting for the stronger influence of negative reviews, the UN of AB is less than 0, which means that its usability problem is not serious in an overall view. Compared with AB, the RF feature deserves more attention according to UN, even though its NN value is smaller.

Considering usability alone, however, is not enough to alleviate FF: manufacturers need to balance the capability and the usability of each product feature. In this case example, the key features with high FFD values (FFD > 0) are Keypad, Setup, RB and RF. The RF feature is not very attractive (FC = 2.5) to customers when they are making a purchase decision, yet its FU of 7 indicates poor usability. Thus RF presents a more serious problem from the FF point of view than RB, which has the same FU value but is more attractive (FC = 4.75); the FFD values confirm this. To alleviate FF, manufacturers should adopt different strategies for different features, as shown in Fig. 7. For example, Keypad and Setup have high capability but poor usability. They are must-be features that cannot be deleted from a remote control.
Therefore, designers should analyze the negative review sentences labeled in Section 4.3 to improve the usability of Keypad and Setup. RB and RF, by contrast, are additional features compared with common remote controls; designers may try to analyze and improve their usability, or they may simply delete these features from the product, as they are not very attractive and have poor usability performance. For an additional feature like HB, which is not very attractive but has no usability problem, manufacturers can choose to improve its perceived capability through means such as advertising.

Fig. 7. FF feature matrix of the case example (capability C versus usability U, both on [0, 1]). Note: ■ not an FF feature (AB, LCD, HB); ▲ FF feature (Setup, Keypad, RB, RF).

6. Conclusions

In this paper, an approach based on Web mining is proposed to analyze product usability for FF analysis. It first uses a Web crawler to collect online reviews. Association rule mining techniques are then used to construct a classifier that identifies whether a review sentence is about usability and, if so, what its semantic orientation is; the Apriori algorithm is used to mine the association rules. By counting the positive and negative review sentences, the usability evaluation results of the product features are obtained. The FF degree is then calculated from the usability and capability of each feature, and different strategies can be applied to different features to alleviate FF according to these analysis results.

This approach uses the massive volume of online customer reviews on products as its data source, which reflects the most up-to-date customer opinions on product usability. Compared with traditional survey data, these online reviews are inexpensive and easy to obtain from the Web. The case example shows that the proposed approach can help designers analyze the usability of product features and can provide decision support to alleviate FF in product development.

Some limitations call for further research. First, building a rule-based classifier for usability evaluation requires a sufficiently large number of existing reviews, but for some unpopular products the related online reviews are very limited. Future work should address how to extract customer opinions for usability analysis when only a small number of reviews are available. Second, the proposed approach can analyze the usability of existing product features only; it needs to be combined with other approaches to analyze the usability of new features. Third, the algorithm is in general computationally expensive; improving its efficiency is left for future work.

Acknowledgement

This research is supported by the National Natural Science Foundation of China (Grant no. 71072061/G020801).
References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th international conference on very large data bases (Vol. 1215, pp. 487–499). VLDB.
Antonie, M.-L., & Zaiane, O. R. (2002). Text document categorization by term association. In Proceedings of the 2002 IEEE international conference on data mining (ICDM 2002) (pp. 19–26).
Arndt, J. (1967). Role of product-related conversations in the diffusion of a new product. Journal of Marketing Research, 4(3), 291–295.
Bariani, P. F., Berti, G. A., & Lucchetta, G. (2004). A combined DFMA and TRIZ approach to the simplification of product structure. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 218(8), 1023–1027.
Bhuiyan, T., Xu, Y., & Josang, A. (2009). State-of-the-art review on opinion mining from online customers' feedback. In Proceedings of the 9th Asia–Pacific complex systems conference (pp. 385–390). Tokyo: Chuo University.
Chen, C., & Wang, L. (2008). Integrating rough set clustering and grey model to analyse dynamic customer requirements. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 319–332.
Chevalier, J. A., & Mayzlin, D. (2006). The effect of word of mouth on sales: Online book reviews. Journal of Marketing Research, 43(3), 345–354.
Ding, X., Liu, B., & Yu, P. S. (2008). A holistic lexicon-based approach to opinion mining. In Proceedings of the 2008 international conference on web search and data mining (pp. 231–240). Palo Alto, CA, USA: ACM.
Dumas, J. S., & Redish, J. C. (1993). A practical guide to usability testing. Norwood: Ablex Publishing Corporation.
Fox, C. (1989). A stop list for general text. SIGIR Forum, 24(1–2), 19–21.
Gill, T. (2008). Convergent products: What functionalities add more value to the base? Journal of Marketing, 72(2), 46–62.
Ham, D.-H., Heo, J., Fossick, P., Wong, W., Park, S., Song, C., et al. (2006). Conceptual framework and models for identifying and organizing usability impact factors of mobile phones. In Proceedings of the 18th Australia conference on computer–human interaction: Design: Activities, artefacts and environments (pp. 261–268). ACM.
Hamilton, R. W., & Thompson, D. V. (2007). Is there a substitute for direct experience? Comparing consumers' preferences after direct and indirect product experiences. Journal of Consumer Research, 34(4), 546–555.
Han, S. H., Hwan Yun, M., Kim, K.-J., & Kwahk, J. (2000). Evaluation of product usability: Development and validation of usability dimensions and design elements based on empirical models. International Journal of Industrial Ergonomics, 26(4), 477–488.
Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd ed.). San Francisco, CA, USA: Morgan Kaufmann.
ISO 9241-11 (1998). Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability.
Jeng, D. J.-F., & Tzeng, G.-H. (2012). Social influence on the use of clinical decision support systems: Revisiting the unified theory of acceptance and use of technology by the fuzzy DEMATEL technique. Computers & Industrial Engineering, 62(3), 819–828.
Jin, B. S., Ji, Y. G., Choi, K., & Cho, G. (2009). Development of a usability evaluation framework with quality function deployment: From customer sensibility to product design. Human Factors and Ergonomics in Manufacturing & Service Industries, 19(2), 177–194.
Jokela, T. (2004). When good things happen to bad products: Where are the benefits of usability in the consumer appliance market? Interactions, 11(6), 28–35.
Kim, W. Y., Ryu, J. S., Kim, K. I., & Kim, U. M. (2009). A method for opinion mining of product reviews using association rules. In Proceedings of the 2nd international conference on interaction sciences: Information technology, culture and human (pp. 270–274). Seoul, Korea: ACM.
Kwahk, J., & Han, S. H. (2002). A methodology for evaluating the usability of audiovisual consumer electronic products. Applied Ergonomics, 33(5), 419–431.
Lee, S., Ryu, K., Shin, M., & Cho, G.-S. (2012). Function and service pattern analysis for facilitating the reconfiguration of collaboration systems. Computers & Industrial Engineering, 62(3), 794–800.
Li, X., & Hitt, L. M. (2008). Self-selection and information role of online product reviews. Information Systems Research, 19(4), 456–474.
Li, M., & Wang, L. (2011). Feature fatigue analysis in product development using Bayesian networks. Expert Systems with Applications, 38(8), 10631–10637.
Li, M., Wang, L., & Wu, M. (2013). A multi-objective genetic algorithm approach for solving feature addition problem in feature fatigue analysis. Journal of Intelligent Manufacturing, 24(6), 1197–1211.
Liu, P., Wang, L., & Ding, X. (2010). Modeling product feature usability through Web mining. In Proceedings of the 2nd international conference on e-business and information system security (EBISS) (pp. 1–4). Wuhan, China.
Long, H., Wang, L., & Liu, P. (2012). A method of product feature usability analysis based on web semantic mining. International Journal of Services Operations and Informatics, 7(2), 136–149.
Mack, Z., & Sharples, S. (2009). The importance of usability in product choice: A mobile phone case study. Ergonomics, 52(12), 1514–1528.
Marshall, R., Case, K., Porter, J. M., Sims, R., & Gyi, D. E. (2004). Using HADRIAN for eliciting virtual user feedback in 'design for all'. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 218(9), 1203–1210.
Martínez-de-Pisón, F. J., Sanz, A., Martínez-de-Pisón, E., Jiménez, E., & Conti, D. (2012). Mining association rules from time series to explain failures in a hot-dip galvanizing steel line. Computers & Industrial Engineering, 63(1), 22–36.
Morinaga, S., Yamanishi, K., Tateishi, K., & Fukushima, T. (2002). Mining product reputations on the Web. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 341–349). Edmonton, Alberta, Canada: ACM.
Ozer, M., & Cebeci, U. (2010). The role of globalization in new product development. IEEE Transactions on Engineering Management, 57(2), 168–180.
Rust, R. T., Thompson, D. V., & Hamilton, R. W. (2006). Defeating feature fatigue. Harvard Business Review, 84(2), 98–107.
Shahbaz, M., Srinivas, M., Harding, J., & Turner, M. (2006). Product design and manufacturing process improvement using association rules. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 220(2), 243–254.
Thompson, D. V., Hamilton, R. W., & Rust, R. T. (2005). Feature fatigue: When product capabilities become too much of a good thing. Journal of Marketing Research, 42(4), 431–442.
Wu, M., Wang, L., Li, M., & Long, H. (2013). An approach based on the SIR epidemic model and a genetic algorithm for optimizing product feature combinations in feature fatigue analysis. Journal of Intelligent Manufacturing, 1–11.
Yeh, C., Huang, J., & Yu, C. (2011). Integration of four-phase QFD and TRIZ in product R&D: A notebook case study. Research in Engineering Design, 22(3), 125–141.
Zaïane, O., & Antonie, M.-L. (2005). On pruning and tuning rules for associative classifiers. In R. Khosla, R. Howlett, & L. Jain (Eds.), Knowledge-based intelligent information and engineering systems (pp. 966–973). Melbourne, Australia: Springer.
Zhan, J., Loh, H. T., & Liu, Y. (2009). Gather customer concerns from online product reviews – A text summarization approach. Expert Systems with Applications, 36(2, Part 1), 2107–2115.