Expert Systems with Applications 41 (2014) 5466–5482
A knowledge and collaboration-based CBR process to improve network performance-related support activities
Leobino N. Sampaio b,a,*, Patricia C. de A.R. Tedesco a, Jose A.S. Monteiro a, Paulo R.F. Cunha a
a Informatics Center — Federal University of Pernambuco (UFPE), Pernambuco, Brazil
b Federal University of Bahia (UFBA), Bahia, Brazil
* Corresponding author at: Federal University of Bahia (UFBA), Bahia, Brazil. Tel./fax: +55 71 3283 7309. E-mail addresses: [email protected] (L.N. Sampaio), [email protected] (Patricia C. de A.R. Tedesco), [email protected] (J.A.S. Monteiro), [email protected] (P.R.F. Cunha).
Keywords: Network support; CBR; Expert systems; User-Perceived Quality
Abstract. In a context characterized by a growing demand for networked services, users of advanced applications sometimes face network performance troubles that may actually prevent them from completing their tasks. Therefore, providing assistance for user communities that have difficulties using the network has been identified as one of the major issues of performance-related support activities. Despite the advances network management has made over the last years, there is a lack of guidance services that provide users with information going beyond merely presenting network properties. In this light, the research community has been highlighting the importance of User-Perceived Quality (UPQ) scores, such as Quality of Experience (QoE) and Mean Opinion Score (MOS), during the evaluation of network services for network applications. However, despite their potential to assist end-users in dealing with network performance troubles, only a few types of network applications have well-established UPQ scores. Besides that, these scores are defined through experiments essentially conducted in the laboratory, rather than from actual usage. This paper thus presents a knowledge and Collaboration-based Network Users' Support (CNUS) Case-Based Reasoning (CBR) Process that predicts UPQ scores to assist users by focusing on the collaboration among them through the sharing of their experiences in using network applications. It builds (i) a knowledge base that includes not only information about network performance problems, but also applications' characteristics, (ii) a case base that contains users' opinions, and (iii) a user database that stores users' profiles. By processing them, CNUS benefits users through the indication of the degree of satisfaction they may achieve based on the general opinion of members of their communities in similar contexts. In order to evaluate the suitability of CNUS, a CBR system was built and validated through an experimental study conducted in the laboratory with a multi-agent system that simulated scenarios where users request assistance. The simulation was supported by an ontology of network services and applications and a reputation scheme implemented through the PageRank algorithm. The results of the study pointed to the effectiveness of CNUS and its resilience to users' collusive and incoherent behaviors. Besides that, they showed the influence of knowledge about network characteristics, users' profiles and application features on computer-based support activities. © 2014 Elsevier Ltd. All rights reserved.
1. Introduction

A growing number of tools have been used by network support staff as a means of providing assistance for user communities that have difficulties using network applications (Johnston et al., 2008). Despite the fact that such a trend has been identified as one of the major issues of performance-related support activities, support systems still lack a sufficient set of features to fulfill the
requirements of the current demand for assistance, for several reasons. Firstly, monitoring infrastructures have not been designed and developed considering the profiles of the general public interested in network performance data. Commonly, they present network information suitable only for network specialists. A user's level of expertise in dealing with performance data largely determines the amount of effort network support staff must dedicate to him or her (Crovella & Krishnamurthy, 2006). By disregarding user-related technical and social factors, monitoring solutions frequently provide traffic performance information in a generic manner (Binczewski, Lawenda, Lapacz, & Trocha, 2009; Brauner et al., 2009; Hanemann et al., 2006; Jeliazkova, Iliev, & Jeliazkov, 2006;
Sampaio et al., 2007). They rely on the assumption that users have enough knowledge to interpret the performance information provided by the visualization tools. Since interest in Internet applications has increased immensely over the years, and users have shifted from computer-savvy people to a more general audience coming from diverse knowledge areas, this assumption is no longer true.

Secondly, the operation and management of wide area networks currently face many scalability problems. Network support has become a very relevant activity, and users have relied on support staff to get guidance about the best use of the available network resources. In general, the support staff consists of network experts that evaluate, almost manually, each problem situation, based on their expertise and previous knowledge about the applications' requirements. These approaches do not scale well since they mainly rely on human skills in environments that are continuously growing and changing their contexts. For these reasons, such scalability challenges have led support staffs to include groups of specialists with strong network and application background knowledge to assist the users. Since the support activity is mostly carried out in complex environments, involving ever-emerging new (and more complex) applications, user communities, and their respective network performance requirements, the development of computer-based guidance tools has become imperative. This explains why some of the current solutions rely on a knowledge base of network performance-related issues that is maintained by both application and network experts. However, the problem resides in the fact that systems capable of processing support-related knowledge to assist users in using their applications are still scarce.

Thirdly, the existing computer-based support solutions do not consider users' feedback. There is a consensus in the research community that the quality perceived by end users is not just a technical issue at the network level (Hassan, Das, Hassan, Bisdikian, & Soldani, 2010). It also involves the user's perception, expectations and experience, thereby requiring non-technical parameters almost never treated by quality of service (QoS) research. Therefore, discussions within the networking community have evolved to find new methods and approaches to measure the quality perceived by users. The Mean Opinion Score (MOS) (ITU-T, 2008a) is an example of such an initiative. MOS values are defined from subjective ratings by real users or predicted from objective measurements of applications' properties. Measurement tools are then employed to gather network measures that are used to infer the corresponding MOS. In another line of research, the Quality of Experience (QoE) concept (Hassan et al., 2010; Jain, 2004) has been introduced in computer networking research as a means of measuring quality from the users' perspective. As a result, this concept has gained attention from the scientific community, as demonstrated by a network magazine special issue on the improvement of Quality of Experience for network services (Hassan et al., 2010). Finally, concerns related to users' satisfaction have also been demonstrated by the International Telecommunication Union (ITU) (ITU-T, 2008b), which has recommended, in a user-centric view, subjective evaluation methods (ITU-T, 1998, 2001, 2009) to assess the perceptual quality provided by Internet services.
User-Perceived Quality (UPQ) scores, such as MOS and QoE, are interesting resources for networking support activities. By classifying network quality into a few general types, they simplify the information regarding network performance. Consequently, end-users and non-specialist network operators are relieved of the burden of understanding technical network performance characteristics. As a matter of fact, such simplified information is more convenient for a non-specialist network user making decisions related to the use of the network. It leads users to focus on performing their end-activities, instead of interpreting
performance details of the network they intend to use. In support situations involving this kind of user, too much technical information can even harm their understanding of the support system's responses. For these reasons, MOS and QoE scores arise as key components for assisting end-users in the use of their applications. They consist of high-level information that indicates the appropriateness of the network performance in a given context. That is, User-Perceived Quality scores simplify users' guidance by pointing out what they can expect when using their application under the current network conditions. Despite their potential to assist end-users in dealing with network performance troubles, only a few types of network applications have well-established User-Perceived Quality scores. Perhaps one reason for such absence resides in the fact that existing approaches define them through experiments essentially conducted in the laboratory, rather than from actual usage. Considering the rapid change of Internet applications' requirements and the dynamics of their contexts, such approaches are expensive and fairly limited. Indeed, they face practical problems.

In light of the above, this work looks for ways of furnishing network support infrastructures with User-Perceived Quality scores of network applications in a timely fashion and based on the collaboration among users from a given community. Hence, it tries to address the following research question: is it possible to conceive a process in which domain knowledge and collaboration among users are explored to find User-Perceived Quality scores of network applications in order to improve the assistance of users with respect to their use of the network? Accordingly, it has set out to find ways of better assisting network managers and operators in performing user support activities, by considering users' opinions, applications' characteristics, and network performance conditions.

This paper then introduces a knowledge and Collaboration-based Network Users' Support (CNUS) CBR Process that, based on past experiences, predicts User-Perceived Quality (UPQ) by classifying it into five scores. The process is knowledge-based since it provides content-based recommendations according to the description of cases and also uses a domain ontology of network applications to better describe a case. Besides that, CNUS is collaboration-based because it considers the opinions of users weighted by their reputations. With respect to its users, the benefits of CNUS are twofold: to end-users that try to obtain more information about a network link due to an experienced performance trouble, and to non-specialist network operators that look for a better understanding of users' characteristics. As discussed in further sections, end-users benefit from indications of the degree of satisfaction they may have based on the general opinion of members of their community in similar contexts. The non-specialist network operators, in turn, can draw valuable information from the system's databases to define a more detailed profile of the users they support.

This research evaluates the CNUS process through the combination of two approaches. First of all, we make a proof of concept through an empirical evaluation of the Mentor advisory tool (Carlomagno, Dourado, Sampaio, Monteiro, & Cunha, 2009). Mentor has been implemented according to the guidance approach for supporting network users introduced in Sampaio (2011). Besides that, it has CNUS's CBR modules, which are used to predict QoE.
Since Mentor is a web application, whose evaluation would require a great number of real users, CNUS's suitability is evaluated through a multi-agent system that simulates diverse user assistance scenarios. The remainder of this article is organized as follows: Section 2 presents the related works; Section 3 presents the CNUS process in detail; Section 4 presents a preliminary investigation that shows how the CBR process can be used; Section 5 presents the experimental studies carried out in the laboratory to assess the feasibility
of the process; Section 6 discusses the results from the experiments; and finally, Section 7 presents the conclusions and proposals for further works.

2. Related works

This section presents the works closely related to this research. The discussions about them can be grouped into five threads: (i) CBR tools and frameworks, (ii) computational models of trust and reputation, (iii) network performance management and support, (iv) network performance support systems, and (v) user-related measures of network service quality. The following subsections are dedicated to discussing the relationship of each of these aspects to the research presented in this article.

2.1. CBR tools and frameworks

In recent times, numerous applications have used the Case-Based Reasoning (CBR) technique as a means for addressing AI problems in diverse knowledge domains (López et al., 2011; Shimazu, 2002; Varma & Roddy, 1999). Given the increasing number of inexperienced application designers and developers, industry and the research community have devoted efforts to addressing the lack of tools and frameworks that help the rapid development and implementation of CBR systems. In general, they can be grouped into domain-independent and domain-specific tools. For instance, myCBR (Stahl & Roth-Berghofer, 2008) and jColibri (Díaz-Agudo, González-Calero, Recio-García, & Sánchez-Ruiz-Granados, 2007) are domain-independent CBR environments whose main goal is to enable the rapid prototyping of diverse CBR applications, whereas eXiT*CBR (López et al., 2011), ExpertClerk (Shimazu, 2002), and ICARUS (Varma & Roddy, 1999) are domain-specific CBR tools that show how the CBR paradigm has been applied in different fields. In the following we briefly present these frameworks and explain why this research has not taken advantage of their features.

With respect to domain-independent tools, jColibri (Díaz-Agudo et al., 2007) is one of the most relevant CBR frameworks and benefits developers requiring a Java-based, domain-independent and easy-to-use CBR environment. It consists of an improved Java version of the COLIBRI software, originally built in LISP. jColibri is open-source software with a complex architecture that decouples the reasoning algorithms from the domain model. Besides that, it uses a CBR ontology mapped into Java classes. Due to such features, jColibri can assist the building of a diverse set of CBR applications. With similar goals, myCBR (Stahl & Roth-Berghofer, 2008) is an open-source CBR framework built to facilitate the rapid prototyping of CBR applications. It provides a friendly development environment comprising a graphical user interface for generating and managing cases, modeling tools to create domain models, retrieval engines to perform similarity-based retrievals, and an explanation component that supplies explanations about information retrievals performed by the application. myCBR is a plug-in for the Protégé platform. Such integration is an interesting feature of the tool, since an ontology can be used to describe cases' attributes.

Regarding domain-specific tools, the research community has proposed many initiatives (An, Kim, & Kang, 2007; Bai, Yang, & Qiu, 2008; Chou, 2008; van Setten, Veenstra, Nijholt, & van Dijk, 2004). For instance, eXiT*CBR (López et al., 2011) is a framework particularly developed for medical diagnosis. The tool focuses on the experimentation of medical CBR applications and provides a graphical user interface which facilitates the interpretation of the results.
In commercial settings, ExpertClerk (Shimazu, 2002) is another CBR tool whose goal is to enhance e-commerce web-sites
through a virtual salesclerk that implements a conversational CBR approach. The interaction with users helps the system find the appropriate case during information retrieval. The Intelligent Case-based Analysis for Railroad Uptime Support (ICARUS) (Varma & Roddy, 1999) is another example. It consists of a CBR tool developed for the off-board diagnosis of locomotives by associating fault codes with specific repairs.

Even though these frameworks ease the development of CBR applications, this research needed to develop CNUS from scratch due to some specific requirements. Firstly, it is imperative for the system to support reputation models, since this research is based on the assumption that users can practice collusion or present incoherent behavior. Secondly, although some of the current general-purpose CBR tools promote software reuse, in general the code available is too complex to integrate into our network performance advisor (Mentor). Mentor was used both as a standalone application and as a software agent in the multi-agent simulation environment. Therefore, it was essential that we had simple code to implement the similarity-based retrieval of the CBR process introduced in this paper. Thirdly, network performance is a subject from a very specific domain, thus requiring the design and development of a simple and specific CBR platform with a straightforward set of features.
2.2. Computational models of trust and reputation

Reputation and trust models are currently reviewed in the literature according to diverse factors (Jøsang, Ismail, & Boyd, 2007; Pinyol & Sabater-Mir, 2013; Sabater & Sierra, 2001, 2005). Due to the great variety of models and the exhaustive scientific research in this field, we briefly present herein only the trust and reputation models closely related to our work. This discussion is based on the survey recently presented in Pinyol and Sabater-Mir (2013).

We start with online reputation models, which have considerably increased in commercial settings (Dellarocas, 2000; Jøsang et al., 2007; Mui, 2002; Resnick, Kuwabara, Zeckhauser, & Friedman, 2000). These models are frequently adopted by e-commerce web sites, such as eBay (http://www.ebay.com/), in order to rate buyers and sellers, and consequently, to establish a more trustworthy marketplace. Other reputation models have been proposed in the context of e-commerce and related areas. For instance, Sporas and Histos, introduced in Zacharia, Moukas, and Maes (2000), enable users to rate others after carrying out transactions and then modify their reputations accordingly. Agent-oriented models are those in which reputation is the result of processing subjective properties. In Pinyol and Sabater-Mir (2013), the authors mention as part of this group the trust model presented by Abdul-Rahman and Hailes (2000), where the information regarding the reputation is transmitted from one agent to another. The ReGreT model introduced in Sabater and Sierra (2001) is another example discussed by them. The model is defined taking into account the agent's direct experiences, third-party information and social structures to compute trust, reputation and levels of credibility. These are just a few examples of trust and reputation models related to the flow models used in research. In general, the choice of each model depends on the problem it is intended to address. In this research, we adopted the flow model discussed in Jøsang et al. (2007) and implemented by Google's PageRank (Brin & Page, 1998), since it seemed suitable for the reputation scheme we have designed. The reader is referred to the surveys presented in Pinyol and Sabater-Mir (2013), Sabater and Sierra (2001) and Jøsang et al. (2007) for in-depth discussions about their classification, features, and fields of application.
2.3. Network performance management and support

With regard to performance management and support of wide-scope networks, the majority of existing solutions attempt to assist operators in their activities through the adoption of support systems. These systems supply such operators with the information used to help users get efficient network performance for their end-systems. Therefore, the most adopted support model suggests that end-users contact the Network Operation Center (NOC) to complain about network performance problems. In general, a ticket is then emitted and sent to domain specialists to deal with the problem. Géant's Performance Enhancement & Response Team (eduPERT) (Geant2, 2009) is an example of an initiative that follows this model. eduPERT offers network performance support by responding to users' requests to investigate quality of service (QoS) issues submitted by them. Operators benefit from reference documentation that explains the concepts of network performance, highlights the most common causes of quality of service (QoS) problems, and gives general guidelines on how to configure systems to optimize performance. Through a ticket system, the members of the eduPERT staff, known as Case Managers, are in charge of receiving cases and evaluating the need for assistance from domain experts. Among the existing roles, the Subject Matter Experts are those who have specialized knowledge about the problem and help users on a best-effort basis. The PERTKB (Geant2, 2010) knowledge base about network problems is an initiative maintained by the eduPERT staff to facilitate this activity.

2.4. Network performance support systems

Problem detection and diagnosis, as well as the need to advise on the use of the network, start when a user looks for end-to-end performance information related to a communication path, in order to run his/her application. The detection and diagnosis activities aim at identifying the eventual points of failure at the network, operating system, and application levels and, afterwards, explaining their causes as a means of finding a solution. The advising is in fact a way of indicating the most appropriate use of the network according to the minimum quality of service requirements the application has. Most of the current network support solutions use tools that gather performance data from the last mile by performing their own measurement tests rather than collecting them from measurement archives. Such tools not only aim at obtaining last-mile performance data to complement the data already collected in the core, but also help operators advise users on how to better use network services. Examples include NDT (Network Diagnostic Tool, http://e2epi.internet2.edu/ndt) and Internet2 Detective (http://detective.internet2.edu). The drawback of these initiatives, however, is that, in some cases, human support from the network operation is still needed for an effective diagnosis, due to the amount of technical data. Besides that, despite their friendly interfaces, the tools present very poor advisory information. It is worth noting that these are just a few examples of initiatives that propose tools in early stages of development. Their development relies solely on the knowledge of those who designed them, due to the lack of frameworks and well-defined procedures to conceive support systems that also consider the knowledge of the users.

2.5. User-related measures of network service quality

The Mean Opinion Score (MOS) (ITU-T, 2008a) is the most adopted measure of network service quality as perceived by users
in practice. MOS-related indicators are usually determined from subjective ratings by end users or predicted from objective measurements essentially conducted in the laboratory. In the former case, a group of users evaluates the quality of samples of the network service being provided (e.g., speech, video, etc.) and then the respective scores are averaged to arrive at a MOS. The latter alternative builds mathematical models to map subjective MOS to measurable parameters. Some of the current support systems have already incorporated MOS to help managers understand users' needs, in particular when the support involves video and voice applications.
3. The CNUS process

This section describes in detail the knowledge and Collaboration-based Network Users' Support (CNUS) CBR Process proposed in this article. This is done by discussing how its stages were conceived in order to accomplish the guidance services introduced herein.
3.1. Fundamentals

CNUS is a process defined according to a guidance approach for supporting network users which draws heavily on the multi-dimensional recommendation approaches adopted by recommender systems in commercial settings (Adomavicius & Tuzhilin, 2005; Balabanović & Shoham, 1997; Felfernig, Friedrich, Jannach, & Zanker, 2006). The information regarding the guidance stems from the User × Application × network performance level relation, rather than the commonly seen bi-dimensional User × Item relation. So, the recommendation space S is defined as a Cartesian product S = D1 × D2 × D3, where D1 and D2 represent the dimensions comprising the sets of attributes that define the users' profiles and applications' features, respectively, while dimension D3 comprises a set of attributes that defines the network performance condition. It can also be defined by a single attribute whose value corresponds to the major characteristics of a given network type.

The information from each of these dimensions is obtained through the guidance approach summarized in Fig. 1 and introduced in Sampaio (2008, 2011). It consists of a network performance guidance solution that suggests the combination of conversational and knowledge-based recommendation methods to support users in running their applications. It is knowledge-based due to its usage of knowledge about users, classes of application, and network performance parameters. It is also conversational since some required parameters can be obtained only through users' interaction. For instance, users might inform the period of time they intend to run a given application, their quality expectations, and the network endpoints involved. Besides that, an advisory system must define the class of the network application the users intend to run in order to identify the acceptable levels for its network performance metrics. The application class as well as the network properties can be respectively inferred and obtained by the system, or can also be informed by the users. Regarding the guidance approach's steps, broadly speaking, the Data gathering step suggests implementing a conversational method to facilitate the identification of the user's profile and the application's class. Afterwards, the Reasoning step finds the relevant network performance metrics associated with the application class previously identified. Finally, the Learning step obtains the remaining information required to support the user. Based on network conditions (either measured or specified by users), it suggests the search for other similar previous situations to guide users in the use of the network according to the current performance conditions.
Fig. 1. The network performance guidance approach.
Case-Based Reasoning (CBR) is a distinguished problem-solving AI technique that fulfills the requirements of the approach depicted in Fig. 1. This technique typically relies on specific domain knowledge related to previous experiences to solve a new problem, much as problem-solving experiences are reused in cognitive learning. Thus, a new problem is addressed through the search for a similar, previously experienced case, whose solution is adapted to the current case. The process ends by retaining the solved case coupled with its solution so that it can be used in future problem solutions. Through this approach, a support system can offer responses that reflect the opinion of a group of users, especially because the rating of a single user could bias the guidance process. In practice, network performance support teams frequently provide assistance taking into consideration the opinion of a whole community of users.

CNUS is therefore based on Aamodt and Plaza's development framework for CBR systems (Aamodt & Plaza, 1994). It was adapted to deal with network performance-related support issues, while still addressing the four stages of the cycle proposed by the authors, as depicted in Fig. 2. The main goal of the process is to predict User-Perceived Quality (UPQ) scores for a network application specified in an assistance request of a network user. In the CBR literature, a "case" means the description of a problem coupled with its solution. CNUS then assumes that the problem is each user's assistance request, involving an application and a set of network properties, whereas its solution is the UPQ score. The assistance request is thus characterized by an application class, a user type, and a set of network properties, and represented in a case structure that comprises two main parts. The first consists of a vector of six features that describe the user's assistance request: User type (string), Application class (string), Delay (number), Bandwidth (number), Jitter (number), and Loss (number). The first two features describe the user type and the application class, respectively; the remaining features describe the network properties. The second part of the case representation, in turn, comprises one attribute that contains the classification regarding five UPQ scores: 1 — Excellent, 2 — Good, 3 — Fair, 4 — Poor, and 5 — Bad, respectively.
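To make this case structure concrete, the sketch below expresses it as a small Python data class. The field names and types simply mirror the six-feature vector and the UPQ label described above; the class itself is illustrative and is not taken from Mentor's code.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Case:
    """A CNUS case: a user's assistance request (problem) plus its UPQ score (solution)."""
    user_type: str           # e.g., "Researcher", "Regular User", "N. Manager"
    application_class: str   # e.g., "Collaboration", "Bulk Data Transfer"
    delay_ms: float
    bandwidth_kbps: float
    jitter_ms: float
    loss_pct: float
    upq_score: Optional[int] = None   # 1-Excellent, 2-Good, 3-Fair, 4-Poor, 5-Bad

# A request before any UPQ score is known (the "problem" part of a new case).
request = Case("Researcher", "Collaboration", delay_ms=44,
               bandwidth_kbps=1024, jitter_ms=12, loss_pct=1)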
In view of obtaining the information of this case representation and processing it to predict UPQ scores, CNUS suggests the stages depicted in Fig. 2. They are detailed in Section 3.3.

3.2. CNUS users

The main CNUS users are those end-users of network applications who are not interested in the technical details of performance parameters. They resort to CNUS whenever they try to obtain more information about the performance levels of the network due to a trouble experienced when running an application. Through CNUS's output, a user can verify the UPQ scores from others with similar preferences and in similar contexts. Besides that, the user can also use CNUS in advance, particularly when wanting to have an idea of the satisfaction degree expected in a given context. In this way, users can better plan their activities involving the network. Finally, we assume that the subjective score provided by CNUS is enough to guide end-users in their activities. Otherwise, a guidance process is required. In essence, this process may be conducted with the support of a user profile that furnishes personal information like performance preferences and usual quality expectations. Section 3.4 gives a brief idea of how this process can be conducted.

Non-specialist network operators are also users of CNUS. They can provide user-tailored assistance that can bring great improvements to support activities. In order to carry this out, it is essential to have enhanced user models or profiles available. The enhanced user model which we are referring to must contain information detailed enough to enable the network operator to better identify the characteristics of the user requesting assistance. Therefore, among other parameters, CNUS has in its knowledge base the reputations of the users and their degree of compliance with their community's opinion. A user's reputation can help the network operator in selecting the appropriate information to be provided to the user. Besides that, the language adopted and the level of detail can be adjusted according to the type of user. The degree of compliance with the community's opinion assists the operator in anticipating the
Fig. 2. CNUS Process.
solution for future problems that can be experienced by users from the same community.

3.3. CNUS CBR stages

The process illustrated by Fig. 2 consists of a reasoning cycle made up of five stages, as follows: (1) the Case characterization stage characterizes the assistance demanded by the user; this activity aims to identify the application's class and the user's profile through an interactive process with users, while the network features can be either measured by the network infrastructure or informed by the user; (2) the Retrieve stage searches for similar requests and their respective UPQ scores in the case base; (3) the Reuse stage predicts the UPQ score according to those of the retrieved cases; (4) the Revise stage verifies whether the predicted UPQ score and the one from a given user match and then makes adjustments as needed to retain the case; and (5) the Retain stage stores the case along with its final UPQ score; besides that, it keeps the case base at a reasonable size by excluding unnecessary cases through the instance reduction strategy explained in Section 3.3.5.

The next subsections further discuss the details of CNUS by explaining how each of these stages was conceived through the description of the development of the Mentor advisory tool (Carlomagno et al., 2009). Mentor is a tool for network performance guidance that plays the role of an adviser who assists application end-users using a high-level language, thus helping them with network usage. The tool has been available to users of the Brazilian National Research and Education Network (RNP) as part of the Brazilian Ipê Network Performance Monitoring Service (RNP, 2014).

3.3.1. Case characterization

As stated before, each case is represented through a set of attributes that contains information about the application class, user type, and network features. Instead of dealing with each network application individually, CNUS groups network applications which share similar performance requirements. In doing so, CNUS is able to identify the importance of each network property when weighting a case's features for the similarity measures of the Retrieve stage. Indeed, this strategy reduces the need for storage and improves the performance of the system without harming its accuracy. Users are likewise grouped into well-defined types for the same reasons. The case characterization stage obtains the values for these attributes when the user requests assistance. However, users do not always know which class a network application belongs to. At this stage, the user should thus provide information detailed enough to carry out this classification. Mentor implements this strategy through the MonONTO domain ontology (Moraes, Sampaio, Monteiro, & Portnoi, 2008). It provides a classification of network applications and establishes relationships among network applications, performance metrics, and measurement tools. Therefore, by classifying the network application the user intends to run through MonONTO, Mentor is capable of finding general features such an application may have. That is, based on users' interactions, this tool can efficiently identify the performance requirements of the application classes, and define the weights for each network property case attribute. Fig. 3 shows an excerpt of MonONTO that illustrates how this classification is carried out. The ACADIA and SATyrus individuals from the E-LEARNING concept inherit the properties from its superclasses (e.g., the GRID and APPLICATION concepts).
These relations, coupled with the axiom ∀x, y, w (APPLICATION(x) ∧ APPLICATION(y) ∧ NETWORKCHARACTERISTIC(w) ∧ ISTOLERANTWITH(y, w) ⇒ ISTOLERANTWITH(x, w)), enable Mentor to infer whether these individuals tolerate a network characteristic. Consequently, the tool not only identifies the
importance of performance metrics but also can search for previous cases according to the network characteristics of the application, instead of just the name of its class. We stress that ontologies help the organization of the knowledge base, thus enabling the tool to find additional information about a given application more efficiently. MonONTO has been designed and used by Mentor and is detailed in previous papers (Carlomagno et al., 2009; Moraes et al., 2008). Its hierarchical relationships organize the knowledge about network applications and network performance-related aspects, such as performance metrics and measurement tools. Besides that, concepts that inherit features from the same class share the same properties, thus facilitating the search for similar cases.

Besides the application class, it is necessary to identify the type of user the support involves, since: (i) previous users' experiences can be used to improve the precision of future support requests; (ii) users from the same community frequently have similar network performance preferences; and (iii) the retention of feedback from users' experiences keeps users' profiles updated, thus leading to accurate results. Maintaining users' profiles seems to be the best alternative to deal with this issue. Therefore, this stage selects the key attributes used in the search for cases, in order to define the weight distribution used by the matching function performed by the Retrieve stage of the process. To identify the criteria used to define which attributes have more influence on the similarity, pairwise comparison (Saaty, 2008) can be adopted. Through a given scale, this method helps identify the relative importance of the attributes that define a case. Section 4 provides an example of how this can be carried out.
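As a rough illustration of how this weighting step could be automated, the sketch below turns the number of pairwise-comparison "wins" of each attribute into percentage weights under a linear proportion, as the worked example in Section 4 does. The attribute names and counts shown are taken from that example; the function itself is only an assumed helper, not part of Mentor.

def weights_from_pairwise_counts(occurrences):
    """Convert pairwise-comparison win counts into integer weights summing to about 100."""
    unit = 100.0 / sum(occurrences.values())   # linear proportion: 100 = (sum of counts) * x
    return {attr: round(count * unit) for attr, count in occurrences.items()}

# Win counts for the Collaboration class (cf. Table 3 in Section 4).
collaboration = {"User": 5, "App class": 4, "Loss": 1, "Delay": 3, "Jitter": 3, "Bandwidth": 1}
print(weights_from_pairwise_counts(collaboration))
# {'User': 29, 'App class': 24, 'Loss': 6, 'Delay': 18, 'Jitter': 18, 'Bandwidth': 6}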
3.3.2. Retrieve

This stage aims to retrieve the most similar cases from the case base according to the information previously identified and a similarity threshold. The latter is defined as the lowest possible similarity value between the case presented and the most similar one retrieved from the case base. Thus, the threshold is used by CNUS to select only cases with a minimum similarity. In terms of case search, an important issue in the CBR discussion is the method used to compute the similarity between the current case and those previously stored. The Distance-weighted Nearest Neighbors algorithm (Dasarathy, 1991), represented by Eq. (1), is widely used in CBR systems and has the advantage of including the weights associated with the attributes that define a case. For this reason, it was adopted in this stage. Briefly, it provides a method to estimate the degree of closeness between the target case and each case stored in the case base. In this equation, C is the current case, B is the case in the base, a is the number of attributes of each case, i is an individual attribute, f is the function used to compute the local similarity of attribute i of cases C and B, and W_i is the weight of each attribute i.
sim(C, B) = \frac{\sum_{i=1}^{a} W_i \, f(C_i, B_i)}{\sum_{i=1}^{a} W_i}.    (1)
The case similarity calculated in Eq. (1) is in fact the aggregation of the similarities of each attribute, computed by the function f(x, y). So, with respect to their similarity measures, the methods vary according to the type of attribute used. This stage compares the following attributes: User type (string), Application class (string), Delay (number), Bandwidth (number), Jitter (number), and Loss (number). Thus, the attributes that this stage deals with are numeric, when concerning the network performance parameters, and symbolic, when concerning the user type and application class. Therefore, the distance methods from Eqs. (2) and (3) were chosen and are explained herein assuming C = {C_i, i = 1...a} and B = {B_i, i = 1...a}
Fig. 3. Case characterization through an application class.
as arrays of attributes being compared and f(C_i, B_i) as the distance between the i-th attributes of C and B. In Eq. (2), j is an individual symbolic attribute and f is the function used to compute the local similarity of the j-th attribute of cases C and B. The value 1 is assigned whenever the attributes have the same values; otherwise, the value 0 is assigned.
f(C_j, B_j) = \begin{cases} 1 & \text{if } C_j = B_j, \\ 0 & \text{if } C_j \neq B_j. \end{cases}    (2)
In what concerns the network conditions (e.g., loss, bandwidth, delay, and jitter metric values), represented by numeric values, the similarity calculation is quite different. Since network performance metrics can have different value scales, a normalization method is required. The linear function shown in Eq. (3) addresses this problem, where k is an individual numeric attribute and Lmax_k and Lmin_k respectively represent the maximum and minimum values which B_k and C_k can assume. In this way, the performance metric similarities are transformed into values in the interval [0, 1].
f(C_k, B_k) = 1 - \frac{|B_k - C_k|}{Lmax_k - Lmin_k}.    (3)
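A minimal Python sketch of Eqs. (1)–(3) follows. The weights are those derived for the Collaboration class in Section 4, while the normalization bounds (Lmin, Lmax) and the two example cases are invented here purely for illustration.

def local_similarity(a, b, bounds=None):
    """Eq. (2) for symbolic attributes; Eq. (3) for numeric ones, with bounds = (Lmin, Lmax)."""
    if bounds is None:                       # symbolic attribute (user type, application class)
        return 1.0 if a == b else 0.0
    lmin, lmax = bounds                      # numeric attribute (delay, bandwidth, jitter, loss)
    return 1.0 - abs(a - b) / (lmax - lmin)

def similarity(case_c, case_b, weights, bounds):
    """Eq. (1): weighted aggregation of the local similarities of each attribute."""
    weighted = sum(w * local_similarity(case_c[attr], case_b[attr], bounds.get(attr))
                   for attr, w in weights.items())
    return weighted / sum(weights.values())

weights = {"user": 29, "app": 24, "loss": 6, "delay": 18, "jitter": 18, "bw": 6}
bounds  = {"loss": (0, 5), "delay": (0, 500), "jitter": (0, 50), "bw": (0, 2048)}
new_case    = {"user": "Researcher",   "app": "COL", "loss": 1, "delay": 44, "jitter": 12, "bw": 1024}
stored_case = {"user": "Regular User", "app": "COL", "loss": 1, "delay": 50, "jitter": 8,  "bw": 600}
print(round(similarity(new_case, stored_case, weights, bounds), 3))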
Getting back to the discussion about the weight distribution, the weights are important in this context, since the relevance of the network performance metrics varies according to the application class. Finally, the similarity threshold s, defined as the lower limit for the similarity between a stored case and the newly presented case, is obtained according to the approach presented in MacDonald, Weber, and Richter (2008) coupled with a leave-one-out retrieval technique. Firstly, one case is taken from the case base and is defined as a new case. Afterwards, the similarity between this new case and every case remaining in the case base is computed, and then the new case is included back in the case base. This procedure is carried out with all the existing cases in the case base. After computing the similarities, we take the overall minimum similarity of the most similar existing case in each run and define it as s. Eq. (4) summarizes the adopted method, where n is the number of cases in the case base and m is the number of new cases presented (n − 1). We emphasize that if no case satisfies s, this stage must indicate the absence of similar cases in the case base.
s = \min_{i \le m} \max_{j \le n} [sim(i, j)].    (4)
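The leave-one-out selection of the threshold in Eq. (4) could be sketched as follows; here sim stands for any pairwise similarity function, such as the one sketched above, and the toy cases are arbitrary values used only to show the mechanics.

def similarity_threshold(cases, sim):
    """Eq. (4): leave each case out, record its best match, return the worst of those best matches."""
    best_matches = []
    for i, left_out in enumerate(cases):
        others = cases[:i] + cases[i + 1:]                    # leave one case out
        best_matches.append(max(sim(left_out, other) for other in others))
    return min(best_matches)

# Toy usage with numeric stand-ins for cases and a trivial similarity function.
toy_cases = [1.0, 1.2, 3.5, 3.6]
toy_sim = lambda a, b: 1.0 - abs(a - b) / 10.0
print(similarity_threshold(toy_cases, toy_sim))               # 0.98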
3.3.3. Reuse

In general, the reuse stage of CBR systems consists of solving a new problem by adopting and adapting, if necessary, the solution that was used to solve its most similar case. CNUS implements the reuse stage by computing the mean opinion of UPQ scores from other users that have made similar requests. That is, with the purpose of considering the opinion of a group of users, the reuse stage must compute the average of the ratings assigned to the whole set of retrieved cases to predict the UPQ score the user is likely to have. The precision of this average can be even better if it is weighted by users' reputations, since their opinions may vary according to a number of factors. However, reputation schemes in general lead to the problem associated with collusion practices, where a group of users may give bad responses to harm the system or gain an unfair advantage from it. Due to these requirements, this stage suggests to: (i) identify users' reputations from the retrieved cases; (ii) compute the UPQ score through an average weighted by users' reputations; and (iii) present the predicted UPQ to the user.

In terms of users' reputations, it establishes that the reputation must be defined by the number of times a given piece of advice has been accepted by other users. That is, the more users agree with the opinion of a given user, the more important this user must be. In addition to that, the reputation of the user that agrees with a rating also determines how important such agreement is. Among the existing reputation models in computer systems, the flow model (Jøsang et al., 2007), also implemented by Google's PageRank (Brin & Page, 1998), has been adopted since it fulfills such requirements. Flow models are usually implemented by transitive iteration through looped or arbitrarily long chains. A user's reputation increases as a function of incoming flow, and decreases as a function of outgoing flow, so the reputation of the entire community is distributed among its members (Jøsang et al., 2007). This stage then suggests a reputation scheme based on Eq. (5). It implements the second version of the PageRank algorithm (Page, Brin, Motwani, & Winograd, 1998), assuming that the PageRank value represents the probability that a user randomly agrees with the opinion of another given user. Thus, in Eq. (5), PR(u_o) is user u_o's PageRank, PR(u_i) is the PageRank of a user u_i that agrees with the rating of u_o, and c(u_i) is the number of times user u_i agrees with the ratings of other users (c(u_i) > 0); we assume that every user follows the opinion of at least one other user. Moreover, n is the number of existing users in the system, and p is the damping factor (0 < p < 1). In practice, the value of p ranges from 0.85 to 0.9. The value 0.85 is suggested herein, just as Google does.
PR(u_o) = p \sum_{i=1}^{n} \frac{PR(u_i)}{c(u_i)} + \frac{1 - p}{n}.    (5)
The definition of the c(u_i) value in Eq. (5) is supported by an adjacency matrix T = [t_{io}], for i = 1, 2, ..., n and o = 1, 2, ..., n, of the directed graph G = (V, E) with V = {v_1, v_2, ..., v_n} and E = {e_1, e_2, ..., e_m}, for n users and m relationships among them. Thus, G describes a trust network of the users, where the values of E define the trust levels. Each value of t_{io} must be proportionally incremented or decremented whenever user i agrees or disagrees, respectively, with a rating given by user o. Fig. 4 depicts an example of how the reputation method works. The arrows indicate when a user agrees with the rating of another user, and the PageRank values represent reputation levels. In this example, user u_2 has the most accepted opinions, since he/she is followed by all the others. Therefore, user u_2 has the highest PageRank value (i.e., 0.3941). In spite of being followed by just one user, u_0 has a higher reputation, since the user he/she follows (i.e., u_2) has the biggest reputation value. Users u_1 and u_3 have lower reputation values: the first follows one user and is followed by just one, whereas the second is not followed by any user.

The reputation values are used to compute the UPQ score for the new case (E) according to Eq. (6). Based on the UPQ scores from a set of t retrieved cases satisfying the similarity threshold, it assumes a set of users u = {u_1, u_2, ..., u_t} from the retrieved cases, with their corresponding UPQ scores s = {s_1, s_2, ..., s_t} and reputation values PR = {PR_1, PR_2, ..., PR_t}, to find an estimate (E) of the satisfaction level the user being advised is likely to have.
E = \frac{\sum_{i=1}^{t} s_i \, PR(u_i)}{\sum_{i=1}^{t} PR(u_i)}.    (6)
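The sketch below illustrates how the reputation scheme of Eq. (5) and the prediction of Eq. (6) could be computed: a fixed number of iterations of the PageRank recurrence over an agreement matrix, followed by the reputation-weighted mean of the retrieved UPQ scores. The agreement matrix, iteration count, and damping handling are illustrative assumptions rather than Mentor's actual implementation.

def pagerank_reputation(T, p=0.85, iterations=50):
    """Eq. (5): PR(u_o) = p * sum_i PR(u_i)/c(u_i) + (1 - p)/n, iterated to (near) convergence.
    T[i][o] = 1 means user i agrees with (follows) user o."""
    n = len(T)
    c = [max(sum(row), 1) for row in T]      # c(u_i): how many users u_i agrees with
    pr = [1.0 / n] * n                       # start from a uniform distribution
    for _ in range(iterations):
        pr = [p * sum(pr[i] * T[i][o] / c[i] for i in range(n)) + (1 - p) / n
              for o in range(n)]
    return pr

def estimate_upq(scores, reputations):
    """Eq. (6): reputation-weighted mean of the UPQ scores of the retrieved cases."""
    return sum(s * r for s, r in zip(scores, reputations)) / sum(reputations)

# Toy agreement matrix for four users (row = who agrees, column = with whom).
T = [[0, 0, 1, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 1, 0]]
reputations = pagerank_reputation(T)
print([round(v, 3) for v in reputations])
print(round(estimate_upq([5, 1, 5, 4], reputations), 2))      # weighted UPQ prediction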
Based on the UPQ score provided, the user runs the network application and afterwards rates the prediction. The information obtained from this feedback is then used in the Revise stage in order to update or identify the user's preferences.

3.3.4. Revise

The Revise stage begins by matching the predicted UPQ score (E) against the UPQ informed by the user. It aims at adjusting the UPQ score of cases badly rated by users, assigning the actual UPQ score obtained from them. Therefore, when a user disagrees with a recommendation provided, (E) should be revised. At the end of this stage, the case coupled with its final UPQ score is retained in the case base. The Revise stage also updates users' reputations, since (E) is a value computed by a weighted sum of each individual reputation.
Therefore, each user that contributes to non-accepted solutions has his/her reputation value decreased accordingly. This is performed by updating the adjacency matrix that defines the values of c(u_i) in Eq. (5). Through these updates, the Revise stage improves CNUS accuracy, thus benefiting future requests. Moreover, it also makes the process act according to the users' general opinion, enhancing its effectiveness.

3.3.5. Retain

CBR is a type of example-based learning algorithm, which may lead to excessive storage and time complexity when all existing cases are stored in the knowledge base. Depending on the domain, in the long term, this can make the knowledge base slow and unmanageable; moreover, it can reduce the accuracy of the system due to redundancy and non-trustable cases. In order to avoid such problems, our proposal suggests the reduction of the case base through two procedures. Firstly, it excludes from the case base (i.e., the training set) the cases involving non-trustworthy users. That is, it adds only cases of users with a minimum reputation. For instance, the minimum reputation can be defined as that of users who have had at least one opinion accepted. Secondly, it suggests the adoption of Instance-Based Learning (IBL) (Aha, Kibler, & Albert, 1991) algorithms to act as a classification function. As stated by Aha et al. (1991), the classification helps reduce the number of cases stored in the knowledge base and keeps the knowledge base updated without harming the accuracy of the system.

The IB3 algorithm (Aha et al., 1991) has a set of features large enough to address the issues related to storage requirements and noisy instances. It keeps only acceptable misclassified instances in the training set, thus excluding the ones which can harm the definition of a concept or class. In this stage of CNUS, IB3 carries out the classification of a new case according to the values coming from the similarity function defined by Eq. (1). Afterwards, it updates the information concerning the classification performance and, finally, decides which case should be retained in the case base. Through these features, IB3 tackles the updating problem of CNUS's case base, which contains opinions provided by users who may change their perception over time.

The IB3 algorithm is adopted in the Retain stage to carry out these procedures. In order to adopt it, CNUS assumes that each instance (i.e., case from the base) belongs to a class, which corresponds to the UPQ scores assigned by users or predicted by CNUS. Therefore, besides the n attribute–value pairs used to describe the instance, there is an extra attribute used to identify the class it belongs to. As discussed in previous sections, the classes are disjoint and the
Fig. 4. Reputation example based on Pagerank.
values assumed are: 1 – Excellent, 2 – Good, 3 – Fair, 4 – Poor, and 5 – Bad. In addition to the definition of the instance's class, IB3 also requires a classification record for each instance to identify its classification performance. Such information is used to determine whether the instance will be useful for future classifications or if it should be discarded. Therefore, the classification record has the following data structure: ⟨instance descriptor, number of correct predictions, number of incorrect predictions⟩.
The first attribute refers to an instance descriptor, the second refers to the number of correct predictions (i.e., the number of times the instance's class matched the new instance's class), and the third refers to the number of incorrect predictions (i.e., the number of times the instance's class did not match the new instance's class). These attributes support the significance test of IB3; in other words, they determine which instances should be kept for further classification (i.e., acceptable instances) and which ones should be discarded. We encourage the reader to see Aha et al. (1991) for further details on the IB3 algorithm and how the significance test is used to define an instance as acceptable. Hence, the whole process ends by updating Mentor's case base through the IB3 instance reduction strategy.

3.4. Guiding users through CNUS's output

The main output of CNUS is the User-Perceived Quality score predicted according to the information furnished by the end-users. However, the process can also provide the user's reputation (R) and compliance with his/her community's opinion (C). This information can be useful for improving the details of users' profiles and, therefore, for assisting the guidance process. (R) denotes the trust level of a user in his/her community, whereas (C) refers to the degree of the user's compliance with his/her community's opinion. That is, based on the historical forecast errors identified during the Revise stage, CNUS can identify to what extent a user follows the opinion of his/her community. These outputs can be used by non-specialist network operators, through a more detailed user profile P = (R + C + h), to improve the guidance G(P + k), where k consists of contextual information related to an assistance request, such as the time of day, network domain, and day of the week, and h refers to the user's individual data, such as the usual degree of quality requirement, area of expertise, performance preferences, gender, knowledge domain, etc. Hence, a profile P conditions a guidance G.

Users benefit from G when diagnostic information is not enough to deal with the situation in which a network performance difficulty takes place. Since network managers and operators have a deep knowledge of their area, and some users need just the UPQ score, CNUS does not intend to define how G should be conducted; instead, it furnishes a set of information to improve the guidance G. Otherwise, we would be disregarding their background knowledge and their ability to interpret the obtained UPQ score. Thus, since the UPQ scores indicate the satisfaction degree a user may experience in a given context, G can be defined by matching CNUS's scores with the user profiles. For instance, a user can be recommended a UPQ score of 2 – Good, but if his/her profile indicates a high performance exigency, the guidance process would suggest running the application at another moment, in order to wait for a UPQ score of 1 – Excellent. We stress that such guidance can be carried out either automatically through a performance advisory system or by a network operator.

3.5. Concluding remarks on the CNUS CBR process

The preceding discussion described the CNUS CBR process and some of its implementation alternatives through the development of the Mentor advisory tool. In order to evaluate the whole
process, the next section shows a case study that guided the development of the multi-agent system used in the experimental studies.

4. Preliminary investigation of CNUS

This section presents a preliminary investigation that shows how the whole process is implemented by Mentor. This description serves as a proof of concept and demonstrates the feasibility of CNUS.

4.1. Case characterization stage

Table 1 shows a set of cases involving application classes, users and network performance metrics that were defined from the domain analysis made to build the MonONTO ontology. The network applications considered in the evaluation belong to the following classes: Messages (MSG), Collaboration (COL), Services on Demand (SDEM), Bulk Data Transfer (BDT), Information Services (ISRV), and Grid (GRD). This example also includes types of users frequently found in academic contexts, namely Regular users, Researchers, and Network managers. The most adopted network performance metrics (Loss, Delay, Jitter, and Bandwidth) were considered as well. Finally, for each occurrence in the table, there is an associated UPQ score, which indicates the user's perceived quality of the received media (i.e., 1 — Excellent, 2 — Good, 3 — Fair, 4 — Poor, and 5 — Bad, respectively). The information listed in Table 1 is used to describe each user's support request. For instance, in case #1, Daniel is a regular user that experienced an application from the Collaboration class under a network performance condition consisting of 1% loss, 50 ms of delay, 8 ms of jitter, and 600 kbps of bandwidth. He rated the condition in which he executed this application as Excellent.

As noted before, this stage involves the definition of the relative weights of the attributes of the cases shown in Table 1. The referred weights are used to compute the similarity between cases in the Retrieve stage. As suggested in Section 3, the Pairwise comparison method can be used in their definition. However, to employ this method, the system needs to rely on a domain knowledge base in order to explore the application characteristics. The knowledge about network applications used to build the MonONTO ontology was used for this purpose. Table 2 shows the performance metric relevance associated with each application class obtained from it. Such information was used to define the weight distribution among them. Basically, the method consists of determining qualitatively which criteria are more important and assigning a quantitative weight to each of them. Taking the Collaboration application class as an example, the first step arranges the information from Table 2 in an a × a matrix, as presented in Table 3. That is, by considering an item in the row with respect to every other item in the same row, the letter of the criterion considered most important in each pairwise comparison is assigned to the corresponding matrix cell. For instance, User group (A) was considered more important than Application class (B), thus the cell at position (1,2) has the value A. Besides that, whenever two criteria have the same importance level, both letters are assigned to the cell. Once the matrix is filled, an ordered list of the items, ranked by the number of cells containing their flag letter, is created: (1) User (5 occurrences), (2) Application Class (4 occurrences), (3) Delay (3 occurrences), (4) Jitter (3 occurrences), (5) Loss (1 occurrence), and (6) Bandwidth (1 occurrence).
In the second step, the quantitative weight is obtained by assuming a linear proportion among the weights and solving Eq. (7), where the coefficients are the numbers of occurrences of each criterion in the matrix.
Table 1
Example of cases from the case base.

Case #   Application class   User name   User type      Loss (%)   Delay (ms)   Jitter (ms)   Bw (kbps)   UPQ
1        COL                 Daniel      Regular User   1          50           8             600         1
2        SDEM                Carol       N. Manager     0          100          10            1024        5
3        MSG                 Eduardo     Researcher     3          450          40            128         1
4        ISRV                Eduardo     Researcher     3          450          40            128         1
5        SDEM                Marina      N. Manager     3          260          20            600         4
6        COL                 Nathalie    Regular User   3          350          15            512         2
7        ISRV                Carol       N. Manager     0          100          10            1024        5
8        MSG                 Carol       N. Manager     0          100          10            1024        5
9        BDT                 Elaine      N. Manager     0          100          10            1024        5
10       COL                 Ana         Researcher     1          150          15            500         5
11       ISRV                Marina      N. Manager     1          260          20            600         4
12       MSG                 Marina      N. Manager     1          260          20            600         4
Table 2
Performance metric relevance for each application class.

         Loss     Delay    Jitter   Bandwidth
COL      Medium   High     High     Medium
BDT      High     Low      Low      Low
ISRV     High     Medium   Low      Medium
MSG      High     Low      Low      Low
SDEM     Medium   High     High     High

Table 5
Subject case example.

Attribute          Value
User type          Researcher
App. class         Collaboration
Delay (ms)         44
Bandwidth (kbps)   1024
Jitter (ms)        12
Loss (%)           1

Table 3
Pairwise comparison method for attributes of the Collaboration application class.

                  (A)   (B)   (C)   (D)   (E)   (F)
User group (A)    -     A     A     A     A     A
App class (B)           -     B     B     B     B
Loss (C)                      -     D     E     CF
Delay (D)                           -     DE    D
Jitter (E)                                -     E
Bandwidth (F)                                   -
The number of occurrences leads us to assign weight values of 29, 24, 6, 18, 18, and 6 to the User, Application Class, Loss, Delay, Jitter, and Bandwidth attributes, respectively. The weights obtained for all application classes through this procedure are listed in Table 4.

100 = 5x + 4x + x + 3x + 3x + x  =>  x = 5.88        (7)
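The weight derivation above is simple enough to automate. The fragment below is a minimal sketch of this second step under the linear-proportion assumption of Eq. (7); the class and method names are illustrative and not part of Mentor.

    // Sketch: derive attribute weights from pairwise-comparison occurrence counts (Eq. (7)).
    // For the Collaboration class the counts are {User=5, AppClass=4, Loss=1, Delay=3, Jitter=3, Bandwidth=1}.
    final class PairwiseWeights {
        static int[] weightsFromOccurrences(int[] occurrences) {
            int total = 0;
            for (int o : occurrences) total += o;              // e.g., 17 for the Collaboration class
            double x = 100.0 / total;                          // linear proportion: 100 = x * sum of occurrences
            int[] weights = new int[occurrences.length];
            for (int i = 0; i < occurrences.length; i++) {
                weights[i] = (int) Math.round(occurrences[i] * x);  // 5x ~ 29, 4x ~ 24, 3x ~ 18, x ~ 6
            }
            return weights;
        }
    }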
Table 4
Weights from the Pairwise comparison method.

Attr         COL   BDT   ISRV   MSG   SDEM   GRD
User         29    29    31     28    33     29
App. Class   24    24    25     22    27     24
Loss         6     18    19     17    1      18
Delay        18    6     12     11    13     6
Jitter       18    6     1      11    13     6
Bandwidth    6     18    12     11    13     18

4.2. Retrieve stage

Having defined the weights, this stage retrieves the most similar cases from the case base. Table 5 describes a subject case for this example. It consists of a Researcher user who plans to run an application from the Collaboration group under network performance conditions corresponding to 44 ms of delay, 1024 kbps of bandwidth, 12 ms of jitter, and 1% of loss. After applying the retrieve function, the returned cases are presented in Table 6. It shows cases involving users with similar assistance requests that did not necessarily have the same UPQ scores. The Reuse stage then tries to predict the UPQ score more accurately by considering the reputation of each user in this resulting set.

4.3. Reuse phase

Case reuse and users' reputation are the main issues of the Reuse phase. Still considering the example shown in Table 6, and assuming that users Ana, Daniel, Carol, Marina, and Nathalie have reputation values of 0.36, 0.18, 0.39, 0.03, and 0.04, respectively, CNUS defines the UPQ score S for the subject case through the weighted mean shown in Eq. (8).

S = [(0.36 x 5) + (0.18 x 1) + (0.39 x 5) + (0.03 x 4) + (0.04 x 2)] / (0.36 + 0.18 + 0.39 + 0.03 + 0.04) = 4.1        (8)

As described in the previous section, such a reputation scheme alleviates precision errors resulting from misbehaving users, since their reputation values tend to decrease whenever other users do not agree with their opinions.
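Eq. (8) is a straightforward reputation-weighted mean, so its computation reduces to a few lines; the sketch below illustrates it for the retrieved set of Table 6 (the class and method names are illustrative).

    // Sketch: reputation-weighted mean of the UPQ scores of the retrieved cases (Eq. (8)).
    final class UpqPrediction {
        static double weightedUpq(double[] reputations, int[] upqScores) {
            double weightedSum = 0.0, reputationSum = 0.0;
            for (int i = 0; i < reputations.length; i++) {
                weightedSum += reputations[i] * upqScores[i];
                reputationSum += reputations[i];
            }
            return weightedSum / reputationSum;
        }
    }
    // Example from Table 6: reputations {0.36, 0.18, 0.39, 0.03, 0.04} and scores {5, 1, 5, 4, 2} yield about 4.1.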
4.4. Remaining stages

The Revise and Retain stages of CNUS are not detailed herein. They consist, respectively, of making adjustments to cases and of storing case information in the case base. After running the network application, the user gives feedback regarding the actual UPQ score, which is then compared with the score (i.e., 4.1) predicted by the Reuse stage. The information regarding the UPQ is adjusted (if needed) based on this comparison and is stored, together with the other case parameters, by the Retain stage.
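As discussed earlier, the Retain stage relies on IB3-style bookkeeping of prediction outcomes to decide which cases are kept. The fragment below is only a simplified sketch of that bookkeeping: it tracks the counters mentioned in Section 3 and uses a plain accuracy threshold in place of the actual IB3 significance test (see Aha et al., 1991, for the real acceptance criterion).

    // Simplified sketch of the per-case bookkeeping used when retaining cases.
    // NOTE: the acceptance rule below is a placeholder; IB3 itself uses a statistical
    // significance test rather than a fixed accuracy threshold.
    final class RetainedCase {
        int correctPredictions;    // times this case's UPQ matched the new case's UPQ
        int incorrectPredictions;  // times it did not match

        void recordOutcome(boolean matched) {
            if (matched) correctPredictions++; else incorrectPredictions++;
        }

        boolean isAcceptable(double minAccuracy) {
            int total = correctPredictions + incorrectPredictions;
            return total > 0 && (double) correctPredictions / total >= minAccuracy;
        }
    }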
Table 6
Retrieved cases.

Case #   Application class   User name   User type      Loss (%)   Delay (ms)   Jitter (ms)   Bw (kbps)   UPQ   Sim
10       COL                 Ana         Researcher     1          150          15            500         5     0.9778
1        COL                 Daniel      Regular User   1          50           8             600         1     0.8460
2        SDEM                Carol       N. Manager     0          100          10            1024        5     0.7906
5        SDEM                Marina      N. Manager     3          260          20            600         4     0.7906
6        COL                 Nathalie    Regular User   3          350          15            512         2     0.7906
The effectiveness of these phases was verified during the experimental studies presented in the next section.

5. Experimental studies

The CNUS CBR process was evaluated through an experimental study carried out in laboratory with multi-agent systems (MAS). We have chosen this validation approach for the reasons listed below.

First, even though a reasonable number of works in the AI literature point to the need for domain experts in effectiveness evaluations of knowledge-based systems (Gena, 2005; Macal & North, 2005), this alternative generally adds high costs to the evaluation procedures. Experimental studies carried out in laboratory are cheaper and are feasible for verifying the applicability and effectiveness of the proposed process (Gena, 2005). Second, MAS have a set of features sufficient to simulate users' behavior (Macal & North, 2005). In particular, software agents are characterized by their autonomy in decision-making without human intervention, by reactivity, which enables them to perceive the context in which they operate and react to it appropriately, and by social ability, which enables them to collaborate on a particular task (Wooldridge, 2009). Third, multi-agent systems enable the simulation of random and concurrent interactions of agents. Such randomization is a key aspect of the study described herein, since it tries to simulate real support scenarios. Finally, the autonomy of agents in decision-making helps to reproduce the situation in which a group of users tries to harm the advisory system or presents incoherent feedback.

Such understanding reinforces the idea of conducting the experimental study through MAS simulation in order to assess the suitability of CNUS and to verify its advantages as well. This simulation removed the need for users' involvement and, consequently, the cost and time associated with real scenarios (Macal & North, 2005). Even if this research had had a real environment available for experimentation, collusive and incoherent users' behavior would have demanded too much time, as well as a reasonable number of participants, to reach reliable results. The next subsections present the experimental study carried out with the Mentor advisory tool, which implemented CNUS and was included in the MAS environment used for the simulations.

5.1. The study definition

Defining an experimental study entails the identification of its goal and of the main questions it should answer (Berander & Jönsson, 2006). In this study, the main goal is to evaluate the CNUS process with respect to its suitability for support activities related to network performance, verifying whether the domain knowledge and the collaboration among users improve the assistance given to them with respect to their use of the network. This goal was formulated based on the assumption that the UPQ score carries enough information for the user's decision making related to network use.
This goal led to research questions regarding the CNUS process, selected according to the literature on case-based reasoning systems, recommender systems, decision support systems, and information retrieval systems. They were:

Q1: Does CNUS assist users with more accurate and precise UPQ scores by considering the knowledge about network properties, applications' characteristics, and users' profiles?

Q2: Does CNUS assist users with more effective UPQ scores by considering the knowledge about network properties, applications' characteristics, and users' profiles?

Q3: Does CNUS assist users with accurate and precise UPQ scores in the presence of collusive and incoherent users' behavior?

To answer these research questions, performance metrics were defined according to the literature on the evaluation of user-adaptive systems (Gena, 2005).
5.2. Execution planning of the study

In order to minimize the need for domain experts, a case base was generated for the experiment based on the specialized network-support literature used during the domain analysis. Afterwards, the experimental study was carried out by selecting ten case subsets from it, which were used as training and test sets: the former were used in the analysis and the latter in the validation. These case subsets were thus used to measure the effectiveness of the retrieval and adaptation functions of the CBR system (i.e., the Mentor advisory tool).

The k-fold cross-validation strategy (Weiss & Kulikowski, 1991) was employed to define the training and test sets used in the system's evaluation. This is a technique in which the original set is randomly partitioned into k subsets; one subset is used as the testing set and the others as training sets. Experimental evidence has shown that k = 10 is a good choice for obtaining accurate estimates (Weiss & Kulikowski, 1991). The k-fold cross-validation has already been employed in the validation of CBR systems. For instance, the case library subset test approach (CLST) proposed by Gonzalez, Xu, and Gupta (1998) suggests the use of the original case base to validate CBR systems. It applies the leave-one-out type of cross-validation, in which a single case from the original case base is used as the validation test and the remaining cases are used as the training set. CLST has its basis in O'Leary's ideas on CBR evaluation (O'Leary, 1993). The author argues that the validation of such systems involves three important steps: (i) the selection of the validation criteria, which consists of identifying the criteria used to validate the system; (ii) the design and generation of the test case set used for validation; and (iii) the development of a prototype used to automatically feed the test cases into the system during the evaluation.
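For illustration, the fragment below sketches the k-fold partitioning described above (with k = 10). It is a generic sketch under the stated assumptions, not Mentor's actual code, and the generic type parameter Case stands for whatever case representation is used.

    // Sketch: randomly partition a case base into k mutually exclusive folds (k-fold cross-validation).
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    final class FoldPartitioner {
        static <Case> List<List<Case>> partitionIntoFolds(List<Case> caseBase, int k) {
            List<Case> shuffled = new ArrayList<>(caseBase);
            Collections.shuffle(shuffled);                       // random partitioning
            List<List<Case>> folds = new ArrayList<>();
            for (int i = 0; i < k; i++) folds.add(new ArrayList<>());
            for (int i = 0; i < shuffled.size(); i++) {
                folds.get(i % k).add(shuffled.get(i));           // each fold ends up with about n/k cases
            }
            return folds;                                        // in each run, one fold tests and the others train
        }
    }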
For these reasons, this study was carried out following O'Leary's suggested steps. The next subsections present how each step was conducted.

5.2.1. Measures of analysis

Performance metrics were selected to quantitatively answer the research questions discussed above, as follows.

Question Q1: Accuracy and precision are analyzed in terms of prediction error rates through descriptive statistics, namely the Mean Absolute Error (MAE) (Breese, Heckerman, & Kadie, 1998; Herlocker, Konstan, & Riedl, 2000), the Root Mean Square Error (RMSE) (Good et al., 1999), and the Mean Absolute Percent Error (MAPE) (Makridakis, 1993).

Question Q2: Effectiveness evaluation implies classifying Mentor's responses into two groups according to whether or not they are considered correct by users. That is, the information provided by the tool is classified into dichotomous predictions, separating the responses whose errors satisfy a pre-established threshold ("success") from those that do not ("failure"). From this classification, performance measures were defined based on a confusion matrix of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) (Salton & McGill, 1986). Based on them, the performance measures studied were: sensitivity, specificity, precision, negative prediction, success rate (accuracy), error rate, F-measure, and recall.

Question Q3: The MAE, RMSE and MAPE metrics are observed in a scenario in which a group of users colluded to harm the efficiency of Mentor by providing inconsistent ratings. In order to deal with such practice, the Mentor development adopted a reputation scheme, as described earlier.

5.2.2. Validation criteria of the experiments

The definition of the validation criteria consists of establishing an acceptability threshold, labeled Result Acceptability Criteria (RAC). It was applied to each of the tests executed on Mentor in order to evaluate its UPQ score predictions; in other words, if the tool presented a relative error (RE) that satisfied the RAC threshold, the test was regarded as successful. The general effectiveness of Mentor was verified based on the percentage of successful tests. This percentage was named System Validation Criteria (SVC) and served to determine whether, in light of the suite of test cases, the tool could be considered valid, following the approach of Gonzalez et al. (1998). That is, the percentage of successful tests must be above the pre-established SVC value in order to validate the whole CNUS CBR process. The experiments were conducted adopting a significance level of 95% (p <= 0.05) for the SVC.
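The error measures of Section 5.2.1 and the per-test RAC check of Section 5.2.2 are standard computations; a minimal sketch follows (the 20% threshold appears below only as an illustrative parameter, and the class name is hypothetical).

    // Sketch: per-test relative error against the RAC threshold and aggregate MAE/RMSE/MAPE.
    final class ErrorMetrics {
        static boolean satisfiesRac(double actualUpq, double predictedUpq, double racPercent) {
            double relativeError = Math.abs(actualUpq - predictedUpq) / actualUpq * 100.0;
            return relativeError <= racPercent;                  // e.g., racPercent = 20.0
        }

        static double[] errorRates(double[] actual, double[] predicted) {
            double mae = 0, mse = 0, mape = 0;
            int n = actual.length;
            for (int i = 0; i < n; i++) {
                double diff = actual[i] - predicted[i];
                mae  += Math.abs(diff) / n;
                mse  += diff * diff / n;
                mape += Math.abs(diff) / actual[i] * 100.0 / n;
            }
            return new double[] { mae, Math.sqrt(mse), mape };   // {MAE, RMSE, MAPE}
        }
    }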
Fig. 5. Confusion matrix.
The acceptable percentage error for the RAC was defined based on the range of values Mentor provides. Since it provides values ranging from 1 to 5, the acceptable error was defined as 20%, ensuring that the accepted deviation does not exceed two. These definitions made it possible to adopt the confusion matrix shown in Fig. 5. The conditional sentences presented in the matrix were used to define True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN); the matrix relates the UPQ score predicted by Mentor (Mentor's UPQ) to the one informed by the user (Case's UPQ).

5.2.3. Generation and design of the test case set

The design and generation of the case set were performed according to the rules presented in Table 7. It shows network performance values obtained from the domain analysis made to define each of MonONTO's application classes according to the related literature (EELA, 2006a, 2006b; ESNET, 2006; Marechal, Bello, Carvalho, & Mayo, 2007; RINGrid, 2007). Since this analysis involved extensive research in the specialized literature and expert interviews, the cases generated and stored in the case base are regarded as a gold standard and adequately cover the domain of the applications used in the experiment.

The information presented in Table 7 was used to generate 220 cases involving network applications from the Messages, Grid, Services on Demand, Collaboration, Bulk Data Transfer, and Information Services classes; users of the Network Operator, Researcher and Ordinary types; and the Loss, Delay, Jitter, and Bandwidth network characteristics. From the generated cases, training and test subsets were selected with the k-fold cross-validation strategy through the following procedure:

1. The case base (gold standard) was randomly partitioned into k mutually exclusive folds, each with approximately n/k elements;
2. The examples from (k - 1) folds were used for training, and Mentor's responses were verified with the remaining fold;
3. This process was repeated k times, each time considering a different fold for testing;
Table 7
Rules used for test case set design and generation.

                       Loss (%)                 Delay (ms)                   Jitter (ms)              Bandwidth (kbps)
App. class             Exc    Good    Bad       Exc       Good        Bad    Exc    Good       Bad    Exc      Good         Bad
M. Collaboration       <1     1 - 3   >3        <50       50 - 150    >150   <20    20 - 50    >50    >1000    100 - 1000   <100
Bulk Data Transfer     0      1 - 1   >1        0 - 150   150 - 400   >400   <40    40 - 80    >80    >100     10 - 100     <10
Information Services   0      2 - 1   >1        <1        1 - 4       >4     <1     1 - 2      >2     >500     10 - 500     <10
M. Messages            0      3 - 1   >1        0 - 150   150 - 400   >400   <50    50 - 100   >100   >100     10 - 100     <10
Services On Demand     0      0 - 2   >2        0 - 250   250 - 1000  >1000  <1     1 - 2      >2     >500     10 - 500     <10
Fig. 6. MAS simulation environment.
4. The performance metric was computed from the results obtained in each of the k folds.

The subsets generated through this approach were then used to model the behavior of the agents of the multi-agent system used in the validation.

5.2.4. Prototype development

A multi-agent system was developed to simulate an environment in which users request assistance from Mentor. The main agents play the following roles: (i) the Advisor, which makes suggestions about the satisfaction level users may reach when running their applications; this agent also implements the CNUS CBR process, so it maintains the case base and gathers feedback from users, who can agree or disagree with a given UPQ score; and (ii) the Client, which requests assistance and reproduces collusive and incoherent users' behavior. The Java Agent DEvelopment Framework (JADE) (Bellifemine, Caire, & Greenwood, 2007) was used in the prototype development. Besides the Advisor and Client agents, developed specifically for the simulation, the Agent Management Service (AMS) and Directory Facilitator (DF) agents from the JADE platform were used; they provide the white and yellow page services required to maintain the interaction among the developed agents. Fig. 6 depicts the environment developed, showing that the case base is maintained by the Advisor agent. The arrows indicate the information flow among the agents, implemented through the Agent Communication Language (ACL) of the FIPA protocol (FIPA, 2002). The agents depicted in Fig. 6 were implemented as follows.

Advisor agent. It implements the CNUS CBR process, acting as the Mentor tool. It remains active during the simulation and waits for requests from the other agents. This behavior was implemented through the cyclic agent behavior provided by jade.core.behaviours.CyclicBehaviour. The Advisor agent follows the same weight distribution scheme presented in the previous section to compute case similarities and adopts 0.95 as the minimum similarity between two compared cases for a previous UPQ score to be reused. Its cyclic behavior consists of processing three types of messages: BEHAVIORREQUEST, ADVREQUEST, and FEEDBACK.
The BEHAVIORREQUEST message type is used to supply each Client agent with the subset of test cases it uses to request assistance. The ADVREQUEST message type is used by Client agents when they request assistance. The FEEDBACK message type contains the rating from a Client agent; as soon as the Advisor agent receives this message, it proceeds to the Revise stage of the CNUS process to make the needed adjustments. As the CNUS process states, the Advisor agent also implements the reputation scheme described in the previous sections: one Java class maintains the adjacency matrix and another computes the PageRank values.

Client agent. It simulates users' behavior during the requests for assistance. This agent can either modify its behavior according to information about collusions received from other agents or initiate a collusive behavior itself. Regardless of how the simulation of collusive behavior is started, the agent may give inconsistent UPQ scores whenever it receives a response from the Advisor agent.

5.3. Preliminary procedures

The execution phase of the study followed part of the CBR evaluation approach presented in Gonzalez et al. (1998), which suggests that such an activity be carried out through three well-defined steps: first, the cases used for the experimentation are analyzed; then, the efficiency of the similarity function is verified; and lastly, the adaptation function is evaluated. Accordingly, we followed three steps to carry out these procedures.

Initially, in order to verify the system's ability to learn as the number of cases in the case base increases, we started the test with an empty case base in the Advisor agent and then included cases, one at a time, to observe the evolution of the system's accuracy. After running the simulation for one hundred requests, the Advisor agent provided incorrect responses for a while and, afterwards, improved its accuracy, presenting good classifications even with half of the users colluding to harm the system's efficiency. This behavior is a result of the learning time of the system, since at the beginning it had few cases in its case base.

Afterwards, we executed tests to ensure the correctness of the indexing and retrieval functions through a leave-one-out cross-validation technique. After the process finished, the proportion of successful and failed tests was computed and compared to the
system validation criteria (SVC) previously defined. This procedure confirmed the consistency of the similarity function adopted in the study: it had a 100% success rate, thus satisfying the validation criteria (SVC).

Finally, in order to verify the correctness of the adaptation function, we followed the adaptation test procedure described in Gonzalez et al. (1998). The analysis adopted the k-fold cross-validation strategy, with k = 10, over each fold of the test case set. Instead of extracting just one case at a time, all cases of a fold were extracted in each run, and each extracted case was presented as the new case. This procedure measures the success of the adaptation test based on the results achieved in all tests carried out; that is, the adaptation success rate of the individual tests is compared to the SVC, and if it is greater than or equal to the SVC, the adaptation is considered valid. This experiment showed that the system had a good adaptation function, since it found an appropriate solution for 100% of the cases.
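To illustrate the Advisor agent described in Section 5.2.4, the fragment below sketches its cyclic message-handling loop in JADE. Only the overall structure reflects the paper; the encoding of message types in the content field and the helper methods (sendTestCases, handleAdviceRequest, handleFeedback) are hypothetical placeholders.

    // Sketch of an Advisor-like JADE agent with a cyclic behaviour that reacts to
    // BEHAVIORREQUEST, ADVREQUEST and FEEDBACK messages (helper methods are placeholders).
    import jade.core.Agent;
    import jade.core.behaviours.CyclicBehaviour;
    import jade.lang.acl.ACLMessage;

    public class AdvisorAgent extends Agent {
        protected void setup() {
            addBehaviour(new CyclicBehaviour(this) {
                public void action() {
                    ACLMessage msg = receive();           // non-blocking read of the next ACL message
                    if (msg == null) { block(); return; } // wait until a message arrives
                    String content = msg.getContent();
                    if (content.startsWith("BEHAVIORREQUEST")) {
                        sendTestCases(msg);               // supply the Client with its subset of test cases
                    } else if (content.startsWith("ADVREQUEST")) {
                        handleAdviceRequest(msg);         // retrieve/reuse: reply with the predicted UPQ score
                    } else if (content.startsWith("FEEDBACK")) {
                        handleFeedback(msg);              // revise/retain and update the reputation scheme
                    }
                }
            });
        }

        private void sendTestCases(ACLMessage request) { /* placeholder */ }
        private void handleAdviceRequest(ACLMessage request) { /* placeholder */ }
        private void handleFeedback(ACLMessage feedback) { /* placeholder */ }
    }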
Table 8
Global average error rates and their standard deviations.

         MAE (σ)       MAPE (σ)       RMSE (σ)
EqW      0.45 (0.14)   14.55 (7.77)   0.61 (0.16)
DiffW    0.42 (0.12)   13.63 (7.49)   0.59 (0.18)
5.4. Planning of experiments

After ensuring the correctness of the system, three experiments were conducted to evaluate the whole CNUS process as follows:

Fig. 7. Better performance of Advisor agent when adopting domain knowledge.
Experiment 1. Through descriptive statistics, this experiment aimed at evaluating the numerical forecasting errors of the Advisor agent resulting from the addition of knowledge about application characteristics, users' profiles, and network performance. This experiment looked for answers to Q1.

Experiment 2. This experiment looked for answers to Q2, which is related to the system's effectiveness. Based on the confusion matrix presented earlier, the results of the Advisor agent were classified into two groups, successful and failed predictions. The experiment also adopted the k-fold cross-validation, with k = 10.

Experiment 3. This experiment aimed to answer Q3 by including ten Client agents in the simulation environment and measuring the performance of the Advisor agent. We set only half of the users to practice collusion because at least a reference of good responses is needed; otherwise, the collusive behavior would dominate the environment. This experiment thus assumes that, in real scenarios, users practicing collusion are a minority. The evaluation used the following experimentation scenarios:

- Scenario 1. No reputation scheme and no collusive behavior (nR/nC). It provides a standard against which the other test scenarios can be compared. The test included the ten Client agents, each presenting twenty cases and interacting normally with the system.
- Scenario 2. No reputation scheme and half of the users practicing collusion (nR/yC). This scenario aimed to examine the effects of having users practicing collusion in the absence of a reputation scheme. It consisted of half of the Client agents in collusion and the other half interacting normally with the system.
- Scenario 3. Reputation scheme and half of the users practicing collusion (yR/yC). Its main goal was to investigate the effectiveness of the reputation scheme implemented in the Advisor agent, measuring the extent to which the system could alleviate the effects of the collusion. This scenario was evaluated through two approaches: the first included half of the Client agents practicing collusion simultaneously in order to verify the system's behavior; the second included users practicing collusion one at a time.

The results of these experiments are presented and discussed in the next section.
Fig. 8. Effectiveness measures of Advisor agent that assess the advantage of using the domain knowledge.
6. Results and analysis

This Section presents the findings of the experimental studies discussed in the previous Section.

6.1. Experiment 1

In this experiment, the independent variable was the use of the domain knowledge provided by MonONTO. That is, one test makes use of the domain knowledge by assigning different weights to the attributes (DiffW), whereas the other does not, since it assigns equal weights to the attributes (EqW). After conducting the experiment and computing the errors from all folds, the MAE, MAPE, and RMSE global average error rates are presented in Table 8. As can be noted in Fig. 7, the Advisor agent had better performance when it adopted the domain knowledge from MonONTO. Besides that, the error means are in line with the values suggested by Gena (2005) and Good et al. (1999) (i.e., < 0.7). As previously discussed, the system learns more and improves its precision through the use of the domain knowledge as the number of cases increases, so the distance between the mean values tends to become larger.
Table 9
Global average error rates and their standard deviations showing the effects of collusion and the effectiveness of the reputation scheme.

         MAE (σ)       MAPE (σ)       RMSE (σ)
nR/nC    0.44 (0.13)   15.36 (6.21)   0.67 (0.15)
nR/yC    0.78 (0.42)   23.25 (15.1)   0.99 (0.55)
yR/yC    0.56 (0.18)   17.28 (3.19)   0.77 (0.21)

Table 11
Pagerank values of agents after the yR/yC testing scenario execution.

Agent   Rep
1       0.3180
2       0.1865
3       0.3415
4       0.0278
5       0.0298
6       0.0266
7       0.0250
8       0.0120
9       0.0170
10      0.0160
Fig. 9. MAE errors in an ordinary scenario, where there is neither collusive behavior nor reputation scheme.
Fig. 12. Accuracy evolution of the system with the increase in the number of users with collusive behavior.
Fig. 10. MAE errors in a scenario where the collusive behavior harms the Advisor's precision.
This experiment also shows that, with the domain knowledge, most of the errors fell in the 2.5% bucket. Fig. 7 shows that from the 15.0% bucket onward, EqW shows errors near to or greater than DiffW, since the latter concentrated its errors in the initial, smaller buckets (i.e., < 15.0%). The results presented by this experiment thus answer Q1: users can obtain more accurate and precise information when the support system considers the knowledge about network properties, applications' characteristics, and users' profiles, since the different weights influenced its performance.

6.2. Experiment 2
Fig. 11. MAE errors in a scenario where the reputation scheme improves the Advisor’s precision.
This experiment evaluated the effectiveness of the Advisor agent through the analysis of the number of successful and failed responses it produces. The same folds were tested adopting equal weights (EqW) and different weights (DiffW). As already mentioned, the definition of the weights used in the DiffW scenario was based on the knowledge about applications' characteristics, users' profiles and network performance metrics. Fig. 8 compares the effectiveness of the agent in the EqW and DiffW scenarios. The results show that using the knowledge about applications, users, and network characteristics (i.e., the DiffW scenario) results in better effectiveness, thus answering Q2.
Table 10
Pagerank adjacency matrix after the yR/yC test scenario execution.

      1       2       3      4      5      6      7      8      9      10
1     0.00    3.53    4.00   2.35   6.10   2.51   3.20   5.40   1.60   2.70
2     10.70   0.00    4.35   3.57   4.75   1.89   7.26   1.85   1.71   1.30
3     12.22   7.65    0.00   3.17   3.17   0.92   5.25   3.46   2.23   2.10
4     11.37   10.62   9.58   0.00   3.27   1.05   1.45   2.76   3.18   3.00
5     2.74    2.15    1.95   0.27   0.00   4.67   1.25   1.25   1.85   1.50
6     1.03    1.72    0.92   0.24   3.17   0.00   0.48   0.12   1.00   3.00
7     13.13   5.35    9.70   5.42   6.68   3.71   0.00   6.47   3.85   4.10
8     3.68    2.18    2.74   0.84   0.13   4.78   0.98   0.00   0.75   5.20
9     2.69    3.77    0.16   2.94   1.04   2.53   1.71   2.55   0.00   3.00
10    3.20    6.30    7.10   2.23   4.57   2.67   8.66   1.34   7.29   0.00
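Given an adjacency matrix such as the one in Table 10, reputation values like those in Table 11 can be obtained with the PageRank algorithm (Page, Brin, Motwani, & Winograd, 1998). The fragment below is a generic power-iteration sketch, not the paper's implementation; the damping factor of 0.85 and the row normalization of the matrix are assumptions.

    // Sketch: PageRank by power iteration over a weighted adjacency matrix a[i][j]
    // (weight of the "vote" that agent i gives to agent j). Damping 0.85 is an assumption.
    final class Pagerank {
        static double[] pagerank(double[][] a, int iterations, double damping) {
            int n = a.length;
            double[] rank = new double[n];
            java.util.Arrays.fill(rank, 1.0 / n);                    // uniform start
            for (int it = 0; it < iterations; it++) {
                double[] next = new double[n];
                for (int i = 0; i < n; i++) {
                    double outSum = 0.0;
                    for (int j = 0; j < n; j++) outSum += a[i][j];
                    if (outSum == 0.0) continue;                     // agent with no outgoing votes
                    for (int j = 0; j < n; j++) {
                        next[j] += damping * rank[i] * (a[i][j] / outSum);  // distribute i's rank
                    }
                }
                for (int j = 0; j < n; j++) next[j] += (1.0 - damping) / n; // teleportation term
                rank = next;
            }
            return rank;
        }
    }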
6.3. Experiment 3

This experiment aimed to evaluate the Advisor agent's behavior by simulating its interaction with other agents in three scenarios: (i) users performing their activities under normal conditions (nR/nC); (ii) half of the users colluding to harm the system's efficiency (nR/yC); and (iii) half of the users colluding to harm the system's efficiency while the Advisor agent adopts a reputation scheme (yR/yC). Table 9 presents the MAE errors produced after executing the tests in these three scenarios.

The histograms shown in Figs. 9-11, coupled with the comparison made in Table 9, show a significant performance loss of the Advisor agent in scenario 2, when half of the users started the collusion. In scenario 3, however, its performance improved: the reputation scheme was able to alleviate the effects of the users' collusive behavior, even though the Advisor agent's performance in scenario 3 could not match its performance in scenario 1. It is worth noting that, by increasing the number of agents acting under normal conditions, users with collusive behavior tend to be isolated in the environment, since the other agents' feedback reduces their influence on the Advisor agent's performance.

Table 10 shows the Pagerank adjacency matrix after the yR/yC test scenario execution. The Client agents that had collusive behaviors (6-10) finished the test with lower (sometimes negative) values in their columns, which means that the other agents disagreed with most of their ratings. Table 11 confirms the utility of the reputation scheme, showing that agents with collusive behavior had lower Pagerank values. This test included half of the users practicing collusion simultaneously; afterwards, the experiment included them one at a time to verify the evolution of the system's accuracy. Fig. 12 shows the accuracy evolution of the system as the number of users with collusive behavior increases; it compares the yR/yC and nR/yC scenarios, showing that the percentage of MAE errors above 20% is greater when there is no reputation scheme. These results helped to answer the third question, Q3: as a result of the reputation scheme adopted, the agent could provide accurate and precise support information even in the presence of collusive and incoherent users' behavior.

6.4. Concluding remarks

The experiments presented in this section demonstrated the suitability of the CNUS CBR process. By considering the domain knowledge, Experiments 1 and 2 showed that a CBR process with such features can provide accurate and precise information, useful to improve the effectiveness of network support. The findings of Experiment 3 indicated that the process is suitable for support activities in real scenarios and that it is resilient to misbehaving users. This conclusion shows the benefits of considering the collaboration among users through the sharing of their experiences.

7. Conclusions and further work

Performance support systems are expanding their roles as important decision aids in support activities. Consequently, many efforts have been dedicated to addressing user-related network performance issues. However, none of the existing monitoring infrastructures have guidance services implemented with the capability of effectively helping users in circumstances in which the diagnostic services are not enough. In this context, this work presented the CNUS CBR process for network users' support.
It is a new attempt to provide a set of well-defined procedures that
serves as a starting reference point for the development of guidance services. Indeed, this process differs from current solutions in that it recognizes the importance of Quality of Experience (QoE) in the process of advising network users on performance issues. So, instead of relying on purely technical parameters, it considers the users' opinions during the information processing, taking into account the reputations of those who provide them. By combining existing methods and techniques from artificial intelligence with network monitoring solutions, CNUS suggests a sequence of steps that leads to an enhanced feature set for a computational network performance adviser. Finally, in the network support field, this article has stressed that users play, and will continue to play, an important role in network performance assessment, thus leading to the need for a support system capable of considering their opinions and preferences about network services.

This work is an initial contribution and a step towards the goals of an enhanced performance guidance infrastructure in network support contexts. Therefore, some work lies ahead. First, among the existing AI problem-solving approaches, Case-Based Reasoning (CBR) is just one alternative able to fulfill the requirements of the guidance approach presented earlier. Actually, the choice should be based on the support scenario in which the approach is intended to be used; for instance, cluster analysis (or clustering) is another alternative that can be adopted to identify and reuse past experiences related to the network use. Second, a network performance advisory system can benefit different types of users, in particular those from network operation centers that assist users in the analysis of performance information to troubleshoot end-to-end performance problems. Although this group of users has enough expertise to analyze performance data regardless of the system's interface, they can still benefit from systems that adapt their interfaces. In addition, usability issues also concern end users when they operate this kind of system without the help of a support staff. Third, even though users can use CNUS's outputs to make their decisions with respect to the use of the network, an application can be developed to better process these outputs by taking into account the users' profiles and contextual parameters related to the users. Fourth, the Data gathering step of the guidance approach suggests interacting with users in order to identify their preferences when requesting assistance. User models can play an important role in this activity, since they facilitate the grouping of users' characteristics into known stereotypes that can be used to describe users' communities. User models coupled with interface adaptation can significantly contribute to the Data gathering step of the guidance approach by expanding the range of users that can benefit from advisory services. The main reason is that these issues, when well addressed by the system, prevent users' involvement in technical discussions.

References

Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7, 39–52. Abdul-Rahman, A., & Hailes, S. (2000). Supporting trust in virtual communities. Proceedings of the 33rd Hawaii international conference on system sciences, HICSS '00 (Vol. 1). Washington, DC, USA: IEEE Computer Society [pp. 9]. Adomavicius, G., & Tuzhilin, A. (2005).
Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17, 734–749. Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6, 37–66. An, S.-H., Kim, G.-H., & Kang, K.-I. (2007). A case-based reasoning cost estimating model using experience by analytic hierarchy process. Building and Environment, 42, 2573–2579. Bai, Y., Yang, J., & Qiu, Y. (2008). Ontocbr: Ontology-based CBR in context-aware applications. In International conference on multimedia and ubiquitous engineering, MUE 2008 (pp. 164–169). Balabanovic´, M., & Shoham, Y. (1997). Fab: Content-based, collaborative recommendation. Communications of the ACM, 40, 66–72.
Bellifemine, F. L., Caire, G., & Greenwood, D. (2007). Developing multi-agent systems with JADE (Wiley series in agent technology). John Wiley & Sons. Berander, P., & Jönsson, P. (2006). A goal question metric based approach for efficient measurement framework definition. In Proceedings of the 2006 ACM/ IEEE international symposium on empirical software engineering, ISESE ’06 (pp. 316–325). New York, NY, USA: ACM. Binczewski, A., Lawenda, M., Lapacz, R., & Trocha, S. (2009). Application of perfsonar architecture in support of grid monitoring. In Grid enabled remote instrumentation signals and communication technology (pp. 447–454). US: Springer. Brauner, D., Moura, A. L., Stanton, M., Faerman, M., Machado, I., Porto, E., Sampaio, L., Monteiro, J. A. S., Melo, E., Jaque, S., & Patino, J. (2009). The deployment of perfSONAR performance monitoring in Latin American networks and its use in the EELA-2 project. In EELA-2 ’09: Proceedings of the first EELA-2 conference. Breese, J. S., Heckerman, D., & Kadie, C. M. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In G.F. Cooper, & S. Moral (Eds.), Proceedings of the 14th conference on uncertainty in artificial intelligence (pp. 43–52). Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30, 107–117. Carlomagno, A. L., Dourado, P., Sampaio, L., Monteiro, J. A. S., & Cunha, P. (2009). MentorWeb: A network performance advising tool. In Demo section of 27th Brazilian symposium on computer networks and distributed systems. Recife, PE: Anais do SBRC [in Portuguese]. Chou, J.-S. (2008). Applying AHP-based CBR to estimate pavement maintenance cost. Tsinghua Science & Technology, 13, 114–120. Crovella, M., & Krishnamurthy, B. (2006). Internet measurement: Infrastructure, traffic and applications. New York, NY, USA: John Wiley & Sons Inc.. Dasarathy, B. V. (1991). Nearest neighbor (NN) norms: NN pattern classification techniques. Los Alamitos, CA: IEEE Computer Society Press. Dellarocas, C. (2000). Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior. In Proceedings of the 2nd ACM conference on electronic commerce, EC ’00 (pp. 150–157). New York, NY, USA: ACM. Díaz-Agudo, B., González-Calero, P. A., Recio-Garca, J. A., & Sánchez-Ruiz-Granados, A. A. (2007). Building CBR systems with jColibri. Science of Computer Programming, 69, 68–75 [Special issue on Experimental Software and Toolkits]. EELA (2006a). Deliverable D2.4.1.1: Network Requirements Report. Report EELA. EELA (2006b). Deliverable D3.3.1: Application Impact Report. Report EELA. ESNET (2006). Science-Driven Network Requirements for ESnet. Report ESNET. Felfernig, A., Friedrich, G., Jannach, D., & Zanker, M. (2006). An integrated environment for the development of knowledge-based recommender applications. International Journal of Electronic Commerce, 11, 11–34. FIPA (2002). FIPA ACL message structure specification. FIPA Agent Communication Language Specifications. IEEE. Geant2 (2009). Performance enhancement & response team. . Geant2 (2010). The PERT knowledge base. . Gena, C. (2005). Methods and techniques for the evaluation of user-adaptive systems. Knowledge Engineering Review, 20, 1–37. Gonzalez, A., Xu, L., & Gupta, U. (1998). Validation techniques for case-based reasoning systems. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 28, 465–477. Good, N., Schafer, J. B., Konstan, J. 
A., Borchers, A., Sarwar, B., Herlocker, J., et al. (1999). Combining collaborative filtering with personal agents for better recommendations. In Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference, AAAI ’99/IAAI ’99 (pp. 439–446). Menlo Park, CA, USA: American Association for Artificial Intelligence. Hanemann, A., Jeliazkov, V., Kvittem, O., Marta, L., Metzger, J., & Velimirovic, I. (2006). Complementary visualization of perfSONAR network performance measurements. In International conference on internet surveillance and protection, ICISP ’06 (pp. 6–6). Hassan, J., Das, S., Hassan, M., Bisdikian, C., & Soldani, D. (2010). Improving quality of experience for network services. IEEE Network, 24, 4–6 [guest editorial]. Herlocker, J. L., Konstan, J. A., & Riedl, J. (2000). Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM conference on computer supported cooperative work CSCW ’00 (pp. 241–250). New York, NY, USA: ACM. ITU-T (1998). ITU-T recommendation P.861: Objective quality measurement of telephone-band (300–3400 Hz) speech codecs. Report International Telecommunication Union (ITU). ITU-T (2001). ITU-T recommendation P.862: Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Report International Telecommunication Union (ITU). ITU-T (2008a). ITU-T recommendation P.800.1: Mean opinion score (MOS) terminology. Report International Telecommunication Union (ITU). ITU-T (2008b). ITU-T recommendation P.910: Subjective video quality assessment methods for multimedia applications. Report International Telecommunication Union (ITU). ITU-T (2009). ITU-T recommendation G.107:The E-model: A computational model for use in transmission planning. Report International Telecommunication Union (ITU). Jain, R. (2004). Quality of experience. Multimedia, IEEE, 11, 95–96.
Jeliazkova, N., Iliev, L., & Jeliazkov, V. (2006). UPerfsonarUI – a standalone graphical user interface for querying perfSONAR services. In International symposium on modern computing, JVA ’06. IEEE John Vincent atanasoff (pp. 77–81). Johnston, W. E., Chaniotakisand, E., Eli, D., Guokand, C., Metzger, J., & Tierney, B. (2008). The evolution of research and education networks and their essential role in modern science. In Trends in high performance & large scale computing. Jøsang, A., Ismail, R., & Boyd, C. (2007). A survey of trust and reputation systems for online service provision. Decision Support Systems, 43, 618–644 [Emerging Issues in Collaborative Commerce]. López, B., Pous, C., Gay, P., Pla, A., Sanz, J., & Brunet, J. (2011). eXiT⁄CBR: A framework for case-based medical diagnosis development and experimentation. Artificial Intelligence in Medicine, 51, 81–91 [Advances in Case-Based Reasoning in the Health Sciences]. Macal, C. M., & North, M. J. (2005). Tutorial on agent-based modeling and simulation. In Proceedings of the 37th conference on winter simulation WSC ’05, winter simulation conference (pp. 2–15). MacDonald, C. M., Weber, R., & Richter, M. M. (2008). Case base properties: A first step. In M. Schaaf (Ed.), ECCBR workshops (pp. 159–170). Makridakis, S. (1993). Accuracy measures: Theoretical and practical concerns. International Journal of Forecasting, 9, 527–529. Marechal, B., Bello, P., Carvalho, D., & Mayo, R. (2007). Applications ported to the EELA e-Infrastructure. In Seventh IEEE international symposium on cluster computing and the grid, CCGRID 2007 (pp. 852–857). Moraes, P., Sampaio, L., Monteiro, J., & Portnoi, M. (2008). MonONTO: A domain ontology for network monitoring and recommendation for advanced internet applications users. In Network Operations and Management Symposium Workshops, NOMS Workshops 2008 (pp. 116–123). IEEE. Mui, L. (2002). Computational models of trust and reputation: Agents, evolutionary games, and social networks (Ph.D. thesis). MIT. O’Leary, D. E. (1993). Verification and validation of case-based systems. Expert Systems with Applications, 6, 57–66 [Special Issue: Case-Based Reasoning and its Applications]. Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the web. Technical Report, Stanford University. Pinyol, I., & Sabater-Mir, J. (2013). Computational trust and reputation models for open multi-agent systems: A review. Artificial Intelligence Review, 40, 1–25. Resnick, P., Kuwabara, K., Zeckhauser, R., & Friedman, E. (2000). Reputation systems. Communications of the ACM, 43, 45–48. RINGrid (2007). Deliverable D3.3: Summary of requirements and needs to be currently fulfilled to efficiently introduce the remote instrumentation idea into practice. Report RINGrid. RNP (2014). RNP’s backbone. . Saaty, T. L. (2008). Relative measurement and its generalization in decision making: Why pairwise comparisons are central in mathematics for the measurement of intangible factors – the analytic hierarchy/network process. Revista de la Real Academia de Ciencias Exactas, Fsicas y Naturales, 102, 251–318. Sabater, J., & Sierra, C. (2001). Regret: Reputation in gregarious societies. In Proceedings of the fifth international conference on autonomous agents, AGENTS ’01 (pp. 194–195). New York, NY, USA: ACM. Sabater, J., & Sierra, C. (2005). Review on computational trust and reputation models. Artificial Intelligence Review, 24, 33–60. Salton, G., & McGill, M. J. (1986). 
Introduction to modern information retrieval. New York, NY, USA: McGraw-Hill, Inc.. Sampaio, L. N. (2008). A network performance recommendation process for advanced internet applications users. In RecSys ’08: Proceedings of the 2008 ACM conference on recommender systems (pp. 315–318). New York, NY, USA: ACM. Sampaio, L. N. (2011). A guidance approach for network users support (Ph.D. thesis). Informatics Center Federal University of Pernambuco (UFPE). Sampaio, L. N., Koga, I., Costa, R., Monteiro, H., Monteiro, J. A. S., Vetter, F., Fernandes, G., & Vetter, M. (2007). Implementing and deploying network monitoring service oriented architectures. In LANOMS (pp. 28–37). Shimazu, H. (2002). Expertclerk: A conversational case-based reasoning tool for developing salesclerk agents in e-commerce webshops. Artificial Intelligence Review, 18, 223–244. Stahl, A., & Roth-Berghofer, T. (2008). Rapid prototyping of CBR applications with the open source tool myCBR. In K.-D. Althoff, R. Bergmann, M. Minor, & A. Hanft (Eds.), Advances in case-based reasoning. Lecture notes in computer science (Vol. 5239, pp. 615–629). Berlin Heidelberg: Springer. van Setten, M., Veenstra, M., Nijholt, A., & van Dijk, B. (2004). Case-based reasoning as a prediction strategy for hybrid recommender systems. In J. Favela, E. M. Ruiz, & E. Chvez (Eds.), AWIC. Lecture notes in computer science (Vol. 3034, pp. 13–22). Springer. Varma, A., & Roddy, N. (1999). Icarus: Design and deployment of a case-based reasoning system for locomotive diagnostics. Engineering Applications of Artificial Intelligence, 12, 681–690. Weiss, S. M., & Kulikowski, C. A. (1991). Computer systems that learn: Classification and prediction methods from statistics, neural nets, machine learning, and expert systems. San Francisco, CA, USA: Morgan Kaufman. Wooldridge, M. (2009). An introduction to multiagent systems (2nd ed.). Chichester, UK: Wiley. Zacharia, G., Moukas, A., & Maes, P. (2000). Collaborative reputation mechanisms for electronic marketplaces. Decision Support Systems, 29, 371–388.