A multi-agent system for minimizing information indeterminacy within information fusion scenarios in peer-to-peer networks with limited resources


Accepted Manuscript

Horacio Paggi, Javier Soriano, Juan Alfonso Lara

PII: S0020-0255(18)30276-7
DOI: 10.1016/j.ins.2018.04.019
Reference: INS 13562

To appear in: Information Sciences

Received date: 1 March 2017
Revised date: 24 January 2018
Accepted date: 3 April 2018

Please cite this article as: Horacio Paggi , Javier Soriano , Juan Alfonso Lara , A Multi-Agent System for Minimizing Information Indeterminacy within Information Fusion Scenarios in Peer-to-Peer Networks with Limited Resources, Information Sciences (2018), doi: 10.1016/j.ins.2018.04.019

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

• Heterogeneous peer-to-peer networks often depend on limited resources, which usually cuts down the performance of the network.

• We present a multi-agent information fusion system that relies on collaborating peers to significantly improve the quality of information in resource-limited settings.

• The underlying model is founded on querying the peers that have historically performed better for a given agent and information type.

• Our system has a broad spectrum of application domains, ranging from mobile recommendation systems to decision-making applications in critical environments.


Horacio Paggi, Javier Soriano¹
ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660 Boadilla del Monte (Madrid), Spain.

Juan Alfonso Lara¹
Universidad a Distancia de Madrid, UDIMA, Carretera de La Coruña, km 38,500, Vía de Servicio, nº 15, 28400 Collado Villalba, Madrid, Spain.


Abstract: Information fusion (IF) has gained ground in recent years. It is increasingly used in applications involving networks of heterogeneous elements that communicate with each other peer-to-peer. This is thanks primarily to the advance of the Internet of Things (IoT) and the emergence of paradigms like holonic information fusion. However, heterogeneous peer-to-peer networks often depend on limited resources (energy, communication capacity, time, etc.). On these grounds, network components necessarily have to behave intelligently. They also have to be autonomous and be able to coordinate their actions in order to obtain results in the presence of vague or uncertain data. In this paper, we present a multi-agent information fusion system model that relies on collaborative peers to improve the quality of the information handled by the agents. The idea behind the model is to query the peers that have historically performed better for a given agent and information type (such as a certain data field). We report the results of the experiments we conducted on a proof-of-concept implementation of the proposed system model, consisting of a statistically significant number of simulation runs on two case studies with different numbers of agents and messages. The results show that the performance of an open peer-to-peer network of agents with no predefined structure, measured as mean traffic per agent (i.e. the use of resources) for replies of different quality and as the strict success rate, improves significantly when the members of the network adopt the intelligent mechanisms proposed in this article. This system model has a broad spectrum of application domains, ranging from mobile recommendation systems to decision-making applications in critical environments.

Keywords: adaptive P2P systems; multi-agent information fusion system; uncertain information handling.


1. Introduction


The sustained increase in systems connectivity, the advent of the Internet of Things (IoT) and the emergence of the holonic information fusion (IF) paradigm [42, 34] have led to the fusion of more and more information coming from heterogeneous data sources. These data sources communicate non-hierarchically through a communications network and their resources (energy, quantity of transferable data, etc.) and information quality (which is vague, uncertain, incomplete, etc.) are limited. In this regard, this paper focuses on describing a system model whose components fuse information. The aim of IF is to improve the quality of the information that an agent querying the network receives from its peers. All (peer-to-peer) interactions are possible, although they are gradually restricted over time for the purpose of selecting the most reliable peers with a view to saving resources in limited-resource scenarios.

¹ Email addresses: [email protected], [email protected] (Horacio Paggi, Javier Soriano), [email protected] (Juan Alfonso Lara). Preprint submitted to Elsevier, September 6, 2017.


IF is carried out in order to reduce the negative impact of information imperfections [19, 42] (such as the vagueness and uncertainty of the received messages) on decision-making. Impact reduction means that the receiver should make the best decisions in view of the semantic content assigned to the received message. Indeterminacy is a key feature of information. Uncertainty and vagueness are two basic aspects of indeterminacy. Uncertainty is primarily associated with a data error or imprecision, and vagueness is typical of natural language (in the phrase “lengthy paper”, for example, how many pages means “lengthy”?). On the other hand, information quality usually denotes a multidimensional characteristic of information. Information quality is composed of many facets (also referred to as characteristics or dimensions), which are still open to debate. Of these facets, we are concerned here with vagueness and uncertainty: how accurate or reliable a data item is and how fuzzy this data item may be (see Section 2.1). In this regard, we compare two quality measures by synthesizing their dimensions (vagueness, uncertainty, and others related to reliability, credibility, and efficiency) in order to form a value that is called quality (see Section 2.2).


The main contribution of this paper is the proposal of an original multi-agent system model based on a heterogeneous peer-to-peer network with limited resources (like time, energy, number of messages, processing capacity, bandwidth, etc.), whose human or other agents fuse information to reduce the chances of poor quality information. This model’s key innovative feature is that it is capable of switching from an unrestricted to an intelligent operating mode when agents detect a situation (event) where certain conditions apply, for instance, a number of unanswered messages within any preset time limit or a specified limit on the number of available messages per agent. In this manner, it can optimize the use of resources to achieve best system performance. It is this feature that distinguishes our investigation from the conventional research focus on maximizing the quality of the transmitted and received information regardless of limited resources of individual system agents. In intelligent mode, each system component employs decentralized information fusion, using local data either competitively or cooperatively to increase the quality of the information contained in the indeterminacy messages that it exchanges with and receives from the queried agent, focusing on vagueness and uncertainty, in order to make the best decision. The model is based on an information quality metric. Our proposed metric takes into account information uncertainty and vagueness typical of human communication. Being scalar, this metric supports decision-making and automation.


Our model employs event-triggered transmission schemes that have previously been proved to be useful for saving resources in networked systems management [20, 21, 51]. While the reported systems are based on resource allocation and nodes communication assignment strategies, our system uses an information fusion strategy based on querying the peers that have performed better for a given agent and information type (such as a certain data field) in the past. Additionally, we provide empirical evidence on the use of JDL Model Level 4 information quality for information fusion process management [27, 45]. In this respect, the proposed model adaptively selects the data to be acquired by picking the better quality data from the responding (available) sources, whereas the queried sources change over time. This is a contribution to what Bossé et al. [42] described as the scant research on this topic.


The effectiveness of the proposed system was validated by means of an intensive simulation study, which demonstrated a statistically significant improvement in system performance and success rate compared with the same unintelligent system. System performance was measured as mean traffic per agent for different response qualities, and system success as agents behaving in accordance with the proposed model.


The implementation of the proposed model is potentially applicable in real networked systems. It has a broad spectrum of application domains, ranging from mobile recommendation tools to decision-making applications in critical environments and can be used in different fields including the IoT, human organizations, etc.


This article is structured as follows. Section 2 reports the research related to IF and peer-to-peer (P2P) networks. Section 3 formally states the problem. Section 4 describes the problem solution as a generic system architecture. Section 5 then presents and analyses the results of the experiments conducted on a simulation of a particular implementation and two case studies. Section 6 describes the model applicability. Finally, Section 7 outlines the conclusions and future research.

2. State of the Art

2.1. Indeterminacy, Vagueness, and Uncertainty


In this paper, we consider two dimensions as essential for measuring information quality: vagueness and uncertainty. Both are related to indeterminacy. In the context of this paper, indeterminacy refers to the degree of knowledge about the immediate consequences of a message. As Novák put it, uncertainty and vagueness form two complementary facets of a more general phenomenon which we may call indeterminacy [32]. Indeterminacy (i.e., uncertainty and vagueness) implies a degree of belief in the communicated proposition and, in turn, only one tendency to act: citing Smith, when a term is familiar, it can be used without people asking “what does that mean?”…a degree of belief that proposition P [is true] implies a tendency to act as if P [is true]. We are interested in vagueness and uncertainty insofar as they can affect agent beliefs and responses, as in [35]. For a thorough analysis of the different aspects of indeterminacy, see [6].


Of the different ways of modelling vagueness (using what Sutton calls “theories of degree” [47]) we opted for fuzzy sets [53], which is usual (see for example [43, 14, 17]). On the one hand, one of the quality components of a fuzzy information item is its fuzziness, that is, the quality of an information item is poorer, the fuzzier it is. This idea of quality was put forward in [23] claiming that the better of two fuzzy numbers is the one that is least fuzzy. This is the idea that we have implemented in the quality function used in this paper.

2.2. Information Quality Metric

To determine which is the best quality information using a single dimension (quality taken as a scalar), the system uses ad hoc formulas to integrate a number of information quality dimensions. Neither the formulas, nor the different quality dimensions (credibility, efficiency, confidentiality, etc.), are specified here as they are not relevant for the comprehension of the remainder of the article. However, they are explained in the listed references. The dimensions considered (for one agent α querying another agent) are:


• The relative importance of each of the constituent parts of an information item for α. For example, given a message composed of several fields, how important each field is to the querying agent.

• The importance attached by the agent to the information being first, second, or third hand, etc. For example, whether or not the queried agent supplies the information directly.

• The variation in the quality of the responses by the queried agents. The aim is to minimize uncertainty, but a large variation yields less reliability or certainty than a small variation in quality. This is associated with the credibility dimension [36, 24].

• The number of agents queried by α, associated with the credibility and reliability of the resulting value.

• The number of agents that replied to the query, together with the number of queried agents, which is representative of the credibility, efficiency, and confidentiality dimensions [24, 36].

• The vagueness with which an agent α replies, and the uncertainty of its response. These two values can be determined by the queried agent (through introspection), negotiated with, or assigned directly, by α. Vagueness is represented as mentioned in Section 2.1, and uncertainty, using probabilities (like the percentage error of the response value). We implemented introspection.

• α-induced vagueness and uncertainty for each information item. Vagueness is related to the potential fuzziness of the possible values, and uncertainty, to the prediction error. Uncertainty is associated with the correctness [36] and accuracy [24, 4] quality dimensions, whereas vagueness is related to precision [36], semantic accuracy [24], and ambiguity [50].

• The number of query forwards. This should be low for the same reasons that we have a small number of queried agents. Additionally, it is not, intuitively, the same thing to query four agents at the same time as to query A who queries B who queries C who queries D. Some risks (for example, to security) would appear to be a lot higher if there are more forwards. This maps to the confidentiality and efficiency dimensions [24, 36].




By using a single number for quality, we avoid multi-criteria decision-making.


2.3. Use of Information Fusion to Reduce Uncertainty and Vagueness


IF is useful for reducing indeterminacy and improving results [31, 37]. By way of confirmation, Foo et al. maintain that some advantages of carrying out IF include: improvement in data accuracy, as well as a reduction in uncertainty and ambiguity within data; and improvement in situation awareness (SAW) and inference leading to better decision making [18]. A reduction in indeterminacy will then improve decision-making. One advantage of the IF approach is that each agent is allowed to reason and handle vagueness and uncertainty in a wide variety of ways. It is, therefore, well suited for open systems and systems that do not guarantee the composition and characteristics of their members.

Different approaches can be used in order to fuse data from the queried agents. Bayesian methods are a formalism that can be employed if probabilities are used to represent only information uncertainty [9]. Dempster-Shafer theory [15] generalizes Bayesian methods and is capable of representing incomplete knowledge, updating beliefs, and explicitly representing uncertainty. Dezert-Smarandache theory [40] improves upon Dempster-Shafer insofar as it is capable of formally combining all sorts of independent data sources. However, it focuses primarily on the fusion of potentially highly-conflictive uncertain and imprecise quantitative and qualitative data. There is now a great deal of research on this theory. See, for example, [16]. Semantic methods use not only data but also semantic information supplied by different sources [9]. Finally, other ad hoc techniques can be used. Some are based on the arithmetic averages of source data (if they are quantitative) and others on the choice of the source data that most closely resemble the value calculated or estimated by the agent.

The fusion depends on the application. This paper does not examine the properties of any particular fusion technique; instead, it examines the effect that two different ways of selecting the information to be fused have on a system. On this ground, the algorithms in Section 4.3 refer only to “Fuse”, that is, fusion is taken as a generic function. This is one of the reasons why we say that we are proposing a system model or architecture rather than an actual system. As we are not interested in the technique as such, we sought out the one that was simplest and easiest to implement.
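The two ad hoc techniques mentioned above (averaging quantitative source data, or choosing the source value closest to the agent's own estimate) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `fuse_mean` and `fuse_closest` are hypothetical, and replies are assumed to be plain numeric values.

```python
from statistics import mean
from typing import Sequence


def fuse_mean(values: Sequence[float]) -> float:
    """Fuse quantitative source data by taking their arithmetic mean."""
    if not values:
        raise ValueError("nothing to fuse")
    return mean(values)


def fuse_closest(values: Sequence[float], own_estimate: float) -> float:
    """Pick the source value that most closely resembles the agent's own estimate."""
    if not values:
        raise ValueError("nothing to fuse")
    return min(values, key=lambda v: abs(v - own_estimate))


# Hypothetical replies received from three queried peers for one numeric field.
replies = [10.0, 12.0, 17.0]
print(fuse_mean(replies))
print(fuse_closest(replies, own_estimate=16.0))
```

Either function could stand in for the generic “Fuse” step, since both take the replies for one field and return a single value.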


2.4. Real-World Networks Requiring Uncertainty Handling: Networks with Limited Resources

The possibilities of communication between people and things have increased so much that, nowadays, everything communicates with everything. This leads to increasingly complex networks. However, there are a number of factors (limited energy, communication costs, tactical risk of a high number of communications, limited computational power, limited bandwidth, etc.) preventing these networks from operating unrestrictedly at all times (without constraints on the number of messages or time). Examples of this are:


a) a network using mobile devices (for example, cell phones and tablets) with limited battery life or credit (applicable to communications between both people and things);


b) networks subject to strict response time limits: each communication consumes time which, in some cases, is a critical factor (for example, for handling incidents caused by natural disasters or acts of terrorism);


c) military networks with a high component turnover or EDGE networks [2, 3].


Generically, networks with such constraints are called networks with limited resources. They commonly include ad hoc wireless networks [30], mobile ad hoc networks (MANETs) [1], delay-tolerant networks (DTNs) [8], vehicular ad hoc networks (VANETs) [25], and body area networks [11]. Specific applications of the above networks are distributed decision-making, real-time recommendation systems with geolocation, etc.


This paper is related to the potential use of networks with limited resources in decision-making in order to provide good options at acceptable costs (associating the costs of messages sent, energy used, etc.). According to [31], local information (i.e., agent and environment information) can be used to obtain very efficient algorithms. On this basis, we compare results with respect to unrestricted networks.

2.5. Peer-to-Peer Systems

In this context, a system is a set of components that are related to each other and make up a unified whole; it is distinct from its environment [28]. In turn, a peer-to-peer (P2P) system is a system that shares resources (processing capacity, bandwidth, etc.) and services for the purposes of computation [33]. A P2P system should be evaluated according to the purpose that it serves, since a generic metric is quite unfeasible [48]. An unwanted phenomenon that is typical of some of these systems is churn (continual turnover of P2P network participants). This phenomenon is ignored here.


Peers may leave the network in two different ways. Graceful leaving is when a peer gives notice that it is leaving and transfers its resources to other peers. This does not apply to the developed model. Ungraceful leaving is when the peer neither gives notice that it is leaving nor transfers its resources to its neighbours (e.g., due to a power failure or low power). The developed model addresses ungraceful leaving.

3. Problem Statement


As already mentioned, more and more networks are peer-to-peer, and human-computer-human communications are becoming more common. P2P networks can be subject to limited resources on different grounds. For this reason, their members should be endowed with some intelligence to optimize the quality of the results output using their resources. Therefore, we can define our problem as follows:


Let Γ be an open peer-to-peer network with no predefined structure, save the network structure itself. Let Ω be an agent querying the network Γ, that is, sending a message to some of its members and waiting for a reply. If Ω receives more than one reply, it will select or fuse responses. Γ is composed of agents (artificial or otherwise). Each agent carries out IF using local data in order to increase the quality of the information contained in the messages (which contain uncertainty and vagueness) that it exchanges with, and receives from, Ω in order to make the best decisions. Success is defined as a reply by Γ to Ω with a quality greater than a preset value and received within a given time limit or using a predetermined number of messages.
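The success criterion just defined can be written as a small predicate. The sketch below is an assumption-laden reading of that sentence, not code from the paper: the `Reply` record, the function names, and the literal treatment of "within a given time limit or using a predetermined number of messages" as a logical OR over whichever limits are set are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Reply:
    quality: float       # scalar quality of the reply returned to the querying agent
    elapsed: float       # time taken to obtain the reply
    messages_used: int   # number of messages spent to obtain the reply


def is_success(reply: Optional[Reply], min_quality: float,
               time_limit: Optional[float] = None,
               message_limit: Optional[int] = None) -> bool:
    """Quality above the preset value AND within the time limit or the message budget."""
    if reply is None or reply.quality <= min_quality:
        return False
    in_time = time_limit is not None and reply.elapsed <= time_limit
    in_budget = message_limit is not None and reply.messages_used <= message_limit
    return in_time or in_budget


def success_rate(replies: Iterable[Optional[Reply]], min_quality: float,
                 time_limit: Optional[float] = None,
                 message_limit: Optional[int] = None) -> float:
    """Fraction of queries that ended in success (a missing reply counts as failure)."""
    outcomes = [is_success(r, min_quality, time_limit, message_limit) for r in replies]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


runs = [Reply(0.8, 2.0, 5), Reply(0.4, 1.0, 3), None, Reply(0.9, 9.0, 4)]
print(success_rate(runs, min_quality=0.5, time_limit=3.0))
```

At least one of the two limits has to be supplied; with neither set, this sketch reports failure.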


The hypothesis to be tested is:

The performance of Γ, measured as:

• mean traffic (messages received and sent) per agent for replies of different quality from Γ; and

• success rate;

improves significantly when the members of Γ adopt the intelligent mechanisms proposed in this article (Section 4) in order to reply to queries working with depleting resources (time, energy, number of messages). Apart from maximizing the quality of the managed information, these mechanisms optimize the use of such resources.


4. Solution

4.1. General Aspects of the Proposal


In this paper, we propose a multi-agent system architecture that replies to messages sent by an external agent (called Ω). This system architecture uses IF to minimize the impact of system communications vagueness and uncertainties. This architecture is based on an unstructured P2P network whose topology changes with the availability of resources. This network is called d-P2P (d stands for dynamic). The messages should be composed of a number of fields possibly linked by composition relationships and where the qualities of any such composition (fusion) vary depending on each agent. A field may form (be used to determine) more than one single field. Therefore, the data structure is not necessarily a tree. Each agent can, in theory, compose (predict or deduce) any compound field and query any other agent about the value of a compound field. The agents will have been trained at some point to do this. Training techniques will be based, for example, on neurofuzzy networks.


The system switches to a more intelligent operating mode depending on the number of unanswered messages within any preset time limit and according to a specified limit on the number of available messages per agent. This switchover takes place gradually agent by agent or when a signal indicating limited resources is broadcast. The initial architecture is a flat (decentralized and unstructured [49]) P2P system, which accounts for neither the quality nor the cost of the communications, sometimes with the exception of the number of messages. An external agent determines when the messages in the network should start to be limited. This agent is not modelled here.
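The switchover described above can be pictured as a small per-agent state machine. The following is a minimal sketch under assumptions: the class `AgentMode` and its method names are hypothetical, the threshold plays the role of the TIME.OUT.BOUNDS parameter, and the comparison `>=` is one possible reading of "too many time-outs".

```python
from dataclasses import dataclass


@dataclass
class AgentMode:
    time_out_bounds: int        # time-outs tolerated before switching (cf. TIME.OUT.BOUNDS)
    time_outs: int = 0          # time-outs observed so far by this agent
    intelligent: bool = False   # False: unrestricted mode; True: intelligent (economy) mode

    def on_time_out(self) -> None:
        """Count an unanswered message; switch mode once the bound is reached."""
        self.time_outs += 1
        if self.time_outs >= self.time_out_bounds:
            self.intelligent = True

    def on_limited_resources_signal(self) -> None:
        """A broadcast signal indicating limited resources forces the switch at once."""
        self.intelligent = True


agent = AgentMode(time_out_bounds=3)
for _ in range(3):
    agent.on_time_out()
print(agent.intelligent)
```

Because each agent keeps its own counter, the switch happens agent by agent, while the broadcast signal moves all agents that receive it at once.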


The idea of message cost usually implies geographical distance. However, we do not consider whether an agent is physically local to another. The response time to each message is assumed to be random for each agent. At the start of system operation, all the agents are in communication. As time goes by, a process of natural selection takes place: when an agent observes that there are too many time-outs, it simply switches to intelligent (or economy) mode and starts to address its messages exclusively to agents that have replied recently and offered better quality information. As time passes, agents that deplete their communication capacity (represented by the communications equipment battery charge, the prepay telephone message credit, the physical possibility of placing a call in a hostile environment, etc.) will stop replying. As a result, others will be selected from among the agents that have not yet been queried because they did not provide the best quality responses, took too long to reply, or entered the system after the query was launched. Therefore, the system always switches to intelligent mode due to either the cumulative number of time-outs or the event limiting the number of messages. Although, as mentioned in the problem statement, new agents will enter the system over time, the agent entry and leaving rate is assumed not to be so large as to be a case of churn [46]. Although described in detail in the following sections, the flowcharts in Figures 1-3 illustrate the overall operation of our proposal. Figure 1 describes the behaviour of agent Ω, which launches a global system query, receives responses from its favourite peers, and fuses responses to output a final outcome, updating its favourites list depending on the quality of the responses received.


Figure 1. Querying agent behaviour.


Figure 2 shows the behaviour of each system agent. Over successive sessions, an agent receiving an information query tries, if it has the available resources, to give a quality response based on its knowledge. In turn, the queried agent may, if necessary, query its favourite peers to formulate a response. In this case, the favourites list for this agent is also updated. Resources are updated in all scenarios depending on the resources that each agent consumed to respond to the query and, if applicable, to query other peers.


Figure 2. Behaviour of each system agent.

Finally, Figure 3 illustrates the behaviour of any external agent requesting to join the system.


Figure 3. Behaviour of each external agent requesting to join the system.


Two mutually non-exclusive approaches can be adopted to design a system with decentralized IF: competitive or cooperative agents. In the competitive case, the agents compete to give independent measures of the same object property or characteristic. This may be very useful for critical systems environments (like aerospace systems that should be failure-resistant). On the other hand, the information provided by the different agents in cooperative systems is not redundant. The proposed system is both competitive and cooperative. On the competitive side, each agent can, in theory, predict all the compound fields. To do this, it tries to offer a better quality reply than the others. On the cooperative side, however, this is done by maximizing the quality of the final reply of the whole system. In a distributed system, reliability is one of the quality issues. In this paper, reliability is basically represented by the number of queried agents, the number of agents that replied and their respective quality (i.e., prediction qualities being equal, a prediction made by querying many agents would be more reliable than a prediction for which no agent was queried). However, the number of queried agents for one and the same quality value should be high. This implies sending more messages, which may be counterproductive. This dilemma is solved by adding an exponent to the quality formula to specify how important it is to check the number of messages for each query.


Randomness is a major feature of this model (when an agent selects the fields about which it queries others, when it chooses the agents it queries, when it decides whether or not to reply or whether the response is of poorer quality than the query, etc.). In this manner, it helps the system instantiated based on the model to self-organize creatively [29].

4.2. Description of the Model


The different levels of IF (e.g., data, medium, and high) are represented in the system by the field composition relationships. For example, high-level fusion corresponds to the value of the full message m response, and low-level fusion to the predicted value of a non-compound field. The fusion function is based on the arithmetic mean of the values to be fused, as the queried fields were (or were coded as) numerical. While this is a basic form of IF, it is extremely easy to implement.


4.2.1. Parameters

The proposed architecture can be parameterized depending on different aspects explained in Table I.

Table I. System parameters.

Parameter | Description
TIME-OUT.TIME | Maximum response time (fixed for all agents)
PREVIOUS.MES | Number of messages exchanged before selecting the preferred agent for predicting a particular field
NO.AG.TO.BE.QUERIED = N | Maximum number of agents to be queried by any agent (how many agents are taken from the top of the list of best agents)
MAX.NO.AG.MESSAGES | Maximum number of messages that agents have at the start of the action. If <0, it is not taken into account.
WEIGHT.LEVEL = Z | Non-negative integer specifying the importance attached to good quality being achieved within a few query steps. If it is 0, the level is not taken into account.
BOUNDARY.THRESHOLD | Minimum number of messages that an agent has to have to reply to a CHECK-IN
TIME.OUT.BOUNDS | Number of time-outs that an agent must accept before switching to intelligent mode
W | Matrix with the relative weights of the fields when calculating the quality of the register m (where row i represents the weights of the sub fields of mi with m = m1, m2, …, mi, …)
MAX.NO.QUERIES | Maximum number of queries awaiting reply per agent
ACTIVITY.LIMIT | Number of time-outs received by one agent from another before it is considered inactive

These numbers should be adjusted, taking into account the number of system agents, the communication network speeds, the agent processing speed, the complexity of predicting each field, etc.
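For a simulation, the parameters of Table I could be gathered in a single configuration object. The sketch below is illustrative only: the class name, the Python attribute names, and every numeric value in the example are assumptions, while the two sentinel rules (a negative MAX.NO.AG.MESSAGES disables the limit, a WEIGHT.LEVEL of 0 disables level weighting) come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemParameters:
    time_out_time: float        # TIME-OUT.TIME: maximum response time
    previous_mes: int           # PREVIOUS.MES: messages exchanged before choosing favourites
    no_ag_to_be_queried: int    # NO.AG.TO.BE.QUERIED = N
    max_no_ag_messages: int     # MAX.NO.AG.MESSAGES; a negative value disables the limit
    weight_level: int           # WEIGHT.LEVEL = Z; 0 means the level is ignored
    boundary_threshold: int     # BOUNDARY.THRESHOLD
    time_out_bounds: int        # TIME.OUT.BOUNDS
    w: List[List[float]]        # W: relative field weights, one row per compound field
    max_no_queries: int         # MAX.NO.QUERIES
    activity_limit: int         # ACTIVITY.LIMIT

    def __post_init__(self) -> None:
        if self.weight_level < 0:
            raise ValueError("WEIGHT.LEVEL must be a non-negative integer")

    @property
    def message_limit_enabled(self) -> bool:
        return self.max_no_ag_messages >= 0

    @property
    def level_weighting_enabled(self) -> bool:
        return self.weight_level != 0


# Hypothetical values, chosen only to make the example runnable.
example = SystemParameters(
    time_out_time=5.0, previous_mes=20, no_ag_to_be_queried=3,
    max_no_ag_messages=-1, weight_level=0, boundary_threshold=2,
    time_out_bounds=4, w=[[0.5, 0.5]], max_no_queries=10, activity_limit=3,
)
print(example.message_limit_enabled, example.level_weighting_enabled)
```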


4.2.2. System Operation


In general terms, the system operates by receiving messages from an external agent (Ω). The external agent asks the system to estimate certain message fields (or even the full message), selected at random by the querying agent. The system returns this estimation or estimations within a preset time. If this time limit is not met, there is a time-out. Generally, neither the system nor any of its agents responds if its quality is lower than that of the querying system or agent (although this is very unlikely). This behaviour is inspired by simulated annealing. In the early stages, all the agents receive the queries. With the passage of time, that is, once it has been determined which agents provide Ω with better quality information for the specified fields (or full message), only a few (at most N for each field or for the full message) receive the queries.
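The narrowing from "all agents" to "at most N agents per field" can be sketched as a per-field ranking of peers by the quality of their past replies. This is a hedged illustration, not the paper's algorithm: the class `Favourites`, the use of the mean historical quality as the ranking score, and the fallback to all peers when no favourite is available are assumptions; `n` and `previous_mes` stand for N and PREVIOUS.MES.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


class Favourites:
    def __init__(self, n: int, previous_mes: int) -> None:
        self.n = n                        # at most N peers are queried per field
        self.previous_mes = previous_mes  # replies needed before favourites are chosen
        self._history: Dict[Tuple[str, str], List[float]] = defaultdict(list)
        self._exchanged: Dict[str, int] = defaultdict(int)

    def record(self, peer: str, field: str, quality: float) -> None:
        """Store the quality of a reply received from `peer` about `field`."""
        self._history[(peer, field)].append(quality)
        self._exchanged[field] += 1

    def peers_to_query(self, field: str, all_peers: List[str]) -> List[str]:
        """Early on, query everyone; later, only the N best available peers."""
        if self._exchanged[field] < self.previous_mes:
            return list(all_peers)
        scored = [(mean(q), p) for (p, f), q in self._history.items()
                  if f == field and p in all_peers]
        if not scored:
            return list(all_peers)
        scored.sort(key=lambda item: item[0], reverse=True)
        return [p for _, p in scored[: self.n]]


fav = Favourites(n=2, previous_mes=3)
peers = ["a", "b", "c"]
for peer, quality in [("a", 0.9), ("b", 0.2), ("c", 0.6)]:
    fav.record(peer, "price", quality)
print(fav.peers_to_query("price", peers))
```

Passing the currently responsive peers as `all_peers` lets the ranking skip agents that have stopped replying, so new favourites emerge as resources are depleted.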



As their resources are depleted, agents stop being able to reply. On this basis, new agents must be selected to reply to a query. Formally, this is described as stagewise behaviour. System agents are considered to pass through three types of stages during operation:

• STAGE 0: Training of prediction/estimation mechanisms (e.g., neural networks).

• STAGE 1: Interaction among the agents according to the abovementioned rules (e.g., an agent of lower quality than the querying agent replies only occasionally). This stage includes the stabilization time, that is, when each agent determines which agents it should query for each field. STAGE 2 starts when these agents have been determined.

• STAGE 2: Agent operation in the system.

ED

Agents will all be at the same stage at some point in time, which is when the system can be said to be at this stage. Therefore, the system passes from one stage to another through hybrid intermediate stages where reorganizations take place.


System evolution over time implies that the different agents periodically pass through Stage 1, when the best agents should be determined again because the peers lose their ability to communicate.

The system is open. Agents enter by sending an entry application (“HELLO” message) to any of the bootstrap agents (this list is public). See the “CONNECTION ATTEMPT” and “HELLO IS RECEIVED” events in Section 4.3. If that bootstrap agent has available resources, it will accept the applicant as a new active agent and will reply with an “OK” message (see the “OK MESSAGE IS RECEIVED” event); otherwise, the applicant will have to try other agents on the list. Leaving the system is logical: it takes place when a member stops replying (e.g., due to a shortage of resources or sharp drops in quality). The idea of logical leaving is that the agent can remain physically connected to the rest of the system but is no longer visible to its peers. If it acquires resources, it may return to the system. This is distinct from an agent deciding to no longer be part of the system and disconnecting. Section 4.3 also describes the basic behaviour of the agents when receiving a data request (“QUERY IS RECEIVED” event), when receiving a reply (“REPLY IS RECEIVED” event), and when sending, receiving and replying to system entry requests (“CONNECTION ATTEMPT”, “HELLO IS RECEIVED” and “OK MESSAGE IS RECEIVED” events).
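The entry procedure can be illustrated with a small sketch. It is a simplification made under several assumptions: the names `Bootstrap`, `on_hello` and `try_to_join` are hypothetical, the "OK" message is modelled as a boolean return value, the random waiting time between attempts is omitted, and the number of attempts is bounded so that the example terminates.

```python
import random


class Bootstrap:
    """Hypothetical bootstrap agent that accepts applicants while it has resources."""

    def __init__(self, name, remaining_messages, boundary_threshold):
        self.name = name
        self.remaining_messages = remaining_messages
        self.boundary_threshold = boundary_threshold  # BOUNDARY.THRESHOLD
        self.active_agents = []

    def on_hello(self, applicant):
        """Return True (standing for an "OK" message) if the applicant is accepted."""
        if self.remaining_messages > self.boundary_threshold:
            self.active_agents.append(applicant)
            return True
        return False


def try_to_join(applicant, bootstraps, rng, max_attempts=100):
    """Send HELLO to randomly chosen bootstraps until one of them accepts."""
    for _ in range(max_attempts):
        candidate = rng.choice(bootstraps)
        if candidate.on_hello(applicant):
            return candidate.name
    return None  # no bootstrap had enough resources within the attempt budget
```

Entry is local in this sketch as well: only the bootstrap that accepted the applicant adds it to its list of active agents.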


Figure 4 is a diagram of system dynamics. At first, none of the agents has any agent preferences for a field. All agents receive a query (Figure 4A), as Ω does not yet have agent preferences. This query is processed and leads to further queries among the system agents. As these queries are processed, some agents may not be able to reply (which is as if they had left the system), whereas others may enter (Figure 4B). The reply is returned to Ω. With the subsequent queries, Ω and the other agents will determine which agents are their favourites.

Likewise, as the queries are passed on, the resources of some of the agents will be depleted. On this basis, Ω will determine its new favourite agents (Figure 4C). The agents that are the new favourites of Ω and of the other agents will now reply to the queries (Figure 4D). This process is repeated indefinitely as long as the system is operational. Note that agents can enter or leave the system at any time.

Figure 4. System dynamics: (A) All peers receive the query; (B) After a number of iterations, each agent knows which other agents are worth querying in each case; (C) With time, some peers leave and others enter; (D) The system reaches a state of equilibrium again.

The following must be taken into account:

• The agents are permanently listening for messages, even if they cannot reply. The fact that an agent cannot reply is considered to be a logical leaving (see Figure 4).
• New agents may enter the system. To do this, they use check-in messages (“HELLO”). Any system agent may be the bootstrap (i.e., process the entry of a new agent). Entry is local: the new agent becomes a system agent known to the bootstrap.
• The agent initialization stage is omitted here.
• There is no special operation for an agent to leave the system. When an agent leaves, it merely stops replying to messages.
• The difficulty or cost of communications is not taken into account: they are all equally valuable, and there is no notion of a topological neighbourhood, that is, there is no such thing as a distant or close agent.

4.3. Process and Behaviour Description

The main agent behaviours are described below.

EVENT: REPLY IS RECEIVED BY Y
Description: Y receives a reply from Z about the field mt for id.message.
Begin
    If the reply was received before the TIME-OUT then
        If intelligent mode is ON then
            Count the number of replies received for mt
        End if
    else
        Count the TIME-OUT
        If the maximum threshold was exceeded then
            Switch agent Y to intelligent mode
        End if
        Do reply = NULL
    End if
    If all the agents already replied for that mt and id.message or it timed out then
        Reply
    End if
End

EVENT: QUERY IS RECEIVED BY Y
Description: Y receives a query from Z about the field m for id.message.
Begin
    For some compound, randomly selected fields of m (including m itself) do
        Query, with a probability inversely proportional to the quality of the approximation of Y to the subfield, another N agents apart from Z // Checking that there are not too many simultaneous queries for the same agent
    End for
    If there are no unanswered queries for subfields of m and the respective id.message generated by Y then
        Reply to Z about m and for id.message
    End if
    For each of the other compound fields
        Calculate their value
    End for
End

EVENT: OK MESSAGE IS RECEIVED
Description: X receives an OK message from Y.
Begin
    Do connection-flag in X = 0
End

EVENT: CONNECTION ATTEMPT
Description: Agent X starts to try to enter the network (system).
Begin
    Do connection-flag = 1 // Sets the flag that indicates that HELLO should continue to be sent to agents
    While connection-flag = 1 repeat // As long as that flag is set, it sends HELLO messages to agents at random
        Select an agent at random from the list of bootstraps // There is a list of agents that volunteer as bootstraps, and that list is public. The event of joining this list is not described here.
        Send HELLO to the selected agent
        Wait a random time
    End repeat
End

EVENT: REPLY
Description: Y replies by composing the field m that can now be calculated.
Begin
    Do fusion of the replies and the value calculated by Y // If it timed out, the reply is the empty set
    For each parent of mt that can now be calculated and was queried by a Z
        Calculate value V of the parent
        Calculate the quality Q of V according to Y
        If it is in intelligent mode then
            If Q > quality of the querying agent then
                Reply to Z with value V and quality Q
            else
                With a probability proportional to (1 - Q) / RM, reply to Z with value V and quality Q // RM is the percentage of remaining messages; RM = 1 if it is not in intelligent mode
            End if
        End if
    End for
End

EVENT: HELLO IS RECEIVED
Description: Y receives a HELLO message from X.
Begin
    If Y has remaining messages > threshold then
        Insert X in structures of Y and in the list of system entries according to Y
        Insert X in the list of active agents for Y (that Y can query)
    End if
End
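The reply rule of the REPLY event in intelligent mode can be sketched in a few lines. The sketch follows the expression (1 - Q) / RM as printed in the pseudocode and adds assumptions of its own: the function names are hypothetical, RM is treated as a fraction in (0, 1], the proportionality constant is exposed as a `scale` argument defaulting to 1, the result is clamped to [0, 1], and an agent with no remaining messages never replies.

```python
import random


def reply_probability(q_own, q_querier, remaining_fraction, scale=1.0):
    """Probability that Y replies to Z in intelligent mode (illustrative)."""
    if remaining_fraction <= 0.0:
        return 0.0   # assumption: no messages left, so no reply is possible
    if q_own > q_querier:
        return 1.0   # Y is of better quality than the querying agent
    # Otherwise reply only occasionally, proportionally to (1 - Q) / RM.
    return max(0.0, min(1.0, scale * (1.0 - q_own) / remaining_fraction))


def should_reply(q_own, q_querier, remaining_fraction, rng, scale=1.0):
    """Draw the random decision from the probability above."""
    return rng.random() < reply_probability(
        q_own, q_querier, remaining_fraction, scale
    )
```

For example, an agent with Q = 0.6 and all its messages remaining replies to a better querying agent with probability 0.4 under these assumptions.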

5. Case Studies

5.1. Generalities

In order to evaluate the proposed model, we designed two case studies on the performance of an element classification task (i.e., a predictive data mining task) as part of a decision-making support strategy.

The data were sourced from two datasets taken from the University of California, Irvine repository [26].

The data fusion level is higher for the first than for the second case study. However, there are far fewer elementary fields in the first than in the second case. Also, the resulting field in the second case study is formed by fusing twice as many fields as in the first case study. In sum, the two case studies are complementary to some extent. Sections 5.1.1 and 5.1.2 describe each case study, respectively.

5.1.1. Cars (Case A)

In this case, the aim is to classify whether or not a car is acceptable (i.e., should be purchased) based on a set of attributes. The information dealt with in this problem is structured as follows:

CAR = car acceptability, composed of PRICE and TECH
    PRICE = price, composed of:
        Purchase price
        Maintenance price
    TECH = technical specifications, composed of COMFORT and Safety
        COMFORT = comfort, composed of:
            Number of doors
            Number of seats (number of passengers)
            Size of boot
        Safety
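The structure above can be written down as a nested mapping, which makes the distinction between compound and elementary fields explicit. The representation is only a sketch: the dictionary layout, the key spellings and the two helper functions are hypothetical and are not taken from the prototype.

```python
# Compound fields map to their sub-fields; elementary fields map to None.
CAR_STRUCTURE = {
    "CAR": {
        "PRICE": {
            "purchase_price": None,
            "maintenance_price": None,
        },
        "TECH": {
            "COMFORT": {
                "doors": None,
                "seats": None,
                "boot_size": None,
            },
            "safety": None,
        },
    }
}


def elementary_fields(node):
    """Return the names of the leaf (elementary) fields, depth first."""
    leaves = []
    for name, child in node.items():
        if child is None:
            leaves.append(name)
        else:
            leaves.extend(elementary_fields(child))
    return leaves


def compound_fields(node):
    """Return the names of the fields that are fused from sub-fields."""
    names = []
    for name, child in node.items():
        if child is not None:
            names.append(name)
            names.extend(compound_fields(child))
    return names
```

In this representation Case A has six elementary fields and four compound fields, CAR being obtained by fusing PRICE and TECH.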

5.1.2. Web Sites (Case B)

IF has been used in several computer security applications [12]. The following dataset contains fields specifying whether or not a website is, according to certain rules, a phishing site. One of the problems is that there is no approved set of properties defining whether or not a website is a phishing site.

We have a set of rules based on site attributes that state whether or not a website is a phishing site; the rules are grouped by topic, for example, address bar-based features or domain-based features. The final result, RESULT (whether or not a website is a phishing site), depends on fields a), b), c) and d). These fields depend on the rules described below:

a) ADDRESS BAR-BASED FEATURES, depending on:
• IP address use:
    o Rule: If the Domain Part has an IP address then it is a phishing site; else it is legitimate.
• Long URL to hide the suspicious part:
    o Rule: If length(URL) < 54 then it is legitimate; else if length(URL) is between 54 and 75 then it is suspicious; else it is a phishing site.
• Use of “TinyURL” services that shorten the URL, and so on.

b) ABNORMAL-BASED FEATURES, depending on:
• URL request:
    o Rule: If the % of URL requests is < 22% then it is legitimate; else if the % of URL requests is between 22% and 61% then it is suspicious; else it is a phishing site.
• Server form handler (SFH):
    o Rule: If the SFH is "about:blank" or empty then it is a phishing site; else if the SFH refers to another domain then it is suspicious; else it is legitimate.
• Send information by email:
    o Rule: If it uses "mail()" or "mailto:" in order to send user information then it is a phishing site; else it is legitimate.
• And so on.

c) HTML- AND JAVASCRIPT-BASED FEATURES, depending on a series of rules omitted for reasons of space.

d) DOMAIN-BASED FEATURES, depending on a series of rules omitted for reasons of space.

The field values are described in more detail in [26].

Two examples of registers are shown below:

• –1, 1, 1, 1, –1, –1, –1, –1, –1, 1, 1, –1, 1, –1, 1, –1, –1, –1, 0, 1, 1, 1, 1, –1, –1, –1, –1, 1, 1, –1, –1
• 1, 1, 1, 1, 1, –1, 0, 1, –1, 1, 1, –1, 1, 0, –1, –1, 1, 1, 0, 1, 1, 1, 1, –1, –1, 0, –1, 1, 1, 1, –1

where –1 stands for a “legitimate site”, 0 is a “suspicious site”, and 1 is a “phishing site”.
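Two of the rules for Case B can be expressed directly as functions that return the three-valued encoding just described. This is an illustrative sketch, not code from the dataset or the prototype: the function and constant names are hypothetical, and the bounds of the middle ("suspicious") band are assumed to be inclusive, since the comparison operators for that band are not legible in the rule text.

```python
# Encoding as stated in the text: -1 legitimate, 0 suspicious, 1 phishing.
LEGITIMATE, SUSPICIOUS, PHISHING = -1, 0, 1


def url_length_feature(url):
    """Long-URL rule: shorter than 54 characters is legitimate."""
    length = len(url)
    if length < 54:
        return LEGITIMATE
    if length <= 75:  # inclusive upper bound is an assumption
        return SUSPICIOUS
    return PHISHING


def url_request_feature(percentage):
    """URL-request rule, with the percentage given on a 0 to 100 scale."""
    if percentage < 22.0:
        return LEGITIMATE
    if percentage <= 61.0:  # inclusive upper bound is an assumption
        return SUSPICIOUS
    return PHISHING
```

Each elementary field of a register is the output of one such rule, and the compound fields a) to d) are obtained by fusing the corresponding rule outputs.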

5.2. System Evaluation

5.2.1. Experiment Description

In order to evaluate the proposed model, we implemented a prototype simulating the connections between agents. This approach does not require the use of any specific multi-agent platform.

All agent activity was recorded in an ASCII text file and then analysed using ad hoc programs written in Python 3.5. Some of these logs were very large, for example, 208,000 lines. Quality was defined as H ≡ 0 in all cases, that is, data were considered as crisp (i.e., non-vague) and the measure of uncertainty, u(x), was considered as the percentage error of an agent when predicting the value of field x. The simulation was run on conventional computers running Microsoft Windows 7 and 10 Pro. We used Microsoft Excel 2016 to process the data. By convention, the system is said to be operating in intelligent mode when peers communicate using the model proposed here and in simple mode when peers query all agents without considering the past behaviour of other peers.

Each run for a triple (number of agents, number of messages, mode) was repeated at least 32 times (for the simple or intelligent mode, as defined above). We calculated the average values for all 32 repetitions in order to assure that the results were statistically valid. Each run was independent. The other system parameters were constant across the different runs. The same set of 30 messages sent by Ω (for each case) was used for all the runs. Runs were executed in parallel in order to speed up the data collection process.

This version did not automatically switch from simple to intelligent mode when detecting a high number of time-outs. Accordingly, we were able to compare the properties of the two modes more clearly.

The maximum number of parallel searches (N) was constant at 5. The time-out time was 100 seconds for each agent. The uniform probability distribution was used every time random variables had to be taken into account. Unless otherwise stated, all the other parameters were constant across all experiments.

5.2.2. Quantitative Performance Measurement

Taking into account the hypothesis defined in Section 3, we used the following metrics for the evaluation process:

• Success rate: In the context of P2P networks, resource location is regarded as a success. In this case, the aim is to find a reply whose quality is greater than a given specified value. To be exact, success in this context means that the quality of a reply to a request or query received by the system is at least equal to its original quality. A success is strict when the quality of the reply is strictly greater than the quality of the query, and lax otherwise. Strict successes are especially interesting, as they are the ones that correspond to the cases in which the system was really useful.
• Mean traffic load: mean number of messages used per agent (which can be associated with the cost of the query) either in response to each message received from Ω or in order to reply with a given minimum quality.
• Number of time-outs: number of times that the system does not reply in time.
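The three metrics can be made concrete with a short sketch. The function names and the outcome labels are hypothetical; in particular, the label "failure" for a reply of lower quality than the query is introduced here for illustration only, and a missing reply is represented by `None`.

```python
from typing import List, Optional


def classify_reply(query_quality: float, reply_quality: Optional[float]) -> str:
    """Classify one reply of the system to a query from the external agent."""
    if reply_quality is None:
        return "time-out"        # the system did not reply in time
    if reply_quality > query_quality:
        return "strict success"  # quality strictly improved
    if reply_quality == query_quality:
        return "lax success"     # quality preserved but not improved
    return "failure"             # label assumed for this sketch


def mean_traffic_load(messages_used_per_agent: List[int]) -> float:
    """Mean number of messages used per agent."""
    return sum(messages_used_per_agent) / len(messages_used_per_agent)
```

Counting the "strict success" and "time-out" outcomes over the messages sent by Ω gives the kind of quantities that the tests compare between the intelligent and simple modes.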

The ultimate aim of the experiments is to compare the result of applying our model (intelligent) with a conventional approach (simple) with regard to the following issues:

• The effect of the number of agents and messages available in the system on the success rate (using replication = 1); and
• The effect of the number of agents and messages available on time-outs.

The tests to be conducted to analyse the above issues are described in Table II (Case A: Cars; Case B: Web sites).

Table II. Tests to be conducted.

| Hypothesis | Meaning | Test No. (Case A) | Test No. (Case B) |
| --- | --- | --- | --- |
| H0: μ0 ≥ μ1; H1: μ0 < μ1 | μ0: Number of time-outs in intelligent mode for a minimum number of messages γ, with a given γ. μ1: Number of time-outs in simple mode for a minimum number of messages γ, with a given γ. | 1 | 13 |
| H0: μ0 ≥ μ1; H1: μ0 < μ1 | μ0: Number of time-outs in intelligent mode for a given number of initial messages. μ1: Number of time-outs in simple mode for a given number of initial messages. | 2 | 14 |
| H0: μ0 ≥ μ1; H1: μ0 < μ1 | μ0: Number of time-outs in intelligent mode for a maximum number of agents δ, with a given δ. μ1: Number of time-outs in simple mode for a maximum number of agents δ, with a given δ. | 3 | 15 |
| H0: μ0 ≥ μ1; H1: μ0 < μ1 | μ0: Number of time-outs in intelligent mode for a given number of agents. μ1: Number of time-outs in simple mode for a given number of agents. | 4 | 16 |
| H0: μ0 ≤ μ1; H1: μ0 > μ1 | μ0: Number of successes in intelligent mode for a minimum number of messages γ, with a given γ. μ1: Number of successes in simple mode for a minimum number of messages γ, with a given γ. | 5 | 17 |
| H0: μ0 ≤ μ1; H1: μ0 > μ1 | μ0: Number of successes in intelligent mode for a given number of initial messages. μ1: Number of successes in simple mode for a given number of initial messages. | 6 | 18 |
| H0: μ0 ≤ μ1; H1: μ0 > μ1 | μ0: Number of successes in intelligent mode for a maximum number of agents δ, with a given δ. μ1: Number of successes in simple mode for a maximum number of agents δ, with a given δ. | 7 | 19 |
| H0: μ0 ≤ μ1; H1: μ0 > μ1 | μ0: Number of successes in intelligent mode for a given number of agents. μ1: Number of successes in simple mode for a given number of agents. | 8 | 20 |
| H0: μ0 ≤ μ1; H1: μ0 > μ1 | μ0: Average (total number of successes in iteration x / total number of messages used in iteration x) for all agent numbers and initial message numbers in intelligent mode. μ1: Average (total number of successes in iteration x / total number of messages used in iteration x) for all agent numbers and initial message numbers in simple mode. | 9 | 21 |
| H0: μ0 ≥ μ1; H1: μ0 < μ1 | μ0: Number of messages in intelligent mode necessary to achieve a minimum quality β, with a given β. μ1: Number of messages in simple mode necessary to achieve a minimum quality β, with a given β. | 10 | 22 |
| H0: μ0 ≥ μ1; H1: μ0 < μ1 | μ0: Number of messages in intelligent mode necessary to achieve a given quality. μ1: Number of messages in simple mode necessary to achieve a given quality. | 11 | 23 |
| H0: μ0 ≥ μ1; H1: μ0 < μ1 | μ0: Number of times that the response was of a quality greater than or equal to a given α in intelligent mode. μ1: Number of times that the response was of a quality greater than or equal to a given α in simple mode. | 12 | 24 |

5.3. Results

We now report the results of the two case studies, highlighting the strengths of the model.


5.3.1. Case A: Car Classification

We studied the number of messages required to achieve a minimum quality (i.e., the reply quality was at least α). Recall that we set out to check a two-part question: a) whether the intelligent model requires fewer messages to get a quality reply, and b) whether the intelligent model has a higher success rate. The results are shown in Table III. Figure 5 (the average number of messages on the y-axis and the minimum quality level on the x-axis) plots some of the results. In addition, we analysed how often a certain minimum quality was achieved (i.e., how frequently the quality was greater than or equal to α, for a given α). Tests show that the system yields strict successes more often when operating in simple mode than in intelligent mode, albeit only for the lowest qualities. On the other hand, the system more often returns a strict success for the highest quality (1.0) if it is working in intelligent mode. See Table III and Figure 6 (the average number of strict successes on the y-axis for a minimum number of messages on the x-axis).

Therefore, the results were positive, confirming part a) of the question for case study A. Part b) is divided into two parts:

• b1) The number of queries that are unanswered because of time-outs (mean time-outs depending on the number of messages and mean time-outs depending on the number of agents). It was observed that the intelligent mode systematically reduces the number of time-outs (messages that are left unanswered because the predefined query time expires), averaged over the different numbers of initially available messages. Symmetrically, the intelligent mode always reduces the number of time-outs averaged over all agent numbers.
• b2) The number of successes per agent (the average number of strict successes for a maximum number of agents and for a number of initially available messages). We found that the number of strict successes achieved by changing the number of agents in Case A is always greater in the intelligent than in the simple mode. This is complemented by the results for Case B, where the intelligent mode does not outperform the simple mode until there are 100 or more agents.

Therefore, the results are also positive for part b) of the question in this case study.

For reasons of space, we have omitted charts clarifying the above results for Case A.

Section 5.4 details the tests that confirm that the proposed intelligent strategy has a statistically significant effect on the results (mainly, number of messages required and successes achieved).

[Figure 5: chart of the average number of messages used (y-axis, 0 to 2000) against the minimum desired quality (x-axis, 0 to 1) for the Intelligent and Simple modes, with a linear trend line for each mode. The plotted values are listed below.]

| Minimum desired quality | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Intelligent | 677.335 | 700.956 | 744.035 | 807.172 | 862.402 | 922.531 | 942.256 | 774.836 | 521.25 | 380.047 | 26.625 |
| Simple | 1128.09 | 1136.08 | 1248.9 | 1396.58 | 1575.29 | 1803.27 | 1823.44 | 1252.84 | 881.208 | 569.063 | 38.6923 |

Figure 5. Case A: average number of messages required to achieve a minimum quality (all agents).

[Figure 6: chart of the number of successes (y-axis, 0 to 80) against the minimum number of available messages (x-axis: 5, 10, 15, 20, 30, 100, 1000) for the Intelligent and Simple modes, with a linear trend line for each mode.]

Figure 6. Case A: average number of strict successes for a minimum number of messages.

5.3.2. Case B: Web Site Classification

In this second case study, we conducted the same analyses as for Case A. Table III confirms part a) again. Figure 7 illustrates that the intelligent mode yields high qualities (0.7 or more) more often than the simple mode. With respect to part b), again analysed as b1 and b2, Figures 8 and 9 confirm this question for b1, and Figures 10 and 11 for b2. In this implementation, we found that the intelligent mode is better when there are more than 100 agents in the system (see Figure 10), although this does not hold statistically (see Section 5.4). We conclude from the abovementioned figures that the analysed question also holds in Case B subject to certain constraints on the number of agents. Section 5.4 reports the tests that confirm the statistical significance of these results.

[Figure 7: chart of the number of times that the desired quality is achieved (y-axis, 0 to 250) against the desired quality (x-axis, 0 to 1) for the Intelligent and Simple modes, with a linear trend line for each mode. The plotted values are listed below.]

| Desired quality | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Intelligent | 119.4063 | 71.42188 | 55.39583 | 47.44531 | 43.425 | 97.54688 | 127.2054 | 144.2305 | 147.3056 | 147.25 | 134.6903 |
| Simple | 138.5682 | 138.9031 | 153.3889 | 171.2695 | 193.7411 | 223.1979 | 187.2813 | 151.8828 | 120.1354 | 85.64063 | 6.6875 |

Figure 7. Case B: Average frequency for minimum quality (all agents).

[Figure 8: chart of the average number of time-outs (y-axis, 0 to 250) against the minimum number of available messages (x-axis: 5, 10, 15, 20, 30, 1000) for the Intelligent and Simple modes, with a linear trend line for each mode.]

Figure 8. Case B: Average number of time-outs for different numbers of messages.

[Figure 9: chart of the average number of time-outs (y-axis, 115 to 145) against the maximum number of agents (x-axis: 5, 10, 15, 20, 30, 100, 300, 600) for the Intelligent and Simple modes, with a linear trend line for each mode.]

Figure 9. Case B: Average number of time-outs for different numbers of agents.

[Figure 10: chart of the number of successes (y-axis, -4 to 18) against the maximum number of agents (x-axis: 5, 10, 15, 20, 30, 100, 300, 600) for the Intelligent and Simple modes, with a linear trend line for each mode.]

Figure 10. Case B: Average number of strict successes for a maximum number of agents.

[Figure 11: chart of the number of successes (y-axis, 0 to 90) against the minimum number of available messages (x-axis: 5, 10, 15, 20, 30, 1000) for the Intelligent and Simple modes, with a linear trend line for each mode.]

Figure 11. Case B: Average number of strict successes for a minimum number of messages.

5.4. Statistical Analysis


All the experiments refer to behaviour measurements for a system operating in two different modes. Also, the runs for each set of parameters are mutually independent. Therefore, the data were analysed as paired samples.

The hypotheses were stated such that the alternative hypothesis is the statement that we were trying to prove (thereby assuring strong support, as explained by Witte and Witte [52]). For example, if we want to test whether the mean of a particular measure (e.g., the number of successes), X, is greater in the intelligent than in the simple mode, the null hypothesis is X_INTELLIG ≤ X_SIMPLE. In other cases, such as the number of time-outs or messages used to achieve a quality level, the null hypothesis is X_INTELLIG ≥ X_SIMPLE.
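As a worked illustration of the kind of one-sided comparison described here, the statistic of a paired t-test can be computed from the differences between the two modes. The sketch below uses only the Python standard library; the function name is hypothetical, and the step of converting the statistic into a p-value (or comparing it with a critical value of Student's t distribution with n - 1 degrees of freedom) is deliberately left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(intelligent, simple):
    """Return (t, degrees of freedom) for the paired differences."""
    if len(intelligent) != len(simple):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(intelligent, simple)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("the differences are constant")
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1
```

A clearly negative t supports an alternative hypothesis of the form µ0 < µ1 (for instance, fewer time-outs in intelligent mode), whereas a clearly positive t supports µ0 > µ1 (for instance, more successes in intelligent mode).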

All hypothesis testing was conducted using Student's t-test for paired samples. Beforehand, we applied the Shapiro–Wilk test to check the differences between the two series for normality. This is considered to be the most powerful test for this purpose [38, 10]. There are several possible options if the differences are not normal. Firstly, whenever the sample size is large (n > 30, according to the literature [7, 44]), we can opt to conduct the t-test directly. Secondly, nonparametric methods, like the Wilcoxon test, can be used to verify whether the samples (that form the pair) come from populations with different probability distributions (i.e., whether the intelligent mode really does have an effect on system behaviour [52]). As we also want to establish which mean is lower, we can determine confidence intervals for the means that we are checking. Supposing the intervals are (a,b) and (c,d), if b < c we can deduce that one mean is lower than the other for a particular level of α. This is denoted by I(µ0) < I(µ1) in Table III, meaning that the confidence interval at level 0.05 for µ0 is (a,b) and for µ1 is (c,d), and b < c holds. Thirdly, we can apply the Box-Cox transformation [39] to the two paired series using the same λ. We then calculate the difference between the transforms and apply the t-test to the resulting differences [13]. The application of the transformation does not assure that normality is achieved. Fourthly, we could analyse the data and look at the Q-Q plots [5] in order to find out whether an outlier was causing non-normality.

In this paper, we applied the Box-Cox transformation when the differences were found not to be normal. If this transformation did not lead to normality, we analysed the data for outliers. If the outliers accounted for less than 10% of the sample, they were removed; likewise, we ran Wilcoxon sign and signed-rank tests to find out whether the source distributions of the two series were the same. We also determined confidence intervals for the two means, and checked whether I(µ0) < I(µ1) or I(µ0) > I(µ1) held. Finally, we also conducted the Hodges-Lehmann test [22] to ascertain the median of the difference between the series and confirm which of the means was greater. These tests are a way of confirming the results of applying the t-test if the condition of the normality of differences between samples does not hold.
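The interval comparison written as I(µ0) < I(µ1) can be sketched as follows. This is an approximation under explicit assumptions: the function names are hypothetical, and the interval uses the quantile of the standard normal distribution in place of the Student's t quantile, which is only reasonable for large samples (such as the n > 30 case mentioned above).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, alpha=0.05):
    """Approximate two-sided confidence interval for the mean of a sample."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal quantile (assumption)
    half_width = z * stdev(sample) / sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width


def interval_below(sample0, sample1, alpha=0.05):
    """True when I(mu0) < I(mu1), i.e. (a, b) and (c, d) satisfy b < c."""
    _, b = confidence_interval(sample0, alpha)
    c, _ = confidence_interval(sample1, alpha)
    return b < c
```

When the two intervals overlap, neither I(µ0) < I(µ1) nor I(µ0) > I(µ1) holds, and the direction of the difference has to be established by other means, such as the median of the differences.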


The significance level used in all the tests was α = 0.05 for both case studies. However, α = 0.01 and even α = 0.001 were used for some tests because the hypothesis to be accepted had an impact on the benefits of the intelligent mode.

In most cases (21/24 = 87.5%), the null hypothesis was rejected, and the alternative hypothesis was confirmed (the proposed intelligent method outperformed the simple method), as shown in Table III. In tests 5, 7, 12 (Case A) and 17, 19, and 24 (Case B), the null hypothesis was not rejected for all the qualities or all the different message or agent numbers. This was due to the fact that the intelligent method had desirable properties for only certain quality values, or agent or message numbers, and these properties applied for interesting quality values, and agent or message numbers, as explained in Table IV. Additionally, it might not always be possible to entirely reject the H0. For example, the null hypothesis stated that the intelligent mode achieved a minimum quality level at a rate greater than, or equal to, the simple mode. However, the H0 was rejected at a significance level α ∈ {0.01, 0.05} for all qualities < 1.0 in test 12, that is, the intelligent mode more often achieved qualities = 1.0. This is a desirable property (achieve maximum quality). It offsets the other occasions when the simple mode achieves higher quality (since $\sum_q f_q = 1$, where q is a specified quality and $f_q$ is the frequency with which it occurred, if the number of time-outs is constant). Table IV explains the results of these tests.

5.5. Final Remarks on Results

The intelligent model yielded better results than the simple version for different aspects, as shown in Section 5.3 and confirmed statistically in Section 5.4:

• The average number of messages required to get a reply with a minimum quality level was lower. This was demonstrated for both case studies by tests 10 and 11, and 22 and 23.
• The number of strict successes for a given number of initial messages was higher. This was demonstrated for both case studies by tests 5, 6, 17, and 18. The number of strict successes increases with the initial number of available messages in both cases.
• There were fewer time-outs with respect to agents and messages. In both case studies, the number of time-outs is more or less constant in intelligent mode and almost constant in simple mode with differing numbers of agents, where the trend in Case A was found to be slightly upward. On the other hand, there are fewer time-outs for any number of initial messages in both Cases A and B with the system running in intelligent mode. The trends were likewise systematically better for the intelligent mode than for the simple mode. Tests 1–4 and 13–16 confirm this statement.
• High quality replies are more frequent.

We found that high quality replies were slightly more frequent in intelligent than in simple mode. The linear trends of the two operating modes are almost identical in both case studies. Tests 12 and 24 explain the results. Qualities 0.9 and 1.0 are more frequent in the intelligent, than in the simple, mode in Case B. This also applies to Case A, albeit for a quality of 1.0 only. Taking into account the fact that the frequencies must always add up to 1, we conclude that there is a significant improvement in the quality of the replies in intelligent mode, as there are more top quality replies (0.9 and 1.0). The ratio of the number of successes to the number of messages used is greater in intelligent mode than in simple mode. Tests 9 and 21 confirm this point. Case A Test No.

Table III. Test results.

Case A

| Test No. | Result | Normality of the series of differences | Sample size (n) |
|---|---|---|---|
| 1 | H0 is rejected for α = 0.05 and any number of messages | All series are normal except for 1000 messages. For 1000, the Wilcoxon sign test states that they are different distributions. I(µ0) ⊀ I(µ1). The median of the differences (Hodges-Lehmann test) is -6.7. | 32 |
| 2 | H0 is rejected | The series of differences do not appear to be normal, perhaps due to n being very large. The Wilcoxon sign and signed-rank tests denote two different distributions. I(µ0) < I(µ1). The median of the differences (Hodges-Lehmann test) is 31.167 at level 0.05. | 257 |
| 3 | For δ ∈ {5,10,15,20,30,100,300,600}: H0 is rejected in all cases (for all δ) and for α ∈ {0.01,0.05} | All the series of differences are normal at level 0.05. | 32 |
| 4 | H0 is rejected for α = 0.05 | The series of differences do not appear to be normal, perhaps due to n being very large. The Wilcoxon sign and the signed-rank tests denote two different distributions. I(µ0) ≺ I(µ1). The median of the differences (Hodges-Lehmann test) is 26.25 at level 0.05. | 256 |
| 5 | For γ ∈ {5,10,15,20,30,100,1000}: H0 is rejected in all cases (for all γ) and α ∈ {0.01,0.05} | All the series of differences are normal at level 0.05. | 32 |
| 6 | H0 is rejected for α = 0.05. For the first 164 differences, H0 is rejected for α ∈ {0.001,0.01,0.05} | The series of differences do not appear to be normal, perhaps due to n being very large. The Wilcoxon sign and the signed-rank tests denote two different distributions. I(µ0) ⊀ I(µ1). The median of the differences (Hodges-Lehmann test) is -8.7 at level 0.05. The series of the first 164 differences is normal. | 225 |
| 7 | For δ ∈ {5,10,15,20,30,100,1000}: H0 is rejected in all cases (for all δ), for α ∈ {0.01,0.05} | All series are normal. | 32 |
| 8 | H0 is rejected for α = 0.05 | The series of differences do not appear to be normal, perhaps due to n being very large. The Wilcoxon sign and the signed-rank tests denote two different distributions. I(µ0) > I(µ1). The median of the differences (Hodges-Lehmann test) is -9.439 at level 0.05. | 257 |
| 9 | H0 is rejected at the level of α ∈ {0.01,0.05} | The differences are normal at level 0.05. | 48 |
| 10 | H0 is rejected for all qualities at the level of α ∈ {0.01,0.05} | All the differences are normal at level 0.05. | 35 |
| 11 | H0 is rejected | H0 could be rejected for up to 0.001 if normality is not considered. The Wilcoxon sign and the signed-rank tests denote two different distributions. I(µ0) < I(µ1). The median of the differences (Hodges-Lehmann test) is 501.91 at level 0.05. | 334 |
| 12 | H0 is rejected at the level of α ∈ {0.01,0.05} for α < 1 | For the qualities that are neither < 0.1 nor 1.0, the differences are normal. For < 0.1 and 1.0, the Wilcoxon sign and the signed-rank tests denote two different distributions. For 1.0, I(µ1) < I(µ0) but for < 1.0, I(µ0) < I(µ1). The median of the differences (Hodges-Lehmann test) for 1.0 is 1 at level 0.05, that is, the simple model tends to yield more lower quality responses than the intelligent model (fewer responses of quality 1.0). | 32 |

Case B

| Test No. | Result | Normality of the series of differences | Sample size (n) |
|---|---|---|---|
| 13 | H0: μ0 ≥ μ1 is rejected for α = 0.05 and any number of messages γ ∈ {5,10,15,20,30,1000} | None are normal. In all cases, the Wilcoxon sign test states that they are different distributions. In all cases, I(µ0) < I(µ1). | 32 |
| 14 | H0: μ0 ≥ μ1 is rejected for α ∈ {0.01,0.05} | The series of differences are normal. | 257 |
| 15 | For δ ∈ {5,10,15,20,30,100,300,600}: H0: μ0 ≥ μ1 is rejected in all cases (for all δ) and for α ∈ {0.01,0.05} | The series of differences are normal at level 0.05, except for 15 and 20 agents. For 15 and 20, the Wilcoxon sign and the signed-rank tests denote two different distributions. In both cases I(µ1) > I(µ0). Removing only three outliers in each case, we get a normal series. | 32 |
| 16 | H0: μ0 ≥ μ1 is rejected for α ∈ {0.01,0.05} | The series of differences does not appear to be normal, perhaps due to n being very large. The Wilcoxon sign and the signed-rank tests denote two different distributions. I(µ0) < I(µ1). The median of the differences (Hodges-Lehmann test) is 11.35 at level 0.05. The Box-Cox transform is not useful. | 256 |
| 17 | For γ ∈ {5,10,15,20,30,100,1000}: H0: μ0 ≤ μ1 is rejected whenever γ < 20 and for α = 0.05, that is, at the level of 0.05, the intelligent mode yields more successes when there are not many messages (minimum number < 20) | All the differences are normal at level 0.05. | 32 |
| 18 | H0: μ0 ≤ μ1 is rejected for α ∈ {0.01,0.05}. For the first 164 differences, H0 is rejected for α ∈ {0.001,0.01,0.05} | The series of differences does not appear to be normal, perhaps due to n being very large. The Wilcoxon sign and the signed-rank tests denote two different distributions. I(µ0) ⊀ I(µ1). The median of the differences (Hodges-Lehmann test) is -5.086 at level 0.05. The series of the first 164 differences is normal and the t-test can be applied. The Box-Cox transform is not useful. | 225 |
| 19 | For δ ∈ {5,10,15,20,30,100,300,600}: H0: μ0 ≤ μ1 is rejected whenever δ ≤ 30, for α ∈ {0.01,0.05}, that is, the intelligent mode yields more successes when there are no more than 30 agents (δ ≤ 30). | All the series of differences for γ > 5 are normal at level 0.05. For γ = 5, the Wilcoxon sign and the signed-rank tests denote two different distributions. I(μ1) < I(μ0). The median of the differences (Hodges-Lehmann test) is 16 at level 0.05. | 32 |
| 20 | H0: μ0 ≤ μ1 is rejected for α ∈ {0.01,0.05} | The series of differences does not appear to be normal, perhaps due to n being very large. The Wilcoxon sign and the signed-rank tests denote two different distributions. I(µ1) > I(µ0). The median of the differences (Hodges-Lehmann test) is -5.75 at level 0.05. The Box-Cox transformation is applied to the original data, which are subjected to the t-test. | 257 |
| 21 | H0: μ0 ≤ μ1 is rejected for α ∈ {0.01,0.05} | The series is normal. | 53 |
| 22 | H0: μ0 ≥ μ1 is rejected for all qualities < 1.0 at the level α ∈ {0.05}. The intelligent mode may use just as many or more messages than the simple mode to achieve qualities = 1.0. It uses fewer messages for other qualities. | H0: μ0 ≥ μ1 is rejected for all qualities < 1.0 at the level α ∈ {0.05}. The intelligent mode may use just as many or more messages than the simple mode to achieve qualities = 1.0. It uses fewer messages for other qualities. | 35 |
| 23 | H0: μ0 ≥ μ1 is rejected for α ∈ {0.001,0.01,0.05} | The series is normal at level 0.05. | 334 |
| 24 | H0 is rejected at the level α ∈ {0.01,0.05} for qualities < 0.6 | For qualities other than < 0.1 or 1.0, the differences are normal. For < 0.1 and 1.0, the Wilcoxon sign and the signed-rank tests denote two different distributions. For 1.0, I(µ1) < I(µ0) but for < 0.1 it is I(µ0) < I(µ1). The median of the differences (Hodges-Lehmann test) for 1.0 is -1 at level 0.05, that is, the simple model tends to yield more lower quality responses than the intelligent model (fewer quality 1.0 responses). | 32 |

Table IV. Analysis of results of tests with partial rejection of H0.

| Test No. | Test Definition | Result | Meaning |
|---|---|---|---|
| 5 and 17 | H0: μ0 ≤ μ1 with μ0: Number of successes in intelligent mode for a minimum number of messages γ, with a given γ | Test 5: H0 is rejected in all cases. Test 17: H0 is rejected for γ < 20 | There are more successes when not many messages are available (minimum number < 20) |
| 7 and 19 | H0: μ0 ≤ μ1 with μ0: Number of successes in intelligent mode for a maximum number of agents δ, with a given δ | Test 7: H0 is rejected in all cases. Test 19: H0 is rejected if δ ≤ 30 | The intelligent mode achieves more successes when there are fewer agents (≤ 30). This may be due to the “wisdom of crowds” phenomenon [30] |
| 12 and 24 | H0: μ0 ≥ μ1 with μ0: Number of times that the quality of the reply was greater than, or equal to, α, with a given α, in intelligent mode | Test 12: H0 is rejected when α < 1. Test 24: H0 is rejected when α < 0.6 | The intelligent mode tends to increase the quality of the responses, as it produces more results of quality ≥ 0.6 |
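Where the series of paired differences is not normal, Table III reports the sign test and the median of the differences given by the Hodges-Lehmann estimator. The following is a minimal sketch of how those two quantities can be computed. It is not the authors' code: the function names and the sample data are hypothetical, and the signed-rank test is left out for brevity.

```python
from itertools import combinations_with_replacement
from math import comb
from statistics import median


def hodges_lehmann(diffs):
    """Median of the Walsh averages (d_i + d_j) / 2 for all i <= j."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(diffs, 2)]
    return median(walsh)


def sign_test_p_value(diffs):
    """Two-sided sign test: are positive and negative differences equally likely?"""
    nonzero = [d for d in diffs if d != 0]  # ties carry no sign information
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for d in nonzero if d > 0)
    tail = min(positives, n - positives)
    # Binomial(n, 0.5) probability of a split at least this extreme, doubled.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical paired differences (simple mode minus intelligent mode).
    differences = [4.0, 7.5, -1.0, 3.2, 6.1, 2.8, 5.5, 0.9]
    print("Hodges-Lehmann estimate:", hodges_lehmann(differences))
    print("Sign test p-value:", sign_test_p_value(differences))
```

The estimator is used here because, unlike the sample mean, it does not rely on the differences being normally distributed.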

6. Model Applicability

After designing, implementing, and evaluating the proposed model, we recommend applying it in fields where the following circumstances occur at once:

1. The querying agent Ω issues queries at a greater rate than peers enter and leave the system (that is, system composition is relatively stable);
2. N is small with respect to the number of active peers (i.e., the opinion of a few agents suffices); and
3. The time-out is long enough for peers to complete their processes (otherwise the peers leave due to time-outs, and it is better not to use this model).

On the other hand, the proposed model is not recommended in the following situations:

4. When peers enter and leave the system at a greater rate than the querying frequency (there is churn), since, in this case, the system would behave like a plain P2P network (it would always end up asking all peers);
5. When the processing load of each peer cannot be increased any further (this model requires extra CPU and memory on each peer to select and store its favourites). The extra usage will depend on the number of peers and the number of fields (e.g., if there are 5000 agents in the system and 50 calculated (“queryable”) fields, the success rates of 250,000 combinations would have to be registered); and
6. When it is important for a peer to retain its reputation after having left and re-entered the system. If a peer leaves the system (stops replying), it has to build its reputation from scratch when it re-enters (it has to start generating good quality replies again).

Note that there were times when the system operated in inadequate conditions. Suffice it to say:

• Condition 1 in Section 6 may not be satisfied when there are initially very few messages available to agents: the agent communicates and disappears within very few interactions (it is isolated by message unavailability). This leads to a churn-like phenomenon (it would be real churn if the agents were to start communicating again after a time, but we did not model this possibility).

• As we took N = 5 (querying up to only the five most reliable peers), the system would operate in conformance with Condition 2 in Section 6 only when there are more than, say, 30 agents (so that N is small with respect to the number of system agents).

In both cases, the properties of the proposed model scale up well to a growing number of messages and agents.
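As an illustration of the first three conditions and of the storage estimate in item 5, the sketch below checks a scenario against them. All names are hypothetical. The `min_peers_per_slot` value of 6 is only an assumption read off the remark that N = 5 calls for more than about 30 agents. The paper does not define such a threshold.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    query_rate: float            # queries per time unit issued by the querying agent
    churn_rate: float            # peers entering or leaving per time unit
    n_best: int                  # N, the number of favourite peers queried
    active_peers: int            # peers currently in the system
    timeout: float               # time-out granted to each peer
    peer_processing_time: float  # time a peer needs to complete its process


def favourites_table_entries(num_agents, num_fields):
    """Number of (agent, field) success rates that have to be registered."""
    return num_agents * num_fields


def unmet_conditions(s, min_peers_per_slot=6):
    """Return the numbers (1 to 3) of the applicability conditions that fail."""
    unmet = []
    if s.query_rate <= s.churn_rate:                       # Condition 1
        unmet.append(1)
    if s.active_peers <= s.n_best * min_peers_per_slot:    # Condition 2 (assumed ratio)
        unmet.append(2)
    if s.timeout < s.peer_processing_time:                 # Condition 3
        unmet.append(3)
    return unmet


if __name__ == "__main__":
    print(favourites_table_entries(5000, 50))
    small_network = Scenario(10.0, 1.0, 5, 20, 2.0, 0.5)
    print(unmet_conditions(small_network))
```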

7. Conclusions and Future Work

7.1. Conclusions

The primary conclusion that we can draw from the series of experiments described in Section 5 is that the response to the question posed in Section 3 is affirmative: there is a significant improvement in system performance, measured both as the mean traffic (messages received and sent) per agent for different qualities of response of Γ and as the strict success rate. In addition, the system fails to reply in time less often in intelligent mode than in simple mode.

The designed system model is based on an unstructured and totally decentralized P2P network that starts to evolve at some point. Examples of unstructured P2P networks are the original Gnutella, Freenet, and FreeHaven networks. A significant advantage of P2P networks is that they quickly adapt to fast-growing populations as the query is sent to all the network peer agents [49]. The disadvantage is that the query is sent to numerous peers regardless of whether or not they can reply. This causes network congestion and an exponential growth in the number of messages. The approach proposed here, however, avoids congestion, as queries are sent only to the agent-defined N best agents for a given field instead of to all the agents. N and the PREVIOUS.MESS parameter must be properly selected for this purpose. Scalability is improved, although, since not all the nodes are queried, not as many potential results are found as in a pure unstructured and decentralized network (e.g., Gnutella). The model employs techniques based on the ideas of the Simulated Annealing algorithm to deal with the local minima problem.
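The contrast between flooding and querying only the N best agents for a field can be sketched as follows. This is an illustrative assumption, not the authors' algorithm: the function name, the success-rate dictionary, and in particular the Metropolis-style acceptance rule with a `temperature` parameter are hypothetical stand-ins for the Simulated Annealing ideas mentioned above, which this section does not spell out.

```python
import math
import random


def select_peers(success_rates, n_best, temperature=0.0, rng=None):
    """Pick the peers to query for one field.

    success_rates maps a peer id to its historical success rate for the field.
    With temperature == 0 the N best peers are returned. With a positive
    temperature, the weakest favourite may be swapped for a random outsider,
    so that the agent does not get stuck with an early, locally best set.
    """
    rng = rng or random.Random()
    ranked = sorted(success_rates, key=success_rates.get, reverse=True)
    chosen, rest = ranked[:n_best], ranked[n_best:]
    if temperature > 0 and chosen and rest:
        candidate = rng.choice(rest)
        # delta >= 0 because every favourite ranks at least as high as any outsider.
        delta = success_rates[chosen[-1]] - success_rates[candidate]
        if rng.random() < math.exp(-delta / temperature):
            chosen[-1] = candidate
    return chosen


if __name__ == "__main__":
    rates = {"p1": 0.9, "p2": 0.8, "p3": 0.5, "p4": 0.1}  # hypothetical history
    favourites = select_peers(rates, n_best=2)
    print("Queries sent when flooding:", len(rates))
    print("Queries sent to favourites:", len(favourites), favourites)
```

Lowering the temperature over time would make such exploratory swaps progressively rarer, which is the usual annealing schedule. Whether the model does this is not stated here.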

There is no centralized mechanism, such as a central server or super node serving other nodes, to coordinate system operations in a decentralized P2P system. All the participants have the same rights and obligations, and any peer can leave the network without having a significant impact on system operation. This rules out a single point of failure, as no one peer is indispensable to the system. The network is, therefore, highly immune to censorship, technical failures, and malicious attacks [49]. The proposed model preserves, and even improves upon, these properties with respect to, for example, bandwidth consumption (measured as the number of messages used) and response time (how often a reply arrives before the time-out).

The proposed model has an implicit cost: it has to maintain routing information (tables) for each agent, such as which agents are its favourites so far, and it requires the associated computing effort. This can pose a problem when the peers enter or leave the system at a very high rate, as such tables would have to be updated continuously. However, this problem is not considered here, given the no-churn hypothesis. Thanks to the proposed model, the set of agents is able to develop certain self-organizational characteristics (reproduction, mutability, adaptability, randomness, emergence, etc.) that are not detailed for reasons of space and will be addressed in future papers.

7.2. Future Research

The envisaged future lines of research are:

• Modify the system in order to meet the boundary setting condition: the system determines its boundaries, and the system should make the decision on new components joining. In the proposed architecture, an agent that has a number of messages under a specified threshold does not recognize the entry request (HELLO) from a new agent. In future versions, the new agent might also be required to submit some sort of certificate attesting to its identity and authenticity.

• Study the organizations produced by the model: a system can be organized as either a hierarchy, heterarchy, or both. Examine whether these organizations lead to a complexity reduction, that is, whether the system develops structures and hides details of the environment to reduce overall complexity (e.g., through the formation of clusters or by creating other entities such as active virtual peers or holons).

• An agent in this model never knows whether its reply was the best or was received too late (time-out). Adding feedback from the emitter (the querying agent) in this regard would increase the number of messages to be sent; however, it would trigger intelligent behaviour on the part of the receiver at an earlier stage. This would reduce the number of messages (e.g., when the receiver queries other agents to get a better quality response). The analysis of whether such feedback and the resulting agent learning should be enabled is, therefore, a design issue warranting evaluation.

• We have assumed that the agent is clear about how to decompose a given field (or, alternatively, which other fields it is composed of). A topic worth researching is the choice of one out of several feasible decompositions.

• Study how the number of simultaneous searches (N) and the different agent timeout values influence system behaviour.
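The entry rule described in the first item above (an agent with too few messages ignores a HELLO) can be sketched as follows. The class, attribute, and parameter names are hypothetical. The `certificate_ok` flag merely stands in for the certificate check suggested for future versions and is not part of the current architecture.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    messages_left: int      # messages this agent can still use
    hello_threshold: int    # below this, HELLO requests are not recognized
    known_peers: set = field(default_factory=set)

    def handle_hello(self, newcomer, certificate_ok=True):
        """Return True if the newcomer is admitted as a known peer."""
        if self.messages_left < self.hello_threshold:
            return False  # too few messages: the entry request is ignored
        if not certificate_ok:
            return False  # hypothetical future check of identity and authenticity
        self.known_peers.add(newcomer)
        return True


if __name__ == "__main__":
    busy = Agent("busy", messages_left=3, hello_threshold=5)
    open_agent = Agent("open", messages_left=10, hello_threshold=5)
    print(busy.handle_hello("newcomer"))
    print(open_agent.handle_hello("newcomer"))
```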

References

[1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, Wireless sensor networks: a survey, Computer Networks 38(4) (2002) 393-422.
[2] D.S. Alberts, R.E. Hayes, Power to the Edge. Command and Control in the Information Age, Information Age Transformation, Command and Control Research Program (CCRP), 3rd edn., 2005.
[3] A. Alston, P. Beautement, L. Dodd, Implementing Edge Organizations: Exploiting Complexity (Part 1: A Framework for the Characterization of Edge Organizations and their Environments), 10th International Command and Control Research & Technology Symposium, McLean, Virginia, June 2005.
[4] D.P. Ballou, H.L. Pazer, Modeling Data and Process Quality in Multi-Input, Multi-Output Information Systems, Manag. Sci. 31(2) (1985) 150-162.
[5] E.T. Berkman, S.P. Reise, A Conceptual Guide to Statistics Using SPSS, Sage Publications Inc., ISBN 1412974062, 2012.
[6] E. Bosse, B. Solaiman, Information Fusion and Analytics for Big Data and IoT, Artech House Publishers, ISBN 9781630810887, 2016.
[7] A. Brayner, A.L. Coelho, K. Marinho, R. Holanda, W. Castro, On query processing in wireless sensor networks using classes of quality of queries, Inf. Fusion 15 (2014) 44-55, ISSN 1566-2535, URL http://www.sciencedirect.com/science/article/pii/S1566253512000115, Special Issue: Resource Constrained Networks.
[8] Y. Cao, Z. Sun, Routing in Delay/Disruption Tolerant Networks: A Taxonomy, Survey and Challenges, Communications Surveys Tutorials, IEEE 15(2) (2013) 654-677.
[9] F. Castanedo, A Review of Data Fusion Techniques, 2013, doi:10.1155/2013/704504.
[10] E.H. Chen, The Power of the Shapiro-Wilk W Test for Normality in Samples from Contaminated Normal Distributions, Journal of the American Statistical Association 66(336) (1971) 760-762, ISSN 0162-1459.
[11] M. Chen, S. Gonzalez, A. Vasilakos, H. Cao, V.C. Leung, Body Area Networks: A Survey, Mob. Netw. Appl. 16(2) (2011) 171-193.
[12] I. Corona, G. Giacinto, C. Mazzariello, F. Roli, C. Sansone, Information Fusion for Computer Security: State of the Art and Open Issues, Inf. Fusion 10(4) (2009) 274-284, ISSN 1566-2535, doi:10.1016/j.inffus.2009.03.001.
[13] K.A. Doksum, C.-W. Wong, Statistical Tests Based on Transformed Data, Journal of the American Statistical Association 78(382) (1983) 411-417, ISSN 0162-1459, URL http://www.jstor.org/stable/2288649.
[14] P. Ekel, I. Kokshenev, R. Parreiras, W. Pedrycz, J. Pereira Jr., Multiobjective and Multiattribute Decision Making in a Fuzzy Environment and Their Power Engineering Applications, Inf. Sci. 361(C) (2016) 100-119, ISSN 0020-0255, doi:10.1016/j.ins.2016.04.030.
[15] T.L. Fine, Review: Glenn Shafer, A mathematical theory of evidence, Bull. Amer. Math. Soc. 83(4) (1977) 667-672.
[16] J. Florentin Smarandache, Advances and Applications of DSmT for Information Fusion, Vol. IV: Collected Works, ISBN 9781599733241, 2015.
[17] D.-L. Flores, A. Rodríguez-Díaz, J.R. Castro, C. Gaxiola, TA-Fuzzy Semantic Networks for Interaction Representation in Social Simulation, Springer Berlin Heidelberg, Berlin, Heidelberg, ISBN 978-3-642-04514-1, 213-225, doi:10.1007/978-3-642-04514-1_12, 2009.
[18] P.H. Foo, G.W. Ng, High-level Information Fusion: An Overview, J. of Adv. in Inf. Fusion 8(1) (2013) 33-72.
[19] A. Frank, Information Processes Produce Imperfections in Data - The Information Infrastructure Compensates for Them, in: A. Ruas, C. Gold (Eds.), Headway in Spatial Data Handling, Lecture Notes in Geoinformation and Cartography, Springer Berlin Heidelberg, 2008, pp. 467-485.
[20] G. Guo, L. Ding, Q.-L. Han, A distributed event-triggered transmission strategy for sampled-data consensus of multi-agent systems, Automatica 50(5) (2014) 1489-1496.
[21] G. Guo, S. Wen, Protocol sequence and control co-design for a collection of networked control systems, International Journal of Robust Nonlinear Control 26(3) (2016) 489-508.
[22] S.L. Hershberger, Hodges-Lehmann Estimators, Springer Berlin Heidelberg, Berlin, Heidelberg, ISBN 978-3-642-04898-2, 2011, pp. 645-636.
[23] H.-M. Hsu, C.-T. Chen, Aggregation of Fuzzy Opinions Under Group Decision Making, Fuzzy Sets Syst. 79(3) (1996) 279-285.
[24] ISO-IEC, ISO/IEC 25012:2008 Data quality model, URL http://www.iso.org [accessed January 2018].
[25] G. Karagiannis, O. Altintas, E. Ekici, G. Heijenk, B. Jarupan, K. Lin, T. Weil, Vehicular Networking: A Survey and Tutorial on Requirements, Architectures, Challenges, Standards and Solutions, Communications Surveys Tutorials, IEEE 13(4) (2011) 584-616, ISSN 1553-877X, doi:10.1109/SURV.2011.061411.00019.
[26] M. Lichman, UCI Machine Learning Repository, URL http://archive.ics.uci.edu/ml, 2013 [accessed January 2018].
[27] J. Llinas, C. Bowman, G. Rogova, A. Steinberg, E. Waltz, F. White, Revisiting the JDL Data Fusion Model II, in: Seventh International Conference on Information Fusion (FUSION 2004), 2004.
[28] H.D. Meer, C. Koppen, Characterization of Self-Organization, Lecture Notes in Computer Science, Springer, 2005.
[29] H.D. Meer, C. Koppen, Self-organization in Peer-to-Peer Systems, Lecture Notes in Computer Science, Springer, 2005.
[30] B. Miller, M. Steyvers, The Wisdom of Crowds with Communication, in: C. H. L. Carlson, T. Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Cognitive Science Society, 2011.
[31] E.F. Nakamura, A.A.F. Loureiro, A. Boukerche, A.Y. Zomaya, Localized algorithms for information fusion in resource constrained networks, Inf. Fusion 15 (2014) 2-4, Special Issue: Resource Constrained Networks.
[32] V. Novák, Are Fuzzy Sets a Reasonable Tool for Modeling Vague Phenomena?, Fuzzy Sets and Systems 156 (2005) 341-348.
[33] Z. Ou, E. Harjula, O. Kassinen, M. Ylianttila, Performance evaluation of a Kademlia-based communication-oriented P2P system under churn, Comput. Netw. 54(5) (2010) 689-705.
[34] H. Paggi, E. Bossé, M. Florea, B. Solaiman, On the use of holonic agents in the design of information fusion systems, 17th International Conference on Information Fusion (FUSION), Salamanca, Spain, 2014, pp. 1-8.
[35] H. Paggi, M. Cochez, Indeterminacy Reduction in Agent Communication Using a Semantic Language, WSEAS Transactions on Systems 14 (2015) 77-89.
[36] I. Rafique, P. Lew, M.Q. Abbasi, Z. Li, Information Quality Evaluation Framework: Extending ISO 25012 Data Quality Model, World Academy of Science, Engineering and Technology 65 (2012) 523-528.
[37] J.R. Raol, Data Fusion Mathematics. Theory and Practice, CRC Press, 2015.
[38] N.M. Razali, Y.B. Wah, Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests 2(1) (2011) 21-33.
[39] R.M. Sakia, The Box-Cox Transformation Technique: A Review, Journal of the Royal Statistical Society. Series D (The Statistician) 41(2) (1992) 169-178, ISSN 0039-0526, 1467-9884, URL http://www.jstor.org/stable/2348250.
[40] F. Smarandache, J. Dezert, The Combination of Paradoxical, Uncertain, and Imprecise Sources of Information based on DSmT and Neutro-Fuzzy Inference, CoRR abs/cs/0412091, URL http://arxiv.org/abs/cs/0412091, 2004.
[41] N.J.J. Smith, Vagueness, Uncertainty and Degrees of Belief: Two Kinds of Indeterminacy; One Kind of Credence, Erkenntnis 79(5) (2014) 1027-1044.
[42] B. Solaiman, É. Bossé, L. Pigeon, D. Guériot, M.C. Florea, A conceptual definition of a holonic processing framework to support the design of information fusion systems, Inf. Fusion 21 (2015) 85-99, ISSN 1566-2535.
[43] C. Solano-Aragón, A.A. Garza, Multi-Agent System with Fuzzy Logic Control for Autonomous Mobile Robots in Known Environments, Evolutionary Design of Intelligent Systems in Modeling, Simulation and Control. Studies in Computational Intelligence, vol. 257, Springer, Berlin, Heidelberg, 2009.
[44] D.S. Starnes, D. Yates, D.S. Moore, The Practice of Statistics, W. H. Freeman and Co., 5th edn., ISBN 1-4641-0873-0, 2014.
[45] A.N. Steinberg, C.L. Bowman, F.E. White, Revisions to the JDL data fusion model, Aerosense’99, Proceedings Volume 3719, Sensor Fusion: Architectures, Algorithms, and Applications III, doi:10.1117/12.341367, 1999.
[46] D. Stutzbach, R. Rejaie, Understanding churn in peer-to-peer networks, Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, Rio de Janeiro, Brazil, 2006, pp. 189-202.
[47] P. Sutton, Vagueness, Communication, and Semantic Information, PhD Thesis, King's College London, 2013.
[48] K. Tutschku, P. Tran-Gia, 23. Traffic Characteristics and Performance Evaluation of Peer-to-Peer Systems, vol. 3485 of Lecture Notes in Computer Science, book section 23, Springer Berlin Heidelberg, 2005, 383-397.
[49] Q.H. Vu, M. Lupu, B.C. Ooi, Peer-to-Peer Computing. Principles and Applications, Springer-Verlag Berlin Heidelberg, 1st edn., 2010.
[50] Y. Wand, R.Y. Wang, Anchoring data quality dimensions in ontological foundations, Commun. ACM 39(11) (1996) 86-95.
[51] S. Wen, G. Guo, Control and resource allocation of cyber-physical systems 10(16) (2016) 2038-2048.
[52] R.S. Witte, J.S. Witte, Statistics, John Wiley & Sons, 9th edn., 2013.
[53] L. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. SMC, 3(1) (1973) 28-44.


Graphic abstract