Adaptive multi-flow opportunistic routing using learning automata


Accepted Manuscript

Adaptive Multi-Flow Opportunistic Routing Using Learning Automata

Marzieh Ghasemi, Mostafa Abdolahi, Mozafar Bag-Mohammadi, Ali Bohlooli

PII: S1570-8705(14)00184-X
DOI: http://dx.doi.org/10.1016/j.adhoc.2014.08.013
Reference: ADHOC 1104

To appear in: Ad Hoc Networks

Received Date: 4 March 2014
Revised Date: 17 July 2014
Accepted Date: 19 August 2014

Please cite this article as: M. Ghasemi, M. Abdolahi, M. Bag-Mohammadi, A. Bohlooli, Adaptive Multi-Flow Opportunistic Routing Using Learning Automata, Ad Hoc Networks (2014), doi: http://dx.doi.org/10.1016/j.adhoc.2014.08.013

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Marzieh Ghasemi
Engineering Faculty, Science and Research Branch, Islamic Azad University of Arak, Arak, Iran

Mostafa Abdolahi
Engineering Faculty, Kharazmi University, Tehran, Iran

Mozafar Bag-Mohammadi
Engineering Faculty, Ilam University, Ilam, Iran

Ali Bohlooli
Department of Computer Engineering, Faculty of Engineering, University of Isfahan, Isfahan, Iran

Abstract -- Opportunistic routing is a promising routing paradigm that achieves high throughput by exploiting the broadcast nature of the wireless medium. It is especially useful for wireless mesh networks because of their static topology. Current opportunistic routing protocols assume that all nodes have enough incentive and resources to help the source, regardless of their load and the presence of other network flows. In addition, the effect of each active flow on other flows and on the network status is reflected only later, through a link quality metric (e.g. ETX) that is updated periodically. This coarse-grained metric does not keep pace with the dynamics of network flows, so some flows may suffer performance degradation between two consecutive periodic updates of the metric. Our proposed approach, called Dynamic Cooperative Routing (DCR), modifies MORE and equips it with an adaptive decision-making mechanism. We use learning automata to accommodate network dynamics when building an opportunistic path for a flow. The learning automata are activated whenever the source transmits a new data batch for the flow. We show through simulation that DCR outperforms MORE when two or more flows are active simultaneously and in the presence of background unicast traffic.

Keywords -- learning automata, opportunistic routing, network coding.

1. Introduction

Wireless mesh networks (WMNs) are an efficient technology for supporting high-quality services, such as multimedia and real-time applications, in wireless networks. A WMN has an inherent property, namely a static topology, that makes it an excellent subject for network optimization techniques such as opportunistic routing (OR) and network coding. OR is designed to benefit from the broadcast nature of the wireless medium, which, interestingly, was regarded as an annoying phenomenon in the past. OR increases the overall throughput of unicast and multicast flows by employing wireless nodes that opportunistically overhear an ongoing transmission between two wireless nodes. In effect, it extends the classic notion of next hop to almost all nodes that overhear the packet, and hence spreads the traffic over a wider geographical area by employing exposed nodes. The nodes exposed to the traffic then cooperatively forward the received data packets toward the intended destination.

The notion of OR was first introduced by EXOR [1]. EXOR uses the ETX metric [2] to predict the reception probabilities of wireless links. It then uses this prediction to choose some intermediate nodes between the source and the destination as forwarding nodes. Clearly, the quality of the selected forwarding set directly affects EXOR performance. Currently, the selection algorithm employed by many opportunistic proposals takes into account the quality of network links just before the transmission of the first data packet. It then updates link qualities based on periodic updates of the ETX metric by a central node. It is quite possible that several new flows emerge or some old flows vanish between two consecutive periodic updates of the ETX metric. These events affect the quality of the selected forwarder set and the eligibility of its members to help the sender. Therefore, current OR proposals do not react properly to the dynamic patterns of network flows.

Subsequent work on OR examined many different aspects of this routing paradigm [3]. Some work, such as MTS [4], tried to enhance the quality of the selected forwarding set by proposing new link quality metrics. Other work extended the OR concept to different types of wireless networks [5] or combined it with emerging network techniques such as network coding [6-9]. Among them, MORE [6] is regarded as a promising technique that efficiently addresses the main shortcoming of EXOR, i.e. the strong coordination required between forwarding nodes, by utilizing random network coding. It also efficiently reduces the number of duplicate packets. Later work on combining OR with network coding focused on reducing the number of unnecessary coded packets (CCACK [7]) or on optimizing the end-to-end delay associated with the MORE acknowledgment mechanism (CodePipe [8], PipelineOR [9]). However, these works also did not consider the dynamic interaction of multiple flows and its effect on the performance of the selected forwarders.

OR rests on a fundamental yet erroneous assumption that all network nodes should help the current flow. In fact, some nodes may be at a geographical position that makes them excellent candidates for helping the flow, yet they may be overwhelmed by other flows or by their own internal traffic. We want to design a cooperative mechanism that uses reinforcement learning to take into account a node's willingness and motivation to help other nodes.

As a motivating example, suppose that there are two opposite flows in the network, namely A-B and D-C, as depicted in Fig. 1. These flows share some common forwarding nodes, shown as bold nodes in Fig. 1a. These nodes must help both flows and therefore quickly become a critical bottleneck for both. If we segregate the flows as shown in Fig. 1b, the bottleneck nodes are avoided, yielding higher overall performance for both flows. In the latter case, each flow intentionally uses a longer path to leverage extra forwarding capacity at the edges of the network.

Figure 1. (a) There are several common forwarding nodes between A-B and C-D flows. The bold nodes should help both flows. This extra burden harms the performance seen by both flows. (b) We could improve the performance by using two disjoint forwarding sets. Although both flows use longer opportunistic path than usual, the overall performance is improved due to traffic segregation.

Learning automata (LA) [10] are especially useful when the system under study has incomplete knowledge about the environment in which it operates. A wireless network is a time-varying system with several dynamic characteristics that are unknown and unpredictable in many respects. Applications of LA in dynamic wireless networks are surveyed and studied in [11]. LA suit random environments such as wireless networks, where they can learn the optimal action from system feedback through continuous interaction with the environment. An automaton is an adaptive decision-making mechanism that optimizes its actions using a probability distribution over them, with the aim of maximizing its reward from the system and minimizing incurred penalties. The environment responds to each action performed by the automaton, and the probabilities used for decision making are then updated based on this response. Hence, the environment responses serve as the basis for selecting the next action. By repeating this process, the LA tries to maximize its reward and converge to an optimal operating point for the system.

In this paper we apply LA to a network-coding-based opportunistic protocol, namely MORE, to efficiently handle multiple dynamic flows in the presence of background traffic. In our proposal, named Dynamic Cooperative Routing (DCR), the source determines the identity of the forwarders for the next three hops. One of the nodes residing in the third hop is selected as the designated LA. It selects the forwarders for the next three hops in the same way as the source. This process is repeated every three hops until the destination is reached. An LA is thus active at three-hop intervals and responds dynamically to changes in the traffic. The LA is executed to refine the selection of forwarding nodes so that better forwarders are chosen. To the best of our knowledge, DCR is the first application of LA in the field of OR. DCR is an intra-flow network coding method that considers the effect of other flows on the selected opportunistic path.

We have simulated DCR using the OMNeT++ simulator [23], in conjunction with simultaneous background traffic and other opportunistic flows. Our results indicate that DCR efficiently segregates simultaneous flows and increases their throughput gain in comparison with MORE. In particular, DCR achieves the following benefits:

- It increases network throughput in comparison with MORE for a single flow and for multiple simultaneous flows. The implemented LA intelligently reroutes the flows in order to maximize the achieved throughput.
- It does not harm other flows when a new flow is injected into the network.
- It segregates intersecting flows and forces them to avoid congested parts of the network.
- It combines LA, network coding, and OR for the first time.

The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 discusses LA concepts. Section 4 describes DCR. Section 5 presents the simulation results. Finally, Section 6 concludes the paper.

2. Related Work

A survey of OR protocols is presented in [3]. The applicability of LA to wireless networks, alongside some simulation analysis, is studied in [11]. COPE [12] was among the first proposals to recognize the untapped power of network coding to reduce the number of forwarded packets through opportunistic overhearing of transmitted data packets. COPE is an inter-flow network coding protocol that works well with crossing flows. COPE is not considered a full-fledged OR protocol because it does not determine the whole sequence of forwarders that should forward the packet toward the destination.

The main challenges facing EXOR [1] are the strict scheduling between forwarding nodes and the need for strong coupling between the MAC and IP layers. The need for coordination between forwarding nodes is well addressed by the use of random linear network coding in MORE [6] and many subsequent works. This also decouples the MAC and IP layers. Suppose that an intermediate forwarder has received two packets, A and B, but the destination has received only one of them. By transmitting a random linear coded packet (e.g. 2A+B), the intermediate forwarder allows the destination to extract the lost packet regardless of which one it received. Therefore, the need for coordination between forwarding nodes is eliminated.

BEND [13] is closely related to our proposal in the sense that it tries to bend the traffic to avoid congested paths and find more coding opportunities. BEND modifies COPE to create more coding opportunities artificially, and hence increases the throughput, by taking local traffic information into account. BEND is an inter-flow network coding approach, just like COPE, and our earlier argument about COPE applies to BEND as well. In contrast, DCR is a full-fledged OR protocol and uses the intra-flow coding paradigm. It determines the forwarder set using traffic information about other flows passing through the LA. O3 [14] found that there is a strong tension between OR and inter-flow network coding: the tendency of OR to broaden the coverage area of the packets inherently destroys coding opportunities. Its authors therefore designed a two-tier system in which inter-flow network coding is performed in an overlay network while the underlying network performs regular OR.

Several proposals use LA techniques in wireless networks. In [15], LA is used to alleviate the broadcast storm inherent to flooding-based approaches and to obtain a near-optimal solution for the problem. LACAS [16] uses LA to avoid network congestion in healthcare wireless sensor networks; an adaptive LA is implemented in all nodes. In [17], LA is used to predict the mobility pattern of mobile hosts in mobile wireless networks.
In [18], a stochastic estimator learning automata (SELA) routing algorithm is proposed for QoS routing in ATM networks. In [19], LA is used to improve call admission control for ATM networks by reducing the percentage of overestimation.

3. Learning Automata

A learning automaton (LA) is an automaton that learns the optimal action out of a finite set of actions through repeated interactions with a random environment [11, 20]. During its interactions with the environment, the LA repeatedly chooses one of the available actions at random based on a probability distribution. It then performs the action, receives a response from the environment, and moves to a new state [11]. The environment's response to the action belongs to a set of available responses and is probabilistically related to the automaton's action [21]. The goal of the LA is to find the actions that minimize the average penalty received from the environment (or, equivalently, maximize the average reward) [11, 22].

There are several models for the acceptable responses of the environment [21]. In the P-model, the environment response can take only one of two values, 0 or 1. A response of 1 corresponds to a penalty (i.e., an unfavorable response), while a response of 0 corresponds to a reward (i.e., a favorable response) [21]. This model is shown in Fig. 2. In this figure, $\alpha$ represents the set of automaton actions, $\beta$ is the set of allowable environment responses, and $C$ is the set of possible environment states.

The LA selects its actions based on a probability distribution. This probability distribution is updated at each instant based on the response received from the environment [20]. If the LA selects action $\alpha_i$ at time n, then the probability distribution for selecting the actions at time n+1 is updated as follows [21, 22]:

- If the automaton receives a favorable response from the environment (i.e., $\beta = 0$) after performing action $\alpha_i$ at time n, then:

$$p_i(n+1) = p_i(n) + a\,[1 - p_i(n)] \tag{1}$$

$$p_j(n+1) = (1 - a)\,p_j(n), \quad \forall j \neq i \tag{2}$$

- If the automaton receives an unfavorable response from the environment (i.e., $\beta = 1$) after performing action $\alpha_i$ at time n, then:

$$p_i(n+1) = (1 - b)\,p_i(n) \tag{3}$$

$$p_j(n+1) = \frac{b}{r-1} + (1 - b)\,p_j(n), \quad \forall j \neq i \tag{4}$$

where $p_i(n)$ is the probability of selecting action i at time n, a and b are the reward and penalty parameters, and r is the number of actions.

Figure 2. LA is interacting with a P-model environment.
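The update rule of Equations (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `update_probabilities` is hypothetical, `r` is taken to be the number of actions, and the defaults a=0.7 and b=0.3 are the values the paper reports using.

```python
from typing import List


def update_probabilities(p: List[float], i: int, beta: int,
                         a: float = 0.7, b: float = 0.3) -> List[float]:
    """Return the action probabilities at time n+1.

    p    -- probabilities at time n (must sum to 1)
    i    -- index of the action that was just performed
    beta -- P-model response: 0 = reward, 1 = penalty
    """
    r = len(p)  # number of actions
    if r < 2:
        raise ValueError("at least two actions are required")
    if beta == 0:
        # Equations (1) and (2): move probability mass toward action i.
        return [pj + a * (1.0 - pj) if j == i else (1.0 - a) * pj
                for j, pj in enumerate(p)]
    # Equations (3) and (4): move probability mass away from action i.
    return [(1.0 - b) * pj if j == i else b / (r - 1) + (1.0 - b) * pj
            for j, pj in enumerate(p)]


p = [0.25, 0.25, 0.25, 0.25]
p_reward = update_probabilities(p, i=0, beta=0)   # action 0 rises to 0.775
p_penalty = update_probabilities(p, i=0, beta=1)  # action 0 falls to 0.175
```

Both branches keep the probabilities summing to 1, which is why the penalty branch redistributes b/(r-1) to each of the other actions.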

In DCR, the environment's response to the automaton is based on two criteria: the traffic of the selected node and the delay of this node in sending the packet. When the automaton receives a reward or penalty from the environment, it updates the probability distribution of its actions according to Equations (1), (2), (3), and (4). The parameters a and b in these equations are set to 0.7 and 0.3, respectively. The process of choosing actions is repeated for several rounds until the probability of choosing one of the actions reaches 1. This action is considered the optimal action.

4. DCR

We first describe the basic idea of DCR; its details are covered in the subsequent subsections. In MORE and EXOR, ETX metrics are calculated periodically for all network links by a central node. The central node distributes the metric information to all network nodes at the beginning of each time period. The source uses this information to determine the forwarding nodes and their order of transmission. OR assumes that all network nodes have adequate resources and sufficient incentive to cooperate with the source and forward its data packets toward the destination. Traditional routing makes the same assumption about the nodes on the shortest path between the source and the destination. The fundamental difference between these otherwise similar assumptions is their scope: traditional routing applies it only to the nodes on a single path, whereas OR applies it to every potential forwarder. In fact, some nodes may not be able to cooperate with the source because of a temporary lack of resources (such as queue space and processing power) or because of the traffic patterns crossing those nodes. By using the ETX metric, OR is able to accommodate coarse-grained network dynamics adaptively. However, it fails to handle fine-grained network dynamics that occur between two consecutive periodic updates of the ETX metric.

A. Basic Idea

Assume that at time instant t1 a wireless node n has an extra capacity c with which it can help other nodes by cooperatively forwarding their packets. Network flows that are initiated just after t1 will, with high probability, use n as a cooperative forwarder in order to consume its forwarding capacity. Every such decision increases the traffic of n and decreases c. If the rate at which flows are generated is considerably higher than the frequency of periodic ETX updates, n will gradually become congested. This situation harms all traffic that relies on the extra forwarding capacity of n and decreases its perceived quality of service and utility. Certainly, the next periodic update of the ETX metric fixes the situation, but the performance of several flows is degraded in the meantime. For example, in Fig. 3, assume that MORE slightly prefers the lower part of the network, shown with dashed lines, probably because of the lower congestion levels and higher delivery probabilities of these nodes. It will use the lower part for handling flow A-B. Now, suppose that a new flow is initiated between C and D. Following the above reasoning, MORE will again choose the lower part of the network, which may no longer be a good decision. A better solution is to leverage other network nodes dynamically to increase the overall throughput of both flows.

Figure 3. The lower part of the network is congested and is not suitable for incoming flows. It is better to use upper part of the network to route the flow initiated from C toward D.
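The stale-metric problem described in this subsection can be made concrete with a small back-of-the-envelope calculation. The sketch below is purely illustrative: the spare capacity, the per-flow demand, and the flow inter-arrival time are hypothetical numbers that do not come from the paper; only the idea that the metric stays fixed between periodic updates is taken from the text.

```python
def flows_before_congestion(capacity: float, demand_per_flow: float) -> int:
    """Number of new flows node n can absorb before its spare capacity c runs out."""
    return int(capacity // demand_per_flow)


def residual_capacity_at_update(capacity: float, demand_per_flow: float,
                                flow_interval_s: float,
                                update_period_s: float) -> float:
    """Spare capacity left at the next metric update if every flow that starts
    in the meantime is still routed through n (the metric is stale)."""
    flows = int(update_period_s // flow_interval_s)
    return capacity - flows * demand_per_flow


# Hypothetical values: c = 100 pkts/s spare capacity, 20 pkts/s per flow,
# one new flow per minute, metric refreshed every 10 minutes.
absorbed = flows_before_congestion(100.0, 20.0)                     # 5 flows
residual = residual_capacity_at_update(100.0, 20.0, 60.0, 600.0)    # -100.0
```

A negative residual means that n is overloaded well before the next update arrives, which is the window in which per-batch feedback can help.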

In MORE, the forwarder list is calculated and populated by the source. The list is valid until the next periodic update of the ETX metric. The period of ETX updates (e.g. 10 minutes) is orders of magnitude larger than the time required to transmit a single data batch (e.g. 500 ms). Therefore, thousands of packets are transmitted during each ETX update period. After receiving the ACK for the first data batch, the source could estimate the network performance and reroute the traffic, taking into account the impact of its own flow on the network. Unfortunately, MORE is not designed to react dynamically to changing network conditions, even though there is a good opportunity to evaluate network performance after the delivery of each data batch. We design and use a reinforcement learning mechanism to alleviate this shortcoming and reroute MORE traffic around congested parts of the network.

B. Selecting LA nodes

An automaton is placed in every mesh router in the network for intelligent decision making. It is worth noting that only a subset of the intermediate nodes between the source and the destination activate their automata to control network congestion. The main task of an active automaton is to select the forwarding nodes between itself and the next active automaton for this flow. Therefore, the first question is: which nodes should turn on their automata for this flow? The answer directly affects both the scalability and the precision of the proposed method.

We partition the potential forwarding nodes between the source and the destination into several consecutive levels based on their distance from the source, as depicted in Fig. 4. The first level consists of the source. The other levels contain several forwarding nodes (e.g. 2 to 5 nodes). In each level, one LA is selected as the designated LA (DLA). The DLA is responsible for running DCR for this flow. Clearly, the source must turn on its automaton. First, assume that all DLAs are turned on in a hop-by-hop fashion. For example, i1 (as DLA1) examines its immediate neighbors (i.e. h1, h2 and h3), finds that h1 and h2 are good candidates for handling the traffic, and decides to use them as forwarding nodes for the flow from A to B. Assume that k1, k2, k3 and k4 are heavily congested. These nodes (or a subset of them) are on the selected opportunistic path that passes through h1 and h2. However, DLA1 cannot avoid this situation without the aid of a carefully crafted mechanism for selecting forwarders. As a result, DCR cannot rely only on congestion information from 1-hop neighbors. In addition, the scalability of this approach is poor.

Figure 4. The process for selecting next hop forwarders in DCR.

The above argument also holds when the distance between two consecutive active DLAs is two hops. In this case, DLAi considers congestion information obtained from nodes in levels i+1 and i+2 when selecting the forwarding nodes for those levels. Clearly, the scalability and precision of this approach are better than in the single-level scenario described previously. Again, however, it cannot avoid congested areas that are three or more hops away from the current DLA. In fact, moving from local decision making toward more global decision making increases the precision and efficiency of the method, but it raises another scalability concern: the synchronization between normal and DLA nodes requires more control overhead. Several control packets must be exchanged between DLAs and intermediate nodes in order to obtain a clear picture of the traffic between two consecutive active DLAs. In fact, all nodes between two active DLAs, say DLAi and DLAi+d (where d is the distance between two active DLAs), must report their traffic level to DLAi.

We decided to activate DLA nodes in levels 0, 3, 6, … as depicted in Fig. 5. This design choice keeps DCR precision at an acceptable level while keeping its control overhead reasonable. The source acts as the first DLA and selects forwarding nodes for levels 1, 2 and 3. Then, DLA3 selects forwarding nodes for levels 4, 5 and 6. This procedure is repeated until the destination is reached. Next, the selected DLAs should react to dynamic traffic conditions after transmitting each data batch.

Figure 5. The scheme used for selecting forwarding nodes and activating DLA nodes.
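The level-based activation scheme can be summarized as a mapping from each active DLA level to the levels whose forwarders it selects. The following Python sketch is an illustration under stated assumptions: the helper name `dla_plan` and the parameter `last_level` (the index of the last level of forwarding nodes before the destination) are hypothetical, and the handling of a final group shorter than three levels is a guess, since the text does not spell out that case.

```python
from typing import Dict, List


def dla_plan(last_level: int, d: int = 3) -> Dict[int, List[int]]:
    """Map each active DLA level (0, d, 2d, ...) to the levels it covers.

    Level 0 is the source; a DLA at level k selects the forwarding nodes
    of levels k+1 .. k+d, truncated at last_level.
    """
    if last_level < 1 or d < 1:
        raise ValueError("last_level and d must be positive")
    return {k: list(range(k + 1, min(k + d, last_level) + 1))
            for k in range(0, last_level, d)}


plan = dla_plan(7)
# plan == {0: [1, 2, 3], 3: [4, 5, 6], 6: [7]}
```

With d=1 the same helper reproduces the hop-by-hop activation discussed first, and with d=2 the two-hop variant, which makes the trade-off between the number of active DLAs and the amount of traffic information each one needs easy to see.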

C. Decision Making

There are two types of nodes between two consecutive DLAs (say DLAi and DLAi+3), namely forwarding nodes and ordinary nodes. Forwarding nodes can piggyback congestion information on the data packets they transmit in the forward direction, so that it is overheard in the reverse direction. Ordinary nodes, however, are not allowed to transmit coded packets and must therefore generate a congestion report periodically. The period for generating traffic reports is equal to the time required to transmit a single data batch. The congestion information of both forwarding and ordinary nodes is accumulated by overhearing nodes. After DLAi receives the accumulated traffic information, it turns on its automaton to make a new decision based on the feedback received from the network. It then performs the new action, i.e. it selects a new forwarding set to reroute the flow around the congested part of the network.

At first, all possible actions receive equal probabilities. Then, DLAi chooses an action at random according to the action probabilities. Each action specifies the identities of the selected forwarding nodes in levels i+1, i+2, and i+3. The action is then performed, and the environment's response to it is received by DLAi. By performing the action, we mean that the selected forwarders are employed to forward the next data batch. The environment's response includes the congestion information and the delay of the selected path. The LA compares the congestion level and path delay of the current action with predetermined thresholds. The current action is rewarded (using Equations 1 and 2) or penalized (using Equations 3 and 4), and the probability distribution of the actions is updated. We have set a=0.7 and b=0.3 in Equations 1-4; these values are chosen so that the LA converges quickly. The next action is selected randomly using the updated probability distribution. This process is repeated for each data batch. The pseudo code of the DCR algorithm is shown in Fig. 6.

DCR Algorithm (NODE currentActiveNode, BATCH BatchList[1..batchCount])
{
    Set ActionList[1..ActionCount] = Initialize actions;
    Set ActionDistributions[1..ActionCount] = Uniform Distribution;
    Set Converged = FALSE;
    for batch = 1 to batchCount
    {
        if (Converged == TRUE) then
            Send(OptimalForwarder, BatchList[batch]);
        else
        {
            SelectedForwarder = Select an action randomly according to ActionDistributions;
            Send(SelectedForwarder, BatchList[batch]);
            Receive DELAY and TRAFFIC from the environment;
            if (DELAY > Threshold1 && TRAFFIC > Threshold2) then β = 1 else β = 0;
            Update ActionDistributions according to Equations (1), (2), (3), and (4);
            MostProbableAct = Select the action that has maximum probability in the ActionDistributions;
            if (ActionDistributions[MostProbableAct] == 1) then
            {
                Set Converged = TRUE;
                Set OptimalForwarder = MostProbableAct;
            }
        }
    }
}

Figure 6. DCR Algorithm. This algorithm runs in every active DLA (e.g., currentActiveNode) that wants to send the batches (e.g., BatchList).
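For readers who want to experiment with the loop in Fig. 6, the following is a runnable Python sketch of it. It is not the authors' code. The names `run_dla`, `lrp_update`, and `toy_environment`, the node identifiers, and the threshold values are all hypothetical. The environment is a toy stand-in for the real network feedback, and the convergence test uses a small tolerance `eps` instead of exact equality with 1, which is an assumption made for floating-point arithmetic.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Action = Tuple[str, ...]  # hypothetical: forwarder IDs for levels i+1..i+3


def lrp_update(p: List[float], i: int, beta: int,
               a: float = 0.7, b: float = 0.3) -> List[float]:
    """Equations (1)-(4): reward (beta=0) or penalize (beta=1) action i."""
    r = len(p)
    if beta == 0:
        return [pj + a * (1 - pj) if j == i else (1 - a) * pj
                for j, pj in enumerate(p)]
    return [(1 - b) * pj if j == i else b / (r - 1) + (1 - b) * pj
            for j, pj in enumerate(p)]


def run_dla(actions: Sequence[Action],
            send_batch: Callable[[Action], Tuple[float, float]],
            batch_count: int, delay_thr: float, traffic_thr: float,
            rng: random.Random, eps: float = 1e-6):
    """One active DLA: pick a forwarding set per batch and learn from feedback."""
    p = [1.0 / len(actions)] * len(actions)   # uniform initial distribution
    converged: Optional[int] = None
    history: List[int] = []
    for _ in range(batch_count):
        if converged is not None:             # keep using the learned action
            send_batch(actions[converged])
            history.append(converged)
            continue
        i = rng.choices(range(len(actions)), weights=p)[0]
        delay, traffic = send_batch(actions[i])
        # Penalty only when both thresholds are exceeded, as in Fig. 6.
        beta = 1 if (delay > delay_thr and traffic > traffic_thr) else 0
        p = lrp_update(p, i, beta)
        best = max(range(len(p)), key=p.__getitem__)
        if p[best] >= 1.0 - eps:              # assumption: tolerance, not == 1
            converged = best
        history.append(i)
    return p, converged, history


def toy_environment(action: Action) -> Tuple[float, float]:
    """Hypothetical feedback (delay in s, traffic in pkts/s); k1 is congested."""
    return (0.9, 80.0) if "k1" in action else (0.3, 10.0)


actions = [("h1", "k1", "m1"), ("h3", "k4", "m3")]
p, best, history = run_dla(actions, toy_environment, batch_count=60,
                           delay_thr=0.5, traffic_thr=50.0,
                           rng=random.Random(1))
```

In this two-action toy setting, every update (a reward for the uncongested set or a penalty for the congested one) increases the probability of the uncongested set, so the automaton settles on `actions[1]` well within the 60 batches.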

5. Simulation Result

We have implemented MORE and DCR in the OMNeT++ simulator [23] using the MiXiM wireless package. We use IEEE 802.11 and IP as the MAC and network protocols, respectively. The radio range and packet size are set to 250 m and 1500 B, respectively. Each point in each graph is obtained from 50 repetitions. In each run, the source S sends several data batches. Batch delivery delay is measured from the transmission of the first data packet of the batch by S until the reception of the corresponding ACK by S. We set BSize=32 in both protocols.

First, we compare the performance of DCR and MORE for a single flow without background traffic or other competing flows. This situation is depicted in Fig. 7. We assume that the DLAs know the congestion levels of all nodes before the transmission of the first data batch. Therefore, the forwarding nodes for the first data batch are identified using prior knowledge about the network. After that, LA is used to determine the forwarding nodes for subsequent data batches. Interestingly, DCR performs better than MORE even when there is no disturbing flow in the network. In fact, DCR's knowledge of the congestion level of nodes that are 1, 2, and 3 hops away helps it improve performance.

Figure 7. Arrival time of DCR is compared with MORE for 5 data batches for a single flow.

Figure 8. Arrival time of DCR is compared with MORE for 5 data batches, a single flow, and a disturbing background traffic which generates 60 pkts/s.

Next, we examine the performance of DCR in the presence of dynamic background unicast traffic in Fig. 8. The background traffic generates 60 packets per second. Evidently, DCR quickly adapts to the background traffic and outperforms MORE even for early data batches. One would expect the gap between DCR and MORE to grow for later data batches. However, the dynamic nature of the background traffic and the random decisions of the LA result in small fluctuations in DCR performance.

In the next experiment, we set up two opposite flows, from S to D and from D to S. The forwarding nodes of these flows are almost identical in MORE. Therefore, the performance of each flow is affected by the other. Hence, this configuration is a good scenario for examining the efficiency of DCR in avoiding congested areas. Clearly, DCR performs better than MORE after the second data batch. The performance gap between DCR and MORE becomes wider for subsequent data batches due to the learning effect of the LA. This plot indicates that LA can improve MORE performance by selecting a longer but uncongested path for each flow. DCR performs 25% better than MORE on average.

Figure 9. DCR outperforms MORE when there are two opposite flows in the network.

Finally, we compare DCR and MORE for two opposite flows in presence of background traffic. The background traffic affects performance of both protocols. Again, DCR is able to use a better path than MORE. Also, the gap between two protocols becomes wider with the number of sent batches.

Figure 10. Average delivery times of DCR and MORE are compared for 5 data batches in the presence of disturbing background traffic. DCR avoids congested areas of the network efficiently.

6. Conclusion

In this paper, we showed that reinforcement learning can be used to improve opportunistic routing. We equipped MORE with an efficient LA mechanism and showed that it improves MORE performance. DCR achieves higher throughput than MORE by avoiding congested parts of the network and rerouting the flow around them. Our simulations indicate that DCR outperforms MORE in all examined scenarios, including a single flow and two flows, with and without disturbing background traffic.

References

[1] S. Biswas, R. Morris, ExOR: opportunistic multi-hop routing for wireless networks, ACM SIGCOMM Computer Communication Review, vol. 35, no. 4, October 2005.
[2] D. S. J. De Couto, D. Aguayo, J. Bicket, R. Morris, "A High-Throughput Path Metric for Multi-Hop Wireless Routing", in Proceedings of ACM MOBICOM, pp. 134-146, 2003.
[3] R. Bruno, M. Nurchis, "Survey on diversity-based routing in wireless mesh networks: Challenges and solutions", Comput. Commun., vol. 33, no. 3, pp. 269-282, February 2010.
[4] Y. Li, W. Chen, Z. L. Zhang, "Optimal Forwarder List Selection in Opportunistic Routing", IEEE 6th International Conference on Mobile Adhoc and Sensor Systems, pp. 670-675, 2009.
[5] W. Zehua, C. Yuanzhu, L. Cheng, "CORMAN: A Novel Cooperative Opportunistic Routing Scheme in Mobile Ad Hoc Networks", IEEE Journal on Selected Areas in Communications, vol. 30, no. 2, pp. 289-296, 2012.
[6] S. Chachulski, M. Jennings, S. Katti, D. Katabi, "Trading structure for randomness in wireless opportunistic routing", Proc. of ACM SIGCOMM, 2007.
[7] D. Koutsonikolas, C. Wang, Y. C. Hu, "Efficient Network-Coding-Based Opportunistic Routing Through Cumulative Coded Acknowledgements", IEEE/ACM Transactions on Networking, vol. 19, no. 5, pp. 1368-1381, 2011.
[8] P. Li, S. Guo, S. Yu, A. V. Vasilakos, CodePipe: An opportunistic feeding and routing protocol for reliable multicast with pipelined network coding, INFOCOM 2012, pp. 100-108.
[9] Y. Lin, X. Zhang, B. Li, CodeOR: opportunistic routing in wireless mesh networks with segmented network coding, in: Proc. IEEE ICNP'08, 2008, pp. 1092-1648.
[10] J. A. Torkestani, M. R. Meybodi, "An Intelligent Backbone Formation Algorithm for Wireless Ad hoc Networks based on Distributed Learning Automata", Computer Networks, Elsevier, vol. 54, no. 5, Apr. 2010, pp. 826–43.
[11] P. Nicopolitidis, G. I. Papadimitriou, A. S. Pomportsis, P. Sarigiannidis, M. S. Obaidat, "Adaptive wireless networks using learning automata", IEEE Wireless Communications, vol. 18, no. 2, pp. 75-81, 2011.
[12] S. Katti, H. Rahul, W. Hu, D. Katabi, M. Medard, J. Crowcroft, "XORs in the air: Practical Wireless Network Coding", IEEE/ACM Transactions on Networking, vol. 16, no. 3, pp. 497-510, 2008.
[13] J. Zhang, Y. P. Chen, I. Marsic, "Network coding via opportunistic forwarding in wireless mesh networks", in Proc. IEEE WCNC, Las Vegas, NV, April 2008, pp. 1775-1780.
[14] M. K. Han, A. Bhartia, L. Qiu, E. Rozner, "O3: Optimized Overlay-based Opportunistic Routing", MobiHoc, 2011.
[15] B. Radunovic, C. Gkantsidis, P. B. Key, P. Rodriguez, Toward practical opportunistic routing with intra-session network coding for mesh networks, IEEE/ACM Trans. Netw., vol. 18, no. 2, pp. 420-433, 2010.
[16] S. Misra, V. Tiwari, M. S. Obaidat, "LACAS: learning automata-based congestion avoidance scheme for healthcare wireless sensor networks", IEEE Journal on Selected Areas in Communications, vol. 27, no. 4, pp. 466-479, 2009.
[17] J. A. Torkestani, "Mobility prediction in mobile wireless networks", J. Network and Computer Applications, vol. 35, no. 5, pp. 1633-1645, 2012.
[18] A. V. Vasilakos, G. I. Papadimitriou, A new approach to the design of reinforcement schemes for learning automata: Stochastic estimator learning algorithm, Neurocomputing, vol. 7, no. 3, pp. 275-297, 1995.
[19] A. F. Atlasis, et al., The use of learning algorithms in ATM networks call admission control problem: a methodology, Computer Networks, vol. 34, no. 3, pp. 341-353, 2000.
[20] M. A. L. Thathachar, P. S. Sastry, Varieties of Learning Automata: An Overview, IEEE Trans. on System, Man, and Cybernetics- Part B: Cybernetics, vol. 32, no. 6, pp. 711-722, 2002.
[21] A. S. Poznyak, K. Najim, Learning Automata and Stochastic Optimization, Springer, 1997.
[22] K. S. Narendra, M. A. L. Thathachar, Learning Automata: An Introduction.
[23] A. Varga, "OMNeT++ – Object-Oriented Discrete Event Simulator", http://www.omnetpp.org, 2014.

Marzieh Ghasemi received the BS degree in Computer Engineering from the Islamic Azad University of Mobarakeh, Iran, in 2009. She received the MSc degree from the Science and Research Branch, Islamic Azad University of Arak, Iran, in 2013. Her research interests include wireless sensor networks and congestion control algorithms.

Mostafa Abdolahi received his B.S. degree in Information Technology from Ilam University, Ilam, Iran, in 2013. He is now a master's student in Artificial Intelligence at Kharazmi University. His main research interests include QoS provisioning, opportunistic routing, network coding, the MAC layer, and IP routing.

Dr Mozafar Bag-Mohammadi received his B.S. degree in Electrical Engineering from Sharif University of Technology, Tehran, Iran. He obtained his PhD degree in Computer Architecture from the University of Tehran, Iran, in 2005. He joined the engineering faculty of Ilam University as an Assistant Professor. His main research interests include multicast and unicast routing, opportunistic routing, wireless mesh networks, network coding and router design.

Ali Bohlooli received the BS and MS degrees in Computer Engineering (with honors) from the School of Electrical & Computer Engineering, Isfahan University of Technology, Iran, in 2001 and 2003, respectively. He received the Ph.D. degree from the University of Isfahan, Iran, in 2011. He is now an assistant professor at the Department of Computer Engineering, University of Isfahan, Iran. His research interests include wireless networks and hardware design.