SmartLA: Reinforcement learning-based link adaptation for high throughput wireless access networks


Raja Karmakar a,∗, Samiran Chattopadhyay b, Sandip Chakraborty c

a Department of Information Technology, Techno India College of Technology, Kolkata 700156, India
b Department of Information Technology, Jadavpur University, Kolkata 700091, India
c Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721302, India

Article info

Article history: Received 31 August 2016; Revised 4 May 2017; Accepted 28 May 2017; Available online 30 May 2017

Keywords: IEEE 802.11n; IEEE 802.11ac; Reinforcement learning; Link adaptation

Abstract

High throughput wireless standards based on IEEE 802.11n and IEEE 802.11ac have been developed and released within the last few years as new amendments over the commercially popular IEEE 802.11. IEEE 802.11n and IEEE 802.11ac support a large pool of parameters, such as an increased number of spatial streams via multiple input multiple output (MIMO) communications, channel bonding, guard intervals, different modulation and coding schemes, several levels of frame aggregation, block acknowledgement, etc. As a consequence, they boost the physical data rate to the order of gigabits per second. However, all these enhancements have their internal trade-offs with the channel quality, as explored in the existing literature. For example, higher channel bonding levels result in poor performance under a high bit error rate. In a shared wireless environment, multiple heterogeneous stations contend for the wireless channel, which is itself a time-varying system. Consequently, none of these link level parameters provides optimal performance for all channel quality instances. Therefore, to practically meet the theoretical high throughput, each wireless device should adapt its physical data transmission rate dynamically by appropriately tuning different link parameters; otherwise, a high rate of transmission failures may arise. In this paper, we design an adaptive automated on-line learning mechanism, called “Smart Link Adaptation” (SmartLA), for dynamic selection of link parameters, motivated by the “State-Action-Reward-State-Action” (SARSA) model, a variant of reinforcement learning. SmartLA can make a wireless station intelligent enough to cope with various network conditions by exploiting the best suited data rates observed so far for various channel conditions from past experience, as well as by exploring different possible parameter sets. We analyze the performance of SmartLA both through simulation and over a 26-node IEEE 802.11ac testbed (6 access points and 20 client devices). We observe that the proposed link adaptation mechanism performs significantly better compared to other competing mechanisms mentioned in the literature.

1. Introduction

IEEE 802.11 defines a set of specifications for the physical (PHY) layer and the media access control (MAC) sublayer to implement wireless local area networks (WLANs). The base version was released in 1997, and the standard has received subsequent amendments improving the performance of WLANs. The demand for wireless service is now increasing very rapidly with the use of smart mobile phones and wireless devices. Smart cities are growing across the world, with a need for high throughput WLANs, popularly called wireless hot-spots. A large number of new features have been incorporated into the WLAN standard since its


introduction to fulfill the growing demand for wireless capacity. To provide wireless broadband connectivity over a wide range, IEEE 802.11 has been enriched with IEEE 802.11a, IEEE 802.11b and IEEE 802.11g, where the maximum throughput is up to 54 Mbps; therefore, they are not suitable for demanding multimedia communications. New extensions of IEEE 802.11, namely IEEE 802.11n [1] and IEEE 802.11ac [2,3], have then been introduced to meet the growing demand for network capacity and achieve high throughput. Commonly, they are known as “High Throughput WLANs” (HT-WLANs), where the theoretical data rate in IEEE 802.11n is 600 Mbps, whereas in IEEE 802.11ac the expected maximum throughput is 7 Gbps over the wireless medium. Many new features and link level parameters have been introduced into the PHY and MAC of HT-WLANs to achieve high throughput, substantially higher than the data rates previously available. In this context, the PHY is enhanced by multiple input multiple output (MIMO) antenna technologies, channel bonding,


advanced modulation and coding schemes (MCS) and the short guard interval (SGI), whereas the enhancements in the MAC are frame aggregation, block acknowledgement (BACK) and the reverse direction (RD) mechanism. Each of these new HT-WLAN features contributes to high throughput in its own way.

PHY enhancements: As an important enhancement, MIMO can increase the data rate with the use of multiple antennas (spatial streams). Under signal fading, multi-path long distance communication and signal interference, throughput can be increased significantly by MIMO; simultaneous transmission of multiple data streams is also possible. Both the transmitter and the receiver use multiple antennas to exploit multi-path propagation. To create wider frequency bands such as 40 MHz, 80 MHz and 160 MHz, channel bonding combines multiple 20 MHz channels. By widening the channel frequency band, this feature is able to increase the physical data rate. Different advanced MCS levels have been introduced in the PHY to support high data rates; each MCS level regulates the modulation type and the coding rate. Combined with channel bonding, MIMO and SGI, each MCS index provides a specific maximum level of throughput in HT-WLANs. The guard interval is applied to keep successive data symbols transmitted by a device distinct, and it is intended to overcome signal loss from the effect of multipath propagation. In this regard, SGI is the use of a 400 ns guard interval instead of 800 ns. This PHY feature reduces the overhead of the additional idle time between the transmitted symbols. When the effect of multipath is not severe (not too many reflecting obstacles), SGI can be enabled; thus, the overall transmission delay is reduced and the overall network throughput is increased.

MAC enhancements: MAC layer enhancements reduce the MAC overhead for handling frames and thus reduce the overall network delay. For example, the frame aggregation technique aggregates multiple frames into a single frame. As a result, the overhead of adding a separate MAC header and MAC trailer to each individual frame is avoided. In another MAC enhancement, BACK combines multiple acknowledgements into a single acknowledgement; hence, it reduces the MAC overhead, and the combined application of these two mechanisms can reduce the MAC processing delay effectively. Frame aggregation is further classified into two categories – (i) Aggregated MAC Service Data Unit (A-MSDU) and (ii) Aggregated MAC Protocol Data Unit (A-MPDU) [2,4]. These two types of frame aggregation are described in the following; Fig. 1 illustrates the concept of A-MSDU and A-MPDU with two levels of frame aggregation. An A-MSDU aggregates multiple Logical Link Control (LLC) packets, known as MAC Service Data Units (MSDUs), to create a single MSDU, called an A-MSDU. Each of the MSDUs inside an A-MSDU is known as an MSDU subframe. The aggregated frame contains a single MAC header, which is followed by a maximum of 7935 MSDU bytes. An A-MSDU is constructed when the size of the waiting packets reaches the maximal A-MSDU threshold; it is also created when the delay of the oldest waiting packet reaches a threshold value. Aggregated MSDUs must have the same Traffic ID (TID) and the same destination and source. After adding MAC headers to each MSDU, multiple MAC Protocol Data Units (MPDUs) are combined to form an A-MPDU.
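To make the A-MSDU construction rule above concrete, the short sketch below aggregates queued MSDUs when either the 7935-byte A-MSDU payload limit (from the text) is reached or the oldest packet has waited too long; the queue representation, the 5 ms delay threshold and the function name are illustrative assumptions, not part of SmartLA.

```python
import time

A_MSDU_MAX_BYTES = 7935      # maximum MSDU bytes behind one MAC header (from the text)
MAX_HOLD_SECONDS = 0.005     # hypothetical delay threshold for the oldest queued packet

def should_build_amsdu(queued_msdus):
    """Decide whether the queued MSDUs should be aggregated into an A-MSDU now.

    queued_msdus is assumed to be a list of (arrival_time, payload_bytes) tuples
    that already share the same TID, source and destination.
    """
    if not queued_msdus:
        return False
    total_bytes = sum(size for _, size in queued_msdus)
    oldest_wait = time.monotonic() - queued_msdus[0][0]
    # Aggregate when the size limit is reached or the oldest packet has waited long enough.
    return total_bytes >= A_MSDU_MAX_BYTES or oldest_wait >= MAX_HOLD_SECONDS
```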
The concept of A-MPDU is to combine multiple MPDU subframes under a single PHY header, as shown in Fig. 1. A key difference between A-MSDU and A-MPDU is that an A-MPDU is formed after the encapsulation of the MAC header; hence, an A-MPDU is created just before the MPDUs (or A-MSDUs) are delivered to the PHY layer. The TID of each MPDU in an A-MPDU may differ. An A-MPDU has a maximum size of 65,535 bytes.
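The PHY features described earlier combine multiplicatively into the nominal data rate. The sketch below computes that nominal rate from the standard 802.11n/ac OFDM constants (data subcarriers per channel width, bits per subcarrier for each modulation, coding rate, guard interval and number of spatial streams); the helper function itself is illustrative and not part of SmartLA.

```python
# Standard 802.11n/ac OFDM constants; the helper below is an illustrative sketch.
DATA_SUBCARRIERS = {20: 52, 40: 108, 80: 234, 160: 468}   # per channel width (MHz)
SYMBOL_US = 3.2                                            # data portion of an OFDM symbol (us)

def phy_rate_mbps(bandwidth_mhz, bits_per_subcarrier, coding_rate,
                  spatial_streams=1, short_gi=False):
    """Nominal PHY data rate in Mbps for one MCS/bandwidth/GI/stream configuration."""
    gi_us = 0.4 if short_gi else 0.8              # SGI halves the guard interval
    bits_per_symbol = (DATA_SUBCARRIERS[bandwidth_mhz] * bits_per_subcarrier
                       * coding_rate * spatial_streams)
    return bits_per_symbol / (SYMBOL_US + gi_us)  # Mbit per microsecond == Mbps

# 802.11n MCS 7: 64-QAM (6 bits), rate 5/6, 20 MHz, 1 stream, long GI -> 65.0 Mbps
print(phy_rate_mbps(20, 6, 5 / 6))
# 802.11ac MCS 9: 256-QAM (8 bits), rate 5/6, 80 MHz, 1 stream, SGI -> ~433.3 Mbps
print(phy_rate_mbps(80, 8, 5 / 6, short_gi=True))
```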

Fig. 1. Frame aggregation. First level (A-MSDU): MSDU subframes (subframe header, MSDU, padding) are aggregated under a single MAC header and FCS. Second level (A-MPDU): MPDU subframes (MPDU delimiter, MPDU, padding) are aggregated under a single PHY header.

Although several new advanced features have been introduced into the PHY and MAC layers of HT-WLANs, each feature comes with notable internal trade-offs in wireless network performance. These trade-offs make the PHY/MAC enhancements affect each other and can impact the network performance negatively, particularly the application level throughput. The performance analysis of the different new features, along with their internal trade-offs, is discussed in some existing works [5–11]. For weak signal strength, channel bonding reduces the data transmission range, since a wider bandwidth needs higher signal power to carry data over a long distance. Due to inappropriate channel bonding (selection of channels for bonding), wider channels increase external signal interference and thus result in increased packet loss [5,6]. MIMO poses challenges in antenna design and multi-channel synchronization using the multiple antennas of the transmitter and the receiver [11]. Instead of eliminating inter-symbol interference (ISI), SGI can decrease the overall network throughput when the network traffic increases. SGI results in higher packet error rates when the delay spread of the radio frequency (RF) channel exceeds the SGI or when the transmitter and the receiver are not precisely time-synchronized; hence, this feature should be disabled in highly congested wireless networks. Further, higher values of the MCS index require high signal strength, which demands a high signal to noise ratio (SNR) [7]. Several factors, such as signal fading, signal attenuation and channel interference (co-channel or inter-channel interference), make the SNR value fluctuate significantly in a wireless network. The enhanced MAC features are also associated with their own internal trade-offs. For instance, frame aggregation increases packet loss since multiple frames are combined into a single aggregated frame; hence, low signal quality and high channel error degrade the performance of the frame aggregation mechanism [9]. The loss of a single BACK frame implies the loss of acknowledgements for multiple frames; as a result, the number of retransmissions increases and the system throughput decreases [10].

Link adaptation: A wireless network is a time-varying system where the signal quality of the channel changes abruptly, and high throughput standards should also work cooperatively with the legacy standards. Considering the aforesaid trade-offs, it is very much needed to select the different link configuration parameters, such as channel bonding, MCS index, level of frame aggregation, etc., dynamically, looking at the current channel condition. Hence, a high


physical data rate does not always convert into high application throughput, due to the link configuration trade-offs. High throughput cannot be achieved due to different factors such as co-channel and inter-channel interference, packet collisions, weak signal strength, etc. Therefore, it is necessary to select the data rate dynamically to adjust to such time-varying channel conditions. Link adaptation is a mechanism in which the best suited data rate is selected dynamically by considering the best possible link parameter set; consequently, a wireless system can cope with the network condition. In this paper, we discuss dynamic link adaptation to resolve this problem. In such adaptation, a wireless station can automatically select the different link configuration parameters (channel bonding, SGI, MCS index, level of frame aggregation, etc.) based on the channel condition. Although there exist several link adaptation algorithms specifically designed for the legacy IEEE 802.11 standards, commonly known as rate adaptation [12–17], dynamic link adaptation for the high throughput standards is severely more difficult because of the presence of a significantly larger set of PHY/MAC parameters and their internal trade-offs. We explore the concept of distributed automated learning, namely reinforcement learning, for dynamic link adaptation in HT-WLANs to design an intelligent sender-side closed-loop link adaptation algorithm called SmartLA. We apply a variant of on-line reinforcement learning, SARSA, to construct the link configuration set using channel bonding, MCS values, SGI and frame aggregation. The SNR value of the channel is used as the measurement of the channel condition. We consider the signal-to-interference-plus-noise ratio (SINR) and the SNR to be analogous because, in our implementation, the noise power returned by the device driver contains both the channel noise and the interference noise. Applying SmartLA, a wireless device can become quite intelligent and learn about the wireless environment from past experience. As a result, it can adaptively choose the best possible link parameter set when a change is observed in the channel condition. At the time of data rate selection, the system looks at the past information, which is stored in a statistic table, to select the values of the parameters from the configuration set. A wireless system also explores new parameter sets that have not yet been chosen for data transmission. The performance of SmartLA is analyzed through simulation and testbed results. It is shown that SmartLA enhances the network performance significantly compared to the other related mechanisms explored in the existing works.

Organization: The rest of the paper is organized as follows. Section 2 discusses the related research works. A brief discussion of reinforcement learning is presented in Section 3, and the system model with the state transition diagram is described in Section 4. We adopt the SARSA model into our proposed SmartLA mechanism, and the correlation between the two is established in Section 5. The details of the SmartLA algorithm are explained in Section 6. Sections 7 and 8 present the simulation and testbed results with analysis, respectively. Section 9 highlights the overall impression of SmartLA. Finally, the conclusion is drawn in Section 10.

2. Related works

There are several existing works which deal with link adaptation mechanisms, mostly over the legacy IEEE 802.11 devices; however, they are not suitable for the high throughput standards. Here, we provide an overview of such studies, grouped into two subsections – (1) for legacy IEEE 802.11 devices and (2) for HT-WLANs.


2.1. Rate adaptation for legacy IEEE 802.11

A large number of research works, such as [13–17] and the references mentioned therein, have discussed rate adaptation for legacy IEEE 802.11 devices. Holland et al. [13] designed a receiver-side rate adaptation algorithm that uses request-to-send and clear-to-send (RTS/CTS) frames to estimate the data rate. In [14], another rate adaptation mechanism is proposed based on exchanging RTS/CTS frames. However, RTS/CTS based rate adaptation schemes increase the control frame overhead in the wireless network, and a failure of an RTS/CTS transmission affects the rate adaptation procedure. A classical work by Kamerman et al. [15] proposes a robust rate adaptation scheme known as automatic rate fallback (ARF); this scheme is a sender-side, MAC layer-assisted link adaptation mechanism. In [16], Adaptive ARF (AARF) and Adaptive Multi-Rate Retry (AMRR) are proposed as two ARF-based approaches which provide better rate selection but are suitable only in stable channel environments; they fail to adjust to dynamic channel conditions. SampleRate [17] is a rate adaptation mechanism that periodically sends data at different data rates to obtain information about the wireless network condition; however, this model increases the network overhead by applying excessive sampling. All these research works are based on the adaptation of the modulation and coding scheme in legacy IEEE 802.11 networks; they do not engage the new PHY/MAC features of the high throughput standards, such as MIMO spatial streams, channel bonding, frame aggregation, etc.

2.2. Link adaptation for HT-WLANs

Very few recent works have considered the design of link adaptation mechanisms for high throughput wireless networks. MiRA [18] is a MIMO-based link adaptation technique that uses receiver feedback to select the data rate and spatial streaming; however, it performs excessive work for selecting the data rate. In [19], the authors discussed a MIMO-based approach known as RAMAS. It is a credit-based mechanism that suffers from the overhead of credit assignment for data rate selection and, further, does not consider all the other enhanced PHY/MAC features. MIMO-based dynamic spatial streaming and channel bandwidth adaptation are discussed in [5]; this approach cannot take full advantage of HT-WLANs because it ignores the other enhanced PHY/MAC features introduced in IEEE 802.11ac. Minstrel [20] is the default rate adaptation technique in Linux systems. It acquires statistical information through channel overhearing, but it is only suitable for the legacy wireless standards (IEEE 802.11a/b/g); thus, it cannot get close to the theoretical throughput of HT-WLANs, since several PHY/MAC parameters are absent. To select an appropriate data rate, different MCS levels and MIMO are applied in [21,22], but all the enhanced PHY/MAC features are not considered in these works. Frame aggregation is a key MAC enhancement for throughput, and it has been incorporated into the design of several link adaptation mechanisms such as [23]. All these mechanisms fail to utilize the full available network capacity of IEEE 802.11n and IEEE 802.11ac since they do not consider the exhaustive set of PHY/MAC parameters of HT-WLANs. To solve the problem of dynamic link adaptation, Minstrel HT [24] considers the maximum number of enhanced PHY/MAC parameters of IEEE 802.11n. It is the default rate adaptation mechanism used by the wireless driver ath9k [25]. Exhaustive random sampling is carried out by this technique, which makes Minstrel HT slower in processing. Being a dynamic link adaptation scheme, SampleLite [8] has also considered different features of IEEE 802.11n. It is a pure threshold-based algorithm that takes the received signal strength indicator (RSSI) as the threshold parameter without considering the noise level and signal interference. Hence, it


Table 1. Summarization of the main notations defined in this section.

Notation | Meaning
S | Set of states
A | Set of actions
π | Policy
st | State at time t
at | Action at time t
rt | Reward at time t
R_at | Reward obtained by taking action at in state st
Q(st, at) | Q-value of state st for action at (action-value function)
α | Learning rate
γ | Discount factor

Fig. 2. A basic block diagram of reinforcement learning.

may fail to cope with all possible network conditions. In one of our previous research works [26], we designed a link adaptation mechanism for IEEE 802.11n networks. However, this initial work considers a limited set of channel conditions measured by RSSI; further, this mechanism can suffer from state explosion when there are frequent and significant changes of the channel condition. Herzen et al. [27] designed a new mechanism to predict the performance of wireless networks; however, the proposed machine learning-based scheme does not set the best suited value of a parameter (e.g. channel bandwidth, physical data rate, etc.) of a wireless network, and no exploration technique is applied in the learning process. The link adaptation approach in [28] considers multi-user MIMO and different MCS values, whereas other new features of IEEE 802.11ac are absent from the learning process. Due to the lack of an exploration mechanism in [28], the proposed machine learning classifier fails to exploit different aspects of link adaptation. Based on the experience gained in our initial research work presented in [26], in this paper we have designed and developed a robust link adaptation protocol for high throughput wireless access networks.

3. Reinforcement learning

Reinforcement learning is a type of machine learning inspired by the concept of behaviorist psychology. This psychology focuses on how software agents should take actions in a time-varying environment so that the maximum cumulative reward can be achieved. An action is determined based on some policy that defines how an action is selected in a state after getting a reward from the environment. Reinforcement learning allows a machine to automatically determine the ideal behavior within a specific context in order to maximize its performance. An agent is constructed inside a system to perform reinforcement learning; in this context, the system is known as a learner. Reinforcement learning performs its operation based on the feedback provided by the environment, and this behavior can keep adapting to the environment as time progresses. Here, correct pairs of input and output are never presented, and sub-optimal actions are not corrected explicitly. In reinforcement learning, the focus is on on-line performance, which tries to find a balance between exploration and exploitation. We have defined several notations in this section, and Table 1 summarizes these notations. The basic model of reinforcement learning consists of the following components.
1. A set of states (S): It is used to define the state of the machine (system) for a given condition of the environment.
2. A set of actions (A): An action is applied on a state to get a new state.
3. A set of rules of transition from one state to another state: It defines the next state for a given present state and action.

Fig. 3. A simple execution flow of SARSA learning.

4. A set of rules that determine the immediate reward for a state transition: It defines the scalar reward obtained by changing a state into a new state.
5. A set of rules that describe the observation of an agent: It defines how an agent observes and learns the environment.
All the rules used in reinforcement learning are often stochastic. In Fig. 2, a basic block diagram of reinforcement learning is shown. It has two basic modules – (i) the system and (ii) the environment. At any time, the system has a state. The system then takes an action and applies it on that state. After applying the action, the environment returns a reward for this state-action pair and also returns the next state for the system. In this way, a system-environment interaction is developed. There are mainly two types of reinforcement learning:
1. Off-policy: In this approach, the learner always selects the policy that provides the maximum reward in every step. The value of this policy is learned independently of the agent’s actions. Example: Q-learning.
2. On-policy: The learner learns the value of the policy which is carried out by the agent, using both the exploitation and exploration steps. Example: SARSA.

3.1. State-Action-Reward-State-Action (SARSA)

SARSA is a reinforcement learning mechanism for learning a “Markov Decision Process” (MDP) policy, and it is widely used in the area of machine learning. It is an on-policy method in which the agent performs both exploitation and exploration steps; the agent learns the value of the policy being followed at each step. A SARSA learning agent learns the environment by interacting with it in discrete time steps. Some terms are associated with this learning. At any discrete time t, these terms are defined as follows:
• State: Let st be the state.
• Action: Let at be the action on st.
• Reward: Let rt be the reward associated with st.
In Fig. 3, the execution procedure of SARSA is depicted. This figure shows the three components of SARSA-based reinforcement learning – (i) state, (ii) action and (iii) reward. At any time instant, the


system resides in a state, such as s2. After applying an action a2 selected by the chosen policy, the system moves to a new state s3 and earns a reward r3. In each step, the policy taken by this model is such that the system selects either the best state (i.e., the state producing the maximum reward) or a randomly selected state. There are also some mapping functions in SARSA, as given in the following.


Table 2. Summarization of the main notations defined in this section.

Notation | Meaning
c | Channel bandwidth
m | MCS value
g | Guard interval
a | Level of frame aggregation
S <c, m, g, a> | Tuple that represents a state
N | Total number of states (configurations)

• Policy: This function finds an action for a given state. Thus, in general, it can be defined by:

π : S → A

The goal of choosing a policy π is to maximize the expected cumulative discounted sum of rewards, defined as:

Σ_{t=0}^{∞} β^t R_at(s_t, s_{t+1})

R_at is the reward obtained by taking the action at in state st, i.e., at = π(st). Here, at changes the state from st to st+1. β is a factor used to set the weightage of the reward at a time instant, and it lies between 0 and 1.
• Action-value function: This function provides the expected utility of taking an action by the agent in a given state; that is, it quantifies the state-action combination. This function is defined as follows:

Q(s, a) = Σ_{i=1}^{d} θ_i φ_i(s, a)

Q(s, a) is the action-value function for a state s and an action a applied on s. We refer to this function as the Q function. At step i, φ_i(s, a) is a function that assigns a finite scalar value to the state-action pair (s, a), and θ_i is a weighting factor for step i that lies between 0 and 1. Here, d indicates that the action a has been applied d times on the state s. Q(s, a) is also referred to as the Q-value of state s for action a. Hence, in general, the action-value function Q can be represented as:

Q : S × A → R

3.2. SARSA: model

A SARSA agent interacts with the environment and updates the learned policy based on the actions taken; hence, it is known as an on-policy learning algorithm. Let the system’s state at time instant t be st, and let the policy π determine the action at to be applied on state st to find the next state, i.e., π(st) = at. After applying at, let st+1 be the next state, which produces the reward rt+1. Hence, the system moves to the new state st+1, and the reward rt+1 is associated with this transition, which can be represented as (st, at, rt+1, st+1). The goal of the learning agent is to accumulate as much reward as possible. In state st+1, the agent again applies the policy π and determines the action for this state. Let at+1 be the action for state st+1, i.e., π(st+1) = at+1. These two consecutive transitions can be represented by the quintuple (st, at, rt+1, st+1, at+1), which yields State-Action-Reward-State-Action, i.e., “SARSA”. Before learning starts, the function Q returns an arbitrary value chosen by the designer. Then, the agent chooses an action and receives a reward. The system goes to a new state that depends on both the selected action and the previous state. Then, the Q-value is updated for that state. Thus, the core of this mechanism is a value iteration update that takes the old Q-value and corrects it based on the newly obtained information. This update is defined by the following equation:

Q(st, at) ← Q(st, at) + α [rt+1 + γ Q(st+1, at+1) − Q(st, at)]   (1)

In the above equation, st is the present state and at is the action applied on it. After applying at, st+1 is the next state and rt+1 is the observed reward. Two more parameters, α and γ, are used in Eq. (1), where α is called the learning rate and γ is known as the discount factor. These two factors are defined as follows:
• Learning rate (α): This factor determines to what extent the newly obtained information overrides the old one. If it is 0, the agent does not learn anything from the environment; when it is 1, the agent considers only the most recently received information. In practice, a constant learning factor such as α = 0.1 is used for all t, because the learner is given a significant amount of time to gather knowledge about the environment.
• Discount factor (γ): The discount factor determines the importance of future rewards. A value of 0 makes the agent consider only the current rewards; as the discount factor approaches 1, the agent strives for long-term high reward. If this factor meets or exceeds 1, the action values may diverge. To balance the present and the future reward, γ can be set to 0.5.

4. SmartLA: system model

SmartLA is a closed-loop link adaptation technique that considers the receiver’s acknowledgement (ACK) frame as feedback. It employs three parameters – (i) the SNR of the channel, (ii) the bit error rate (BER) and (iii) the frame error rate (FER) of the system. SNR accounts for the noise level and thus is a very good measurement of signal quality. BER closely reflects the physical signal strength observed by a system. FER measures error in terms of frames and is therefore a good performance indicator for the MAC layer. In SmartLA, we use an on-line statistic-based learning mechanism that snoops the condition of the channel by observing the channel’s SNR. SmartLA measures the system performance in terms of BER and FER. Based on the acquired knowledge, SmartLA follows a state-transition model and finds the best possible link parameters for a given channel condition. The detailed model of the system and the working principle of SmartLA are provided in the following subsections. Table 2 summarizes the main notations declared in this section.

4.1. Metric selection

For deciding the values of the link parameters, SmartLA employs the SNR of the channel and the BER and FER of the system as the observation metrics. It considers channel bonding, a wide range of MCS levels, SGI and frame aggregation (A-MPDU). SmartLA always searches for the best suited combination of these PHY/MAC features considering the present channel condition. With these features, SmartLA follows a state transition model, as discussed next.


Fig. 4. State transition diagram of the system.

4.2. Model description

Fig. 4 shows the state transition diagram that describes our proposed model. It presents the transitions the system makes to dynamically select the best suited data rate based on the different link parameters, which are considered as performance observation metrics. The structure of the given state transition diagram prevents abrupt changes of the link parameters, which might bring the system into an inconsistent state. Hence, from the initial values of the link parameters, the goal state (the target set of link parameters) can be

reached through a series of intermediate states. These states transition gradually from a “higher stability”1 state to a “lower stability” state. We represent each state as a tuple S <c, m, g, a>, as shown in Fig. 4. Let M be the set of all possible values of S; if there are N such configurations in M in total, then |M| = N. In S, c is the channel bandwidth (minBandwidth to maxBandwidth in terms of channel bonding – 20 MHz, 40 MHz, 80 MHz, etc.), m is the MCS level (minMcs to maxMcs), g is the applied guard interval

1 Lower valued parameters (such as low channel bonding or no bonding, disabled SGI, low levels of MCS, etc.) are more stable factors for a channel condition since they have the maximum sustainability; therefore we call them “higher stability” states.


Table 3. Summarization of the main notations defined in this section.

Notation | Meaning
BERnorm | Normalized BER
BERmin | The minimum BER of the channel
BERmax | The maximum BER of the channel
BERk | The current calculated BER
BER^(st,at) | BER for st after applying at
Q_init^(st,at) | The value of FER at st after applying at
Q_final^(st,at) | The updated Q-value of st for at with BERnorm
B | Sequence of SNR buckets
δ | Range of each SNR bucket
A | Learning agent
C | Action taken in a state
Q | Q-value of a state
E | Statistic table
ε | Probability of exploration

Fig. 5. Position of data rate estimation and state.

(400/800 ns) and a is the level of frame aggregation. We assume there are k MCS levels, where minMcs and maxMcs are the minimum and the maximum MCS values, respectively. The minimum and the maximum bandwidths available in the network are minBandwidth and maxBandwidth, respectively. The selection of a state indicates the setting of the values of the different parameters of S. Let the maximum size of frame aggregation be maxMpdu and the minimum value be minMpdu. Let us also consider that there are na possible values of a in this model. For na = 1, we employ the maximum number of aggregated MPDU frames (maxMpdu), which is decremented by the value dcr for each successively higher value of na. Thus, we have minMpdu = maxMpdu − (na − 1) × dcr.

4.2.1. Description of the state diagram

From Fig. 4, for each value of g, we choose na levels of frame aggregation. Two levels of g (400/800 ns) are considered for every level of c. There are k MCS levels for each value of a; thus, k different MCS values are generated for every level of frame aggregation. We update the channel bonding parameter c as follows:

c = 2^j × minBandwidth,   j = 0, 1, ..., maxIndex − 1   (2)

Here, j = 0 provides the minBandwidth, whereas j = maxIndex − 1 gives the maxBandwidth. As time progresses, the system gathers more information regarding the tuple S and thus, the number of states increases in Fig. 4. Therefore, as the number of states increases, the system can converge to the best possible link parameter set considering the current channel condition.
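As a concrete illustration of Eq. (2) and of how the tuple S <c, m, g, a> spans the configuration set M, the sketch below enumerates M for one hypothetical choice of minBandwidth, maxIndex, MCS range and aggregation levels; all the numeric values are examples, not parameters prescribed by SmartLA.

```python
from itertools import product

# Example ranges (illustrative only).
MIN_BANDWIDTH_MHZ = 20
MAX_INDEX = 3                        # yields 20, 40 and 80 MHz through Eq. (2)
MCS_VALUES = range(0, 10)            # minMcs .. maxMcs
GUARD_INTERVALS_NS = (800, 400)      # long GI and SGI
MAX_MPDU, DCR, NUM_AGG_LEVELS = 64, 16, 4

# Eq. (2): c = 2^j * minBandwidth, j = 0 .. maxIndex - 1
bandwidths = [2 ** j * MIN_BANDWIDTH_MHZ for j in range(MAX_INDEX)]

# Aggregation levels: maxMpdu decremented by dcr for each higher level,
# so that minMpdu = maxMpdu - (na - 1) * dcr.
agg_levels = [MAX_MPDU - i * DCR for i in range(NUM_AGG_LEVELS)]

# The configuration set M: every state is one tuple S = (c, m, g, a), and |M| = N.
M = list(product(bandwidths, MCS_VALUES, GUARD_INTERVALS_NS, agg_levels))
N = len(M)
print(N, M[0], M[-1])                # 240 states for this example
```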

4.3. Phases of the system

The best suited link parameters are estimated periodically after an interval tdur. In SmartLA, the system follows two phases, as shown in Fig. 5 – (i) data rate estimation and (ii) data transmission. In the data rate estimation phase, the system measures the SNR of the channel and selects a state; that is, a set of link parameters is selected. This phase provides the best possible data rate for the next transmission. In the data transmission phase, the system transmits data with the data rate and the link parameters selected in the estimation phase. The data transmission is carried out for the interval tdur (denoted as t in Fig. 5). As shown in Fig. 5, SNRinit and SNRend of the channel are measured at the beginning and the end of the data transmission phase, respectively. After the data transmission period, the system finds avgSNR from SNRinit and SNRend. The system also calculates the BER of the received packets and estimates the FER from the ACK. Together with the SNR, the combination of BER and FER plays the key role in choosing the best suited link parameter set in the next estimation phase.

5. Application of SARSA in SmartLA

In this section, we discuss the application of SARSA-based reinforcement learning to develop our proposed SmartLA model. First, the motivation behind choosing reinforcement learning is given in the following.

5.1. Motivation for choosing reinforcement learning for SmartLA

Reinforcement learning is an adaptive machine learning technique that applies exploration and exploitation mechanisms. This form of learning can be used to design an intelligent learner that adjusts to an environment. A wireless network is a time-varying system in which dynamic link adaptation becomes a challenging issue due to abrupt changes of signal strength. In this scenario, an intelligent learner can be effective in learning the environment: depending on the network condition, the learner can take a decision about its next move. Further, due to the presence of a large set of new PHY/MAC parameters in HT-WLANs, dynamic link adaptation becomes extremely difficult. Therefore, we have employed reinforcement learning to design an intelligent learner that performs dynamic link adaptation in HT-WLANs. Reinforcement learning, with its exploration and exploitation schemes, can help to try unexplored parameters and to exploit the best suited parameters as well.

5.2. General description

Here, we describe the application of SARSA in our proposed link adaptation mechanism (SmartLA). We show how on-line reinforcement learning can be incorporated into dynamic link adaptation for high throughput wireless networks. The main objective of this on-line learning is to make a wireless device capable of gathering information about the time-varying wireless environment. Here, information means the quality of the signal, channel interference, rate of data loss, throughput of running applications, etc. After gaining this information, a wireless device can decide which settings of the PHY parameters (MIMO, MCS level, channel bandwidth and guard interval) and MAC parameters (frame aggregation and block acknowledgement) should be chosen for a given condition of the wireless environment. This selection helps to choose the best possible combination of the PHY/MAC parameters for achieving the best possible performance in a given scenario. We have defined some notations in this section, and Table 3 summarizes these notations. To apply SARSA in SmartLA, we need to map the different components of SARSA onto different parts of SmartLA. In the following sections, a detailed description of the association between SARSA and SmartLA is presented.


5.3. Description of state, action and reward

When the system selects a channel bandwidth, MCS value, guard interval and frame aggregation level, we consider these values collectively as a state of the system. To bring the system into a new state, an action is applied on the present state. Further, the system performs data transmission after choosing a state. Then, the performance is measured in terms of a metric which is called the reward for that state. These three terms (state, action and reward) form the backbone of the SARSA structure in our mechanism. Hence, an association is needed between these three terms and the SmartLA model. This association is presented in the following.
• State: Every state is represented by the tuple S <c, m, g, a>. According to Fig. 4, when the system resides in a state, values are assigned to c, m, g and a such that the combination of these values forms a unique set. Therefore, each value of the tuple S denotes a unique state.
• Action: An action is performed to change a state. Let the state of the system at any time instant t1 be S1 <c1, m1, g1, a1>, and suppose the system wants to change its state into S2 <c2, m1, g2, a1>. Then, the required action is the change of c1 and g1 into c2 and g2, respectively; the other two parameters, m1 and a1, remain the same. Hence, the action is the change of channel bandwidth and guard interval. If the next state from S1 is S3 <c3, m3, g1, a1>, then the required action is to change the values of c1 and m1 into c3 and m3, respectively; in this case, the action consists of changing the channel bandwidth and the MCS level. In this way, an action refers to the change of the values of the PHY/MAC parameters (c, m, g and a) associated with S.
• Reward: After selecting a state s ∈ S, the system sends data with the selected values of c, m, g and a. After the data transmission phase, the system measures the BER. For a given scenario, this BER is considered as the reward in state s in our model. For the same state, the BER may vary over time since the wireless environment is very dynamic in nature and changes frequently (change of signal strength, channel interference, etc.). Therefore, for the same state, the reward will be different at different time instants.

5.4. Generation of Q-value

As discussed in Section 3, the action-value function generates the expected utility of taking an action by the agent in a state. This expected utility is measured by the Q-value following Eq. (1). To find the Q-value, we first have to identify the state, action and reward in the current context. FER is calculated based on the ACK received from the receiver after the data transmission phase. In our model, the Q-value is represented in terms of FER. In this context, we define some parameters and their computations as discussed in the following.

Calculation of FER: We define FER as follows:

FER = (Number of frames transmitted − Number of frames received) / (Number of frames transmitted)   (3)

FER identifies the fraction of frames which have not been received by the receiver in a given period of time.

Normalized BER: The normalized BER (BERnorm) is calculated by

BERnorm = (BERk − BERmin) / (BERmax − BERmin)

Here, BERk is the current calculated BER, whereas BERmin and BERmax are the minimum and the maximum BER of the channel observed so far. At time t, BERnorm is measured for st after applying at. Let this BER be denoted by BER^(st,at). Hence, rt+1 = BER^(st,at), since we consider BER as the reward in our SARSA model.

Two types of Q-value: We define two types of Q-value for each state, as discussed below:
1. Q_init: After applying an action, it is the value of FER.
2. Q_final: It is the updated Q-value computed with the calculated BERnorm.
At any instant t, Q_final is calculated only for st (the present state). At that time, Q_init is calculated for st+1.

Update of Q-value: Let the state at any time instant t1 be s1. Let an action a1 be chosen by a policy and applied on s1. Let the next state be s2, with FER value FER1. Next, let the policy choose an action a2 to be applied on s2, and let the corresponding observed FER be FER2. In other words, the application of the action a1 changes the state from s1 to s2, and this state is further changed by a2; this results in changing the FER value from FER1 to FER2. We also let the BER calculated after applying a1 in s1 be BER^(s1,a1), i.e., r_{t1+1} = BER^(s1,a1). From Eq. (1), we define Q_final of s1 for a1 as follows:

Q(s1, a1) ← Q(s1, a1) + α [r_{t1+1} + γ Q(s2, a2) − Q(s1, a1)]

By representing this equation with FER and BER, we have

Q_final^(s1,a1) ← Q_init^(s1,a1) + α [BER^(s1,a1) + γ Q_init^(s2,a2) − Q_init^(s1,a1)]
              ← FER1 + α [BER^(s1,a1) + γ FER2 − FER1]

Here, Q_init^(s1,a1) and Q_init^(s2,a2) are the Q_init values of s1 and s2 for a1 and a2, respectively. In general, at instant t, the expression for Q_final^(st,at) can be represented as follows:

Q_final^(st,at) ← Q_init^(st,at) + α [BER^(st,at) + γ Q_init^(st+1,at+1) − Q_init^(st,at)]

That is,

Q_final^(st,at) ← FER_st + α [BER^(st,at) + γ FER_st+1 − FER_st]   (4)

Here, we let Q_init^(st,at) = FER_st and Q_init^(st+1,at+1) = FER_st+1.

Fig. 6. Q-value update steps.

Fig. 6 shows the basic steps to update the Q-value (Q_final) of a state. It can be noted that, according to the SARSA model, the ε-greedy policy is applied twice in the update procedure of the Q-value of a state (first in st and then in st+1).
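The following short sketch restates Eqs. (3) and (4) in code, with α = 0.1 and γ = 0.5 as suggested in Section 3; the function names and the example numbers are illustrative only.

```python
ALPHA = 0.1   # learning rate (a small constant, as discussed in Section 3.2)
GAMMA = 0.5   # discount factor balancing present and future reward

def frame_error_rate(frames_sent, frames_acked):
    """Eq. (3): fraction of transmitted frames that were not acknowledged."""
    return (frames_sent - frames_acked) / frames_sent

def normalized_ber(ber, ber_min, ber_max):
    """Normalize the current BER against the extremes observed so far."""
    return (ber - ber_min) / (ber_max - ber_min)

def q_final(fer_current, ber_norm, fer_next):
    """Eq. (4): SARSA-style update with Q_init = FER and reward = normalized BER."""
    return fer_current + ALPHA * (ber_norm + GAMMA * fer_next - fer_current)

# Example: 5% frame loss for the present state, 12% after the next action,
# and a mid-range BER relative to the observed minimum and maximum.
q = q_final(fer_current=frame_error_rate(100, 95),
            ber_norm=normalized_ber(1e-5, 1e-6, 1e-4),
            fer_next=0.12)
print(round(q, 4))   # a lower value indicates a better performing state
```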


The Q-value measures how much the FER has changed by choosing an action in a state. The update of the Q-value with BER makes the Q-value highly sensitive to changes in signal level. A higher Q-value indicates higher frame error, higher BER, or both, in the network. Thus, the objective is to decrease the Q-value as much as possible.

5.5. SNR buckets

In SmartLA, we consider a sequence of SNR buckets, where each bucket stores a sequence of SNR values. Let the sequence of SNR buckets be represented by B, with B = <B1, B2, B3, ..., Bn> such that B1 < B2 < B3 < ... < Bn, where Bi (1 ≤ i ≤ n) denotes the ith bucket and n is the number of buckets. The range of each bucket represents the range of SNR values stored inside it; let δ denote this range. For example, if δ = 10 and B3 starts from 21 dB, then 21 dB ≤ B3 ≤ 30 dB and B4 starts from B3 + 10 = 31 dB. Let minSNR and maxSNR be the minimum and the maximum calculated SNR values of the channel, respectively. Hence, B1 starts at minSNR, whereas Bn ends at maxSNR. All intermediate SNR values are distributed through B1 to Bn following the range δ. The system measures the channel’s SNR value, denoted by snr, and puts snr into the appropriate bucket of B.

5.6. Design of learning agent

We design a software agent, called the learning agent (or simply the agent) and denoted by A, to execute the on-line adaptive learning procedure. For data transmission, each wireless station employs this agent to select a state s ∈ S and the corresponding data rate. The agent learns the environment using the SARSA reinforcement learning technique. To carry out on-line learning, the agent needs to keep information about the parameters of learning – (i) state, (ii) action, (iii) reward, (iv) Q-value and (v) policy. When the system stays in a state, it employs the agent A to execute the following steps:

1. The current state is read as the present state.
2. A policy is taken to decide the action.
3. The action is applied.
4. The system transitions to the next state.
5. A reward is observed from the environment.
6. The Q-value of the present state is updated for the taken action.

Hence, the agent works with an input set and an output set, as discussed in the following.
• Input: present state, policy and reward.
• Output: action, Q-value and next state.
In every iteration of data rate selection, the input and output sets are required for the agent to follow the steps of on-line learning.

5.7. Statistic table

SmartLA maintains a statistic table, denoted by E, to interact actively with the wireless environment. Here, E is defined by a 4-tuple: each row of E contains the tuple <B, S, C, Q> and acts as the experience used by the system to select the transmission rate. C represents the action taken in a state and Q is the Q-value of a state. The system measures two SNR values (SNRinit and SNRend) of the channel, as shown in Fig. 5. At the end of the data transmission phase, the

Fig. 7. Block diagram of the system.

agent A calculates the average SNR (avgSNR) from these two SNR values. It then stores this average SNR in the appropriate bucket (B), along with the corresponding state information (S), the taken action (C) and the Q_final value (Q) of the selected state s ∈ S. Therefore, E contains the past experience regarding data transmissions.

5.8. System architecture

The inputs to the system are the ACK from the receiver and the SNR of the channel. The output is the best suited set of link parameters selected by the agent A. The block diagram of the system is shown in Fig. 7. A state s ∈ S is set based on the aforesaid inputs and E is updated accordingly. In Fig. 7, the inputs and the output of the system are depicted: the SNR of the channel and the ACK are the inputs, and the output is the data rate selected for the next data transmission. Here, the system calculates BER and FER from the ACK. Using SARSA, the Q-value is calculated accordingly. Then, E is updated with the Q-value measured by SARSA. Considering the present SNR value of the channel, the next configuration set is selected based on the policy (applied by SARSA), the Q-value and the other attributes of E. Data is transmitted with the selected data rate and configuration set.

5.9. Importance of the ACK frame from the receiver

The ACK plays an important role for the agent as well as for the system. We have already discussed that reward and Q-value are two vital parameters for the agent: the reward acts as an input and the Q-value is treated as an output of the agent. From the ACK frame, the number of successfully transmitted frames can be calculated; this number is equal to the number of frames acknowledged by the receiver in a given time interval. The sender knows the total number of frames transmitted. Hence, from the ACK frame, the FER can be calculated by Eq. (3), and it serves as the basis of the Q-value.

5.10. Policy for the SmartLA model

A policy is to be chosen by the agent inside the system for taking an action. We consider the ε-greedy mechanism [29] as the policy for the model; the associated agent applies the ε-greedy mechanism to select an action in our SARSA-based SmartLA model.
ε-greedy policy: It is a simple, well known policy for reinforcement learning. This policy introduces a parameter ε, called the exploration probability. At time t, we define εt as follows:

εt = min(1, rN/t²)   (5)

Here, N is the total number of possible states in the system. r is a parameter and r > 0.


• Exploration: In this procedure, a randomly selected state is chosen; the probability of this type of selection is ε.
• Exploitation: The state which has produced the best reward is selected in this approach; the probability of exploitation is (1 − ε).
These two approaches can be combined as a Strategy, which is therefore defined as follows:

Strategy = ε × Explore + (1 − ε) × Exploit
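The sketch below shows one way to realize the εt schedule of Eq. (5) together with the explore/exploit choice over the statistic table E; the table layout (a mapping from SNR bucket to rows), the fallback when a bucket is unseen, and the value of r are illustrative assumptions.

```python
import random

def epsilon(t, num_states, r=1.0):
    """Eq. (5): exploration probability at step t (t >= 1; r > 0 is a tunable constant)."""
    return min(1.0, r * num_states / (t * t))

def choose_state(table, bucket, all_states, t):
    """epsilon-greedy selection: explore a random configuration with probability epsilon,
    otherwise exploit the lowest-Q state recorded for the current SNR bucket."""
    if random.random() < epsilon(t, len(all_states)):
        return random.choice(all_states)                       # exploration
    entries = table.get(bucket)
    if not entries:                                            # bucket never seen: fall back
        entries = [row for rows in table.values() for row in rows]
    if not entries:                                            # empty table: explore anyway
        return random.choice(all_states)
    return min(entries, key=lambda row: row["q"])["state"]     # exploitation: lowest Q-value
```

As a usage example, with table = {3: [{"state": (40, 7, 400, 64), "q": 0.06}]}, bucket = 3 and a large t, choose_state almost always returns the stored 40 MHz configuration, because εt shrinks as t grows.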

Table 4. Snapshot of statistic table E at any time instant t.

SNR bucket | State | Action | Q-value
B1 | <c1, m1, g1, a1> | <w1, x1, y1, z1> | Q1
B2 | <c2, m2, g2, a2> | <w2, x2, y2, z2> | Q2
B2 | <c3, m3, g3, a3> | <w3, x3, y3, z3> | Q3
B3 | <c4, m4, g4, a4> | <w4, x4, y4, z4> | Q4
B4 | <c5, m5, g5, a5> | <w5, x5, y5, z5> | Q5
B4 | <c6, m6, g6, a6> | <w6, x6, y6, z6> | Q6
B4 | <c7, m7, g7, a7> | <w7, x7, y7, z7> | Q7
B5 | <c8, m8, g8, a8> | <w8, x8, y8, z8> | Q8

Using exploration, this policy helps to gather knowledge about states which have not been applied in the past. Additionally, exploitation leads to the selection of the best state, and thus the maximum performance can be obtained in this case. The ε-greedy scheme provides a nice balance between these two approaches by using Eq. (5). From this equation, the tendency of obtaining the maximum reward increases as time progresses (exploitation); therefore, the ε-greedy policy always tries to converge to states which can provide the maximum reward. From Eq. (5), it can also be noted that as time increases, the value of ε decreases; as a result, the probability of exploitation increases. Initially, the agent performs more exploration than exploitation, and this initial approach helps to store more information about the wireless environment in E. As time passes, the rate of exploitation increases. Hence, the system can select the best state found so far from its experience stored in E.

5.11. Representation of an action in table E

In E, an action is represented by a four-tuple C <w, x, y, z>. Each parameter of C is defined as follows:

1. w = change of channel bandwidth
2. x = change of MCS value
3. y = change of guard interval
4. z = change of level of frame aggregation

For example, if the channel bandwidth is changed from 20 MHz to 40 MHz, the MCS level is changed from 0 to 6, the guard interval is changed from 800 ns to 400 ns and the frame aggregation level is incremented by 2, then the value of the action will be <+20, +6, −400, +2>.

5.12. Change of state in the state transition diagram

SmartLA applies SARSA and follows the state transition model shown in Fig. 4. From this figure, it can be observed that there are k (the maximum MCS value) states at each level. When the system selects a state, it is situated in one of the states for the selected level of frame aggregation. Now, consider the example described in the following.
Present state: Let the system presently reside in state S(3k), i.e., it has chosen the following parameters of S:
• c = maxBandwidth
• m = maxMcs (3k)
• g = 400 (SGI)
• a = minMpdu (a = 4)
Next state: Now, applying SARSA, suppose the system selects S(5k − 1), shown in Fig. 4, as the next state; then the parameter set of this new state is as follows:
• c = maxBandwidth
• m = minMcs (5k − 1)
• g = 800
• a = maxMpdu (a = 1)

Fig. 8. SARSA based SmartLA model.

Hence, the required action is the change of the MCS index, the guard interval and the level of frame aggregation. In this way, the system virtually traverses this state transition diagram during link adaptation throughout its lifetime.

5.13. Execution steps

The agent follows a number of steps to produce output after taking input: it selects a data rate for data transmission by executing the steps defined by SARSA. These predefined steps are performed periodically after a given time interval. At any time instant, let the entries of the table E be as given in Table 4. Our proposed link adaptation mechanism is based on SARSA, and its basic execution block diagram is shown in Fig. 8. In this figure, we can observe three blocks – (i) present state, (ii) next state and (iii) update Q-value. The ε-greedy policy is applied twice, and the present state is updated at the beginning of each iteration. The association of reinforcement learning with the SmartLA mechanism and the detailed execution steps of SmartLA are described in the following.
1. In every step, the agent applies the ε-greedy policy to find the action. The present avgSNR is calculated and the SNR bucket into which avgSNR falls is identified. When exploitation is performed, the state with the lowest Q-value in this bucket is chosen from E as the next state. Otherwise, if exploration is followed, a state is selected randomly as the next state. As the two Q-values (Q_init and Q_final) are computed from FER and BER respectively, SmartLA tries to choose the lowest Q-value; therefore, SmartLA has a tendency to reduce FER and BER so that the overall performance of the network can be improved.


2. At any time instant t, let the system be residing in state <c2, m2, g2, a2> and want to select the next state and data rate for starting data transmission in the next transmission phase.
3. At t, we also assume that the content of the statistic table E is as shown in Table 4. Now, the following steps are performed according to the SARSA model.
4. Applying the ε-greedy policy, the agent selects the next state as follows:
(a) Exploration: a state is selected randomly from the set M with probability ε.
(b) Exploitation: if avgSNR falls into the range of a bucket Bi (1 ≤ i ≤ n) stored in E, then the state with the minimum Q-value is chosen from Bi in table E, with probability (1 − ε).
5. The system calculates the avgSNR; let exploitation be performed in this phase.
6. In E, let us consider that the avgSNR falls in bucket B4. From Table 4, B4 has three states associated with it in E.
7. In B4, let the state <c6, m6, g6, a6> have the lowest Q-value, i.e., Q6 < Q5 and Q6 < Q7. Hence, <c6, m6, g6, a6> is selected as the next state.
8. If no SNR bucket into which avgSNR can fall is found in E, then the state with the lowest Q-value in E is selected as the next state.
9. Now, an action is selected to change the state from <c2, m2, g2, a2> to <c6, m6, g6, a6>. Let <w9, x9, y9, z9> be the required action to accomplish this change of state. The action is described in the following:
(a) channel bandwidth from c2 to c6: w9,
(b) MCS value from m2 to m6: x9,
(c) guard interval from g2 to g6: y9, and
(d) frame aggregation level from a2 to a6: z9.
After combining the above parameters, the chosen action, at, can be defined as follows:
at = <w9, x9, y9, z9>
10. Applying this action, the system goes to the state <c6, m6, g6, a6> and sends data at the data rate selected in this state.
11. At the end of the data transmission phase, the system calculates the FER (i.e., FER_st) based on the ACK received from the receiver. It also calculates the BER (i.e., BER^(st,at)). Hence, this BER is considered as the reward (rt+1) of the state <c6, m6, g6, a6>.
12. The agent again applies the ε-greedy policy in <c6, m6, g6, a6> to select an action in this state. Now, let exploration be applied this time.
13. The system calculates avgSNR and identifies the corresponding SNR bucket.
14. Then, the exploration policy is applied; let us consider that the next action (at+1) is <w10, x10, y10, z10>. Therefore, at+1 can be defined as follows:
at+1 = <w10, x10, y10, z10>
15. After applying at+1, let the next selected state be <c8, m8, g8, a8>. Data transmission is carried out using the data rate set by this state.
16. At the end of the data transmission phase, the system calculates the FER (i.e., FER_st+1) based on the ACK received from the receiver.
17. Now, using Eq. (4), Q_final^(s2,at) is calculated, where the present state (st) is s2, i.e., <c2, m2, g2, a2>, and the next state (st+1) is <c6, m6, g6, a6>.
18. Hence, Eq. (4) is applied to update the final Q-value of the state s2 with the action at. In this scenario, the equation is as follows:
Q_final^(s2,at) ← FER_st + α [BER^(st,at) + γ FER_st+1 − FER_st]


Table 5
The updated snapshot of statistic table E.

SNR bucket   State                    Action                   Q-value
B1           < c1, m1, g1, a1 >       < w1, x1, y1, z1 >       Q1
B2           < c2, m2, g2, a2 >       < w2, x2, y2, z2 >       Q2
B3           < c3, m3, g3, a3 >       < w3, x3, y3, z3 >       Q3
             < c4, m4, g4, a4 >       < w4, x4, y4, z4 >       Q4
B4           < c5, m5, g5, a5 >       < w5, x5, y5, z5 >       Q5
             < c6, m6, g6, a6 >       < w6, x6, y6, z6 >       Q6
             < c7, m7, g7, a7 >       < w7, x7, y7, z7 >       Q7
B5           < c8, m8, g8, a8 >       < w8, x8, y8, z8 >       Q8
B6           < c2, m2, g2, a2 >       < w9, x9, y9, z9 >       Q10


19. The system calculates avgSNR and updates E for the state < c2, m2, g2, a2 > with this SNR. Let this avgSNR fall into the bucket B6. Since B6 was not present in E, it is included as a new entry. Thus, < c2, m2, g2, a2 > is included as the state for B6 with the Q-value Q10 = Q_final^(s2, at). At this moment, the updated content of E is shown in Table 5 (the last row is the new entry).
20. If avgSNR instead belongs to B2, then Q2 of s2 is updated by Q2 = Q_final^(s2, at).
21. In the next iteration of SARSA, < c6, m6, g6, a6 > will be chosen as the present state and step (5) – step (20) will be repeated for this state.
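The selection-and-update loop of the steps above can be condensed into a few lines of code. The following Python sketch is illustrative only: the statistic-table layout (a list of dictionaries), the helper names select_next_state and q_update, and the bucketing function are our own simplifications rather than the actual SmartLA implementation; the update mirrors the reconstructed form of Eq. (4).

```python
import random

def select_next_state(E, M, avg_snr, bucket_of, eps):
    """epsilon-greedy selection: with probability eps explore a random
    configuration from M, otherwise exploit the lowest-Q entry of the
    SNR bucket that avg_snr falls into (or of the whole table E)."""
    if random.random() <= eps:
        return random.choice(M)                               # exploration
    bucket = bucket_of(avg_snr)
    candidates = [e for e in E if e["bucket"] == bucket] or E
    return min(candidates, key=lambda e: e["q"])["state"]     # exploitation

def q_update(fer_t, ber_t, fer_next, alpha, gamma):
    # Final Q-value of (s_t, a_t), mirroring Eq. (4):
    # Q_final <- FER_t + alpha * (BER_t + gamma * FER_t+1 - FER_t)
    return fer_t + alpha * (ber_t + gamma * fer_next - fer_t)
```

For example, with α = 0.1 and γ = 0.5, a transmission phase that observes FER_t = 0.2, BER_t = 0.05 and FER_t+1 = 0.1 would store q_update(0.2, 0.05, 0.1, 0.1, 0.5) = 0.19 against the present state and action, and the entry for the corresponding SNR bucket in E would be updated (or created, as in step 19) with this value.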

In this way, the Q-values of the states are updated. The system is able to gather information about the environment as time progresses. It tries to reach the state which produces the minimum Q-value. The system tries to make the Q-value as small as possible since it is a combination of FER and BER, and the minimum value of such a combination leads to the maximum system performance. Thus, the system is able to reach the state that provides the best performance for a given wireless environment. Both exploration and exploitation select a single state at any time instant before data transmission begins. Therefore, multiple-state transitions are not allowed in SmartLA.

6. SmartLA: algorithmic description

In this section, we present an algorithmic description of SmartLA. Algorithm 1 describes the execution steps of SmartLA, which consist of two phases – (i) initialization and (ii) experience update. Each of these phases follows data rate estimation and data transmission phases. The phases of the algorithm are described in the following.

6.1. Initialization (Step 4)

It is the beginning phase of SmartLA, where it gathers some initial experience. At different time instants, the system calculates values of the SNR of the channel. Then, it selects the states corresponding to the minimum and the maximum MCS of every < c, m, gmin, amax > from M. Data transmission is performed for the interval tdur. Necessary actions are taken for the change of state. After each transmission phase, BER, FER and the Q-value are calculated. Then, E is updated with the SNR, selected state, corresponding action and Q-value.

6.2. Experience update (Step 5–Step 31)

Depending on the experience gathered in the initialization phase, the system applies its SARSA-based model to select the best


Algorithm 1 SmartLA – Algorithmic Description.
1: Start
2: Input: M, E (initially E is empty), r and N.
3: At the time of the next state selection, the system walks through the state transition diagram shown in Figure 4.
4: Initialization: Let us consider that the initialization phase has tinit rounds. For t = 1, 2, 3, …, tinit (successive data rate estimation and data transmission phases), calculate the SNR of the channel and select the states corresponding to the minimum and maximum MCS of every < c, m, gmin, amax > from M. Execute data transmission for a time interval of tdur. Identify actions for each change of state. After each transmission phase, calculate BER, FER and the Q-value. Update E with the SNR, selected state, corresponding action and Q-value.
5: Experience Update:
6: while t > tinit do
7:   Let st be the present state.
8:   Calculate avgSNR. Let l ← avgSNR.
9:   Apply the ε-greedy policy and calculate εt by εt = min(1, rN/t²).
10:  Let ρ ← Random(0,1).
11:  if ρ ≤ εt then
12:    Choose a state s′ uniformly at random in M. Therefore, st+1 = s′.
13:  else
14:    if l ∈ Bk in E (Bk is an SNR bucket) then
15:      Choose the state s′ ∈ E such that s′ lies in Bk with the lowest Q-value. Therefore, st+1 = s′.
16:    else
17:      Choose the state s′ ∈ E providing the lowest Q-value in E.
18:    end if
19:  end if
20:  Apply the action u needed to change the state st into st+1, i.e., at = u.
21:  Start the data transmission phase with the selected configuration set st+1.
22:  After tdur, calculate BER and FER. Let m ← BER and p ← FER.
23:  Calculate avgSNR and denote it by l′, i.e., l′ ← avgSNR.
24:  Repeat Step 9 to Step 19.
25:  Apply the action u′ to go to the next state st+2 from st+1, i.e., at+1 = u′.
26:  Start the data transmission phase with the selected configuration set st+2.
27:  Calculate FER and let p′ ← FER.
28:  Calculate Q_final^(st, at) of st with m, p and p′ following Equation (4). Let qt ← Q_final^(st, at).
29:  Update E with l, st, at and qt.
30:  Set st+1 as the present state, i.e., st = st+1.
31: end while

possible data rate. It also updates E with the newly gathered information about the wireless environment. At the beginning of each estimation phase, avgSNR is measured. Then, the SNR bucket to which avgSNR can belong (called the target bucket) is searched for in E. Applying the ε-greedy policy, the agent performs either exploitation or exploration depending on ε. For exploitation, if the target bucket is found in E, then the state (st) with the lowest Q-value in this bucket is selected. Otherwise, the state having the lowest Q-value in E is chosen. Now, the data transmission is carried out with the help of the selected state. At the end of the transmission, BER and FER are calculated. At the beginning of the next estimation phase, avgSNR is again calculated. Then, the ε-greedy policy is applied to select the next state. After selecting the next state, data is transmitted with the selected data rate and FER is calculated after the completion of the data

transmission phase. Now, the Q-value of st is updated accordingly by using Eq. (4). In the next iteration of the algorithm, the present state is set as st.

7. Performance analysis via simulation

SmartLA has been implemented in network simulator (NS) version NS-3.24.1. The performance analysis is carried out over an infrastructure IEEE 802.11ac network having one access point (AP) and multiple wireless stations (STAs) which are contending for channel access. The selected frame aggregation is of type A-MPDU. We examine the system performance in terms of TCP throughput, packet loss ratio (PLR) and packet delay under two cases: (i) for several SNR values of the channel considering one STA and (ii) for different numbers of STAs considering good signal strength (SNR = 45 dB). In our experiments, the SNR of the channel denotes the channel's SNR measured by a wireless station. The performance is also evaluated under dynamic channel conditions – the signal strength varies from good to bad and from bad to good. For these conditions, we generate graphs for the poor channel condition (15 ≤ SNR ≤ 30) and the good channel condition (30 < SNR ≤ 50). In the simulation, we vary the channel signal strength by using a path loss model and a propagation delay model. We also apply a random function that changes the SNR value of the channel randomly after a random period of time. Additionally, we study the impact of α on the learning methodology adopted by SmartLA.

7.1. Simulation set-up

To analyze the performance of SmartLA, we set the values of several PHY/MAC and control parameters. The details of the simulation set-up are presented in Table 6.

7.2. Competing heuristics

We evaluate the efficiency of SmartLA with respect to SampleLite [8], Minstrel HT [24] and Minstrel [20]. SampleLite is an RSSI threshold-based link adaptation scheme, so it cannot cope with all network scenarios. This mechanism applies channel bonding and MIMO spatial streams but does not utilize SGI and frame aggregation. SampleLite was designed for IEEE 802.11n and thus cannot use all enhancements of IEEE 802.11ac. Minstrel HT considers the maximum number of new features of HT-WLANs, but it performs excessive sampling, which increases its time complexity. Additionally, Minstrel HT was also designed for IEEE 802.11n. On the other side, Minstrel was developed for legacy WLANs. As a consequence, this approach fails to utilize the enhanced features of HT-WLANs. In the following subsections, we present a comparative performance analysis of SmartLA with these competing schemes.

7.3. Analysis of throughput

Fig. 9 shows the throughput analysis of SmartLA compared with the other competing link adaptation mechanisms. Applying the exploration approach, SmartLA always searches for information regarding the wireless environment, such as the Q-values of different link parameter sets. At the same time, exploitation can be used to apply the best link parameter set observed so far for the current channel condition. Being an on-line mechanism, SARSA always updates the Q-values of different link configuration sets. Here, the Q-values act as the basis for selecting the configuration set for data transmission. Thus, SARSA makes our mechanism quite intelligent and highly dynamic. From the definition of ε-greedy, as time passes, the rate of exploitation increases. As a result, the selection of the best suited configuration increases. Hence, in


Table 6
PHY/MAC and control parameters used in simulation.

Parameter                                                        Value
High throughput standard                                         IEEE 802.11ac
Type of WLAN                                                     Infrastructure network
Channel bandwidth                                                20/40/80/160 MHz
Guard interval                                                   400/800 ns
MIMO spatial stream                                              1
Traffic source                                                   TCP traffic
TCP payload                                                      1448 Bytes
Congestion protocol                                              TcpWestwood
Data and control mode                                            Constant rate wifi manager
Frame aggregation                                                A-MPDU
A-MPDU length                                                    minMpdu = 10, maxMpdu = 50
Value of dcr in frame aggregation                                10
Maximum physical data rate                                       866.7 Mbps
Path loss model                                                  Log-normal path loss model (path loss exponent = 0.3)
Propagation delay model                                          Constant speed propagation delay model
Range of each SNR bucket (δ)                                     10
Value of r in ε-greedy policy                                    1.0
Learning rate (α) (when it is not mentioned)                     0.1
Discount factor (γ)                                              0.5
Simulation time (while X-axis does not denote time)              10 s
Number of repetitions of simulation for each number of
station and SNR value, for computing average performance         10
Source                                                           AP
Destination                                                      all STAs
Frequency band                                                   5 GHz

Fig. 9. Throughput comparison.

any channel condition (different SNR values), the system can intelligently adjust to the situation using the experience gathered in the past. Therefore, SmartLA provides better performance than the other competing algorithms, as shown in Fig. 9. In Fig. 10, we exclude SmartLA; this figure is a zoomed version of Fig. 9(b).

Performance for a single active station: For a single active STA, SmartLA achieves a throughput about 5 times higher than that of SampleLite, as illustrated in Fig. 9. This is because SmartLA employs four enhanced features (c, m, g and a) of HT-WLANs to create the configuration set. In this set, SGI plays an important role when the number of STAs is not high. Since SGI reduces the additional delay between transmitting two symbols, it can enhance throughput when the network is not congested. Moreover, a MAC enhancement, frame aggregation, is also included in SmartLA, whereas SampleLite does not use SGI and frame aggregation. Thus, the combination of PHY and MAC enhancements in SmartLA helps to utilize high throughput standards for a single STA more efficiently than SampleLite. Additionally, setting of several PHY/MAC param-

Fig. 10. Throughput comparison without SmartLA (zoom of Fig. 9(b)).

eters in an intelligent way (SARSA) helps SmartLA to produce a significantly better throughput than SampleLite. Although all the aforesaid features are used by Minstrel HT, its exhaustive sam-


Fig. 11. (a) Throughput convergence of SmartLA for different values of SNR (#STA:10); (b) Throughput convergence with average throughput (#STA:10).

pling mechanism leads to a remarkably lower performance compared to SmartLA for one active station. As Minstrel does not use high throughput standards, it also performs poorly when the number of STAs is 1. Therefore, by using SARSA and the PHY/MAC feature combination, SmartLA can also achieve better throughput compared to the other schemes when the network is not congested.

Protocol convergence: To analyze the behavior of protocol convergence, we have analyzed the average throughput for different simulation times, as shown in Fig. 11. By this test, we have observed the convergence region of SmartLA. The number of STAs is set to 10 and we change the signal quality of the channel dynamically. In Fig. 11(a), it can be noted that initially SmartLA exhibits a fluctuating nature because the exploration rate is higher in the early phases. As the simulation goes on, the rate of exploitation increases, which always applies the best suited configuration set. Thus, the system is able to provide a stable average throughput after a short period of time. The convergence time increases for low SNR values as the system tries to cope with the low signal strength. In Fig. 11(a), the system reaches the stable state after 80 s considering all SNR values. In Fig. 11(b), it can be noted that the time to reach the steady state in SmartLA is not much higher even though it employs the SARSA model with a large parameter set. The instantaneous throughput with respect to simulation time is illustrated in Fig. 12(a). In this regard, the instantaneous throughput specifies the throughput measured at a particular time instant during the execution of an algorithm. In this case, we set the number of STAs to 1 and use a dynamic channel condition. The adaptive learning mechanism imposes a fluctuating nature on SmartLA, so an approximate steady state is reached after 70 s, which is later than SampleLite and Minstrel HT. In spite of the presence of exploration and exploitation, taking an action for each reward makes SmartLA highly dynamic in changing a state and helps it reach a steady state as early as possible. SampleLite reaches the steady state before SmartLA but its average throughput is much lower than that of SmartLA. Both Minstrel HT and Minstrel provide much lower throughput in all the cases. Since Minstrel has lower throughput than Minstrel HT, we exclude this scheme in this analysis.

7.4. Fairness analysis

To analyze the fairness of the network, we apply a dynamic channel condition and set the number of STAs to 10. The instantaneous

throughput is calculated for each number of stations and Jain's fairness index is computed (a minimal computation sketch is given below). The result is shown in Fig. 12(b). In SmartLA, we do not employ any fairness model, but it still manages to provide significantly better throughput fairness than Minstrel HT and Minstrel. When the numbers of STAs reach 8 and 12 for Minstrel and Minstrel HT respectively, these schemes fail to produce throughput. Considering the past information and the present network condition, each STA has a tendency to apply the best suited parameter set. Additionally, SARSA imposes an intelligent learning process on each STA. This approach makes SmartLA quite capable of coping with the wireless environment. Hence, this mechanism leads to better average fairness than SampleLite, as illustrated in Fig. 12(b).
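For completeness, Jain's fairness index over the per-station instantaneous throughputs can be computed as follows; this is the standard definition rather than SmartLA-specific code, and the variable names are ours.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2).
    Returns 1.0 for a perfectly equal allocation, down to 1/n in the worst case."""
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    return (total * total) / (n * squares) if squares > 0 else 0.0

# Example: ten stations, one of them starved.
print(jain_fairness([12.0] * 9 + [2.0]))   # ~0.93
```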

7.5. Impact of learning rate (α) on throughput

The learning rate (α) influences the learning mechanism of SARSA. We analyze the effect of α on the average throughput and the results are shown in Fig. 13; a small numerical sketch of how α weights recent observations is given after Fig. 13. When we increase the number of STAs, we change the SNR of the channel dynamically (Fig. 13(a)), whereas the number of STAs is set to 1 while the average throughput is measured in different ranges of SNR values (Fig. 13(b)). α defines to what extent the recently acquired information overrides the old information. When α is 0.0, the learner cannot learn anything. As a consequence, the system fails to select an appropriate parameter set and thus the achieved average throughput is very low. If α is set to 1.0, the system always overrides its knowledge with the most recent information; information acquired long ago, which may still be useful for choosing the best suited configuration set, is lost. Hence, this approach cannot exploit all the past information. However, it is still far better than α = 0.0 since recent history is considered when α = 1.0. Moreover, for a high value of α, the learner tries to gather knowledge about the environment very quickly, so the learning may remain incomplete and the system cannot choose the best configuration set in many situations. An intermediate value between 0.0 and 1.0 strikes a balance between the aforesaid two cases. As a result, Fig. 13 shows that the average throughput for α = 0.5 is better than in the other two scenarios. As the number of STAs increases or the channel condition deteriorates, the average throughput decreases. In both situations, the value 0.5 imposes a balance between the very low and very high learning approaches. Hence, the system copes better for this value of α. Fig. 13(a) and (b) illustrate these two cases, i.e., a higher number of STAs and different channel conditions, respectively.


Fig. 12. (a) Throughput convergence with instantaneous throughput (#STA:1); (b) Fairness (#STA:10).

Fig. 13. Impact of learning rate (α ) on throughput.
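As a purely illustrative calculation (the FER/BER numbers below are made up, not simulation output), the update of Eq. (4) shows how α trades off old against new information:

```python
def q_update(fer_t, ber_t, fer_next, alpha, gamma=0.5):
    # Q_final <- FER_t + alpha * (BER_t + gamma * FER_t+1 - FER_t), cf. Eq. (4)
    return fer_t + alpha * (ber_t + gamma * fer_next - fer_t)

# Same (hypothetical) observations, three learning rates:
for alpha in (0.0, 0.5, 1.0):
    print(alpha, q_update(fer_t=0.30, ber_t=0.02, fer_next=0.10, alpha=alpha))
# alpha = 0.0 -> 0.30   (new observation ignored)
# alpha = 0.5 -> 0.185  (balanced)
# alpha = 1.0 -> 0.07   (old FER completely overridden)
```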

7.6. Analysis of PLR and packet delay

Fig. 14 depicts the performance of SmartLA in terms of PLR. The throughput enhancement of SmartLA helps to reduce the number of packet losses and thus minimizes PLR. Packet loss can increase due to weak signal strength or packet collisions, which grow as the number of wireless stations increases. In all the cases, the wireless system is still able to adjust to the channel condition by applying the information stored in its statistic table. As a consequence, SmartLA produces a lower PLR value than SampleLite, Minstrel HT and Minstrel. The channel condition deteriorates as the signal strength decreases or the traffic increases (large number of STAs) in the network, and thus the packet transmission time increases. SmartLA adjusts better in this regard by applying its past experience (configurations explored in bad network conditions serve as references for the same type of situations in the future), which helps the system to keep the packet delay as low as possible. Figs. 15 and 16(a) show that SmartLA produces better performance in terms of packet delay than SampleLite, Minstrel HT and Minstrel.

7.7. Analysis of performance under dynamic channel conditions

We change the signal strength of the channel dynamically from good to bad as well as vice versa (15 dB ≤ SNR ≤ 50 dB) and measure the average throughput. In the good-to-bad condition, the SNR of the channel is decreased from 50 dB to 15 dB, whereas bad-to-good signifies an increase of the channel's SNR from 15 dB to 50 dB. The results are illustrated in Figs. 16(b) and 17. It can be noted that SmartLA adjusts significantly better to dynamic channel conditions. Thus, it is able to provide better throughput than the other competing schemes since it applies exploration and exploitation. SmartLA takes actions to change the state accordingly using SARSA. It stores information for different SNR values, which enables the system to cope with different channel conditions. The past information is stored in the statistic table in terms of Q-values. With the help of these values, SmartLA takes the best possible action to set the link parameter set after calculating the reward. Hence, SARSA enables SmartLA to intelligently select the best possible data rate depending on the present network condition. Fig. 17 demonstrates the dynamic adaptation of SmartLA considering different numbers of STAs. After the data transmission phase, when SmartLA stores information, a different number of STAs may be present in the network in the last


Fig. 14. Performance in terms of packet loss ratio.

Fig. 15. Performance in terms of packet delay.

Fig. 16. (a) Performance in terms of packet delay with respect to number of stations excluding Minstrel and Minstrel HT (zoom of Fig. 15(a)); (b) Performance in dynamic channel conditions where signal strength varies from good to bad and bad to good (#STA:1).


Fig. 17. (a) Performance in dynamic channel conditions where signal strength varies from good to bad and bad to good (#STA:10); (b) Performance in dynamic channel conditions where signal strength varies from good to bad and bad to good (#STA:20).

Fig. 18. Throughput comparison in dynamic channel condition with #STA:1.

transmission phase. This also helps to provide better performance for higher numbers of STAs. We also compute the instantaneous throughput for several values of the channel's SNR. The numbers of STAs are considered as 1, 10 and 20 in Fig. 18, Fig. 19 and Fig. 20 respectively. In this case, separate analyses are presented for the good-to-bad and bad-to-good scenarios. In this experiment, in the first 1 min, we vary the signal quality dynamically to help the system gain some experience about the wireless environment. After 1 min, we change the SNR of the channel every 10 s following the aforesaid good-to-bad and bad-to-good scenarios and capture the instantaneous throughput (a sketch of how such an SNR trace can be generated is given below). The intelligent adaptive learning of SARSA explores and exploits the wireless environment and also makes a good balance between these two approaches. As a result, a significantly better instantaneous throughput is achieved in SmartLA compared to the other competing schemes.
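The dynamic-channel traces described above can be reproduced with a few lines of code. The sketch below only mirrors the experiment description (a warm-up minute, then a 10 s step between 15 dB and 50 dB); the step size in dB and the random warm-up values are our own assumptions, not taken from the paper's simulation scripts.

```python
import random

def snr_trace(duration_s, good_to_bad=True, warmup_s=60, step_s=10,
              snr_min=15.0, snr_max=50.0, step_db=5.0):
    """Return a list of (time, SNR) samples: random SNR during the warm-up
    minute, then a monotonic sweep between snr_max and snr_min every step_s."""
    trace = []
    snr = snr_max if good_to_bad else snr_min
    for t in range(0, duration_s, step_s):
        if t < warmup_s:
            trace.append((t, random.uniform(snr_min, snr_max)))  # warm-up
            continue
        trace.append((t, snr))
        snr += -step_db if good_to_bad else step_db
        snr = max(snr_min, min(snr_max, snr))                    # clamp to range
    return trace

print(snr_trace(180, good_to_bad=True)[:8])
```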

7.8. Analysis of congestion window, slow start threshold and retransmission timeout

The TCP congestion window helps to control the transmitter-side flow of data transmission, which is based on the network capacity and channel condition. As the offered load increases rapidly, packet loss may also increase. The size of the TCP congestion window is controlled with the help of the slow start threshold (SST) and the retransmission timeout (RTO). We consider the number of STAs as 5 in the simulation experiments for the congestion window, SST and RTO when the X-axis does not specify the number of STAs. Here, Minstrel is not considered since it has the lowest performance in terms of throughput, PLR and delay among the compared mechanisms.

Congestion window: The change of the congestion window with respect to simulation time is shown in Fig. 21(a). SmartLA takes the necessary action to move to the best possible configuration state for the current channel condition. The statistic table provides the past experience (the performance of several configuration sets for different SNR values). This reduces the number of packet losses, which in turn yields a steady growth of the congestion window. Due to this reduction, it can be observed that the fluctuation of the congestion window of SmartLA is very low compared to SampleLite and Minstrel HT. The low PLR value also enables SmartLA to maintain a very high congestion window in the linear improvement states. Fig. 21(b) shows the effect of the number of STAs on the congestion window, with the SNR value set to 45 dB. As the number of


Fig. 19. Throughput comparison in dynamic channel condition with #STA:10.

Fig. 20. Throughput comparison in dynamic channel condition with #STA:20.

Fig. 21. Performance in terms of congestion window.


Fig. 22. Performance in terms of slow start threshold (SST).

wireless stations increases, the number of wireless links also grows. For example, one AP and n STAs create n links, where the AP is common. Hence, the average size of the congestion window decreases as the value of n increases (we consider the AP as the sender and all the STAs as the receivers). In SmartLA, the transmitter always tries to send data at the best possible data rate considering the channel's signal strength. An appropriate combination of c, m, g and a leads to the maximum data rate under the current network condition. For a given SNR value, searching for such a combination is the key part of our mechanism; it is executed intelligently by using SARSA and can cope with different wireless environments. Transmission with a high data rate as well as a low PLR helps SmartLA to achieve a larger congestion window than the other competing mechanisms.

Slow start threshold: The congestion avoidance phase executes as long as acknowledgements arrive before their timeouts. The slow start threshold supplies a threshold value for the congestion window up to which this window can grow exponentially. After reaching this threshold value, if there is still no packet loss in the wireless network, the congestion window can grow linearly. Otherwise, the threshold value is set to one half of the current value of the congestion window and the window size is set to the initial starting value. Hence, if packet loss and delay increase then the rate of change of SST also increases. In SmartLA, PLR and delay are very low compared to the other mechanisms, as demonstrated in Figs. 14 and 15. This helps to reduce the rate of change of SST in SmartLA. Hence, from Fig. 22(a) and (b), we can observe that the number of SST changes is much smaller in SmartLA than in SampleLite and Minstrel HT. As the number of wireless stations increases, PLR and delay also increase. In this situation, the PLR and delay in SmartLA are still very low compared to the competing mechanisms. Therefore, there are fewer changes of SST values in SmartLA than in SampleLite and Minstrel HT, as illustrated in Fig. 22(b).

Retransmission timeout: The retransmission timeout helps to identify packet loss in the wireless network and it can locate congested links. When a packet is transmitted, the RTO value is set for the packet. If the transmitter does not receive any acknowledgement of the packet within its RTO value then the packet is retransmitted. After the occurrence of each retransmission, the RTO value is doubled and the transmitter retries the transmission up to three times. We have arranged simulations with different RTOs to study the impact of RTOs on the

congestion window with respect to simulation time, as shown in Figs. 23, 24 and 25(a). In our experiments, we measure RTO in seconds. As packet delay increases, the number of packet retransmissions also increases because packets do not arrive within their assigned RTOs. Packet loss also produces unsuccessful transmissions and results in acknowledgements not arriving within their RTOs. It has already been shown in Figs. 14 and 15 that SmartLA provides much lower PLR and packet delay than the other mechanisms. Our mechanism achieves this through the selection of the best possible value of the tuple S for the current SNR of the channel. This selection is carried out intelligently by SmartLA by employing SARSA-based automated learning to learn the wireless environment. Fig. 23(a) demonstrates that a steady linear improvement of the congestion window is delayed for higher values of RTO since the waiting time before retransmission increases. In Figs. 23(b), 24 and 25(a), it can be observed that as RTO increases SmartLA produces a much larger congestion window than SampleLite and Minstrel HT. This is because the rate of exploitation in the ε-greedy policy increases in SmartLA as time passes.

Retransmission timeout and average throughput: We also evaluate the impact of RTO on the achieved average throughput. With the increase of RTO values, the waiting time before retransmission increases. Thus, the chances of receiving the acknowledgements of the delivered packets also increase. Hence, the average throughput of the overall network improves. This improvement is significantly better in SmartLA due to its adaptive selection of actions for setting the link parameters to form the best suited configuration set on the basis of the past experience, considering the present channel condition. The tuning improves (i.e., the probability of setting the best possible value of S increases) as time progresses, since the rate of exploitation grows, and thus the throughput is enhanced. With the increase of RTO, from Fig. 25(a) and (b), the rates of increase of the achieved average throughputs of SmartLA, SampleLite and Minstrel HT are 0.261%, 0.195% and 0.189% respectively. Hence, the automated learning methodology enables SmartLA to provide better throughput improvement with respect to the increase of RTO values. The change of average throughput with the increase of the RTO value is shown in Fig. 25(b). Here, we can observe that the intelligence of SmartLA in selecting the best suited data rate provides a higher rate of increase of average throughput than the other mechanisms.
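The SST and RTO rules referred to above can be captured in a toy model. The sketch below is a deliberately simplified bookkeeping of the behaviour described in this subsection (exponential growth up to SST, halving of SST on a timeout, RTO doubling with up to three retries); it is not the TcpWestwood implementation used in NS-3.

```python
class TcpStateModel:
    """Toy congestion-window bookkeeping following the rules described in Section 7.8."""
    def __init__(self, ssthresh=64, rto=0.5):
        self.cwnd = 1
        self.ssthresh = ssthresh
        self.base_rto = rto
        self.rto = rto
        self.retries = 0

    def on_ack(self):
        # Exponential growth below SST, linear growth above it.
        self.cwnd += self.cwnd if self.cwnd < self.ssthresh else 1
        self.rto, self.retries = self.base_rto, 0

    def on_timeout(self):
        # SST drops to half of the current window; the window restarts from 1.
        self.ssthresh = max(2, self.cwnd // 2)
        self.cwnd = 1
        if self.retries < 3:          # RTO doubles, at most three retries
            self.rto *= 2
            self.retries += 1
```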


Fig. 23. Performance in terms of (a) different RTOs and (b) RTO:0.05.

Fig. 24. Performance in terms of (a) RTO:0.5 and (b) RTO:1.0.

Fig. 25. Performance in terms of (a) Throughput and (b) Throughput enhancement with increase of RTO.



Fig. 26. Testbed Layout over the floor plan.

8. Testbed results

The performance of SmartLA has been analyzed over a 26 node IEEE 802.11ac testbed, with 6 access points (APs) and 20 client stations. We have carried out a 4 month long experiment to understand the performance of SmartLA over a real network testbed. The performance is evaluated under both static and mobile scenarios. In the static scenario, all the client devices connect to a single IEEE 802.11ac AP, whereas in the mobile scenario, a few volunteers roam around the HT-WLAN coverage zone with a fraction of the client devices. The detailed testbed configurations and test scenarios are described next.

8.1. Testbed setup

The AP distribution over the floor map is shown in Fig. 26. All the APs have overlapping connectivity with each other, forming a complete connectivity graph. The APs are Asus RT-AC3200 IEEE 802.11ac supported wireless routers and the STAs are IEEE 802.11ac Asus USB-AC56 client boards (14 units) or Motorola Moto-X smart-phones (6 units) that support IEEE 802.11ac connectivity. The APs are configured with 3 × 3 multi-user MIMO with a peak data rate of 1300 Mbps per transmit-receive (TX-RX) antenna pair, at a 5 GHz channel with 80 MHz channel bandwidth. The testbed routers can be configured with three channel bonding levels: 20 MHz, 40 MHz and 80 MHz (at the 5 GHz band; at 2.4 GHz they support the 20 MHz and 40 MHz bonding levels). The client STAs are single antenna IEEE 802.11ac boards. The routers and client boards are equipped with the open-source asuswrt-merlin firmware built over Linux Kernel version 3.18. The smart-phones run Android version 5.1. It can be noted that in the setup shown in Fig. 26, all the wireless routers are within the communication range of each other. 50% of the STAs are placed uniformly across the floor within the coverage area of the APs, whereas the rest move with controlled mobility over the floor.

We have implemented SmartLA, SampleLite [8] and Minstrel HT [24] as loadable kernel modules (LKM) within the firmware of the Linux/Android kernel of the routers and client STAs. The mac80211 submodule under the net module of the Linux kernel contains the source for Minstrel and Minstrel HT, which we considered as the baseline for rate control. The proposed SmartLA and SampleLite modules are built over that and tune different MCS values based on the corresponding algorithms. We have added a kernel hook as well to select the underlying rate control algorithm using a user-level configuration parameter, to avoid kernel compilation every time a new protocol is used for testing. In the SmartLA implementation, the rate control module runs a background thread that collects system information (BER and FER) periodically and builds up the learning module. The learning output is stored in a file that contains the configuration sets explored so far and the observations of their performance under the different SINR values observed so far. The rate control module uses the ε-greedy policy to choose the next configuration set based on the current observations, as discussed in the proposed methodology.
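The user-space side of this design can be pictured with the following sketch. It is illustrative only: the file path, sampling period, record layout and helper name collect_stats are our own placeholders, and the real module is a kernel-space component rather than a Python thread.

```python
import json
import threading
import time

LOG_PATH = "/tmp/smartla_experience.json"   # placeholder path, not the real one
PERIOD_S = 1.0                              # assumed sampling period

def collect_stats():
    """Placeholder for reading BER/FER/SNR from the driver; returns dummy values."""
    return {"ber": 0.01, "fer": 0.05, "snr": 42.0}

def collector_loop(stop_event, experience):
    # Periodically sample link statistics and append them to the on-disk log,
    # mimicking the background thread of the SmartLA rate control module.
    while not stop_event.is_set():
        experience.append(collect_stats())
        with open(LOG_PATH, "w") as f:
            json.dump(experience, f)
        time.sleep(PERIOD_S)

stop = threading.Event()
log = []
t = threading.Thread(target=collector_loop, args=(stop, log), daemon=True)
t.start()
time.sleep(3)   # let a few samples accumulate
stop.set()
```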

8.1.1. Test scenarios

We have used iperf to generate continuous TCP and UDP traffic flows from the wireless client STAs. We evaluate the performance under both static and mobile scenarios. In the mobile scenarios, client STAs (the smart-phones, primarily) roam around the floor within the communication range of the APs. We recruited 10 undergraduate students as volunteers for this experiment. They carried the client devices and roamed around the test region for approximately 4 h each day. For this experiment, we have used both the 2.4 and 5 GHz channel bands. It can be noted that the normal campus wireless network (Wi-Fi) works in the 2.4 GHz band. Therefore, the possibility of external interference is high when 2.4 GHz is used. The performance data for each individual protocol is collected for two weeks, and the average as well as the deviation of the performance parameters are used to plot the graphs for analysis.

8.1.2. Analysis for UDP traffic

We analyze UDP performance under four different scenarios – (a) all nodes are static and operate in a 5 GHz channel (Fig. 27), (b) all nodes are static and operate in a 2.4 GHz channel (Fig. 28), (c) 50% of the client nodes are mobile and all nodes operate in a 5 GHz channel (Fig. 29) and (d) 50% of the client nodes are mobile and all nodes operate in a 2.4 GHz channel (Fig. 30). We measure both the STA throughput with respect to the UDP data generation rate and the AP throughput with respect to time. In the static scenario, all 20 STAs are within the coverage of a single AP and at 45 min we removed 5 nodes, i.e., the total number of client STAs was 15 after 45 min. During the computation of AP throughput, we saturated the STA data generation rate (every client generates data at a rate of 1 Mbps). In the mobile scenario, all 6 APs were in place and 50% of the clients (primarily the smart-phones) roamed around the coverage area. The salient observations from the testbed results are as follows.

• Similar to the simulation results, SmartLA provides significantly better throughput compared to SampleLite and Minstrel HT.
• During the static testbed experiment, whenever we reduce the number of clients from 20 to 15 (at the 45th min, the highlighted region of Figs. 27 and 28), there is a sharp drop in the average AP throughput for Minstrel HT and SampleLite. This is because the configuration cost for SampleLite and Minstrel HT is significantly higher, as they explicitly search for the required configuration parameters from the available parameter set. As a consequence, the convergence time for finding the suitable link parameters is higher for SampleLite and Minstrel HT, compared to SmartLA, which uses its previous knowledge to figure out the required link parameters. Therefore, we observe that SmartLA is more stable compared to SampleLite and Minstrel HT.
• Consequently, we observe instability in throughput under the mobile scenario in the case of SampleLite and Minstrel HT. SmartLA is comparatively more stable and provides higher throughput even in the mobile scenario.
• The performance of the link adaptation protocols is worse in the 2.4 GHz band, because of (a) higher external interference


Fig. 27. UDP performance over static 5 GHz frequency band (all nodes are static).

Fig. 28. UDP performance over static 2.4 GHz frequency band (all nodes are static).

Fig. 29. UDP performance over mobile 5 GHz frequency band (50% of client nodes are mobile).


Fig. 30. UDP performance over mobile 2.4 GHz frequency band (50% of client nodes are mobile).

Fig. 31. TCP performance over static clients.

from the institute campus wireless networks and (b) the 2.4 GHz mode does not support 80 MHz channel bonding.

8.2. Analysis for TCP traffic

To analyze the TCP performance under the four different scenarios stated earlier, we generated TCP traffic using iperf for one hour per day for each of the link adaptation mechanisms, SampleLite, Minstrel HT and SmartLA, and continued the observations for two weeks. The per-day plots of TCP goodput for the four different scenarios are shown in Figs. 31 and 32. Similar to the earlier observations, SmartLA provides better TCP goodput compared to SampleLite and Minstrel HT. Here, we compute the TCP goodput at the AP and average it over multiple APs and time, whenever applicable. It is interesting to observe that SampleLite performs quite poorly at 2.4 GHz, when the external interference is high. A thorough analysis of the system log revealed that under the high interference scenario, SampleLite fails to choose the proper channel bonding level. While it selects 40 MHz bonding, resulting in higher data loss in the high interference scenario, the proposed SmartLA accurately tackles this situation by observing the loss rate (both BER and FER). Therefore, it significantly improves the performance even in high interference scenarios.

9. Discussion

In a nutshell, both the simulation analysis and the testbed results reveal that the proposed SmartLA mechanism significantly improves the HT-WLAN performance by accurately selecting the link parameters based on the channel condition. While the proposed mechanism may seem theoretically complex (as it uses a machine learning approach in a real-time decision environment), implementation-wise it is simple enough and directly follows the steps given in Algorithm 1. The algorithm observes the BER (directly obtained from the hardware abstraction layer of the wireless firmware) and the FER (computed by exploiting the DATA-ACK sequence of the IEEE 802.11 MAC) for different configuration sets under different SNR buckets and maintains a local log of its earlier experience to execute the learning steps given in Algorithm 1. A related question may arise as to why we do not use the learning log of one device to initiate the execution of another device, which may potentially solve the cold-start problem during boot-up. It can be noted that if we consider the complete learning log under all different SNR combinations, the log size increases. In a general environment, the SNR fluctuation depends on the particular scenario (such as the number of neighboring wireless devices, the coverage of different devices, other wireless devices in the vicinity, etc.) and


Fig. 32. TCP performance over mobile clients.

remains almost fixed for a given locality unless a major change of network deployment planning occurs there. As we observed that the learning rate of our system is high enough, we designed the system to cater to dynamic behaviors by learning its own environment. From the testbed logs, we observed that the learning logs (the statistics table E) for different devices differ significantly, although the size of the individual learning log at a single device is quite small (it never exceeded 30 entries per table in our testbed). Therefore, as a design decision, we let the individual devices explore and exploit the statistics table from scratch, based on the SNR variation they observe. In summary, we can say that the proposed SmartLA link adaptation mechanism shows good promise to improve application performance by dynamically selecting the link configuration set; however, we believe that the system needs to be tested thoroughly under a large practical deployment, as there is always a concern about how such a machine learning algorithm scales in a real-time environment. Nevertheless, we believe that this work can boost research in this domain, where learning algorithms can be used to make systems more intelligent and robust to dynamic scenarios.

10. Conclusion

Experiencing high throughput in a practical wireless network is a big challenge and dynamic link adaptation plays an important role in this regard. In this paper, we propose and develop SmartLA, an intelligent model that adaptively performs dynamic link adaptation in HT-WLANs. Being based on SARSA reinforcement learning, the proposed mechanism considers enhanced PHY/MAC features of IEEE 802.11n/IEEE 802.11ac, such as channel bonding, advanced MCS levels, SGI and different levels of frame aggregation, to create the link configuration set. This scheme employs the SNR of the channel as a measurement of the channel condition. Applying SARSA-based SmartLA, a wireless system stores its past experience (the performance of the evaluated configurations) in a statistic table. This table is used in turn to select the best suited data rate by exploiting the best configuration set for the current channel condition and also by exploring any new configuration set which has not been experienced so far. From the simulation and testbed analysis, it is shown that SmartLA performs considerably better than SampleLite, and it can significantly boost the overall network performance from several perspectives compared to both Minstrel and Minstrel HT,

which are the default link adaptation schemes for Linux-based systems.

Acknowledgment

This work is supported by the Innovative Research and Development Program (ISIRD), funded by Sponsored Research and Industrial Consultancy, IIT Kharagpur (IITKGP/SRIC/ISIRB/2014-2015).

References

[1] IEEE standard for information technology– local and metropolitan area networks– specific requirements– part 11: wireless LAN medium access control (MAC) and physical layer (PHY) specifications amendment 5: enhancements for higher throughput, IEEE Std 802.11n-2009 (Amendment to IEEE Std 802.11-2007 as amended by IEEE Std 802.11k-2008, IEEE Std 802.11r-2008, IEEE Std 802.11y-2008, and IEEE Std 802.11w-2009) (2009) 1–565.
[2] E. Perahia, M.X. Gong, Gigabit wireless LANs: an overview of IEEE 802.11ac and 802.11ad, ACM SIGMOBILE Mobile Comput. Commun. Rev. 15 (3) (2011) 23–33.
[3] IEEE standard for information technology– local and metropolitan area networks– specific requirements– part 11: wireless LAN medium access control (MAC) and physical layer (PHY) specifications amendment 4: enhancements for very high throughput for operation in bands below 6 GHz, 802.11ac-2013, IEEE Standard Inf. Technol. Telecommun. Inf. Exch. (2013) 1–425.
[4] V. Visoottiviseth, T. Piroonsith, S. Siwamogsatham, An empirical study on achievable throughputs of IEEE 802.11n devices, in: Proceedings of the 7th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks, IEEE, 2009, pp. 1–6.
[5] L. Deek, E. Garcia-Villegas, E. Belding, S.-J. Lee, K. Almeroth, Joint rate and channel width adaptation for 802.11 MIMO wireless networks, in: Proceedings of the 10th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, 2013, pp. 167–175.
[6] L. Deek, E. Garcia-Villegas, E. Belding, S.-J. Lee, K. Almeroth, Intelligent channel bonding in 802.11n WLANs, IEEE Trans. Mobile Comput. 13 (6) (2014) 1242–1255.
[7] M. Taki, M. Rezaee, M. Guillaud, Adaptive modulation and coding for interference alignment with imperfect CSIT, IEEE Trans. Wireless Commun. 13 (9) (2014) 5264–5273.
[8] L. Kriara, M.K. Marina, SampleLite: a hybrid approach to 802.11n link adaptation, ACM SIGCOMM Comput. Commun. Rev. 45 (2) (2015) 4–13.
[9] K. Hassine, M. Frikha, MAC aggregation in 802.11n: concepts and impact on wireless networks performance, in: Proceedings of the 2014 International Symposium on Networks, Computers and Communications, 2014, pp. 1–6.
[10] W.-J. Liu, C.-H. Huang, K.-T. Feng, P.-H. Tseng, Performance analysis of greedy fast-shift block acknowledgement for high-throughput WLANs, Wireless Netw. 20 (8) (2014) 2503–2519.
[11] S. Wu, W. Mao, X. Wang, Performance study on a CSMA/CA-based MAC protocol for multi-user MIMO wireless LANs, IEEE Trans. Wireless Commun. 13 (6) (2014) 3153–3166.
[12] D. Xia, J. Hart, Q. Fu, Evaluation of the Minstrel rate adaptation algorithm in IEEE 802.11g WLANs, in: Proceedings of the IEEE International Conference on Communications, 2013, pp. 2223–2228.
[13] G. Holland, N. Vaidya, P. Bahl, A rate-adaptive MAC protocol for multi-hop wireless networks, in: Proceedings of the 7th Annual International Conference on Mobile Computing and Networking, 2001, pp. 236–251.

[14] B. Sadeghi, V. Kanodia, A. Sabharwal, E. Knightly, Opportunistic media access for multirate ad hoc networks, in: Proceedings of the 8th Annual International Conference on Mobile Computing and Networking, 2002, pp. 24–35.
[15] A. Kamerman, L. Monteban, WaveLAN-II: a high-performance wireless LAN for the unlicensed band, Bell Labs Tech. J. 2 (3) (2002) 118–133.
[16] M.H. Manshaei, M. Lacage, C. Hoffmann, T. Turletti, On selecting the best transmission mode for WiFi devices, Wireless Commun. Mobile Comput. 9 (7) (2009) 959–975.
[17] J. Bicket, D. Aguayo, S. Biswas, R. Morris, Architecture and evaluation of an unplanned 802.11b mesh network, in: Proceedings of the 11th Annual International Conference on Mobile Computing and Networking, 2005, pp. 31–42.
[18] I. Pefkianakis, Y. Hu, S.H. Wong, H. Yang, S. Lu, MIMO rate adaptation in 802.11n wireless networks, in: Proceedings of the Sixteenth Annual International Conference on Mobile Computing and Networking, 2010, pp. 257–268.
[19] D. Nguyen, J. Garcia-Luna-Aceves, A practical approach to rate adaptation for multi-antenna systems, in: Proceedings of the 19th IEEE International Conference on Network Protocols, 2011, pp. 331–340.
[20] MadWifi: multiband Atheros driver for WiFi, http://sourceforge.net/projects/madwifi/.
[21] Q. Xia, M. Hamdi, K. Ben Letaief, Open-loop link adaptation for next-generation IEEE 802.11n wireless networks, IEEE Trans. Veh. Technol. 58 (7) (2009) 3713–3725.


[22] W.H. Xi, A. Munro, M. Barton, Link adaptation algorithm for the IEEE 802.11n MIMO system, in: Proceedings of the 7th International IFIP-TC6 Networking Conference, Singapore, 2008, pp. 780–791.
[23] K.-T. Feng, P.-T. Lin, W.-J. Liu, Frame-aggregated link adaptation protocol for next generation wireless local area networks, EURASIP J. Wireless Commun. Netw. 2010 (10) (2010).
[24] F. Fietkau, Minstrel HT: new rate control module for 802.11n, 2010, http://lwn.net/Articles/376765/.
[25] ath9k 802.11n wireless driver, http://linuxwireless.org/en/users/Drivers/ath9k.
[26] R. Karmakar, S. Chattopadhyay, S. Chakraborty, Dynamic link adaptation for high throughput wireless access networks, in: 2015 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS), IEEE, 2015, pp. 1–6.
[27] J. Herzen, H. Lundgren, N. Hegde, Learning Wi-Fi performance, in: Proceedings of the 2015 12th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), 2015, pp. 118–126.
[28] A. Rico-Alvarino, R.W. Heath, Learning-based adaptive transmission for limited feedback multiuser MIMO-OFDM, IEEE Trans. Wireless Commun. 13 (7) (2014) 3806–3820.
[29] C. Watkins, Learning from Delayed Rewards, PhD thesis, University of Cambridge, Cambridge, England, 1989.