Signal Processing: Image Communication 27 (2012) 722–736
Scalable video transmission over multi-hop wireless networks with enhanced quality of experience using swarm intelligence
Pejman Goudarzi
Information Technology Department of Research Institute for Information and Communication Technology (ITRC), Teheran, Iran
Article info
Article history: Received 30 August 2011; Accepted 14 May 2012; Available online 26 May 2012
Abstract
In this work, we take advantage of the particle swarm optimization method, which belongs to the family of swarm intelligence algorithms, to find improved solutions for delivering digital video content with enhanced quality of experience to end users over error-prone multi-hop wireless networks. In video transmission over such networks, many network-based (packet loss, delay, etc.) and source-based (encoding quantization level, etc.) parameters can impair the perceived video quality. The main contributions of the proposed work are twofold. First, an optimal bandwidth allocation framework is developed based on the particle swarm optimization algorithm, in which the total weighted quality of experience of several competing video sources is optimized by incorporating an accurate video quality metric. Second, these optimal rates are used for differentiated quality of experience enforcement between multiple competing scalable video sources. The resulting optimal rates can be used as rate feedback for on-line rate adaptation of a moderate scalable video encoder such as H.264/MPEG4 AVC. The weight parameters are selected based on the importance of each video sequence's quality and can be associated with prices agreed in a prior service level agreement. Some guidelines on the practical implementation of the proposed algorithm are given. Numerical analysis has been performed to validate the theoretical results and to verify the claims. © 2012 Elsevier B.V. All rights reserved.
Keywords: Quality of experience; Swarm intelligence; Particle swarm optimization; Bandwidth allocation; Scalable video
1. Introduction
Many difficult non-linear engineering problems have found natural solutions inspired by the biological behavior of living organisms. Important examples include neural networks, DNA computing and artificial immune systems. Swarm intelligence (SI)-inspired optimization techniques are among the most important tools for finding globally optimal solutions of many non-linear engineering problems. In these methods, the gradient of an optimization problem is unknown, perhaps because it cannot be
This work was supported by the Research Institute for ICT (ITRC).
0923-5965/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.image.2012.05.004
defined due to a partially discontinuous fitness function, or because the fitness measure changes over time, or perhaps even because the fitness is undefined for certain regions of the search space. One important example of these gradient-free methods is the well-known particle swarm optimization (PSO) algorithm. PSO is a multi-agent heuristic optimization method due to Kennedy and Eberhart [1]. It was originally intended for simulating the social behavior of a bird flock, but the algorithm was simplified once it was realized that the agents (here typically called particles) were actually performing black-box optimization. In PSO the population of particles is typically called a swarm. The particles are initially placed at random positions in the search space, moving in randomly defined directions. The direction of a particle
P. Goudarzi / Signal Processing: Image Communication 27 (2012) 722–736
is then gradually changed so that it starts to move in the direction of the best previous positions of itself and its peers, searching in their vicinity and potentially discovering even better positions [2,3].
Quality of experience (QoE) is an essential feature of successful video content delivery to end users. The main concern of most telecommunication operators regarding their video-based services is video service assurance. Competition between service providers is tight, and the key to success is to provide a service with the highest possible level of user satisfaction (QoE level). Digital video data, which is stored in video databases and distributed through communication networks, is subject to various kinds of distortion during acquisition, compression, processing, transmission and reproduction. For example, lossy video compression techniques, which are almost always used to reduce the bandwidth needed to store or transmit video data, may degrade the quality during the quantization process. As another example, digital video bit streams delivered over error-prone channels, such as wireless channels, may be received imperfectly because of impairments that occur during transmission. Packet-switched communication networks, such as the Internet, can cause loss or severe delay of received data packets, depending on the network conditions or the quality of service parameters. All of these transmission errors may result in distortions in the received video data. It is therefore imperative for a video delivery system to be able to recognize and quantify the video quality degradations that occur in the system, so that it can maintain, control and enhance the quality of the video data. Effective image and video quality metrics are crucial for this purpose. Recently, researchers such as Calyam et al. [4] have proposed a suitable on-line metric for quality assessment of multimedia sequences.
They have presented a novel framework that can provide acceptable on-line estimates of video QoE on network paths without end-user involvement and without requiring any reference video sequence. Their framework features the good, acceptable or poor (GAP) model, an approximate model of QoE expressed as a function of measurable network parameters such as bandwidth, delay, jitter and loss. Using the GAP model, their on-line framework can produce video QoE estimates in terms of good, acceptable or poor grades of perceptual quality solely from the network conditions measured on-line, and these estimates are very close to the subjective mean opinion score (MOS) [5].
Scalable video coding (SVC) standardizes the encoding of a high-quality video bit stream that also contains one or more subset bit streams. A subset video bit stream is derived by dropping packets from the larger video to reduce the bandwidth required for the subset bit stream. The subset bit stream can represent a lower spatial resolution (smaller screen, or spatial scalability), a lower temporal resolution (lower frame rate, or temporal scalability), or a lower quality video signal (quality or SNR scalability). Hence, scalable video coding provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation
[6]. H.264/MPEG-4 AVC, through its scalable extension, is an example of a scalable codec; it was developed jointly by ITU-T and ISO/IEC JTC 1. The desire for scalable video coding, which allows on-the-fly adaptation to application requirements such as the display and processing capabilities of target devices and varying transmission conditions, originates from the continuous evolution of receiving devices and the increasing use of transmission systems characterized by widely varying connection quality.
Multi-hop wireless networks are computer networks in which the communication links are wireless. In these networks, each node is willing to forward (or relay) data for other nodes, so the determination of which nodes forward data is made dynamically based on the network connectivity. Wireless networks can also form a network in a self-organized manner without the aid of any pre-established infrastructure [7,8]. A specific set of quality of service (QoS) requirements (delay, jitter, packet loss, etc.) must be guaranteed for each real-time traffic flow transmitted over such wireless networks. However, for most real-time applications of wireless networks, intrinsic and possibly large levels of interference or collisions in the physical or link layers, caused by radio transmission, media access protocols or time-varying topological changes, make it challenging to guarantee these stringent QoS requirements. Some of the routing protocols used in wireless networks introduce more than one feasible path for a source–destination pair. This category of routing algorithms is called multipath routing [9]. Multipath routing schemes can reduce interference, improve connectivity and allow distant nodes to communicate efficiently [9]. In [10,11] a congestion-minimized stream routing approach is adopted.
In [11] the authors analyse the benefits of an optimal multipath routing strategy that seeks to minimize congestion for video streaming in a bandwidth-limited ad hoc wireless network. They also predict the performance in terms of rate and distortion, using a model that captures the impact of quantization and packet loss on the overall video quality. They show that in such environments the optimal routing solutions that seek to minimize congestion are attractive because they use resources efficiently. For low-latency video streaming, they propose limiting the number of routes to overcome the limitations of such solutions. Researchers such as Agarwal and Goldsmith [12], Adlakha et al. [13] and Zhu et al. [14] follow congestion-aware and delay-constrained rate allocation strategies. Agarwal and Goldsmith [12] introduce a constrained convex optimization framework with which they jointly perform rate allocation and routing in a delay-constrained wireless ad hoc environment. Adlakha et al. extend the conventional layered resource allocation approaches by introducing a novel cross-layer optimization strategy that performs resource allocation more efficiently across the protocol stack and among multiple users. They show that their proposed method can support multiple simultaneous delay-critical applications such as multi-user video streaming [13].
For multipath video streaming over wireless networks, received video quality is influenced by both the encoder performance and the delayed packet arrivals due to limited bandwidth; hence, Zhu et al. propose a rate allocation scheme to optimize the expected received video quality based on simple models of encoder rate–distortion performance and network rate–congestion trade-offs [14]. As the quality of the wireless link varies, the video transmission rate needs to be adapted accordingly. In [15], measurements of packet transmission delays at the media access control (MAC) layer are used to select the optimal bit rate for video transmission. The benefit of cross-layer signaling in rate allocation has also been demonstrated in [16], where adaptive rate control at the MAC layer is applied in conjunction with adaptive rate control during live video encoding. In [17], the authors propose a distributed rate allocation algorithm that minimizes the total distortion of all video streams. Based on the sub-gradient method, their scheme requires only link price updates at each relay node based on local observations, and rate adaptations at each source node derived from rate–distortion (R–D) models of the video. They show by simulation that their scheme can achieve the same optimal rate allocation as that obtained from exhaustive search.
In this work, the evolutionary PSO algorithm is applied to improve the perceived quality of video applications delivered over wireless networks. The work presented in this paper differs from all of the previously mentioned works in adopting bio-inspired optimization techniques for the enhanced-quality video delivery problem. Specifically, this work differs from that of [17] in that each video source is assumed to be able to use multipath routing for partitioning and transmitting its total video traffic.
On the other hand, the presented work differs from that of [18] in that the effect of packet loss on the perceived video quality is quantified using the so-called GAP model of [4]. Another important difference from works similar to [18] is that here we are not limited by simplifying assumptions for guaranteeing convergence to the optimal solutions (such as the existence of a strict, pre-defined upper bound on the total packet error probability of each video session, and special routing conditions for the traffic paths of multiple competing scalable video sources), because these assumptions may not be met in general network conditions. In other words, in the current work the packet error probability of each video session may take any arbitrary value, and there is no restriction on the traffic path selection process of the competing scalable video sources, yet the algorithm can still converge to the optimal point. Moreover, in contrast with previous works such as [18], the proposed optimal rate allocation method does not depend strictly on the exact mathematical packet error probability (PEP) model derived from the Poisson packet arrival assumption [19], which may not be valid in general for video traffic; only some estimates of the PEP level with
good accuracy are needed for running the proposed PSO algorithm. The presented work also differs from [11,12] in that the GAP model is used as an objective QoE measure in place of the QoS criterion used in [11,12]. Furthermore, another important difference between the current approach and that of [11,17] is that the GAP-inspired QoE metric is optimized in the presented work, whereas in [11,17] the authors use the video distortion as a quality measure. As discussed in [20] and the references therein, objective distortion (and peak signal to noise ratio (PSNR)) metrics are not good candidates for representing video QoE. The presented work also differs from [12] in considering more than one (and possibly interfering) multipath-routed video sources that compete for the available bandwidth in a bandwidth-limited wireless network. In order to compute the total weighted QoE, it is assumed that multiple video sources use the same wireless medium for transmission and that their associated weighted QoEs are additive [11]. Weight parameters are selected based on the importance of each video source and can be associated with service level agreement prices in a game-theoretic service delivery perspective.
In summary, the paper's main contributions are twofold. First, an optimal rate allocation strategy is developed based on the PSO algorithm. Second, these optimal rates are used for differentiated QoE enforcement between multiple competing scalable video sources. The rest of the paper is organized as follows. In Section 2 the proposed PSO framework is introduced in detail. Section 3 is devoted to the numerical analysis, and finally Section 4 presents some concluding remarks.
2. Proposed optimization framework
This section is composed of four subsections. In Section 2.1, a brief overview of the wireless medium and its assumptions is given.
In Section 2.2, the PSO optimization framework is introduced, and in Section 2.3 the proposed optimization problem (which is inspired by PSO) is introduced. Finally, in Section 2.4 some guidelines are given for the practical implementation of the rate adaptation/control mechanism associated with the proposed PSO method.
2.1. Wireless medium description
Consider the multi-hop wireless network depicted in Fig. 1. Assume that there exist $N$ video sources and that the underlying multipath routing protocol periodically introduces $n_k$ node-disjoint multi-hop paths between each source–destination pair $(S_k, D_k)$, $1 \le k \le N$. Each path is associated with a traffic flow, and these inverse-multiplexed flows are aggregated at the destination node to reproduce the initial source-generated traffic stream. In Fig. 1, C stands for the bottleneck link. Each path $j$ of source $k$ contains $M_{jk}$ wireless links from source to destination, for $1 \le k \le N$ and $1 \le j \le n_k$.
Table 1
Typical parameter values.
Parameter        Definition                        Value
$r$              Nodes' total transmission rate    $10^{6}$ bps
$F$              Noise figure                      6 dB
$b$              Boltzmann constant                $1.38 \times 10^{-23}$ J/K
$\tilde{N}$      Number of nodes                   16
$\rho_s$         Nodes' spatial density            1 m$^{-2}$
$T$              Nodes' transmission power         1 mW
$\alpha$         Friis constant                    $10^{-3}$
$L$              Frame length                      1000 bits
$T_0$            Temperature                       300 K
$z$              Correction factor                 0.55
$D_A(16)$        Geometry indicator                8.1
$\lambda$        Packet rate                       0.5 packets/s
Fig. 1. Two competing multipath-routed video sources.
In this work, the communication-theoretic framework developed in [18] is extended to analyse a realistic mobile ad hoc wireless networking scenario, taking into account the inter-node interference (INI). In the rest of this paper, as in [7], it is assumed that the packet transmission process is Poisson with parameter $\lambda$. Hence, the average inter-arrival time between two consecutive video packets is $1/\lambda$. Generally speaking, the video packet generation process usually follows a heavy-tailed distribution such as a Markov modulated Poisson process (MMPP) [21,22]. An MMPP is a doubly stochastic Poisson process whose rate $\lambda$ varies according to a Markov process. Incorporating this traffic generation model makes the derivation of the bit error rate (BER) formula very complicated. However, using powerful parameter estimation techniques such as expectation maximization (EM) or maximum likelihood (ML) [23], we can estimate the Markov modulated parameters; if we assume that each state lasts longer than the convergence time of the rate allocation algorithm presented in this work, the BER model presented in (1) can still be used between state transitions. When this assumption does not hold and the packet generation/transmission process must be described by a pure Poisson distribution, many scalable video encoders such as H.264/AVC can adapt to these limiting network conditions, for example by employing temporal scalability features (dropping some frames); i.e., the encoder may transmit a frame only when the lower layers offer an opportunity for packet transmission. As partly depicted in Fig. 1, it is assumed that the wireless nodes are distributed in a square grid scenario with side length $a$ and that the node spatial density is $\rho_s$.
It is assumed that the nodes are omnidirectional and can move within the area of the square grid. The MAC frame length is $L$ bits and, as in [7], it is assumed that a simple MAC protocol such as REServe and GO (RESGO) has been implemented. It is also assumed that $R_{jk}$ is the (non-empty) set of wireless links associated with the $j$th flow of the $k$th video source. $\alpha$ is defined as the Friis constant [24], $F$ as the noise figure, $b$ as the Boltzmann constant, $T_0$ as the room temperature, $z$ as a correction factor to account for the average interference power, and $T$ and $r_{ijk}$ as the transmitted power and the total transmission rate associated with the $i$th link in the $j$th flow of the $k$th source, respectively. $D_A(\tilde{N})$ is defined as the nodes' geometric layout indicator and the number of nodes is $\tilde{N}$. It is assumed that the transmission power $T$ is identical for all nodes. Assumed values for these parameters are listed in Table 1. Based on [7], under the assumption of strong multipath fading caused by node mobility, the BER of link $i$ in the $j$th path of the $k$th video source can be approximated as follows:
$$b_{ijk} = \frac{1}{2}\left(1 - \sqrt{\frac{\alpha \rho_s T}{F b T_0 r_{ijk} + z P_{INI} + \alpha \rho_s T}}\right) \tag{1}$$
where $1 \le j \le n_k$, $1 \le i \le M_{jk}$, $1 \le k \le N$. $P_{INI}$ is the average interference power; in a scenario with square grid topology it can be defined for small traffic loads ($\lambda$) as follows [7]:
$$P_{INI} \approx \alpha \rho_s T \, \frac{\lambda L}{r_{ijk}} \, D_A(\tilde{N})$$
$D_A(\tilde{N})$ depends on the geometry of the nodes' distribution and also on the number of nodes $\tilde{N}$. Assume that $R_{jk}$ can be partitioned into two disjoint subsets. One subset, denoted by $R^{c}_{jk}$, contains the wireless links that are shared by different video sources (it is assumed that this subset is non-empty for at least one $j$); the other, denoted by $R^{nc}_{jk}$, contains the non-common wireless links. The set cardinality operator is represented by $|\cdot|$, so $|R_{jk}| = M_{jk}$. Also assume that $|R^{nc}_{jk}| = O_{jk}$, and thus $|R^{c}_{jk}| = M_{jk} - O_{jk}$. The rate $r_{ijk}$ consists of two components: the traffic rate allocated to the $j$th flow of the $k$th source, denoted by $x_{jk}$, and the time-varying cross (background) traffic of the $i$th link, $a_{ijk}$. Thus, we have
$$r_{ijk} = x_{jk} + a_{ijk} \quad \forall k, j, i \in R_{jk}$$
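As a rough numerical illustration of Eq. (1) and the interference approximation, the following Python sketch evaluates the link BER for the typical values of Table 1. The function name `link_ber` and its keyword arguments are hypothetical. Converting the 6 dB noise figure to a linear factor and expressing the transmission power in watts (1 mW = 1e-3 W) are assumptions made here, not statements from the source.

```python
import math


def link_ber(r_ijk, *, alpha=1e-3, rho_s=1.0, T=1e-3, F_dB=6.0,
             b=1.38e-23, T0=300.0, z=0.55, lam=0.5, L=1000, D_A=8.1):
    """Approximate BER of one link, following Eq. (1).

    r_ijk is the total transmission rate on the link in bps. Defaults follow
    Table 1; unit handling (W, linear noise figure) is an assumption.
    """
    F = 10.0 ** (F_dB / 10.0)                 # noise figure: dB -> linear (assumed)
    signal = alpha * rho_s * T                # alpha * rho_s * T term
    p_ini = signal * (lam * L / r_ijk) * D_A  # average inter-node interference power
    thermal = F * b * T0 * r_ijk              # thermal noise term F * b * T0 * r_ijk
    return 0.5 * (1.0 - math.sqrt(signal / (thermal + z * p_ini + signal)))


if __name__ == "__main__":
    # Total link rate = allocated rate x_jk plus cross traffic a_ijk.
    x_jk, a_ijk = 0.6e6, 0.4e6
    print(link_ber(x_jk + a_ijk))
```

With these defaults the interference term dominates the thermal noise term by several orders of magnitude, so in this sketch the BER is driven almost entirely by the ratio $\lambda L / r_{ijk}$.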
$C_{ijk}$ is the capacity of link $i$ in the $j$th path of the $k$th video source. Hence, the available capacity, denoted by $e_{ijk}$, is $e_{ijk} = C_{ijk} - a_{ijk}$. In some cases (as depicted in Fig. 1), two or more multipath video sources may compete for a common wireless link (shown by a bold line in Fig. 1). Therefore, the available capacity of the common link must be shared between the competing flows in an optimal manner. Assume that for each common link $i \in R^{c}_{jk}$ there exists an associated set $S_{ijk}$ of all ordered pairs (path, source) that use the common link $i$ in path $j$ of source $k$ (for example, in Fig. 1, path 1 of source 2 shares the common link C with path 2 of source 1). For common links it is assumed that the background traffic is composed only of the flows in $S_{ijk}$, i.e. we can write
$$a_{ijk} = \left(\sum_{(u,v) \in S_{ijk}} x_{uv}\right) - x_{jk} \quad \forall k, j, i \in R^{c}_{jk}$$
Considering the link's BER in (1), the total bit error rate along the $j$th path of the $k$th source can be calculated as follows:
$$B_{jk} = 1 - \prod_{i=1}^{M_{jk}} (1 - b_{ijk}) \quad \forall j,k \tag{2}$$
$p_{jk}$ is the PEP associated with the $j$th path of video source $k$. If the error correction capability induced by forward error correction (FEC) for a frame of length $L$ bits is $M$ bits ($M > 1$), the wireless link-related PEP along the $j$th path (flow) of the $k$th source can be calculated as
$$p_{jk} = 1 - \sum_{m=0}^{M} \binom{L}{m} B_{jk}^{m} (1 - B_{jk})^{L-m} \quad \forall j,k$$
The total PEP of the source–destination pair $k$, under the assumption of independent path packet losses, can be written as
$$p_T^k = 1 - \prod_{j=1}^{n_k} (1 - p_{jk}) \quad \forall k \tag{3}$$
Based on [7], Eq. (2) holds if the bit error rates on adjacent links are independent and no burst errors exist. Likewise, Eq. (3) holds if the packet error rates on adjacent paths are independent. Indeed, Eqs. (2) and (3) would be pessimistic upper bounds for the actual path's total bit error rate and the total packet error rate, respectively.
2.2. PSO algorithm
Let $f: \mathbb{R}^n \to \mathbb{R}$ be the fitness or cost function to be minimized. Let $Z$ be the number of particles in the swarm, each having a position $X_k \in \mathbb{R}^n$ in the search space and a velocity $V_k \in \mathbb{R}^n$. Let $Y_k$ be the best known position of particle $k$ and let $G$ be the best known position of the entire swarm. A basic PSO algorithm is then composed of the following two steps [25]:
(1) For each particle $k = 1, \ldots, Z$ do:
  - Initialize the particle's position with a uniformly distributed random vector: $X_k \sim U(b_{lo}, b_{up})$, where $b_{lo}$ and $b_{up}$ are the lower and upper boundaries of the search space and $U(\cdot,\cdot)$ is the uniform distribution vector.
  - Initialize the particle's best known position to its initial position: $Y_k \leftarrow X_k$.
  - If $f(Y_k) < f(G)$, update the swarm's best known position: $G \leftarrow Y_k$.
  - Initialize the particle's velocity: $V_k \sim U(-(b_{up} - b_{lo}), (b_{up} - b_{lo}))$.
Until a termination criterion is met (e.g., number of iterations performed, or adequate fitness reached), repeat:
(2) For each particle $k = 1, \ldots, Z$ do:
  - Create random vectors: $G_y, G_g \sim U(0, 1)$.
  - Update the particle's velocity:
    $$V_k(i+1) = \omega V_k(i) + \phi_y G_y \circ (Y_k(i) - X_k(i)) + \phi_g G_g \circ (G - X_k(i)) \tag{4}$$
    where the operator $\circ$ indicates element-by-element multiplication (i.e. the Hadamard matrix multiplication operator).
  - Update the particle's position by adding the velocity:
    $$X_k(i+1) = X_k(i) + V_k(i) \tag{5}$$
    Note that this is done regardless of improvement to the fitness.
  - If $f(X_k(i)) < f(Y_k(i))$ do: update the particle's best known position, $Y_k(i) \leftarrow X_k(i)$; if $f(Y_k(i)) < f(G)$, update the swarm's best known position, $G \leftarrow Y_k(i)$.
Now $G$ holds the best found solution.
The parameters $\omega$ (inertia weight), $\phi_y$ and $\phi_g$ (acceleration factors) are selected by the practitioner and control the behavior and efficacy of the PSO method. The optimal values of these parameters can be derived by a technique known as meta-optimization [3]. The conditions that must be imposed on the selection of these parameters to guarantee the asymptotic stability of the PSO algorithm are given in works such as [26].
2.3. Proposed method
In this paper, the main objective is to find an optimal rate allocation strategy that can maximize the total weighted QoE associated with multiple video sources. Thus, a mathematical formulation must be presented that can express the QoE associated with each video source in
terms of its allocated rate. After that, to solve the proposed optimization problem, the QoE optimization problem is converted into a PSO form. If we assume that the perceived video qualities are affected only by the allocated bandwidth and the packet loss during transmission, then, according to [4], these QoEs are a function of the allocated bandwidth and the PEP associated with each video source as follows:
$$Q_k \triangleq c_2 (p_T^k)^2 + c_1 p_T^k + c_0\, BW_k, \quad 1 \le k \le N \tag{6}$$
where $c_1 < 0$, $c_2$ and $c_0$ are positive constants, and $p_T^k$ and $BW_k$ are the total PEP and the allocated bandwidth associated with video source $k$, respectively. As mentioned in [4], these parameters are obtained by training the GAP-model parameters on several standard video sequences (with different contextual complexities) and their related subjective test results, which are derived from human perception based on the MOS criterion. Also, as mentioned in [4], although the formulation in (6) is derived by focusing on the H.263 video codec, which supports scalability, it can also be applied to other scalable video codecs such as MPEG2 and H.264. Note that, for each type of scalability (temporal, spatial and quality), improving the transmitted video quality means an increased number of enhancement layers, which translates into increased transmission bandwidth ($BW_k$). This relationship is explicitly indicated in Eq. (6). In this work, it is assumed that the wireless network size is small and that the delay and jitter parameters of [4] are in the so-called Good range. Hence, these parameters have been neglected in the final form of function (6). In fact, even if delay or jitter exists in the proposed model, it can still be assumed with good approximation that the error resulting from these parameters is a loss, and the proposed model can be used [12,18]. The constant factor used in the original GAP model is only a biasing parameter that establishes a lower bound on the perceived QoE, since subjects could not distinguish poor qualities below that level. Here it is assumed that the perceived QoE may be null, instead of having a positive minimum, under the worst network quality conditions (high loss ratios and scarce bandwidth). Thus, this constant parameter has not been considered either. In Fig. 2 the relation of Q, PEP and BW is depicted.
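The chain from link BERs to the QoE of one source, Eqs. (2), (3) and (6), can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the FEC capability `M=2` is an arbitrary example value (the text only requires $M > 1$), and the default coefficients use the S-MOS magnitudes that this section quotes from [4], with $c_1$ taken as negative as required above. The bandwidth unit expected by `qoe` is not stated in this section, so any unit used with it is an assumption.

```python
import math


def path_ber(link_bers):
    """Eq. (2): total BER of a path from its independent link BERs."""
    return 1.0 - math.prod(1.0 - b for b in link_bers)


def path_pep(B_jk, L=1000, M=2):
    """PEP of one path: a frame is lost if more than M of its L bits are in error."""
    p_ok = sum(math.comb(L, m) * B_jk ** m * (1.0 - B_jk) ** (L - m)
               for m in range(M + 1))
    return 1.0 - p_ok


def total_pep(path_peps):
    """Eq. (3): total PEP of a source over its independent paths."""
    return 1.0 - math.prod(1.0 - p for p in path_peps)


def qoe(p_total, bw, c0=0.0029, c1=-1.4947, c2=0.2918):
    """Eq. (6): quadratic in the total PEP, linear in the allocated bandwidth."""
    return c2 * p_total ** 2 + c1 * p_total + c0 * bw


if __name__ == "__main__":
    # Two paths with two links each; the BER values are made up for illustration.
    peps = [path_pep(path_ber([1e-4, 2e-4])), path_pep(path_ber([5e-4, 5e-4]))]
    print(qoe(total_pep(peps), bw=800.0))
```

Because $c_1 < 0$ dominates $2 c_2 p$ on $[0, 1]$ for these coefficients, the sketch yields a QoE that decreases as the total PEP grows and increases linearly with bandwidth.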
Fig. 2. QoE as a function of PEP and bandwidth (axes: Q, BW (Mbps), PEP).
The values of $c_0$, $c_1$ and $c_2$ are selected based on the so-called S-MOS model in [4] to be 0.0029, −1.4947 and 0.2918, respectively. Another important reason for selecting packet error probability and bandwidth for QoE estimation is that most video traffic in the network is highly sensitive to these two QoS parameters. Based on the above, the proposed total weighted QoE maximization problem can be formulated in the following constrained non-linear form:
$$\max \; Q_T \triangleq \sum_{k=1}^{N} x_k Q_k \tag{7}$$
subject to:
$$BW_k \triangleq \sum_{j=1}^{n_k} x_{jk} \ge x_{k,\min} \quad \forall k \tag{8}$$
$$0 \le x_{jk} \le \min_i (e_{ijk}) \quad \forall j, k, i \in R_{jk} \tag{9}$$
in which $x_{k,\min}$ is the minimum required bandwidth for the $k$th video source. The weighting parameter $x_k > 0$ differentiates between video sequences: the smaller this parameter, the less importance is given to the corresponding video sequence. These parameters can, for example, be associated with the prices that video users are willing to pay for different qualities (e.g., from fair to premium levels), based on pre-negotiated service level agreements (SLA). An important point is that even if the independence assumption between the path/link bit error rates does not hold, by maximizing the pessimistic quality of experience in (7) we can still be confident that an enhanced-quality video will be perceived by the end users in real scenarios. In other words, assume that $X^{*}$ is the optimal particle position vector resulting from running the PSO algorithm. It can easily be shown that $X^{*}$ is in fact a Pareto optimal solution vector [27]. The final step in the design of the proposed optimization method is matching the parameters of the optimization problem (7)–(9) with those of PSO, which is done as follows. $n$ is equivalent to $\sum_{k=1}^{N} n_k$. The fitness function $f: \mathbb{R}^n \to \mathbb{R}$ in the PSO is equivalent to $Q_T$. The number of particles $Z$ in the PSO is equivalent to the number of video sources $N$. Each particle $k$ in PSO is equivalent to a video source $k$, and the corresponding position vector $X_k$ in PSO is equivalent to the rate allocation vector. $X_k$ has dimension $1 \times n$ and is defined as
$$X_k \triangleq \big(\overbrace{0 \cdots 0}^{n_1 + n_2 + \cdots + n_{k-1}} \;\; x_{1k} \;\; x_{2k} \cdots x_{n_k k} \;\; \overbrace{0 \cdots 0}^{n_{k+1} + \cdots + n_N}\big)_{1 \times n}.$$
The position vector $X_k$ is written in this form because, after convergence of the PSO algorithm, the non-zero positions in the vector represent the optimal rates allocated to each traffic path of user $k$.
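The zero-padded layout of $X_k$ can be illustrated with a short Python sketch. The helper name `position_vector` and the 0-based source index are assumptions made for illustration only; the source numbers its video sources from 1.

```python
def position_vector(k, rates_k, n_paths):
    """Embed the per-path rates of source k into a length-n position vector.

    k        -- 0-based index of the video source (assumption; the paper is 1-based)
    rates_k  -- rates x_1k, ..., x_{n_k k} on the paths of source k
    n_paths  -- list [n_1, ..., n_N] with the number of paths per source
    """
    if len(rates_k) != n_paths[k]:
        raise ValueError("rates_k must contain one rate per path of source k")
    n = sum(n_paths)              # n = n_1 + ... + n_N
    offset = sum(n_paths[:k])     # number of leading zeros: n_1 + ... + n_{k-1}
    x = [0.0] * n
    x[offset:offset + n_paths[k]] = rates_k
    return x


if __name__ == "__main__":
    # Three sources with 2, 2 and 1 paths: the second source occupies slots 2 and 3.
    print(position_vector(1, [3.0, 4.0], [2, 2, 1]))
```

Since the non-zero blocks of different sources never overlap, adding the vectors of all sources element by element yields one length-$n$ vector that holds every path rate exactly once.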
$b_{lo}$ and $b_{up}$ for each non-zero element in the rate allocation vector of source $k$ are equal to 0 and $\min_i(e_{ijk})\ \forall i \in R_{jk}$, respectively. The best known particle position $Y_k$ has dimension $1 \times n$ and, for video source $k$, is initialized as follows.
For a path $j$ that is not shared with other video sources, $x_{jk}$ is set to $\min_i(e_{ijk})\ \forall i \in R_{jk}$. For a path $j$ that shares some links with other video sources, $x_{jk}$ is set to $\min\left(\min_i(e_{ijk})\ \forall i \in R^{nc}_{jk},\ \min_i(C_{ijk}/|S_{ijk}|)\ \forall i \in R^{c}_{jk}\right)$. The best known particle position is then simply initialized as $Y_k = X_k$. The best found swarm position vector $G = (g_1\ g_2 \cdots g_n)$ for the video sources is initialized by $G = \sum_{k=1}^{N} Y_k$. Assume that the network's available bandwidth is large enough that constraint (8) can be satisfied at all times. If constraint (8) cannot be met by the network, the result is reduced quality levels for the received video, but this has no effect on the PSO algorithm's search for the optimal quality point. As the allocated rates must lie in the constraint set (9), the projection operator $(\cdot)^{>}$ is applied to the elements of the rate allocation vector $X_k$ while the PSO algorithm runs. After these initializations, the standard steps of the PSO algorithm are performed.
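The basic PSO iteration of Section 2.2, combined with a projection onto box constraints of the form (9), can be sketched in Python as below. This is a generic illustration, not the author's implementation: it uses a conventional swarm of independent particles rather than the one-particle-per-source mapping described above, it realizes the projection as simple clipping to $[b_{lo}, b_{up}]$, and it adds the freshly updated velocity to the position (a common convention, whereas Eq. (5) as printed uses $V_k(i)$). The function names and the default parameter values are assumptions. To maximize a quantity such as $Q_T$ with this minimizer, one would pass its negative as `f`.

```python
import random


def clip(x, lo, hi):
    """Project x onto the box [lo, hi] element by element."""
    return [min(max(v, l), h) for v, l, h in zip(x, lo, hi)]


def pso_minimize(f, lo, hi, n_particles=20, n_iter=200,
                 omega=0.7, phi_y=1.5, phi_g=1.5, seed=0):
    """Basic PSO with box projection; returns (best position, best value)."""
    rng = random.Random(seed)
    dim = len(lo)
    span = [h - l for l, h in zip(lo, hi)]
    # Step (1): random positions and velocities, personal and swarm bests.
    X = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(n_particles)]
    V = [[rng.uniform(-s, s) for s in span] for _ in range(n_particles)]
    Y = [x[:] for x in X]
    fY = [f(y) for y in Y]
    g = min(range(n_particles), key=fY.__getitem__)
    G, fG = Y[g][:], fY[g]
    # Step (2): iterate velocity and position updates.
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                g_y, g_g = rng.random(), rng.random()   # random factors in U(0, 1)
                V[k][d] = (omega * V[k][d]
                           + phi_y * g_y * (Y[k][d] - X[k][d])
                           + phi_g * g_g * (G[d] - X[k][d]))
            # Move, then project back into the feasible box.
            X[k] = clip([x + v for x, v in zip(X[k], V[k])], lo, hi)
            fx = f(X[k])
            if fx < fY[k]:
                Y[k], fY[k] = X[k][:], fx
                if fx < fG:
                    G, fG = X[k][:], fx
    return G, fG


if __name__ == "__main__":
    best, value = pso_minimize(lambda x: sum(v * v for v in x),
                               lo=[-5.0, -5.0], hi=[5.0, 5.0])
    print(best, value)
```

Clipping keeps every candidate rate inside its bounds at each iteration, so the returned position is always feasible with respect to the box constraints even if the search has not fully converged.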
Table 2
Pseudo code for the proposed PSO algorithm.

Start program
Initialization phase:
  $f \leftarrow Q_T$; $Z \leftarrow N$
  $X_k \leftarrow (0\ 0 \cdots 0)_{1 \times n}$; $Y_k \leftarrow (0\ 0 \cdots 0)_{1 \times n}\ \forall k$
  $n \leftarrow \sum_{k=1}^{N} n_k$
  for $k = 1 : N$
    for $j = 1 : n_k$
      $b_{lo} \leftarrow 0$; $b_{up} \leftarrow \min_i(e^{i}_{jk})\ \forall i \in R_{jk}$
      if path $j$ is not shared with other video sources, then:
        $x_{jk} \leftarrow \min_i(e^{i}_{jk})\ \forall i \in R_{jk}$
      if path $j$ is shared with other video sources in some links, then:
        $x_{jk} \leftarrow \min\big(\min_i(e^{i}_{jk})\ \forall i \in R^{nc}_{jk},\ \min_i(C^{i}_{jk}/|S^{i}_{jk}|)\ \forall i \in R^{c}_{jk}\big)$
    end
  end
  for $k = 1 : N$
    $Y_k \leftarrow X_k$
    $V_k \sim U(-(b_{up} - b_{lo}),\ (b_{up} - b_{lo}))$
  end
  $G \leftarrow \sum_{k=1}^{N} Y_k$
Main program:
  ($M$ is a large positive number and $i$ is the iteration number)
  for $i = 1, \ldots, M$
    for $k = 1, \ldots, N$
      $\Gamma_y, \Gamma_g \sim U(0, 1)$
      (Update the particle's velocity)
      $V_k(i+1) = \omega V_k(i) + \phi_y \Gamma_y (Y_k(i) - X_k(i)) + \phi_g \Gamma_g (G - X_k(i))$
      (Update the particle's position by adding the velocity)
      $X_k(i+1) = (X_k(i) + V_k(i))^{>}$   ($(\cdot)^{>}$ is the projection operator)
      if $f(X_k(i)) < f(Y_k(i))$
        $Y_k(i) \leftarrow X_k(i)$
        if $f(Y_k(i)) < f(G)$
          $G \leftarrow Y_k(i)$
      if $f(G) \le \gamma$, go to Label   ($\gamma$ is a negative threshold)
    end
  end
Label: End program
* Now, the vector $G^{*}_{1 \times n}$ holds the best found solution.

For clarity, the steps of the proposed algorithm are described as pseudo code in Table 2. The overall time complexity of the iterative algorithm (4) and (5) is related to the product of the total number of additions and multiplications needed to realize the two iterations and the number of particles $N$. The complexity of each iteration of the iterative algorithm (4) and (5) for all particles is bounded and of order $O(nN)$. The main concern, however, is the number of iterations $M$ required for convergence. It is shown in [28] that the number of iterations needed for convergence is bounded by a slowly increasing function of the number of network nodes $\tilde{N}$. This time complexity can be greatly reduced by intelligent tuning of the optimization parameters $\omega$, $\phi_y$ and $\phi_g$, as mentioned in [3]. Another important factor in decreasing the run time of each iteration is the use of more powerful processors. In reality, due to factors such as node mobility, there may be estimation errors or uncertainties in some of the parameters (e.g., link capacities) of the constrained optimization problem (7)–(9). As a result, an optimal and unique solution may be hard to derive, or may not be reached at all, by the proposed iterative algorithm in (4) and (5). Hence, some modifications must be applied to the proposed method. One possible solution is to use fast capacity estimation methods such as those in [29,30]. In such dynamic environments, for the ease of implementing the proposed PSO algorithm, some simple
and distributed approaches such as those in [31] and the references therein have been proposed.

Remark. The PSO formulation described in the text searches for a globally optimal solution and corresponds to the so-called global best (gbest) formulation in [32]. There also exists a local best (lbest) formulation, which searches for the optimal solution locally [32]. As described in [33], although the lbest solution has shown improved performance in many (not all) standard multimodal problems, the greater convergence speed of the gbest solution results in improved performance in unimodal problems [32,33]. The lbest solution is better suited to distributed implementation in large networks, but the main focus of the current work is implementing the gbest solution in smaller scale multi-hop wireless networks. Because the network varies over time due to variable-rate background traffic and mobility, the optimal point is time-varying, so the faster convergence of the gbest solution helps in tracking it more efficiently. Furthermore, because of the so-called decoder deadline in video streaming applications, the faster convergence of the gbest solution is an advantage over the lbest one. In the numerical analysis section, the lbest and gbest methods are compared in the dynamic network scenario in terms of communication overhead and quality level. As can be verified there, the lbest solution has slightly lower overhead but worse quality performance in that scenario. The lbest method can be considered a good candidate for larger scale networks that are more delay tolerant and have a static nature.
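The gbest loop of Table 2 can be sketched in a few lines. This is a generic textbook-style illustration, not the paper's implementation: positions are initialized at random instead of from the link bandwidths, each particle is a full candidate vector rather than one video source, the projection is assumed to be a simple box clipping, the position update uses the freshly updated velocity, and the fitness is minimized. The names `gbest_pso` and `clip` are hypothetical; the default values 0.73 and 1.49 are the behavioral parameters reported in the numerical analysis.

```python
import random


def clip(vec, lo, hi):
    """Box projection: force every coordinate into [lo[d], hi[d]]."""
    return [min(max(v, l), h) for v, l, h in zip(vec, lo, hi)]


def gbest_pso(fitness, lo, hi, n_particles=10, iters=20,
              omega=0.73, phi_y=1.49, phi_g=1.49, threshold=None, seed=0):
    """Minimise `fitness` over the box [lo, hi] with a gbest swarm."""
    rng = random.Random(seed)
    dim = len(lo)
    span = [h - l for l, h in zip(lo, hi)]
    # Random start (an assumption; Table 2 starts from link bandwidths).
    x = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
         for _ in range(n_particles)]
    v = [[rng.uniform(-span[d], span[d]) for d in range(dim)]
         for _ in range(n_particles)]
    y = [p[:] for p in x]          # best known position of each particle
    g = min(y, key=fitness)[:]     # best position found by the swarm
    for _ in range(iters):
        for k in range(n_particles):
            gam_y, gam_g = rng.random(), rng.random()
            v[k] = [omega * v[k][d]
                    + phi_y * gam_y * (y[k][d] - x[k][d])
                    + phi_g * gam_g * (g[d] - x[k][d])
                    for d in range(dim)]
            x[k] = clip([x[k][d] + v[k][d] for d in range(dim)], lo, hi)
            if fitness(x[k]) < fitness(y[k]):
                y[k] = x[k][:]
                if fitness(y[k]) < fitness(g):
                    g = y[k][:]
        # Optional early exit, mirroring the threshold test in Table 2.
        if threshold is not None and fitness(g) <= threshold:
            break
    return g
```

An lbest variant would differ only in replacing `g` by the best position among each particle's neighbors, which is why it exchanges less information per update but converges more slowly.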
2.4. Rate adaptation mechanism

At first, for simplicity, it is assumed that we have a scalable video encoder with a single base layer, and an appropriate rate control algorithm is designed for it. For scalable video coders with multiple enhancement layers, the design steps are similar but more complicated and beyond the scope of the current work. We refer the interested reader to [34] for further information. Assume that the video transmission process is time-slotted with $T$ time slots, that the time-slot duration is $\Delta T$ and that the time-slot index is $t \in \{0, 1, \ldots, T-1\}$. It is also assumed that the average bandwidth allocated to video user $k$ by the PSO algorithm in time slot $t$ is denoted by $\overline{BW}_k(t)$.1 We can consider this allocated bandwidth as the so-called target bit rate, and the main objective is to estimate the proper instantaneous quantization parameter (QP) at the frame level and at the macroblock (MB) level within the frame [34]. The $\rho$-domain rate control mechanism presented in [35,36] has been adopted in this work for video rate adaptation. In each time slot $t$, the algorithm can be divided into steps operating at different levels. The first step is performed at the beginning of each group of pictures (GOP) and allocates $G_w(t)$ bits to the $w$th GOP in time slot $t$. Given the target bit rate $\overline{BW}_k(t)$ and the frame rate $F_k$ of the input video sequence, the video encoder sets the following number of bits on average to code the whole GOP, where $\tilde{M}$ is the number of frames in each GOP:

$$G_w(t) = \frac{\tilde{M}\,\overline{BW}_k(t)}{F_k}$$

1 We can use the exponential moving average method as $\overline{BW}_k(t) = \tilde{\alpha}\,BW_k + (1 - \tilde{\alpha})\,\overline{BW}_k(t)$ for calculating $\overline{BW}_k(t)$ in each time slot, where $0 < \tilde{\alpha} < 1$ is a constant and $BW_k$ is the latest allocated bandwidth during the time slot.

At picture level, before the coding of the $n$th frame of the $w$th GOP, the algorithm computes a target bit rate $P_{k,n}(t)$ from the expression

$$P_{k,n}(t) = K'_j\,\frac{G_w(t)}{1 + K_{I,P}\,n_P + K_{I,P}K_{P,B}\,n_B}, \qquad n = 0, 1, \ldots, \tilde{M}-1$$

in which $n_j$ is the number of remaining frames of $j$-type in the GOP and $K_{j,k}$ is the target bit ratio between a $j$-type coded frame and a $k$-type one ($j,k = I,P,B$). The values of the $K_{j,k}$ parameters depend on the average relative contextual complexity of the different frames and, based on [37], can be approximated as $K_{I,P} = 1.1$ and $K_{P,B} = 1.273$. The constant $K'_j$, $j = I,P,B$, depends on these ratios and corresponds to

$$K'_I = K_{I,P}K_{P,B}, \qquad K'_P = K_{P,B}, \qquad K'_B = 1.$$

As the H.264 encoder is driven by the quantization parameter QP, the average quantization parameter value $\overline{QP}_{k,n}$ must be calculated for the $n$th frame of the $k$th video source from $P_{k,n}(t)$. According to the R–D model presented in [35], we can associate with the bit rate $P_{k,n}(t)$ an average percentage of "zeros" equal to

$$\rho_{P_{k,n}}(t) = 1 - \frac{P_{k,n}(t)}{\sigma}$$

where $\sigma$ can be estimated from previously coded pictures (e.g., the $(n-1)$th frame) [35]. Through the distribution $p_x(a)$ of the previous picture,2 the percentage $\rho_{P_{k,n}}(t)$ can be transformed into $\overline{QP}_{k,n}$, which represents the estimated average quantization factor for the current frame [38]. $\overline{QP}_{k,n}$ is clipped according to the law

$$\overline{QP}_{k,n}(t) = \begin{cases} \overline{QP}_{k,n-1}(t) + 3 & \text{if } \overline{QP}_{k,n}(t) > \overline{QP}_{k,n-1}(t) + 3 \\ \overline{QP}_{k,n-1}(t) - 3 & \text{if } \overline{QP}_{k,n}(t) < \overline{QP}_{k,n-1}(t) - 3 \\ \overline{QP}_{k,n}(t) & \text{otherwise} \end{cases}$$
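The GOP-level and frame-level bookkeeping can be illustrated with a small sketch covering the moving-average bandwidth, the GOP bit budget and the QP clipping law. The function names and the default smoothing constant 0.5 are assumptions chosen for illustration; the mapping from the percentage of zeros to a QP value is encoder specific and is not reproduced here.

```python
def smoothed_bandwidth(previous_avg, latest, alpha=0.5):
    """Exponential moving average of the allocated bandwidth (0 < alpha < 1).

    alpha = 0.5 is an illustrative default, not a value from the paper.
    """
    return alpha * latest + (1.0 - alpha) * previous_avg


def gop_bit_budget(avg_bw_bps, frame_rate, frames_per_gop):
    """Bits available for one GOP: frames_per_gop * avg_bw / frame_rate."""
    return frames_per_gop * avg_bw_bps / frame_rate


def clip_frame_qp(estimated_qp, previous_qp, max_step=3):
    """Keep the frame QP within +/- max_step of the previous frame's QP."""
    low, high = previous_qp - max_step, previous_qp + max_step
    return max(low, min(high, estimated_qp))
```

For example, a target of 1 Mbps at 30 frames per second with 15 frames per GOP gives a budget of 500,000 bits per GOP, and an estimated QP of 35 after a frame coded at QP 30 is limited to 33.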
At macroblock level, the quantization parameter must be corrected. This gives good control over both picture quality and coded bits, keeping the bit rate under the given constraints and smoothing the coding distortion across the different macroblocks. After coding the $m$th MB of the $n$th frame, the percentage of null quantized coefficients in the previously coded $m$ macroblocks is $\rho_{P_{k,m}}(t)$ and the number of bits used to code the picture is $B_{P_{k,m}}(t)$. According to the given target, $B_{R_{k,m}}(t) = P_{k,n}(t) - B_{P_{k,m}}(t)$ bits are left to code the remaining macroblocks; the percentage of "zeros" required to fit the constraints is equal to

$$\rho_{R_{k,m}}(t) = 1 - \frac{B_{R_{k,m}}(t)}{\sigma}\,\frac{N_{MB}}{N_{MB} - m}$$

where $N_{MB}$ is the total number of macroblocks in each frame. This leads to an estimate of the ratio $\kappa = \rho_{R_{k,m}}(t)/\rho_{P_{k,m}}(t)$, which affects the quantization parameter $QP_{k,m+1}$ of the following macroblock according to the equation:

$$QP_{k,m+1}(t) = \begin{cases} \overline{QP}_{k,n}(t) + 3 & \text{if } 1 + 3\delta_k \le \kappa < +\infty \\ \overline{QP}_{k,n}(t) + 2 & \text{if } 1 + 2\delta_k \le \kappa < 1 + 3\delta_k \\ \overline{QP}_{k,n}(t) + 1 & \text{if } 1 + \delta_k \le \kappa < 1 + 2\delta_k \\ \overline{QP}_{k,n}(t) & \text{if } 1 - \delta_k \le \kappa < 1 + \delta_k \\ \overline{QP}_{k,n}(t) - 1 & \text{if } 1 - 2\delta_k \le \kappa < 1 - \delta_k \\ \overline{QP}_{k,n}(t) - 2 & \text{if } 1 - 3\delta_k \le \kappa < 1 - 2\delta_k \\ \overline{QP}_{k,n}(t) - 3 & \text{if } -\infty < \kappa < 1 - 3\delta_k \end{cases}$$

2 This distribution is considered to be a generalized Gaussian function of the form $p_x(a) = \gamma' e^{-\beta |a|^{\alpha'}}$ for the I and P frames, and a Laplacian + impulsive distribution of the form $p_x(a) = \beta' e^{-(2/\gamma')|a|} + \alpha' \delta(a)$ for the B frames [36].
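The band structure of the macroblock-level law can be sketched as below. The threshold step is passed in as a plain number, the final clamp to the range 0 to 51 reflects the H.264 QP range, and the function names `qp_offset` and `next_mb_qp` are hypothetical.

```python
def qp_offset(kappa, delta_k):
    """QP correction in {-3, ..., 3} chosen by the band that contains kappa."""
    if delta_k <= 0:
        raise ValueError("delta_k must be positive")
    for step in (3, 2, 1):
        if kappa >= 1 + step * delta_k:
            return step
    if kappa >= 1 - delta_k:
        return 0
    if kappa >= 1 - 2 * delta_k:
        return -1
    if kappa >= 1 - 3 * delta_k:
        return -2
    return -3


def next_mb_qp(frame_qp, rho_required, rho_so_far, delta_k):
    """QP for the next macroblock from the frame QP and the zero ratios."""
    kappa = rho_required / rho_so_far
    # Clamp to the valid H.264 range (added here as a safety assumption).
    return min(51, max(0, frame_qp + qp_offset(kappa, delta_k)))
```

A ratio above one means that more zeros are required than have been produced so far, so the QP is raised to quantize more coarsely; a ratio below one lowers it.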
The threshold step $\delta_k$ used in the macroblock-level law is calculated as follows:

$$\delta_k = \frac{0.67}{\varphi}\,2^{\overline{QP}_{k,n}(t)/6}$$

The rationale behind this calculation is that in the H.264 encoder the relation between the quantization parameter QP and the quantization step size $q$ is exponential, i.e., $q = 0.67 \cdot 2^{QP/6}$, $0 \le QP \le 51$, $QP \in \mathbb{Z}$ [34]. In order to fit the targeted bit rates $\overline{BW}_k(t)$, the constant $\varphi$ should be set accordingly. A typical value for the $\varphi$ parameter is 3000 [36]. A reduced value of $\delta_k$ allows the encoder to react more quickly to changes in $\rho_{P_{k,m}}(t)$. In the rate adaptation algorithm, the quality level of the encoded video is adjusted based on the network bandwidth allocated to each video source, so the scalability of interest in this research is the so-called quality scalability. The video packets generated by the network abstraction layer (NAL) are then dispatched over multiple paths in a weighted manner. The weight associated with path $j$ of video source $k$ is proportional to the average bandwidth share $Z_{jk}(t)$ of that path in the average total allocated bandwidth $\overline{BW}_k(t)$ in time slot $t$, which is equal to $Z_{jk}(t) = \bar{x}_{jk}(t)/\overline{BW}_k(t)$, where $\bar{x}_{jk}(t)$ is the average rate allocated to path $j$ of video source $k$ in time slot $t$ and can be calculated similarly to $\overline{BW}_k(t)$. Note that the time-slot duration $\Delta T$ has a great impact on the characteristics of the designed scalable rate-adapted encoder. If $\Delta T$ is selected to be large, the encoder must transmit in a constant bit rate (CBR) mode (with variable video quality); by reducing $\Delta T$, the encoder can adapt itself to the network dynamics more efficiently and moves toward a variable bit rate (VBR) transmission mode (with constant video quality).

As implementing the proposed rate adaptation algorithm in real encoders is a complicated task and beyond the scope of the current work, the numerical analysis section does not implement it and instead uses the EvalSVC extension to EvalVid [39] to support the mentioned quality scalability.

3. Numerical analysis

In this section, the performance of the proposed PSO framework, which uses quality evaluation metrics to enforce optimal rates between the competing scalable video sources, is simulated and compared with other relevant methods.

The numerical analysis consists of three parts. In part one, the fluid-flow (macroscopic) traffic model (not packet-based) is adopted and the MATLAB package is used for a centralized implementation of the pseudo code in Table 2. In the second part, a discrete-event simulator is used for realistic packet-level simulation of the proposed resource assignment strategy. In part three, a distributed packet-based implementation of the PSO algorithm is introduced and its performance is compared with the so-called lbest solution.

3.1. Part one

Consider the sample scenario depicted in Fig. 1. This scenario consists of two competing scalable video sources S1 and S2, and each video source is routed through two disjoint paths. Path 2 of S1 and path 1 of S2 have one wireless link in common. Sixteen nodes are randomly distributed in a square-shaped area in this scenario. A simplified strong line-of-sight propagation model is selected for the wireless links, and node mobility is neglected by assuming a static network topology. Although the parameter values used here are typical ones, selecting other values in practical situations does not change the optimality of the results, because the proposed optimization framework leads to optimal resource allocations with maximum total quality of experience for the competing scalable video sources independently of the selection of these parameters. $M = 20$, $N = 2$ and $L = 1000$ are selected, $x_{1,\min} = x_{2,\min}$ is set to 1 Mbps, and the other parameters are listed in Table 1. The bandwidth of each link is assumed to be 1 Mbps. The background traffic of every link is a time-varying random value with a maximum of 750 kbps, that is, 75% of the bandwidth of the wireless links. It is assumed that each path of the sources S1 and S2 consists of 3 wireless links. Similar to the meta-optimization parameter settings in [3], the behavioral parameters $\omega$, $\phi_y$ and $\phi_g$ are set to 0.73, 1.49 and 1.49, respectively. The average running time of each iteration of the proposed algorithm in (4) and (5), using MATLAB 10 on a 2.2 GHz Core2D CPU, was measured to be 1.5 ms. The average period of running the proposed algorithm is $M$ (we have selected $M = 20$) times the average running time of an iteration and is equal to 30 ms. The average running time of the fair-share method on the same processor was about 24.6 ms, i.e., about 18% faster. The allocated aggregate rates of the two video sources are shown in Figs. 3 and 4.
Fig. 3. Aggregate rate of S1.
Fig. 4. Aggregate rate of S2.
Fig. 5. Total QoE.
Fig. 6. Aggregate rate comparison when $x_1 = 0.8$ (solid) and $x_2 = 0.2$ (dashed).
Fig. 7. Differentiated QoE of S1 with $x_1 = 0.8$ (solid) compared to QoE of S2 with $x_2 = 0.2$ (dashed).

The total QoE of the scenario, calculated based on relation (7), is depicted in Fig. 5. As can be deduced from Figs. 3 and 4, the aggregate rate allocated to each video source fluctuates near the target rate $x_{\min} = 1$ Mbps. This is a direct consequence of the competition between the two video sources and the background traffic for network resources, especially in the bottleneck links (link C in Fig. 1), and it results from the fast fluctuation of the background traffic pattern on each link. As a result of these fluctuations, the constraint set of the optimization problem (7)–(9) changes rapidly, but the PSO algorithm is able to track the resulting fast variations in the constraint set; so it tries to track the target point and converge to the optimal solution. Figs. 6 and 7 depict the aggregate rates of the two sources and their corresponding QoE when the quality of the video stream at destination D1 has precedence over the other one (i.e., $x_1 = 0.8$ and $x_2 = 0.2$). In such cases the proposed optimization algorithm favors increasing the aggregate rate of S1, even if this leads to lower performance for S2. Although the average allocated rate and QoE of S1 are much higher than those of S2, in some cases the instantaneous QoE or allocated rate of S2 exceeds that of S1, which is a result of momentarily high levels of background traffic on the paths of S1.

Finally, in Fig. 8 the aggregate rate of the proposed method is compared with a fair-share scenario. In a fair-share scenario, the rates are allocated naively, based on the average-weighted available bandwidth of each path. For example, if $\min_i(e^{i}_{11}) = 0.5$ Mbps and $\min_i(e^{i}_{21}) = 0.6$ Mbps, then the rate allocated to path 1 of S1 will be $x_{11} = 0.5/(0.5 + 0.6) \times 1000$ kbps $= 455$ kbps. As can easily be checked in Fig. 9, the total QoE of the proposed method is much better than that of the non-optimal fair-share regime, because the philosophy behind the resource allocation process in the fair-share scenario is far from that of the proposed cross-layer PSO algorithm. It must be mentioned that the proposed bit error rate model in (1) is valid for moderate to low levels of node mobility [7]. Hence, the proposed result can be extended to these mobility-based scenarios.
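The fair-share baseline reduces to a proportional split, which the following sketch reproduces for the numbers in the example above. The function name `fair_share` is an assumption made for illustration.

```python
def fair_share(path_avail, total_rate):
    """Split total_rate across paths in proportion to available bandwidth."""
    total_avail = sum(path_avail)
    if total_avail <= 0:
        raise ValueError("no available bandwidth on any path")
    return [total_rate * avail / total_avail for avail in path_avail]


# Two paths with 0.5 Mbps and 0.6 Mbps available, 1000 kbps to distribute.
rates_kbps = fair_share([0.5, 0.6], 1000.0)
```

The first entry rounds to 455 kbps, matching the worked example, and the second path receives the remaining share.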
Fig. 8. Proposed aggregate rate of S1 (solid) compared to fair-share method (dashed).
Fig. 9. Proposed total QoE (solid) compared to fair-share method (dashed).
3.2. Part two

For this part, the ns-2 network simulator combined with the EvalVid tool is used because of their extensive support for performance evaluation of video transmission algorithms over wireless networks [40,41]. The EvalSVC extension to EvalVid based on [39,42] is also used to add SVC support to EvalVid using the JSVM scalable codec (cf. [43] for further details). An experimental scenario is generated with 50 wireless nodes distributed over a 20 m × 20 m area. The power levels of the nodes are adjusted such that half of them (25 nodes) have an effective transmission range of 2 m and the other half are capable of effective transmission within a range of 3 m. The ad hoc on-demand multipath distance vector (AOMDV) multipath routing protocol has been implemented [8,44]. The simulation set-up consists of 2 video source–destination pairs, and the routing protocol introduces some paths for each video source. It is also assumed that 12 background VBR sources with an average bit rate of 60 kbps are present in the network. As in part one, we have $x_1 = 0.8$ and $x_2 = 0.2$ for video sources 1 and 2, respectively. The behavioral parameters are the same as in part one. Each flow established from a source to a destination uses a modified implementation of the rate adaptation part of the proposed PSO algorithm (relation (5)) as a transport protocol. The important point here is that, in implementing the algorithm, the original packet error rate calculated based on (3) has been replaced with the actual estimated packet error rate in the simulated IEEE 802.11b-based wireless network. For simplicity, the CBR model is used to represent video traffic with constant bit rate but time-varying quality, and the traffic is sent over a real-time transport protocol (RTP) agent. The reason why the CBR video sequence traffic in ns-2 may be attached to the RTP agent is that, based on the JVT algorithm [45], there exist many rate adaptation algorithms working at GOP level, frame level, etc., which can convert the variable bit rate of scalable video codecs into a constant bit rate and match the resulting bit rate to the available network bandwidth. Additionally, it is always possible to encode scalable video at a constant bit rate by adjusting the quantization parameter (the so-called quality scalability). Another option in non-live video streaming applications (such as the current work) is simply to use appropriate traffic shaping techniques (such as leaky bucket, token bucket, etc.) or buffer-based methods [46]. In the current work, the video streaming application is not live, so the limited processing delays inherent in the rate adaptation and traffic shaping methods are tolerable and feasible. It is also assumed that each frame fits in one packet. Background traffic is specified as a random process with exponentially distributed packet sizes and arrival intervals. The average rate of background traffic on each link is randomly selected in the range of 0–50% of the link capacity.
The QCIF video sequences Foreman and Mother and Daughter are encoded and decoded with H.264 (because of its support of scalable coding) at 30 frames per second, using various quantization levels and different GOP lengths. For most of the following experiments, no random packet loss is introduced. Packets are dropped only if they do not arrive at the receiver by the play-out deadline. In this case, previous-frame concealment is used. Although the PSNR is not a good candidate for video quality [20], it is used for the performance comparisons in this part of the numerical analysis because of its popularity in the video networking community. Decoded video quality is therefore measured in terms of the PSNR of the luminance component. For each experiment, the video sequence is looped more than 200 times, and the average values of all realizations are calculated, which can be interpreted as the expected performance of the algorithm in a snapshot of time for the given network. Other required simulation parameters are listed in Tables 1 and 3. In Figs. 10 and 11, the average PSNR performances of two video sources (Foreman and Mother and Daughter) are compared for the PSO and Congestion Optimization (CO) (presented in [11]) models. In the CO model, the bandwidth is allocated to the video source based on optimizing the network congestion. The PSNR values for the PSO and CO models are calculated using the EvalVid tool. As can be verified in these figures, higher average quality can be achieved for two different video sequences (with different contextual complexity) with respect to the CO method. The reason why PSO works better than CO in most cases is that the idea behind the PSO optimization is more quality-centric than that of CO. These figures show that in the presence of parameter estimation inaccuracies resulting from factors such as inexact MAC layer modelling and background variable bit rate traffic, the proposed PSO method is robust and is still able to assign higher mean quality levels to the competing video sources than other optimal methods. The fluctuations in the PSNR levels are a direct consequence not only of the competition between the two video sources and the VBR background traffic for the link capacities, but also of the MAC layer interactions and node mobility. Finally, in Figs. 12 and 13 the PSNR performance of the PSO method is compared for sample Foreman and Mother and Daughter sequences with and without network-related packet loss. As can be verified in the figures, in the case of no packet loss the quality degradation is due only to the encoder R–D characteristics, and a higher quality (PSNR) level is obtained than in the presence of packet loss. Fluctuations in the PSNR level of the video sequences in the no-packet-loss case are encoder dependent: CBR video transmission usually results in time-varying video quality, because different quantization parameters are used for coding image scenes of different contextual complexity.

Table 3
Simulation parameters of dynamic scenario.

Parameter                        Value
MAC standard                     802.11b
Antenna type                     Omni
Interface queue type             Drop tail
Mobility model                   Random waypoint
Propagation model                Two-ray ground
Routing protocol                 AOMDV
Interface buffer size            50 packets
Packet size (L)                  1000 bits
Average node speed               4 m/s
Simulation time                  50 s
Number of video sources (N)      2
Number of simulation runs        200
Threshold $\gamma$               4.9
Link bandwidth                   1 Mbps
Number of VBR sources            12
Average VBR rate                 60 kbps

Fig. 10. PSNR performance of PSO versus CO methods for Mother and Daughter video sequence.
Fig. 11. PSNR performance of PSO versus CO methods for Foreman video sequence.
Fig. 12. PSNR performance of PSO with packet loss versus PSO without packet loss for Mother and Daughter video sequence.

3.3. Part three
In this part, the performance of the proposed PSO algorithm (gbest solution) is compared with that of the lbest solution. A distributed solution is implemented for the PSO algorithm, which differentiates it from the centralized implementations presented in the previous parts. It is assumed that there exist only 10 video source–destination pairs in the network, distributed in a circular area with a radius of 20 m, and that there is only a single path between any source–destination pair. We have selected the same $x$ for all video sources, equal to 0.2. The SNR SVC property of EvalSVC is used to enable quality scalability and to transmit the video sequences with a variable bit rate (decided based on the currently allocated bandwidth) during the transmission time. The packet size $L$ is 4800 bits, the average node speed is 10 m/s and the Manhattan Grid mobility model is adopted. Other simulation parameters are the same as in part two. In implementing the proposed PSO algorithm, it is first assumed that the nodes know each other's weight parameters $x$ a priori. The real-time transport control protocol (RTCP) feedback [47] is used for calculating the PEP and the associated quality level $Q$ of each source. Then, IP control packets are used, which contain a single byte for communicating the current quality level $Q$ and 20 bytes for communicating the vector $G_{1 \times n}$ between the video sources3 in each update event. Note that, having the quality, $G$ and weight parameters of all of the nodes, each source node can run the proposed PSO algorithm (given in Table 2) independently in a distributed manner. Immediately after computing the best found swarm position $G$ in each update event, each video source $k$ must communicate it, together with the calculated quality level $Q_k$, to the other video sources in the swarm. Video sources must use the latest received $G$ for running the next iteration of the PSO algorithm. In implementing the lbest solution, the $K$-nearest-neighbor (KNN) criterion is used for defining the neighbor nodes. The received signal strength indicator localization method in ns-2, which measures the Euclidean distances between nodes, is used for finding the nearest neighbors. Each neighbor is defined to be a video source. We have set $K = 2$ and $K = 4$ in the simulations for defining the neighbors of a given node, as depicted in Fig. 14.

3 We have assumed that the allocated rate associated with each path of every video source is expressed in kbps and needs 2 bytes for representation (hence bit rates up to 64 Mbps can be supported for each path). Note that we have $n = 10$ in this case.

Fig. 13. PSNR performance of PSO with packet loss versus PSO without packet loss for Foreman video sequence.
Fig. 14. A graphical view of the KNN boundaries of a typical video source for $K = 2$ (dotted) and $K = 4$ (solid) in lbest solution.
Fig. 15. PSNR performance of PSO versus lbest $K = 2$ and $K = 4$ and CO methods for video source 2.

In Fig. 14, we have depicted local groups for a given video source (depicted in red); for clarity of presentation, only the video sources are shown and other nodes are omitted. Each local group comprises a central video source and its two or four nearest neighbor video sources for $K = 2$ (ring topology [32]) and $K = 4$ (random graph topology), respectively. The number of local groups for every $K$ is the same as the number of video sources $N$. The proposed PSO algorithm is run independently in each of the $N$ local groups. In deploying the lbest solution for a given $K$, each video source communicates only with its $K$ neighbors and runs the PSO algorithm locally, as depicted in Fig. 14. In Figs. 15 and 16, the average PSNR performances of the video sequence Foreman are compared for sources 2 and 6 between the PSO, lbest ($K = 2$), lbest ($K = 4$) and CO methods. The PSNR values for the PSO and lbest models are calculated using the EvalVid tool. As can be verified in these figures, the proposed PSO method has better average PSNR performance than the lbest ones. The PSO works better than the local lbest model because
of the higher convergence speed of the proposed PSO method, it can better track the time-varying optimal point resulting from the dynamic nature of the network and the traffic fluctuations. Although, based on [32], the gbest model is more likely to converge to a local optimum during execution, its higher convergence rate in such a dynamic scenario is an important benefit that justifies its use. By increasing the $K$ parameter in the lbest model, the model approaches the gbest one and better quality levels can be achieved.

Fig. 16. PSNR performance of PSO versus lbest $K = 2$ and $K = 4$ and CO methods for video source 6.

We have also compared the communication overhead associated with the different methods in this part. A rough estimate of the communication overhead of each video source is used in the simulation, approximated as $(V_T - V_S)/V_T$, where $V_T$ is the total traffic volume including all forward RTP, control IP and reverse RTCP packets (in bytes) during the transmission time of the video sequence and $V_S$ is the size of the transmitted video sequence (in bytes). The average communication overhead has been compared between the proposed PSO (gbest), the lbest ($K = 4$) and the lbest ($K = 2$) methods, and the overheads associated with these methods are approximately 13.5%, 12.7% and 12%, respectively, i.e., the lbest ($K = 2$) method has on average 1.5% lower overhead than the PSO.

4. Conclusions

In this work, a bio-inspired optimization framework is introduced by which the optimal resource allocation to each path of a multipath-routed scalable video source over wireless networks can be performed. This allocation scheme works in such a way that the total weighted quality of experience of multiple video streams transmitted over a wireless environment is maximized. The main application of such algorithms is rate allocation to those subsets of real-time multimedia traffic that require differentiated guarantees about the minimum level of perceived quality (e.g., video, audio). As a simple MAC layer scheduling model has been used, using more practical MAC layer scheduling algorithms (such as those used in the IEEE 802.11 standard) is recommended for future research. Incorporating more QoS parameters, such as delay or jitter, in calculating the total quality of experience in the proposed cross-layer optimization framework is another open issue. The use of other objective quality assessment metrics for approximating the perceived QoE based on network-related parameters remains open for future research. Designing an appropriate pricing mechanism that matches the users' perceived quality to the prices paid by the individual users, in order to meet the mentioned SLA, is also open for further research. The proposed rate adaptation mechanism applies to scalable video with a single base layer, so extending it to scalable video coders with multiple enhancement layers is open for future research. As mentioned in the text, the resulting optimal rates are in fact rate-feedbacks to the video encoder that can be used for controlling the quantization parameter (quality scalability), so practical implementation of the proposed rate adaptation mechanism in a real-time encoder for adapting the encoding parameters is another important open research topic.
Acknowledgment

The author expresses his gratitude to the ITRC for its financial support of this research via Grant number 8732353. The author also thanks Mrs. F. Ayatollahi and Mr. H. Hayatdavoudi for their sincere help in developing parts of the simulation software.

References

[1] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.
[2] M. Clerc, J. Kennedy, The particle swarm - explosion, stability, and convergence in a multidimensional complex space, IEEE Transactions on Evolutionary Computation 6 (2002) 58–73.
[3] M. Pedersen, Tuning & Simplifying Heuristical Optimization, PhD Dissertation, University of Southampton, School of Engineering Sciences, Computational Engineering and Design Group, 2010.
[4] P. Calyam, E. Ekici, C. Lee, M. Haffner, N. Howes, A GAP-model based framework for online VVoIP QoE measurement, Journal of Communications and Networks 9 (2007) 446–456.
[5] Methods for Subjective Determination of Transmission, ITU-T Recommendation P.800, 1996.
[6] H. Schwarz, D. Marpe, T. Wiegand, Overview of the scalable video coding extension of the H.264/AVC standard, IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Scalable Video Coding 17 (9) (2007) 1103–1120.
[7] O.K. Tonguz, G. Ferrari, Ad Hoc Wireless Networks: A Communication-Theoretic Perspective, John Wiley, 2006.
[8] M. Frodigh, P. Johansson, P. Larsson, Wireless ad-hoc networking: the art of networking without a network, in: Ericsson Review, 2000, pp. 248–263.
[9] E. Royer, C.K. Toh, A review of current routing protocols for ad hoc mobile wireless networks, IEEE Personal Communications 6 (1999) 46–55.
[10] E. Setton, B. Girod, Congestion-distortion optimized scheduling of video over a bottleneck link, in: IEEE 6th Workshop on Multimedia Signal Processing, 2004, pp. 179–182.
[11] E. Setton, X. Zhu, B. Girod, Congestion-optimized multi-path streaming of video over ad hoc wireless networks, in: IEEE ICME, vol. 3, 2004, pp. 1619–1622.
[12] R. Agarwal, A. Goldsmith, Joint Rate Allocation and Routing for Multi-hop Wireless Networks with Delay-constrained Data, Technical Report, Wireless Systems Lab, Stanford University, CA, USA, 2004.
[13] S. Adlakha, X. Zhu, B. Girod, A.J. Goldsmith, Joint capacity, flow and rate allocation for multiuser video streaming over wireless ad-hoc networks, in: IEEE ICC, 2007, pp. 1747–1753.
[14] X. Zhu, S. Han, B. Girod, Congestion-aware rate allocation for multipath video streaming over ad hoc wireless networks, in: IEEE ICIP, vol. 4, 2004, pp. 2547–2550.
[15] P. van Beek, M. Demircin, Delay-constrained rate adaptation for robust video transmission over home networks, in: IEEE ICIP, vol. 2, 2005, pp. 173–176.
[16] L. Haratcherev, J. Taal, K. Langendoen, R. Lagendijk, H. Sips, Optimized video streaming over 802.11 by cross-layer signaling, IEEE Communications Magazine 44 (2006) 115–121.
[17] X. Zhu, B. Girod, Distributed rate allocation for multi-stream video transmission over ad hoc networks, in: IEEE ICIP, vol. 2, 2005, pp. 157–160.
[18] P. Goudarzi, Multi source video transmission with minimized total distortion over wireless ad hoc networks, Springer Wireless Personal Communications 50 (2009) 329–349.
[19] A. Papoulis, S.U. Pillai, Probability, Random Variables and Stochastic Processes, McGraw Hill Inc, USA, 2002.
[20] S. Winkler, Digital Video Quality: Vision Models and Metrics, John Wiley and Sons, 2005.
[21] S. Asmussen, Applied Probability and Queues: Stochastic Modelling and Applied Probability, Springer, 2000.
[22] D.R. Cox, Some statistical methods connected with series of events, Journal of the Royal Statistical Society (1995).
[23] L. Deng, J.W. Mark, Parameter estimation for Markov modulated Poisson processes via the EM algorithm with time discretization, Telecommunication Systems (2007).
[24] T.S. Rappaport, Wireless Communications: Principles and Practices, Prentice Hall Inc, Upper Saddle River, NJ, USA, 2002.
[25] F. van den Bergh, An Analysis of Particle Swarm Optimizers, PhD Dissertation, University of Pretoria, Faculty of Natural and Agricultural Science, 2002.
[26] T.C.J. Chen, F. Pan, X. Tu, Stability analysis of particle swarm optimization without Lipschitz constraint, Journal of Control Theory and Applications 1 (2007) 86–90.
[27] A. Ben-Israel, A. Ben-Tal, A. Charnes, Necessary and sufficient conditions for a Pareto optimum in convex programming, Econometrica 4 (May) (1977).
[28] W.K. Tsai, J.K. Antonio, G.M. Huang, Complexity of gradient projection method for optimal routing in data networks, IEEE/ACM Transactions on Networking 7 (1999) 897–905.
[29] L. Ning, G. Yan, G. Li, Fast capacity estimation algorithms for MANETs using directional antennas, Journal of Electronics (China) 24 (2007) 710–716.
[30] J. Zhang, L. Cheng, I. Marsic, Models for Non-intrusive Estimation of Wireless Link Bandwidth, Lecture Notes in Computer Science, vol. 2775, 2003, pp. 334–348.
[31] X. Cui, J.S. Charles, T.E. Potok, A Simple Distributed Particle Swarm Optimization for Dynamic and Noisy Environments, Studies in Computational Intelligence, Springer, Berlin/Heidelberg, 2009.
[32] D. Bratton, J. Kennedy, Defining a standard for particle swarm optimization, in: IEEE Swarm Intell. Symp., 2007, pp. 120–127.
[33] J. Kennedy, R. Mendes, Neighborhood topologies in fully informed and best-of-neighborhood particle swarms, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 36 (2006) 515–519.
[34] S.S.R. Iguez-Escalona, A Rate Control Algorithm For Scalable Video Coding, PhD Dissertation, Universidad Carlos III De Madrid, 2011.
[35] H. Zhihai, K. Yong, S. Mitra, Low-delay rate control for DCT video coding via ρ-domain source modelling, IEEE Transactions on Circuits and Systems for Video Technology 11 (8) (2001) 928–940.
[36] S. Milani, L. Celetto, G.A. Mian, An accurate low-complexity rate control algorithm based on (ρ, E_q)-domain, IEEE Transactions on Circuits and Systems for Video Technology 12 (2008) 257–262.
[37] D. Alfonso, D. Bagni, L. Celetto, S. Milani, Constant bit-rate control efficiency with fast motion estimation in H.264/AVC video coding standard, in: Eusipco, 2004.
[38] S. Milani, Source and Joint Source-channel Coding for Video Transmission Over Lossy Networks, PhD Dissertation, University of Padova, Italy, 2006.
[39] T.A. Le, H. Nguyen, H. Zhang, EvalSVC - an evaluation platform for scalable video coding transmission, in: IEEE ISCE, 2010.
[40] The Network Simulator 2, Online available at <www.isi.edu/nsnam/ns/>.
[41] The EvalVid, Online available at <www.tkn.tu-berlin.de/research/evalvid/>.
[42] T.A. Le, Q.H. Nguyen, A.M. Nguyen, EvalSVC tool-set, 2009 <http://code.google.com/p/evalsvc/>.
[43] J. Reichel, H. Schwarz, M. Wien, Joint scalable video model JSVM-8, ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-U, 2006.
[44] M. Marina, S. Das, On-demand multipath distance vector routing in ad hoc networks, in: IEEE ICNP, 2001.
[45] Joint video team (JVT) of ISO/IEC MPEG and ITU-T VCEG, joint final committee draft (JFCD) of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), JVT-D157, in: 4th Meeting: Klagenfurt, 2002.
[46] T. Anselmo, D. Alfonso, Buffer-based constant bit-rate control for scalable video coding, in: Picture Coding Symposium, 2007.
[47] RTP: A Transport Protocol for Real-time Applications, RFC 3550, 2003.