SIGNAL PROCESSING: IMAGE COMMUNICATION
ELSEVIER
Signal Processing: Image Communication 6 (1994) 25-45
Bit-rate control for digital TV and HDTV codecs*
J.-P. Leduc*
Laboratoire de Télécommunications et de Télédétection, Université Catholique de Louvain, Bâtiment Maxwell, B-1348 Louvain-la-Neuve, Belgium
Received 3r, August 1992
Abstract

This paper intends to present an optimum control algorithm for digital television and high-definition television codecs. Transmissions at either constant or variable bit-rates will be taken into consideration with the purpose of transmitting on ATM networks. Two main goals are expected to be achieved in a codec which controls its output bit-rates. The first deals with realizing graceful commands and degradations of image quality and the second with achieving an optimum use of the buffer in order to maximize the image quality and avoid any buffer overflow. Both actions, on buffer content and on image quality, will turn out to be tightly related in the optimum algorithm. The image control is mainly applied during the non-stationary periods of the incoming information source, i.e. especially during the scene changes, to produce smooth variations of image quality. The control of the buffer level is performed during the stationary or predictable periods of the source, i.e. within the scenes. The optimum algorithm is adaptive, combines concepts originating from adaptive filtering theory and operations research, and is based on three functions: a non-stationary source predictor to estimate a coming horizon of future bit-rates, a cost function to be minimized and an algorithmic search of the optimum policy aiming at minimizing the previous cost function.

Key words: Image sequence coding; Buffer regulation; Adaptive filtering; Optimum control
1. Introduction

This paper intends to develop an optimum control scheme for digital television (TV) and high-definition television (HDTV) coding systems. The applications of interest deal with coding the digital sources formatted for TV in 4:2:2 CCIR 601 or, for HDTV, in HDP, EDP, HDI and EDI¹. The control algorithms under
* Tel: +32-(0)10-472311, Fax: +32-(0)10-476887.
¹ The following text presents research results realized with the ALCATEL-BELL-SDT company in the framework of the European RACE Project and the Belgian Broadband Experiment Project. As defined in EUREKA 95, HDP stands for High Definition Progressive, EDP for Enhanced Definition Progressive; both HDI and EDI are the corresponding interlaced formats.
0923-5965/94/$7.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0923-5965(93)E0032-Y
Fig. 1. General functions in a TV codec.
investigation in this paper will encompass not only all the digital codecs built with linear decorrelation operators (discrete cosine transform (DCT), subband and wavelet), quantizers and variable-length entropy coders (VLC) but also codecs embedded in either simple or hybrid schemes, eventually strengthened by motion compensation at subpixel accuracy (Fig. 1). Two main goals are expected to be achieved by a control algorithm. A primary goal concerns the control of the subjective image quality of the encoded pictures and a secondary one concerns the control of the encoder buffer occupancy. Both objectives can be outlined as follows:
1. realizing a graceful control of the subjective image quality and, more specifically, ensuring the best quality with maximum uniformity and graceful degradations,
2. achieving the optimum use of the buffer capacity to maximize the image quality and to prevent the buffer from either overflowing or underflowing.
An efficient codec control therefore has the task of balancing these two closely related actions. A uniform subjective quality can be achieved locally within the images by implementing an adaptive quantization scheme based on the computation of a local block criticality as described in [11]. Using that adaptive mechanism, a parameter called the transmission factor controls the quantization-step-size computation and, when maintained constant, this parameter allows the codec to generate locally a quasi-constant subjective image quality. Hence, to ensure a maximum uniformity of subjective image quality, the transmission factor should be maintained as constant as possible or be a very slowly time-varying parameter. In real-time applications, the transmission factor turns out to be the control parameter commanding the subjective image quality, especially when an efficient adaptive algorithm is implemented in the quantizer.
This paper will present the optimum control strategies defined as sequences of consecutive transmission factors to be applied to the quantizer. TV encoder algorithms are mostly designed with an output buffer in charge of collecting the variable data stream at the VLC output. Due to functional constraints aiming at limiting the transmission delay and ensuring fixed synchronism between decoder and encoder, the encoder output buffer has a limited size; its capacity ranges from one to a few coded images. A codec regulator has the task of computing the transmission factor in order to adjust the entropy rates at the quantizer output and to comply with the channel transmission rates without underflowing or overflowing the buffer. Several regulation algorithms have been presented in the past literature with different levels of complexity in order to reach both previous objectives. Two schemes can be outlined as follows: the first is a linear feedback reaction from buffer to quantization and is rather frequently proposed as a first approach to the problem, especially when only one scene has to be encoded and no real-time transmission is required; the second is a feedback law with a regulator, i.e. a discrete transfer function or filter. This latter scheme constitutes the simplest way to achieve both basic control requirements: it guarantees stability, avoids buffer overflows, imposes transient time constants and returns to a consigned steady-state buffer level. Nevertheless, those two solutions cannot be optimum in any way and are not able to capture the stochastic and non-stationary nature of the TV input source. As a consequence, a scheme with an optimum controller will be proposed in this paper. Compared with a regulator, a controller is a device more general than a simple transfer function in the sense that it implements a complete
algorithm designed to perform the general control task of optimally balancing the command for subjective image quality in conjunction with the use of an output buffer. The scheme is based on a model of non-stationary TV signals to be developed at different time scales of observation: stripe (eight consecutive video lines), image and scene levels. The controller uses the level of buffer occupancy as an observation parameter and constructs optimum control strategies. The algorithm combines theory originating from adaptive filtering and operations research. Both constant-bit-rate (CBR) and variable-bit-rate (VBR) transmission modes require a control of output bit-rates. In CBR applications, the incoming non-stationary TV entropy sources need to be compressed to reach the rate required on the transmission channel. In VBR applications, two ways of transmitting can be thought of when referring to packet video on ATM networks:
1. A free VBR transmission where the packets are launched when filled up. In this case, no buffer and, therefore, no codec control are really required. Hence, the codec is able to transmit permanently at constant subjective image quality.
2. An enforced VBR transmission with police functions limiting, for instance, the cell-rate variability and, at least, imposing a maximum cell rate. In this configuration, also called enforced VBR, the codec has to comply with a traffic contract negotiated with the network at transmission set-up. In such contracts, the sources specify their traffic descriptors and the network management replies by furnishing enforced values to be applied to the statistics of the cell rates and the cell interarrival times. The codec generates constant subjective image quality except when exceeding the negotiated bit-rate gauges, in which case a fall-back degradation of image quality has to be activated to adapt the output bit-rate statistics.
During those periods, a regulation is necessary to control the quantization-step-size computation. The paper is structured as follows: first, the whole regulation problem is laid down, presenting the different functions involved in the algorithm. Secondly, the adaptive optimum-control scheme is proposed, leading to three concepts: modeling the source to derive non-stationary predictors learning the evolution of the source statistical parameters, defining a coding cost function to be minimized and searching for the optimum in the graph of the coding costs.
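The enforced-VBR policing idea described above (a traffic contract bounding the cell-rate statistics, with a fall-back mode when the gauge is exceeded) can be illustrated with a minimal token-bucket sketch. The class name, parameters and arrival times below are hypothetical illustrations, not part of the paper:

```python
# Hypothetical sketch of a VBR police function: a token bucket enforcing
# a negotiated mean cell rate with a bounded burst tolerance. Cells beyond
# the gauge are flagged, signalling the codec to enter fall-back mode.
class TokenBucketPolice:
    def __init__(self, mean_rate, burst_tolerance):
        self.rate = mean_rate            # cells of credit earned per time unit
        self.capacity = burst_tolerance  # maximum accumulated credit
        self.tokens = burst_tolerance
        self.last_time = 0.0

    def conforms(self, arrival_time):
        """Return True if a cell arriving at arrival_time fits the contract."""
        elapsed = arrival_time - self.last_time
        self.last_time = arrival_time
        # credit accrues at the negotiated mean rate, capped by the tolerance
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # excess cell: trigger the fall-back degradation

police = TokenBucketPolice(mean_rate=2.0, burst_tolerance=3.5)
arrivals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # burst of 10 cells/s
flags = [police.conforms(t) for t in arrivals]
```

A sustained burst above the mean rate drains the credit, so the first cells conform and the later ones are flagged, which is exactly when the paper's fall-back regulation would take over.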
2. Statement of the problem

The control of a digital TV codec operates on a fourfold process to be addressed in this section (Fig. 2): a source of information, an entropy-rate compression, a buffer and a feedback loop from buffer to quantization. The source of information is measured by an entropy rate. Entropy determines the minimum mean amount of information contained in the digital TV signal. When expressed in bits, the source entropy represents the minimum bit-rate necessary to transmit the information on the network in such a way that nothing gets lost. This entropy source corresponds here to the source of transform coefficients applied to quantization after spatial and temporal decorrelation. The source entropy rate can also be defined as the entropy rate generated while applying the smallest quantization step size [10].

Fig. 2. Schematic regulation overview.

The quantizer and the VLC are in charge of compressing the input data rate in compliance with the transmission factor. Most of the quantizers implemented so far in image coding have a uniform and linear adaptive law with weighting to take into account psychovisual properties [10, 11]. To characterize the bit-rate compression, a rate-distortion function will be derived expressing the output codec entropy rate as a function of both the input entropy rate and the transmission factor. The output data of the VLC are stored in the encoder output buffer, which eventually accumulates an amount of data equal to the long-term excess of the compressed data rate with respect to that allowed on the channel. As a consequence, the level of buffer occupancy measures how far the quantized data rate exceeds the transmission rate. Eventually, the reaction law aims at computing the transmission factor applied to the quantizer in accordance with the current level of buffer occupancy and the channel rate of transmission. In the subsequent sections, those four functions will be addressed and developed to lay down the general foundations of any codec control. A sound and justified linear regulation approach will first be presented to delimit the drawbacks to be avoided and to motivate a thrust towards optimum algorithms.
2.1. Description of the complete source model
Let us now examine the structure of the entropy source. As the spatial decorrelative operators are applied locally within images, some correlation consequently remains on larger scales. This correlation depends on several factors: the structure of editing the TV programs², the way of scanning images, the format used to represent the image and the encoding algorithms of digital TV signals. The way TV is edited leads to subdividing a TV session into programs, scenes and images. The image scanning is a periodic process performed by a camera which can additionally subdivide the image into fields. The encoding algorithms continue further subdividing the image into stripes. At the sub-stripe transmission level, the packet or cell interarrival time will be considered. The definitions and the sizes of image formats and those of the cells have been fixed by the standardization bodies. The notion of stripe results from an algorithmic coding purpose and characterizes a video area of eight consecutive lines (CCIR Recommendation 723). Scenes and programs have definitions relevant to the specific issue of concern. In one sense, the visual scene specifies time intervals where the signal is characterized by a fixed background with possible panning and zooming and by a variable foreground with humans or objects in motion. A visual program is a set of scenes defining a whole entity for a given subject like a movie, a debate, a cartoon, etc. In the bit-rate modeling approach, the definitions of scenes and programs have been restated in terms of their statistical properties. These aim at characterizing the process at a level slightly different from that of visibility. The statistical scenes are characterized by time intervals over which bit-rate statistics (like the image bit-rates) are rather quasi-stationary. In that sense, the statistical scenes can be longer or shorter than the visible scenes depending on the changes in the prevailing stationary regime.
A change of background can imply a drastic change of image bit-rate and, therefore, a discontinuity within stationarity. These singularity points within image and stripe bit-rates are referred to as scene changes, which delimit the span of each scene. A scene change can take all the forms between abrupt and smoothed transitions. The statistical scenes are characterized by statistics of durations and bit-rates. A statistical
² The video-editing process consists of assembling a program into sequences and a sequence into plans. In this paper, the plan will be termed a scene: it is commonly defined as the basic unit of shooting video.
Fig. 3. Hierarchical modeling of the whole TV signal: hierarchical decomposition of a TV session into programmes, scenes and images, modeled respectively as semi-Markov, non-stationary autoregressive, periodic and point processes.
program is defined as the segment of time during which scene statistics are quasi-stationary. The final hierarchical model of bit-rates is presented in Fig. 3. To end up, let us say that this model has been thoroughly supported by experimental measurements of the bit-rates and by computations of all the statistical characteristics [9].

2.2. Rate-distortion function for television codecs

As previously mentioned, TV sources constitute one category of information source. The goal of the codecs being to compress as far as possible the entropy rate of the source, the concern of this section is the rate-distortion function, namely the function relating the source entropy rate to the output codec entropy rate at different levels of subjective image quality. In this study, the transmission factor will act as an indicator of subjective quality and will be involved as such in the rate-distortion formulation. The TV codecs are composed of three main operations [11]: the decorrelation, the quantization and the variable-length entropy coding, of which the second is not invertible. The invertible operations do not affect the value of the entropy [10]. As a consequence, the quantization procedure is the only operation within a codec which requires modeling the compressed output entropy rate as a function of both the input source entropy rate and the value of the applied quantization step size.
Let us therefore introduce the rate-distortion function of a TV codec. As an intuitive guess, it is reasonable to state that the variation of the compressed output entropy rate is, in first approximation, proportional to the value of the compressed entropy rate, and expressed as

ΔHq(n)/Δφ(n) = -α Hq(n),      (1)

where the variation of entropy is produced by removing the least significant levels of accuracy. In this sense, φ(n) is the normalized value of the quantizer scaling factor representing the inverse of the accuracy on a real scale ranging from 0 up to 1. Therefore the model can be re-expressed as

Hq(n) = Hi(n) exp[-α(φ(n) - φ0)],      (2)

where
- n is the sampling time; the sampling period will turn out to be equal to one stripe and the sampling time corresponds to the end of the current encoded stripe;
- φ(n) is the normalized quantizer scaling factor when quantizing the transform coefficients; with δ being the quantization step size, δ(n) = 2^(m·φ(n)); the value of φ(n) remains constant during the whole sampling period; m stands for the number of bits on which the coefficients are represented;
- φ0 is the normalized quantizer scaling factor at the finest quantization step size δ0 = 2^(m·φ0); usually, φ0 = 0;
- Hq(n) is the compressed entropy bit-rate corresponding to the quantization step size δ(n) = 2^(m·φ(n));
- Hi(n) is the source entropy bit-rate, i.e. the entropy bit-rate at the finest quantization step size δ0;
- α is the compression factor; α is an intrinsic characteristic of any given quantizer;
- the bit-rates Hi(n) and Hq(n) represent bit-rate accumulations over the corresponding sampling period n.

The interpretation of relation (2) is rather intuitive and is statistically verified in the codecs presented in this paper. At high values of the codec output entropy rate, the quantization is fine, so that the number of non-zero high-order transform coefficients is significantly high and, as such, contributes strongly to the output data rate. This means high variations of the compressed entropy bit-rate for small changes of the quantization step size. The contrary holds at low values of the codec output entropy rate: the quantization is rather coarse, so that nearly no high-frequency non-zero coefficients remain and the energy is concentrated in a few low- and intermediate-frequency coefficients. The sensitivity of the output rate to variations of the quantization step size is very weak here. Relation (2) shows that the sensitivity of the output entropy rate is proportional to the level of the output entropy rate.

The argument is valid in first approximation for compressible sequences, i.e. those with a high-frequency content. Consequently, this model no longer holds accurately when dealing with sequences of video telephony where the block content is mostly limited to the low-frequency part of the transform domain. Let us now reintroduce the command for a constant subjective quality called the transmission factor U(n). To consider the example of the RACE codec, the DCT transform coefficients have an accuracy of 11 bits plus sign. The quantization step size is computed with respect to a variation of the quantization scaling factor φ(n) - φ0 = U(n)/(16 × 11). Therefore, when U(n) = 0, the maximum coefficient precision is achieved at the quantizer output and, whenever the value of U(n) increases by 16, the output coefficients lose one representative bit. Owing to the further computation of the block criticality, U(n) becomes a command for a constant subjective quality applied during the whole stripe. Therefore, taking the sampling period equal to one stripe, it follows in a first-order approximation that

Hq(n) = Hi(n) exp[-α U(n)].      (3)
The experimental values of the α factor have been estimated on several reference TV sequences and, as an important conclusion, this factor turns out to be rather constant in luminance and chrominance for a given quantizer. For instance, using a normalized quantizer scaling factor such that φ(n) - φ0 = U(n)/(16 × 11), the BBA³ hardware quantizer is typically characterized by α in the range 2.97 to 5.06 and the RACE⁴ hardware quantizer by α in the range 5.00 to 6.60.
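As a small numerical sketch of the rate-distortion relation (3), Hq(n) = Hi(n) exp[-α U(n)]; the stripe entropy value and the α chosen below are assumed illustrations, with α picked inside the range reported above rather than measured:

```python
import math

def compressed_rate(h_in, u, alpha=4.0):
    """Rate-distortion model (3): compressed entropy bit-rate for an input
    entropy bit-rate h_in and a transmission factor u. alpha is the
    quantizer's compression factor (an assumed mid-range value; the paper
    reports roughly 3 to 6.6 depending on the hardware quantizer)."""
    return h_in * math.exp(-alpha * u)

# With phi(n) - phi0 = U(n) / (16 * 11), U(n) = 0 keeps the full
# 11-bit-plus-sign coefficient precision, so no compression is applied.
h_source = 100_000.0  # hypothetical stripe entropy, bits
rates = {u: compressed_rate(h_source, u) for u in (0.0, 0.1, 0.2)}
```

The exponential form captures the behaviour described in the text: each equal increment of the transmission factor removes the same fraction of the output rate, so the sensitivity of the rate is proportional to the rate itself.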
2.3. Output buffer

The encoder output buffer is a first-in first-out (FIFO) memory which accumulates the fractions of the stripe bit-rates in excess of either the constant channel cell rate in CBR applications or the variable cell rate enforced by the network contract in VBR applications. Ideally, the encoder buffer size should be as large as possible (i.e. 'infinite'). Nevertheless, besides the constraints of hardware limitations, functional transmission arguments tend to limit the transmission delay and therefore the buffer capacity. The encoder and the decoder both have a buffer with a coupled regulation of the level of buffer occupancy. The regulation of the level of the decoder buffer is in fact locked to that of the encoder [10] in order to guarantee transmission synchronism between encoder and decoder. To achieve this, an amount of bit stream equal to the mean bit stream produced during a synchronism period is permanently buffered and shared among the buffer of the encoder, that of the decoder and, for a negligible proportion, the network queues. In practice, maximum tolerable transmission delays of a few images are allowed in live TV and interactive video conferencing. The choice of the time interval during which the transmission factor is maintained constant is a compromise between the buffer capacity and the need to maintain the subjective image quality as constant as possible. To avoid buffer overflow and to achieve a good system controllability, the feedback loop has to update the value of the transmission factor at time intervals as small as possible and, in any case, smaller than the period of one image. Conversely, the intervals between consecutive updates of the value of the transmission factor should be stretched as far as possible to ensure a uniformity of the image quality. A period of one or a few stripe durations appears to be a good compromise for the update period of the transmission factor.

In the following, the stripe period will be taken as the time interval over which the transmission factor is maintained constant; consequently, it will naturally become the sampling period to be considered in the discrete system equations.
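The compromise between buffer capacity, transmission delay and update period can be put in numbers with a back-of-the-envelope sketch; all figures below (image sizes, channel rate, line count) are hypothetical illustrations, not the paper's values:

```python
def buffer_delay(buffer_bits, channel_bps):
    """Transmission delay contributed by draining a full encoder buffer:
    the time the channel needs to empty it."""
    return buffer_bits / channel_bps

# Hypothetical figures: a buffer holding two coded images of 0.5 Mbit
# each, drained over a 34 Mbit/s channel (about 29 ms of delay).
delay_s = buffer_delay(2 * 0.5e6, 34e6)

# Update period of the transmission factor: one stripe of 8 lines in an
# assumed 625-line, 25 images/s system (0.512 ms per stripe).
lines_per_image, image_period_s = 625, 1 / 25
stripe_period_s = (8 / lines_per_image) * image_period_s
```

The two numbers illustrate the orders of magnitude involved: the buffer bounds the delay to a few image periods, while the stripe-level update is two orders of magnitude faster, which is what makes the stripe a workable sampling period for the control loop.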
2.4. The feedback loop

The feedback loop aims at relating the level of buffer occupancy to the computation of the quantization step size to achieve the codec control. The quantizer step size is computed for each transform coefficient as a function of, first, its order or frequency, second, the local image criticality and, third, the transmission factor which imposes the level of subjective quality. A rate-distortion function has been developed relating the output entropy bit-rate to its input value and the value of the transmission factor. The transmission factor makes it possible to compress the input entropy rate at constant subjective quality: when the value of the transmission factor increases, the quantizer step size increases, the transform coefficients are more and more coarsely quantized and the entropy bit-rate decreases. On the one hand, the transmission factor enables the output quantized data rate to be adjusted; on the other hand, the level of buffer occupancy is a cumulated measure of the excess of data rate with respect to the constraint rate of the transmission channel. Hence, when the level of buffer occupancy tends to increase, the transmission factor has to be increased, and conversely. The buffer occupancy is therefore fed back to the quantizer to adjust the output bit-rate. The simplest feedback reaction

³ TV codec developed within the Belgian Broadband Association project.
⁴ TV codec developed within the RACE-HIVITS 1018 project, i.e. Research for Advanced Communications in Europe.
Fig. 4. Feedback law for constant-bit-rate channels.
to be implemented is therefore a linear relationship with a positive slope relating the transmission factor to the level of buffer occupancy. The shapes of the elementary reaction curve are now specified in more detail for the CBR and VBR modes of transmission. In a CBR application, the loop reaction implements a linear curve with a positive slope between the relative level of buffer occupancy B(n) and the transmission factor U(n). Under the assumption of a constant input image entropy bit-rate, a stable equilibrium will be reached between the level of buffer occupancy and the transmission factor: a slight variation around this point implies either an increment of the transmission factor, which tends to decrease the quantized data rate, to lower the level of buffer occupancy and eventually to force a return to the equilibrium, or a decrement, which leads to the converse effect, i.e. a final return to the equilibrium. When the incoming mean image bit-rate increases due to a scene change, the point of equilibrium is shifted to higher values of both the transmission factor and the level of buffer occupancy, and conversely. In the case of an incoming entropy bit-rate lower than the rate of the transmission channel, the equilibrium point decreases to a transmission factor equal to zero and induces an underflow. It is no longer possible to comply with the transmission rate without stuffing with an additional stream of bits. If the incoming entropy bit-rate jumps to such a high value that the variation of the transmission factor during the following stripes is not enough to compensate for the step, the buffer overflows momentarily. Two additional extreme fall-back zones have been provided with an increased slope: one at high levels of buffer occupancy to prevent the buffer from overflowing, another at low levels of buffer occupancy when it tends to empty.

As an extreme case, stuffing is performed by the codec police function to maintain the constant output bit-rate with an equilibrium point within admissible values (Fig. 4). When dealing with a VBR application, the transmission factor is ideally maintained constant for any image bit-rate and the output cell rate is free to fluctuate at the rhythm of the incoming non-stationary TV source. In practical cases, a police function negotiated at call set-up will limit the variability of the output rate within a bit-rate gauge specified at least by a mean and a peak image bit-rate. In this case, the buffer is used only to accumulate the extra bit-rate beyond the bit-rate gauge negotiated at the call set-up. No feedback regulation scheme is therefore required during this mode and a perfectly uniform quality is provided by maintaining a constant transmission factor. Because of the potential occurrence of scenes with an information density inducing an output bit-rate higher than the traffic rate negotiated at the call set-up, the buffer occupancy can rise and reach a threshold level beyond which it is necessary to switch on a fall-back mode in order to prevent the buffer from overflowing. A feedback regulation is locked on to increase momentarily the transmission factor and to ensure a return to an acceptable output bit-rate. As a consequence, the image
Fig. 5. Feedback law for variable-bit-rate channels.
quality is no longer maintained constant but is slightly degraded by a coarser quantization step size until the input density of information decreases. The diagram of the control loop is shown in Fig. 5.
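The piecewise-linear reaction curves of Figs. 4 and 5 can be sketched as follows; the breakpoints, slopes and origin values are illustrative placeholders, not the paper's settings:

```python
def cbr_reaction(b, b_low=0.1, b_high=0.9, u0=16.0, b0=0.5,
                 slope=40.0, steep=200.0):
    """CBR feedback law (Fig. 4): linear in the nominal working zone,
    with steeper fall-back slopes near underflow and overflow.
    b is the relative buffer occupancy in [0, 1]."""
    if b < b_low:                    # near-empty zone: push U quickly to 0
        return max(0.0, u0 + slope * (b_low - b0) + steep * (b - b_low))
    if b > b_high:                   # near-full zone: coarsen quantization fast
        return u0 + slope * (b_high - b0) + steep * (b - b_high)
    return max(0.0, u0 + slope * (b - b0))   # nominal working zone

def vbr_reaction(b, b_threshold=0.7, steep=150.0, u_const=0.0):
    """VBR feedback law (Fig. 5): constant transmission factor below a
    threshold occupancy, then a fall-back slope to avoid overflow."""
    if b <= b_threshold:
        return u_const               # free VBR: constant subjective quality
    return u_const + steep * (b - b_threshold)
```

The VBR curve makes the difference between the two modes visible: it is flat (constant quality) over most of the occupancy range and only reacts in the fall-back zone, whereas the CBR curve reacts everywhere.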
2.5. The descriptive equations of the control loop

To draw the discrete system equations, the sampling period n is chosen equal to the stripe period. The bit-rates to be considered are those accumulated up to the end of the stripe and, consequently, the sampling instants correspond to these particular epochs. The descriptive equations of the control-loop behaviour are given by the following set of relations:

Hq(n) = Hi(n) exp[-α U(n)]          quantizer and VLC,      (4)
E(n) = Hq(n) - Dc                   channel constraint,     (5)
B(n) = max[B(n-1) + E(n)/BM, 0]     buffer accumulation,    (6)
U(n) = U0 + b[B(n-1) - B0]          reaction loop,          (7)

where Hi, Hq and Dc are the stripe entropy bit-rates corresponding, respectively, to the quantizer input, the quantizer output and the transmission channel, B is the relative level of buffer occupancy with a maximum capacity equal to BM and E is the excess of the stripe output bit-rate eventually routed to the buffer. The parameters b, U0 and B0 are the slope and the coordinates of the origin of the reaction law. A delay equal to one sampling period appears in the reaction-loop equation: it expresses that the transmission factor is computed at the end of each stripe period and applied during the next one.
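A direct stripe-by-stripe simulation of the loop (4)-(7) can be sketched as follows; all parameter values (α, slope, rates, buffer capacity) are illustrative choices, not taken from the paper:

```python
import math

def simulate_loop(h_in, d_c, b_m, alpha=0.05, b0=0.3, u0=10.0, slope=50.0):
    """Simulate control-loop equations (4)-(7) stripe by stripe.
    h_in : sequence of stripe input entropy bit-rates Hi(n)
    d_c  : channel bit-rate per stripe (CBR constraint Dc)
    b_m  : buffer capacity in bits (BM)
    Returns the trajectories of U(n) and relative occupancy B(n)."""
    b_prev, u_hist, b_hist = b0, [], []
    for hi in h_in:
        u = max(0.0, u0 + slope * (b_prev - b0))  # (7) reaction, one-stripe delay
        hq = hi * math.exp(-alpha * u)            # (4) quantizer and VLC
        e = hq - d_c                              # (5) channel constraint
        b_prev = max(b_prev + e / b_m, 0.0)       # (6) buffer accumulation
        u_hist.append(u)
        b_hist.append(b_prev)
    return u_hist, b_hist

# Hypothetical scene change: the stripe entropy doubles halfway through.
rates = [40_000.0] * 20 + [80_000.0] * 20
u_traj, b_traj = simulate_loop(rates, d_c=30_000.0, b_m=1_000_000.0)
```

With these assumed values the loop behaves as described for Fig. 4: the occupancy and the transmission factor settle towards an equilibrium, and the simulated scene change shifts that equilibrium to higher values of both.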
2.6. Drawbacks of linear feedback reactions

A linear feedback curve presents some drawbacks prohibiting its use for real-time TV transmissions in CBR modes. These drawbacks, to be corrected, can be enumerated as follows.
1. The transient responses towards the equilibrium point can deviate outside the bounds of the buffer capacity, inducing an overflow. A reaction loop with a simple linear reaction law can never truly guarantee that the buffer is protected from overflowing.
2. The slope of the reaction curve can never be adequately fixed: if it is small, it induces a nearly constant value of both the transmission factor and the subjective quality, but at an increasing risk of overflow; if it is high, it lowers the occurrence probability of overflows but, as a counterpart, produces a highly variable transmission factor.
These few sentences restate the core of the problem of balancing the need for a uniform quality against a buffer control without overflow. Those major drawbacks justify adapting the reaction loop to correct the linear regulation scheme and obtain more efficient responses of the codec in real-time transmissions. The purpose of the following section is to study the optimum control mechanism which will solve all the problems presented so far and lead to some generalizations of the concepts proposed in the field.
3. Optimum-control algorithm

As already defined, a controller is an algorithm which adds new potentialities and more general properties which enable the control design to take into account the stochastic nature of the input entropy source, to implement adaptive systems learning the statistical properties of the incoming signal and to derive schemes which are optimum according to some criteria. These schemes are based on combining three functions.
1. A cost function to be optimized and relevant to the image-sequence coding experience.
2. A predictor to project into the future the source bit-rates to be expected with respect to the correlations learnt from past realizations. The predictor refers to the model of non-stationary incoming TV signals, leading to a general Kalman filter as developed in the theory of adaptive filtering for non-stationary signals.
3. An optimum search through a graph of predicted bit-rates covering the future up to the horizon of one image, using theories from operations research and graph theory. The horizon is defined as the maximum extent of future to which the previsions are estimated.
The algorithm presented in this paper will turn out to be general enough to cope with non-linear systems (rate-distortion function) and non-stationary sources, and to encompass both CBR and enforced VBR transmission modes. Learning capacities of model parameters can be provided at the different layers of the source model, namely the stripe, the image, the scene and the program.

3.1. The cost function
The optimum strategy consists of allowing, on a stripe-by-stripe basis, the smallest changes in the subjective image quality (these changes are defined by ΔU(n) = U(n) - U(n - 1)), i.e. in controlling the codec as smoothly as possible. Especially at scene changes, the first image is coded in intra mode and needs moreover to be finely quantized to constitute a substrate for efficient predictions. The control should also force the relative buffer level to converge to a consigned reference level B_r (for instance, 20-50%) before the occurrence of the next scene change, with the aim of preparing a space reserve in the buffer to cope with the potential complexity of the next coming scene. As a consequence, the best long-term balance between optimizing the image quality and regulating the buffer level is performed by minimizing a cost function of the type

V(i) = Σ_{n=i}^{i+N} { E{[ΔU(n)]²} + θ E{[B(n) - B_r]²} },   (8)
where i is the current epoch of decision and N the horizon. The parameters θ and B_r can be adapted further to take into account the statistics of image bit-rates and scenes at the horizon of each TV program. To take the example of a TV source, i can be the epoch of each stripe and the horizon can range up to the whole
image, i.e. a maximum of N = 72 coming stripes⁵. Some additional constraints can eventually be added to the control: for example, a practical limitation on the maximum level of relative buffer occupancy, i.e. 0 ≤ B(n) ≤ 100%. To be more general, the control strategy should stress controlling the image quality during the non-stationary and uncertain epochs and the buffer during the stationary and certain epochs.

3.2. Predictive models for the source bit-rates
The predictive models⁶ for the TV sources will be directly derived from the source model at the stripe level. The optimum predictive filter according to the performance criterion of minimum-mean-squared-error estimation is the Wiener filter in the case of stationary stochastic signals: this filter achieves min_{g(Y)} E{(X - X̂)²}, where the random variable X̂(n) = g(Y) is an estimator of the variable X(n). Let X̃ be the optimum value of X̂. A geometric interpretation of this optimum estimator shows that it is the orthogonal projection, on the subspace of all the finite mean-square functions X̂(n) = g(Y), of a specific random variable Y(n) = X(n) + W(n) (where W(n) is a noise). As an important achievement of this optimum estimation theory, the random variable E{X | Y} (expressing what one expects X to be, having observed Y) is also the minimum-mean-squared-error estimator. The optimum estimation can be implemented with a causal, stable and invertible filter called the innovations representation of Y(n). The term "innovations" means the entirely new information generated at each predictive epoch as the error of prediction, represented in this case as a white Gaussian noise. In the case of non-stationary discrete stochastic signals, such as the stripe or image TV or HDTV bit-rate values, two companion versions of the innovations representation approach are still amenable to computationally efficient numerical implementations: the Kalman filter and the non-stationary autoregressive process. These two modeling approaches represent two identical predictive versions: the first one is a state-space version and the second one an input-output version. The stochastic state-space equations of the TV or HDTV source model can be written as follows, n being the epoch of observation and A(n) and C(n) the transition and transformation matrices:

X(n + 1) = A(n)X(n) + W(n),   (9)
Y(n) = C(n)X(n).   (10)
This model is valid for the prediction of both the stripe and the image bit-rates. According to the way of presenting the equations, the model can produce either a one-step-ahead predictor in the case of an image bit-rate prediction or an N-step-ahead predictor in the case of a stripe bit-rate prediction on the horizon of the image. In the case of the stripe bit-rates, X(n) is a state vector of dimension N (N being the periodicity of the periodic process modeling the stripe bit-rates), W(n) is a vector of N white Gaussian noises with E[W(n)W(n)^T] = diag(σ₁², ..., σ_N²) = Q and E[W(n)X(n)] = 0, and A(n) is the state transition matrix, the coefficients of which are iteratively estimated by an algorithm of the least-mean-square (LMS) family explained in the next subsection. The corresponding innovations model is expressed by the Kalman filter and uses the conditional mean of X(n + 1) given the observations Y(j), j = n₀, ..., n, in a recursive equation:
X̂(n + 1) = A(n)X̂(n) + K(n)[Y(n) - C(n)X̂(n)],   X̂(n₀) = X₀,   (11)
where K(n) is the filter gain, defined as a function of the conditional state error covariance matrix Σ(n):

K(n) = [A(n)Σ(n)C(n)^T][C(n)Σ(n)C(n)^T]⁻¹,   (12)
⁵ It refers to stripes of eight video lines in frames of 576 active lines (50 Hz countries).
⁶ The tilded variables, the barred variables and the hatted variables stand, respectively, for the optimum values, the mean values and the predicted values.
with Σ(n) defined as

Σ(n) = E{[X(n) - X̂(n)][X(n) - X̂(n)]^T | X(n - 1), ..., X(n₀)}.   (13)
The gain K(n) satisfies a Riccati difference equation (n₀ is the time origin):

Σ(n + 1) = A(n)Σ(n)A(n)^T + Q - K(n)[C(n)Σ(n)C(n)^T]K(n)^T,   Σ(n₀) = Σ₀,   (14)

which is stable if the filter transition matrix [A(n) - K(n)C(n)] has its eigenvalues inside the unit circle [5].
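As an illustration of the recursions (11)-(14), the following scalar sketch (illustrative coefficients, not taken from the paper) chains the gain, predictor and Riccati updates; with full observation (c = 1) the error variance settles at the state-noise level q:

```python
def kalman_step(a, c, q, x_hat, sigma, y):
    """One scalar recursion of Eqs. (11)-(14).

    a, c: transition and observation coefficients; q: state noise
    variance; x_hat: predicted state; sigma: error variance Sigma(n);
    y: new observation.
    """
    s = c * sigma * c                           # innovation variance C Sigma C^T
    k = a * sigma * c / s                       # gain, Eq. (12)
    x_next = a * x_hat + k * (y - c * x_hat)    # predictor, Eq. (11)
    sigma_next = a * sigma * a + q - k * s * k  # Riccati update, Eq. (14)
    return x_next, sigma_next

# Illustrative run on a geometrically decaying observation sequence
x_hat, sigma = 0.0, 1.0
for y in [1.0, 0.9, 0.81]:
    x_hat, sigma = kalman_step(0.9, 1.0, 0.01, x_hat, sigma, y)
print(round(sigma, 6))  # 0.01
```

Because the toy observation is noiseless and fully observed, the innovation vanishes after the first step and the covariance recursion reduces to Σ(n + 1) = Q.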
3.2.1. Prediction algorithms using LMS methods

In this paragraph, the modeling investigates an implementable version of the Kalman filter in the form of a tractable, non-stationary and two-dimensional auto-regressive filter. In the present problem, a prediction of the future stripe bit-rates has to be performed at each stripe epoch over a horizon of one image. The predictions of each individual source stripe bit-rate x_i(n) are performed by using auto-regressive models given by

x̂_i(n) = Σ_{j=i-k}^{i+k} a_{ij}(n) x_j(n - N + j) + b_i(n) t_i(n),   (15)
where n is the sampling epoch, here the end of the stripe period.
- N is the horizon, or the number of stripes within one image, and i the index of the stripe predictor, 1 ≤ i ≤ N. Hence, there is one dedicated predictor per stripe to predict the stripe bit-rate.
- k is the number of neighbour samples taken into account in the prediction of the stripe bit-rates. Practically, k equal to 2 is the optimum value.
- t_i(n) is a Gaussian noise, ideally N(0, 1).
- a_{ij}(n) and b_i(n) are the model parameters to be permanently updated according to the existing correlation.
N identical equations can be written for the N different stripes. The prediction equation is updated once during each stripe period, when a new value of the stripe bit-rate is made available. The error of prediction e_i(n) is computed as the difference between the previously observed and estimated values:
e_i(n) = x_i(n - N) - x̂_i(n - N).   (16)
The N prediction equations lead to a system of N equations written in matrix form as

X(n + 1) = A(n)X(n) + b(n)w(n),   (17)
where the matrix A(n) is expressed as

A(n) = | a(1,1)     a(1,2)                                      |
       | a(2,1)     a(2,2)     a(2,3)                           |
       |            a(3,2)     a(3,3)     a(3,4)                |
       |                        ...                             |
       |        a(N-1,N-2)  a(N-1,N-1)  a(N-1,N)                |
       |                    a(N,N-1)    a(N,N)                  |
the vector X(n) consists of the values x_i(n) and the vector w(n) stands for the noises t_i(n). The values of a_{ij}(n) and b_i(n) are provided by an LMS or a stochastic gradient (SG) algorithm (described in [3]); the numerical procedure is furnished in Appendix A. This leads to the working diagram presented in Fig. 6.
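A minimal sketch of the predictor bank of Eq. (15): one AR predictor per stripe, each combining k neighbouring stripes observed one image earlier. The coefficient values and names below are illustrative; the coefficients themselves would come from the LMS/SG estimation of Appendix A.

```python
def predict_stripe(i, prev_image, a, k=2):
    """Predict the bit-rate of stripe i of the coming image, Eq. (15).

    prev_image: list of the N stripe bit-rates observed one image
    (N stripes) earlier; a: dict mapping (i, j) to the coefficient
    a_ij(n); the zero-mean noise term b_i(n)t_i(n) is dropped.
    """
    N = len(prev_image)
    return sum(a.get((i, j), 0.0) * prev_image[j]
               for j in range(max(0, i - k), min(N, i + k + 1)))

def prediction_error(observed, predicted):
    """Innovation of Eq. (16): e_i(n) = x_i(n - N) - x^_i(n - N)."""
    return observed - predicted

# Illustrative coefficients concentrated on the same stripe position
prev_image = [100.0, 120.0, 90.0, 110.0]
a = {(1, 0): 0.1, (1, 1): 0.8, (1, 2): 0.1}
x_hat = predict_stripe(1, prev_image, a)
print(round(x_hat, 6))                            # 115.0
print(round(prediction_error(118.0, x_hat), 6))   # 3.0
```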
[Fig. 6 block diagram: the observed bit-rates x(n) feed a non-stationary predictive filter with coefficient set Θ = [A(n), b(n)] updated by LMS or SG algorithms; the filter outputs the estimated bit-rates x̂(n). Legend: x̂: estimated bit-rates; x: observed bit-rates; Θ: set of estimated coefficients.]

Fig. 6. Implementation of a predictive filter for non-stationary signals.
3.2.2. Integrated model of codecs and sources

The source model can be included in the codec model to form a complete resulting Kalman filter, with the quantized entropy bit-rates as state variables, the transmission factor U(n) as a control variable and the relative level of buffer occupancy B(n) as an observation variable. As a consequence, two equations will be deduced: one for the state and another for the observation. These equations will turn out to be general enough to encompass both the constant- and the variable-bit-rate codecs, the difference between these two transmission modes being expressed in the observation equation. Due to the quantization entropy model, the resulting stochastic state equation is non-linear:
Hq(n + 1) = f[A(n)H_i(n), U(n)] + w(n),   (18)
the observation equation is given, in the case of the CBR codec, by

B(n) = B(n - 1) + C(n)Hq(n) - D_c,   (19)
and, generalized in the case of the VBR codec, by

B(n) = B(n - 1) + C(n)Hq(n) - D_c(n) + v(n),   (20)
where the channel bit-rate D_c is here variable as a function of Hq(n) according to a predefined police function P(n): D_c(n) = min{Hq(n), P(n)}. The additional noise v(n) takes into account the way of estimating the output bit-rate. At this point, an optimum algorithm has to be designed to define the optimum strategy of control, i.e. the sequence of the future values of U(n) over the whole horizon of prediction. This strategy will be elaborated with the use of the currently observed values of the level of buffer occupancy in order to minimize the cost function on the horizon. Practically, the observation of the quantized stripe data rates entering the buffer allows the controller to compute the corresponding source rates by the inverse rate-distortion function. The controller updates the source predictor coefficients, computes the costs to be appended on the graph and carries out the optimum search to generate a new sequence of control parameters, from which the first element, the transmission factor, has to be extracted and applied to the quantizer.
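The two observation equations can be sketched as follows; the policing rule D_c(n) = min{Hq(n), P(n)} is taken from the text, while the buffer size and the rates are illustrative values:

```python
def buffer_update_cbr(B_prev, Hq, Dc, size):
    """Eq. (19): relative buffer occupancy after one stripe, CBR channel.

    B_prev: previous relative occupancy in [0, 1]; Hq: quantized
    stripe bit-rate entering the buffer; Dc: constant channel rate;
    size: buffer capacity used to normalize the rates.
    """
    return B_prev + (Hq - Dc) / size

def buffer_update_vbr(B_prev, Hq, P, size, v=0.0):
    """Eq. (20): VBR channel with output rate policed by P(n)."""
    Dc = min(Hq, P)            # police function D_c(n) = min{Hq(n), P(n)}
    return B_prev + (Hq - Dc) / size + v

# CBR: a stripe 2000 bits above the channel rate raises a 100-kbit
# buffer by 2 points; VBR: below the policed rate, the level is frozen.
print(round(buffer_update_cbr(0.30, 12000, 10000, 100000), 2))  # 0.32
print(round(buffer_update_vbr(0.30, 12000, 15000, 100000), 2))  # 0.3
```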
[Fig. 7 trellis: transmission-factor values U = 0, ..., U = 3 as nodes at the steps n, n+1, n+2, ..., n+N-1, n+N.]

Fig. 7. Graph of all possible strategies at horizon N (example with three values for U).
4. The optimum-control algorithm
The pt, rpose of this paragraph is to develop one version of the optimum control algorithm of the stripe bit-rate based on both cost function and optimum prediction filter. Dynamic programming theory will be used as an efficient alternative to the classical approach to optimum control, founded on the calculus of variations; it is in fact suited to solve the control problems with non-linear and time-varying systems and, as a powerful property, it has straightforward implementations using the graph theory. The fundamental statement of dynamic programming theory is based on Bellman's principle of optimality which can be summarized by two statements: "'Any substrateqyfrom the state Xi to the state Xj [the sequence Si.j(Xi, Xj) = (Xi, X~ +t . . . . . . X~)] extracted .from an optimum strate,qy from the state 0 to the state N [So.N(Xo, XN)] is also optimum". "'At any current state, an optimum strate~dy has the property that no matter what the previous values of the control izave been, the remainimd decisions must constitute an optimum strateqy with re,qard only to the current resulting state". The optimum algorithm will be deducedas an application of the principle of optimality. The prediction filter yields source stripe bit-rates on horizons of a whole image. At the occurrence of each stripe, the predictions are reactualized and a graph can be constructed corresponding to the values of U(n) for whi,:h decisions will have to be taken. Over the horizon of one image, N decisions have to be taken (Fig. 7): at each epoch (i = n . . . . . n + N), a node corresponds to each value of U; all the transitions are a priori possible at the expense of the cost function defining distances of the connecting arrows. The predicted stripe bit-rates (optimum past conditioned expected rates) enable computing distances to be appended on the graph. 
The optimum sequence of control is given by the shortest path starting from the current state n given by U(n) and proceeding epoch by epoch up to the state n + N at U(n + N).

4.1. Forward algorithm
The shortest path is established in two phases: the first consists in scanning the future, starting with the current state n and running forwards up to the horizon, applying the principle of optimality on subtrajectories which span from the origin to consecutive prevision points. The second phase consists of scanning the whole horizon backwards to deduce the optimum path. The next paragraph describes the implementation of the search algorithm for the shortest path; the most intuitive way to present the algorithm is by describing the forward version. Let us notice that an equivalent backward algorithmic version could have been described similarly. Phase 1. The first phase is composed of N periods at which the subtrajectories are successively considered from j = n to the future epochs j = n + i (i = 1 to N). Γ_i[U(n)] stands for the set of all admissible values of
U(n + i) starting from U(n) and proceeding i steps forwards (the ith-step transitive closure). Conversely, Γ_i⁻¹[U(n)] stands for the set of all the admissible values of U(n) starting from U(n + i) and proceeding i steps backwards. Let us remark that the condition [B(n) < 1.0] limits the content of the set of admissible values of U(n). The set ψ₁ is thereafter defined as follows:

U(n + i) ∈ ψ₁[U(n + i - 1), U(n + i + 1)] = Γ₁[U(n + i - 1)] ∩ Γ₁⁻¹[U(n + i + 1)].
Period 1: The current U(n) value is known and constitutes the origin node of the graph; the associated transition costs V are given by

V[U(n), U(n + 1)] = [U(n + 1) - U(n)]² + θ[B(n + 1) - B_r]²,   U(n + 1) ∈ Γ₁[U(n)].   (21)
Period 2:

Ṽ[U(n), U(n + 2)] = min_{U(n+1) ∈ ψ₁[U(n), U(n+2)]} { V[U(n), U(n + 1)] + V[U(n + 1), U(n + 2)] }.   (22)
The value of Ũ(n + 1) is obtained as a function of [U(n), U(n + 2)]: at each value of U(n + 2), one can associate a corresponding value of U(n + 1) generating an optimum strategy over the sub-trajectory [U(n), U(n + 2)]. An associated expected buffer level of occupancy obtained by the optimum sub-trajectory is appended at each node of U(n + 2). Period i:
Ṽ[U(n), U(n + i)] = min_{U(n+i-1) ∈ ψ₁[U(n+i-2), U(n+i)]} { Ṽ[U(n), U(n + i - 1)] + V[U(n + i - 1), U(n + i)] }.   (23)

The value of Ũ(n + i - 1) is obtained as a function of [U(n), U(n + i)].

Period N:
Ṽ[U(n), U(n + N)] = min_{U(n+N-1) ∈ ψ₁[U(n+N-2), U(n+N)]} { Ṽ[U(n), U(n + N - 1)] + V[U(n + N - 1), U(n + N)] }.   (24)
The value of Ũ(n + N - 1) is obtained as a function of [U(n), U(n + N)]. Phase 2. The second phase consists of reviewing backwards the sequential optimum sub-trajectory relations to eventually extract the global sequence of optimum control values Ũ(n), Ũ(n + 1), ..., Ũ(n + N - 1), Ũ(n + N). Implementation. The implementation of a search for the shortest path in a graph is well known in the literature: the algorithmic version of Dijkstra described in [2] is well suited to achieve the most efficient search according to the nature of the present graph. The hardware implementation of that algorithm requires at least N² cost function computations at each update of the horizon (N being the number of levels for the transmission factor). A realistic way to implement the graph consists of considering a subsampled and pruned version in order to reach technological feasibility. Simple versions of predictors can be considered with an AR(1) model. Current research in neural networks has demonstrated the way to implement Kalman filters, as non-stationary learning predictors, in conjunction with shortest-path searches on a graph [6].
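The two phases of Section 4.1 can be sketched as a standard dynamic-programming sweep over the trellis of Fig. 7. The transition cost of Eq. (21) is abstracted into a user-supplied function, and all names and numeric values are illustrative:

```python
def shortest_control_path(U_levels, N, u0, cost):
    """Shortest path through the trellis of Fig. 7.

    U_levels: admissible transmission-factor values; N: horizon;
    u0: known current value U(n); cost(step, u_prev, u_next):
    transition cost, e.g. Eq. (21) with predicted buffer levels.
    Returns the optimum sequence U(n), ..., U(n + N).
    """
    # Phase 1: forward sweep applying the principle of optimality.
    best = {u0: 0.0}             # optimum cost of reaching each node
    back = []                    # backpointers, one dict per step
    for step in range(N):
        new_best, new_back = {}, {}
        for u in U_levels:
            c, p = min((best[p] + cost(step, p, u), p) for p in best)
            new_best[u], new_back[u] = c, p
        best = new_best
        back.append(new_back)
    # Phase 2: backward scan extracting the optimum control sequence.
    u = min(best, key=best.get)
    path = [u]
    for pointers in reversed(back):
        u = pointers[u]
        path.append(u)
    return list(reversed(path))

# Quadratic cost pulling U towards 2 while penalizing abrupt changes
cost = lambda step, p, u: (u - p) ** 2 + 0.5 * (u - 2) ** 2
print(shortest_control_path([0, 1, 2, 3], 3, 0, cost))  # [0, 1, 1, 1]
```

The example shows the expected behaviour of the optimum controller: a smooth, graceful drift of the transmission factor rather than an abrupt jump, because the change term and the target term are balanced at every step.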
[Fig. 8 diagram: the parameter θ balances the "freeze" action against the dual actions "regulate", "learning" and "caution".]

Fig. 8. Summary of optimum control properties.

4.2. Properties of the optimum control

The optimum control is referred to a cost function composed of two distinctive actions:
1. the "freeze" action, E{ΔU²(n)}, locking the value of the dynamic control;
2. the "control" forcing action, E{[B(n) - B_r]²}, inducing a dual control action necessary to cope with the uncertainty.
Both opposed actions are balanced by the parameter θ. Two main features are analyzed below: dual control, an intrinsic property of optimum control, and adaptivity, a property issued from the hierarchical bit-rate models. Adaptivity balances "control" and "freeze" actions by acting on the two parameters θ and B_r (Fig. 8).
4.2.1. Dual control

The concept of dual control is involved when the controller is alternately forced between a "regulation" and a "learning" action. This behaviour is intrinsically necessary to deal with non-stationary input sources.
1. The "regulation" action has to be effective in certainty horizons (stationarity), when the prediction works efficiently. The regulation consists in this case in forcing the buffer level to tend to the reference value.
2. The "learning" action has to be triggered in uncertainty horizons, when the covariance of the prediction errors begins to increase. The learning is forced when choosing the paths which minimize the cost function. The prediction errors are a fraction of the partial cost, as demonstrated in the following paragraph, and, consequently, the system tunes itself towards refining the knowledge of the model parameters during uncertainty horizons. Moreover, during "learning" periods, the controller puts into the balance an additional "caution" action, meaning that it tends to decrease the control action to cope with uncertainty (in this case, to increase the value of the transmission factor).
It is demonstrated below that the optimum controller automatically balances both actions. To demonstrate these properties, we can rewrite the equations as follows, taking the linear part:
1. For the one-dimensional state equation deduced from Eq. (18),

Hq(n) = Hq(n - N) + a(n)f[A(n), ΔH_i(n - N), ΔU(n)] + w(n).   (25)
The Hq(n + 1) are the source bit-rates at the consecutive epochs n + 1, ..., n + N + 1. The optimum search algorithm is performed on the N components of Hq(n + 1). More synthetically, Eq. (25) can be rewritten and
linearized as

y(n) = y(n - N) + a(n)u(n) + w(n),   (26)
where u(n) is the control parameter.
2. For the observation equation (one-dimensional),

B(n) - B_r = B(n - 1) - B_r + C(n)Hq(n) + v(n)   (27)
or, more synthetically,

x(n) = x(n - 1) + c(n)y(n) + v(n).   (28)
A control strategy [u(i); i = n, ..., n + N] is sought as a function of the past observations [x(i); i = n, n - 1, ..., 0] in a way to minimize the cost function

V(n) = E{ Σ_{i=n+1}^{n+N+1} x²(i) }.   (29)
Let us now calculate the expected optimum partial costs S_u associated with trajectories spanning from the epochs j + 1 (with n - 1 < j < n + N - 1) up to the epoch n + N + 1 following the shortest path:

at epoch (n + j + 1):

S_u(n + j + 1) = min_{u(n+j)} E{ x²(n + j + 1) + S_u(n + j + 2) | x(0), ..., x(n + j - 1) }.   (30)
Using Eqs. (26) and (28),

S_u(n + j + 1) = min_{u(n+j)} { [x(n + j) + c(n + j + 1)y(n + j + 1 - N) + c(n + j + 1)â(n + j + 1)u(n + j)]²
+ c²(n + j + 1)u²(n + j)P_a(n + j + 1) + σ_w² + σ_v² + E[S_u(n + j + 2) | x(0), ..., x(n + j)] },   (31)
where

â(n + j + 1) = E{ a(n + j + 1) | x(0), ..., x(n + j) }   (32)
is the conditional mean of a(n) at epoch n, and

P_a(n + j + 1) = E{ [a(n + j + 1) - â(n + j + 1)][a(n + j + 1) - â(n + j + 1)]^T | x(0), ..., x(n + j) }   (33)
is the conditional covariance, which measures the amount of uncertainty in the estimate of a(n), i.e. it indicates the degree of knowledge of the source behaviour and the divergence from stationarity. Its contribution is expected to be important at scene changes and to decrease within the scenes.

at epoch (n + N + 1):

S_u(n + N + 1) = min_{u(n+N)} { [x(n + N) + c(n + N + 1)y(n + 1) + c(n + N + 1)â(n + N + 1)u(n + N)]²
+ c²(n + N + 1)u²(n + N)P_a(n + N + 1) + σ_w² + σ_v² }.   (34)
The optimum control at the horizon n + N is obtained by differentiating the cost S_u(n + N + 1) with respect to u(n + N) and equating to zero:

ũ(n + N) = - c(n + N + 1)â(n + N + 1)[x(n + N) + c(n + N + 1)y(n + 1)] / { c²(n + N + 1)[P_a(n + N + 1) + â²(n + N + 1)] }.   (35)
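The "caution" behaviour contained in Eq. (35) can be checked numerically with a small sketch (illustrative values): for a fixed buffer deviation, the magnitude of the applied control shrinks as the parameter uncertainty P_a grows.

```python
def optimum_control(x, y, c, a_hat, P_a):
    """Eq. (35): control minimizing the terminal partial cost (34).

    x: observed buffer deviation B - Br; y: state variable; c:
    observation coefficient; a_hat: conditional mean of a(n);
    P_a: its conditional covariance (uncertainty).
    """
    return -c * a_hat * (x + c * y) / (c * c * (P_a + a_hat ** 2))

# Certainty equivalence (P_a = 0) versus high uncertainty (P_a = 1):
u_certain = optimum_control(0.2, 1.0, 1.0, 0.5, 0.0)
u_cautious = optimum_control(0.2, 1.0, 1.0, 0.5, 1.0)
print(round(u_certain, 3), round(u_cautious, 3))  # -2.4 -0.48
```

The same positive deviation is corrected by a strong negative control action under certainty and by a much weaker one under uncertainty, which is exactly the caution effect described in the text.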
back at the epoch (n + N):

S_u(n + N) = min_{u(n+N-1)} { [x(n + N - 1) + c(n + N)y(n) + c(n + N)â(n + N)u(n + N - 1)]²   (term [1])
+ c²(n + N)u²(n + N - 1)P_a(n + N)   (term [2])
+ σ_w² + σ_v²   (terms [3-4])
+ E{ P_a(n + N + 1)[x(n + N) + c(n + N + 1)y(n + 1)]² / [P_a(n + N + 1) + â²(n + N + 1)] | x(0), ..., x(n + N - 1) } }   (term [5]).   (36)
The optimum control ũ(n + N) in Eq. (35) demonstrates the influence of the uncertainty through the presence of P_a(n + N + 1). Let us now distinguish the periods of certainty from the periods of uncertainty.
1. In certainty equivalence control (P_a(n + N + 1) = 0; Eq. (35)), the optimum control at the horizon n + N is a function of both the observed variable x(n + N) and the state variable y(n + 1); the negative sign demonstrates the correct 'regulation' action of ũ(n) [≡ ΔU(n + N)] with x(n) [≡ B(n + N) - B_r] and y(n + N) [≡ Hq(n + N)].
2. In uncertainty horizons, when the predictive parameters of the source bit-rates are poorly known (Eq. (36)), the covariance matrix has high values, forcing the controller to reduce the values of u(n + N) (i.e. U(n + N) takes high values): the controller manages with caution to cope with the uncertainty.
Eq. (36) exhibits all the actions influencing the optimum control, which are optimally balanced by the minimization of the cost function: the regulation action (expressed in term 1), the caution action (expressed in term 2), the learning action (expressed in term 3) and the potential effect of the parameter uncertainty in the future horizons, which can be referred to as the 'prevision' action (expressed in term 5). Due to the implementation of the optimum search, the algorithm always imposes the path with the minimum cost and therefore tries to minimize the most important terms of Eq. (36): this leads to an optimum balance of all the described actions with respect to the cost function.
4.2.2. Adaptivity

The TV source model has been described in Section 2.1 as a construction of embedded layers corresponding to (from top to bottom) programs, scenes, images and stripes: semi-Markov processes are used at the scene level, autoregressive models at the image level and periodic processes at the stripe level. The optimum algorithm has been performed at the stripe level. The adaptivity consists of modifying the two parameters θ and B_r of the core optimum stripe bit-rate regulation by means of the outer layers, i.e. the image, scene or program levels. Both parameters θ and B_r can be adapted to the degree of non-stationarity of the input source. Mainly, this non-stationarity is characterized by the mean image bit-rate per scene and the duration of the scenes. The way to adapt the regulation is as follows:
1. the parameter B_r determines the reserve of bit-rate to be made available to cope with any incoming new scene: it should be as low as possible (20%, for example) when complex scenes are expected to occur;
2. the parameter θ determines the strength of the control. As a matter of fact, θ has to be increased in programs with a high scene variability, i.e. short scenes and high bit-rate variabilities (highly non-stationary sources). The value of θ can be decreased during slow-moving programs.
The parameter θ plays an important part in VBR applications; this parameter can be used to switch on the regulation when the incoming source entropy rate exceeds the policed rate. Furthermore, θ is a variable acting as a continuous and gradual potentiometer from the free VBR mode with θ = 0 to a strict CBR mode with θ > 0. This optimal regulation can therefore accommodate any kind of policing function: the preventive police function can be incorporated in the control algorithm. The value of θ determines the strength of the control; as a matter of fact, it is a function θ = f(B, P_a) in the sense that θ increases in uncertainty, with a high level of buffer occupancy and with scenes of high expected bit-rates. In VBR applications, the locked control mode should normally be privileged, with the dual control acting only as a fall-back function when the predicted quantized image bit-rate overcomes the policed rate. In CBR applications, the locked control finds its meaning when encoding still pictures and highly predictable scenes. The consigned level B_r has to be adapted as an inverse function of the likely complexity of the next coming scene (dependent on the program statistics). Both parameters θ and B_r have to be imposed in order to maximize the mean image quality, i.e. to minimize the expected cost of the cumulated U values over the whole horizon, V(n) = E[(1/n) Σ_{i=1}^{n} U(i)].
5. Regulators and optimum controllers
The purpose of this section is to outline a comparative study between regulators and controllers. The regulator studied in this section is a transfer function known as a proportional-integral-derivative (PID) regulator and presented in [8]. An example of the responses of the regulator and the controller algorithms is depicted and compared in Fig. 9 for a drastic change of quality level. The processed source originates from a database of live TV. The PID regulator is not an optimum filter and therefore the fluctuations arising from the randomness present in the input stripe bit-rate remain in the control parameter U. In fact, the optimum controller can achieve maximally flat commands (i.e. with constant values of the transmission factor over the stationary periods) and produces smooth and graceful evolutions of the transmission factor whenever they are required by an inevitable long-term variation in the density of the source information. As a consequence, the control produced by an optimum scheme has the form of a succession of flat areas along periods of rather uniform image complexity (the scenes), with sigmoid changes in between the flat levels. In the other case, the PID is
[Fig. 9 plot, "Comparison PID-optimum control": transmission factor versus time (0-500 stripes); curve (1) is the PID-regulator response, curve (2) the optimum-controller response.]

Fig. 9. Response of transmission factor with PID regulator and optimum control.
a suboptimum solution throughout. It requires the adaptation of its parameters when changing the source format, the channel bit-rate constraints or the transmission mode (CBR-VBR). The PID moreover needs a complete study of the non-linear system behaviour to overcome any occurrence of overflow. Nevertheless, it is worth noting that a PID algorithm is extremely simple to implement and that it achieves the main basic goal of controlling both the level of buffer occupancy and the quantization step size. Let us remark that it is always possible to add an algorithm computing the PID parameters in real time according to any cost function to achieve a more flexible suboptimum solution.
6. Conclusions

Several codec control algorithms have been investigated so far in the literature. Among them, the adjunction of a PID regulator appears to be the simplest solution which enables the codec to cope with TV sources and introduces just enough capabilities to deal with non-stationary input signals. Its main advantage is the simplicity of the hardware implementation. Nevertheless, this solution suffers from some remaining drawbacks: e.g. it does not implement an optimum balance between the buffer regulation and the uniform quality, and it is not fully adaptive in the sense that each different application needs an intrinsic set of new PID parameters. Therefore, an optimum controller has been proposed in this paper as one of the most advanced solutions. It offers most of the properties expected of a regulation in charge of controlling a non-stationary signal. Moreover, it has been demonstrated to be a solution resulting in the optimum balance of all the properties: the learning of the statistical parameters; the caution during the uncertainty; the regulation during certainty; the freeze during foreseeable periods of still or slow-moving pictures. No matter whether a CBR application or a VBR application is involved, the controller builds the adequate optimum strategy with the same algorithm. Owing to the predictor, the buffer is used more efficiently and, as a consequence, the transient responses are better managed. The scheme proposed in this paper presents a whole palette of implementable versions, starting from the most complete algorithm proposed for TV formats with N = 72 steps. By subsampling and pruning the coding cost graph (a set of rules can be designed), the search for the optimum path can be implemented using the present advances made in hardware implementation techniques. On the other hand, the scheme is based on an algorithm general enough to be open to intelligent techniques of learning the image statistics and achieving the optimum.
Further research still has to be performed using neural network techniques; it should be expected to implement the predictions and the optimum searches in a single intelligent neural network. Moreover, the proposed control scheme can be expanded to the whole codec by optimally adapting all the codec functions (decorrelation, quantization and VLC) to the incoming statistics.
Acknowledgments

The author would like to thank the anonymous reviewers for suggesting significant improvements in the manuscript.
Appendix A. Estimation of the auto-regressive coefficients

In this appendix, the LMS algorithm used in the simulations [3, p. 65] to estimate the auto-regressive coefficients will be outlined to support the explanations given in Fig. 6. A non-stationary one-dimensional auto-regressive filter can be written in vector formulation as

x̂(t) = Φ(t - 1)^T Θ(t),   (37)
where
- Θ(t) is the vector of the autoregressive coefficients. Referring to Eq. (15),

Θ(t) = [a_{i,1}, a_{i,2}, ..., a_{i,n}, b_i]^T   (38)

stands for a filter of order n. The symbol T stands for matrix transposition.
- Φ(t - 1) is the vector of the past observed values of the process and the prediction noise. Back to Eq. (15),

Φ(t - 1) = [x_1, x_2, ..., x_n, e_i]^T.   (39)
The estimation of the coefficients in Θ has been performed by the following least-mean-square algorithm:

Θ̂(t) = Θ̂(t - 1) + [P(t - 2)Φ(t - 1) / (1 + Φ(t - 1)^T P(t - 2)Φ(t - 1))] [y(t) - Φ(t - 1)^T Θ̂(t - 1)],   (40)

with

P(t - 1) = P(t - 2) - [P(t - 2)Φ(t - 1)Φ(t - 1)^T P(t - 2)] / [1 + Φ(t - 1)^T P(t - 2)Φ(t - 1)],   P(-1) > 0.   (41)
The algorithmic gain is regularly revitalized and is prevented from numerically going to zero by resetting the covariance matrix P every 15 iterations. Let us remark that numerous versions of the LMS algorithm exist in the literature and, among them, the normalized LMS proposed in [3] is worth being mentioned.
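The recursion (40)-(41), including the covariance resetting described above, can be sketched as follows (pure-Python sketch; the order, reset period and test signal are illustrative):

```python
def estimate_ar(samples, order, reset_every=15, p0=100.0):
    """Recursive LMS estimation of the AR coefficients, Eqs. (40)-(41).

    samples: observed sequence y(t); order: number of coefficients;
    P is reset to p0*I every `reset_every` iterations to keep the
    algorithmic gain from going numerically to zero.
    """
    n = order
    theta = [0.0] * n
    P = [[p0 * (i == j) for j in range(n)] for i in range(n)]
    for t in range(n, len(samples)):
        phi = samples[t - n:t]                    # regressor Phi(t - 1)
        Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(phi[i] * Pphi[i] for i in range(n))
        err = samples[t] - sum(phi[i] * theta[i] for i in range(n))
        theta = [theta[i] + Pphi[i] * err / denom for i in range(n)]  # Eq. (40)
        P = [[P[i][j] - Pphi[i] * Pphi[j] / denom for j in range(n)]
             for i in range(n)]                    # Eq. (41)
        if (t - n + 1) % reset_every == 0:         # covariance resetting
            P = [[p0 * (i == j) for j in range(n)] for i in range(n)]
    return theta

# The coefficient of a first-order process x(t) = 0.9 x(t-1) is recovered
xs = [1.0]
for _ in range(60):
    xs.append(0.9 * xs[-1])
print(round(estimate_ar(xs, 1)[0], 3))  # 0.9
```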
References

[1] A.E. Albert and L.A. Gardner, Stochastic Approximation and Non-linear Regression, MIT Press, Cambridge, MA, 1967.
[2] E.W. Dijkstra, "A note on two problems in connexion with graphs", Numer. Math., Vol. 1, 1959, pp. 269-271.
[3] G.C. Goodwin and K.S. Sin, Adaptive Filtering Prediction and Control, Prentice-Hall, Englewood Cliffs, NJ, 1984.
[4] Y. Grenier, "Time-dependent ARMA modeling of nonstationary signals", IEEE Trans. Acoust. Speech Signal Process., Vol. 31, No. 4, August 1983.
[5] S. Haykin, Adaptive Filter Theory, Prentice-Hall, Englewood Cliffs, NJ, 1992.
[6] J. Hertz, A. Krogh and R. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991.
[7] J.-P. Leduc, "... optimum ... algorithm for digital TV and HDTV codecs" [title partly illegible], IEEE Benelux and ProRISC Workshop on Circuits, Systems and Signal Processing, Houthalen, 8-9 April 1992, 9 pages.
[8] J.-P. Leduc, "Buffer regulation for universal video codec in the ATM Belgian broadband experiment", Proc. Third Internat. Workshop on Packet Video, Morristown, USA, March 1990, Session C, 6 pages.
[9] J.-P. Leduc, "Complete bit rate models of digital TV and HDTV codecs for transmission on ATM networks", SPIE's Visual Communications and Image Processing 1992, Boston, 2-5 November 1992, Vol. II, pp. 849-860.
[10] J.-P. Leduc, Digital television and high-definition television: Coding algorithms and transmission on ATM networks, Ph.D. Thesis, Université Catholique de Louvain, Belgium, March 1993.
[11] J.-P. Leduc and S. D'Agostino, "Universal VBR videocodecs for ATM networks in the Belgian broadband experiment", Signal Processing: Image Communication, Vol. 3, Nos. 2-3, June 1991, pp. 157-165.
[12] B. Widrow and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
[13] B. Widrow and E. Walach, "On the statistical efficiency of the LMS algorithm with nonstationary inputs", IEEE Trans. Inform. Theory, Vol. IT-30, No. 2, March 1984.
[14] B. Widrow, J.M. McCool, M.C. Larimore and C.R. Johnson, "Stationary and nonstationary learning characteristics of the LMS adaptive filter", Proc. IEEE, Vol. 64, No. 8, August 1976.