Copyright © IFAC Programmable Devices and Embedded Systems, Brno, Czech Republic, 2006
REAL-TIME IMPLEMENTATION OF MOTION DETECTION ALGORITHM BASED ON PIXELSTREAMS
Mirosław Jabłoński, Jaromir Przybyło, Marek Gorgoń
AGH University of Science and Technology, Biocybernetic Laboratory, Department of Automatics, Al. Mickiewicza 30, 30-059 Krakow, Poland
[email protected], przybylo@agh.edu.pl, [email protected]
Abstract: In this paper we present an implementation of the Sum of Absolute Differences (SAD) algorithm on an FPGA device. The SAD operation is frequently used by a number of algorithms for digital motion estimation. The paper presents a SAD implementation in Handel-C and its adaptation to a PixelStreams module. We briefly describe PixelStreams, a new design environment recently introduced by Celoxica, and the experience we gained while exploring the software. The algorithm has been applied to a real-time video stream coming from a standard PAL camera. Copyright © 2006 IFAC
Keywords: Image Processing, Computer Vision, Complex Systems, Hardware, Computing Systems, Computer Control.
1. INTRODUCTION
Among the expanding directions of modern technologies applied in the design of reprogrammable devices are graphical interfaces used for design work at a high level of abstraction. In this paper the authors present the experience gained while transferring the SAD motion detection algorithm from the Matlab/Simulink environment to the PixelStreams environment. There is a lack of publications concerning the new PixelStreams environment, introduced by Celoxica (2005); therefore a short description of its features is given in Section 3.1. The details of the SAD motion detection algorithm, on the other hand, are described in the paper by Koga (1981). It contains the classical methods used, e.g., for motion detection in MPEG video compression (Koga, 1981; Wang and Ostermann, 2002). Implementations of the algorithm in reconfigurable devices are presented in (Vassiliadis et al., 1998; Wong et al., 2002a; Wong et al., 2002b). The new aspect of the present paper is the implementation of the SAD algorithm as a new library element of the PixelStreams environment, which allows motion detection to be performed in real time on a live video stream coming from a TV camera working in the PAL standard.
2. SAD ALGORITHM
2.1 SAD algorithm description
As the basic algorithm we used a motion detection method based on the sum of absolute differences (SAD), proposed by Mathworks (2005). The motion detection algorithm applies the SAD method independently to four non-overlapping quadrants of a video sequence. If motion is detected in a given quadrant, that is, if the SAD value is greater than a threshold, the quadrant is highlighted and a detection event is signaled. The outline of the algorithm is depicted in Fig. 1. The original algorithm was implemented in Simulink, using single and double precision arithmetic. The input signal (video sequence) is delayed by one frame, forming a reference image for the SAD. The input and template images are split into four parts (quadrants) and passed to the SAD block. The SAD block returns a four-element vector of SAD values (motion energy), computed for each of the quadrants respectively. For debugging purposes, the difference image between the current frame and the reference is also returned. The motion energy vector is compared with the motion threshold and passed to the visualization block. When the motion energy is greater than the threshold, a detection event is signaled for the corresponding quadrant.
Fig. 1. A block diagram of the SAD algorithm
The SAD algorithm is used as a template matching metric for motion estimation in many applications, for example MPEG video compression (Koga, 1981; Wang and Ostermann, 2002). Performing the SAD operation can be time-consuming, because it is usually computed over a large search area and involves a high number of templates. There are many approaches to accelerating the SAD using a hardware implementation. Wong et al. (2002a) investigated several hardware implementations of the SAD operation and mapped the most promising one onto an FPGA. They established that the SAD operation can be divided into two stages, namely absolute value and sum. For each stage, several implementation alternatives can be identified. Based on the expected speed and area estimates, they chose to implement the SAD operation using a carry generator in the absolute value stage and an adder tree in the sum stage. The implementation was written in VHDL.
The equation for the two-dimensional discrete SAD is the following:
$$C(j,k) = \sum_{m=0}^{M_t-1} \sum_{n=0}^{N_t-1} \left| I(m+j,\, n+k) - T(m,n) \right| \qquad (1)$$
where $j, k \ge 0$ index the position of the template within the image, $I$ is the input image of size $[M_i \times N_i]$ and $T$ is the template of size $[M_t \times N_t]$.
A direct approach to the computation of the SAD consists of the following stages: compute the difference between image and template pixels, $I_i - T_i$; determine which differences $I_i - T_i$ are less than zero and in that case produce $T_i - I_i$ as the absolute value operation (boundary check); accumulate all absolute values.
2.2 Preparing the reference algorithm for hardware implementation
The original software version of the algorithm performs its calculations in floating point format. Realising floating point arithmetic in reprogrammable structures is a difficult task that requires considerable resources. The algorithm discussed here allows the use of integer arithmetic without any loss in calculation quality. As a result, the hardware realisation can be very effective: fast and undemanding in terms of the FPGA resources employed. In order to implement the motion detection algorithm on a hardware platform using Handel-C, the basic algorithm has to be adjusted to integer (or fixed-point) arithmetic. First we analyzed the data range required at each step of the algorithm. Assuming a maximum input image size of [M, N] pixels of unsigned 8-bit numbers, the required data range for the SAD stages can be determined as follows. When computing the difference between image and template pixels, negative values have to be considered. If both operands are unsigned 8-bit numbers, the difference operation requires 9 bits to represent the same range of positive values. This computation can be improved to use only 8 bits by means of the carry generator proposed by Wong et al. (2002a, 2002b). In this case the boundary check can be skipped. In the sum stage the maximum SAD value for the image difference can be determined by the following equation:
$$\mathit{MaxSAD} = M \cdot N \cdot 255 \qquad (2)$$
The required number of bits can be computed as:
$$\mathit{NoOfBits} = \mathrm{ceil}(\max(\log_2(\mathit{MaxSAD}))) \qquad (3)$$
We have re-implemented the original algorithm using integer arithmetic adjusted to a maximum input image size of 512x512 pixels. The assumed image size results in the following data representation: absolute difference of frames - 9-bit integer numbers; SAD accumulator - 27-bit integer numbers; motion energy vector and comparator - same as the SAD accumulator; other parts of the algorithm - unsigned 8-bit integers.
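The integer-only computation and the accumulator sizing can be illustrated with a minimal Python sketch. It assumes images stored as lists of rows of 8-bit values; the helper names `abs_diff`, `sad`, `max_sad` and `accumulator_bits` are hypothetical and are not taken from the Simulink model or the Handel-C code.

```python
import math

def abs_diff(a, b):
    """Absolute difference of two unsigned 8-bit values, no signed intermediate."""
    # Boundary check: always subtract the smaller operand from the larger one.
    return a - b if a >= b else b - a

def sad(image, template, j=0, k=0):
    """Eq. (1): SAD of `template` placed at offset (j, k) inside `image`."""
    total = 0
    for m, row in enumerate(template):
        for n, t in enumerate(row):
            total += abs_diff(image[m + j][n + k], t)
    return total

def max_sad(rows, cols, max_pixel=255):
    """Eq. (2): largest possible SAD for a rows x cols block of 8-bit pixels."""
    return rows * cols * max_pixel

def accumulator_bits(rows, cols):
    """Eq. (3): ceil(log2(MaxSAD))."""
    return math.ceil(math.log2(max_sad(rows, cols)))

current = [[10, 200], [0, 255]]
reference = [[12, 100], [255, 255]]
print(sad(current, reference))     # 2 + 100 + 255 + 0 = 357
print(max_sad(512, 512))           # 66846720
print(accumulator_bits(512, 512))  # 26
```

For 512x512 frames Eq. (3) evaluates to 26 bits. The 27-bit accumulator quoted in the text is one bit wider, which would be consistent with a sign bit, although the source does not state the reason.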
3. PIXELSTREAMS IMPLEMENTATION OF THE SAD ALGORITHM
3.1 PixelStreams Library
PixelStreams is a library of parametrizable IP modules dedicated to the implementation of image processing systems on hardware platforms. The components of the PixelStreams library (operations and filters, shown graphically as building blocks) are written in the Handel-C language and are intended for use with the PDK (Platform Development Kit) package by Celoxica. The construction of a video processing application consists of setting up a network of connections formed by streams, in which the data are transmitted: pixels together with their coordinates and synchronisation information. Data processing is based on the concurrent processing of data streams by individual instances of building blocks. The operation of each module is synchronised by the data transported in the streams. Each clock cycle corresponds to the transmission of one pixel. The environment can work with several widely used data formats such as RGB, YCrCb and a signed 16-bit format for increased calculation precision. It also supports interlaced and non-interlaced picture organisation in the TV and VGA standards. In addition to the platform-independent building blocks used for image processing, several blocks are available that are dedicated to specific target boards. They are responsible for image acquisition, data buffering in RAM and visualisation. The parallelism offered by the Handel-C language, modularity, data flow and synchronisation make it a universal tool for creating effective real-time video systems. The real-time operation parameters are determined by the source of the video signal. Fulfilment of the real-time criterion results directly from the applied model of data transmission and the method of block synchronisation.
The library provides elements realising the basic operations of image processing as well as elements covering its further analysis:
- coordinate transforms: affine transforms, translations, rotations, scaling;
- non-context operations: format conversions, arithmetic operations (single- and multi-argument), LUT transformation, thresholding;
- context operations: convolution, morphological operations;
- image analysis: image pixel statistics, object indexing.
The list of supported operations may be extended by the user through the creation of additional library elements. New modules can also be constructed from existing components. An essential advantage of the library is the possibility of manual and automated parameterisation of the components. As a result the application is easily scalable, and the elements can be re-used without any change in the implementation.
3.2 Hardware implementation of motion detection
The algorithm realised on the hardware platform does not differ from the software version in terms of functionality. Porting the algorithm from the software platform to the FPGA requires adjustments to the input image acquisition and to the presentation of results. In the PixelStreams environment, dedicated library components are responsible for handling the inputs and outputs. There are essential differences in the execution of the calculation sequence. In the software application, realised in Simulink, the elementary data quantum is the whole image frame, which offers free access to all image data. The pipelined nature of the video signal and of the PixelStreams library imposes synchronous, sequential processing. Thus, the calculations need to be carried out in a precisely determined sequence for consecutive data portions, given by individual pixels. Because of the PAL video transmission format applied in the system (each image frame consists of two consecutive fields), it is necessary to employ four concurrently running accumulators in the unit determining the statistics of the difference image. Each of the accumulators sums the differences of values for pixels with coordinates belonging to a given image area, according to equation (1). The last stage of the calculations is the thresholding of the determined motion energy values. This operation is triggered for each of the accumulators by the flag representing the end of each frame. The calculations are carried out in cycles for each of the two half-frames. The behaviour described above has been coded in the Handel-C language by the authors and encapsulated in the PxsSAD block (Fig. 2).
Fig. 2. Block diagram of the PxsMD motion detection module
The remaining elements of the motion detection module are related to the buffering, distribution and synchronisation of video streams. The PxsSplit3 block is used to replicate the video stream. The PxsPALIRamFrameBuffer delays the image pixel transmission by one frame period. The streams of delayed pixels and current data are synchronised by PxsSynchronise components. The presented network (Fig. 2) of standard PixelStreams blocks and the custom PxsSAD module has been used to create an additional component, PxsMD. The outputs of the designed component are binary signals for sector activity and the differential image.
The motion detection module processes one pixel per clock cycle. A frequency of 65 MHz is applied to the whole system. This frequency exceeds the requirements of PAL video processing; however, it is necessary for proper operation of the display controller.
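The streaming behaviour described above can be modelled in software. The following is a minimal Python sketch, assuming a hypothetical pixel stream of `(x, y, value)` tuples and a plain variable standing in for the one-frame delay buffer. The names `frame_stream`, `quadrant_index` and `motion_detect` are illustrative and are not PixelStreams or Handel-C identifiers. One loop iteration stands in for one clock cycle, and the split into two interlaced fields is ignored for brevity.

```python
def frame_stream(frame):
    """Yield pixels in raster order as (x, y, value), one per 'clock cycle'."""
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            yield x, y, value

def quadrant_index(x, y, width, height):
    """Map pixel coordinates to one of four non-overlapping quadrants (0..3)."""
    return (2 if y >= height // 2 else 0) + (1 if x >= width // 2 else 0)

def motion_detect(frames, width, height, threshold):
    """Yield, per frame, the four accumulator values and the detection flags."""
    delayed = None  # plays the role of the one-frame delay buffer
    for frame in frames:
        if delayed is not None:
            acc = [0, 0, 0, 0]  # four concurrently running accumulators
            current = frame_stream(frame)
            reference = frame_stream(delayed)
            # Both streams are consumed in lockstep, pixel by pixel.
            for (x, y, cur), (_, _, ref) in zip(current, reference):
                diff = cur - ref if cur >= ref else ref - cur
                acc[quadrant_index(x, y, width, height)] += diff
            # End of frame: threshold every accumulator.
            yield acc, [a > threshold for a in acc]
        delayed = frame

still = [[0] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[0][3] = 200  # a change in the top-right quadrant
results = list(motion_detect([still, moved, moved], 4, 4, threshold=100))
print(results[0])  # ([0, 200, 0, 0], [False, True, False, False])
print(results[1])  # ([0, 0, 0, 0], [False, False, False, False])
```

The first frame produces no result because no reference frame is available yet, which mirrors the one-frame latency introduced by the frame buffer.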
3.3 Motion detection - PixelStreams application
Fig. 3. Video line
The application running in real time is driven by a PAL video signal (Fig. 3). The complete solution (Fig. 4) contains the component designed and implemented by the authors (PxsMD) as well as other standard modules taken from the PixelStreams library, which are necessary for data acquisition and visualisation. The RC300E reconfigurable platform has been chosen for the implementation. The board offers a wide range of video inputs and outputs as well as a built-in TFT screen. The display is used for visualisation of differential images with a mesh of detection fields overlaid. Additionally, the current contents of the accumulators and the number of active pixels in the individual fields are presented in the remaining display area. For visual notification of motion detection in a given image sector, a white square flashing in the upper left corner of each field is used. The binarisation threshold is obtained from a register implemented in the FPGA device. The value of the threshold can be modified online from a supervising computer using the USB interface.
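As a rough illustration of the visual notification, a white square can be drawn in the upper left corner of every active field. The sketch below assumes a greyscale difference image stored as a list of rows; the names `mark_active_fields` and `square` are hypothetical. It is a software model only and does not reproduce the PixelStreams blocks used on the board.

```python
WHITE = 255

def mark_active_fields(image, active, square=2):
    """Return a copy of `image` with a white square in each active quadrant."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    # Upper left corners of the four fields: top-left, top-right,
    # bottom-left, bottom-right.
    origins = [(0, 0), (width // 2, 0), (0, height // 2), (width // 2, height // 2)]
    for (x0, y0), is_active in zip(origins, active):
        if not is_active:
            continue
        for y in range(y0, min(y0 + square, height)):
            for x in range(x0, min(x0 + square, width)):
                out[y][x] = WHITE
    return out

diff_image = [[0] * 8 for _ in range(8)]
marked = mark_active_fields(diff_image, [False, True, False, False])
for row in marked:
    print(row)
```

The input image is left unmodified, so the same difference image can still be passed on to other stages.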
Fig. 4. Motion detection PixelStreams application diagram
3.4 Implementation results
The motion detection application has been implemented in the XC2V6000 FPGA device. Table 1 shows the usage of system resources. Additionally, two of the four external ZBT static memory banks present on the RC300 board are used.
Table 1 Implementation results
Resources of FPGA device XC2V6000    Used for the design    % of the device
Number of occupied Slices            3,465                  11%
Number of bonded IOBs                194                    23%
Number of Block RAMs                 8                      5%
Total equivalent gate count          649,799                11%
4. CONCLUSIONS
A Handel-C implementation of the SAD motion detection algorithm has been presented. The algorithm has been transferred from a floating point version to a signed fixed-point version. A library module conforming to the PixelStreams standard has been written in Handel-C; it calculates the difference image for four sectors of the image frame and performs motion detection for each of the sectors. A complete video line has been set up, based on the PixelStreams library components. The application has been tested on the RC300 board equipped with the XC2V6000 device by analysing the data stream from a camera working in the PAL standard.
ACKNOWLEDGEMENTS
This work was supported by the Polish State Committee for Scientific Research as research project No. 4T11C01725. We wish to thank the Celoxica University Program and the Xilinx University Program for software donations.
REFERENCES
Celoxica (2005). The Pixel Stream Manual, http://www.celoxica.com/cup/registered_users/manuals/default.asp, Celoxica Limited.
Koga, T., et al. (1981). Motion-compensated interframe coding for video conferencing. In: Nat. Telecommun. Conf., G5.3.1-5, New Orleans, LA.
The Mathworks, http://www.mathworks.com.
Wang, Y., J. Ostermann and Y.-Q. Zhang (2002). Video Processing and Communications, Upper Saddle River, NJ: Prentice Hall.
Wong, S., B. Stougie and S. Cotofana (2002). Alternatives in FPGA-based SAD Implementations. In: Proceedings of the 1st IEEE International Conference on Field-Programmable Technology (FPT2002), pp. 449-452, Hong Kong SAR, China.
Wong, S., B. Stougie and S. Cotofana (2002). An Investigation on FPGA based SAD Hardware Implementations. In: Proceedings of the 13th Annual Workshop on Circuits, Systems, and Signal Processing (ProRISC2002), pp. 568-573, Veldhoven, The Netherlands.
Vassiliadis, S., et al. (1998). The Sum Absolute Difference Motion Estimation Accelerator. In: Proceedings of the 24th Euromicro Conference, pp. 559-566, Vasteras, Sweden.