H.264 medical video compression for telemedicine: A performance analysis


JID:IRBM AID:384 /FLA

[m5+; v1.214; Prn:6/11/2015; 13:19] P.1 (1-9)

Available online at

ScienceDirect www.sciencedirect.com IRBM ••• (••••) •••–•••

Medical image analysis CAD

A. Chaabouni a,∗, Y. Gaudeau b,c, J. Lambert a, J.-M. Moureaux a, P. Gallet d

a Université de Lorraine, CRAN, UMR 7039, 9 Avenue de la Foret de Haye, Vandoeuvre-lès-Nancy, 54500, France
b Université de Strasbourg, 30 Rue du Maire Andre Traband, Haguenau, 67500, France
c CRAN, UMR 7039, France
d CHRU Nancy – Institut Lorrain du Coeur et des Vaisseaux Louis Mathieu, 5 Rue du Morvan, 54500 Vandœuvre-lès-Nancy, France

Received 30 March 2015; received in revised form 7 July 2015; accepted 29 September 2015

Abstract

Today, lossy compression is becoming increasingly important for the storage and transmission of large amounts of digital data, especially in telemedicine. However, the information loss inherent in this kind of compression can be considered risky, particularly for medical applications such as diagnosis. To minimize this risk, compression efficiency must be balanced against the quality perceived by experts, which requires subjective quality assessment of compressed medical data. In this study, we address this problem by determining H.264 compression bitrate thresholds for full HD otolaryngology medical sequences. We show that this type of video can be lossy encoded at compression ratio thresholds ranging from 100:1 to 270:1, depending on the sequence, while maintaining practitioners' satisfaction. Objective results also show that quality assessment algorithms such as MSE, NIQE, NQM, SSIM, MSSIM and BRISQUE could help compute realistic compression ratios and could thus corroborate human perception. Finally, these results are extended by a preliminary study using the new HEVC video encoding standard, which appears to be more efficient than H.264 in terms of objective video quality.

© 2015 AGBM. Published by Elsevier Masson SAS. All rights reserved.

Keywords: Objective and subjective quality assessment; H.264 encoding standard; HEVC; Biomedical image processing

1. Introduction

Nowadays, doctors increasingly need to share information remotely among different sites, such as their office, hospitals and the patient's home. Consequently, the demand for high-quality medical data transmission and storage keeps growing. To meet this need, new digital technologies not only save time and money for hospitals and practitioners but can also help improve patient care, especially for patients living far from reference hospitals.

* Corresponding author.
E-mail addresses: [email protected] (A. Chaabouni), [email protected] (Y. Gaudeau), [email protected] (J. Lambert), [email protected] (J.-M. Moureaux), [email protected] (P. Gallet).
http://dx.doi.org/10.1016/j.irbm.2015.09.007
1959-0318/© 2015 AGBM. Published by Elsevier Masson SAS. All rights reserved.

Furthermore, sharing medical data among several remote experts is known to improve both diagnosis and therapeutic care, especially in difficult cases. However, transmitting and storing medical data, particularly medical video such as endoscopic and microscopic streams, requires very large network bandwidth as well as substantial hardware and software resources. Compressing these data before transmission therefore appears unavoidable. Until now, medical applications have used lossless compression algorithms because they preserve the integrity of medical data. Unfortunately, lossless compression does not significantly reduce the data volume, which is insufficient for most applications. Over the past years, several works [1–3] have shown that medical applications tolerate lossy compression, mainly for still images. Moreover, given the very large size of video data, lossy compression is hardly avoidable, and video lends itself to it thanks to the spatial and temporal masking effects of the HVS (human visual system). We therefore propose to study the effect of lossy video compression in a medical context. To balance compression efficiency against the quality perceived by practitioners, it is important to conduct subjective quality assessment tests, in other words, to estimate the impact of compression on the decoded video with respect to its intended use. In this study, we examine medical video quality assessment from the standpoint of both subjective tests and objective metrics.

The work presented in this paper is part of the European Celtic project HIPERMED (HIgh PERformance teleMEDicine platform), winner of the EUREKA INNOVATION gold award in November 2014. HIPERMED is designing an open telemedicine platform based on a unified Service Oriented Architecture (SOA) providing media services, Session Initiation Protocol (SIP) based control plane services and network services over the Internet Protocol (IP). The platform supports remote consulting, remote teaching and distributed consensus decision-making on treatment programs. The project designs, implements and demonstrates the integration of high-definition videoconferencing, video streams from endoscopes and other medical instruments, stereoscopic video streams providing true depth perception, and the sharing of high-resolution digital images. Based on this architecture and the different medical services, six scenarios have been developed on the HIPERMED platform. Scenarios 1 and 2 aim to provide a global and comprehensive tele-rehabilitation program for knee and gait rehabilitation. Scenario 3, called remote consultation, covers collaboration between professionals (here, doctors specialized in otolaryngology).
Scenario 4, called speech training, is a patient-to-professional scenario that facilitates therapy for patients with speech disorders. Scenario 5 is a professional-to-professional emergency scenario that allows a regional hospital lacking specialized clinical services to connect with a reference hospital, which provides urgent remote diagnosis and remote follow-up while a burn patient recovers. Finally, Scenario 6 covers remote rehabilitation for chronic patients with cardiorespiratory problems. The work proposed here concerns quality assessment of the video transmission designed for the different HIPERMED healthcare scenarios, especially Scenario 3, “remote consultation”, shown in Fig. 1. In this scenario, a doctor located in the so-called “regional hospital” (here, the Nancy University Hospital in France or the Poznan University Hospital in Poland) contacts one or several specialists in endoscopic surgery in the other country. The surgeon called at the “reference hospital” provides either support or a second opinion on a difficult or specific case. This scenario is divided into three sub-scenarios: an “emergency consultation”, in which a surgeon who is operating needs real-time advice from a distant expert to face a difficult situation; a “planned consultation”, i.e. a multidisciplinary consultation meeting in which doctors from different remote sites discuss difficult cases; and a “remote lecture”, which allows students or doctors to attend a live remote surgery.

Fig. 1. Remote consultation scenario.

To perform all these scenarios, the HIPERMED platform establishes the remote connection between doctors, including the ability to manage the quality level of the audio and video sent from one location to the other under limited network conditions. The native video sequences to be transmitted are full HD 1080p videos (1920 × 1080 – 1080p60 – 4:2:2 – 8 bits). We show in this study that they can be compressed with a substantial compression ratio using the H.264/AVC codec with no impact on their medical usage.

This paper is organized as follows: the equipment settings and methods for the subjective quality study are presented in Section 2. The implementation of objective quality assessment is presented in Section 3. Section 4 is devoted to the analysis and evaluation of the H.264 encoding results. We extend these results in Section 5, where we introduce a preliminary study using the HEVC encoding standard. Finally, we conclude and present future work in Section 6.

2. Material and methods

2.1. Subjective tests presentation

To evaluate the impact of post-processing on medical video, especially for sensitive applications such as diagnosis or surgery, it is essential to perform subjective tests with a panel of experts. Here, we follow the ITU-R BT.500-13 protocol of the International Telecommunication Union [4], which provides methodologies for assessing picture and video quality, including general test methods, grading scales and viewing conditions. Based on this standard, the double-stimulus continuous quality-scale (DSCQS) method [4] is recommended; it uses a continuous scale such as the one shown in Fig. 2. During a session of about 1 hour, doctors assess each video sequence by comparing it to the reference video sequence and give quality scores from 0 (Bad) to 1 (Excellent).
Tests should be conducted in a standardized environment (controlled room temperature, suitable brightness and low ambient noise) so that observers stay focused. Doctors can also take a break of about 10 minutes in the middle of the whole test.

Fig. 2. Continuous scale of video quality.

2.2. H.264 encoding standard

The DSCQS method requires a reference video to which the corresponding encoded video is compared. In our study, we acquired original (unencoded) endoscopic streams from a real ENT (Ear, Nose and Throat) surgery performed at Nancy University Hospital. These videos relate to surgery, an application identified as one of the most critical as far as quality is concerned. We then selected the subjective test sequences after consulting our ENT expert, so as to cover the most important medical ENT procedures. In addition, we needed to include as many bitrates as possible. Taking into account these constraints and the subjective test conditions, such as the recommended test duration, we chose to study 4 original sequences of 10 seconds each, all encoded at 1.99 Gbit/s with full HD resolution (1920 × 1080 – 1080p60 – 4:2:2 – 8 bits). These videos were selected as medical imaging reference sequences for HEVC development by the Joint Collaborative Team on Video Coding (JCT-VC) [5]. They are shown in Fig. 3.

Fig. 3. Original ENT endoscopic video sequences, numbered 1 to 4 clockwise from the top left (original bitrate: 1.99 Gbit/s).

This type of ENT medical data was shared remotely in real time between medical doctors through the HIPERMED platform. For example, we performed a remote consultation between a regional hospital, where a surgeon asked for help, and one or more reference doctors who helped him by receiving the endoscopic videos live. For this scenario, the videos were H.264 encoded with the configuration shown in Table 1. H.264 was chosen because it improves coding performance and provides more efficiency and flexibility than previous standards for a wide variety of applications, network environments, systems and transport protocols. H.264 was jointly developed by ITU-T and ISO/IEC as the product of a partnership known as the Joint Video Team (JVT). It offers more flexible block partitioning for motion estimation (16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, 4 × 4) and a greater number of reference pictures within the GOP structure. It uses motion compensation with up to quarter-pixel precision and two entropy coders (CABAC and CAVLC) [6]. The 4 original test sequences were thus H.264 encoded at different bitrates in preparation for the subjective tests.

Table 1 HIPERMED H.264 encoding configuration.
Parameter | Value | Parameter | Value
Pixel format | uyvy422 | Codec | x264
Resolution | 1920 × 1080 | Latency | Zerolatency
Frequency | 60 | Buffer size | 2

2.3. Quality assessment tools

To perform the subjective tests, we used the “living lab” PROMETEE (PeRceptiOn utilisateur pour les usages du MultimÉdia dans les applications mÉdicalEs, i.e. user perception for multimedia medical usages). The platform is located at the Telecom Nancy engineering school, University of Lorraine (France). It is an innovation platform for studying and managing the technical quality of videos with respect to their medical usage. This well-equipped platform provides an environment that complies with the general viewing conditions for subjective assessment in a laboratory environment set by the ITU-R BT.500-13 recommendation [4]. The subjective rating sessions were therefore conducted in this “living lab”, where doctors watched medical videos on a standard screen 41 and scored each video (original and encoded sequence) with a dedicated rating tool on a tablet (see Fig. 4).

Fig. 4. GUI notation application on tablet (iPad).

In this figure, sequences A and B represent a pair of compressed and original videos, presented in random order. Only medical doctors can assess and validate the medical usage of these compressed videos. They expect to be able to watch compressed video sequences while performing surgery with the same confidence they have in the usual uncompressed videos, so their assessment is crucial before compressed videos are used. The subjective test session takes about 1 hour, during which the observers score the 4 sequences compressed with 11 different H.264 compression ratios. A grey screen is inserted between each two sequences, as shown in Fig. 5. At the beginning of the session, each participant scores 2 dummy videos to stabilize their judgment; these scores are not taken into account in the analysis. In addition, some sequences may be presented twice to check the consistency of the observers' judgments. At the end of the scoring phase, a questionnaire collects information about each participant (age, specialty, experience, etc.) in order to better characterize the sample of observers [1].

2.4. Analysis of the database observers

As recommended in the ITU-R BT.500-13 standard [4], some sequences were duplicated and randomized. They allow an initial assessment of observer consistency: if the same person, during the same session, gives very different scores to the same sequence, he or she is rejected. A second test, called Beta2 [4], is then performed on the remaining observers to check the coherence of their answers with respect to the entire panel.
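The Beta2 screening of BT.500 [4] can be sketched as follows. For each presentation, the kurtosis coefficient β2 = m4/m2² of the panel's scores decides whether the distribution is treated as normal (2 ≤ β2 ≤ 4, band of ±2 standard deviations around the mean) or not (band of ±√20 standard deviations); an observer who falls outside the band on more than 5% of the presentations, roughly as often above as below, is rejected. This is our reading of the recommendation, not the authors' code: the function name and the flattened presentation index are assumptions, and the first, repeated-sequence consistency check is not shown.

```python
import math


def bt500_rejected_observers(scores):
    """Observer screening after ITU-R BT.500 (beta2 kurtosis test), as we read the recommendation.

    scores[i][p] is the score of observer i for presentation p, where p flattens the
    (compression ratio, sequence, repetition) combinations into a single index.
    Returns the indices of the observers to reject.
    """
    n_obs = len(scores)
    n_pres = len(scores[0])
    p_count = [0] * n_obs  # scores far above the panel mean
    q_count = [0] * n_obs  # scores far below the panel mean
    for p in range(n_pres):
        column = [scores[i][p] for i in range(n_obs)]
        mean = sum(column) / n_obs
        m2 = sum((u - mean) ** 2 for u in column) / n_obs
        if m2 == 0:
            continue  # every observer gave the same score: no outlier is possible
        m4 = sum((u - mean) ** 4 for u in column) / n_obs
        beta2 = m4 / m2 ** 2  # kurtosis coefficient
        std = math.sqrt(m2 * n_obs / (n_obs - 1))  # sample standard deviation
        # Roughly normal scores (2 <= beta2 <= 4): 2-sigma band; otherwise a sqrt(20)-sigma band.
        k = 2.0 if 2.0 <= beta2 <= 4.0 else math.sqrt(20.0)
        for i, u in enumerate(column):
            if u >= mean + k * std:
                p_count[i] += 1
            if u <= mean - k * std:
                q_count[i] += 1
    rejected = []
    for i in range(n_obs):
        total = p_count[i] + q_count[i]
        # Reject observers who are often outside the band, about as often above as below.
        if total / n_pres > 0.05 and abs(p_count[i] - q_count[i]) / total < 0.3:
            rejected.append(i)
    return rejected
```

In this sketch an observer who alternates between very high and very low scores is flagged, while a panel whose scores stay within the band yields an empty list.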
The Beta2 test analyzes the distribution of the observers' scores; an observer whose answers systematically differ from those of the panel is also rejected. Once these steps are completed, we obtain a database containing only the ratings of observers judged consistent. From it we define the MOS (Mean Opinion Score) [1], which represents, for each video sequence, the average score of the observers. The MOS is given by:

$$\bar{u}_{jk} = \frac{1}{N_{obs}} \sum_{i=1}^{N_{obs}} u_{ijk}$$

where $N_{obs}$ is the number of observers and $u_{ijk}$ is the score of observer $i$ for H.264 compression ratio $j$ of video sequence $k$. More precisely, $u_{ijk}$ is the difference between the score given to the original video and the score given to the encoded one. The MOS can therefore be negative, and the lower the MOS, the better the video quality. This mean opinion score expresses the subjective perception of quality by a panel of observers who all performed exactly the same test; we consider it a reliable measure of perceived quality.

Fig. 5. Structure of a test session.

3. Objective quality assessment

Despite the relevance of subjective tests for assessing the quality of medical videos, this method is still considered very expensive in time and human resources. One alternative is to use appropriate objective metrics that correlate highly with human perception, in our case with the perception of medical experts. The most common metrics are PSNR (Peak Signal to Noise Ratio) and MSE (Mean Squared Error), based on pixel-by-pixel comparison. They measure image similarity through a simple mathematical comparison between the reference image and the degraded one. One of the best-known and most studied image quality metrics is SSIM [7]. The Structural SIMilarity index is based on structural information extracted from the stimuli, on the assumption that perceived distortions show up as changes in this information. As an extension of SSIM, multi-scale SSIM (MSSIM) [8] compares the two images (the original and the compressed one) at several scales. In addition, the Universal Quality Index UQI [9] models distortion as a combination of luminance distortion, contrast distortion and loss of correlation between the original and the encoded image.
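The two pixel-based metrics mentioned above fit in a few lines. The sketch below treats a frame as a flat sequence of 8-bit samples and assumes a peak value of 255; how per-frame values are pooled over a whole sequence (for instance, by averaging) is not specified in the text and is left to the caller. The function names are illustrative.

```python
import math


def mse(reference, decoded):
    """Mean squared error between two frames given as equal-length flat sequences of samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB; peak is the largest sample value (255 for 8-bit video)."""
    error = mse(reference, decoded)
    if error == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / error)


ref = [16, 32, 64, 128]
dec = [17, 31, 65, 127]  # every sample off by one
print(mse(ref, dec))  # 1.0
print(round(psnr(ref, dec), 2))  # 48.13
```

Both metrics only compare co-located samples, which is why they ignore the structural and perceptual aspects that SSIM, MSSIM or the HVS-based metrics try to capture.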
Other metrics, based on Natural Scene Statistics (NSS), measure the similarity of two images rather than human-perceived image quality, such as the Information Fidelity Criterion IFC [10] and the Visual Information Fidelity VIF and VIFP [11]. Recent research has led to psychovisual tools that help to better understand the behavior of the HVS and to refine the associated models. Highly efficient objective quality metrics have thus emerged in recent years, such as PSNR-HVS [12], PSNR-HVS-M [12] and HDR-VDP [13]. Most of the metrics cited above are available in the MATLAB Metrix Mux library [14]. Some recent efficient metrics that attempt to model the HVS, such as PSNR-HVS [12] and PSNR-HVS-M [12], have been added to this library. Finally, we implemented no-reference metrics (the Blind/Referenceless Image Spatial Quality Evaluator BRISQUE [15] and the Naturalness Image Quality Evaluator NIQE [16]), which can detect compression artifacts such as the traditional “blocking effect” caused by block-based DCT (discrete cosine transform) coding. The reader can refer to [17] for more details on all of these metrics. In this study, we compare a set of objective quality metrics to the participants' scores collected during the subjective tests.

4. Experimental results

For this study, we gathered a panel of 14 observers with different levels of ENT experience in the medical curriculum (intern, extern, resident, doctor, professor). Table 2 summarizes the characteristics of these observers in terms of age, sex, specialty and years of experience.

Table 2 Characteristics of the observers in our study. Experts are in bold.
Observer | Age | Sex | Specialty | Experience (years)
Observer 1 | 31 | F | ENT | 9
Observer 2 | 28 | F | ENT | 4
Observer 3 | 51 | M | ENT | 25
Observer 4 | 26 | M | ENT | 2
Observer 5 | 26 | F | ENT | 1
Observer 6 | 29 | F | ENT | 6
Observer 7 | 32 | M | ENT | 6
Observer 8 | 32 | M | ENT | 10
Observer 9 | 35 | M | ENT | 10
Observer 10 | 35 | F | ENT | 10
Observer 11 | 45 | M | ENT | 15
Observer 12 | 27 | M | ENT | 3
Observer 13 | 56 | M | ENT | 30
Observer 14 | 30 | M | ENT | 2

To measure the subjective quality of the encoded videos, we plot the MOS as a function of the H.264 compression bitrate. We first fit the points (the doctors' subjective scores) with an exponential regression of the form $e^{-ax+b}$, which is well suited to this type of curve (the general model carries an exponent $n$, set here to $n = 1$), as shown in Fig. 6. We then determine the quality threshold by choosing $\mathrm{MOS}_{min} = +0.1$ (10% of the rating scale), a value estimated as a variation that does not alter the technical quality for medical use. In other words, we consider that observers accept the medical quality of an encoded video when its MOS is below 0.1. This value gives the minimum H.264 compression bitrate that can be used to encode this type of medical video. Following these steps, we find a compression bitrate threshold of 10.64 Mbit/s for sequence 1 of Fig. 3 (see Fig. 6). Note that during the validation of the real HIPERMED scenario, the medical data were actually transmitted at 4 Mbit/s, corresponding to MOS = 0.37, without any significant loss in perception by doctors. Setting the threshold at MOS = 0.1 can therefore be considered a conservative choice.

Fig. 6. MOS vs compression ratio (H.264-CBR mode) – Sequence 1 of Fig. 3.

The results obtained for MOS = 0.1 can be further improved by modifying the H.264 encoding parameters. The previous curve was obtained with H.264 encoding in CBR (Constant Bit Rate) mode, which is less efficient than VBR (Variable Bit Rate) mode [18]. CBR mode applies a constant bitrate over the whole video sequence, whereas VBR mode adjusts the data rate, allocating a higher bitrate to more complex image areas and a lower one to less complex areas. We therefore carried out an additional study in which the ENT videos were H.264 encoded in VBR mode. Fig. 7 shows the result for sequence 1: for the same observer satisfaction (MOS = 0.1), the compression bitrate threshold drops to 9.89 Mbit/s. Table 3 summarizes the compression bitrate thresholds found for all four sequences.

Fig. 7. MOS vs compression ratio (H.264-VBR mode) – Sequence 1 of Fig. 3.

Table 3 H.264 compression bitrate thresholds (VBR mode).
Sequence | 1 | 2 | 3 | 4
Bitrate threshold (Mbit/s) | 9.89 | 11.45 | 19.44 | 7.31

After determining the H.264 encoding thresholds, we also measured the correlation between the objective metrics presented in Section 3 and the MOS, in order to identify which of these metrics are best suited to assessing the quality of the studied medical videos. For each metric, we compute the Pearson linear correlation coefficient (LCC), which indicates the quality of the linear regression: the closer the LCC is to 1, the better the correlation between subjective and objective scores. We also computed the Spearman rank order correlation coefficient (SROC), which measures the monotonicity of the results, i.e. the ability of the objective score to vary in the same direction as the subjective score. The LCC results are given in Table 4. As Table 4 shows, the objective metrics that correlate best with the MOS are MSE, NIQE, NQM, SSIM, MSSIM and BRISQUE. MSE is a simple metric that computes the mean squared error between the original and the encoded video; doctors are likely more sensitive to the overall fidelity of the medical video than to specific image structures. NIQE (Naturalness Image Quality Evaluator) also seems efficient for this kind of image. Human vision is known to be sensitive to variations in luminance and contrast.
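The threshold procedure of this section (MOS from difference scores, exponential fit, threshold at MOS = 0.1, conversion to a compression ratio) can be sketched as follows. The paper does not say how the regression is fitted; this sketch assumes ordinary least squares on ln(MOS) with n = 1, which forces it to drop points with MOS ≤ 0 (a nonlinear least-squares fit on the MOS itself would keep them). All function names are hypothetical, and the raw rate is derived from the 1920 × 1080, 60 frames/s, 4:2:2, 8-bit source format.

```python
import math
from statistics import fmean

# Uncompressed source rate: 1920 x 1080 pixels x 60 frames/s x 8 bits x 2 samples/pixel (4:2:2),
# i.e. 1 990 656 000 bit/s (the "1.99 Gbit/s" of Section 2.2), expressed in Mbit/s.
RAW_MBPS = 1920 * 1080 * 60 * 8 * 2 / 1e6


def mos(pairs):
    """MOS for one (ratio j, sequence k): mean over observers of u_ijk = note(original) - note(encoded)."""
    return fmean(ref - enc for ref, enc in pairs)


def fit_exponential(bitrates, mos_values):
    """Fit MOS(x) = exp(-a*x + b) by least squares on ln(MOS); returns (a, b).

    Assumption: only points with MOS > 0 can be log-transformed, so the others are dropped.
    """
    points = [(x, math.log(m)) for x, m in zip(bitrates, mos_values) if m > 0]
    if len(points) < 2:
        raise ValueError("need at least two points with positive MOS")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct bitrates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


def threshold_bitrate(a, b, mos_min=0.1):
    """Bitrate at which the fitted MOS falls to mos_min: solves exp(-a*x + b) = mos_min."""
    if a <= 0:
        raise ValueError("the fitted MOS must decrease with bitrate (a > 0)")
    return (b - math.log(mos_min)) / a


def compression_ratio(bitrate_mbps, raw_mbps=RAW_MBPS):
    """Ratio between the uncompressed rate and the encoded rate."""
    return raw_mbps / bitrate_mbps
```

With the VBR thresholds of Table 3, `compression_ratio(19.44)` is about 102 and `compression_ratio(7.31)` about 272, consistent with the 100:1 to 270:1 range reported in the abstract.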
This explains the good rank of the NQM (Noise Quality Measure) metric. SSIM and MSSIM also score well thanks to their structural approach. Finally, the table shows that BRISQUE is the best metric for CBR encoding, as it is better suited to measuring encoding artifacts such as the blocking effect, which is more present in CBR mode. This is confirmed by the curve of the MOS versus BRISQUE scores in Fig. 8. These metrics could therefore be used to get a preliminary estimate of the range of quality thresholds over which such tests should be run. Furthermore, when subjective tests are not feasible, these metrics could help compute realistic compression ratios. For instance, in Fig. 9, if we know that the required BRISQUE value for this type of video is around 48, we can approximate a compression bitrate threshold of around 11 Mbit/s. This result is consistent with those obtained by the subjective tests. In the next section, we perform a preliminary study by encoding the same medical sequences with the new HEVC encoding standard.

Table 4 Pearson LCC between objective and subjective measures for different medical video sequences.
Metric | SEQ 1 CBR | CBR rank | SEQ 1 VBR | SEQ 2 VBR | SEQ 3 VBR | SEQ 4 VBR | VBR average | VBR rank
SSIM | 0.9094 | 5 | 0.9088 | 0.9459 | 0.9557 | 0.8863 | 0.9241 | 6
UQI | 0.8772 | 8 | 0.6606 | 0.9304 | 0.6287 | 0.7303 | 0.7375 | 16
PSNR | 0.7899 | 10 | 0.8288 | 0.8813 | 0.8537 | 0.8638 | 0.8569 | 9
WSNR | 0.8191 | 9 | 0.8668 | 0.9072 | 0.8883 | 0.8995 | 0.8904 | 7
VSNR | 0.7057 | 15 | 0.8004 | 0.8521 | 0.851 | 0.8235 | 0.8317 | 12
HDR-VDP | 0.8969 | 7 | 0.6795 | 0.919 | 0.9474 | 0.9553 | 0.8753 | 8
IFC | 0.6014 | 16 | 0.7178 | 0.7801 | 0.7793 | 0.7352 | 0.7531 | 15
MSE | 0.9257 | 3 | 0.9728 | 0.9733 | 0.9857 | 0.9549 | 0.9716 | 1
MSSIM | 0.9058 | 6 | 0.9235 | 0.9443 | 0.9441 | 0.9289 | 0.9352 | 5
NIQE | 0.9659 | 2 | 0.9755 | 0.9717 | 0.952 | 0.9205 | 0.9549 | 3
NQM | 0.9123 | 4 | 0.9528 | 0.9763 | 0.9612 | 0.9621 | 0.9631 | 2
PSNR-HVS | 0.7521 | 11 | 0.8189 | 0.8758 | 0.8658 | 0.8436 | 0.8510 | 10
PSNR-HVS-M | 0.719 | 13 | 0.801 | 0.8589 | 0.8526 | 0.8251 | 0.8344 | 11
VIF | 0.7059 | 14 | 0.7775 | 0.8336 | 0.8649 | 0.7865 | 0.8156 | 14
VIFP | 0.7351 | 12 | 0.788 | 0.8474 | 0.8464 | 0.8071 | 0.82222 | 13
BRISQUE | 0.9677 | 1 | 0.9174 | 0.9535 | 0.968 | 0.963 | 0.95047 | 4

Fig. 8. MOS vs BRISQUE (H.264-CBR mode) – Sequence 1 of Fig. 3.

Fig. 9. BRISQUE vs Bit-rate (H.264-CBR mode) – Sequence 1 of Fig. 3.

5. HEVC improvements and preliminary study

5.1. The new encoding standard HEVC

To further improve on the H.264 results, we carried out a preliminary study with the new HEVC encoding standard, the successor to H.264, developed by the JCT-VC and finalized on January 25, 2013. This new video compression standard supports ultra-high resolutions such as 4K (3840 × 2160) and 8K (7680 × 4320), and it allows parallel processing on multi-core architectures. Unlike H.264, it is based on the Coding Tree Unit (CTU), which offers larger block sizes (16 × 16, 32 × 32 or 64 × 64) and thus more efficiency and flexibility. Its in-loop filtering combines deblocking filtering with Sample Adaptive Offset (SAO), whereas H.264 uses only deblocking filtering; an Adaptive Loop Filter (ALF) was studied during standardization but was not retained in the final standard. Finally, its entropy coding uses only context-adaptive binary arithmetic coding (CABAC) [19]. In this study, we try to show the efficiency of HEVC compared to H.264 by computing objective quality metrics on the 4 sequences mentioned above.

Fig. 10. PSNR vs Bit-rate (VBR mode) – Sequence 1 of Fig. 3.
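The metric selection used in Section 5.2 rests on the Pearson LCC values of Table 4. A minimal, dependency-free sketch of the LCC and of the Spearman SROC (the Pearson coefficient computed on ranks, with tied values given their average rank) is shown below; the inputs would be one metric's scores and the matching MOS values over all encoded versions of a sequence. The sketch returns signed coefficients; because the difference MOS falls as quality rises while metrics such as PSNR rise, comparing magnitudes is an assumption on our part rather than something the paper states.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (LCC); undefined if either sequence is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation (SROC): Pearson LCC computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

For a monotonic but nonlinear relation such as y = x², `spearman` returns exactly 1 while `pearson` stays below 1, which is the distinction drawn between LCC and SROC in Section 4.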

5.2. Preliminary study

To compare the two encoders, we carried out an objective study, encoding the same 4 original sequences with the x265 library (www.x265.org), an open-source library for encoding video streams in HEVC. We then assessed the video quality with 4 efficient objective metrics that had a good Pearson correlation coefficient in the H.264 objective study: PSNR, MSE, MMSSIM and NQM. To compare the two video encoders, we plot the objective metric scores against the HEVC and H.264 bitrates in VBR mode. The results for sequence 1 are shown in Figs. 10, 11, 12 and 13. These figures suggest that HEVC gives better results than H.264: for example, in Fig. 13, the HEVC curve always lies above the H.264 curve. HEVC encoding quality thus appears better than H.264 encoding quality, and the same holds for the other 3 sequences. Since we work in a real-time telemedicine context, and to make the comparison fair, we also studied the processing time needed to encode these medical videos. Fig. 14 shows the encoding time for sequence 1 on a high-performance laptop (CPU: Intel Core i7 quad-core 2.7 GHz/RAM: 16 GB). HEVC clearly takes more time than H.264 to encode the medical video.

Fig. 11. MSE vs Bit-rate (VBR mode) – Sequence 1 of Fig. 3.

Fig. 12. MMSSIM vs Bit-rate (VBR mode) – Sequence 1 of Fig. 3.

Fig. 13. NQM vs Bit-rate (VBR mode) – Sequence 1 of Fig. 3.

Fig. 14. Time processing (s) vs Bit-rate (VBR mode) – Sequence 1 of Fig. 3 – CPU: Intel Core i7 quad-core 2.7 GHz/RAM: 16 GB.

Consequently, this preliminary study leads to a first conclusion: HEVC is more efficient than H.264 in terms of objective video quality but slower, which can be crucial in real-time telemedicine scenarios. These results should be validated by subjective tests and generalized in further work that takes the live context into account.

6. Conclusion and prospects

In this paper, we have addressed the problem of quality assessment of H.264 compressed video sequences in the sensitive medical context. Experts from the American College of Radiology (ACR) and the Canadian Association of Radiologists (CAR) [20] recommend using lossy compression at low compression ratios for medical still images. The study presented here shows that full HD medical video can be lossy encoded without any perceived loss of quality. We showed, in a real-time telemedicine application, that medical videos can be highly compressed without changing doctors' perception and opinion. Thanks to subjective tests run in our PROMETEE living lab, we found that lossy compression is acceptable up to H.264 compression ratio thresholds ranging from 100:1 to 270:1, corresponding to an error of only 10% (MOS = 0.1), whereas real HIPERMED scenarios showed that an error of about 40% is still acceptable. Furthermore, we tested the main objective image quality assessment (IQA) algorithms against the subjective ratings of the panel. For this kind of full HD medical video, we identified the best IQA models, which assess image quality without knowledge of the anticipated distortions or of human opinions of them. Finally, we carried out a preliminary study based on the new HEVC standard, which showed a significant improvement in video quality with respect to H.264, at the price, however, of an increased computational cost. In the future, we will extend this work to other types of medical videos using the new HEVC video encoding standard. We have already started moving in this direction in the new Celtic-Plus European telemedicine project E3 (E-health services Everywhere and for Everybody).


Acknowledgements

This study was conducted as part of the European Celtic HIPERMED project, and we are deeply indebted to all HIPERMED partners. The results of this work rely heavily on the PROMETEE platform, awarded by the Innovation program of the French Mines-Telecom Institute in 2011, which gave us the opportunity to run subjective tests under the recommended conditions. We warmly thank Denis Abraham, who was one of the promoters of this project. Finally, we are very thankful to the doctors (A. Bey, R. Jankowski, G. Bonfort, G. Koch, C. Lorentz, N. Boulanger, R. Grosjean, H. Eluecque, M. Varoquier, B. Toussaint, A. Russel, S. Botti, C. Rumeau) who participated in the subjective tests and to all the employees of the ENT department of Nancy University Hospital.

References

[1] Nouri N, Abraham D, Moureaux J-M, Dufaut M, Hubert J, Perez M. Subjective MPEG2 compressed video quality assessment: application to tele-surgery. In: 7th IEEE international symposium on biomedical imaging. 2010.
[2] Gaudeau Y, Moureaux JM. Lossy compression of volumetric medical images with 3D dead-zone lattice vector quantization. Ann Télécommun 2009;64(5–6).
[3] Chaabouni A, Gaudeau Y, Lambert J, Moureaux JM, Gallet P. Subjective and objective quality assessment for H.264 compressed medical video sequences. In: International conference on image processing theory, tools and applications. 2014. p. 18–22.
[4] ITU-R. Recommendation BT.500-13, Methodology for the subjective assessment of the quality of television pictures. ITU-R Rec. BT.500; 2012.
[5] Nicholson D, Pawałowski P, Moureaux J-M. Selected medical imaging sequences for HEVC development. In: Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 15th Meeting. 2013.


[6] Schwarz H, Marpe D, Wiegand T. Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Trans Circuits Syst Video Technol 2007;17(9).
[7] Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;13(4):600–12.
[8] Wang Z, Simoncelli EP, Bovik AC. Multi-scale structural similarity for image quality assessment. In: Proceedings of the 37th IEEE Asilomar conference on signals, systems and computers. 2003.
[9] Wang Z, Bovik AC. A universal image quality index. IEEE Signal Process Lett 2002;9(3):81–4.
[10] Sheikh HR, Bovik AC, de Veciana G. An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Trans Image Process 2005;14(12):2117–28.
[11] Sheikh HR, Bovik AC. Image information and visual quality. IEEE Trans Image Process 2006;15(2):430–44.
[12] Egiazarian K, Astola J, Ponomarenko N, Lukin V, Battisti F, Carli M. Two new full-reference quality metrics based on HVS. In: Workshop on video processing and quality metrics. 2006.
[13] Mantiuk R, Kim K, Rempel AG, Heidrich W. HDR-VDP-2: a calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Trans Graph 2011;30(4).
[14] Gaubatz M. Metrix Mux visual quality assessment package. Available at foulard.ece.cornell.edu.
[15] Mittal A, Moorthy AK, Bovik AC. No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 2012;21(12):4695–708.
[16] Mittal A, Soundararajan R, Bovik AC. Making a “completely blind” image quality analyzer. IEEE Signal Process Lett 2013;20(3):209–12.
[17] Moorthy A, Choi L, Bovik A, de Veciana G. Video quality assessment on mobile devices: subjective, behavioral and objective studies. IEEE J Sel Top Signal Process 2012;6(6).
[18] Kamariotis O. Bridging the gap between CBR and VBR for H.264 standard. World Acad Sci, Eng Technol 2007;8.
[19] Sullivan GJ, Ohm JR, Wiegand T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans Circuits Syst Video Technol 2012;22(12).
[20] Canadian Association of Radiologists. CAR standards for irreversible compression in digital diagnostic imaging within radiology. June 2011. p. 1–11.