
A Bio-inspired Synergistic Virtual Retina Model for Tone Mapping

Marco Benzi, María-José Escobar, Pierre Kornprobst

PII: S1077-3142(17)30204-7
DOI: 10.1016/j.cviu.2017.11.013
Reference: YCVIU 2645

To appear in: Computer Vision and Image Understanding

Received date: 10 February 2017
Revised date: 22 November 2017
Accepted date: 27 November 2017

Please cite this article as: Marco Benzi, María-José Escobar, Pierre Kornprobst, A Bio-inspired Synergistic Virtual Retina Model for Tone Mapping, Computer Vision and Image Understanding (2017), doi: 10.1016/j.cviu.2017.11.013


Highlights

• Tone mapping-like processing is done by the retina through adaptation mechanisms
• We show how a neurophysiological model of the retina can be extended for tone mapping
• Our model considers space and time dynamics, and works for images and videos
• The contrast gain control layer dynamically enhances local brightness and contrast
• This paper provides new insights for designing synergistic computer vision methods

A Bio-inspired Synergistic Virtual Retina Model for Tone Mapping

Marco Benzi (a,b), María-José Escobar* (a), Pierre Kornprobst* (b)

(a) Universidad Técnica Federico Santa María, Departamento de Electrónica, 2390123 Valparaíso, Chile
(b) UCA, Inria, Biovision team, France
* Co-senior authors

Abstract

Real-world radiance values span several orders of magnitude, which artificial systems must process in order to capture visual scenes with high visual sensitivity. Interestingly, similar processing happens in biological systems, starting at the retina level. Our motivation in this paper is therefore to develop a new video tone mapping operator (TMO) based on a synergistic model of the retina. We start from the so-called Virtual Retina model, which has been developed in computational neuroscience. We show how to enrich this model with new features to use it as a TMO, such as color management, luminance adaptation at the photoreceptor level, and a readout from a heterogeneous population activity. Our method works for video but can also be applied to static images (by repeating images in time). It has been carefully evaluated on standard benchmarks in the static case, giving results comparable to the state-of-the-art using default parameters, while offering user control for finer tuning. Results on HDR videos are also promising, specifically w.r.t. temporal luminance coherency. As a whole, this paper shows a promising way to address computational photography challenges by exploiting current research in neuroscience about retinal processing.

Keywords: Tone mapping, HDR, retina, photoreceptor adaptation, contrast gain control, synergistic model


1. Introduction

Real-world radiance values span several orders of magnitude, which artificial systems must process in order to capture visual scenes with high visual sensitivity. Think of scenes at twilight, in full daylight, under the stars at night, or inside a house. To capture such scenes, one needs cameras capable of capturing so-called high dynamic range (HDR) images, which are expensive, or the method proposed by Debevec and Malik (1997), currently implemented in most standard cameras. The problem is then how to visualize these images, since standard monitors have a low dynamic range (LDR). Two kinds of solutions exist. The first is technical: HDR displays exist, but they are not yet affordable for the general public. The second is algorithmic: there are methods to compress the range of intensities from HDR to LDR. These methods are called tone mapping operators (TMOs) (Reinhard et al., 2005). TMOs have been developed for static scenes and for videos rather independently. There has been intensive work on static images (see Kung et al. (2007); Bertalmío (2014) for reviews), with approaches combining luminance adaptation and local contrast enhancement, sometimes closely inspired by retinal principles, as in Meylan et al. (2007); Benoit et al. (2009); Ferradans et al. (2011); Muchungi and Casey (2012), to cite a few. Recent developments concern video tone mapping, where only a few approaches have been developed so far (see Eilertsen et al. (2013, 2017) for surveys).

Interestingly, in neuroscience it has been found that a similar tone-mapping-like processing occurs in the retina through adaptation mechanisms. This is crucial since the retina must maintain high contrast sensitivity over the very broad range of luminance found in natural scenes (Rieke and Rudd, 2009; Garvert and Gollisch, 2013; Kastner and Baccus, 2014). Adaptation is both global, through neuromodulatory feedback loops, and local, through adaptive gain control mechanisms, so that retinal networks can adapt to the whole-scene luminance level while maintaining high contrast sensitivity in different regions of the image, despite their considerable differences in luminance (see Shapley and Enroth-Cugell (1984); Thoreson and Mangel (2012) for reviews). Luminance and contrast adaptation occur at different levels, e.g., at the photoreceptor level, where sensitivity is a function of the recent mean intensity, and at the bipolar level, where slow and fast contrast adaptation mechanisms are found. These multiple adaptational mechanisms act together, with lighting conditions dictating which mechanisms dominate.

Thus there is a functional analogy between artificial and biological systems: both target the task of dealing with HDR content. Our motivation is to develop a new TMO based on a synergistic model of the retina. We start from the so-called Virtual Retina simulator (Wohrer and Kornprobst, 2009), which has been developed in neuroscience to model the main layers and cell types found in the primate retina. It is grounded in biology and has been validated on neurophysiological data. As such, it has been used in several theoretical studies as a retina simulator to make predictions of neural responses (Masquelier, 2012; Basalyga et al., 2013; Vance et al., 2015; Masquelier et al., 2016; Cessac et al., 2017a). This model can therefore be considered a good candidate to build a synergistic model, since it could pave the way for the much-needed interaction between the neuroscience and computer vision communities, as discussed in the survey by Medathati et al. (2016).

Interestingly, Virtual Retina has also been used to solve artificial vision tasks, such as handwritten character recognition (Mohemmed et al., 2012) and image compression (Masmoudi et al., 2010; Doutsi et al., 2015a,b). The reason why it could also be an interesting model for a TMO is that it includes a non-trivial Contrast Gain Control (CGC) mechanism, which not only enhances local contrast in the images but also adds temporal luminance coherence. However, Virtual Retina is missing several important features needed to address the tone mapping task. It was not designed to deal with color images, let alone HDR images; there is no photoreceptor adaptation, nor any readout mechanism to create a single output from the multiple retinal responses given by the different output cell layers. In this paper we address these questions by enriching the Virtual Retina model while keeping its bio-plausibility, and by testing it on standard tone mapping images.

This article is organized as follows. In Sec. 2 we describe our bio-inspired synergistic model and compare it to former bio-inspired TMOs. In Sec. 3 our method is evaluated on static images and videos; we show the impact of the different steps and of the main parameters, and we present comparisons with the state-of-the-art. Finally, in Sec. 4 we summarize our main results and discuss future work.

2. Bio-inspired synergistic retina model

2.1. Why Virtual Retina?

In this paper we are interested in investigating how the retina could be a source of inspiration to develop a new TMO. In neuroscience, there is no unique model of the retina but different classes of mathematical models depending on their use. In Medathati et al. (2016), the authors identified three classes of models: (i) linear-nonlinear-Poisson models used to fit single-cell recordings (Chichilnisky, 2001; Carandini et al., 2005), (ii) front-end retina-like models for computer vision tasks (Benoit et al., 2010; Hérault, 2010), and (iii) models based on detailed retinal circuitry (Wohrer and Kornprobst, 2009; Lorach et al., 2012). There is no unique model also because the retinal code remains an open challenge in neuroscience. With the advent of new techniques such as multi-electrode arrays, recording the simultaneous activity of groups of neurons provides a critical database to unravel the role of specific neural assemblies in spike coding.

There is currently intensive ongoing research on the retina, trying to decipher how it encodes the incoming light by understanding its complex circuitry (Gollisch and Meister, 2010). Models based on detailed retinal circuitry will directly benefit from this research, which is why it seems promising to focus on this class of models. Here, we consider the Virtual Retina model (Wohrer and Kornprobst, 2009) as a starting point. Virtual Retina was designed to be a spiking retina model enabling large-scale simulations (up to 100,000 neurons) in reasonable processing times while keeping biological plausibility. The underlying model includes a non-separable spatiotemporal linear model of filtering in the Outer Plexiform Layer, a shunting feedback at the level of bipolar cells, and a spike generation process using noisy leaky integrate-and-fire neurons to model RGCs. To be self-contained, this paper recalls the main equations of Virtual Retina but does not go into their justifications; the interested reader should refer to the original paper.

2.2. Model overview

Our TMO has three main stages, shown in Fig. 1:

(i) Pre-processing steps, including a conversion of the HDR-RGB input color image to luminance space, and a calibration of this luminance data so that the values are mapped to the absolute luminance of the real-world scene.

(ii) A detailed retina model, including a new photoreceptor adaptation taking pupil adaptation into account, which has been incorporated into the input level of the Virtual Retina model that follows.

(iii) Post-processing, including the definition of a readout given the multiple retinal outputs of Virtual Retina, followed by colorization and gamma correction to finally obtain an LDR-RGB output image.

Each stage is explained in detail in the following section and is also illustrated using a representative example to show its impact. The source code is available upon request to the authors via the paper website (https://team.inria.fr/biovision/tmobio). The website contains sample results for static images and videos, and the download form for the source code.

2.3. Model detailed description

The input is an HDR video defined by (L_red, L_green, L_blue)(x, y, t), where L_red, L_green and L_blue represent the red, green and blue channels of the input HDR video, respectively. The variables (x, y) stand for spatial position and t for time.

[Figure 1: Tone-mapping operator general pipeline. Block diagram showing the three main stages of our method: pre-processing, retina model extending Virtual Retina, and post-processing. The main retinal steps are illustrated on the right-hand side as a reminder that our retina model follows the main circuitry of the primate retina (courtesy of Purves et al. (2001)).]

2.3.1. Pre-processing: Color-to-luminance and HDR calibration

This first stage is classical. One first converts the HDR color video into an HDR radiance map video following the linear operation:

L_w(x, y, t) = 0.2126 L_{red}(x, y, t) + 0.7152 L_{green}(x, y, t) + 0.0722 L_{blue}(x, y, t).

This equation reflects the way light contributes differently to the retina depending on the proportion of each photoreceptor type, with green carrying the most energy and blue the least. Here we use the standard matrix specified by ITU-R BT.709 for the input channel weights (Reinhard et al., 2005), also used in the sRGB color space (https://www.w3.org/Graphics/Color/sRGB). This is the most common color space used in consumer cameras and devices, but other spaces could be used by operating with different matrices.

Then L_w has to be calibrated. Calibration refers to the linearity between the HDR image and the absolute luminance, regardless of the lighting conditions. This step is not biologically inspired but compensates for each camera's individual properties: a camera is agnostic w.r.t. the scene acquired and has its own encoding of the scene, which does not necessarily correspond to the absolute luminance. Calibration is thus an important step, since we do not know a priori how the HDR image values differ from the real scene. In our case it is also necessary because the photoreceptor adaptation model defined below needs absolute luminance values.

Some existing calibration methods rely on meta-data such as EXIF tags, GPS locations or light probes (Gryaditskya et al., 2014). In this approach we use the heuristic proposed by Reinhard et al. (2002); Reinhard (2002) for calibrating the data, which does not need information about the environment where the image was taken. Following Reinhard et al. (2002), the calibration process defines how to obtain an estimation of the real luminance L(x, y, t) from L_w(x, y, t):

L(x, y, t) = \frac{\alpha}{\bar{L}^{\log}_w(t)} L_w(x, y, t),    (1)

where the incoming data is scaled by the ratio between the calibration coefficient \alpha and the logarithmic average \bar{L}^{\log}_w(t) defined below.

This equation gives a way of setting the tonal range of the output image according to the so-called key of the scene \alpha. The key of the scene is a unitless number which relates to whether the picture is subjectively dark, normal or bright. The underlying idea is that the ratio between \alpha and \bar{L}^{\log}_w(t) maps the incoming luminance so that the calibrated output luminance is correlated with the subjective notion of the brightness of the scene. The parameter \alpha is generally user-defined and set to a fixed value (0.18 in most cases), while the logarithmic average \bar{L}^{\log}_w(t) is an approximation of the key of the scene defined by

\bar{L}^{\log}_w(t) = \exp\left( \frac{1}{N} \sum_{x,y \in \Omega} \log(\delta + L_w(x, y, t)) \right),    (2)

where N is the number of pixels in the image domain \Omega and \delta is a small value used to avoid numerical errors coming from zero-valued pixels.

Then equation (1) can be further improved by letting the key of the scene be automatically adjusted depending on the statistics of each image. Following Reinhard (2002), it can be obtained as follows:

\alpha(t) = 0.18 \cdot 4^{f(t)},    (3)

with

f(t) = \frac{2\log_2(\bar{L}^{x,y,1\%}_w(t)) - \log_2(L^{\min,x,y,1\%}_w(t)) - \log_2(L^{\max,x,y,1\%}_w(t))}{\log_2(L^{\max,x,y,1\%}_w(t)) - \log_2(L^{\min,x,y,1\%}_w(t))},    (4)

where \bar{L}^{x,y,1\%}_w(t), L^{\min,x,y,1\%}_w(t), and L^{\max,x,y,1\%}_w(t) denote the average, minimum, and maximum values of the incoming data over the spatial domain, excluding 1% of the lightest and darkest pixels, thus statistically trimming values coming from camera overexposure or noise. This is very important since an image can have a distorted luminance distribution due to, e.g., overexposure. Note that in (3) the original value 0.18 is kept but is now modulated by 4^{f(t)}, which takes into consideration the statistical features of the image through equation (4). The value 4 was empirically determined by the author (Reinhard, 2002).

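For concreteness, the following minimal Python/NumPy sketch combines the luminance conversion and the calibration of Eqs. (1)–(4) for a single frame. It is an illustration under our own assumptions, not the authors' released implementation: the function name, the use of a percentile-based 1% trimming, and the default value of δ are illustrative choices.

```python
import numpy as np

def calibrate_luminance(rgb, alpha=None, delta=1e-6):
    """Pre-processing sketch: ITU-R BT.709 luminance + key-of-scene calibration (Eqs. 1-4)."""
    # Color to luminance: rgb is an HxWx3 array of linear HDR values.
    Lw = 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]

    # Logarithmic average of the frame (Eq. 2).
    Lw_log = np.exp(np.mean(np.log(delta + Lw)))

    if alpha is None:
        # Automatic key of the scene (Eqs. 3-4): trim 1% darkest and brightest pixels.
        lo, hi = np.percentile(Lw, [1.0, 99.0])
        trimmed = Lw[(Lw >= lo) & (Lw <= hi)]
        lmin = np.log2(trimmed.min() + delta)   # delta guards against log2(0)
        lmax = np.log2(trimmed.max() + delta)
        lavg = np.log2(trimmed.mean() + delta)
        f = (2 * lavg - lmin - lmax) / (lmax - lmin)
        alpha = 0.18 * (4.0 ** f)

    # Calibrated absolute luminance estimate (Eq. 1).
    return (alpha / Lw_log) * Lw
```

For a video, this is applied frame by frame, so that α(t) and the logarithmic average follow the scene statistics over time.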
2.3.2. Retina model

In this section, we present the different steps of the retina model in functional rather than biological terms, which would be less informative for our readers. For that purpose, we illustrate in Fig. 2 the effect of the retinal steps in the TMO pipeline, and also provide the DRIM metric (Aydin et al., 2008) for each stage in order to allow a better appreciation of the cumulative changes compared to the original HDR image. We refer the interested reader to the original paper describing Virtual Retina (Wohrer and Kornprobst, 2009) to understand the biological foundations of the model. We also try to find analogies with previous works, not necessarily bio-inspired methods, to show that there are common ingredients in artificial and biological models, even if their implementations differ (see also Sec. 2.4).

Photoreceptor adaptation: Global non-linear adaptation. Photoreceptors adjust their sensitivity over time and space according to the luminosity present in the visual scene, providing a first global adaptation to the incoming light (Dunn et al., 2007). This stage was not present in the original Virtual Retina model, which was not designed to process HDR images.

Here we consider the model by Dunn et al. (2007), where the authors fit empirical data coming from primate retinal cones. Their model describes photoreceptor activity as a function of the background and average luminance. Using their results, we show how the luminance received by each photoreceptor can be mapped depending on the input luminance L(x, y, t), following the nonlinear equation:

h(x, y, t; L) = \left( 1 + \left( \frac{l_{1/2}(\bar{L}(t))}{L(x, y, t)} \right)^{n} \right)^{-1},    (5)

where n is a constant which can be set according to experimental data to n = 0.7 ± 0.05, \bar{L}(t) is the instantaneous spatial average of the incoming luminance defined by \bar{L}(t) = \int_{x,y} L(x, y, t)\,dx\,dy, and l_{1/2} is a nonlinear function. Technical details proving relation (5) are given in Appendix A (see also Fig. A.12 for an illustration of the function l_{1/2}(\cdot)). Results for this stage are presented in Sec. 3.1.

[Figure 2: Illustration of the "Retina model" steps of the TMO pipeline presented in Fig. 1 (pre- and post-processing are indicated by yellow and red colored circles, respectively). Input image is tree mask by Image, Light and Magic. In the first step the photoreceptor adaptation provides a compression of dynamic range. Then the OPL layer provides sharpening of the image. After that, the CGC layer improves the overall lighting, reducing or amplifying the contrast locally. Finally, the readout is done by merging the two images describing positive and negative contrasts corresponding to ON and OFF ganglion cells. Note that for each step, except for the contrast images, images have been colorized and gamma corrected to facilitate comparisons between steps. These results were obtained using the model parameters presented in Table 1, except for λA = 10000 [Hz] and σA = 1.2 [deg]. For each step the respective DRIM map (Aydin et al., 2008) is shown, with green indicating loss of contrast, blue amplification of contrast, and red reversal of contrast against the input HDR image as reference. Note that the ganglion cell layer does not affect the image contrast, but only the global brightness.]

Our photoreceptor model can be related to the TMO proposed by Van Hateren (2006) based on a dynamic cone model. However, in our method the dynamics are introduced later (namely at the CGC level), so that the photoreceptor adaptation behaves as a static nonlinear global operator.

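A minimal sketch of this global adaptation step is given below, assuming the half-saturation function l_{1/2} (whose exact form is given in Appendix A of the paper) is supplied as a callable; the function name and the small numerical guard are our own illustrative choices, not the authors' code.

```python
import numpy as np

def photoreceptor_adaptation(L, l_half, n=0.7):
    """Global non-linear photoreceptor adaptation (Eq. 5).

    L      : 2D array of calibrated absolute luminance for one frame.
    l_half : callable mapping the frame-average luminance to the half-saturation
             value l_1/2 (defined in Appendix A of the paper).
    n      : exponent; experimentally n = 0.7 +/- 0.05 (the paper uses 0.5 by default).
    """
    L_mean = L.mean()                                # instantaneous spatial average
    ratio = l_half(L_mean) / np.maximum(L, 1e-12)    # guard against division by zero
    return 1.0 / (1.0 + ratio ** n)                  # output h lies in ]0, 1[
```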
AN US

Outer plexiform layer (OPL): Edge and change detection. This stage operates

edge detection and temporal change detection given the light information already adapted by the photoreceptors (h(x, y, t; L)). In retinal circuitry, this is performed at the OPL level. OPL designates the first synapse layer in the retina making the connection between photoreceptors, horizontal cells and bipolar cells. Output of this stage is the IOPL current. This band-pass behavior is

M

obtained by center-surround interactions, which can be written: IOPL (x, y, t) = λOPL (C(x, y, t) − wOPL S(x, y, t)),

ED

(6)

where C(x, y, t) represents the center excitatory signal, calling for the photoreceptor adaptation and photo-transduction and S(x, y, t) is the surround in-

PT

hibitory signal, which models the horizontal cell response. They are defined

CE

as follows:

t

t

(7)

S(x, y, t) = GσS ∗ EτS ∗ C(x, y, t),

(8)

x,y

t

x,y

t

where ∗ denotes convolution that can be either spatial ( ∗ ) or temporal (∗), Gσ

AC

173

x,y

C(x, y, t) = GσC ∗ TwU ,τU ∗ EnC ,τC ∗ h(x, y, t; L),

174

is the standard Gaussian kernel, Tw,τ is a partially high-pass temporal filter,

175

Eτ is an exponential filter to model low-pass temporal filtering and En,τ is

176

an exponential cascade (Gamma filter)3 . The balance between the center and 3 Expressions

are: Eτ = exp(−t/τ )/τ and En,τ (t) = (nt)n exp(−nt/τ )/ (n − 1)!τ n+1

12



ACCEPTED MANUSCRIPT

177

surround contribution is given by wOPL and it is normally close to 1. λOPL

178

defines the overall gain of this stage. Parameters σC , nC , τC , wU , τU , σS and

179

τS are other constant parameters. Equation (6) with (7)–(8) results in a spatiotemporal non-separable filter. It

181

extends the only-spatial classical model of the retina as a difference of Gaussians

182

(DoG). As such, the IOPL is a real valued function corresponding to the presence

183

of contrasts in the scene either a spatial contrast (with sign indicating if there is a

184

bright spot other a dark background or the opposite) or a change in time (with

185

sign also indicating the nature of the temporal transition). This real-valued

186

function will lead to the two rectified components defining ON and OFF cells

187

in the ganglion cell layer (see below). This is a classical model in neuroscience

188

so that all its parameters are well understood and can be set to default values

189

given by biological constrains. Results about this stage are presented in Sec. 3.2.

190

Note that this DoG-like operation done in the OPL layer in conjunction

191

with the dynamic range compression performed by the photoreceptor adaptation

192

generate an operator which can be qualitatively compared to the bilateral filter

193

TMO proposed by Durand and Dorsey (2002) in the sense that there is an

194

implicit estimation of edges.

195

Contrast Gain Control layer (CGC): Dynamical temporal adaptation. Contrast

196

gain control, or contrast adaptation, is the usual term to describe the influence

197

of the local contrast of the scene on the transfer properties of the retina (Shapley

198

and Victor, 1978; Victor, 1987; Kim and Rieke, 2001; Rieke, 2001; Baccus and

199

Meister, 2002). In Virtual Retina, an original model was proposed for fast

200

contrast gain control, an intrinsically dynamic feature of the system, which has

201

a strong and constant influence on the shape of retinal responses, and very

AC

CE

PT

ED

M

AN US

CR IP T

180

202

likely on our percepts. It is an effect intrinsically nonlinear, and dynamical.

203

Dynamically, it modulates the gain of the transmission depending on the local

204

contrast. In Virtual Retina, given the input current IOPL (x, y, t), the membrane

13

ACCEPTED MANUSCRIPT

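As an illustration of this center-surround mechanism, the sketch below implements only the static, purely spatial limit of Eqs. (6)–(8): the temporal filters T_{wU,τU}, E_{nC,τC} and E_{τS} are dropped, the Gaussian widths are given directly in pixels, and the default values are our own assumptions rather than the paper's calibrated parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def opl_filter(h, lambda_opl=10.0, w_opl=0.55, sigma_c_px=0.5, sigma_s_px=1.5):
    """Static spatial sketch of the OPL center-surround stage (Eqs. 6-8).

    h : 2D array, output of the photoreceptor adaptation for one frame.
    Returns the signed contrast map I_OPL (difference-of-Gaussians limit of Eq. 6).
    """
    center = gaussian_filter(h, sigma_c_px)           # C(x, y): center (photoreceptor) signal
    surround = gaussian_filter(center, sigma_s_px)    # S(x, y): surround (horizontal-cell) signal
    return lambda_opl * (center - w_opl * surround)   # I_OPL: positive and negative contrasts
```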
Contrast Gain Control layer (CGC): Dynamical temporal adaptation. Contrast gain control, or contrast adaptation, is the usual term to describe the influence of the local contrast of the scene on the transfer properties of the retina (Shapley and Victor, 1978; Victor, 1987; Kim and Rieke, 2001; Rieke, 2001; Baccus and Meister, 2002). In Virtual Retina, an original model was proposed for fast contrast gain control, an intrinsically dynamic feature of the system which has a strong and constant influence on the shape of retinal responses, and very likely on our percepts. The effect is intrinsically nonlinear and dynamical: it modulates the gain of the transmission depending on the local contrast.

In Virtual Retina, given the input current I_{OPL}(x, y, t), the membrane potential of bipolar cells, denoted V_{Bip}, evolves according to:

\frac{dV_{Bip}(x, y, t)}{dt} = I_{OPL}(x, y, t) - g_A(x, y, t)\, V_{Bip}(x, y, t),    (9)

with

g_A(x, y, t) = \left( G_{\sigma_A} \ast_{x,y} E_{\tau_A} \ast_t Q(V_{Bip}) \right)(x, y, t)   and   Q(\nu) = g^0_A + \lambda_A \nu^2,

where \sigma_A, \tau_A, g^0_A and \lambda_A are constant parameters. This model has been rigorously evaluated by comparison with real cell recordings using specific stimuli.

So, in practice this stage translates into contrast equalization, which we see as an image-enhancing capability and as a source of temporal coherence for videos. Figure 2 illustrates this effect as an increase of brightness in some areas. As in the former stage, all its parameters can be set to default values that fit biological constraints. Results for this stage are presented in Sec. 3.3.

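To make the dynamics of Eq. (9) concrete, here is a hedged sketch of one explicit Euler update of the bipolar potential with the shunting feedback g_A. The discretization scheme, the exponential-moving-average approximation of the temporal kernel E_{τA}, the pixel-based σ_A and the default values are our own choices rather than the Virtual Retina implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def cgc_step(V, q_avg, I_opl, dt, g0_A=5.0, lambda_A=100.0, sigma_A_px=1.0, tau_A=0.005):
    """One explicit Euler step of the contrast gain control dynamics (Eq. 9).

    V     : current bipolar potential map.
    q_avg : running temporal average of the blurred Q(V) (approximates E_tauA).
    I_opl : OPL current for the current time step.
    """
    # Q(V) = g0_A + lambda_A * V^2, spatially blurred by G_sigmaA
    Q = gaussian_filter(g0_A + lambda_A * V ** 2, sigma_A_px)
    # temporal low-pass (exponential moving average in place of E_tauA)
    q_avg = q_avg + (dt / tau_A) * (Q - q_avg)
    # shunting feedback: dV/dt = I_OPL - g_A * V
    V = V + dt * (I_opl - q_avg * V)
    return V, q_avg
```

In the paper's experiments several such updates are performed between consecutive video frames (cf. Sec. 3), which is what provides the temporal luminance coherence of the output.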
Ganglion cells layer. In the last layer of the retina, namely the ganglion cell layer, two main processing stages happen. The first consists in transforming the potential of bipolar cells into an electric current. The second consists in transforming this current into spikes, which is how information is ultimately encoded by the retina to be transmitted to the visual cortex through the optic nerve. In this paper we only consider the first stage, since the spike-based encoding will not be exploited.

PT

ED

M

AN US

207

219

From a more functional perspective, it is in this last stage that the response

220

of ganglion cells can be interpreted in terms of features in the observed scene. There are about 20 ganglion cell types reporting to the brain different fea-

222

tures (Masland, 2011). In this paper we focus on two major classical types,

223

namely ON and OFF cells which respond to positive and negative spatiotempo-

AC

CE 221

224

ral contrasts. As explained above, the emergence of contrast sensitivity comes

225

from the Outer Plexiform Layer where center-surround processing occur (see

226

Eq. (6)). Following the model proposed in Virtual Retina, ON and OFF ganglion ON OFF cell currents (resp. denoted by IGang (x, y, t) and IGang (x, y, t)) are obtained by

14

ACCEPTED MANUSCRIPT

applying a nonlinear rectification to the bipolar potential to either keep the positive or negative parts. This can be written as follows:

with N (V ) =

(x, y, t) = N (ε VBip (x, y, t)) ,

  

i0G 0 )/i0 1−λG (V −vG G

 i0 + λG (V − v 0 ) G G

(10)

CR IP T

ON/OFF

IGang

0 if V < vG ,

(11)

0 if V > vG ,

where ε defines the output polarity (ε = 1 for ON cells and ε = −1 for OFF

228

cells), i.e., changing the ε parameter in (10), one defines the two ganglion cells

229

ON OFF currents IGang (x, y, t) and IGang (x, y, t), describing respectively positive and

230

negative contrasts. The function N (V ) is a non-linear function which recti-

231

fies IGang

232

feature in neural modeling and in retinal models. It reflects static nonlineari-

233

0 ties observed experimentally in the retina. The parameter vG is the ”linearity

234

threshold” between the two operation modes while λG is the slope of the rectifi-

235

cation in the linear part. Note that (10) is a simplification of the original model

236

in which additional convolutions were added to fit biological data.4

(x, y, t) using a smooth curve. Such rectification is a very common

M

ON/OFF

AN US

227

Results about this stage are presented in Sec. 3.4 where we show the influence

ED

237

of the parameter λG .

239

Remark 1. Are ON an OFF circuits symmetric? In the current model

240

and following Wohrer and Kornprobst (2009), ON and OFF circuits are sym-

241

metric in the sense that they were obtained by simply changing the polarity ε.

242

However, this is a simplified assumption with respect to real retina properties.

243

Indeed, there are physiological and mechanistic asymmetries in retinal ON and

244

OFF circuits (Chichilnisky and Kalmar, 2002; Zaghloul et al., 2003; Balasubra-

AC

CE

PT

238

245

manian and Sterling, 2009; Kremkow et al., 2014). For example, across species,

246

it has been observed that OFF cells (compared to ON cells) are more numerous, 4 In

Eq.

the original model, there were additional spatial and temporal convolutions (see

(14) from Wohrer and Kornprobst (2009)). These terms were useful to fit biological

data but in our context they were introducing some blur so that we decided to discard them.

15

ACCEPTED MANUSCRIPT

sample at higher resolution with denser dendritic arbor and respond to contrast

248

with strong rectification. Differences between the two circuits have been also ob-

249

served depending on the overall light condition, e.g., photopic or scotopic condi-

250

tions (Pandarinath et al., 2010; O’Carroll and Warrant, 2017; Takeshita et al.,

251

2017). Interestingly, this asymmetry could arise from the asymmetries between

252

dark versus bright regions in natural images (Balasubramanian and Sterling,

253

2009; Ratliff et al., 2010). For example, in Ratliff et al. (2010) the authors con-

254

sidered the distribution of local contrast in 100 natural images and suggest that

255

because of the skew observed in that distribution (natural scenes contain more

256

negative than positive contrast; see also ON and OFF ganglion cells response

257

given in Fig. 2), the system should devote more contrast levels to encode the

258

negative contrasts for example (e.g., OFF channels should be more rectified).

259

All these results suggest that an interesting avenue for further study is to model

260

explicitly these asymmetries to better account for natural image statistics (see

261

discussion in Sec. 4). This is why we introduced the ON and OFF circuits here

262

although there role in the present study remains limited (slight impact on global

263

luminance) and we refer to Sec. 3.4 where we show a preliminary result where

264

some asymmetry is introduced between ON and OFF circuits.

ED

M

AN US

CR IP T

247

Readout population activity. Given ON and OFF responses, one needs to define

PT

how to readout an activity based on different population responses. Here we chose a simple way to combine the activity generated by both the ON and OFF

CE

circuits:

ON OFF Lout (x, y, t) = IGang (x, y, t) − IGang (x, y, t),

(12)

where Lout (x, y, t) represents the output radiance map which can now be col-

266

orized, normalized, quantized and gamma corrected for the output LDR file, as

AC

265

267

described in final step below. Colorization and Gamma correction. Up to now, our method operates on the luminance map of the input image. Once the luminance map has been corrected, it is necessary to colorize it. To do this we apply the method proposed

16

ACCEPTED MANUSCRIPT

by Mantiuk et al. (2009) which involves linear interpolation between chromatic input information ({Li (x, y, t)}i=red, green, blue ), the respective achromatic input information (L(x, y, t)) and the achromatic output information Lout (x, y, t)). To

CR IP T

obtain the output in linear space Olin,i , each chromatic channel is colorized as follows: Olin,i (x, y, t) = 268



  Li (x, y, t) − 1 s + 1 Lout (x, y, t), L(x, y, t)

(13)

where s is the color saturation factor, in our implementation set by default to 1. The model also considers a gamma correction step. Luminance values com-

AN US

ing from a HDR image are usually in a linear scale, which does not map well

with the color spaces of computer monitors mostly having non linear spaces. Moreover these response curves are usually described by an power law, so we need to apply the inverse operation to keep the desired appearance correctly on screen. For each channel chromatic channel, a standard gamma correction is applied:

ED

M

   0    1/γ Oi (x, y, t) = b255 · Olin,i + 0.5c      255

1/γ

255 · Olin,i ≤ 0, 1/γ

0 < 255 · Olin,i ≤ 255,

(14)

otherwise,

where the choice of the gamma factor normally depends on the screen, (default:

270

γ = 2.2). The factor of 255 present in the equation is related to the maximum

271

possible value that can be stored in a common LDR image format like PNG or

272

JPEG with a resolution of 8 bits per pixel.

273

2.4. Relation with previous work

CE

PT

269

Given the nature of our model, in this section we first discuss a selection of

275

models inspired by visual system properties and architecture. We found a range

276

of models from psychophysical inspiration (e.g., based on fitting psychophysical

277

curves for light adaptation) to more bio-plausible models of the visual system

278

(e.g., mathematical model of the retinal circuitry).

AC

274

17

ACCEPTED MANUSCRIPT

One of the earliest examples, Rahman et al. (1996) developed an TMO based

280

on the retinex algorithm. This yields a local non-linear operator that relies on

281

the work of Land and McCann (1971), who suggested that both the eye and

282

the brain are involved in color processing, particularly in the color constancy

283

effect. The TMO proposed by Rahman et al. (1996) process each color channel

284

independently and finally merged using a color restoration method proposed by

285

the author. Their method has two modes of operation: single and multi-scale.

286

At single scale each color channel is convolved with Gaussian kernels of different

287

sizes. Each pixel (center) is combined with the blurred image (surround) in a

288

nonlinear manner defining thus different center/surround interactions at differ-

289

ent spatial scales. At multi-scale the resulting center/surround interactions are

290

weighted and summed obtaining the equalized output image. The selection of

291

the weights is based on a heuristic considering either small or large surrounds.

AN US

CR IP T

279

In Reinhard and Devlin (2005), the authors developed a TMO inspired in the

293

adaptation mechanism present in photoreceptors. This TMO describes the au-

294

tomatic adjustment to the general level of brightness that color-photoreceptors

295

(cones) have. The core of their algorithm is the adaptation of a photorecep-

296

tor model, which is mainly implemented as a variation from the Naka-Rushton

297

equation5 . Their model lies on a global adaptation of the image fitting the

298

Naka-Rushton parameters in a global (from average scene luminance) or local

299

(from color average luminance) manner. This method does not allow luminance

300

coherence over time because of its static formulation (each frame is processed

301

independently). In our case, photoreceptor adaptation is done frame by frame,

CE

PT

ED

M

292

302

and luminance coherence is obtained by the temporal filtering in the CGC layer. A front-end retina-like model for computer vision tasks was proposed by (Benoit

303

et al., 2010; H´erault, 2010) and applied for tone mapping images and video (Benoit

AC

304

305

et al., 2009). They modeled the retina as two adaptation layers. The first layer

306

is related to photoreceptor adaptation and the second one to ganglion cells. Be5 Sigmoid

function which computes the response of the photoreceptor based on the incoming

brightness and the semi-saturation constant

18

ACCEPTED MANUSCRIPT

tween the two main layers lies a non-separable spatio-temporal filter modeling

308

the synaptic Outer Plexiform Layer (OPL), which summarizes the interaction

309

between photoreceptors and horizontal cells of the retina. For the photoreceptor

310

adaptation the authors propose a non-linear local process in which each photore-

311

ceptor response depends on the incoming luminance inside a local neighborhood.

312

The OPL layer is then used as spectral whitening and contrast enhancement. In

313

a similar manner, the ganglion cell layer reuses the nonlinear mechanism used

314

for the photoreceptor adaptation with different parameters. Specifically, the

315

authors consider a smaller neighborhood around each pixel to compute the aver-

316

age luminosity. Color processing is done through a multiplexing/demultiplexing

317

scheme based on the Bayer color sampling, normally found in cameras. Regard-

318

ing HDR video, the authors use the temporal aspect of the OPL spatio-temporal

319

filter to provide temporal coherence through the sequence. This model shares

320

some features with Virtual Retina model (Wohrer and Kornprobst, 2009),

321

for instance, the modeling of the OPL as a non-separable spatio-temporal fil-

322

ter. Benoit et al. (2009) directly connects the OPL output to the nonlinearity

323

proposed in the ganglion cell layer, while in Virtual Retina OPL output gets

324

into a Contrast-Gain Control stage before the ganglion cell layer.

ED

M

AN US

CR IP T

307

Finally, even though is not properly a TMO, in Zhang et al. (2015), the

326

authors proposed a method for de-hazing static images that relies on a thor-

327

ough retina model, from photoreceptors to retinal ganglion cells. In this model,

328

photoreceptors, bipolar, amacrine and ganglion cells are modeled as a cascade

329

of Gaussian and difference of Gaussian filters through several signal pathways,

CE

PT

325

representing the intricate relationships between colors and global versus local

331

adaptation. We mention this work here because our model shares the model-

332

ing of those biological layers, except for amacrine cells. Also, their model does

AC

330

333

not account for any dynamical Contrast Gain Control which is an important

334

processing stage in our method.

335

Compared to general video tone mapping approaches, it is worth mention-

336

ing that our method does not need optical flow estimation, as in Aydin et al.

337

(2014); Hafner et al. (2014); Mangiat and Gibson (2010) where the authors use 19

ACCEPTED MANUSCRIPT

it stabilize scenes with small motion, with the limitation of producing artifacts

339

in scenes with larger displacements (Kalantari et al., 2013). Other video TMOs

340

apply transforms that change over time but applied uniformly in space (Ferw-

341

erda et al., 1996; Pattanaik et al., 2000; Irawan et al., 2005; Van Hateren, 2006;

342

Mantiuk et al., 2008; Reinhard et al., 2012). They perform well in terms of

343

temporal coherence but can produce a loss in detail, whereas our model take

344

into account local contrast information in space, which better preserves spatial

345

details.

346

3. Results and comparisons

AN US

CR IP T

338

347

In this section we evaluate and compare our bio-inspired synergistic TMO

348

on both static images and videos. Remark that if the input is a static image, it

349

is considered as a video by simply repeating that image other time. Unless specified otherwise, all model parameters used to process HDR images

351

are set as given in Table 1. It is important to emphasize that a majority of these

352

parameters have been set thanks to biological constraints. Note that since all

353

gaussian standard deviation parameters were defined in visual angle degrees in

354

Virtual Retina, we used a conversion of five pixels per visual angle degree,

355

or [p.p.d.]. This value was kept constant for different image sizes. Concerning

356

temporal constants, we assume that videos are at 30fps and between each frames

357

we do six iterations of the CGC equation (9). For a static input image, we

358

let the iterative process converge towards a final image (in practice, about five

359

repetitions of the frame, i.e., 30 iterations). We used a time step of dt = 0.5 [ms].

CE

PT

ED

M

350

360

Sections 3.1 to 3.5 deal with static images. We will show the influence of

361

some parameters in each step to show their impact and benchmark our method

363

with respect to the state-of-the-art. Section 3.6 gives an example of result on a

364

real video sequence. We show how the temporal convolutions provide temporal

365

coherence in the output.

AC 362

20

ACCEPTED MANUSCRIPT

Table 1: Default parameters used in our implementation. Note that all parameters and variables presented in the Virtual Retina model are expressed in reduced units and not physical units (see justification in Sec. 2.1.2 from Wohrer and Kornprobst (2009)). As a consequence, for example, the dimension of currents is expressed in Hertz and the overall gain

τC

0.01 [s]

nC

Value  −1  52000 P s 2

τU

0.1 [s]

wU

0.8   −1 10 Hz lum

σS

0.2 [deg]

τS

0.01 [s]

0.55

dt

0.005 [s]

0.2 [s]

λA

100 [Hz]

σA

0.2 [deg]

τA

0.0005 [s]

0 gA

i0G

80 [Hz]

λG

n

0.5

λOPL tf

γ

366

Parameter i1/2

wOPL

2.2

Parameter

5 [Hz]

100 [Hz]

Value

σC

0.03 [deg]

AN US

Parameter

Value

CR IP T

of the center-surround filter λOPL is expressed in Hertz per unit of luminance.

0 vG

0

s

1

3.1. Photoreceptor adaptation: Global non-linear adaptation

Figure 3 shows the effect of photoreceptor adaptation. In Fig.3(a) we show

368

how an input radiance map L(x, y, .) (converted to an LDR image using a linear

369

mapping) is transformed into h(x, y, .; L) image defined in (5). This is done by

370

a global non-linear operator effectively compressing the whole incoming data

371

range to the interval ]0, 1[, step necessary for the Virtual Retina model to

372

work properly. The behavior of this nonlinearity amplifies the cases where the

373

luminance is higher compared to the mean spatial luminance of the scene, non-

374

linearly computed through l1/2 (·), i.e., overexposed regions. In Fig. 3(b) we

PT

ED

M

367

show the absolute difference between h(·) and L(·)6 . Note that the main differ-

376

ences between the two images are located in the overexposed regions present in

377

the stained glass windows.

AC

CE 375

378

A comparison between different values of the exponential parameter n in

379

Eq. (5) is shown in Fig. 4. In the experimental fit of Dunn et al. (2007) this 6 Both

images were normalized between [0,1] to compute the absolute difference between

them

21

ACCEPTED MANUSCRIPT

constant was set to n = 0.7 ± 0.05. One can change n to increase or decrease

381

the effect of photoreceptor adaptation. From our tests, we found that n = 0.5

382

gives consistently good results.

383

3.2. Outer plexiform layer (OPL): Edge and change detection

CR IP T

380

In Fig. 5 we analyze the impact of the parameters from Eq. (6) defining the

385

center-surround interaction. First we vary the relative weight value of wOPL

386

(experimentally estimated near 1 for mammal retinas, and in general in the

387

range of [0, 1]). High values of wOPL , e.g., 0.9 excessively enhances borders spe-

388

cially when the size of the surround, given by the σS parameter, increases. So,

389

we have experimentally set the parameter wOPL with a default value of 0.55,

390

which performs consistently well in our tests. Then the size of the surround

391

produces a high impact on the resulting IOPL layer. In Fig. 5 we show that

392

increasing the value of σs induces artifacts for high values of wOPL specially

393

in borders with a high difference of luminosity, mainly because saturation in

394

the IOPL (x, y) value. The effect of σS additionally enhances details on regions

395

with low contrast when there is no saturation (low values of wOPL ). The de-

396

fault value in our implementation is σS = 0.1 [deg], approximately the modeled

397

value for primate retinas. This value shows a good compromise between edge

398

enhancement and image distortion, while being biologically accurate.

399

3.3. Contrast Gain Control layer (CGC): Dynamical temporal adaptation

PT

ED

M

AN US

384

In Fig. 6 we show the effect of Contrast Gain Control (CGC) varying pa-

401

rameters. CGC layer enhances dynamically local brightness and local contrast

402

in the image. According to (9), there are two parameters that directly impact

403

the value of VBip (x, y, t): λA , the overall gain of the feedback loop and σA , the

AC

CE

400

404

standard deviation of the spatial gaussian blur. In Fig. 6 we focus on these two

405

parameters. Similarly to what was reported by Wohrer and Kornprobst (2009),

406

the gain in the feedback loop given by λA translates into more intense differences

407

with respect to the IOPL input signal, obtaining as a result brighter images. The

408

value of λA = 1 [Hz] yields little differences regardless of the choice of σA . The

22

CR IP T

ACCEPTED MANUSCRIPT

Relative luminance

Figure 3:

(b)

AN US

(a)

Effect of global non-linear photoreceptor adaptation (a) Effect of the

non-linear operator h(·) defined in (5) over the input luminance image L. The image obtained after the photoreceptor adaptation is represented in h(·). (b) Difference of the normalized signals |h − L|.

n = 0.5

n=1

CE

PT

ED

M

n = 0.25

AC

Figure 4:

Effect of the parameter n in the photoreceptor adaptation (5). This

parameter has an effect in terms of compression of the incoming luminance map. When n is low, compression is exaggerated resulting in a arguably distorted image. When n is close to one, function h(x, y, t; L) behaves as a quasi-linear operator.

23

S

= 0.01

S

= 0.1

=1

S

= 10

AN US

wOP L = 0.2

S

CR IP T

ACCEPTED MANUSCRIPT

CE

PT

wOP L = 0.9

ED

M

wOP L = 0.55

Figure 5:

Resulting IIOPL for different values of σS and wOPL (6). The relative

weight wOPL highlights edges in the input image according to the local average described by

AC

the surround signal S(x, y). This becomes more prominent with a large values of σS .

24

ACCEPTED MANUSCRIPT

effect of σA is evident when the value of λA increases, and in this case the

410

gaussian blur produced by σA generates local enhancement specially observed

411

in overexposed regions such as the stained glass windows. After this analysis

412

we decided to set the default values of these parameters to λA = 104 [Hz] and

413

σA = 1.2 [Hz].

CR IP T

409

In Fig. 7 we illustrate how the dynamics of CGC layer work. We denote by

414

415

ni VBip

416

equation (9). We show how the CGC layer progressively enhances details in

417

ni time. In Fig. 7(a) we show the image of VBip for ni = 0, 10 and 30 iterations,

418

which can be considered as the steady-state. Convergence of the differential

419

equation to the steady state is illustrated in Fig. 7(c) showing the relative differ-

420

ence between two successive iterations. Stronger variations are observed during

421

the first 10 iterations. Numerically, we chose tf = dt · 20 = 0.2 [s], as the de-

422

fault stop time for CGC dynamics over an static image. The effect of the CGC

423

is more noticeable for the windows of the altar where most of the light of the

424

scene comes from (see zoom). This is also visible from Fig. 7(b) where we show

425

ni at initial state (ni = 0) and at convergence (here the difference between VBip

426

ni = 0). CGC compensates the contrast locally, thus allowing for details to be

427

enhanced, as the edges of the stained glass. Also, the additional compression

428

in dynamic range translates into an overall brighter image, shown on the walls

429

around the windows.

430

3.4. Ganglion cells layer

PT

ED

M

AN US

the value of the map VBip after ni iterations of the discretized differential

In Fig. 8 we study the influence of parameter λG in the ganglion cells layer

432

(12). This parameter represents the slope of the rectification in the linear part.

433

In Fig. 8(a) we show results for different values of λG . Note that in or-

AC

CE 431

434

der to enhance the effect of this stage, we pushed the CGC parameters to be

435

σA = 0.2 [deg] and λA = 102 [Hz]. High values of λG (106 [Hz]) distort the out-

436

put image with an excessive brightness, while low values of λG (∈ [100 , 104 ] [Hz]

437

approx.) improve the overall brightness without overexposing the scene. Fol-

438

lowing our motivation to keep the TMO biologically plausible, the value chosen 25

ACCEPTED MANUSCRIPT

A

= 0.2

= 102

A

= 104

CR IP T

= 0.02

A

=2

CE

PT

A

ED

M

A

= 100

AN US

A

Figure 6:

Effect of CGC parameters on VBip . This figure summarizes the effect of λA

AC

and σA parameters on the Contrast Gain Control layer behavior. The gain of the loop, given by λA , translates into an improved overall brightness in the image, while an increase of the standard deviation σA value enhances details. The parameter values chosen as default in this model are shown in Table 1.

26

ACCEPTED MANUSCRIPT

10 VBip

30 VBip

30 |VBip

0 VBip |

(a) ni +1 |VBip

ni VBip |

(b)

CE

PT

ED

M

ni |VBip |

AN US

CR IP T

0 VBip

Figure 7:

(c)

Number of iterations

ni

Effect of CGC in local contrast enhancement. Comparison between the n

AC

i input and output of the CGC stage for different iterations number. (a) Evolution of VBip

after ni = {0, 10, 30} iterations (note that resulting image was colorized and gamma corrected for display). (b) A colorized map of the differences between the iteration 30 and the initial state at iteration 0. (c) The mean squared error computed between two consecutive iterations n

i of VBip . After approximately 15 iterations the changes on the scene are not noticeable.

27

ACCEPTED MANUSCRIPT

439

as default for this gain is λG = 100 (see Wohrer et al. (2009), Midget cells). In Fig. 8(b) we propose a preliminary result to illustrate the effect of in-

441

troducing asymmetries between ON and OFF circuits (see Remark 1 in the

442

paragraph Ganglion cell of Sec. 2.3.2). This is done by choosing different values

443

of slope of the sigmoid function (i.e., λG ) to obtain the ON and OFF responses.

444

Namely, we choose a fixed value to estimate the ON part (λON G = 100) while we

445

F vary the value for the OFF part (λOFF ). Increasing λOF has a directly proG G

446

portional effect in the compression rate over the channel, thus diminishing the

447

contribution to the output luminance map. This effect can also be understood

448

as the effective reduction of global contrast in the image. G

=1

G

= 102

G

= 104

G

= 106

ED

M

(a)

AN US

CR IP T

440

OFF / ON G G

=1

OFF / ON G G

= 10

OFF / ON G G

= 100

AC

CE

(b)

= 0.1

PT

OFF / ON G G

Figure 8:

Effect of λG on the output of the Ganglion cells layer Lout (x, y). (a)

Testing different values of λG . High values of λG (> 106 [Hz]) saturate image brightness: see

distortions in the ceiling and the stained glass windows. Lower values of λG , on the contrary, improve the global perception of the scene. (b) Introducing asymmetry between ON and OFF

OFF is changed. For circuits. Different values of λG are used for each circuit: λON G is fixed; λG

bigger values of λOFF there is a clear loss of global contrast across the image. G

28

ACCEPTED MANUSCRIPT

449

3.5. Comparison with other methods In Fig. 9 we visually compared our results to linear compression, logarithmic

451

compression, Retinex model (Rahman et al., 1996), histogram adjustment (Lar-

452

son et al., 1997a), bilateral filter-based approach (Durand and Dorsey, 2002),

453

photo-receptor model (Reinhard and Devlin, 2005) and bio-inspired spatio-

454

temporal model (Benoit et al., 2009). We used implementations provided in Rein-

455

hard et al. (2005). These were run under a Linux machine using our C++

456

implementation. The TMO proposed in this paper was implemented using a

457

default set of parameters, which can be found in Table 1. We used four HDR

458

images in total, all of which are already properly calibrated. Every tone mapped

459

image was obtained without intervention of the default parameters proposed in

460

the respective models so that a fair comparison can be done. Results show that

461

our method gives promising results in general for the four images selected. It

462

manages to compress the luminance range without overexposure. For Bristol

463

Bridge, a typical twilight outdoors scene, our method allows to see both the

464

woods and the sky. In Clock Building, both the edifice details and the outdoors

465

lighting are preserved. In Memorial the inside of the altar is recovered while

466

conserving the details from the stained glass. Finally in Waffle House both

467

the indoors of the building and the car can be seen without too much specular

468

lighting noise, which can be seen over the roof of the building.

ED

M

AN US

CR IP T

450

Note that we carefully verified that methods were tested in the same con-

470

ditions regarding input calibration and gamma correction to obtain the final

471

output. This is important in order to provide a fair comparison ground for

CE

PT

469

every TMO. Concerning the calibration step (see Sec. 2.3.1) this step is not a

473

priori necessary since all the images are already calibrated. For these images

474

the alpha factor (i.e., the key of the scene) is close to a logarithmic average of

AC

472

475

the input luminosity. As a consequence, it is not important to know if other ap-

476

proaches include or not this calibration step. Regarding gamma correction, for

477

all the TMOs evaluated we checked that the output image was passed through

478

a gamma correction layer (γ = 2.2) before being evaluated. If this was not in

479

the original code, we added it. 29

ACCEPTED MANUSCRIPT

Clock Memorial Building

Linear

AN US

Logarithmic

Retina Rahman et al. (1996)

M

Histogram adjustment Larson et al. (1997)

Bilateral filter-based Durand and Dorsey (2002)

Waffle House

CR IP T

Bristol Bridge

ED

Photoreceptor adaption Reinhard and Devlin (2005)

CE

PT

Bio-inspired Benoit et al. (2009)

Our Method

AC

Figure 9:

Qualitative comparison between different TMOs and our method (see

text for references). Bristol Bridge and Clock Building images are copyright of Greg Ward. Memorial image is copyright of Paul Debevec. Waffle House is copyright of Mark D. Fairchild.

30

ACCEPTED MANUSCRIPT

In order to obtain quantitative comparisons with other TMOs, we computed the TMQI (Yeganeh and Wang, 2013) for 29 different TMOs over a large subset of the HDR Photographic Survey by Mark Fairchild (a total of 101 images; dataset website: http://rit-mcsl.org/fairchild/HDR.html). For each TMO we represent the distribution of the TMQI obtained over all the images in the dataset using violin plots (see Fig. 10(a)). Violin plots were chosen because they convey more information than standard box plots; for example, they can reveal bimodal behaviors that classical box plots miss. We can observe the variety of performances obtained by the different TMOs evaluated, and that the method proposed here performs comparably to the best methods reported in the literature. The methods evaluated, from top to bottom, are: Linear, Logarithmic (see Reinhard et al. (2005)), Oppenheim et al. (1968); Horn (1974); Miller et al. (1984); Chiu et al. (1993); Tumblin and Rushmeier (1993); Ferschin et al. (1994); Ward (1994); Schlick (1995); Ferwerda et al. (1996); Rahman et al. (1996); Larson et al. (1997b); Pattanaik et al. (1998, 2000); Ashikhmin (2002); Durand and Dorsey (2002); Fattal et al. (2002); Reinhard et al. (2002); Choudhury and Tumblin (2003); Drago et al. (2003); Yee and Pattanaik (2003); Li et al. (2005); Reinhard and Devlin (2005); Kuang et al. (2007); Kim and Kautz (2008); Benoit et al. (2010); Oskarsson (2015); Zhang and Li (2016); and our method. The Zhang and Li (2016) implementation was done by us, the Kim and Kautz consistent TMO was made available by Banterle et al. (2011), the Li et al. method was provided by Yuanzhen Li, the Democratic TMO implementation was provided by Magnus Oskarsson, the Benoit et al. TMO code was retrieved from the OpenCV 2.0 source package, and the remaining TMO implementations were provided by Reinhard et al. (2005).

Additionally, in Fig. 10(b-c) we show a more precise comparison with the two competing bio-inspired approaches reported in Fig. 10(a), namely Benoit et al. (2009) and Zhang and Li (2016). Since TMOs yield highly variable results depending on the input, it is of interest to evaluate how our method performs against them on each sample input individually.
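To make the evaluation procedure concrete, the following sketch shows how such per-operator TMQI distributions can be collected and displayed as violin plots. It is a schematic Python illustration: the tmqi(hdr, ldr) scoring function is a placeholder (the reference implementation of Yeganeh and Wang (2013) is distributed as MATLAB code), and the operator dictionary stands for the 29 TMOs listed above.

    import matplotlib.pyplot as plt

    def collect_scores(hdr_images, operators, tmqi):
        """Return {operator name: list of TMQI scores over the whole dataset}."""
        scores = {name: [] for name in operators}
        for hdr in hdr_images:
            for name, tmo in operators.items():
                ldr = tmo(hdr)                       # tone-mapped low dynamic range result
                scores[name].append(tmqi(hdr, ldr))  # placeholder TMQI call
        return scores

    def plot_violins(scores):
        """One horizontal violin per operator, as in Fig. 10(a)."""
        names = list(scores)
        fig, ax = plt.subplots(figsize=(6, 0.4 * len(names) + 1))
        ax.violinplot([scores[n] for n in names], vert=False, showmedians=True)
        ax.set_yticks(range(1, len(names) + 1))
        ax.set_yticklabels(names)
        ax.set_xlabel("TMQI")
        fig.tight_layout()
        return fig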

Figure 10: Quantitative comparison between different TMOs and our method. (a) Comparison of the TMQI metric obtained by our method against the other 29 TMOs (see text for the exact references). (b) Detailed comparison between the TMQI values obtained by Benoit et al. (2009) and our method; we observe no correlation (r = −0.18, p = 0.071). (c) Detailed comparison between the TMQI values obtained by Zhang and Li (2016) and our method; we observe a correlated response (r = 0.33, p < 0.05). (b)-(c) The orange diagonal line shows the region of equal performance; points below this line correspond to images where our method obtained a better TMQI value.

For both approaches we compare the TMQI value for each image of the Fairchild dataset. In Fig. 10(b) we compare our method with Benoit et al. (2009). The methods appear to be complementary: there are images where our method obtained a TMQI value between 0.7 and 0.8 while Benoit et al. (2009) obtained a score over 0.8, and, conversely, for images where the TMQI value of Benoit et al. (2009) did not exceed 0.8, our method generated a TMQI value in the range 0.8–1.0. Overall, our method gives a better score in 59 out of 101 images. The regression applied to these data (r = −0.15, p = 0.13) confirms the complementary performance, showing no correlation between the two methods. Similarly, in Fig. 10(c) we compare our method with Zhang and Li (2016). Overall, our method gives a better score in 56 out of 101 images. In this case, the regression applied to the data suggests a slight correlation between both TMQI values (r = 0.33, p < 0.05).
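The per-image analysis of Fig. 10(b)-(c) boils down to counting, for each pair of operators, the images on which one obtains the higher TMQI, and measuring the linear correlation between the two score vectors. A minimal sketch follows; the Pearson statistics are computed with scipy.stats, and the random arrays in the usage example merely stand in for the 101 per-image TMQI values.

    import numpy as np
    from scipy import stats

    def pairwise_comparison(ours, other):
        """Count per-image wins and measure the correlation between two TMQI score vectors."""
        ours, other = np.asarray(ours), np.asarray(other)
        wins = int(np.sum(ours > other))        # images where our method scores higher
        r, p = stats.pearsonr(ours, other)      # linear correlation between the two TMOs
        return wins, r, p

    # Toy usage with placeholder scores standing in for the 101 Fairchild images.
    rng = np.random.default_rng(0)
    wins, r, p = pairwise_comparison(rng.uniform(0.6, 1.0, 101), rng.uniform(0.6, 1.0, 101))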

3.6. Result videos

Common artifacts in video tone mapping are brightness flickering and ghosting produced by temporal filtering. We tested our method on the two HDR videos from the Grzegorz Krawczyk dataset (website: http://resources.mpi-inf.mpg.de/hdr/video/). Results are shown in Fig. 11; videos with the resulting radiance maps can be found on the paper website.

Our model generates satisfying results regarding radiance maps, particularly when the car driver faces direct sunlight, which overexposes the camera. Brightness flickering is reduced, as shown by the evolution of the temporal average (compare the temporal variations in I_OPL and V_Bip). However, we found that this temporal coherence is lost if the video frames are then colorized and gamma corrected, which brings back some chromatic flickering. This is a limitation of our approach coming from the colorization method we chose (Mantiuk et al., 2009), since flickers in the input are transmitted to the output.

Another interesting observation from Fig. 11 concerns the temporal effect of the CGC layer. We show the mean log luminance of each frame at different levels of our architecture: the input luminance map (L), the output of the OPL (I_OPL) and the output of the CGC stage (V_Bip). The input sequence has fast changes in the overall luminance (flickering) which are still present at the OPL stage; these temporal variations are smoothed in the next stage thanks to the CGC. This effect is best seen in the videos shown on the paper website, and can also be quantified directly from the frame-wise traces, as in the sketch below.
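A minimal version of this diagnostic is sketched below: it computes, for any stage of the architecture, the spatial mean of the log luminance of every frame and a simple frame-to-frame variation index. The array names are generic placeholders; any of the signals L, I_OPL or V_Bip can be passed as input.

    import numpy as np

    def mean_log_trace(frames, eps=1e-6):
        """Spatial average of the log luminance for each frame of a (T, H, W) sequence."""
        frames = np.asarray(frames, dtype=np.float64)
        return np.log(frames + eps).mean(axis=(1, 2))

    def flicker_index(trace):
        """Mean absolute frame-to-frame variation of a trace (lower = temporally smoother)."""
        return float(np.mean(np.abs(np.diff(trace))))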

We found in our model that a critical parameter for temporal coherence is τ_A, which appears in the g_A(x, y, t) component of Eq. (9). As explained in Wohrer and Kornprobst (2009), this constant relates to the time scale of the adaptation and cannot be fixed from experimental data, because of the hypothetical nature of this stage. Since this time constant defines an exponential kernel, and thus a temporal low-pass filter, it is inversely proportional to the cut-off frequency of that filter. We found through tests that the best value is τ_A = 0.0005 s, for which temporal coherence is improved compared with the case without temporal convolutions. On the other hand, a larger value of τ_A leads to a strong ghosting effect, because the tight bandwidth of the filter suppresses the natural per-pixel changes in luminance across time, and not only the luminance flickering we want to avoid.
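For intuition about the role of τ_A, the sketch below applies a first-order exponential (low-pass) temporal filter to a frame sequence. The recursive form is a standard discretization of an exponential kernel whose cut-off frequency scales as 1/τ_A; it only illustrates the trade-off discussed above and is not the full adaptation stage g_A of Eq. (9).

    import numpy as np

    def exp_lowpass(frames, tau_a, dt):
        """Pixel-wise first-order exponential temporal filter on a (T, H, W) sequence.

        A small tau_a follows the input closely (little smoothing); a large tau_a
        removes flicker but lags behind genuine luminance changes, causing ghosting.
        """
        frames = np.asarray(frames, dtype=np.float64)
        alpha = dt / (tau_a + dt)                # discrete smoothing factor in (0, 1]
        out = np.empty_like(frames)
        out[0] = frames[0]
        for t in range(1, len(frames)):
            out[t] = out[t - 1] + alpha * (frames[t] - out[t - 1])
        return out

    # e.g. exp_lowpass(sequence, tau_a=0.0005, dt=1.0 / 25.0) for a 25 fps video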

4. Conclusion

In this paper we show a promising way to address computational photography challenges by exploiting current neuroscience research on retinal processing. Models based on properties of the visual system, and more specifically on detailed retinal circuitry, will certainly benefit from this research. In this respect, choosing Virtual Retina appears to be a promising idea, since our goal is to pursue its development towards a better understanding of retinal processing (Cessac et al., 2017b). Indeed, one of our goals is to systematically investigate how new mechanisms found in the retina could be interpreted and exploited in the context of artificial vision problems. This gives the present work a high potential for further improvements.

For static images our TMO works as expected, effectively compressing the dynamic range to fit a standard 24-bit LDR image and generating an equalized result.

Figure 11: Results on the two videos from Grzegorz Krawczyk's dataset: (a) Highway video and (b) Tunnel video. For each case we show sample frames (nf = 0, 200, 500) at different levels of our architecture: the input luminance map (L), the output of the OPL (I_OPL) and the output of the CGC stage (V_Bip). On the right-hand side, we show the time evolution of the corresponding log spatial average of each frame. Results show how temporal coherence is improved at the CGC level.

In the case of videos, the TMO proposed here also works properly on the luminance information, the CGC layer being an important step to guarantee the temporal coherence of the video. For videos, one limitation of the TMO presented here is that the current colorization method generates color flickering in the output. This limitation arises because the original Virtual Retina model is based on luminance maps only, i.e., it is monochrome. A workaround could be either to include color processing in the retina model or to improve the colorization method in the post-processing stage.

The model presented here contains a series of parameters related to the different stages of global and local adaptation of the input luminance image. We observed in Section 3 the effect of different parameters on the global and local contrast adaptation. Interestingly, the parameters related to the I_OPL(x, y, t) computation, which set the balance between center and surround, define the extent of local adaptation: areas with low contrast look brighter as the size of the surround neighborhood increases (a schematic illustration is sketched below). This type of computation shows similarities with neurophysiological observations in sensory systems, where the system integrates information over a wider neighborhood when the signal-to-noise ratio is small or, more drastically, where the shape of the surround varies according to the input information (Pack et al., 2005; Fournier et al., 2011). This behavior suggests that an adaptive selection of the surround size and shape, both at the OPL and at the CGC, could highly benefit the quality of the TMO.
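The sketch below gives a schematic center-surround normalization in the spirit of the OPL stage, only to illustrate the effect discussed above: widening the Gaussian surround enlarges the neighborhood used as the local adaptation level, so large low-contrast regions come out brighter. Parameter names and the exact normalization are our own simplification, not the model's actual OPL equations.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def center_surround_adaptation(lum, sigma_center=1.0, sigma_surround=8.0,
                                   w_surround=0.9, eps=1e-6):
        """Schematic local adaptation: divide the center signal by a center/surround mixture.

        Increasing sigma_surround widens the adaptation neighborhood and brightens
        large areas of low contrast.
        """
        center = gaussian_filter(lum, sigma_center)
        surround = gaussian_filter(lum, sigma_surround)
        adaptation = (1.0 - w_surround) * center + w_surround * surround
        return center / (adaptation + eps)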

We see three challenging perspectives for this work. The first perspective is to include more cell types in order to have a richer representation of the input, and to learn how to exploit them. It has been shown that the retina contains a wide variety of ganglion cell types simultaneously encoding features of the visual world (Gollisch and Meister, 2010; Sanes and Masland, 2015; Masland, 2011; Lorach et al., 2012). A natural extension of the model proposed here could be to increase the number of retinal ganglion cell types in order to expand its computational capabilities. This extension should also be accompanied by efficient readouts to obtain a single retina output translated into an output image. The readout of neural activity is related to a decoding mechanism, which is an open research topic in the neuroscience community (Quiroga and Panzeri, 2009; Warland et al., 1997; Pitkow and Meister, 2012; Graf et al., 2011). The second perspective is to further investigate other nonlinear mechanisms in the retina, such as those described by Berry et al. (1999); Chen et al. (2013); Souihel and Cessac (2017). These mechanisms have been shown to account for temporal processing such as anticipation, and we plan to study how they would impact the quality of tone-mapped videos. The third perspective is to include fixational eye movements for the static-image scenario (Martinez-Conde et al., 2013). Indeed, our eyes are never at rest; they move continuously even during fixation tasks. These eye movements, and microsaccades in particular, are known to enhance visual perception. This idea has been explored in the context of edge detection (Hongler et al., 2003), superresolution (Yi et al., 2011) and coding (Masquelier et al., 2016). We think it is also interesting to investigate their effect for tone mapping. Finally, the quantitative comparison through TMQI values with other bio-inspired methods suggested complementary behaviors, as observed in Fig. 10(b)-(c). Analyzing in detail the complementary contributions of each computation stage, with the aim of merging the best aspects of different TMOs into a single one, could be an interesting track to explore. In particular, the color de/multiplexing scheme proposed by Benoit et al. (2010) is of special interest to us for dealing with videos and might reduce the flickering observed with our current colorization approach.


Appendix A. Photoreceptor adaptation and pupil model


Photoreceptors adjust their sensitivity over time and space, providing a first adaptation step for the incoming light. The Virtual Retina model proposed by Wohrer and Kornprobst (2009), originally designed to process LDR images, does not take this adaptation into account at the input stage, so there is a need to provide it without interfering with the mechanisms of the rest of the model. On the other hand, Dunn et al. (2007) showed that photoreceptor adaptation is mutually exclusive from the adaptation occurring further into the retina, since the latter happens as the signal goes from the cone bipolar cells to the ganglion cell layer. The authors also provided a response model for cones which, in light of the previous statement, fits well with our model.

Using electrophysiological data recorded from L-type cone responses, Dunn et al. (2007) constructed an empirical model relating the normalized response of a cone h(·) to the background intensity i_B cast over it:

    h(i_B) = \frac{1}{1 + \left( \frac{i_{1/2}}{i_B} \right)^n},    (A.1)

where i_{1/2} is the half-maximal background intensity, estimated at 45·10^3 ± 7·10^3 [P s^{-1}] on their data, and n is the Hill exponent describing the relationship between response amplitude and background intensity i_B, estimated at n = 0.7 ± 0.05.

The model in (A.1) is expressed in absorbed photons per cone per second [P s^{-1}], and to use it in our model we need to convert it into [cd m^{-2}]. The following formula is used to convert between these units (see http://retina.anatomy.upenn.edu/~rob/lance/units_photometric.html):

    i = f(l) = 10\pi \cdot l \cdot \rho_{pupil}^2,    (A.2)

where \rho_{pupil} is the radius of the pupil in millimeters and l is the luminance to convert.

There are several models relating the radius of the pupil \rho_{pupil} to luminance, covering different levels of complexity (Watson and Yellott, 2012). For the sake of simplicity, we chose the de Groot and Gebhard (1952) model:

    \rho_{pupil}(\bar{L}) = 3.5875 \exp\left( -0.00092 \left( 7.597 + \log \bar{L} \right)^3 \right),    (A.3)

which relates the average incoming spatial luminance \bar{L}(t) = \int_{x,y} L(x, y, t)\,dx\,dy to the radius of the pupil \rho_{pupil} in millimeters. This model comes from an experimental fit which does not consider additional factors such as age, contraction/dilation, or adapting field size.

Thus, finally, the response of the photoreceptors h(·) can be expressed in terms of the local background luminance l_B and the average background luminance \bar{L} as follows:

    h(l_B, \bar{L}) = \frac{1}{1 + \left( \frac{i_{1/2}}{10\pi \cdot l_B \cdot \rho_{pupil}^2(\bar{L})} \right)^n}.    (A.4)

Furthermore, since i_{1/2} is constant, we define a new expression l_{1/2}(\bar{L}):

    h(l_B, \bar{L}) = \frac{1}{1 + \left( \frac{l_{1/2}(\bar{L})}{l_B} \right)^n},    (A.5)

    l_{1/2}(\bar{L}) = \frac{i_{1/2}}{10\pi \cdot \rho_{pupil}^2(\bar{L})},    (A.6)

where l_{1/2} represents the half-maximal background luminosity for the fitted data, now in [cd m^{-2}], as is l_B. Figure A.12 shows how l_{1/2} depends on the average luminance \bar{L}.
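For completeness, a direct transcription of Eqs. (A.1)-(A.6) is sketched below with the constants quoted above (i_{1/2} = 45·10^3 P s^{-1}, n = 0.7). The helper names are ours, and the base of the logarithm in Eq. (A.3) is an assumption on our part; the de Groot and Gebhard formula is also commonly quoted with a base-10 logarithm, so that line should be adapted to the convention in use.

    import numpy as np

    I_HALF = 45e3   # half-maximal background intensity [photons / cone / s], Dunn et al. (2007)
    N_HILL = 0.7    # Hill exponent

    def pupil_radius(mean_lum):
        """de Groot and Gebhard (1952) pupil radius [mm] for an average luminance [cd/m^2].

        Natural log assumed here (see Eq. (A.3)); swap in np.log10 if that convention is used.
        """
        return 3.5875 * np.exp(-0.00092 * (7.597 + np.log(mean_lum)) ** 3)

    def luminance_to_photons(lum, rho):
        """Eq. (A.2): luminance [cd/m^2] to absorbed photons per cone per second."""
        return 10.0 * np.pi * lum * rho ** 2

    def cone_response(l_background, mean_lum):
        """Eqs. (A.5)-(A.6): normalized cone response for a local background luminance."""
        rho = pupil_radius(mean_lum)
        l_half = I_HALF / (10.0 * np.pi * rho ** 2)
        return 1.0 / (1.0 + (l_half / l_background) ** N_HILL)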

Figure A.12: Characteristic curve of l_{1/2}(\bar{L}): plot of Equation (A.6) as a function of the incoming average luminance \bar{L}. For a single image l_{1/2}(\bar{L}) behaves as a constant in Eq. (A.5), yet over a video sequence the nonlinear shape of the curve helps to smooth luminance transients between frames.

Acknowledgements

Financial support: CONICYT-FONDECYT 1140403, CONICYT-Basal Project FB0008, Inria. The authors are very grateful to Adrien Bousseau for the fruitful discussions and his valuable feedback on several preliminary versions of this paper.

References

Ashikhmin, M., 2002. A tone mapping algorithm for high contrast images. In: Proceedings of the 13th Eurographics Workshop on Rendering. EGRW '02. Eurographics Association, Aire-la-Ville, Switzerland, pp. 145–156.

Aydin, T. O., Mantiuk, R., Myszkowski, K., Seidel, H.-P., Aug. 2008. Dynamic range independent image quality assessment. ACM Transactions on Graphics (TOG) 27 (3).

Aydin, T. O., Stefanoski, N., Croci, S., Gross, M., Smolic, A., 2014. Temporally coherent local tone mapping of HDR video. ACM Transactions on Graphics (TOG) 33 (6), 196.

Baccus, S., Meister, M., 2002. Fast and slow contrast adaptation in retinal circuitry. Neuron 36 (5), 909–919.

Balasubramanian, V., Sterling, P., 2009. Receptive fields and functional architecture in the retina. The Journal of Physiology 587 (12), 2753–2767.

Banterle, F., Artusi, A., Debattista, K., Chalmers, A., 2011. Advanced High Dynamic Range Imaging: Theory and Practice. AK Peters (CRC Press), Natick, MA, USA.

Basalyga, G., Montemurro, M. A., Wennekers, T., 2013. Information coding in a laminar computational model of cat primary visual cortex. J. Comput. Neurosci. 34, 273–83.

Benoit, A., Alleysson, D., Hérault, J., Le Callet, P., 2009. Spatio-temporal tone mapping operator based on a retina model. In: Computational Color Imaging Workshop.

Benoit, A., Caplier, A., Durette, B., Herault, J., 2010. Using human visual system modeling for bio-inspired low level image processing. Computer Vision and Image Understanding 114 (7), 758–773.

Berry, M., Brivanlou, I., Jordan, T., Meister, M., 1999. Anticipation of moving stimuli by the retina. Nature 398 (6725), 334–338.

Bertalmío, M., 2014. Image Processing for Cinema. CRC Press.

Carandini, M., Demb, J. B., Mante, V., Tollhurst, D. J., Dan, Y., Olshausen, B. A., Gallant, J. L., Rust, N. C., Nov. 2005. Do we know what the early visual system does? Journal of Neuroscience 25 (46), 10577–10597.

Cessac, B., Kornprobst, P., Kraria, S., Nasser, H., Pamplona, D., Portelli, G., Viéville, T., 2017a. PRANAS: a new platform for retinal analysis and simulation. Frontiers in Neuroinformatics. To appear.

Cessac, B., Kornprobst, P., Kraria, S., Nasser, H., Pamplona, D., Portelli, G., Viéville, T., 2017b. PRANAS: a new platform for retinal analysis and simulation. Frontiers in Neuroinformatics. To appear.

Chen, E. Y., Marre, O., Fisher, C., Schwartz, G., Levy, J., da Silviera, R. A., Berry, M. J. I., 2013. Alert response to motion onset in the retina. Journal of Neuroscience 33 (1), 120–132.

Chichilnisky, E. J., 2001. A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst. 12, 199–213.

Chichilnisky, E. J., Kalmar, R. S., Apr. 2002. Functional asymmetries in ON and OFF ganglion cells of primate retina. The Journal of Neuroscience 22 (7), 2737–2747.

Chiu, K., Herf, M., Shirley, P., Swamy, S., Wang, C., Zimmerman, K., 1993. Spatially nonuniform scaling functions for high contrast images. In: Proceedings of Graphics Interface '93. pp. 245–253.

Choudhury, P., Tumblin, J., 2003. The trilateral filter for high contrast images and meshes. In: Proceedings of the 14th Eurographics Workshop on Rendering. EGRW '03. Eurographics Association, Aire-la-Ville, Switzerland, pp. 186–196.

de Groot, S. G., Gebhard, J. W., 1952. Pupil size as determined by adapting luminance. J. Opt. Soc. Am. 42 (7), 492–495.

Debevec, P. E., Malik, J., 1997. Recovering high dynamic range radiance maps from photographs. In: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH '97. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, pp. 369–378.

Doutsi, E., Fillatre, L., Antonini, M., Gaulmin, J., 2015a. Retinal-inspired filtering for dynamic image coding. In: Image Processing (ICIP), 2015 IEEE International Conference on. IEEE, pp. 3505–3509.

Doutsi, E., Fillatre, L., Antonini, M., Gaulmin, J., 2015b. Video analysis and synthesis based on a retinal-inspired frame. In: Signal Processing Conference (EUSIPCO), 2015 23rd European. IEEE, pp. 2226–2230.

Drago, F., Myszkowski, K., Annen, T., Chiba, N., 2003. Adaptive logarithmic mapping for displaying high contrast scenes. Computer Graphics Forum 22 (3), 419–426.

Dunn, F. A., Lankheet, M. J., Rieke, F., 2007. Light adaptation in cone vision involves switching between receptor and post-receptor sites. Nature 449 (7162), 603–606.

Durand, F., Dorsey, J., 2002. Fast bilateral filtering for the display of high-dynamic-range images. In: Akeley, K. (Ed.), Proceedings of the SIGGRAPH. ACM Press, ACM SIGGRAPH, Addison Wesley Longman.

Eilertsen, G., Mantiuk, R. K., Unger, J., 2017. A comparative review of tone-mapping algorithms for high dynamic range video. Computer Graphics Forum 36 (2), 565–592.

Eilertsen, G., Wanat, R., Mantiuk, R., Unger, J., Oct. 2013. Evaluation of tone mapping operators for HDR video. Computer Graphics Forum 32 (7), 275–284.

Fattal, R., Lischinski, D., Werman, M., Jul. 2002. Gradient domain high dynamic range compression. ACM Trans. Graph. 21 (3), 249–256.

Ferradans, S., Bertalmio, M., Provenzi, E., Caselles, V., Oct. 2011. An analysis of visual adaptation and contrast perception for tone mapping. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (10), 2002–2012.

Ferschin, P., Tastl, I., Purgathofer, W., Nov. 1994. A comparison of techniques for the transformation of radiosity values to monitor colors. In: Proceedings of 1st International Conference on Image Processing. Vol. 3. pp. 992–996.

Ferwerda, J. A., Pattanaik, S. N., Shirley, P., Greenberg, D. P., 1996. A model of visual adaptation for realistic image synthesis. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques. ACM, pp. 249–258.

Fournier, J., Monier, C., Pananceau, M., Frégnac, Y., 2011. Adaptation of the simple or complex nature of V1 receptive fields to visual statistics. Nature Neuroscience 14 (8), 1053–1060.

Garvert, M. M., Gollisch, T., 2013. Local and global contrast adaptation in retinal ganglion cells. Neuron 77 (5), 915–928.

Gollisch, T., Meister, M., Jan. 2010. Eye smarter than scientists believed: neural computations in circuits of the retina. Neuron 65 (2), 150–164.

Graf, A. B., Kohn, A., Jazayeri, M., Movshon, J. A., 2011. Decoding the activity of neuronal populations in macaque primary visual cortex. Nature Neuroscience 14 (2), 239–245.

Gryaditskya, Y., Pouli, T., Reinhard, E., Seidel, H.-P., Oct. 2014. Sky based light metering for high dynamic range images. Comput. Graph. Forum 33 (7), 61–69.

Hafner, D., Demetz, O., Weickert, J., 2014. Simultaneous HDR and optic flow computation. In: Pattern Recognition (ICPR), 2014 22nd International Conference on. IEEE, pp. 2065–2070.

Hérault, J., 2010. Vision: Images, Signals and Neural Networks: Models of Neural Processing in Visual Perception. World Scientific.

Hongler, M., de Meneses, Y. L., Beyeler, A., Jacot, J., Sep. 2003. The resonant retina: Exploiting vibration noise to optimally detect edges in an image. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (9), 1051–1062.

Horn, B. K., 1974. Determining lightness from an image. Computer Graphics and Image Processing 3 (4), 277–299.

Irawan, P., Ferwerda, J. A., Marschner, S. R., 2005. Perceptually based tone mapping of high dynamic range image streams. In: Rendering Techniques. pp. 231–242.

Kalantari, N. K., Shechtman, E., Barnes, C., Darabi, S., Goldman, D. B., Sen, P., Nov. 2013. Patch-based high dynamic range video. ACM Trans. Graph. 32 (6), 202:1–202:8.

Kastner, D. B., Baccus, S. A., Apr. 2014. Insights from the retina into the diverse and general computations of adaptation, detection, and prediction. Current Opinion in Neurobiology 25, 63–69.

Kim, K. J., Rieke, F., Jan. 2001. Temporal contrast adaptation in the input and output signals of salamander retinal ganglion cells. Journal of Neuroscience 21 (1), 287–299.

Kim, M. H., Kautz, J., 2008. Consistent tone reproduction. In: Proceedings of the Tenth IASTED International Conference on Computer Graphics and Imaging. CGIM '08. ACTA Press, Anaheim, CA, USA, pp. 152–159.

Kremkow, J., Jin, J., Komban, S. J., Wang, Y., Lashgari, R., Li, X., Jansen, M., Zaidi, Q., Alonso, J.-M., 2014. Neuronal nonlinearity explains greater visual spatial resolution for darks than lights. PNAS 111 (8), 3170–3175.

Kuang, J., Johnson, G. M., Fairchild, M. D., 2007. iCAM06: A refined image appearance model for HDR image rendering. Journal of Visual Communication and Image Representation 18 (5), 406–414. Special issue on High Dynamic Range Imaging.

Kung, J., Yamaguchi, H., Liu, C., Johnson, G., Fairchild, M., Jul. 2007. Evaluating HDR rendering algorithms. ACM Transactions on Applied Perception 4 (2).

Land, E. H., McCann, J. J., 1971. Lightness and retinex theory. JOSA 61 (1), 1–11.

Larson, G. W., Rushmeier, H., Piatko, C., 1997a. A visibility matching tone reproduction operator for high dynamic range scenes. IEEE Transactions on Visualization and Computer Graphics 3 (4), 291–306.

Larson, G. W., Rushmeier, H., Piatko, C., Oct. 1997b. A visibility matching tone reproduction operator for high dynamic range scenes. IEEE Transactions on Visualization and Computer Graphics 3 (4), 291–306.

Li, Y., Sharan, L., Adelson, E. H., Jul. 2005. Compressing and companding high dynamic range images with subband architectures. ACM Trans. Graph. 24 (3), 836–844.

Lorach, H., Benosman, R., Marre, O., Ieng, S.-H., Sahel, J. A., Picaud, S., Oct. 2012. Artificial retina: the multichannel processing of the mammalian retina achieved with a neuromorphic asynchronous light acquisition device. Journal of Neural Engineering 9 (6), 066004.

Mangiat, S., Gibson, J., 2010. High dynamic range video with ghost removal. In: SPIE Optical Engineering + Applications. International Society for Optics and Photonics, pp. 779812–779812.

Mantiuk, R., Daly, S., Kerofsky, L., 2008. Display adaptive tone mapping. In: ACM Transactions on Graphics (TOG). Vol. 27. ACM, p. 68.

Mantiuk, R., Tomaszewska, A., Heidrich, W., 2009. Color correction for tone mapping. In: Computer Graphics Forum. Vol. 28. Wiley Online Library, pp. 193–202.

Martinez-Conde, S., Otero-Millan, J., Macknik, S. L., Feb. 2013. The impact of microsaccades on vision: towards a unified theory of saccadic function. Nature Reviews Neuroscience 14 (2), 83–96.

Masland, R. H., Jun. 2011. Cell populations of the retina: The Proctor lecture. Investigative Ophthalmology and Visual Science 52 (7), 4581–4591.

Masmoudi, K., Antonini, M., Kornprobst, P., 2010. Another look at the retina as an image scalar quantizer. In: Proceedings of the International Symposium on Circuits and Systems (ISCAS).

Masquelier, T., 2012. Relative spike time coding and STDP-based orientation selectivity in the early visual system in natural continuous and saccadic vision: a computational model. Journal of Computational Neuroscience 32 (3), 425–441.

Masquelier, T., Portelli, G., Kornprobst, P., Apr. 2016. Microsaccades enable efficient synchrony-based coding in the retina: a simulation study. Scientific Reports 6, 24086.

Medathati, N. V. K., Neumann, H., Masson, G. S., Kornprobst, P., 2016. Bio-inspired computer vision: Towards a synergistic approach of artificial and biological vision. Computer Vision and Image Understanding.

Meylan, L., Alleysson, D., Süsstrunk, S., 2007. Model of retinal local adaptation for the tone mapping of color filter array images. J. Opt. Soc. Am. A 24 (9), 2807–2816.

Miller, N. J., Ngai, P. Y., Miller, D. D., 1984. The application of computer graphics in lighting design. Journal of the Illuminating Engineering Society 14 (1), 6–26.

Mohemmed, A., Lu, G., Kasabov, N., 2012. Evaluating SPAN incremental learning for handwritten digit recognition. In: Neural Information Processing. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 670–677.

Muchungi, K., Casey, M., Sep. 2012. Simulating light adaptation in the retina with rod-cone coupling. In: ICANN. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 339–346.

O'Carroll, D. C., Warrant, E. J., Feb. 2017. Introduction: Vision in dim light: highlights and challenges. Phil. Trans. R. Soc. B.

Oppenheim, A., Schafer, R., Stockham, T., Sep. 1968. Nonlinear filtering of multiplied and convolved signals. IEEE Transactions on Audio and Electroacoustics 16 (3), 437–466.

Oskarsson, M., 2015. Democratic Tone Mapping Using Optimal K-means Clustering. Springer International Publishing, Cham, pp. 354–365.

Pack, C., Hunter, J., Born, R., 2005. Contrast dependence of suppressive influences in cortical area MT of alert macaque. Journal of Neurophysiology 93 (3), 1809–1815.

Pandarinath, C., Victor, J., Nirenberg, S., 2010. Symmetry breakdown in the ON and OFF pathways of the retina at night: Functional implications. The Journal of Neuroscience 30 (30), 10006–10014.

Pattanaik, S. N., Ferwerda, J. A., Fairchild, M. D., Greenberg, D. P., 1998. A multiscale model of adaptation and spatial vision for realistic image display. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH '98. ACM, New York, NY, USA, pp. 287–298.

Pattanaik, S. N., Tumblin, J., Yee, H., Greenberg, D. P., 2000. Time-dependent visual adaptation for fast realistic image display. In: Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques. ACM Press/Addison-Wesley Publishing Co., pp. 47–54.

Pitkow, X., Meister, M., 2012. Decorrelation and efficient coding by retinal ganglion cells. Nature Neuroscience 15 (4), 628–635.

Purves, D., Augustine, G. J., Fitzpatrick, D., Katz, L. C., LaMantia, A.-S., McNamara, J. O., Williams, S. M., 2001. Neuroscience, 2nd Edition. Sinauer Associates, Inc.

Quiroga, R. Q., Panzeri, S., 2009. Extracting information from neuronal populations: information theory and decoding approaches. Nature Reviews Neuroscience 10 (3), 173–185.

Rahman, Z.-u., Jobson, D. J., Woodell, G. A., 1996. A multiscale retinex for color rendition and dynamic range compression. Tech. rep., NASA Langley.

Ratliff, C. P., Borghuis, B. G., Kao, Y.-H., Sterling, P., Balasubramanian, V., Oct. 2010. Retina is structured to process an excess of darkness in natural scenes. Proc Natl Acad Sci U S A 107 (40), 17368–17373.

Reinhard, E., Nov. 2002. Parameter estimation for photographic tone reproduction. J. Graph. Tools 7 (1), 45–52.

Reinhard, E., Devlin, K., 2005. Dynamic range reduction inspired by photoreceptor physiology. IEEE Transactions on Visualization and Computer Graphics 11 (1), 13–24.

Reinhard, E., Pouli, T., Kunkel, T., Long, B., Ballestad, A., Damberg, G., Nov. 2012. Calibrated image appearance reproduction. ACM Trans. Graph. 31 (6), 201:1–201:11.

Reinhard, E., Stark, M., Shirley, P., Ferwerda, J., Jul. 2002. Photographic tone reproduction for digital images. ACM Trans. Graph. 21 (3), 267–276.

Reinhard, E., Ward, G., Pattanaik, S., Debevec, P., 2005. High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting. The Morgan Kaufmann Series in Computer Graphics. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

Rieke, F., Dec. 2001. Temporal contrast adaptation in salamander bipolar cells. Journal of Neuroscience 21 (23), 9445–9454.

Rieke, F., Rudd, M. E., Dec. 2009. The challenges natural images pose for visual adaptation. Neuron 64 (5), 605–616.

Sanes, J., Masland, R., 2015. The types of retinal ganglion cells: current status and implications for neuronal classification. Annu Rev Neurosci 38, 221–246.

Schlick, C., 1995. Quantization Techniques for Visualization of High Dynamic Range Pictures. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 7–20.

Shapley, R., Enroth-Cugell, C., 1984. Visual adaptation and retinal gain controls. Progress in Retinal Research 3, 263–346.

Shapley, R. M., Victor, J. D., 1978. The effect of contrast on the transfer properties of cat retinal ganglion cells. The Journal of Physiology 285 (1), 275–298.

Souihel, S., Cessac, B., Sep. 2017. Modifying a biologically inspired retina simulator to reconstruct realistic responses to moving stimuli. Bernstein Conference 2017, submitted.

Takeshita, D., Smeds, L., Ala-Laurila, P., Feb. 2017. Processing of single-photon responses in the mammalian ON and OFF retinal pathways at the sensitivity limit of vision. Phil. Trans. R. Soc. B.

Thoreson, W., Mangel, S., 2012. Lateral interactions in the outer retina. Prog Retin Eye Res. 31 (5), 407–441.

Tumblin, J., Rushmeier, H., Nov. 1993. Tone reproduction for realistic images. IEEE Computer Graphics and Applications 13 (6), 42–48.

Van Hateren, J., 2006. Encoding of high dynamic range video with a model of human cones. ACM Transactions on Graphics (TOG) 25 (4), 1380–1399.

Vance, P., Coleman, S. A., Kerr, D., Das, G., McGinnity, T., 2015. Modelling of a retinal ganglion cell with simple spiking models. In: IEEE Int. Jt. Conf. Neural Networks. pp. 1–8.

Victor, J. D., 1987. The dynamics of the cat retinal X cell centre. The Journal of Physiology 386 (1), 219–246.

Ward, G., 1994. Graphics Gems IV. ACM, New York, NY, USA, Ch. A Contrast-based Scalefactor for Luminance Display, pp. 415–421.

Warland, D., Reinagel, P., Meister, M., 1997. Decoding visual information from a population of retinal ganglion cells. J. Neurophysiol. 78, 2336–2350.

Watson, A. B., Yellott, J. I., 2012. A unified formula for light-adapted pupil size. Journal of Vision 12 (10), 12.

Wohrer, A., Kornprobst, P., 2009. Virtual Retina: A biological retina model and simulator, with contrast gain control. Journal of Computational Neuroscience 26 (2), 219. DOI 10.1007/s10827-008-0108-4.

Wohrer, A., Kornprobst, P., Antonini, M., Jun. 2009. Retinal filtering and image reconstruction. Tech. Rep. 6960, INRIA.

Yee, Y. H., Pattanaik, S., 2003. Segmentation and adaptive assimilation for detail-preserving display for high-dynamic range images. Visual Computer 19, 457–466.

Yeganeh, H., Wang, Z., Feb. 2013. Objective quality assessment of tone-mapped images. IEEE Transactions on Image Processing 22 (2), 657–667.

Yi, D., Jiang, P., Mallen, E., Wang, X., Zhu, J., Aug. 2011. Enhancement of image luminance resolution by imposing random jitter. Neural Computing and Applications 20 (2), 261–272.

Zaghloul, K., Boahen, K., Demb, J. B., Apr. 2003. Different circuits for ON and OFF retinal ganglion cells cause different contrast sensitivities. The Journal of Neuroscience 23 (7), 2645–54.

Zhang, X.-S., Gao, S.-B., Li, C.-Y., Li, Y.-J., 2015. A retina inspired model for enhancing visibility of hazy images. Frontiers in Computational Neuroscience 9, 151.

Zhang, X.-S., Li, Y.-J., 2016. A Retina Inspired Model for High Dynamic Range Image Rendering. Springer International Publishing, Cham, pp. 68–79.