Accepted Manuscript

A Bio-inspired Synergistic Virtual Retina Model for Tone Mapping
Marco Benzi, María-José Escobar, Pierre Kornprobst

PII: S1077-3142(17)30204-7
DOI: 10.1016/j.cviu.2017.11.013
Reference: YCVIU 2645

To appear in: Computer Vision and Image Understanding

Received date: 10 February 2017
Revised date: 22 November 2017
Accepted date: 27 November 2017

Please cite this article as: Marco Benzi, María-José Escobar, Pierre Kornprobst, A Bio-inspired Synergistic Virtual Retina Model for Tone Mapping, Computer Vision and Image Understanding (2017), doi: 10.1016/j.cviu.2017.11.013
Highlights

• Tone mapping-like processing is done by the retina through adaptation mechanisms
• We show how a neurophysiological model of the retina can be extended for tone mapping
• Our model considers space and time dynamics, and works for images and videos
• The contrast gain control layer dynamically enhances local brightness and contrast
• This paper provides new insights for designing synergistic computer vision methods
A Bio-inspired Synergistic Virtual Retina Model for Tone Mapping

Marco Benzi (a,b), María-José Escobar (*,a), Pierre Kornprobst (*,b)

a Universidad Técnica Federico Santa María, Departamento de Electrónica, 2390123 Valparaíso, Chile
b UCA, Inria, Biovision team, France
* Co-senior authors
Abstract

Real-world radiance values span several orders of magnitude, which have to be processed by artificial systems in order to capture visual scenes with a high visual sensitivity. Interestingly, it has been found that similar processing happens in biological systems, starting at the retina level. Our motivation in this paper is therefore to develop a new video tone mapping operator (TMO) based on a synergistic model of the retina. We start from the so-called Virtual Retina model, which has been developed in computational neuroscience. We show how to enrich this model with new features to use it as a TMO, such as color management, luminance adaptation at the photoreceptor level and readout from a heterogeneous population activity. Our method works for video but can also be applied to static images (by repeating images in time). It has been carefully evaluated on standard benchmarks in the static case, giving results comparable to the state-of-the-art using default parameters, while offering user control for finer tuning. Results on HDR videos are also promising, specifically w.r.t. temporal luminance coherency. As a whole, this paper shows a promising way to address computational photography challenges by exploiting the current research in neuroscience about retina processing.

Keywords: Tone mapping, HDR, retina, photoreceptor adaptation, contrast gain control, synergistic model
1. Introduction

Real-world radiance values span several orders of magnitude, which have to be processed by artificial systems in order to capture visual scenes with a high visual sensitivity. Think about scenes of twilight, day sunlight, the stars at night or the interiors of a house. To capture these scenes, one needs either cameras capable of capturing so-called high dynamic range (HDR) images, which are expensive, or the method proposed by Debevec and Malik (1997), currently implemented in most standard cameras. The problem is how to visualize these images afterwards, since standard monitors have a low dynamic range (LDR). Two kinds of solutions exist. The first is technical: there are HDR displays, but they are not affordable for the general public yet. The second is algorithmic: there are methods to compress the range of intensities from HDR to LDR. These methods are called tone mapping operators (TMOs) (Reinhard et al., 2005). TMOs have been developed for static scenes and videos rather independently. There has been intensive work on static images (see Kung et al. (2007); Bertalmío (2014) for reviews), with approaches combining luminance adaptation and local contrast enhancement sometimes closely inspired by retinal principles, as in Meylan et al. (2007); Benoit et al. (2009); Ferradans et al. (2011); Muchungi and Casey (2012), just to cite a few. Recent developments concern video tone mapping, where a few approaches have been developed so far (see Eilertsen et al. (2013, 2017) for surveys).
Interestingly, in neuroscience it has been found that a similar tone mapping processing occurs in the retina through adaptation mechanisms. This is crucial since the retina must maintain high contrast sensitivity over the very broad range of luminance found in natural scenes (Rieke and Rudd, 2009; Garvert and Gollisch, 2013; Kastner and Baccus, 2014). Adaptation is both global, through neuromodulatory feedback loops, and local, through adaptive gain control mechanisms, so that retinal networks can be adapted to the whole scene luminance level while maintaining high contrast sensitivity in different regions of the image, despite their considerable differences in luminance (see Shapley and Enroth-Cugell (1984); Thoreson and Mangel (2012) for reviews). Luminance and contrast adaptation occur at different levels, e.g., at the photoreceptor level, where sensitivity is a function of the recent mean intensity, and at the bipolar level, where slow and fast contrast adaptation mechanisms are found. These multiple adaptational mechanisms act together, with lighting conditions dictating which mechanisms dominate.
Thus there is a functional analogy between artificial and biological systems: both target the task of dealing with HDR content. Our motivation is to develop a new TMO based on a synergistic model of the retina. We start from the so-called Virtual Retina simulator (Wohrer and Kornprobst, 2009), which has been developed in neuroscience to model the main layers and cell types found in the primate retina. It is grounded in biology and it has been validated on neurophysiological data. As such, it has been used in several theoretical studies as a retina simulator to make predictions of neural responses (Masquelier, 2012; Basalyga et al., 2013; Vance et al., 2015; Masquelier et al., 2016; Cessac et al., 2017a). Thus this model can be considered a good candidate to build a synergistic model, since it could pave the way for much needed interaction between the neuroscience and computer vision communities, as discussed in the survey by Medathati et al. (2016).
Interestingly, Virtual Retina has also been used to solve artificial vision tasks, such as handwritten recognition (Mohemmed et al., 2012) and image compression (Masmoudi et al., 2010; Doutsi et al., 2015a,b). The reason why it could also be an interesting model for a TMO is that it includes a non-trivial Contrast Gain Control (CGC) mechanism, which not only enhances local contrast in the images but also adds temporal luminance coherence. However, Virtual Retina is missing several important features to address the tone mapping task. It was not designed to deal with color images, let alone HDR images, and there is no photoreceptor adaptation nor readout mechanism to create a single output from the multiple retinal responses given by the different output cell layers. In this paper we address these questions by enriching the Virtual Retina model while keeping its bio-plausibility, and testing it on standard tone mapping images.
This article is organized as follows. In Sec. 2 we describe our bio-inspired synergistic model and compare it to former bio-inspired TMOs. In Sec. 3 our method is evaluated on static images and videos; we show the impact of the different steps and of the main parameters, and we show comparisons with the state-of-the-art. Finally, in Sec. 4 we summarize our main results and discuss future work.

2. Bio-inspired synergistic retina model

2.1. Why Virtual Retina?
In this paper we are interested in investigating how the retina could be a source of inspiration to develop a new TMO. In neuroscience, there is no unique model of the retina but different classes of mathematical models depending on their use. In Medathati et al. (2016), the authors identified three classes of models: (i) linear-nonlinear-Poisson models used to fit single cell recordings (Chichilnisky, 2001; Carandini et al., 2005), (ii) front-end retina-like models for computer vision tasks (Benoit et al., 2010; Hérault, 2010) and (iii) models based on detailed retinal circuitry (Wohrer and Kornprobst, 2009; Lorach et al., 2012). There is no unique model also because the retinal code remains an open challenge in neuroscience. With the advent of new techniques such as multi-electrode arrays, the recording of the simultaneous activity of groups of neurons provides a critical database to unravel the role of specific neural assemblies in spike coding.
There is currently intensive ongoing research activity on the retina, trying to decipher how it encodes the incoming light by understanding its complex circuitry (Gollisch and Meister, 2010). As such, models based on detailed retinal circuitry will directly benefit from this research, and this is why it seems promising to focus on this class of models. Here, we consider the Virtual Retina model (Wohrer and Kornprobst, 2009) as a starting point. Virtual Retina was designed to be a spiking retina model which enables large scale simulations (up to 100,000 neurons) in reasonable processing times while keeping biological plausibility. The underlying model includes a non-separable spatiotemporal linear model of filtering in the Outer Plexiform Layer, a shunting feedback at the level of bipolar cells, and a spike generation process using noisy leaky integrate-and-fire neurons to model RGCs. To be self-contained, this paper recalls the main equations of Virtual Retina but does not detail their justifications; the interested reader should refer to the original paper.
2.2. Model overview

Our TMO has three main stages, shown in Fig. 1:

(i) Pre-processing steps, including a conversion of the HDR-RGB input color image to luminance space, and a calibration of this luminance data so that these values are mapped to the absolute luminance of the real world scene.

(ii) A detailed retina model, including a new photoreceptor adaptation taking into account pupil adaptation, which has been incorporated into the input level of the Virtual Retina model that follows.

(iii) Post-processing, including the definition of a readout given the multiple retinal outputs of Virtual Retina, followed by colorization and gamma correction to finally obtain an LDR-RGB output image.

Each stage is explained in detail in the following section and also illustrated using a representative example to show its impact. The source code is available upon request to the authors via the paper website.1 The website contains sample results for static images and videos, and the download form for the source code.
2.3. Model detailed description
The input is an HDR video defined by (L_red, L_green, L_blue)(x, y, t), where L_red, L_green and L_blue represent the red, green and blue channels of the input HDR video, respectively. Variables (x, y) stand for spatial position and t for time.

1 Paper website: https://team.inria.fr/biovision/tmobio
Figure 1: Tone-mapping operator general pipeline. Block diagram showing the three main stages of our method: pre-processing (color to luminance and HDR calibration), a retina model extending Virtual Retina (global nonlinear adaptation, OPL edge and change detection, CGC dynamical temporal adaptation, positive/negative contrast in RGCs), and post-processing (readout over multiple layers, colorization and gamma correction). The main retinal steps (rods and cones, horizontal cells, outer plexiform layer, bipolar cells, inner plexiform layer, RGCs) are illustrated on the right-hand side to recall that our retina model follows the main circuitry of the primate retina (courtesy Purves et al. (2001)).
2.3.1. Pre-processing: Color-to-luminance and HDR calibration

This first stage is classical. One first converts the HDR color video into an HDR radiance map video following the linear operation:

L_w(x, y, t) = 0.2126 L_red(x, y, t) + 0.7152 L_green(x, y, t) + 0.0722 L_blue(x, y, t).

These weights reflect the way light contributes differently to the retinal response depending on the proportion of each photoreceptor type, with green contributing the most energy and blue the least. Here we use the standard matrix specified by ITU-R BT.709 for the input channel weights (Reinhard et al., 2005), also used in the sRGB color space.2 This is the most common color space used in consumer cameras and devices, but other spaces could be used by operating with different matrices.
Then L_w has to be calibrated. Calibration is related to the linearity between the HDR image and the absolute luminance, regardless of the light conditions. This step is not biologically inspired but compensates for each camera's individual properties: each camera is agnostic w.r.t. the scene acquired and will have its own encoding of the scene, which does not necessarily correspond to the absolute luminance. Thus calibration is an important step, since we do not know a priori how HDR image values differ from the real scene. In our case, it is also necessary since the photoreceptor adaptation model defined below needs absolute luminance values.

Some existing calibration methods rely on using meta-data such as EXIF tags, GPS locations or light probes (Gryaditskya et al., 2014). In this approach we use the heuristic proposed by Reinhard et al. (2002); Reinhard (2002) for calibrating data, which does not need information about the environment where the image was taken. Following Reinhard et al. (2002), the calibration process defines how to obtain an estimation of the real luminance L(x, y, t) from L_w(x, y, t):

L(x, y, t) = \frac{\alpha}{\bar{L}^{\log}_w(t)} L_w(x, y, t),    (1)

2 sRGB color space: https://www.w3.org/Graphics/Color/sRGB
where the incoming data gets scaled by the ratio between the calibration coefficient α and the logarithmic average \bar{L}^{\log}_w(t) defined below. This equation gives a way of setting the tonal range of the output image according to the so-called key of the scene α. The key of the scene is a unitless number which relates to whether the picture is subjectively dark, normal or bright. The underlying idea is that this ratio between α and \bar{L}^{\log}_w(t) will map the incoming luminance so that the calibrated output luminance is correlated with the subjective notion of the brightness of the scene. The parameter α is generally user defined and set to a fixed value (0.18 in most cases), while the logarithmic average \bar{L}^{\log}_w(t) is an approximation of the key of the scene defined by

\bar{L}^{\log}_w(t) = \exp\Big( \frac{1}{N} \sum_{(x,y) \in \Omega} \log\big(\delta + L_w(x, y, t)\big) \Big),    (2)

where N is the number of pixels in the image space Ω and δ is a small value in order to avoid numerical errors coming from zero-valued pixels.

Then equation (1) can be further improved by letting the key of the scene be automatically adjusted depending on the statistics of each image. Following Reinhard (2002), it can be obtained as follows:

\alpha(t) = 0.18 \cdot 4^{f(t)},    (3)

with

f(t) = \frac{2\log_2\big(\bar{L}^{x,y,1\%}_w(t)\big) - \log_2\big(L^{\max,x,y,1\%}_w(t)\big) - \log_2\big(L^{\min,x,y,1\%}_w(t)\big)}{\log_2\big(L^{\max,x,y,1\%}_w(t)\big) - \log_2\big(L^{\min,x,y,1\%}_w(t)\big)},    (4)
where \bar{L}^{x,y,1\%}_w(t), L^{\min,x,y,1\%}_w(t), and L^{\max,x,y,1\%}_w(t) denote the average, minimum, and maximum values of the incoming data, respectively, over the spatial domain, excluding 1% of the lightest and darkest pixels, statistically trimming values coming from overexposure in the camera or from noise. This is very important since an image can have a distorted luminance distribution due to, e.g., overexposure. Note that in (3) the original value 0.18 is kept but is now modulated by 4^{f(t)}, which takes into consideration the statistical features of the image through equation (4). The value 4 was empirically determined by the author (Reinhard, 2002).
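For illustration, the whole pre-processing stage (luminance conversion plus Eqs. (1)-(4)) can be sketched in a few lines of NumPy. This is only a minimal sketch: the function names and the exact handling of the 1% trimming are our own choices, not the authors' released implementation.

```python
import numpy as np

def rgb_to_radiance(rgb):
    """ITU-R BT.709 luminance weights: linear RGB -> radiance map L_w."""
    return 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]

def calibrate_frame(L_w, delta=1e-6, trim=0.01):
    """Estimate the absolute luminance L from L_w, following Eqs. (1)-(4)."""
    # Eq. (2): logarithmic average of the incoming luminance
    log_avg = np.exp(np.mean(np.log(delta + L_w)))

    # Eqs. (3)-(4): automatic key of the scene, discarding the 1% darkest
    # and 1% brightest pixels before taking log2 of average, min and max
    vals = np.sort(L_w.ravel())
    kept = vals[int(trim * vals.size): vals.size - int(trim * vals.size)]
    lmin = np.log2(delta + kept.min())
    lmax = np.log2(delta + kept.max())
    lavg = np.log2(delta + kept.mean())
    f = (2.0 * lavg - lmax - lmin) / (lmax - lmin)
    alpha = 0.18 * 4.0 ** f

    # Eq. (1): scale the radiance map by alpha over the log-average
    return (alpha / log_avg) * L_w

# Usage on a synthetic "HDR" frame spanning several orders of magnitude
frame = np.random.lognormal(mean=0.0, sigma=3.0, size=(64, 64, 3))
L = calibrate_frame(rgb_to_radiance(frame))
```

For video, these statistics are recomputed per frame, which is why Eqs. (1)-(4) carry an explicit dependence on t.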
2.3.2. Retina model

In this section, we present the different steps of the retina model in functional rather than biological terms, which would be less informative for our readers. For that purpose, we illustrate in Fig. 2 the effect of the retinal steps in the TMO pipeline, and also provide a DRIM metric (Aydin et al., 2008) for each stage in order to allow a better appreciation of the cumulative changes compared to the original HDR image. We refer the interested readers to the original paper describing Virtual Retina (Wohrer and Kornprobst, 2009) to understand the biological foundations of the model. We also try to find analogies with previous works, not necessarily bio-inspired methods, to show that there are common ingredients found in artificial and biological models, even if their implementations differ (see also Sec. 2.4).

Photoreceptor adaptation: Global non-linear adaptation. Photoreceptors adjust their sensitivity over time and space according to the luminosity present in the visual scene, providing a first global adaptation to the incoming light (Dunn et al., 2007). This stage was not present in the original Virtual Retina model, which was not designed to process HDR images.
Here we consider the model by Dunn et al. (2007), where the authors proposed to fit empirical data coming from primate retinal cones. Their model describes photoreceptor activity as a function of the background and average luminance. Using their results, we show how the luminance received by each photoreceptor can be mapped depending on the input luminance L(x, y, t), following the nonlinear equation:

h(x, y, t; L) = \left( 1 + \left( \frac{l_{1/2}(\bar{L}(t))}{L(x, y, t)} \right)^{n} \right)^{-1},    (5)

where n is a constant which can be set according to experimental data to n = 0.7 ± 0.05, \bar{L}(t) is the instantaneous spatial average of the incoming luminance defined by \bar{L}(t) = \int_{x,y} L(x, y, t)\, dx\, dy, and l_{1/2} is a nonlinear function. Technical details to derive relation (5) are given in Appendix A (see also Fig. A.12 for an illustration of the function l_{1/2}(\cdot)). Results about this stage are presented in Sec. 3.1.

Figure 2: Illustration of the "Retina model" steps from the TMO pipeline presented in Fig. 1 (pre- and post-processing are indicated by yellow and red colored circles respectively). Input image is tree mask by Industrial Light and Magic. In the first step the photoreceptor adaptation provides a compression of dynamic range. Then the OPL layer provides sharpening of the image. After that, the CGC layer improves the overall lighting, reducing or amplifying the contrast locally. Finally, the readout is done by merging the two images describing positive and negative contrasts, corresponding to ON and OFF ganglion cells. Note that for each step, except for the contrast images, images have been colorized and gamma corrected to facilitate comparisons between the different steps. These results were obtained using the model parameters presented in Table 1, except for λA = 10000 [Hz] and σA = 1.2 [deg]. For each step the respective DRIM (Aydin et al., 2008) is presented, showing in green loss of contrast, in blue amplification of contrast and in red reversal of contrast against the input HDR image as reference. Remark that the ganglion cell layer does not affect the image contrast, but only the global brightness.
Our photoreceptor model can be related to the TMO proposed by Van Hateren (2006), based on a dynamic cone model. However, in our method dynamics is introduced later (namely at the CGC level), so that the photoreceptor adaptation behaves as a static nonlinear global operator.
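As an illustration, a minimal NumPy sketch of Eq. (5) could look as follows. The function name is ours and, since the exact form of l_{1/2} is only given in Appendix A, the sketch falls back to the identity as an explicitly labeled placeholder.

```python
import numpy as np

def photoreceptor_adaptation(L, n=0.5, l_half=None):
    """Global nonlinear adaptation of Eq. (5), mapping luminance into ]0, 1[.
    l_half stands for the nonlinear function l_{1/2} applied to the spatial
    mean luminance; its exact form is given in the paper's Appendix A, so the
    identity is used here as a placeholder assumption."""
    L_bar = L.mean()                                   # instantaneous spatial average
    half_sat = L_bar if l_half is None else l_half(L_bar)
    return 1.0 / (1.0 + (half_sat / np.maximum(L, 1e-12)) ** n)

# Usage: compress a calibrated luminance map spanning several decades
L = np.random.lognormal(mean=0.0, sigma=3.0, size=(64, 64))
h = photoreceptor_adaptation(L, n=0.5)                 # values now lie in ]0, 1[
```

The default n = 0.5 used above corresponds to the value retained in Sec. 3.1, slightly below the experimental fit n = 0.7 ± 0.05.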
Outer plexiform layer (OPL): Edge and change detection. This stage performs edge detection and temporal change detection given the light information already adapted by the photoreceptors (h(x, y, t; L)). In retinal circuitry, this is performed at the OPL level. OPL designates the first synaptic layer in the retina, making the connection between photoreceptors, horizontal cells and bipolar cells. The output of this stage is the current I_{OPL}. Its band-pass behavior is obtained by center-surround interactions, which can be written:

I_{OPL}(x, y, t) = \lambda_{OPL} \big( C(x, y, t) - w_{OPL}\, S(x, y, t) \big),    (6)

where C(x, y, t) represents the center excitatory signal, accounting for the photoreceptor adaptation and photo-transduction, and S(x, y, t) is the surround inhibitory signal, which models the horizontal cell response. They are defined as follows:

C(x, y, t) = G_{\sigma_C} \overset{x,y}{*} T_{w_U, \tau_U} \overset{t}{*} E_{n_C, \tau_C} \overset{t}{*} h(x, y, t; L),    (7)

S(x, y, t) = G_{\sigma_S} \overset{x,y}{*} E_{\tau_S} \overset{t}{*} C(x, y, t),    (8)

where * denotes a convolution that can be either spatial (over x, y) or temporal (over t), G_\sigma is the standard Gaussian kernel, T_{w,\tau} is a partially high-pass temporal filter, E_\tau is an exponential filter modeling low-pass temporal filtering, and E_{n,\tau} is an exponential cascade (Gamma filter).3 The balance between the center and surround contributions is given by w_{OPL}, which is normally close to 1. \lambda_{OPL} defines the overall gain of this stage. Parameters \sigma_C, n_C, \tau_C, w_U, \tau_U, \sigma_S and \tau_S are other constant parameters.

Equation (6) with (7)-(8) results in a spatiotemporal non-separable filter. It extends the classical, purely spatial model of the retina as a difference of Gaussians (DoG). As such, I_{OPL} is a real-valued function corresponding to the presence of contrasts in the scene, either a spatial contrast (with sign indicating whether there is a bright spot over a dark background or the opposite) or a change in time (with sign also indicating the nature of the temporal transition). This real-valued function will lead to the two rectified components defining ON and OFF cells in the ganglion cell layer (see below). This is a classical model in neuroscience, so that all its parameters are well understood and can be set to default values given by biological constraints. Results about this stage are presented in Sec. 3.2.

3 Expressions are: E_\tau(t) = \exp(-t/\tau)/\tau and E_{n,\tau}(t) = (nt)^n \exp(-nt/\tau) / \big((n-1)!\, \tau^{n+1}\big).
Note that this DoG-like operation done in the OPL layer, in conjunction with the dynamic range compression performed by the photoreceptor adaptation, generates an operator which can be qualitatively compared to the bilateral filter TMO proposed by Durand and Dorsey (2002), in the sense that there is an implicit estimation of edges.
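The following sketch illustrates the center-surround computation of Eq. (6) in a strongly simplified, spatial-only form: the temporal filters of Eqs. (7)-(8) are dropped, so it degenerates to a plain difference of Gaussians. The function name, the pixel-unit standard deviations (the paper uses 5 pixels per visual degree) and the unit gain are our own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def opl_center_surround(h, sigma_c=0.15, sigma_s=0.5, w_opl=0.55, lambda_opl=1.0):
    """Spatial-only version of Eqs. (6)-(8): the temporal filters T_{w_U,tau_U},
    E_{n_C,tau_C} and E_{tau_S} are omitted. Sigmas are in pixels and roughly
    correspond to the degree values of Table 1 at 5 pixels per degree; the gain
    lambda_opl is left generic here."""
    center = gaussian_filter(h, sigma_c)                # C(x, y): excitatory center
    surround = gaussian_filter(center, sigma_s)         # S(x, y): inhibitory surround (Eq. (8))
    return lambda_opl * (center - w_opl * surround)     # Eq. (6)

# Usage: signed band-pass response on a toy adapted image with a bright square
h = np.zeros((64, 64)); h[24:40, 24:40] = 0.8
I_opl = opl_center_surround(h)                          # positive/negative contrast map
```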
Contrast Gain Control layer (CGC): Dynamical temporal adaptation. Contrast gain control, or contrast adaptation, is the usual term to describe the influence of the local contrast of the scene on the transfer properties of the retina (Shapley and Victor, 1978; Victor, 1987; Kim and Rieke, 2001; Rieke, 2001; Baccus and Meister, 2002). In Virtual Retina, an original model was proposed for fast contrast gain control, an intrinsically dynamic feature of the system, which has a strong and constant influence on the shape of retinal responses, and very likely on our percepts. It is an intrinsically nonlinear and dynamical effect: it modulates the gain of the transmission depending on the local contrast.

In Virtual Retina, given the input current I_{OPL}(x, y, t), the membrane potential of bipolar cells, denoted by V_{Bip}, evolves according to:

\frac{dV_{Bip}}{dt}(x, y, t) = I_{OPL}(x, y, t) - g_A(x, y, t)\, V_{Bip}(x, y, t),    (9)

with

g_A(x, y, t) = \Big( G_{\sigma_A} \overset{x,y}{*} E_{\tau_A} \overset{t}{*} Q(V_{Bip}) \Big)(x, y, t) \quad \text{and} \quad Q(\nu) = g^0_A + \lambda_A \nu^2,

where \sigma_A, \tau_A, g^0_A and \lambda_A are constant parameters. This model has been rigorously evaluated by comparisons with real cell recordings using specific stimuli.

So, in practice this stage translates into contrast equalization, which we see as an image enhancing capability and a source of temporal coherence for videos. Figure 2 illustrates this effect as the increase of brightness in some areas. As in the former stage, all its parameters can be set to default values that fit biological constraints. Results about this stage are presented in Sec. 3.3.
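As a rough illustration of how Eq. (9) can be integrated in practice, here is a minimal NumPy/SciPy sketch using an explicit Euler scheme. The recursive low-pass standing in for E_{τ_A}, the function name, the pixel-unit σ_A and the value of g_A^0 are our own simplifications rather than the released implementation; dt, τ_A and λ_A follow Table 1.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def cgc_step(V_bip, g_A, I_opl, dt=0.005, sigma_a=1.0, tau_a=0.0005,
             g_a0=5.0, lambda_a=100.0):
    """One forward-Euler step of the contrast gain control dynamics of Eq. (9).
    The temporal exponential filter E_{tau_A} is approximated by a first-order
    recursive low-pass on the feedback conductance g_A; sigma_a is in pixels
    (Table 1 expresses sigma_A in visual degrees) and g_a0 is only indicative."""
    Q = g_a0 + lambda_a * V_bip ** 2                   # quadratic nonlinearity Q(V)
    drive = gaussian_filter(Q, sigma_a)                # spatial pooling G_{sigma_A}
    a = dt / (tau_a + dt)
    g_A = (1.0 - a) * g_A + a * drive                  # temporal low-pass (E_{tau_A})
    V_bip = V_bip + dt * (I_opl - g_A * V_bip)         # Eq. (9), explicit Euler
    return V_bip, g_A

# Usage: iterate to steady state on a static (repeated) frame, cf. Sec. 3
I_opl = np.random.randn(64, 64)
V_bip, g_A = np.zeros_like(I_opl), np.zeros_like(I_opl)
for _ in range(30):
    V_bip, g_A = cgc_step(V_bip, g_A, I_opl)
```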
Ganglion cells layer. In the last layer of the retina, namely the ganglion cell layer, two main processing stages happen. The first consists in transforming the potential of bipolar cells into an electric current. The second consists in transforming this current into spikes, which is how information is ultimately encoded by the retina to be transmitted to the visual cortex through the optic nerve. In this paper we will only consider the first stage, since the spike-based encoding will not be exploited.

From a more functional perspective, it is in this last stage that the response of ganglion cells can be interpreted in terms of features in the observed scene. There are about 20 ganglion cell types reporting different features to the brain (Masland, 2011). In this paper we focus on two major classical types, namely ON and OFF cells, which respond to positive and negative spatiotemporal contrasts. As explained above, the emergence of contrast sensitivity comes from the Outer Plexiform Layer, where center-surround processing occurs (see Eq. (6)).

Following the model proposed in Virtual Retina, ON and OFF ganglion cell currents (denoted respectively by I^{ON}_{Gang}(x, y, t) and I^{OFF}_{Gang}(x, y, t)) are obtained by applying a nonlinear rectification to the bipolar potential to keep either the positive or the negative part. This can be written as follows:

I^{ON/OFF}_{Gang}(x, y, t) = N\big(\varepsilon\, V_{Bip}(x, y, t)\big),    (10)

with

N(V) = \begin{cases} \dfrac{i^0_G}{1 - \lambda_G (V - v^0_G)/i^0_G} & \text{if } V < v^0_G, \\ i^0_G + \lambda_G (V - v^0_G) & \text{if } V > v^0_G, \end{cases}    (11)

where \varepsilon defines the output polarity (\varepsilon = 1 for ON cells and \varepsilon = -1 for OFF cells), i.e., changing the \varepsilon parameter in (10), one defines the two ganglion cell currents I^{ON}_{Gang}(x, y, t) and I^{OFF}_{Gang}(x, y, t), describing respectively positive and negative contrasts. The function N(V) is a nonlinear function which rectifies I^{ON/OFF}_{Gang}(x, y, t) using a smooth curve. Such rectification is a very common feature in neural modeling and in retinal models. It reflects static nonlinearities observed experimentally in the retina. The parameter v^0_G is the "linearity threshold" between the two operation modes, while \lambda_G is the slope of the rectification in the linear part. Note that (10) is a simplification of the original model, in which additional convolutions were added to fit biological data.4

Results about this stage are presented in Sec. 3.4, where we show the influence of the parameter \lambda_G.

4 In the original model, there were additional spatial and temporal convolutions (see Eq. (14) from Wohrer and Kornprobst (2009)). These terms were useful to fit biological data, but in our context they were introducing some blur, so we decided to discard them.
Remark 1. Are ON an OFF circuits symmetric? In the current model
240
and following Wohrer and Kornprobst (2009), ON and OFF circuits are sym-
241
metric in the sense that they were obtained by simply changing the polarity ε.
242
However, this is a simplified assumption with respect to real retina properties.
243
Indeed, there are physiological and mechanistic asymmetries in retinal ON and
244
OFF circuits (Chichilnisky and Kalmar, 2002; Zaghloul et al., 2003; Balasubra-
AC
CE
PT
238
245
manian and Sterling, 2009; Kremkow et al., 2014). For example, across species,
246
it has been observed that OFF cells (compared to ON cells) are more numerous, 4 In
Eq.
the original model, there were additional spatial and temporal convolutions (see
(14) from Wohrer and Kornprobst (2009)). These terms were useful to fit biological
data but in our context they were introducing some blur so that we decided to discard them.
15
ACCEPTED MANUSCRIPT
sample at higher resolution with denser dendritic arbor and respond to contrast
248
with strong rectification. Differences between the two circuits have been also ob-
249
served depending on the overall light condition, e.g., photopic or scotopic condi-
250
tions (Pandarinath et al., 2010; O’Carroll and Warrant, 2017; Takeshita et al.,
251
2017). Interestingly, this asymmetry could arise from the asymmetries between
252
dark versus bright regions in natural images (Balasubramanian and Sterling,
253
2009; Ratliff et al., 2010). For example, in Ratliff et al. (2010) the authors con-
254
sidered the distribution of local contrast in 100 natural images and suggest that
255
because of the skew observed in that distribution (natural scenes contain more
256
negative than positive contrast; see also ON and OFF ganglion cells response
257
given in Fig. 2), the system should devote more contrast levels to encode the
258
negative contrasts for example (e.g., OFF channels should be more rectified).
259
All these results suggest that an interesting avenue for further study is to model
260
explicitly these asymmetries to better account for natural image statistics (see
261
discussion in Sec. 4). This is why we introduced the ON and OFF circuits here
262
although there role in the present study remains limited (slight impact on global
263
luminance) and we refer to Sec. 3.4 where we show a preliminary result where
264
some asymmetry is introduced between ON and OFF circuits.
ED
M
AN US
CR IP T
247
Readout population activity. Given ON and OFF responses, one needs to define
PT
how to readout an activity based on different population responses. Here we chose a simple way to combine the activity generated by both the ON and OFF
CE
circuits:
ON OFF Lout (x, y, t) = IGang (x, y, t) − IGang (x, y, t),
(12)
where Lout (x, y, t) represents the output radiance map which can now be col-
266
orized, normalized, quantized and gamma corrected for the output LDR file, as
AC
265
267
described in final step below. Colorization and Gamma correction. Up to now, our method operates on the luminance map of the input image. Once the luminance map has been corrected, it is necessary to colorize it. To do this we apply the method proposed
16
ACCEPTED MANUSCRIPT
by Mantiuk et al. (2009) which involves linear interpolation between chromatic input information ({Li (x, y, t)}i=red, green, blue ), the respective achromatic input information (L(x, y, t)) and the achromatic output information Lout (x, y, t)). To
CR IP T
obtain the output in linear space Olin,i , each chromatic channel is colorized as follows: Olin,i (x, y, t) = 268
Li (x, y, t) − 1 s + 1 Lout (x, y, t), L(x, y, t)
(13)
where s is the color saturation factor, in our implementation set by default to 1. The model also considers a gamma correction step. Luminance values com-
AN US
ing from a HDR image are usually in a linear scale, which does not map well
with the color spaces of computer monitors mostly having non linear spaces. Moreover these response curves are usually described by an power law, so we need to apply the inverse operation to keep the desired appearance correctly on screen. For each channel chromatic channel, a standard gamma correction is applied:
ED
M
0 1/γ Oi (x, y, t) = b255 · Olin,i + 0.5c 255
1/γ
255 · Olin,i ≤ 0, 1/γ
0 < 255 · Olin,i ≤ 255,
(14)
otherwise,
where the choice of the gamma factor normally depends on the screen, (default:
270
γ = 2.2). The factor of 255 present in the equation is related to the maximum
271
possible value that can be stored in a common LDR image format like PNG or
272
JPEG with a resolution of 8 bits per pixel.
273
2.4. Relation with previous work
CE
PT
269
Given the nature of our model, in this section we first discuss a selection of
275
models inspired by visual system properties and architecture. We found a range
276
of models from psychophysical inspiration (e.g., based on fitting psychophysical
277
curves for light adaptation) to more bio-plausible models of the visual system
278
(e.g., mathematical model of the retinal circuitry).
AC
274
17
ACCEPTED MANUSCRIPT
One of the earliest examples, Rahman et al. (1996) developed an TMO based
280
on the retinex algorithm. This yields a local non-linear operator that relies on
281
the work of Land and McCann (1971), who suggested that both the eye and
282
the brain are involved in color processing, particularly in the color constancy
283
effect. The TMO proposed by Rahman et al. (1996) process each color channel
284
independently and finally merged using a color restoration method proposed by
285
the author. Their method has two modes of operation: single and multi-scale.
286
At single scale each color channel is convolved with Gaussian kernels of different
287
sizes. Each pixel (center) is combined with the blurred image (surround) in a
288
nonlinear manner defining thus different center/surround interactions at differ-
289
ent spatial scales. At multi-scale the resulting center/surround interactions are
290
weighted and summed obtaining the equalized output image. The selection of
291
the weights is based on a heuristic considering either small or large surrounds.
AN US
CR IP T
279
In Reinhard and Devlin (2005), the authors developed a TMO inspired by the
293
adaptation mechanism present in photoreceptors. This TMO describes the au-
294
tomatic adjustment to the general level of brightness that color-photoreceptors
295
(cones) have. The core of their algorithm is the adaptation of a photorecep-
296
tor model, which is mainly implemented as a variation from the Naka-Rushton
297
equation5 . Their model lies on a global adaptation of the image fitting the
298
Naka-Rushton parameters in a global (from average scene luminance) or local
299
(from color average luminance) manner. This method does not allow luminance
300
coherence over time because of its static formulation (each frame is processed
301
independently). In our case, photoreceptor adaptation is done frame by frame,
CE
PT
ED
M
292
302
and luminance coherence is obtained by the temporal filtering in the CGC layer. A front-end retina-like model for computer vision tasks was proposed by (Benoit
303
et al., 2010; H´erault, 2010) and applied for tone mapping images and video (Benoit
AC
304
305
et al., 2009). They modeled the retina as two adaptation layers. The first layer
306
is related to photoreceptor adaptation and the second one to ganglion cells. Be5 Sigmoid
function which computes the response of the photoreceptor based on the incoming
brightness and the semi-saturation constant
18
ACCEPTED MANUSCRIPT
tween the two main layers lies a non-separable spatio-temporal filter modeling
308
the synaptic Outer Plexiform Layer (OPL), which summarizes the interaction
309
between photoreceptors and horizontal cells of the retina. For the photoreceptor
310
adaptation the authors propose a non-linear local process in which each photore-
311
ceptor response depends on the incoming luminance inside a local neighborhood.
312
The OPL layer is then used as spectral whitening and contrast enhancement. In
313
a similar manner, the ganglion cell layer reuses the nonlinear mechanism used
314
for the photoreceptor adaptation with different parameters. Specifically, the
315
authors consider a smaller neighborhood around each pixel to compute the aver-
316
age luminosity. Color processing is done through a multiplexing/demultiplexing
317
scheme based on the Bayer color sampling, normally found in cameras. Regard-
318
ing HDR video, the authors use the temporal aspect of the OPL spatio-temporal
319
filter to provide temporal coherence through the sequence. This model shares
320
some features with Virtual Retina model (Wohrer and Kornprobst, 2009),
321
for instance, the modeling of the OPL as a non-separable spatio-temporal fil-
322
ter. Benoit et al. (2009) directly connects the OPL output to the nonlinearity
323
proposed in the ganglion cell layer, while in Virtual Retina OPL output gets
324
into a Contrast-Gain Control stage before the ganglion cell layer.
ED
M
AN US
CR IP T
307
Finally, even though is not properly a TMO, in Zhang et al. (2015), the
326
authors proposed a method for de-hazing static images that relies on a thor-
327
ough retina model, from photoreceptors to retinal ganglion cells. In this model,
328
photoreceptors, bipolar, amacrine and ganglion cells are modeled as a cascade
329
of Gaussian and difference of Gaussian filters through several signal pathways,
CE
PT
325
representing the intricate relationships between colors and global versus local
331
adaptation. We mention this work here because our model shares the model-
332
ing of those biological layers, except for amacrine cells. Also, their model does
AC
330
333
not account for any dynamical Contrast Gain Control which is an important
334
processing stage in our method.
335
Compared to general video tone mapping approaches, it is worth mention-
336
ing that our method does not need optical flow estimation, as in Aydin et al.
337
(2014); Hafner et al. (2014); Mangiat and Gibson (2010) where the authors use 19
ACCEPTED MANUSCRIPT
it stabilize scenes with small motion, with the limitation of producing artifacts
339
in scenes with larger displacements (Kalantari et al., 2013). Other video TMOs
340
apply transforms that change over time but applied uniformly in space (Ferw-
341
erda et al., 1996; Pattanaik et al., 2000; Irawan et al., 2005; Van Hateren, 2006;
342
Mantiuk et al., 2008; Reinhard et al., 2012). They perform well in terms of
343
temporal coherence but can produce a loss in detail, whereas our model take
344
into account local contrast information in space, which better preserves spatial
345
details.
346
3. Results and comparisons
AN US
CR IP T
338
347
In this section we evaluate and compare our bio-inspired synergistic TMO
348
on both static images and videos. Remark that if the input is a static image, it
349
is considered as a video by simply repeating that image other time. Unless specified otherwise, all model parameters used to process HDR images
351
are set as given in Table 1. It is important to emphasize that a majority of these
352
parameters have been set thanks to biological constraints. Note that since all
353
gaussian standard deviation parameters were defined in visual angle degrees in
354
Virtual Retina, we used a conversion of five pixels per visual angle degree,
355
or [p.p.d.]. This value was kept constant for different image sizes. Concerning
356
temporal constants, we assume that videos are at 30fps and between each frames
357
we do six iterations of the CGC equation (9). For a static input image, we
358
let the iterative process converge towards a final image (in practice, about five
359
repetitions of the frame, i.e., 30 iterations). We used a time step of dt = 0.5 [ms].
CE
PT
ED
M
350
360
Sections 3.1 to 3.5 deal with static images. We will show the influence of
361
some parameters in each step to show their impact and benchmark our method
363
with respect to the state-of-the-art. Section 3.6 gives an example of result on a
364
real video sequence. We show how the temporal convolutions provide temporal
365
coherence in the output.
AC 362
20
ACCEPTED MANUSCRIPT
Table 1: Default parameters used in our implementation. Note that all parameters and variables presented in the Virtual Retina model are expressed in reduced units and not physical units (see justification in Sec. 2.1.2 from Wohrer and Kornprobst (2009)). As a consequence, for example, the dimension of currents is expressed in Hertz and the overall gain
τC
0.01 [s]
nC
Value −1 52000 P s 2
τU
0.1 [s]
wU
0.8 −1 10 Hz lum
σS
0.2 [deg]
τS
0.01 [s]
0.55
dt
0.005 [s]
0.2 [s]
λA
100 [Hz]
σA
0.2 [deg]
τA
0.0005 [s]
0 gA
i0G
80 [Hz]
λG
n
0.5
λOPL tf
γ
366
Parameter i1/2
wOPL
2.2
Parameter
5 [Hz]
100 [Hz]
Value
σC
0.03 [deg]
AN US
Parameter
Value
CR IP T
of the center-surround filter λOPL is expressed in Hertz per unit of luminance.
0 vG
0
s
1
3.1. Photoreceptor adaptation: Global non-linear adaptation
Figure 3 shows the effect of photoreceptor adaptation. In Fig.3(a) we show
368
how an input radiance map L(x, y, .) (converted to an LDR image using a linear
369
mapping) is transformed into h(x, y, .; L) image defined in (5). This is done by
370
a global non-linear operator effectively compressing the whole incoming data
371
range to the interval ]0, 1[, step necessary for the Virtual Retina model to
372
work properly. The behavior of this nonlinearity amplifies the cases where the
373
luminance is higher compared to the mean spatial luminance of the scene, non-
374
linearly computed through l1/2 (·), i.e., overexposed regions. In Fig. 3(b) we
PT
ED
M
367
show the absolute difference between h(·) and L(·)6 . Note that the main differ-
376
ences between the two images are located in the overexposed regions present in
377
the stained glass windows.
AC
CE 375
378
A comparison between different values of the exponential parameter n in
379
Eq. (5) is shown in Fig. 4. In the experimental fit of Dunn et al. (2007) this 6 Both
images were normalized between [0,1] to compute the absolute difference between
them
21
ACCEPTED MANUSCRIPT
constant was set to n = 0.7 ± 0.05. One can change n to increase or decrease
381
the effect of photoreceptor adaptation. From our tests, we found that n = 0.5
382
gives consistently good results.
383
3.2. Outer plexiform layer (OPL): Edge and change detection
CR IP T
380
In Fig. 5 we analyze the impact of the parameters from Eq. (6) defining the
385
center-surround interaction. First we vary the relative weight value of wOPL
386
(experimentally estimated near 1 for mammal retinas, and in general in the
387
range of [0, 1]). High values of wOPL , e.g., 0.9 excessively enhances borders spe-
388
cially when the size of the surround, given by the σS parameter, increases. So,
389
we have experimentally set the parameter wOPL with a default value of 0.55,
390
which performs consistently well in our tests. Then the size of the surround
391
produces a high impact on the resulting IOPL layer. In Fig. 5 we show that
392
increasing the value of σs induces artifacts for high values of wOPL specially
393
in borders with a high difference of luminosity, mainly because saturation in
394
the IOPL (x, y) value. The effect of σS additionally enhances details on regions
395
with low contrast when there is no saturation (low values of wOPL ). The de-
396
fault value in our implementation is σS = 0.1 [deg], approximately the modeled
397
value for primate retinas. This value shows a good compromise between edge
398
enhancement and image distortion, while being biologically accurate.
399
3.3. Contrast Gain Control layer (CGC): Dynamical temporal adaptation
PT
ED
M
AN US
384
In Fig. 6 we show the effect of Contrast Gain Control (CGC) varying pa-
401
rameters. CGC layer enhances dynamically local brightness and local contrast
402
in the image. According to (9), there are two parameters that directly impact
403
the value of VBip (x, y, t): λA , the overall gain of the feedback loop and σA , the
AC
CE
400
404
standard deviation of the spatial gaussian blur. In Fig. 6 we focus on these two
405
parameters. Similarly to what was reported by Wohrer and Kornprobst (2009),
406
the gain in the feedback loop given by λA translates into more intense differences
407
with respect to the IOPL input signal, obtaining as a result brighter images. The
408
value of λA = 1 [Hz] yields little differences regardless of the choice of σA . The
22
CR IP T
ACCEPTED MANUSCRIPT
Relative luminance
Figure 3:
(b)
AN US
(a)
Effect of global non-linear photoreceptor adaptation (a) Effect of the
non-linear operator h(·) defined in (5) over the input luminance image L. The image obtained after the photoreceptor adaptation is represented in h(·). (b) Difference of the normalized signals |h − L|.
n = 0.5
n=1
CE
PT
ED
M
n = 0.25
AC
Figure 4:
Effect of the parameter n in the photoreceptor adaptation (5). This
parameter has an effect in terms of compression of the incoming luminance map. When n is low, compression is exaggerated resulting in a arguably distorted image. When n is close to one, function h(x, y, t; L) behaves as a quasi-linear operator.
23
S
= 0.01
S
= 0.1
=1
S
= 10
AN US
wOP L = 0.2
S
CR IP T
ACCEPTED MANUSCRIPT
CE
PT
wOP L = 0.9
ED
M
wOP L = 0.55
Figure 5:
Resulting IIOPL for different values of σS and wOPL (6). The relative
weight wOPL highlights edges in the input image according to the local average described by
AC
the surround signal S(x, y). This becomes more prominent with a large values of σS .
24
ACCEPTED MANUSCRIPT
effect of σA is evident when the value of λA increases, and in this case the
410
gaussian blur produced by σA generates local enhancement specially observed
411
in overexposed regions such as the stained glass windows. After this analysis
412
we decided to set the default values of these parameters to λA = 104 [Hz] and
413
σA = 1.2 [Hz].
CR IP T
409
In Fig. 7 we illustrate how the dynamics of CGC layer work. We denote by
414
415
ni VBip
416
equation (9). We show how the CGC layer progressively enhances details in
417
ni time. In Fig. 7(a) we show the image of VBip for ni = 0, 10 and 30 iterations,
418
which can be considered as the steady-state. Convergence of the differential
419
equation to the steady state is illustrated in Fig. 7(c) showing the relative differ-
420
ence between two successive iterations. Stronger variations are observed during
421
the first 10 iterations. Numerically, we chose tf = dt · 20 = 0.2 [s], as the de-
422
fault stop time for CGC dynamics over an static image. The effect of the CGC
423
is more noticeable for the windows of the altar where most of the light of the
424
scene comes from (see zoom). This is also visible from Fig. 7(b) where we show
425
ni at initial state (ni = 0) and at convergence (here the difference between VBip
426
ni = 0). CGC compensates the contrast locally, thus allowing for details to be
427
enhanced, as the edges of the stained glass. Also, the additional compression
428
in dynamic range translates into an overall brighter image, shown on the walls
429
around the windows.
430
3.4. Ganglion cells layer
PT
ED
M
AN US
the value of the map VBip after ni iterations of the discretized differential
In Fig. 8 we study the influence of parameter λG in the ganglion cells layer
432
(12). This parameter represents the slope of the rectification in the linear part.
433
In Fig. 8(a) we show results for different values of λG . Note that in or-
AC
CE 431
434
der to enhance the effect of this stage, we pushed the CGC parameters to be
435
σA = 0.2 [deg] and λA = 102 [Hz]. High values of λG (106 [Hz]) distort the out-
436
put image with an excessive brightness, while low values of λG (∈ [100 , 104 ] [Hz]
437
approx.) improve the overall brightness without overexposing the scene. Fol-
438
lowing our motivation to keep the TMO biologically plausible, the value chosen 25
ACCEPTED MANUSCRIPT
A
= 0.2
= 102
A
= 104
CR IP T
= 0.02
A
=2
CE
PT
A
ED
M
A
= 100
AN US
A
Figure 6:
Effect of CGC parameters on VBip . This figure summarizes the effect of λA
AC
and σA parameters on the Contrast Gain Control layer behavior. The gain of the loop, given by λA , translates into an improved overall brightness in the image, while an increase of the standard deviation σA value enhances details. The parameter values chosen as default in this model are shown in Table 1.
26
ACCEPTED MANUSCRIPT
10 VBip
30 VBip
30 |VBip
0 VBip |
(a) ni +1 |VBip
ni VBip |
(b)
CE
PT
ED
M
ni |VBip |
AN US
CR IP T
0 VBip
Figure 7:
(c)
Number of iterations
ni
Effect of CGC in local contrast enhancement. Comparison between the n
AC
i input and output of the CGC stage for different iterations number. (a) Evolution of VBip
after ni = {0, 10, 30} iterations (note that resulting image was colorized and gamma corrected for display). (b) A colorized map of the differences between the iteration 30 and the initial state at iteration 0. (c) The mean squared error computed between two consecutive iterations n
i of VBip . After approximately 15 iterations the changes on the scene are not noticeable.
27
ACCEPTED MANUSCRIPT
439
as default for this gain is λG = 100 (see Wohrer et al. (2009), Midget cells). In Fig. 8(b) we propose a preliminary result to illustrate the effect of in-
441
troducing asymmetries between ON and OFF circuits (see Remark 1 in the
442
paragraph Ganglion cell of Sec. 2.3.2). This is done by choosing different values
443
of slope of the sigmoid function (i.e., λG ) to obtain the ON and OFF responses.
444
Namely, we choose a fixed value to estimate the ON part (λON G = 100) while we
445
F vary the value for the OFF part (λOFF ). Increasing λOF has a directly proG G
446
portional effect in the compression rate over the channel, thus diminishing the
447
contribution to the output luminance map. This effect can also be understood
448
as the effective reduction of global contrast in the image. G
=1
G
= 102
G
= 104
G
= 106
ED
M
(a)
AN US
CR IP T
440
OFF / ON G G
=1
OFF / ON G G
= 10
OFF / ON G G
= 100
AC
CE
(b)
= 0.1
PT
OFF / ON G G
Figure 8:
Effect of λG on the output of the Ganglion cells layer Lout (x, y). (a)
Testing different values of λG . High values of λG (> 106 [Hz]) saturate image brightness: see
distortions in the ceiling and the stained glass windows. Lower values of λG , on the contrary, improve the global perception of the scene. (b) Introducing asymmetry between ON and OFF
OFF is changed. For circuits. Different values of λG are used for each circuit: λON G is fixed; λG
bigger values of λOFF there is a clear loss of global contrast across the image. G
28
ACCEPTED MANUSCRIPT
449
3.5. Comparison with other methods In Fig. 9 we visually compared our results to linear compression, logarithmic
451
compression, Retinex model (Rahman et al., 1996), histogram adjustment (Lar-
452
son et al., 1997a), bilateral filter-based approach (Durand and Dorsey, 2002),
453
photo-receptor model (Reinhard and Devlin, 2005) and bio-inspired spatio-
454
temporal model (Benoit et al., 2009). We used implementations provided in Rein-
455
hard et al. (2005). These were run under a Linux machine using our C++
456
implementation. The TMO proposed in this paper was implemented using a
457
default set of parameters, which can be found in Table 1. We used four HDR
458
images in total, all of which are already properly calibrated. Every tone mapped
459
image was obtained without intervention of the default parameters proposed in
460
the respective models so that a fair comparison can be done. Results show that
461
our method gives promising results in general for the four images selected. It
462
manages to compress the luminance range without overexposure. For Bristol
463
Bridge, a typical twilight outdoors scene, our method allows to see both the
464
woods and the sky. In Clock Building, both the edifice details and the outdoors
465
lighting are preserved. In Memorial the inside of the altar is recovered while
466
conserving the details from the stained glass. Finally in Waffle House both
467
the indoors of the building and the car can be seen without too much specular
468
lighting noise, which can be seen over the roof of the building.
ED
M
AN US
CR IP T
450
Note that we carefully verified that methods were tested in the same con-
470
ditions regarding input calibration and gamma correction to obtain the final
471
output. This is important in order to provide a fair comparison ground for
CE
PT
469
every TMO. Concerning the calibration step (see Sec. 2.3.1) this step is not a
473
priori necessary since all the images are already calibrated. For these images
474
the alpha factor (i.e., the key of the scene) is close to a logarithmic average of
AC
472
475
the input luminosity. As a consequence, it is not important to know if other ap-
476
proaches include or not this calibration step. Regarding gamma correction, for
477
all the TMOs evaluated we checked that the output image was passed through
478
a gamma correction layer (γ = 2.2) before being evaluated. If this was not in
479
the original code, we added it. 29
ACCEPTED MANUSCRIPT
Clock Memorial Building
Linear
AN US
Logarithmic
Retina Rahman et al. (1996)
M
Histogram adjustment Larson et al. (1997)
Bilateral filter-based Durand and Dorsey (2002)
Waffle House
CR IP T
Bristol Bridge
ED
Photoreceptor adaption Reinhard and Devlin (2005)
CE
PT
Bio-inspired Benoit et al. (2009)
Our Method
AC
Figure 9:
Qualitative comparison between different TMOs and our method (see
text for references). Bristol Bridge and Clock Building images are copyright of Greg Ward. Memorial image is copyright of Paul Debevec. Waffle House is copyright of Mark D. Fairchild.
30
ACCEPTED MANUSCRIPT
In order to obtain quantitative comparisons with other TMOs, we computed
481
the TMQI (Yeganeh and Wang, 2013) for 29 different TMOs over a big subset of
482
the HDR Photographic Survey7 , by Mark Fairchild (a total of 101 images). For
483
each TMO we represent the distribution of the TMQI obtained for all the images
484
in the dataset using violin plots (see Fig. 10(a)). Note that violin plots were cho-
485
sen because they bring more information than standard box-plots. For example,
486
they can detect bimodal behaviors while the classical box-plots miss this infor-
487
mation. We can observe the variety of performances obtained for the different
488
TMOs evaluated, and that the method here proposed has a comparable perfor-
489
mance with the best methods reported in the literature. The methods evaluated,
490
from top to bottom are: Linear, Logarithmic (see Reinhard et al. (2005)), Op-
491
penheim et al. (1968); Horn (1974); Miller et al. (1984); Chiu et al. (1993);
492
Tumblin and Rushmeier (1993); Ferschin et al. (1994); Ward (1994); Schlick
493
(1995); Ferwerda et al. (1996); Rahman et al. (1996); Larson et al. (1997b); Pat-
494
tanaik et al. (1998, 2000); Ashikhmin (2002); Durand and Dorsey (2002); Fattal
495
et al. (2002); Reinhard et al. (2002); Choudhury and Tumblin (2003); Drago
496
et al. (2003); Yee and Pattanaik (2003); Li et al. (2005); Reinhard and Devlin
497
(2005); Kuang et al. (2007); Kim and Kautz (2008); Benoit et al. (2010); Oskars-
498
son (2015); Zhang and Li (2016), and our method. The Zhang and Li (2016)
499
implementation was done by us, the Kim-Kautz consistent TMO was made
500
available by Banterle et al. (2011), Li method was provided by Yuanzhen Li,
501
the Democratic TMO implementation was done by Magnus Oskarsson, Benoit
502
TMO code was retrieved from the OpenCV 2.0 source package and the rest of
CE
PT
ED
M
AN US
CR IP T
480
503
TMO implementations were provided by Reinhard et al. (2005). Additionally, in Figures 10(b-c) we show a more precise comparison with the
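To make the quantitative comparison pipeline concrete, the sketch below shows how per-image TMQI scores can be assembled into violin plots like those of Fig. 10(a). The scores used here are random placeholders, not the paper's results; in the paper the scores come from the TMQI implementation of Yeganeh and Wang (2013) applied to 101 images of the Fairchild dataset, and the TMO names listed are a subset chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# One TMQI score per (TMO, image) pair, one violin per TMO.
rng = np.random.default_rng(0)
tmo_names = ["Linear", "Logarithmic", "Our method"]          # illustrative subset
scores = [np.clip(rng.normal(mu, 0.08, size=101), 0.0, 1.0)   # 101 placeholder scores per TMO
          for mu in (0.70, 0.78, 0.85)]                        # synthetic means, not real results

fig, ax = plt.subplots(figsize=(6, 3))
ax.violinplot(scores, showmedians=True)                        # violins expose bimodal behaviors
ax.set_xticks(range(1, len(tmo_names) + 1))
ax.set_xticklabels(tmo_names)
ax.set_ylabel("TMQI")
plt.tight_layout()
plt.savefig("tmqi_violins.png", dpi=150)
```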
505
two competing bio-inspired approaches reported in Fig. 10(a), namely Benoit
AC
504
506
et al. (2009) and Zhang and Li (2016). Since TMOs yield high variability
507
in the results depending on the input, it is of interest to evaluate how our
508
method performs against them on each sample input individually. For both 7 Mark
Fairchild dataset website: http://rit-mcsl.org/fairchild/HDR.html
31
Figure 10: Quantitative comparison between different TMOs and our method. (a) Comparison of the TMQI metric obtained by our method against the 29 other TMOs (see text for the exact references). (b) Detailed comparison between the TMQI values obtained by Benoit et al. (2009) and our method; we observe no correlation (r = −0.18, p = 0.071). (c) Detailed comparison between the TMQI values obtained by Zhang and Li (2016) and our method; we observe a correlated response (r = 0.33, p < 0.05). (b)-(c) The orange diagonal line shows the region of equal performance; points below this line correspond to images where our method obtained a better TMQI value.
For both approaches we compare the TMQI value for each image of the Fairchild dataset. In Fig. 10(b) we compare our method with Benoit et al. (2009). The methods seem to be complementary: there are images where our method obtained a TMQI value between 0.7 and 0.8 while Benoit et al. (2009) obtained a score over 0.8; conversely, for images where the TMQI value of Benoit et al. (2009) did not exceed 0.8, our method generated a TMQI value in the range 0.8–1.0. Overall, our method gives a better score in 59 out of 101 images. The regression applied to this data (r = −0.15, p = 0.13) confirms the complementary performance, showing no correlation between the two. Similarly, in Fig. 10(c) we compare our method with Zhang and Li (2016). Overall, our method gives a better score in 56 out of 101 images. In this case, the regression applied to the data suggests a slight correlation between both TMQI values (r = 0.33, p < 0.05).
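The per-image comparison in Fig. 10(b)-(c) amounts to a paired analysis of two TMQI score vectors; as an illustration (assuming the two aligned lists of per-image scores are already computed), the win count and the reported correlation can be obtained as:

    # Paired comparison of two TMOs over the same images: number of images where
    # ours scores higher, plus Pearson correlation as reported for Fig. 10(b)-(c).
    from scipy.stats import pearsonr

    def compare(tmqi_ours, tmqi_other):
        wins = sum(a > b for a, b in zip(tmqi_ours, tmqi_other))
        r, p = pearsonr(tmqi_ours, tmqi_other)  # low |r| suggests complementary methods
        return wins, r, p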
3.6. Result videos
Common artifacts in video TMOs are brightness flickering and ghosting produced by temporal filtering. We tested our method on the two HDR videos from the Grzegorz Krawczyk dataset.8 Results are shown in Fig. 11. Videos with the resulting radiance maps can be found on the paper website.

8 Grzegorz Krawczyk dataset website: http://resources.mpi-inf.mpg.de/hdr/video/

Our model generates satisfying results regarding radiance maps, particularly when the car driver faces direct sunlight, generating overexposure in the camera. Brightness flickering is reduced, as shown by the evolution of the temporal average (compare temporal variations in I_OPL and V_Bip). However, we found that this time coherence is lost if the video frames are then colorized and gamma corrected, bringing back some chromatic flickering. This is a limitation of our approach coming from the colorization method we chose (Mantiuk et al., 2009), since flickers in the input are transmitted to the output.

Another interesting observation from Fig. 11 concerns the temporal effect of the CGC layer. We show the mean log luminance of each frame at different levels of our architecture: the input luminance map (L), the output of the OPL (I_OPL)
and the output of the CGC stage (V_Bip). The input sequence has fast changes in the overall luminance (flickering) which are still present at the OPL stage. These temporal variations are smoothed in the next stage thanks to the CGC. This effect is best seen in the videos shown on the paper website.
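The temporal-coherence curves in Fig. 11 come from a very simple measurement, the spatial average of the log luminance of each frame at each stage; a sketch of it (with our own variable names) is:

    # Mean log luminance of each frame, used to visualize brightness flicker
    # at the input (L), OPL and CGC stages.
    import numpy as np

    def mean_log_luminance(frames, eps=1e-6):
        # frames: iterable of 2D arrays (one luminance map per frame)
        return np.array([np.mean(np.log10(np.maximum(f, eps))) for f in frames])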
We found in our model that a critical parameter in terms of temporal coherence is τA, as seen in the gA(x, y, t) component of Eq. (9). As explained in Wohrer and Kornprobst (2009), this constant relates to the time scale of the adaptation and cannot be fixed from experimental data, because of the hypothetical nature of the stage. Since this time constant defines an exponential kernel, and thus a low-pass temporal filter, it is inversely proportional to the cut-off frequency of that filter. We found through tests that the best value is τA = 0.0005 [s], for which temporal coherence is improved compared with the case without temporal convolutions. On the other hand, a larger value of τA leads to a strong ghosting effect, because the tight bandwidth of the filter prevents the natural changes in luminance across time per pixel, in contrast to the luminance flickering we want to avoid.
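To make the role of τA concrete, a minimal sketch of such an exponential (first-order low-pass) temporal filter is given below; the frame rate, the discretization and the function names are illustrative assumptions of ours, not the actual Virtual Retina implementation, and the cut-off frequency of the equivalent continuous filter is fc = 1/(2π τA):

    # First-order exponential (low-pass) temporal filter controlled by tau_a.
    # A larger tau_a gives a tighter bandwidth: flicker is smoothed more
    # aggressively, but natural luminance changes are also suppressed (ghosting).
    import numpy as np

    def exp_smooth(frames, tau_a=0.0005, fps=25.0):
        dt = 1.0 / fps
        alpha = dt / (tau_a + dt)           # discrete gain of the exponential kernel
        out, state = [], None
        for f in frames:
            state = f if state is None else state + alpha * (f - state)
            out.append(state.copy())
        return out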
4. Conclusion
In this paper we show a promising way to address computational photography challenges by exploiting the current research in neuroscience about retina processing. Models based on properties of the visual system, and more specifically on detailed retinal circuitry, will certainly benefit from the research in neuroscience. As such, choosing Virtual Retina appears to be a promising idea since our goal is to pursue its development towards a better understanding of retinal processing (Cessac et al., 2017b). Indeed, one of our goals is to systematically investigate how new mechanisms found in the retina could be interpreted and be useful in the context of artificial vision problems. This gives this work a high potential for further improvements.

For static images our TMO works as expected, effectively compressing the dynamic range to fit a standard 24-bit LDR image and generating an equalized image.
Figure 11: Results on the two videos from Grzegorz Krawczyk's dataset: (a) Highway video and (b) Tunnel video. For each case we show sample frames (nf = 0, 200, 500) at different levels of our architecture: the input luminance map (L), the output of the OPL (I_OPL) and the output of the CGC stage (V_Bip). On the right-hand side, we show the time evolution of the corresponding log spatial average of each frame. Results show how temporal coherence is improved at the CGC level.
In the case of videos, the TMO proposed here also works properly on the luminance information, with the CGC layer being an important step to guarantee the temporal coherence of the video. In videos, one of the limitations of the TMO presented here is that the current colorization method generates color flickering on the output. This limitation arises because the original Virtual Retina model is based on luminance maps only, i.e., it is monochrome. A workaround for this problem could be either to include color processing in the retina model or to improve the colorization method in the post-processing stage.
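For reference, the colorization applied as post-processing follows the usual luminance-ratio scheme; the sketch below is a simplified form of this idea (the saturation exponent s and the function name are illustrative, and this is not the exact formulation of Mantiuk et al. (2009)). It also makes explicit why flicker in the luminance ratios propagates to the colors:

    # Luminance-ratio colorization: reinject the input chromaticity into the
    # tone-mapped luminance. Any flicker in the per-frame luminance ratios
    # is carried over to the color channels.
    import numpy as np

    def colorize(rgb_in, lum_in, lum_out, s=0.6, eps=1e-6):
        # rgb_in: HxWx3 linear input; lum_in/lum_out: HxW input and tone-mapped luminance
        ratio = (rgb_in / (lum_in[..., None] + eps)) ** s   # per-channel chromatic ratio
        return np.clip(ratio * lum_out[..., None], 0.0, 1.0)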
The model presented here contains a series of parameters related to the different stages of global and local adaptation of the input luminance image. We have observed in Section 3 the effect of different parameters on the global and local contrast adaptation. Interestingly, the parameters related to the I_OPL(x, y, t) computation, which set the balance between center and surround, define the extent of local adaptation. Areas with low contrast look brighter if the size of the surround neighborhood increases. This type of computation shows similarities with observations in neurophysiological data of sensory systems, where the system integrates information inside a wider neighborhood if the signal-to-noise ratio is small, or, more drastically, the shape of the surround varies according to the input information (Pack et al., 2005; Fournier et al., 2011). This behavior suggests that an adaptive selection of the surround size and shape, both at the OPL and CGC levels, could highly benefit the quality of the TMO.
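To fix ideas about the role of the surround size, a schematic center-surround computation can be written as a difference of Gaussian blurs whose surround width is the free parameter discussed above; this is only an illustrative stand-in, not the actual spatio-temporal OPL filters of Virtual Retina:

    # Schematic center-surround (OPL-like) stage: the larger the surround blur,
    # the wider the neighborhood over which local adaptation is computed.
    from scipy.ndimage import gaussian_filter

    def center_surround(lum, sigma_center=1.0, sigma_surround=6.0, w_surround=0.9):
        center = gaussian_filter(lum, sigma_center)
        surround = gaussian_filter(lum, sigma_surround)
        return center - w_surround * surround   # crude stand-in for I_OPL(x, y)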
We see three challenging perspectives for this work. The first perspective is to include more cell types to have a richer representation of the input and to know how to exploit them. It has been shown that the retina contains a wide variety of ganglion cell types simultaneously encoding features of the visual world (Gollisch and Meister, 2010; Sanes and Masland, 2015; Masland, 2011; Lorach et al., 2012). A natural extension of the model proposed here could be to increase the types of retinal ganglion cells in order to expand its computational capabilities. This extension should also be joined with the proposition of efficient readouts to obtain a single retina output translated into an output image. The readout of neural activity is related to a decoding mechanism, which is an open research topic in the neuroscience community (Quiroga and Panzeri, 2009; Warland et al., 1997; Pitkow and Meister, 2012; Graf et al., 2011). The second perspective is to further investigate other nonlinear mechanisms in the retina such as Berry et al. (1999); Chen et al. (2013); Souihel and Cessac (2017). These mechanisms have been shown to account for temporal processing such as anticipation, and we plan to study how they would impact the quality of tone-mapped videos. The third perspective is to include fixational eye movements for the static image scenario (Martinez-Conde et al., 2013). Indeed, our eyes are never at rest. They are continuously moving even in fixation tasks. These eye movements, and in particular the microsaccades, are known to enhance visual perception. This idea has been explored in the context of edge detection (Hongler et al., 2003), superresolution (Yi et al., 2011) and coding (Masquelier et al., 2016). We think it is interesting to investigate their effect also for tone mapping. Finally, the quantitative comparison through TMQI values with other bio-inspired methods suggested complementary behaviors, as we observed in Fig. 10(b)-(c). Analyzing in detail the complementary contributions of each computation stage, with the aim of merging the better aspects of different TMOs into a single one, could be an interesting track to explore. In particular, the color de/multiplexing scheme proposed by Benoit et al. (2010) is of special interest for us to deal with videos and might reduce the flickering observed with our current colorization approach.
Appendix A. Photoreceptor adaptation and pupil model
Photoreceptors adjust their sensitivity over time and space, providing a first adaptation step for the incoming light. The Virtual Retina model proposed by Wohrer and Kornprobst (2009), originally designed to process LDR images, does not take this adaptation into consideration at the input stage, so there is a need to provide it without interfering with the mechanisms of the rest of the model. On the other hand, Dunn et al. (2007) showed that photoreceptor adaptation is mutually exclusive from the adaptation occurring further into the retina, since the latter happens as the signal goes from the cone bipolar cells to the ganglion cell layer. The authors also provided a response model for cones which, in light of the previous statement, fits well with our model.

Using electrophysiological data recorded from L-type cone responses, Dunn et al. (2007) constructed an empirical model which relates the normalized response of a cone h(·) to the background intensity i_B cast over it, given by the following relation:

    h(i_B) = \frac{1}{1 + \left(\frac{i_{1/2}}{i_B}\right)^n},    (A.1)

where i_{1/2} is the half-maximal background intensity, estimated at 45·10^3 ± 7·10^3 P s^{-1} on their data, and n is the Hill exponent describing the relationship between response amplitude and background intensity i_B, estimated at n = 0.7 ± 0.05.

The model proposed in (A.1) is expressed in absorbed photons per cone per second (P s^{-1}), and to easily use it in our model we need to convert it into cd m^{-2}. The following formula is used to convert between these units9:

    i = f(l) = 10\pi \cdot l \cdot \rho_{pupil}^2,    (A.2)

where ρ_pupil is the radius of the pupil in millimeters and l is the luminance to convert.

There are several models relating the radius of the pupil ρ_pupil to luminance, covering different complexities (Watson and Yellott, 2012). For the sake of simplicity, we chose the de Groot and Gebhard (1952) model defined as:

    \rho_{pupil}(\bar{L}) = 3.5875 \exp\left(-0.00092\,(7.597 + \log \bar{L})^3\right),    (A.3)

which relates the average incoming spatial luminance \bar{L}(t) = \int_{x,y} L(x, y, t)\,dx\,dy to the radius of the pupil ρ_pupil in millimeters. This model comes from an experimental fit which does not consider additional factors such as age, contraction/dilation or adapting field size.

9 http://retina.anatomy.upenn.edu/~rob/lance/units_photometric.html
Thus, finally, the response of the photoreceptors h(·) can now be expressed in terms of the background luminance l_B and the average luminance \bar{L} as follows:

    h(l_B, \bar{L}) = \frac{1}{1 + \left(\frac{i_{1/2}}{10\pi \cdot l_B \cdot \rho_{pupil}^2(\bar{L})}\right)^n}.    (A.4)

Furthermore, since i_{1/2} is constant, we define a new expression l_{1/2}(\bar{L}):

    h(l_B, \bar{L}) = \frac{1}{1 + \left(\frac{l_{1/2}(\bar{L})}{l_B}\right)^n},    (A.5)

    l_{1/2}(\bar{L}) = \frac{i_{1/2}}{10\pi \cdot \rho_{pupil}^2(\bar{L})},    (A.6)

where l_{1/2} represents the half-maximal background luminosity for the fitted data, now in cd m^{-2}, and l_B the background luminance. Figure A.12 shows how l_{1/2} depends on the average luminance \bar{L}.
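A direct transcription of Eqs. (A.3), (A.5) and (A.6) can be sketched as follows (the function names are ours, log is taken here as base 10, and the constants are those quoted in the text); this is an illustrative sketch, not the code of the model:

    # Photoreceptor adaptation from Eqs. (A.3), (A.5), (A.6): pupil radius from the
    # global mean luminance, then the half-maximal luminance and the Hill response.
    import numpy as np

    I_HALF = 45e3   # half-maximal background intensity [photons / cone / s]
    N_HILL = 0.7    # Hill exponent

    def pupil_radius(l_mean):
        # de Groot and Gebhard (1952) fit, Eq. (A.3); radius in millimeters
        return 3.5875 * np.exp(-0.00092 * (7.597 + np.log10(l_mean)) ** 3)

    def half_max_luminance(l_mean):
        # Eq. (A.6): convert i_1/2 from photons/cone/s to cd/m^2
        return I_HALF / (10.0 * np.pi * pupil_radius(l_mean) ** 2)

    def cone_response(l_b, l_mean):
        # Eq. (A.5): normalized photoreceptor response to background luminance l_b
        return 1.0 / (1.0 + (half_max_luminance(l_mean) / l_b) ** N_HILL)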
Figure A.12: Characteristic curve of l_{1/2}(\bar{L}): plot of Equation (A.6) as a function of the incoming average luminance \bar{L} (l_{1/2} on a logarithmic axis from 10^2 to 10^6, average global luminance \bar{L} from 0 to 10000). For each image l_{1/2}(\bar{L}) behaves as a constant in Eq. (A.5), yet over a video sequence the non-linear property of the curve helps to smooth luminance transients between frames.
Acknowledgements

Financial support: CONICYT-FONDECYT 1140403, CONICYT-Basal Project FB0008, Inria. The authors are very grateful to Adrien Bousseau for the fruitful discussions and his valuable feedback on several preliminary versions of this paper.
References

Ashikhmin, M., 2002. A tone mapping algorithm for high contrast images. In: Proceedings of the 13th Eurographics Workshop on Rendering. EGRW '02. Eurographics Association, Aire-la-Ville, Switzerland, pp. 145–156.
Aydin, T. O., Mantiuk, R., Myszkowski, K., Seidel, H.-P., Aug. 2008. Dynamic range independent image quality assessment. ACM Transactions on Graphics (TOG) 27 (3).
Aydin, T. O., Stefanoski, N., Croci, S., Gross, M., Smolic, A., 2014. Temporally coherent local tone mapping of HDR video. ACM Transactions on Graphics (TOG) 33 (6), 196.
Baccus, S., Meister, M., 2002. Fast and slow contrast adaptation in retinal circuitry. Neuron 36 (5), 909–919.
Balasubramanian, V., Sterling, P., 2009. Receptive fields and functional architecture in the retina. The Journal of Physiology 587 (12), 2753–2767.
Banterle, F., Artusi, A., Debattista, K., Chalmers, A., 2011. Advanced High Dynamic Range Imaging: Theory and Practice. AK Peters (CRC Press), Natick, MA, USA.
Basalyga, G., Montemurro, M. A., Wennekers, T., 2013. Information coding in a laminar computational model of cat primary visual cortex. J. Comput. Neurosci. 34, 273–83.
Benoit, A., Alleysson, D., Hérault, J., Le Callet, P., 2009. Spatio-temporal tone mapping operator based on a retina model. In: Computational Color Imaging Workshop.
Benoit, A., Caplier, A., Durette, B., Herault, J., 2010. Using human visual system modeling for bio-inspired low level image processing. Computer Vision and Image Understanding 114 (7), 758–773.
Berry, M., Brivanlou, I., Jordan, T., Meister, M., 1999. Anticipation of moving stimuli by the retina. Nature 398 (6725), 334–338.
Bertalmío, M., 2014. Image Processing for Cinema. CRC Press.
Carandini, M., Demb, J. B., Mante, V., Tollhurst, D. J., Dan, Y., Olshausen, B. A., Gallant, J. L., Rust, N. C., Nov. 2005. Do we know what the early visual system does? Journal of Neuroscience 25 (46), 10577–10597.
Cessac, B., Kornprobst, P., Kraria, S., Nasser, H., Pamplona, D., Portelli, G., Viéville, T., 2017a. PRANAS: a new platform for retinal analysis and simulation. Frontiers in Neuroinformatics. To appear.
Cessac, B., Kornprobst, P., Kraria, S., Nasser, H., Pamplona, D., Portelli, G., Viéville, T., 2017b. PRANAS: a new platform for retinal analysis and simulation. Frontiers in Neuroinformatics. To appear.
Chen, E. Y., Marre, O., Fisher, C., Schwartz, G., Levy, J., da Silviera, R. A., Berry, M. J. I., 2013. Alert response to motion onset in the retina. Journal of Neuroscience 33 (1), 120–132.
Chichilnisky, E. J., 2001. A simple white noise analysis of neuronal light responses. Network: Comput. Neural Syst. 12, 199–213.
Chichilnisky, E. J., Kalmar, R. S., Apr. 2002. Functional asymmetries in ON and OFF ganglion cells of primate retina. The Journal of Neuroscience 22 (7), 2737–2747.
Chiu, K., Herf, M., Shirley, P., Swamy, S., Wang, C., Zimmerman, K., 1993. Spatially nonuniform scaling functions for high contrast images. In: Proceedings of Graphics Interface 93. pp. 245–253.
Choudhury, P., Tumblin, J., 2003. The trilateral filter for high contrast images and meshes. In: Proceedings of the 14th Eurographics Workshop on Rendering. EGRW '03. Eurographics Association, Aire-la-Ville, Switzerland, pp. 186–196.
de Groot, S. G., Gebhard, J. W., 1952. Pupil size as determined by adapting luminance. J. Opt. Soc. Am. 42 (7), 492–495.
Debevec, P. E., Malik, J., 1997. Recovering high dynamic range radiance maps from photographs. In: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH '97. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, pp. 369–378.
Doutsi, E., Fillatre, L., Antonini, M., Gaulmin, J., 2015a. Retinal-inspired filtering for dynamic image coding. In: Image Processing (ICIP), 2015 IEEE International Conference on. IEEE, pp. 3505–3509.
Doutsi, E., Fillatre, L., Antonini, M., Gaulmin, J., 2015b. Video analysis and synthesis based on a retinal-inspired frame. In: Signal Processing Conference (EUSIPCO), 2015 23rd European. IEEE, pp. 2226–2230.
Drago, F., Myszkowski, K., Annen, T., Chiba, N., 2003. Adaptive logarithmic mapping for displaying high contrast scenes. Computer Graphics Forum 22 (3), 419–426.
Dunn, F. A., Lankheet, M. J., Rieke, F., 2007. Light adaptation in cone vision involves switching between receptor and post-receptor sites. Nature 449 (7162), 603–606.
Durand, F., Dorsey, J., 2002. Fast bilateral filtering for the display of high-dynamic-range images. In: Akeley, K. (Ed.), Proceedings of the SIGGRAPH. ACM Press, ACM SIGGRAPH, Addison Wesley Longman.
Eilertsen, G., Mantiuk, R. K., Unger, J., 2017. A comparative review of tone-mapping algorithms for high dynamic range video. Computer Graphics Forum 36 (2), 565–592.
Eilertsen, G., Wanat, R., Mantiuk, R., Unger, J., Oct. 2013. Evaluation of tone mapping operators for HDR video. Computer Graphics Forum 32 (7), 275–284.
Fattal, R., Lischinski, D., Werman, M., Jul. 2002. Gradient domain high dynamic range compression. ACM Trans. Graph. 21 (3), 249–256.
Ferradans, S., Bertalmio, M., Provenzi, E., Caselles, V., Oct. 2011. An analysis of visual adaptation and contrast perception for tone mapping. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (10), 2002–2012.
Ferschin, P., Tastl, I., Purgathofer, W., Nov 1994. A comparison of techniques for the transformation of radiosity values to monitor colors. In: Proceedings of 1st International Conference on Image Processing. Vol. 3. pp. 992–996.
Ferwerda, J. A., Pattanaik, S. N., Shirley, P., Greenberg, D. P., 1996. A model of visual adaptation for realistic image synthesis. In: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. ACM, pp. 249–258.
Fournier, J., Monier, C., Pananceau, M., Frégnac, Y., 2011. Adaptation of the simple or complex nature of V1 receptive fields to visual statistics. Nature Neuroscience 14 (8), 1053–1060.
Garvert, M. M., Gollisch, T., 2013. Local and global contrast adaptation in retinal ganglion cells. Neuron 77 (5), 915–928.
Gollisch, T., Meister, M., Jan. 2010. Eye smarter than scientists believed: neural computations in circuits of the retina. Neuron 65 (2), 150–164.
Graf, A. B., Kohn, A., Jazayeri, M., Movshon, J. A., 2011. Decoding the activity of neuronal populations in macaque primary visual cortex. Nature Neuroscience 14 (2), 239–245.
Gryaditskya, Y., Pouli, T., Reinhard, E., Seidel, H.-P., Oct. 2014. Sky based light metering for high dynamic range images. Comput. Graph. Forum 33 (7), 61–69.
Hafner, D., Demetz, O., Weickert, J., 2014. Simultaneous HDR and optic flow computation. In: Pattern Recognition (ICPR), 2014 22nd International Conference on. IEEE, pp. 2065–2070.
Hérault, J., 2010. Vision: Images, Signals and Neural Networks: Models of Neural Processing in Visual Perception. World Scientific.
Hongler, M., de Meneses, Y. L., Beyeler, A., Jacot, J., Sep. 2003. The resonant retina: Exploiting vibration noise to optimally detect edges in an image. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (9), 1051–1062.
Horn, B. K., 1974. Determining lightness from an image. Computer Graphics and Image Processing 3 (4), 277–299.
Irawan, P., Ferwerda, J. A., Marschner, S. R., 2005. Perceptually based tone mapping of high dynamic range image streams. In: Rendering Techniques. pp. 231–242.
Kalantari, N. K., Shechtman, E., Barnes, C., Darabi, S., Goldman, D. B., Sen, P., Nov. 2013. Patch-based high dynamic range video. ACM Trans. Graph. 32 (6), 202:1–202:8.
Kastner, D. B., Baccus, S. A., Apr. 2014. Insights from the retina into the diverse and general computations of adaptation, detection, and prediction. Current Opinion in Neurobiology 25, 63–69.
Kim, K. J., Rieke, F., Jan. 2001. Temporal contrast adaptation in the input and output signals of salamander retinal ganglion cells. Journal of Neuroscience 21 (1), 287–299.
Kim, M. H., Kautz, J., 2008. Consistent tone reproduction. In: Proceedings of the Tenth IASTED International Conference on Computer Graphics and Imaging. CGIM '08. ACTA Press, Anaheim, CA, USA, pp. 152–159.
Kremkow, J., Jin, J., Komban, S. J., Wang, Y., Lashgari, R., Li, X., Jansen, M., Zaidi, Q., Alonso, J.-M., 2014. Neuronal nonlinearity explains greater visual spatial resolution for darks than lights. PNAS 111 (8), 3170–3175.
Kuang, J., Johnson, G. M., Fairchild, M. D., 2007. iCAM06: A refined image appearance model for HDR image rendering. Journal of Visual Communication and Image Representation 18 (5), 406–414. Special issue on High Dynamic Range Imaging.
Kung, J., Yamaguchi, H., Liu, C., Johnson, G., Fairchild, M., Jul. 2007. Evaluating HDR rendering algorithms. ACM Transactions on Applied Perception 4 (2).
Land, E. H., McCann, J. J., 1971. Lightness and retinex theory. JOSA 61 (1), 1–11.
Larson, G. W., Rushmeier, H., Piatko, C., 1997a. A visibility matching tone reproduction operator for high dynamic range scenes. IEEE Transactions on Visualization and Computer Graphics 3 (4), 291–306.
Larson, G. W., Rushmeier, H., Piatko, C., Oct 1997b. A visibility matching tone reproduction operator for high dynamic range scenes. IEEE Transactions on Visualization and Computer Graphics 3 (4), 291–306.
Li, Y., Sharan, L., Adelson, E. H., Jul. 2005. Compressing and companding high dynamic range images with subband architectures. ACM Trans. Graph. 24 (3), 836–844.
Lorach, H., Benosman, R., Marre, O., Ieng, S.-H., Sahel, J. A., Picaud, S., Oct. 2012. Artificial retina: the multichannel processing of the mammalian retina achieved with a neuromorphic asynchronous light acquisition device. Journal of Neural Engineering 9 (6), 066004.
Mangiat, S., Gibson, J., 2010. High dynamic range video with ghost removal. In: SPIE Optical Engineering + Applications. International Society for Optics and Photonics, pp. 779812–779812.
Mantiuk, R., Daly, S., Kerofsky, L., 2008. Display adaptive tone mapping. In: ACM Transactions on Graphics (TOG). Vol. 27. ACM, p. 68.
Mantiuk, R., Tomaszewska, A., Heidrich, W., 2009. Color correction for tone mapping. In: Computer Graphics Forum. Vol. 28. Wiley Online Library, pp. 193–202.
Martinez-Conde, S., Otero-Millan, J., Macknik, S. L., Feb. 2013. The impact of microsaccades on vision: towards a unified theory of saccadic function. Nature Reviews Neuroscience 14 (2), 83–96.
Masland, R. H., Jun. 2011. Cell populations of the retina: The Proctor lecture. Investigative Ophthalmology and Visual Science 52 (7), 4581–4591.
Masmoudi, K., Antonini, M., Kornprobst, P., 2010. Another look at the retina as an image scalar quantizer. In: Proceedings of the International Symposium on Circuits and Systems (ISCAS).
Masquelier, T., 2012. Relative spike time coding and STDP-based orientation selectivity in the early visual system in natural continuous and saccadic vision: a computational model. Journal of Computational Neuroscience 32 (3), 425–441.
Masquelier, T., Portelli, G., Kornprobst, P., Apr. 2016. Microsaccades enable efficient synchrony-based coding in the retina: a simulation study. Scientific Reports 6, 24086.
Medathati, N. V. K., Neumann, H., Masson, G. S., Kornprobst, P., 2016. Bio-inspired computer vision: Towards a synergistic approach of artificial and biological vision. Computer Vision and Image Understanding.
Meylan, L., Alleysson, D., Süsstrunk, S., 2007. Model of retinal local adaptation for the tone mapping of color filter array images. J. Opt. Soc. Am. A 24 (9), 2807–2816.
Miller, N. J., Ngai, P. Y., Miller, D. D., 1984. The application of computer graphics in lighting design. Journal of the Illuminating Engineering Society 14 (1), 6–26.
Mohemmed, A., Lu, G., Kasabov, N., 2012. Evaluating SPAN incremental learning for handwritten digit recognition. In: Neural Information Processing. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 670–677.
Muchungi, K., Casey, M., Sep. 2012. Simulating light adaptation in the retina with rod-cone coupling. In: ICANN. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 339–346.
O'Carroll, D. C., Warrant, E. J., Feb. 2017. Introduction: Vision in dim light: highlights and challenges. Phil. Trans. R. Soc. B.
Oppenheim, A., Schafer, R., Stockham, T., Sep 1968. Nonlinear filtering of multiplied and convolved signals. IEEE Transactions on Audio and Electroacoustics 16 (3), 437–466.
Oskarsson, M., 2015. Democratic tone mapping using optimal K-means clustering. Springer International Publishing, Cham, pp. 354–365.
Pack, C., Hunter, J., Born, R., 2005. Contrast dependence of suppressive influences in cortical area MT of alert macaque. Journal of Neurophysiology 93 (3), 1809–1815.
Pandarinath, C., Victor, J., Nirenberg, S., 2010. Symmetry breakdown in the ON and OFF pathways of the retina at night: Functional implications. The Journal of Neuroscience 30 (30), 10006–10014.
Pattanaik, S. N., Ferwerda, J. A., Fairchild, M. D., Greenberg, D. P., 1998. A multiscale model of adaptation and spatial vision for realistic image display. In: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH '98. ACM, New York, NY, USA, pp. 287–298.
Pattanaik, S. N., Tumblin, J., Yee, H., Greenberg, D. P., 2000. Time-dependent visual adaptation for fast realistic image display. In: Proceedings of the 27th annual conference on Computer graphics and interactive techniques. ACM Press/Addison-Wesley Publishing Co., pp. 47–54.
Pitkow, X., Meister, M., 2012. Decorrelation and efficient coding by retinal ganglion cells. Nature Neuroscience 15 (4), 628–635.
Purves, D., Augustine, G. J., Fitzpatrick, D., Katz, L. C., LaMantia, A.-S., McNamara, J. O., Williams, S. M., 2001. Neuroscience, 2nd Edition. Sinauer Associates, Inc.
Quiroga, R. Q., Panzeri, S., 2009. Extracting information from neuronal populations: information theory and decoding approaches. Nature Reviews Neuroscience 10 (3), 173–185.
Rahman, Z.-u., Jobson, D. J., Woodell, G. A., 1996. A multiscale retinex for color rendition and dynamic range compression. Tech. rep., NASA Langley.
Ratliff, C. P., Borghuis, B. G., Kao, Y.-H., Sterling, P., Balasubramanian, V., Oct. 2010. Retina is structured to process an excess of darkness in natural scenes. Proc Natl Acad Sci U S A 107 (40), 17368–17373.
Reinhard, E., Nov. 2002. Parameter estimation for photographic tone reproduction. J. Graph. Tools 7 (1), 45–52.
Reinhard, E., Devlin, K., 2005. Dynamic range reduction inspired by photoreceptor physiology. IEEE Transactions on Visualization and Computer Graphics 11 (1), 13–24.
Reinhard, E., Pouli, T., Kunkel, T., Long, B., Ballestad, A., Damberg, G., Nov. 2012. Calibrated image appearance reproduction. ACM Trans. Graph. 31 (6), 201:1–201:11.
Reinhard, E., Stark, M., Shirley, P., Ferwerda, J., Jul. 2002. Photographic tone reproduction for digital images. ACM Trans. Graph. 21 (3), 267–276.
Reinhard, E., Ward, G., Pattanaik, S., Debevec, P., 2005. High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting. The Morgan Kaufmann Series in Computer Graphics. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Rieke, F., Dec. 2001. Temporal contrast adaptation in salamander bipolar cells. Journal of Neuroscience 21 (23), 9445–9454.
Rieke, F., Rudd, M. E., Dec. 2009. The challenges natural images pose for visual adaptation. Neuron 64 (5), 605–616.
Sanes, J., Masland, R., 2015. The types of retinal ganglion cells: current status and implications for neuronal classification. Annu Rev Neurosci 38, 221–246.
Schlick, C., 1995. Quantization Techniques for Visualization of High Dynamic Range Pictures. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 7–20.
Shapley, R., Enroth-Cugell, C., 1984. Visual adaptation and retinal gain controls. Progress in Retinal Research 3, 263–346.
Shapley, R. M., Victor, J. D., 1978. The effect of contrast on the transfer properties of cat retinal ganglion cells. The Journal of Physiology 285 (1), 275–298.
Souihel, S., Cessac, B., Sep. 2017. Modifying a biologically inspired retina simulator to reconstruct realistic responses to moving stimuli. Bernstein Conference 2017, submitted.
Takeshita, D., Smeds, L., Ala-Laurila, P., Feb. 2017. Processing of single-photon responses in the mammalian ON and OFF retinal pathways at the sensitivity limit of vision. Phil. Trans. R. Soc. B.
Thoreson, W., Mangel, S., 2012. Lateral interactions in the outer retina. Prog Retin Eye Res. 31 (5), 407–441.
Tumblin, J., Rushmeier, H., Nov 1993. Tone reproduction for realistic images. IEEE Computer Graphics and Applications 13 (6), 42–48.
Van Hateren, J., 2006. Encoding of high dynamic range video with a model of human cones. ACM Transactions on Graphics (TOG) 25 (4), 1380–1399.
Vance, P., Coleman, S. A., Kerr, D., Das, G., McGinnity, T., 2015. Modelling of a retinal ganglion cell with simple spiking models. In: IEEE Int. Jt. Conf. Neural Networks. pp. 1–8.
Victor, J. D., 1987. The dynamics of the cat retinal X cell centre. The Journal of Physiology 386 (1), 219–246.
Ward, G., 1994. Graphics Gems IV. ACM, New York, NY, USA, Ch. A Contrast-based Scalefactor for Luminance Display, pp. 415–421.
Warland, D., Reinagel, P., Meister, M., 1997. Decoding visual information from a population of retinal ganglion cells. J. Neurophysiol. 78, 2336–2350.
Watson, A. B., Yellott, J. I., 2012. A unified formula for light-adapted pupil size. Journal of Vision 12 (10), 12.
Wohrer, A., Kornprobst, P., 2009. Virtual Retina: A biological retina model and simulator, with contrast gain control. Journal of Computational Neuroscience 26 (2), 219. DOI 10.1007/s10827-008-0108-4.
Wohrer, A., Kornprobst, P., Antonini, M., Jun. 2009. Retinal filtering and image reconstruction. Tech. Rep. 6960, INRIA.
Yee, Y. H., Pattanaik, S., 2003. Segmentation and adaptive assimilation for detail-preserving display for high-dynamic range images. Visual Computer 19, 457–466.
Yeganeh, H., Wang, Z., Feb 2013. Objective quality assessment of tone-mapped images. IEEE Transactions on Image Processing 22 (2), 657–667.
Yi, D., Jiang, P., Mallen, E., Wang, X., Zhu, J., Aug. 2011. Enhancement of image luminance resolution by imposing random jitter. Neural Computing and Applications 20 (2), 261–272.
Zaghloul, K., Boahen, K., Demb, J. B., Apr. 2003. Different circuits for ON and OFF retinal ganglion cells cause different contrast sensitivities. The Journal of Neuroscience 23 (7), 2645–54.
Zhang, X.-S., Gao, S.-B., Li, C.-Y., Li, Y.-J., 2015. A retina inspired model for enhancing visibility of hazy images. Frontiers in Computational Neuroscience 9, 151.
Zhang, X.-S., Li, Y.-J., 2016. A Retina Inspired Model for High Dynamic Range Image Rendering. Springer International Publishing, Cham, pp. 68–79.