A comfortable brain-interface to video displays


Neural Networks PERGAMON

Neural Networks 12 (1999) 347–354

Contributed article

A comfortable brain-interface to video displays Masahide Nomura* NEC Fundamental Research Laboratories, 34 Miyukigaoka, Tsukuba, Ibaraki 305-8501, Japan Received 12 June 1998; received in revised form 8 October 1998; accepted 8 October 1998

Abstract

Recent progress in and the popularization of computer graphics mean we now see many images that are composed artificially and include a lot of flicker to add to their impact. These highly flickering images, however, cause fatigue that affects our brain rather than our eyes. This is a content-dependent video display terminal (VDT) hazard that is unlike conventional VDT hazards. This paper shows, based on physiological evidence concerning the temporal response of visual cortical cells, that content-dependent VDT hazards are a genuine threat. It proposes a quantitative measure to estimate the risk of the hazard, and provides an adaptive filtering method to reduce the risk. Images from the critical part of the “Pocket Monsters” TV animation episode were studied to confirm the effectiveness of this method. © 1999 Elsevier Science Ltd. All rights reserved.

Keywords: Temporal response; Flicker; Photosensitive seizure; Photo-induced seizures; Content-dependent VDT hazards; Resonance; Adaptive filter; Attention

* Tel.: +81 298 50 1181; Fax: +81 298 56 6136; e-mail: nomura@frl.cl.nec.co.jp

0893-6080/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved. PII: S0893-6080(98)00135-X

M. Nomura / Neural Networks 12 (1999) 347–354

1. Introduction

1.1. Background and motivation

A wide variety of video display terminal (VDT) hazards have been recognized so far: musculoskeletal disorders, eye fatigue caused by the reflection of ceiling lights, dry eyes, and possible health hazards from electromagnetic radiation are some well-known examples (US Department of Labor, 1997). Along with these conventional hazards, this paper proposes that content-dependent VDT hazards, which include photosensitive seizures, should also be considered.

Photosensitive seizures were believed to occur only in people who are light-sensitive epileptics or subclinically light-sensitive epileptics. The recent “Pokemon incident” in Japan (TV Tokyo, 1997), however, called this into question. A surveillance committee of the Japanese Ministry of Health and Welfare reported that 10.6% of elementary, junior, and high-school students who watched that TV episode felt sick, and 7.2% of these suffered convulsions (Japanese Ministry of Health and Welfare, 1998; Yamaguchi et al., 1998). The resulting proportion of viewers who suffered convulsions is about 30 times larger than the proportion of light-sensitive epileptics in the general population (1/4000), suggesting that this hazard is more significant than previously thought (Harding, 1998; Japanese Ministry of Posts and Telecommunications, 1998). The evidence suggests, at the very least, that there is a content-dependent VDT hazard that many people might suffer from.

The content-dependent VDT hazard is most likely visual, but its mechanism differs from that of conventional VDT hazards. Conventional VDT hazards affect the user’s body or eyes, and their risk can be reduced by improving the physical specifications or the configuration of the VDT environment. The mechanism of content-dependent VDT hazards, however, is directly related to the neural nature of the brain, and reducing the risk of these hazards may require a comfortable brain-interface to the content provided over video displays.

In this paper, we review physiological evidence suggesting that flicker with a temporal frequency of around 10 Hz will simultaneously facilitate most visual cortical cells. That will cause enormous neural fatigue, or may trigger a seizure in very rare cases. The paper describes a mathematical model of the temporal cell response. This response model is used to detect the flicker of retinal images. Based on the response model, we propose an adaptive inter-frame temporal filter that reduces the risk of content-dependent VDT hazards. This filter adaptively reduces the temporal frequency component of image input above 10 Hz. We have confirmed the effectiveness of this filter by conducting a computer simulation based on the “Pocket Monsters” video image sequence that caused the widespread incidence of what appeared to be photosensitive seizures.

Fig. 1. The most preferred spatial and temporal intervals of visual cortical cells to two-step flashing bar apparent motion stimuli. Adapted from Mikami et al. (1986).
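As a rough check of the factor of 30 quoted in Section 1.1, assume that the 7.2% figure refers to the students who felt sick, so that the share of all surveyed viewers who suffered convulsions is the product of the two reported percentages:

$$0.106 \times 0.072 \approx 0.0076, \qquad \frac{0.0076}{1/4000} = 0.0076 \times 4000 \approx 30.5 .$$

Under that reading, about 0.76% of the surveyed viewers suffered convulsions, against 0.025% for light-sensitive epileptics in the general population.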

1.2. Physiological evidence

Visual cortical cells show a wide variety of preferences for visual stimuli. For example, some cells show selectivity to the orientation of a bar, and other cells are selective to color, spatial frequency, motion direction, or speed (Livingstone and Hubel, 1988). The temporal frequency preference, however, varies much less than the other preferences. Mikami et al. (1986) reported the distribution of the most preferred spatial and temporal intervals of monkey V1 and MT cells for a two-step flashing-bar apparent motion stimulus (Fig. 1). The most preferred spatial intervals vary a great deal, but increase systematically with eccentricity. The means and the deviations of the preferred spatial interval are smaller in area V1 than in area MT. The most preferred temporal intervals, however, do not depend on eccentricity: they are evenly tuned to about 10 Hz and are very similar for the two areas. Since the most preferred speeds are much faster in MT cells than in V1 cells, the wide variation in the most preferred speeds of V1 and MT cells is based on the variation in the most preferred spatial frequency, not on that of the most preferred temporal frequency. Although the variation in the preferred speed is greatest in area MT among the visual areas (Logothetis, 1994), the most preferred temporal frequency for area MT cells is about 10 Hz, the same as for area V1 cells. Therefore, the most preferred temporal frequency is likely to be the same in cell populations taken from the other visual areas.

This physiological evidence suggests that visual stimuli flickering at around 10 Hz might highly excite many cells simultaneously or, in other words, make them resonate. Such resonance will at least cause neural fatigue in the brain, and is likely to be a principal cause of content-dependent VDT hazards.

2. Theory

2.1. Temporal cell response

The first step in developing an adaptive video display filter is to model the temporal cell response to visual stimuli. Although the activity of each neural cell is measured with spikes, we ignored the raw action potentials, incorporated the contribution of the mean firing rate of spikes to the membrane potential over a short time range, and defined an analog variable that represents the cell state.

There are many processes in a cell response. An impulse input evokes a reverse potential through synapses, and this potential propagates through dendrites and then generates spikes that propagate through one or more axons. The temporal response of a cell consists of these elementary processes, which should be modelled mathematically. When we adopt the simplest model for the temporal response of each elementary process, the first-order approximation of the total cell response is a concatenation of elementary processes p_i (i = 1, …, n) that act as successive low-pass filters with a common time constant τ (Nomura, 1995):

$$\frac{\mathrm{d}p_i}{\mathrm{d}t} = -\frac{1}{\tau}\left(p_i - p_{i-1}\right), \qquad p_0 = I, \quad i = 1, \ldots, n. \tag{1}$$

The temporal response of an ideal sustained-type cell is obtained through a convolution of the input I and the impulse response K_n as follows:

$$K_n(t; \tau) = \begin{cases} \dfrac{(t/\tau)^{n-1}\, e^{-t/\tau}}{(n-1)!} & \text{if } t > 0, \\[2ex] 0 & \text{otherwise,} \end{cases} \tag{2}$$

$$p_n(I; d) = \int I(t - t')\, K_n\!\left(t'; \frac{\tau_0}{1 - d}\right) \mathrm{d}t'. \tag{3}$$
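The following Python sketch illustrates Eqs. (1)–(3) numerically. It is not the implementation used in the paper: the function names, the simple explicit time stepping, and any value chosen for τ0 (which the text does not state) are assumptions made only for illustration. The `tmtf` helper gives the gain of n identical first-order stages, which follows directly from Eq. (1).

```python
import math
import numpy as np

def impulse_response(t, tau, n):
    """K_n(t; tau) of Eq. (2): zero for t <= 0, gamma-shaped for t > 0."""
    t = np.asarray(t, dtype=float)
    k = np.zeros_like(t)
    pos = t > 0
    k[pos] = (t[pos] / tau) ** (n - 1) * np.exp(-t[pos] / tau) / math.factorial(n - 1)
    return k

def time_constant(d, tau0):
    """Time constant tau0 / (1 - d) that Eq. (3) passes to K_n (tau0 is an assumed value)."""
    return tau0 / (1.0 - d)

def cascade_lowpass(signal, dt, tau, n):
    """Step Eq. (1) in time: n first-order low-pass stages in series."""
    stages = np.zeros(n)                  # p_1 ... p_n, initially at rest
    out = np.empty(len(signal))
    for k, x in enumerate(signal):
        prev = float(x)                   # p_0 = I
        for i in range(n):
            # each stage relaxes towards the freshly updated stage before it
            stages[i] += (dt / tau) * (prev - stages[i])
            prev = stages[i]
        out[k] = prev                     # p_n
    return out

def tmtf(f, tau, n):
    """Gain of the n-stage cascade at temporal frequency f (Hz)."""
    return (1.0 + (2.0 * np.pi * np.asarray(f, dtype=float) * tau) ** 2) ** (-n / 2.0)
```

Two properties are easy to verify with this sketch: the kernel of Eq. (2) peaks at t = (n − 1)τ, and, as written, it integrates to τ rather than to one, so it has to be divided by τ if unit gain at zero frequency is wanted. The time stepping is only stable when dt is small compared with τ.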

The parameter d defines the degree of temporal blur, which changes the cut-off frequency of the low-pass filter. The number of elementary processes n affects the shape of the impulse response. Fig. 2 shows the cell responses to a 30-frames/s apparent motion image of a rightward drifting bar. Each cell responds to the intensity change of a corresponding image pixel, and each image shows the cell response for different model parameters. The degree of blur becomes greater as either d or n increases. The shape of the response becomes close to Gaussian as n increases.

Fig. 2. (a): Impulse responses of the successive low-pass filters with 1, 4, 7 and 10 elementary processes n. (b): Image of a bar drifting downward and the filtered images of the bar motion with different parameters. The original image is a 30-frames/s apparent motion. The degree of blur becomes greater as either n or d increases. The shape of the response becomes close to Gaussian as n increases.

2.2. Adaptive low-pass filter

Since the upper cut-off temporal frequencies differ among biological cells and are distributed over a range, image filtering based on the cell response model can be used to control the number of excited cells by choosing parameters d and n in a way that reduces the risk of content-dependent VDT hazards. We arbitrarily fixed n = 4, and changed d as needed. Figs. 3a,b show the temporal modulation transfer functions (TMTFs) of the model response with different values of d (0.05, 0.1, 0.15, 0.2, 0.25, and 0.3). The TMTFs were all low-pass, and the higher frequency edges were dependent on d (Fig. 3a). With d = 0.3, the responses were about 40% and 6.5% of the input at 5 Hz and 10 Hz, respectively; with d = 0.1, they were about 90% and 60% (Fig. 3b). A filter with parameters n = 4 and d = 0.3 can significantly reduce the strongest television flicker.

For practical implementations, however, the effect of the video display filter needs to be controlled adaptively according to the input images. Since the filter blurs the images of moving patterns, the filter effect must be weak when the risk of the hazard is low, to avoid degrading the quality of the original images. We assume that the risk increases, approximately, with the total cell response. Therefore, we define a normalized risk index e for images I as follows:

$$e = \frac{\displaystyle\sum_{c \in \{R, G, B\}} w_c \sum_{n_x, n_y} \left| p_n(I_c; d_0) - I_c \right|}{n_x\, n_y\, I_{\max}}, \tag{4}$$

where IR, IG, and IB are the images of the R, G, and B color components, and the weights for the R, G, and B images are normalized so that wR + wG + wB = 1. The risk index sums, over pixels and over the RGB components, the weighted magnitude of the difference between the filtered and the original images. It is normalized by the image size (nx, ny) and by the maximum intensity Imax of the original image. When the risk index is zero, the filter has no effect, and the index rises towards one when each RGB component includes increasingly intensive flicker, where the measured flicker intensity depends on d0.

As the risk index changes, the blur parameter d is adaptively modified through a modulation function defined as follows:

$$d = \frac{d_{\max}}{2}\left(1 + \tanh\!\left(4\left(\frac{e - e_{\mathrm{Low}}}{e_{\mathrm{High}} - e_{\mathrm{Low}}} - 0.5\right)\right)\right). \tag{5}$$

The effect of the filter becomes practically zero when e is smaller than eLow, increases monotonically with e, and saturates at dmax when e is larger than eHigh. The temporal blur modulation function is shown in Fig. 3c.

Fig. 3. Temporal modulation transfer function (TMTF) of the model cell and the temporal blur modulation function of the adaptive video display filter. (a): TMTFs of the successive low-pass filter with different values of the temporal blur parameter d (0.05, 0.1, 0.15, 0.2, 0.25 and 0.3). (b): The relative response dependence of the successive low-pass filter on temporal blur parameter d at different input frequencies (5 Hz and 10 Hz). (c): The temporal blur modulation function of the adaptive video display filter. The function is a sigmoid defined by the saturation blur dmax and the risk indexes for the lower threshold eLow and the higher threshold eHigh.

3. Simulation results

Recorded images of the latter half of the animated TV show “Pocket Monsters, the 38th episode” were analyzed to evaluate the effect of the filter, since this part of the animation is suspected to have caused the headaches, dizziness, and convulsions of many viewers. The TV episode was recorded with a normal-mode VHS home-video recorder, and digitized into 30-frames/s images of 320 × 240 pixels using a carefully tuned Apple Power Macintosh computer system with a Radius PCI Video Vision Studio video card.

Fig. 4a shows the time sequence of the risk index in the animation, where n and d0 were set to 4 and 0.3, respectively. The histogram of the risk index of the original images (Fig. 4b) shows that 4.7% of 14 798 frames had a risk index equal to or higher than 0.05; this represents a total duration of 23.18 s. The risk index was equal to or higher than 0.1 in 1.48% of the frames, equal to a total duration of 7.3 s. The frames with a moderate risk index (e < 0.1) were widely distributed in the original image time sequence. The frames with a high risk index (e > 0.1), however, were limited to roughly seven clusters in the sequence, where scenes included a lot of flicker that could cause viewer discomfort.

Since the original images with a risk index below 0.1 did not seem hazardous, the parameters of the adaptive filter were set to suppress all risk indexes in the images to below 0.1. Fig. 4c shows the time sequence of the risk index in the filtered images. The parameters eLow and eHigh were set to 0.003 and 0.1, respectively. Both d0 and dmax were set to 0.3. The results show that the risk indexes in the original images were effectively suppressed. A histogram of the risk index (Fig. 4d) shows that only 1.48% of the frames had a risk index equal to or higher than 0.05, with a total duration of 7.3 s. The number of frames in which the risk index was equal to or higher than 0.1 was negligible in the filtered images.

Fig. 4. Risk index change and its histogram for the second half of “Computer Warrior Polygon, Pocket Monsters”. The weights for the RGB components were set equal. (a): Risk index change of the original images. (b): The histogram of the risk indexes in the original images. (c): Risk index change of the filtered images. (d): The histogram of risk indexes in the filtered images. Asterisks (*) show the critical part of the episode, where the largest number of people felt sick or had seizures.

Fig. 5 shows a short sequence of the actual video images. The images are separated into RGB components for display in a gray-scale publication. Each image is a frame of 1/30 s. The dominant color of the original images alternated sharply between red and green-blue, and the color contrast change was very harsh. In the filtered images, however, the color contrast change was much softer: colors alternated between purple-red and purple-blue, and did so much more slowly. The filtered images were much easier to watch. The original images with a risk index below 0.05 remained practically unchanged, and the filter softened the images only when there was a harsh flicker over a large area.

Fig. 5. The effect of the adaptive video display filter. Original images (left) and filtered images (right) of the critical part of “Computer Warrior Polygon, Pocket Monsters”. Color images are shown separately in red (R), green (G) and blue (B) components for display in a gray-scale publication. Each image is a frame of 1/30 s.

Fig. 6. The differences in the risk index change with RGB equally weighted and with red only weighted in the second half of “Computer Warrior Polygon, Pocket Monsters”. (a): Upper half is the risk index with red only weighted. Lower half is the risk index with RGB equally weighted. (b): Risk index histogram in the images. Upper half is for red-only. Lower half is for equal RGB. (c): The difference between the red-only risk index and the equal RGB risk index. (d): Histogram of the risk index differences.

Although the critical value of the risk index that is commonly acceptable will have to be determined through extensive clinical study, our simulation results show that the adaptive video display filter can automatically and effectively evaluate and remove dangerous flicker.
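The risk index of Eq. (4) and the modulation function of Eq. (5) can be sketched in Python as follows, using the threshold values reported for the simulation (eLow = 0.003, eHigh = 0.1, dmax = 0.3). This is a minimal illustration and not the paper's code: the function names, the (ny, nx, 3) array layout, the 8-bit maximum intensity of 255, and the use of the absolute pixel difference are assumptions. The reference-filtered frame is taken as an input, so any temporal low-pass filter with blur d0 can supply it.

```python
import numpy as np

def risk_index(original, filtered, weights=(1 / 3, 1 / 3, 1 / 3), i_max=255.0):
    """Eq. (4): weighted, normalized difference between a reference-filtered
    frame and the original frame. Both arrays have shape (ny, nx, 3) and hold
    the R, G and B planes; i_max = 255 assumes 8-bit intensities."""
    original = np.asarray(original, dtype=float)
    filtered = np.asarray(filtered, dtype=float)
    ny, nx, _ = original.shape
    # sum of absolute differences over all pixels, one value per color plane
    per_channel = np.abs(filtered - original).sum(axis=(0, 1))
    return float(np.dot(weights, per_channel) / (nx * ny * i_max))

def blur_parameter(e, e_low=0.003, e_high=0.1, d_max=0.3):
    """Eq. (5): sigmoid mapping from the risk index e to the blur parameter d."""
    x = (e - e_low) / (e_high - e_low)
    return 0.5 * d_max * (1.0 + np.tanh(4.0 * (x - 0.5)))
```

For example, a frame that is identical to its filtered version gives a risk index of zero, while a full-field change from maximum intensity to black gives an index of one. Passing `weights=(1.0, 0.0, 0.0)` corresponds to the red-only weighting compared in Fig. 6.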

4. Discussion

The most effective temporal frequencies to evoke a neural response in the brain have been reported from clinical studies. These reported critical frequencies range from 10 to 30 Hz, depending on conditions: the luminance or color of the flashing light, the age of the subjects, or whether the eyes are open or closed (Harding, 1998; Kasteleyn-Nolst, 1989). A clinical study using a luminance comparable to that of a TV display showed that the most effective stimulus is 15-Hz flicker with red light (Japanese Ministry of Posts and Telecommunications, 1998; Takahashi and Tsukahara, 1976; Takahashi et al., 1981). The 10-Hz frequency used in this paper coincides with the critical temporal frequency reported from a study using black-and-white stimuli with a luminance comparable to that of a TV display (Kelly, 1961). The mean luminance dependence of a human TMTF can be incorporated by defining the reference temporal blur d0 as a function d0(IAVE) of the mean luminance IAVE.

Clinical studies have shown that red-light flicker is more dangerous than flicker with other colors (Takahashi and Tsukahara, 1976; Takahashi et al., 1981), and that monochrome images are safer (Harding, 1998). This means that the weight for red wR in the risk index of the adaptive video display filter should be greater than the weights for the other colors wG and wB. Fig. 6 shows the differences in the risk indexes when RGB are weighted equally (eRGB) and when red only is weighted (eR). The differences between the two kinds of indexes are small and seem trivial in this case. Computer simulation on some other images from TV suggests that the risk index with RGB equally weighted works perfectly in most cases; however, the difference from the red-only risk index may be greater in other cases.

Fig. 7. Two cell network models for band-pass temporal response. (a): TMTF of a feedforward (FF) inhibition model. An excitatory response is followed by an inhibition (weight 0.75). Both an excitatory unit (labelled e) and an inhibitory unit (labelled i) are modelled by the successive low-pass filters. Parameters of the model were arbitrarily set to emulate the TMTF of a human at a standard video display luminance (7.10 trolands) (Kelly, 1961). (b): The temporal response of the FF model to a chirp input. (c): Normalized Fourier spectrum of an impulse response from a feedback (FB) inhibition network. The excitatory response is inhibited by FB (weight 0.99). Parameters of the model were also arbitrarily set to emulate a human TMTF. (d): The temporal response of the FB model to a chirp input. (e): Normalized Fourier spectrum of an impulse response of a hybrid network. The excitatory unit receives both recurrent excitation (weight 1.0) and recurrent inhibition (weight 0.915). Parameters of the model were also arbitrarily set to emulate a human TMTF. (f): The temporal response of the hybrid model to a chirp input.

The parameter values used in the simulation (d0, dmax, eLow, and eHigh) seemed appropriate from a practical point of view, but the most appropriate values may differ between individual subjects. Therefore, clinical or psychophysical studies with large groups of subjects are needed to determine the parameter values that best minimize both the risk of content-dependent VDT hazards and the degradation of the motion image.

The UK Independent Television Commission (ITC) requires that flashing lights or rapidly changing or flickering images that change at a rate of more than 3 Hz be avoided (Independent Television Commission, 1994). TV Tokyo has gone beyond the ITC regulation by introducing a guideline that restricts red flicker (TV Tokyo, 1998). The risk index defined in this paper, however, is more quantitative, and thus more objective, than these regulations. This paper also provides an effective and practical means to reduce the risk through the use of the filter.

Since the successive low-pass filter (Eq. (1)) models only excitatory responses, the inhibitory input needs to be included to obtain a physiologically plausible cell response. The simplest mechanism for the inhibition is a feedforward (FF) mechanism, whose dynamics are relatively easy to understand. Figs. 7a,b show the TMTF of a simple FF network and the network response to a chirp input whose temporal frequency changes from 0 to 25 Hz. The system parameters of the model are set to emulate a psychophysical TMTF of a human at a mean luminance comparable to standard TV displays. The network resonates at the frequency at which the maximum in the TMTF occurs.

Since feedback (FB) mechanisms in the cortex have been suggested (Douglas et al., 1989), FB inhibition is also physiologically plausible. One of the simplest FB mechanisms is subtractive inhibition. Figs. 7c,d show that the gain of the resonance was greater in the FB network than in the FF network when system parameters were set to emulate the same TMTF. Moreover, Douglas et al. (1995) suggested there may be an FB amplification mechanism in the cortex, in which the gain of some of the FB network is suppressed by co-existing weaker inhibitory FB. Figs. 7e,f show the behavior of a simple excitatory FB amplification with a weak inhibitory FB. In this case, the resonance gain can be much greater than in the two previous cases, and depends more sensitively on the system parameters.

We believe that some cases of photosensitive seizures may be explained qualitatively with such simple models: in rare cases, this resonance may cause an abnormal oscillation, or a transition to a chaotic state, in the entire neural network of the brain. In contrast, neural fatigue is likely to be facilitated by the resonance even though the neural network of the brain stays in a normal state.

Based on these simple models, two factors appear to be critical in triggering photosensitive seizures by video images. One is the temporal frequency, and the other is the system parameters. The critical temporal frequency is probably the most preferred temporal frequency of most cells: around 10 Hz. This source of risk can be reduced effectively by using the video display filter proposed in this paper. The critical system parameters will be partly determined genetically. They can be modified epigenetically, however, by learning or by physical injury, or can be modulated by states of the brain, such as emotion, level of attention, or other sensory input.
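As a rough illustration of how feedforward inhibition turns the low-pass cascade of Eq. (1) into a band-pass response, the Python sketch below subtracts a slower, weighted copy of the cascade from a faster one. Only the inhibition weight of 0.75 comes from the caption of Fig. 7; the time constants, the number of stages, and the function names are assumptions chosen for illustration, not the parameters used for Fig. 7a.

```python
import numpy as np

def cascade_tf(f, tau, n):
    """Frequency response of n identical first-order low-pass stages (unit gain at 0 Hz)."""
    return (1.0 + 2j * np.pi * np.asarray(f, dtype=float) * tau) ** (-n)

def ff_inhibition_gain(f, tau_e=0.003, tau_i=0.012, n=4, w=0.75):
    """Gain of excitation minus slower feedforward inhibition with weight w.
    tau_e, tau_i and n are illustrative assumptions, not values from the paper."""
    return np.abs(cascade_tf(f, tau_e, n) - w * cascade_tf(f, tau_i, n))

freqs = np.linspace(0.0, 60.0, 601)     # temporal frequency axis in Hz
gain = ff_inhibition_gain(freqs)
f_peak = freqs[np.argmax(gain)]         # frequency at which the sketch resonates
```

At 0 Hz the inhibition cancels most of the excitation, so the gain is 1 − w = 0.25. At intermediate frequencies the slower inhibitory path is attenuated and delayed, so the gain rises above the low-frequency value before the excitatory path itself rolls off. With the assumed time constants the peak falls in the neighborhood of 10 Hz, but its exact position depends entirely on those assumed values.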
Therefore, the flicker at about 10 Hz is the most likely principal cause of seizures in the “Pokemon incident”, but the susceptibility of viewers would have been affected by several other factors. The time at which most people suffered ill effects coincided with the climax of the episode, when people would watch most intently and feel the strongest emotion. The incidence of the ill effects would therefore be different for viewers who did not understand the Japanese language, watched without sound, or did not pay close attention.

The entire brain network of normal subjects does not oscillate. System parameters of the brain network, however, may be located close to the regions capable of oscillation (Douglas et al., 1995; Crick and Koch, 1998). This suggests the brain network may become capable of oscillation under certain circumstances, for example, when a person is paying very close attention to a TV program or is feeling strong emotion. To effectively incorporate this second factor into the adaptive video display filter, though, will require a much deeper understanding of perception in the brain.

5. Summary

The most preferred responses of the majority of visual cortical cells are obtained through visual stimuli that have a dominant temporal frequency of around 10 Hz. Evidence suggests that images that include a large amount of flicker at about 10 Hz may cause neural fatigue. This paper proposes that such neural fatigue is a content-dependent VDT hazard, and that a comfortable brain-interface should be applied to reduce the risk. This paper provides a quantitative measure of the risk of the content-dependent VDT hazard, and an effective and practical means to immediately reduce the risk through use of an adaptive filter. Simulation results for the critical part of the “Pocket Monsters” episode confirmed the effectiveness of this filter.

References

Crick, F., & Koch, C. (1998). Constraints on cortical and thalamic projections: the no-strong-loops hypothesis. Nature, 391, 245–250.

Douglas, R. J., Martin, K. A. C., & Whitteridge, D. (1989). A canonical microcircuit for neocortex. Neural Computation, 1, 480–488.

Douglas, R. J., Koch, C., Mahowald, M., Martin, K. A. C., & Suarez, H. (1995). Recurrent excitation in neocortical circuits. Science, 269, 981–985.

Harding, G. F. A. (1998). TV can be bad for your health. Nature Medicine, 4 (3), 265–267.

Independent Television Commission (1994). Guidance note: Use of flashing images or repetitive patterns. London, November.

Japanese Ministry of Health and Welfare (1998). Minutes of 2nd meeting of clinical research, Group on photo-induced seizures, Feb. 20th (in Japanese).

Japanese Ministry of Posts and Telecommunications (1998). An interim report from the “Study Group on Broadcasting and Audio–Visual Sensory Perception”, April 6th (in Japanese).

Kasteleyn-Nolst, D. G. A. (1989). Photosensitivity in epilepsy: electrophysiological and clinical correlates. Acta Neurologica Scandinavica, Supplementum, 125. Copenhagen: Munksgaard.

Kelly, D. H. (1961). Visual responses to time-dependent stimuli. I: amplitude sensitivity measurements. Journal of the Optical Society of America, 51, 422–429.

Livingstone, M., & Hubel, D. (1988). Segregation of form, color, movement and depth: Anatomy, physiology and perception. Science, 240, 740–749.

Logothetis, N. K. (1994). Physiological studies of motion inputs. In Visual detection of motion (pp. 177–216). San Diego, CA: Academic Press.

Mikami, A., Newsome, W. T., & Wurtz, R. H. (1986). Motion selectivity in macaque visual cortex II: Spatiotemporal range of directional interactions in MT and V1. Journal of Neurophysiology, 55 (6), 1328–1339.

Nomura, M. (1995). Modeling of cell response in area MT. Technical report of The Institute of Electronics, Information and Communication Engineers, NC94–107, 241–247 (in Japanese).

Takahashi, T., & Tsukahara, Y. (1976). Influence of color on the photoconvulsive response. Electroencephalography and Clinical Neurophysiology, 41, 124–136.

Takahashi, T., Tsukahara, Y., & Kaneda, S. (1981). Influence of pattern and red color on the photoconvulsive response and photic driving. Tohoku Journal of Experimental Medicine, 133, 129–137.

TV Tokyo (1997). “Computer Warrior Polygon, Pocket Monsters”. TV Tokyo network, Japan, Dec. 16th.

TV Tokyo (1998). News release note: Investigation report on “Pocket Monsters” problem. April 9th.

US Department of Labor (1997). “Working Safely with Video Display Terminals”. Occupational Safety and Health Administration (OSHA) Publication 3092, March 5th.

Yamaguchi et al. (1998). Clinical research on photo-induced seizures. Report of Special Research by Japanese Ministry of Health and Welfare, March (in Japanese).