From local features to global perception – A perspective of Gestalt psychology from Markov random field theory




Neurocomputing 26–27 (1999) 939–945

Song Chun Zhu*, Ying Nian Wu

Department of Computer and Information Science, Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA
Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA

Abstract

In this paper, we present a mathematical theory for Gestalt psychology from the perspective of Markov random field (MRF) theory. We demonstrate how the global perception of textures and shapes is sufficiently characterized by the propagation of local features. Unlike traditional MRF theory, which studies the interactions within small cliques, the structures of the random fields studied in this paper are biologically relevant local features, such as Gabor filters at various frequencies and orientations, and Gestalt rules for grouping local edge fragments. Our MRF models are learned from the statistics of natural textures and shapes. We demonstrate three experiments in this paper: the modeling and computation of texture, shape, and medial axis in random fields. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Gestalt psychology; Texture perception; Shape modeling; Medial axis transform; Markov random field; Minimax entropy

1. Introduction

In early visual computation, while much is known about the functions of individual neurons in extracting primary visual cues, very little is known about the dynamic interactions between these nerve cells. For example, how are the local texture features, extracted by neurons at various frequencies and orientations, integrated into a global perception of texture? How are the edge fragments, detected by cells in V1, assembled to form the perception of a whole shape?

* Corresponding author. Tel.: +1-614-292-4890; fax: +1-614-292-2911. E-mail address: [email protected] (S.C. Zhu).
0925-2312/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0925-2312(99)00089-2


The perception of texture and shape has been intensively studied in psychophysics [3]. The most influential theory in the literature that accounts for the grouping from "part" to "whole" is Gestalt psychology [4]. Gestalt psychology established a large number of laws for grouping local texture features and edge elements into global percepts in favor of "the best, simplest and most stable shape". But what is meant by the best and most stable shape? Is there a "field", as speculated by the Gestalt psychologists, computed in the massive neural connections? In the past few decades, despite the accumulation of a large number of biological experiments and discoveries, these questions remain basically unanswered due to the overwhelming complexity of the nervous system.

In this paper, we present a mathematical theory for Gestalt psychology from the perspective of modern Markov random field (MRF) theory. We demonstrate how the global perception of texture and shape is sufficiently characterized through the propagation of local features. In the next section, we give a brief description of our MRF theory and demonstrate three groups of experiments on modeling texture, shape, and medial axis.

2. Experiments on texture, shape, and medial axis

Markov random field (MRF) theory originated from statistical mechanics. A beautiful aspect of MRF theory is that global properties – the macroscopic behaviors of a massive physical system – can arise through the propagation of local interactions between elements. MRF theory was brought into vision science in the mid-1970s for modeling various image patterns [1]. However, traditional MRF models are insufficient due to two major limitations. (1) The small cliques have physical meanings in statistical mechanics, but they are less relevant to natural visual patterns or to biological systems, and thus have very limited expressive power. (2) Traditional MRFs fail to model the high-order statistics that prevail in natural patterns. In the past three years, the authors have studied MRF theory in a series of experiments on modeling texture and shape [8–10], and we discovered that the criticism should not be attributed to MRF theory itself, but to suboptimal choices of local features and statistics. In our experiments, we found that globally realistic texture and shape patterns can arise from simple local features such as Gabor filters and Gestalt grouping rules for edge fragments, as demonstrated in the three groups of experiments below.

2.1. Experiment I: Modeling and computing texture by MRF

In modeling surface textures, let I denote the texture image whose intensities are random variables on a 2D lattice. At each pixel, a set of Gabor filters is adopted to extract local features at multiple frequencies and orientations, as in V1. Then we pool the responses for each filter band across the image lattice and compute the empirical histograms H_α(I), α = 1, 2, …, K, with α being the index of the filter bands.


Then an MRF model p(I) is learned so that p(I) reproduces all the empirical histograms while having maximum entropy. This yields the texture model

$$p(\mathbf{I}) = \frac{1}{Z} \exp\Big\{-\sum_{\alpha=1}^{K} \big\langle \lambda_\alpha, H_\alpha(\mathbf{I}) \big\rangle\Big\}, \qquad (1)$$

where Z is a normalization constant and λ_α is a vector that weights the histogram H_α(I) through the inner product. The λ_α, α = 1, 2, …, K, are Lagrange multipliers that are approximated as step functions and are learned from the texture pattern. The dependence between the filter bands is automatically accounted for by the learning process. In this MRF model p(I), the neighborhood structures are specified by the window functions of the Gabor filters, which are much more expressive than the small cliques. A detailed description of the model is given in [10]. Thanks to the maximum-entropy nature of the distribution, random samples from p(I) reveal the global perception captured by the chosen filters and statistics. The sampling is conducted by Markov chain Monte Carlo (MCMC) techniques. An example of texture modeling is displayed in Fig. 1: the left image is an observed texture pattern of mud ground, from which a model p(I) is learned with five filters; the right image is a random sample from p(I) obtained by MCMC. The two texture images clearly have similar appearances to human visual perception. Adding further filter bands and statistics does not lead to a perceivable change in the quality of the randomly sampled images. In other words, a small number of filter channels and their marginal statistics are sufficient to capture the texture pattern. A wide spectrum of realistic texture images is modeled in [10] using Gabor filters at various frequencies and orientations.
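To make the pipeline concrete, the following is a minimal sketch – not the authors' implementation – of the two ingredients around Eq. (1): pooling marginal histograms of Gabor filter responses, and drawing a sample whose Gibbs energy Σ_α⟨λ_α, H_α(I)⟩ is explored by single-site Metropolis updates. The filter parameters, bin counts, grey-level quantization, sweep counts, and temperature are our assumptions.

```python
# Hypothetical sketch of the texture model of Eq. (1): a Gabor filter bank,
# pooled marginal histograms H_alpha(I), and a single-site Metropolis sampler
# for p(I) ~ exp(-sum_alpha <lambda_alpha, H_alpha(I)>).
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(freq, theta, size=15, sigma=3.0):
    """Real (cosine-phase) Gabor kernel at the given frequency and orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * freq * x_rot)

def filter_histogram(image, kernel, bins=16, value_range=(-4.0, 4.0)):
    """Empirical marginal histogram H_alpha(I) of one filter band, normalized to sum to 1."""
    response = convolve2d(image, kernel, mode='same', boundary='wrap')
    hist, _ = np.histogram(response, bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

def gibbs_energy(image, kernels, lambdas, bins=16, value_range=(-4.0, 4.0)):
    """U(I) = sum_alpha <lambda_alpha, H_alpha(I)>, the exponent of Eq. (1)."""
    return sum(float(np.dot(lam, filter_histogram(image, k, bins, value_range)))
               for k, lam in zip(kernels, lambdas))

def synthesize(kernels, lambdas, shape=(64, 64), levels=8, sweeps=20,
               temperature=1.0, seed=0):
    """Metropolis sampling of p(I): propose a new grey level at one pixel at a time."""
    rng = np.random.default_rng(seed)
    image = rng.integers(0, levels, size=shape).astype(float)
    energy = gibbs_energy(image, kernels, lambdas)
    for _ in range(sweeps):
        for i in range(shape[0]):
            for j in range(shape[1]):
                old = image[i, j]
                image[i, j] = rng.integers(0, levels)
                new_energy = gibbs_energy(image, kernels, lambdas)
                if rng.random() < np.exp(-(new_energy - energy) / temperature):
                    energy = new_energy          # accept the proposal
                else:
                    image[i, j] = old            # reject and restore the old grey level
    return image
```

A real implementation would update the histograms incrementally rather than reconvolving the whole image at every proposal; the per-flip cost here is accepted purely for clarity of the sketch.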

Fig. 1. Left: a given texture pattern – mud ground with footprints. Right: a random sample from the model p(I), which uses five Gabor filters.
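The Lagrange multipliers λ_α themselves are learned rather than hand-tuned. Under the maximum-entropy condition, the log-likelihood gradient with respect to λ_α is the gap between the model's expected histogram and the observed one, which suggests an iterative fit such as the sketch below (reusing the hypothetical `filter_histogram` and `synthesize` helpers above; the step size and iteration counts are assumptions).

```python
# Hypothetical stochastic-gradient fit of the multipliers lambda_alpha in Eq. (1):
# nudge each lambda_alpha by the gap between synthesized and observed histograms.
def learn_lambdas(observed, kernels, n_iters=30, step=1.0, bins=16):
    obs_hists = [filter_histogram(observed, k, bins=bins) for k in kernels]
    lambdas = [np.zeros(bins) for _ in kernels]
    for _ in range(n_iters):
        sample = synthesize(kernels, lambdas, shape=observed.shape, sweeps=5)
        syn_hists = [filter_histogram(sample, k, bins=bins) for k in kernels]
        # d(log-likelihood)/d(lambda_alpha) = E_p[H_alpha] - H_alpha(observed),
        # approximated here by the histogram of the current synthesized sample.
        lambdas = [lam + step * (syn - obs)
                   for lam, syn, obs in zip(lambdas, syn_hists, obs_hists)]
    return lambdas
```

When the fit converges, the synthesized and observed histograms agree for every band, which is exactly the constraint defining the maximum-entropy model above.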


2.2. Experiment II: Modeling and computing object shape by MRF

Our second experiment studies random field models for object shapes, such as those in Fig. 2. It has long been evident from psychophysical experiments that human early visual perception strongly favors certain shapes and configurations over others, without high-level identification. Gestalt psychologists account for these experiments with a set of Gestalt laws, such as co-linearity, co-circularity, proximity, parallelism, and symmetry. We consider these grouping rules as important local feature extractors, by analogy to the Gabor filters in texture modeling. Recently it has been reported in neurophysiology that neurons in the early stages of the visual pathway extract local shape features such as parallelism and symmetry [6].

Our first step in shape modeling is to define a shape on a random field whose elements represent edge fragments, as detected by an edge detector in V1. For example, Fig. 3a shows the shape of a dog, which consists of many elongated parts: neck, head, legs, tail. The edge elements on the two sides of each part are parallel or symmetric to each other, as indicated by the line segments connecting points on the two sides. Thus the dog shape can be viewed as the abstract graph in Fig. 3b, where each node is an edge element, and the immediate neighbors of each node include the two adjacent nodes along the contour and the nodes across the region. Such a neighborhood graph can easily be computed by a stochastic process, which we briefly discuss in our third experiment below. We define a Markov random field on this neighborhood graph.

The graph in Fig. 3b encodes certain Gestalt laws, such as proximity, co-linearity, co-circularity, parallelism, and symmetry. All of these laws are measured in the local neighborhood based on the positions and orientations of the edge elements. Informally speaking, these Gestalt measures are viewed as local shape filters. At each node of the graph, a feature vector is extracted by the shape filters (a simplified sketch of such measurements is given after Fig. 3 below). For example, at a point s on the shape contour C(s), s ∈ [0, 1], the curvature κ(s) and its derivative κ'(s) measure the degree of co-linearity and co-circularity, respectively. We denote by r(s) the length of the line segment connecting point s to the other side of the region, as shown in Fig. 3a; r(s) is normalized by the perimeter of the shape.

Fig. 2. Examples of the observed natural shapes.


Fig. 3. An example: (a) a dog shape with parallel and symmetric edges matched; (b) the abstract representation of (a) as an adjacency graph.
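The sketch below is our simplified construction, not the authors' stochastic matching process: it indicates how such local shape measurements might be computed on a discretized closed contour, using a turning-angle estimate of the curvature κ(s), a crude nearest-vertex approximation of the cross-region width r(s) of Fig. 3a, and cyclic finite differences for the derivatives used in the model that follows.

```python
# Hypothetical local shape filters on a closed polygonal contour: curvature kappa(s),
# its derivative, the normalized cross-region width r(s), and its first two derivatives.
import numpy as np

def contour_features(points):
    """points: (N, 2) array of ordered vertices of a closed contour."""
    prv, nxt = np.roll(points, 1, axis=0), np.roll(points, -1, axis=0)
    v_in, v_out = points - prv, nxt - points
    ds = 0.5 * (np.linalg.norm(v_in, axis=1) + np.linalg.norm(v_out, axis=1))
    # Curvature: turning angle between successive edge directions per unit arc length.
    turn = np.angle(np.exp(1j * (np.arctan2(v_out[:, 1], v_out[:, 0])
                                 - np.arctan2(v_in[:, 1], v_in[:, 0]))))
    kappa = turn / ds
    # Cross-region width r(s): distance to the nearest vertex that is not a close
    # contour neighbour (a crude stand-in for the matched point on the other side),
    # normalized by the perimeter as in the text.
    perimeter = np.linalg.norm(v_out, axis=1).sum()
    n = len(points)
    r = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(points - points[i], axis=1)
        d[[(i + k) % n for k in range(-2, 3)]] = np.inf   # mask s itself and its neighbours
        r[i] = d.min() / perimeter
    # Cyclic finite differences for kappa'(s), r'(s) and r''(s).
    d1 = lambda f: (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * ds)
    d2 = lambda f: (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / ds**2
    return kappa, d1(kappa), r, d1(r), d2(r)
```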

Then r(s), r'(s) and r''(s) are, respectively, measures of across-region proximity, parallelism, and symmetry. As in texture modeling, we compute the empirical histogram for each of the features mentioned above, and a maximum entropy model is learned from a dataset of shapes (see Fig. 2) so that it reproduces the same empirical histograms. This yields

$$p(C) = \frac{1}{Z} \exp\Big\{-\int_{0}^{1} \big[\lambda_1(\kappa(s)) + \lambda_2(\kappa'(s)) + \lambda_3(r(s)) + \lambda_4(r'(s)) + \lambda_5(r''(s))\big]\, ds\Big\}, \qquad (2)$$

where the potential functions λ_α, α = 1, 2, …, 5, are learned in non-parametric form. Three randomly sampled shapes from p(C) are displayed in Fig. 4. A detailed description of our theory and experiments is given in [8]. These randomly sampled shapes resemble the ones in Fig. 2, with only local features matched.

2.3. Experiment III: Stochastic computation of medial axis

Our third experiment is inspired by recent experiments in both psychophysics and neurophysiology, which report that early vision can compute the medial axis transform of a shape almost simultaneously with edge detection and figure–ground segregation [5,6]. We argue that all three processes – edge detection, figure–ground segregation, and medial axis transform – can be computed simultaneously and iteratively in a stochastic manner. In this paper, we only show the stochastic computation of the medial axis given the shape contour.


Fig. 4. Randomly sampled shapes from a generic shape model.
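For illustration, and referring back to Eq. (2) from Experiment II, the exponent of the shape model can be evaluated on a discretized contour roughly as follows. This is a sketch under our assumptions: the learned potentials λ_1, …, λ_5 are given as callables (e.g. histogram lookup tables), the contour is sampled roughly uniformly, and `contour_features` is the hypothetical helper sketched above.

```python
# Hypothetical discrete evaluation of the integral in the exponent of Eq. (2).
def shape_energy(points, potentials):
    """potentials: five callables lambda_1..lambda_5 applied elementwise to the features."""
    feats = contour_features(points)            # kappa, kappa', r, r', r''
    ds = np.linalg.norm(np.roll(points, -1, axis=0) - points, axis=1).mean()
    return ds * sum(float(np.sum(lam(f))) for lam, f in zip(potentials, feats))
```

With such an energy in hand, shapes like those in Fig. 4 could in principle be drawn by the same Metropolis recipe used for textures, proposing small random displacements of contour points.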

Fig. 5. The symmetry axes of a kangaroo shape at different time steps; from left to right: 5 iterations, 10 iterations, and the final computed skeleton.

Given a stimulus signal, the global perception is computed by minimizing the potential function of the Gibbs distribution. The computation of the minimum potential is accomplished through the propagation and relaxation of local interactions in a stochastic manner. We illustrate this with an example of computing the medial axis, a global description of shape. Fig. 5 shows three stages of the iterations in computing the symmetry axes. The axis fragments are grouped together in Fig. 5c, where the stochastic process reaches a minimum potential and stabilizes at this solution, forming a percept of the medial axis. A detailed account of our experiments is given in [9]. We argue that this stochastic algorithm is biologically plausible in comparison with existing engineering algorithms such as coring, Voronoi diagrams, and grass-fire transformations [2,7,5].
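The kind of stochastic relaxation described above can be sketched generically as a Metropolis minimization under a falling temperature. In the sketch below, the potential, the local proposal (e.g. growing, pruning, or perturbing one axis fragment), and the cooling schedule are placeholders we have assumed, since the paper does not spell them out.

```python
# Generic sketch of stochastic relaxation towards a minimum of a Gibbs potential:
# local random proposals, Metropolis acceptance, geometric cooling (all assumptions).
import math
import random

def stochastic_relaxation(state, potential, propose_local_move,
                          n_steps=10000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    energy = potential(state)
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        candidate = propose_local_move(state, rng)     # e.g. perturb one axis fragment
        delta = potential(candidate) - energy
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            state, energy = candidate, energy + delta  # accept the local move
    return state, energy
```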

3. Discussion

Due to space limitations, we briefly discuss the possible impact of MRF theory and MCMC sampling techniques on neuroscience.
1. The above MCMC process is particularly useful in designing experiments in both psychology and neurophysiology, because it can precisely reproduce the desired statistical properties while ruling out the effects of other, unwanted features.
2. Learning and adaptation of the features, such as filters and Gestalt rules, have been studied in the same framework by calculating the non-accidental statistics of natural patterns.
3. Our theory favors population coding ideas by demonstrating that statistics pooled from local responses are sufficient for realistic patterns.

References

[1] J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. Roy. Stat. Soc. Ser. B 36 (1974) 192–236.
[2] C.A. Burbeck, S.M. Pizer, Object representation by cores: identifying and representing primitive spatial regions, Vision Res. 35 (13) (1995) 1917–1930.
[3] B. Julesz, Dialogues on Perception, 1995.
[4] K. Koffka, Principles of Gestalt Psychology, Harcourt, Brace and Company, New York, 1935.
[5] I. Kovacs, B. Julesz, Perceptual sensitivity maps within globally defined visual shapes, Nature 371 (1996) 644–646.
[6] T.S. Lee, D. Mumford, P.H. Schiller, Neuronal correlates of boundary and medial axis representation in primate striate cortex, Invest. Ophth. Vis. Sci. 36 (1995).
[7] R. Ogniewicz, Discrete Voronoi Skeleton, Hartung-Gorre, 1993.
[8] S.C. Zhu, Embedding Gestalt laws in the Markov random field, IEEE Workshop on Perceptual Organization, 1998.
[9] S.C. Zhu, Stochastic computation of medial axis on MRF, Proceedings of Computer Vision and Pattern Recognition, 1998.
[10] S.C. Zhu, Y.N. Wu, D.B. Mumford, Minimax entropy principle and its applications to texture modeling, Neural Comput. 9 (8) (1997).

Song-Chun Zhu received his Ph.D. degree in computer science from Harvard University in 1996. He was a postdoctoral researcher in the Division of Applied Mathematics at Brown University in 1996–1997 and a lecturer in the Computer Science Department at Stanford University in 1997–1998. He joined the faculty of the Computer and Information Science Department at Ohio State University in 1998.