Journal of Physiology - Paris 106 (2012) 173–182
Contents lists available at SciVerse ScienceDirect
Journal of Physiology - Paris journal homepage: www.elsevier.com/locate/jphysparis
Geometry of imaginary spaces

Jan J. Koenderink ⇑

Delft University of Technology, Man Machine Interaction Group, EEMCS, P.O. Box 5031, 2600 GA Delft, The Netherlands
Katholieke Universiteit Leuven, Laboratorium voor Experimentele Psychologie, Tiensestraat 102, Bus 3711, 3000 Leuven, Belgium
The Flemish Academic Centre for Science and the Arts, Academy Palace, Hertogsstraat 1, 1000 Brussel, Belgium
Article info

Article history: Available online 25 November 2011

Keywords: Space; Imaginary space; Pictorial space; Local sign; External local sign; Cues; Natural perspective; Depth
Abstract

“Imaginary space” is a three-dimensional visual awareness that feels different from what you experience when you open your eyes in broad daylight. Imaginary spaces are experienced, for instance, when you look “into” (as distinct from “at”) a picture. Empirical research suggests that imaginary spaces have a tight, coherent structure that is very different from that of three-dimensional Euclidean space. This has to be due to some constraints on psychogenesis, that is, the development of awareness. I focus on the topic of how, and where, the construction of such geometrical structures, which figure prominently in one’s awareness, is implemented in the brain. My overall conclusion (with notable exceptions) is that present-day science has no clue. I indicate some possibly rewarding directions of research.

© 2011 Elsevier Ltd. All rights reserved.
1. Natural perspective

“Natural perspective,” or perspectiva naturalis, is best known from Euclid’s treatise (Burton, 1945) (Greek: Optika; Latin: De aspectibus). It should be sharply distinguished from “painter’s perspective,” or perspectiva artificialis, which plays no role in this paper but became generically known as “perspective.” The latter involves “Alberti’s window,” after Alberti’s (1435) Della pittura, and deals with the representation of the visual field on planar surfaces. Unfortunately, these concepts are rarely distinguished. Here I interpret natural perspective in its original sense of “optics,” a proper subfield of physics. It deals with the potential of momentarily seeing things with a single, punctate eye,1 and is thus to be considered a form of information theory.2 It has nothing to do with the transport of radiant power, so the frequent discussions of Euclid’s use of the extramission theory are void (Koenderink, 1982).

I recapitulate the basics of natural perspective here. Consider three-dimensional Euclidean space E³, augmented with a single “vantage point” O. Any point P ∈ E³ \ {O} is seen in a unique direction, called its “visual direction” with respect to the vantage point. Consider the “optic array” (Gibson, 1950) at the vantage point, which is simply the unit sphere S², centered on O. Then the direction of P is conveniently labeled with its trace p ∈ S², where p = (P − O)/‖P − O‖ (see Fig. 1). One has P = O + ρp, where ρ is the “range”3 of P with respect to the vantage point. Points with the same trace are seen in the same direction; when their ranges differ they are still distinct. I refer to such points as mutually “parallel.”4 Berkeley (MDCCIX) famously declared that such parallel points cannot be distinguished on the basis of optics proper, because ranges are, in modern jargon, not “optically specified.”

Slightly idealized, the human condition involves primarily daylight, clear air, and opaque rigid objects. Daylight is solar radiation in a narrow visual band centered at about 560 nm, often scattered by cloud cover. Clear air optically acts much like the vacuum, a perfect medium for the propagation of radiation. Opaque objects have surfaces that scatter radiation in all directions. This leads to the following basic laws of visual optics: I. You can see any object given an unobstructed line of sight, II. You cannot see objects that are occluded by other objects, and

⇑ Address: Delft University of Technology, Man Machine Interaction Group, EEMCS, P.O. Box 5031, 2600 GA Delft, The Netherlands. Tel.: +31 152784145; fax: +31 152787141.

1 In computer vision this is known as the “pinhole camera model.” In geometrical optics it is the center of the anterior nodal point of the optical system. In human vision it is most natural to use the center of rotation of the eyeball.

2 Euclid uses the theory to account for visual acuity, by referring to a certain sparsity and thickness of rays. This can be related to the minimum étendue of about a wavelength squared in the wave theory of light.

0928-4257/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.jphysparis.2011.11.002
3 “Range” is preferred over “distance,” because in the latter case one should not omit to say “from the eye.”
4 This usage is natural in the singly isotropic plane introduced later, where one has a perfect metrical duality between points and lines. Then it is natural to have both “parallel lines” and “parallel points.”
J.J. Koenderink / Journal of Physiology - Paris 106 (2012) 173–182
Fig. 1. The visual field S² centered on the (black) vantage point, and a visual direction d on which lie two parallel points P and Q. Both points have the same trace T ∈ S².

III. If you can see it, then it can see you. This is “physical optics” in a nutshell. What remains is information theory, the formal tool being essentially geometry. The most fundamental facts of natural perspective follow from invariance properties. Consider the optical information available to the human observer. There are many ways to specify it; e.g., you might file a complete description of the environment, but this would evidently be overkill. Following Berkeley (MDCCIX) and Gibson (1950), I specify the available optical structure as the mere simultaneous order of traces in the optic array.5 Any transformation of the environment that leaves this structure invariant is undetectable, and thus irrelevant to vision. One such group of transformations is formed by the rotations about the vantage point.6 Another group is that of the homotheties (dilations or expansions) about the vantage point.7 From such considerations I deduce that only ratios of ranges are relevant, and that neither absolute ranges nor range differences matter. Thus the log-range dimension has the structure of the affine line A¹ (Bennett, 1995).

The group of rotation-dilations about the vantage point generates an important family of curves, namely the planar logarithmic spirals centered on the vantage point.8 These are shifted within themselves by the transformations. In any sufficiently limited neighborhood there is a unique arc connecting any two distinct points. Thus they may be considered pre-geodesics (Buseman, 1955) (see Fig. 2). They do not define a projective structure, for the nexus of geodesic arcs that connect each vertex of a triangle to each point of its opposite side fails to be a surface. It is a (non-convex) lens-like volume (Berger, 2007; Koenderink et al., 2010a) (see Fig. 3). A group of transformations that conserves the family of pre-geodesics as a whole involves power functions of the range (see next paragraph).

This group was empirically discovered by the German sculptor Adolf Hildebrand at the close of the nineteenth century (von Hildebrand, 1901). It can be understood via the structure of “Shape From X” algorithms studied in machine vision (Belhumeur et al., 1999).

The situation is much simpler in the tangent space of a point in the optic array.9 Let {u, v} denote Cartesian coordinates in the tangent plane of S², and w = log ρ. This space is the Cayley–Klein space (Cayley, 1859; Klein, 1872, 1893) that has two Euclidean dimensions and one isotropic dimension (Yaglom, 1968, 1979), for the aforementioned transformation group is
$$u' = h\,(u\cos\varphi - v\sin\varphi) + t_u, \tag{1}$$
$$v' = h\,(u\sin\varphi + v\cos\varphi) + t_v, \tag{2}$$
$$w' = g_u\,u + g_v\,v + k\,w + t_w, \tag{3}$$
with h, k > 0. It is an eight-parameter group, whereas the Euclidean group of similarities is only a seven-parameter group. The reason is the elliptical (periodic) angle measure of Euclidean space.10 The interesting part is obtained by specializing to h = 1, φ = 0, t_u = t_v = 0, a subgroup that affects only the ranges. The parameter t_w is a mere shift in depth, that is, a scaling of the range, and thus not interesting. The relevant parameters are g = {g_u, g_v}, which denote isotropic rotations, and k, which denotes the scaling of isotropic angles. The geometry of this space is well known through Strubecker’s (1941, 1942, 1943, 1945) work; modern texts on its differential geometry are also available (Yaglom, 1968, 1979; Sachs, 1990). Unfortunately, the best texts are in German. However, the books by Yaglom (1968, 1979) have been translated into English and make for an excellent introduction, albeit only for the planar case.

This introduction merely mentions the more important aspects of natural perspective. Unfortunately, there do not seem to exist any textbooks on this important topic. Gibson’s (1950) work is an informal attempt, but fails to arrive at the essential structure.

2. Imaginary space

When you look “into” a picture of an imaginary landscape, e.g., in a science fiction comic book, you experience spaces that do not exist in a physical sense, and thus have to be considered “imaginary” quite literally. I see no essential difference between this and looking into the photograph of an actual scene, or your spatial awareness when facing an actual scene.11 Because this is too far removed from mainstream thought, I will not press the latter point. I focus on the topic of whether such imaginary spaces can be granted geometrical structures, allow for formal descriptions, and perhaps some praxis of geodesy. Phenomenologically, these spaces are three-dimensional (or, rather, “two plus one dimensional,” see below), and contain opaque, rigid objects.
They differ from physical spaces in a number of important aspects, because their ontologies are different. Physical scenes are composed of physical objects that have physical properties. The formal description of physical spaces is in terms of theoretical accounts (geometry, electromagnetic theory, statistical mechanics of the solid state, and so forth; see Feynman, 1970) and “pointer readings” (Eddington, 1929). In objective, formal accounts there figure neither qualities nor meanings. All there is, is observed structure and formal theory. The observer is of no special importance (that would introduce subjectivity), but is yet another physical structure that just so happens to be included in the space.12

Imaginary spaces are composed of meaningful qualities in some nexus of simultaneous relations. These spaces do not contain the observer as just another object. The eye is not in the space at all. Imaginary spaces are necessarily subjective, because they are only present in personal awareness.13 This poses serious obstacles to the use of natural perspective in the description of imaginary spaces. Most importantly, as the eye is not in the space (Wittgenstein, 1992), there is no such thing as the range of a point. A quality that might perhaps relate to range is that of “depth.” Depth is a feeling of “otherness,” or “remoteness.” Phenomenologically, depths of objects can be linearly ordered, and, to some extent, depth differences can be compared. There is no notion of an absolute depth though. Thus the depth dimension has a structure not unlike that of the affine line A¹. If there were to be a relation between depth and range, it would evidently have to be of a logarithmic nature. However, there is no necessary, causal relation between depth and range. There cannot be any such relation between the mental and the physical realms; if there were, the physical would subsume the mental.

Fig. 2. These figures illustrate a planar section through the eye. The figure at left shows a pencil of pre-geodesics through a fiducial point. The eye is indicated with the small circle at bottom. At right the same configuration has been transformed (conformally) to (angle, log-range) space. The eye is not in this space; it is located at infinite distance downwards from the (transformed) fiducial point, on the central mid-line. This central mid-line is a singular geodesic that passes through the eye, a visual ray. The pre-geodesics are drawn at equal isotropic slope-angle intervals. Note that the isotropic angle measure is parabolic (thus not periodic). The central mid-line is an isotropic direction; it has infinite slope.

5 The optic array is a mere formal device that has nothing to do with the structure of the human eye. It would make absolutely no difference if the eye were cubical instead of spherical.
6 Any such rotation can be undone by a voluntary eye movement. Thus it cannot generate exterospecific information.
7 Thus Lilliput and Brobdingnag are optically indistinguishable before the introduction of Gulliver.
8 This is geometrically obvious. An algebraic proof may proceed from the metric
$$ds^2 = \frac{dx^2 + dy^2 + dz^2}{x^2 + y^2 + z^2},$$
which is invariant with respect to rotation-dilations about the origin.
9 This case is more important than might be expected, for it describes the structure of typical pictorial spaces.
10 In the simpler case of the plane the transformations become
$$u' = h\,u + t_u, \qquad w' = g\,u + k\,w + t_w,$$
whereas the analogous group of Euclidean similarities is
$$u' = h\,(u\cos\varphi - w\sin\varphi) + t_u, \qquad w' = h\,(u\sin\varphi + w\cos\varphi) + t_w.$$
In the latter case sizes are scaled by h, whereas the angles are not affected. In the former case both sizes and isotropic angles are scaled, the first by h, the latter by k. In the former case both angle and distance measure are parabolic, whereas in the latter case the distance measure is also parabolic but the angle measure is elliptic.
11 At least not in the case of the stationary, monocular observer. Binocularity and translational movement (not eye movements) make an essential difference, since they generate exterospecific information.
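The natural-perspective constructions of Section 1 can be made concrete in a few lines. The sketch below (Python with NumPy, an illustrative choice; every function name is hypothetical) computes the trace p and range ρ of a point, samples the logarithmic-spiral pre-geodesic between two points, and applies a power function of the range. It assumes, consistent with Fig. 2, that in the plane through O and the two points the pre-geodesic is a straight line in (angle, log-range) coordinates, which is what a logarithmic spiral is; antipodal traces are not handled.

```python
import numpy as np

def trace_and_range(P, O=(0.0, 0.0, 0.0)):
    """Return the trace p on the unit sphere and the range rho, with P = O + rho * p."""
    d = np.asarray(P, dtype=float) - np.asarray(O, dtype=float)
    rho = float(np.linalg.norm(d))
    if rho == 0.0:
        raise ValueError("the vantage point itself has no visual direction")
    return d / rho, rho

def pregeodesic(P, Q, O=(0.0, 0.0, 0.0), n=9):
    """Sample n points on the logarithmic-spiral arc from P to Q about O.

    The trace moves uniformly along the great circle from p to q (spherical
    linear interpolation) while log-range varies linearly, i.e. the arc is
    straight in (angle, log-range) coordinates. Antipodal traces are not handled.
    """
    O = np.asarray(O, dtype=float)
    p, rp = trace_and_range(P, O)
    q, rq = trace_and_range(Q, O)
    omega = np.arccos(np.clip(p @ q, -1.0, 1.0))  # angle between the two traces
    t = np.linspace(0.0, 1.0, n)[:, None]
    if omega < 1e-12:
        # Parallel points: the arc is a segment of their common visual ray.
        d = np.repeat(p[None, :], n, axis=0)
    else:
        d = (np.sin((1 - t) * omega) * p + np.sin(t * omega) * q) / np.sin(omega)
    rho = np.exp((1 - t) * np.log(rp) + t * np.log(rq))  # geometric interpolation of ranges
    return O + rho * d

def power_map(X, k, t_w, O=(0.0, 0.0, 0.0)):
    """Keep every visual direction; replace the range rho by exp(t_w) * rho**k (k > 0)."""
    O = np.asarray(O, dtype=float)
    D = np.asarray(X, dtype=float) - O
    rho = np.linalg.norm(D, axis=-1, keepdims=True)
    return O + np.exp(t_w) * rho**k * (D / rho)

# Two vertices of the geodesic triangle of Fig. 3 (vantage point at the origin).
P, Q = np.array([1.0, 0.0, 0.0]), np.array([0.0, 2.0, 0.0])
arc = pregeodesic(P, Q)
print(arc[4])  # approximately [1, 1, 0]: range sqrt(2), the geometric mean of 1 and 2

# A rotation-dilation about O maps pre-geodesics onto pre-geodesics.
c, s = np.cos(0.7), np.sin(0.7)
R = 2.5 * np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
rotated_ok = np.allclose(pregeodesic(R @ P, R @ Q), arc @ R.T)

# So does a power function of the range, which leaves all directions untouched.
power_ok = np.allclose(pregeodesic(power_map(P, 1.7, 0.3), power_map(Q, 1.7, 0.3)),
                       power_map(arc, 1.7, 0.3))
```

Note that the halfway point (1, 1, 0) lies outside the straight chord from P to Q, so the pre-geodesic is genuinely curved in Euclidean terms even though it is "straight" for the rotation-dilation group.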
3. Psychogenesis of imaginary space

The phenomenology of vision is that one is aware of an endless sequence of “presentations.” Presentations just happen; there is nothing you can do about them,14 except closing or opening your eyes, looking in various directions, and so forth. Such voluntary actions are only a minor part of the vision-related actions, most of which occur involuntarily. Presentations are structured, composed of qualities and meanings (Metzger, 1975). They are pre-reflective and proto-rational (Riedl, 1984). In cognition qualities have been stripped off, and meanings formalized. You do not think presentations, they just happen. Ignoring cognition, I concentrate on immediate, optics-related awareness.15 I focus mainly on simultaneous order, that is, spatial qualities. This is the problem of “psychogenesis,” the genesis of the mental. Unfortunately, the term “psychogenesis” is usually applied in a different sense,16 the essential difference being time scale. I adopt the perfectly descriptive term “psychogenesis” nevertheless. The relevant time scale is that of the formation of a presentation, about a tenth of a second.

No doubt the brain is involved in the generation of presentational awareness, but there the consensus (in no way complete!) stops. In the mainstream account (Marr, 1982; Palmer, 1999), what happens when you stand in front of a scene and open your eyes is roughly the following. The causal chain is cut at the level of the absorption of radiant power in the retinal photoreceptors, the layout of the receptor array being “given.” Thus one starts from samples of a two-dimensional scalar field, the retinal “image.” Then follows a sequence of image operations, yielding transformed images galore. Finally, there is a magic step: the set of derived images turns into a “representation of the scene in front of you.” “Magic,” because image transformations convert structures into structures. Algorithms cannot convert mere structure into quality and meaning, except by magic. In computer science this magic is implemented through “formats” (Knuth, 1997). The same sequence of keyboard presses may be interpreted as a password, a number, a word in the English language, some code, an assembler command, gibberish, . . . depending on the format applied by the currently active algorithm. Input structure is not intrinsically meaningful; meaning needs to be imposed (magically) by some arbitrary format.

12 Thus verbal reports are to be considered nothing but the meaningless movement of air molecules. Behaviorist psychology was consistent in this respect, though it is “non-invasive physiology,” rather than psychology proper.
13 Verbal reports are potentially meaningful; first-person reports are the only way to come to know about other people’s awareness, otherwise there is only physiology.
14 Much like sneezing.
15 Thus the Gestalt school of psychology was interested in perception (immediate awareness) per se, whereas cognitive psychology deals only with the thought processes that follow presentations. This is a crucial ontological difference.
16 A common definition of “psychogenesis” is: 1. The origin and development of psychological processes, personality, or behavior. 2. Development of a physical disorder or illness resulting from psychic, rather than physiological, factors. This is evidently not my intended meaning. The interested reader should look up “microgenesis” instead (e.g., with a Google search). In short, microgenesis is the pre-conscious process that purportedly presents you with immediate visual awareness. Its study was initiated by the psychologists of the early Gestalt schools. A modern account is given by Brown (1996).
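The point about formats can be illustrated with Python’s standard `struct` module: one fixed sequence of four bytes (the “structure”) yields unrelated values depending on which format the reading code imposes. The byte values are arbitrary; nothing in the bytes themselves selects one reading over another.

```python
import struct

raw = bytes([0x41, 0x42, 0x43, 0x44])  # one fixed input structure

# The same four bytes read under four different, equally arbitrary formats.
readings = {
    "ASCII text": raw.decode("ascii"),
    "unsigned int, big-endian": struct.unpack(">I", raw)[0],
    "unsigned int, little-endian": struct.unpack("<I", raw)[0],
    "float32, big-endian": struct.unpack(">f", raw)[0],
}

for fmt, value in readings.items():
    print(f"{fmt:28} -> {value!r}")
```

The four readings are "ABCD", 1094861636, 1145258561 and a float of roughly 12.14; which one is "the" meaning is settled only by the format, not by the input.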
Fig. 3. A geodesic triangle defined by the points {1, 0, 0}, {0, 2, 0}, and {0, 0, 4} (vantage point at {0, 0, 0}), with some geodesic arcs that connect a vertex to its opposite side. Notice that the nexus of geodesic arcs fails to “mesh,” although only by a small margin. The triangle is not a patch of surface, but is “thick,” volumetric.

The same input structure may thus give rise to multifarious, perhaps mutually incompatible, meanings. This immediately applies to psychogeny, the process that makes you “see the scene in front of you” when you open your eyes in broad daylight. This is Berkeley’s (MDCCIX) original argument. It seems impossible to refute. The mainstream account bridges the ontological gap via spooky deus ex machina mechanisms.

3.1. Probing

The main reason one is forced to refer to such spooky mechanisms is that visual perception is understood as “inverse optics” (Poggio, 1984). Because optical structure is just structure, that is, meaningless, one is forced to postulate mysterious mechanisms for intentions, qualities and meanings. The only way out of this dilemma is to deny the existence of the latter, which is to deny the existence of the mental realm (Dennett, 1992). Preferred by some (Dennett, 1992), this is perhaps less desirable to most of us. Alternatives to the mainstream account have to invert the chain of events, that is to say, replace inverse optics with “controlled hallucination.” Such accounts have been proposed by Bergson (1907) and Schrödinger (1992), among others. A modern account is by Brown (1996). The mainstream tends to dismiss these as “unscientific.” Yet it is actually the mainstream account itself that is incoherent, and has to rely on magic.

Organisms have a natural urge to grow, and expand their realm. Such tendencies are observed from the simplest organisms to man. Organisms poke their environment, partly randomly, partly intentionally. When poking meets resistance it becomes probing. When probing meets resistance, it is informative. Organisms learn about their world through informative probings. In Schrödinger’s (1992) view the world lights up to organisms when resistance to a probing is met, as a spark of enlightenment, a germ of awareness.17 In formal terms, probing may be understood as “questioning.” Probing is intentional, probing for something. The intention, that is, the question, presupposes possible answers. Thus the meaning is in the question, not in the answer. In that sense questions are like formats. The difference is that questions are intentional (Brentano, 1874) to start with. Formats proper are merely reactive; questions (probings) are intentional, world directed, active, and therefore meaningful by their very nature. This account is seamlessly in line with biological thought (the ethology of Lorenz (1973), Tinbergen (1951), Riedl (1984), etc.), unlike the mainstream account of vision, which is in many respects unduly anthropocentric.18

3.2. The Sherlock model
The mechanism of optics-related awareness (seeing) is perfectly illustrated by the time-honored methods of forensic investigation, which may be labeled with the name of that prototypical detective, Sherlock Holmes (see Conan Doyle, 1887). As the investigator is confronted with the scene of the crime, what is he to do? Compare the dumb village policeman to the superior detective. The clueless village policeman will proceed to collect and photograph anything even mildly out of the ordinary. This results in a file of mutually unrelated facts. The size of this file is potentially limitless, for the world is infinitely structured. There is no telling which fact, perhaps even on the molecular scale (think of DNA traces), might eventually prove to be important. Facts are not “evidence,” they are simply facts. Facts yield no account of what happened at the time of the crime. The record of a headless corpse is only suggestive. Speculation goes beyond the facts. This “bottom up” modus operandi of the village policeman is analogous to the “inverse optics” account of visual perception.

Sherlock Holmes’ method is different. He conceives of likely “plots,” and on the basis of these hunts for evidence. In doing this, he ignores the bulk of facts. Any mere fact may become evidence in the context of some plot, whereas it will be mere structure that may be ignored in the context of other plots. Different facts (say a discarded cigarette butt and a broken flower pot) become meaningfully related in the context of some plots, but are totally unrelated in other contexts. Some plots “work” (perhaps to different extents), others do not. Sherlock Holmes keeps generating plots until one fits a variety of otherwise mutually unrelated facts so well that the odds are overwhelmingly in its favor. Since the probabilities of unrelated rare facts combine multiplicatively, this process is almost bound to yield virtual certainty (Pearl, 2000).
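The claim that unrelated facts combine multiplicatively can be made explicit with Bayes’ rule in odds form, a standard formalization chosen here purely for illustration. Assuming the facts are conditionally independent given each rival plot, the posterior odds of a plot equal its prior odds times the product of the facts’ likelihood ratios. The numbers below are hypothetical: a plot that starts at odds of 1 to 1000, supported by five facts that are each ten times more probable under it than under its rivals.

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratios):
    """Bayes' rule in odds form, assuming conditionally independent facts.

    Each likelihood ratio is P(fact | plot) / P(fact | rival plots).
    """
    odds = Fraction(prior_odds)
    for lr in likelihood_ratios:
        odds *= Fraction(lr)  # independent facts: ratios multiply
    return odds

odds = posterior_odds(Fraction(1, 1000), [10, 10, 10, 10, 10])
probability = odds / (1 + odds)
print(odds, probability)  # 100 100/101
```

A plot that began as a long shot ends up about 99% probable after five modest, mutually unrelated confirmations; exact fractions are used only to keep the arithmetic free of rounding.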
Generating plots is not that hard either, at least when the investigator understands the environment he is working in. It is like playing the game of twenty questions with nature; his chances to win are substantial. Some very successful optimization algorithms work like this, starting from mere random guesses, for instance “harmony search” (Geem et al., 2001). The analogy to vision is immediate. Facts in vision are optical structures. They are overwhelmingly abundant, but meaningless. Evidence is fact considered in the context of a plot. In vision such evidence is known as a “cue.” Facts in themselves are not cues.

17 That the only learning is by mistakes will readily be accepted. The “magic” in Schrödinger’s account is in the (micro-) enlightenment. It cannot be accounted for by the exact sciences. It is very intuitive in a phenomenological sense though. You vividly experience the results of your mistakes, whereas much sensorimotor behavior (e.g., walking under optical control) goes by unnoticed.
18 For instance, the mainstream account stresses the veridicality of perception. But evolution drives fitness, not veridicality. It develops idiosyncratic user interfaces, not representations of basic physics. Your vision is different from that of your cat or dog.
The observer selects structure, and promotes it to cue status. A cue is like the answer to a question (or probing), and therefore meaningful. It may prove misleading though. Wrong question, wrong answer! In the psychological literature plots are known as “situational awareness.” The general knowledge of the investigator is known as “background” (Searle, 1983), “frames” (Minsky, 1974), “mental models” (Lakoff, 1987; Johnson-Laird, 1983), and so forth. It is crucial. Since plots are freely invented, they are technically speaking hallucinations.19 Holmes’ method of generating plots, discarding them when they do not fit the evidence, and preferring one over another if the odds are in its favor, is much like the process of biological evolution. Only the fittest plots surface into awareness. Thus presentations (momentary visual awarenesses) are generated by endless diversification and merciless pruning. This renders “controlled hallucination” a powerful information-generating mechanism (Dawkins, 1986). It is the only such mechanism known, or even imaginable. It is not essentially different from “the scientific method.” Notice that the “Sherlock model” does not need a special “attention mechanism” (Treisman, 1969), for the very method is attention at work. Nor does it need some special mechanism to “solve the binding problem” (Revonsuo and Newman, 1999). There is no such thing as a binding problem. Any disjunct structures that figure as evidence in some plot are thereby automatically “bound.”
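The “endless diversification and merciless pruning” of plots can be mimicked by a (μ + λ) evolution strategy, used here only as an illustrative stand-in and not as a model the text proposes. In this toy version a “plot” is reduced to a single hypothetical number, the “facts” are a few invented observations, and a plot’s fitness is how well it accounts for them. Random variants are generated and only the best-fitting few survive each round; the facts themselves are never inverted.

```python
import random

def evolve_plots(facts, generations=60, survivors=5, offspring_each=4,
                 spread=0.5, seed=1):
    """Toy (mu + lambda) selection: diversify plots at random, keep the fittest."""
    rng = random.Random(seed)

    def misfit(plot):
        # How badly a plot accounts for the facts (smaller is better).
        return sum((fact - plot) ** 2 for fact in facts)

    # Initial plots are mere random guesses.
    population = [rng.uniform(-10.0, 10.0) for _ in range(survivors * offspring_each)]
    elite = sorted(population, key=misfit)[:survivors]
    for _ in range(generations):
        # Diversification: every surviving plot spawns random variants.
        variants = [p + rng.gauss(0.0, spread) for p in elite for _ in range(offspring_each)]
        # Pruning: parents and variants compete; only the best few survive.
        elite = sorted(elite + variants, key=misfit)[:survivors]
    return elite[0]

facts = [3.6, 3.9, 3.5, 3.8]  # hypothetical observations
best = evolve_plots(facts)
print(round(best, 2))  # close to 3.7, the plot that accounts for all four facts best
```

Because parents compete with their variants, the best plot found so far is never lost, so the fit can only improve from one round to the next.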
3.3. Neural mechanisms

In the mainstream account one often refers to the optical structure as “data,” or “information.” This is thoroughly misleading, because these terms are to be understood in the Shannon (1948) sense of utterly meaningless information. As brain structures transform the optical structure into a variety of structured neural activities, the mainstream often uses semantic terms to describe them. This confuses facts with evidence. In the case of an “edge detector” (Canny, 1986) the very name suggests that the edge exists before being detected. This is nonsensical; the so-called edge detector is really nothing but a “first order directional derivative operator” (Koenderink and van Doorn, 1992). The latter term is to be preferred because it describes the transformation of structure into structure, whereas the former suggests some spooky operation.

When vision is understood as inverse optics, it is natural to think of the early transformations as taking the brunt of the effort. The primary visual cortex is supposed to play a key role in constructing a “representation” of reality. From a biological perspective this is highly unlikely. Throughout the evolution of brains layer after layer was added, the earlier ones always remaining (Striedter, 2005). The later structures are built on top of the earlier ones and originally served to refine already existing processes. This is the way evolution proceeds (Dawkins, 1986). One should look for the origin of presentations in the early structures, rather than the recent ones.

An alternative way to understand the role of primary visual cortex, and related structures, focuses on vision as “optics-related awareness.” There is also vision as “optically guided behavior,” which is for the larger part irrelevant to momentary awareness, since it applies equally to zombies. The latter part is important to survival, and largely co-determined the evolution of the brain.
I ignore it here, since it is unrelated to awareness; for example, you are not aware of how you walk, you merely do it.

19 “Hallucination” has a bad ring to it. But notice that even a scientific theory is nothing but a hallucination until there is sufficient empirical evidence in its favor. How do you know you are not hallucinating? By noticing the coherence or mismatch of your (visual) presentations with those from other modalities (hearing, touch, etc.), with your situational awareness, and with the observed behavior of others. For any of these it is easy to point out spectacular failures, of course.
To become aware of something is due to the promotion of certain optical structures to the status of evidence. It is not the thing itself, but some “sign” of it. Usually many layers of indirection can be distinguished. For example, consider how you may become visually aware of “human presence,” as may happen in the context of a hide-and-seek game. Seeing a person works, as does spotting part of a person (e.g., a foot sticking out from behind an occluder), as does finding a footprint. The “footprint” is really a depression in the sand that might have resulted from a dust devil or a lightning strike, but is taken by your vision to be an impression of a human foot. Of course “seeing a person” also involves the presence of a certain optical structure at the eye, a certain cortical activity, and so forth. Thus “seeing human presence” necessarily involves a long chain of indirection. The footprint, the field of radiant power, and the cortical activity are on the same ontological level, being all physical structures. The cortex is no more a “footprint detector” than the sand of the beach is. The cortical activity per se is just as meaningless as the depression in the sand. In this sense you use your cortex much as you use your muscles. Wet sand is much better than dry sand if you are interested in footprints. Likewise, the cortex has developed into a highly functional substrate for the representation of optical structure. It may be understood as a volatile buffer of readily available, conveniently packaged facts. In order for this to be possible it is structured as a “geometry engine” (Koenderink, 1990; Petitot, 2008), implementing differential geometric operators that allow invariant, frugal description of optical facts. Analogous reasoning applies to various later, increasingly dedicated cortices.
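The “edge detector” of Section 3.3 can be written down as exactly what it is, a first-order directional derivative of the image at a chosen scale, one of the differential geometric operators a “geometry engine” would implement. The NumPy sketch below assumes a Gaussian blur at scale σ (a common choice, not one the text prescribes; all function names are hypothetical) and uses the linearity fact that the derivative in direction θ equals cos θ times the x-derivative plus sin θ times the y-derivative. Applied to a vertical step, the operator returns a number that is large across the step and zero along it; calling that number an “edge” is an interpretation added afterwards.

```python
import numpy as np

def gaussian_kernels(sigma):
    """Sampled, normalized 1-D Gaussian g and its first derivative dg."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    dg = -x / sigma**2 * g
    return g, dg

def separable_filter(img, kx, ky):
    """Convolve rows (x direction) with kx, then columns (y direction) with ky."""
    tmp = np.apply_along_axis(lambda row: np.convolve(row, kx, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, ky, mode="same"), 0, tmp)

def directional_derivative(img, theta, sigma=2.0):
    """First-order Gaussian derivative of img in direction theta (radians)."""
    g, dg = gaussian_kernels(sigma)
    d_x = separable_filter(img, dg, g)  # derivative along x, blur along y
    d_y = separable_filter(img, g, dg)  # blur along x, derivative along y
    return np.cos(theta) * d_x + np.sin(theta) * d_y

img = np.zeros((32, 32))
img[:, 16:] = 1.0  # a vertical step: dark left half, bright right half
across = directional_derivative(img, 0.0)[16, 16]
along = directional_derivative(img, np.pi / 2)[16, 16]
print(across, along)  # clearly positive across the step; essentially zero along it
```

The output is just another array of numbers, structure transformed into structure, which is the point the renaming is meant to make.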
The plots must derive from the earliest structures of the brain, and go through processes of diversification and pruning as they branch out towards the newer structures (Brown, 1996). These hallucinations must pass through various dreamlike phases as they evolve, their initial qualities being emotional. The generation involves the immediately preceding states, as well as much earlier states (often denoted memories, background, etc.). Presentational awareness involves the single surviving plot from an evolution that involved numerous alternatives. As it occurs it immediately makes way for the next presentation, which is likely to be very similar, but may occasionally differ greatly. A small number of presentations (at most a few seconds’ worth) make up a “specious moment.” As presentations enter cognition they lose their immediate, vivid qualities and meaning. Thoughts are different from presentations in that they lack “mental paint.” You cannot know presentations, they just happen. Thus presentations are the final stage of an evolution, where further development is no longer possible (perhaps awaiting the overgrowth of more intricate brain structures). They are like the outer, rigidified crust of an ever-changing, tremendously flexible process.

The entities of awareness are the objects of your environment. Perversely, the mainstream considers them to be the “causes” of your perceptions. This is mistaken, because the objects of your perception (e.g., a “chair”) have no equivalents in physics (some odd, ill-defined collection of elementary particles perhaps?). Here the behaviorists (Skinner, 1938) were more consistent than present-day cognitive scientists, their immediate successors and heirs, in holding that a verbal utterance is nothing but the movement of air molecules. Although perhaps not entirely wrong, it certainly is not right. If anything, it is inhuman.
4. Iconogenesis
On opening your eyes in broad daylight, presentations of the scene in front of you happen to you, at least if you are not blind or daydreaming. You experience objects, relations, causal histories and futures. You experience in terms of ‘‘mental paint’’ (qualities) and meaning (intentionality). Here I focus on spatial qualities only. I refer to this process of constructing the visual field as iconogenesis,20 a subprocess of psychogenesis.
Fig. 4. The start of a Čech cohomology. At left two receptive fields overlap, leading to correlation. Inversely, correlation indicates overlap. At right the notion of inclusion: B is included in A iff for any C that overlaps with B, it is the case that C also overlaps with A. In a similar way one defines simplices and their boundaries. This enables one to boot up a Čech cohomology on the basis of a correlation structure.
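The notion of inclusion in the Fig. 4 caption uses nothing but the overlap relation, so it can be computed from a correlation structure alone. The Python sketch below illustrates this under explicit assumptions: receptive fields are hypothetical sets of pixel indices, and the cosine similarity of their indicator vectors stands in for the correlation of nerve activities. The helper names `correlation`, `overlap_relation`, and `inclusion_relation` are illustrative only and are not taken from Koenderink (1984).

```python
import math
from itertools import product


def correlation(a, b):
    """Stand-in for activity correlation (an assumption for illustration):
    cosine similarity of the indicator vectors of two pixel sets."""
    return len(a & b) / math.sqrt(len(a) * len(b))


def overlap_relation(fields, threshold=0.0):
    """Pairs (i, j) whose 'correlation' exceeds the threshold count as overlapping."""
    n = len(fields)
    return {(i, j) for i, j in product(range(n), repeat=2)
            if correlation(fields[i], fields[j]) > threshold}


def inclusion_relation(overlaps, n):
    """(b, a) means 'B is included in A': every C that overlaps B also overlaps A.
    Only the overlap relation is consulted, never the pixel sets themselves."""
    return {(b, a) for b, a in product(range(n), repeat=2)
            if all((c, a) in overlaps for c in range(n) if (c, b) in overlaps)}


if __name__ == "__main__":
    # Hypothetical receptive fields on a one-dimensional strip of pixels.
    fields = [set(range(0, 10)), set(range(2, 5)), set(range(3, 13)),
              set(range(11, 16)), set(range(8, 10)), set(range(0, 2))]
    included = inclusion_relation(overlap_relation(fields), len(fields))
    print(sorted(p for p in included if p[0] != p[1]))
```

With a finite family of fields the derived inclusion is only as fine as the family allows: in this example field 1 comes out as ‘‘included’’ in field 2 although pixel 2 lies outside field 2, simply because no field in the family separates them.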
Fig. 5. The fiber bundle $\mathbb{E}^1 \times D$ (in reality the base space is $\mathbb{E}^2$). All visual rays, like $RR'$, are mutually independent. The psychogenetic process assigns depths as if by sliding beads on strings, like the white bead on $RR'$. The resulting cross section is not necessarily smooth. If it is (as here) one speaks of a ‘‘pictorial relief.’’
4.1. The geometry of the ‘‘visual field’’
The ‘‘visual field’’ is the simultaneous order of presentations when the depth quality is ignored. You become aware of it when looking at, instead of into, a painting, attending to the simultaneous order of pigments. For most human observers the topology of the visual field is well developed. Ill-developed topology is known as ‘‘tarachopia,’’ a scrambled visual field (Hess, 1982). One speaks of ‘‘local sign,’’ an address associated with each optic nerve fiber (Lotze, 1852). An idea by von Helmholtz (1977) provides a possible neural implementation: correlation of nerve activities signals spatial overlap of receptive fields. As I showed (Koenderink, 1984), this can be developed into a mechanism that generates a Čech cohomology (see Fig. 4). A metric is probably calibrated via eye movements;21 the relevant observations and theory are due to von Helmholtz (1856, 1867). In this paper I simply treat the visual field as the Euclidean plane $\mathbb{E}^2$, ignoring the various (important) conceptual problems with this notion.
4.2. The geometry of imaginary spaces
You are aware of entities at various degrees of remoteness.22 This quality of remoteness is called ‘‘depth.’’ When a depth label is assigned to each point of the visual field, ‘‘visual space’’ becomes a fiber bundle $\mathbb{E}^2 \times D$, where $D$ denotes the depth domain. Each point in the visual field has its own copy of the depth domain. In establishing a spatial configuration the psychogenetic process shifts depth values along the depth fibers, much like one shifts beads along the wires of an abacus (see Fig. 5). The structure of visual space results from this ‘‘Glasperlenspiel.’’ The resulting configuration is consistent with the depth cues identified by the iconogenetic process. The fibers of visual space are ‘‘visual rays’’; these are loci in visual space composed of mutually parallel points, that is, points that share a single location in the visual field.
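The fiber-bundle picture can be made concrete with a small sketch. It assumes the standard distance of a simply isotropic space (the geometry of Strubecker and Sachs cited later in this paper): non-parallel points are separated by the Euclidean distance of their visual-field positions, while parallel points on one visual ray are separated by their depth span. The hypothetical helper `gauge` slides every bead along its own fiber by a map of the form $w' = g_u u + g_v v + k w + t_w$, which leaves visual-field distances untouched and scales depth spans by $k$. This is an illustration of the geometry only, not a model of the psychogenetic process.

```python
import math


def isotropic_distance(p, q):
    """Distance between points (u, v, w) of a simply isotropic space whose
    isotropic (depth) direction is w. Non-parallel points: Euclidean distance
    of their visual-field positions. Parallel points (same (u, v), i.e. on one
    visual ray): the absolute depth span."""
    (u1, v1, w1), (u2, v2, w2) = p, q
    base = math.hypot(u2 - u1, v2 - v1)
    return base if base > 0 else abs(w2 - w1)


def gauge(p, g_u, g_v, k, t):
    """Shift a point along its own fiber: w' = g_u*u + g_v*v + k*w + t.
    The visual-field position (u, v) is left unchanged (hypothetical helper)."""
    u, v, w = p
    return (u, v, g_u * u + g_v * v + k * w + t)


if __name__ == "__main__":
    a, b, c = (0.0, 0.0, 1.0), (3.0, 4.0, 2.0), (0.0, 0.0, 5.0)
    print(isotropic_distance(a, b), isotropic_distance(a, c))  # 5.0 4.0
    ga, gb, gc = (gauge(p, 0.4, -0.1, 2.0, 0.3) for p in (a, b, c))
    # Base distance unchanged; the span between parallel points doubles (k = 2).
    print(isotropic_distance(ga, gb), isotropic_distance(ga, gc))
```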
Visual rays should not be confused with the rays of geometrical optics, or with the propagation of radiant power. They are closer to the notion of ‘‘rays’’ as used by Euclid (see Burton, 1945) and the other ancient authors. The rays of geometrical optics fan out from the anterior nodal point of the eye,23 which, for relatively distant objects, is not that different from the bundle of concurrent rays at the center of rotation of the eyeball. Whether the human observer has established a correlation between the visual rays and the rays of geometrical optics is something on which the literature is silent. We have recently investigated this problem (Koenderink et al., 2009, 2010b), which may be referred to as the issue of ‘‘external local sign,’’ so as to distinguish it from Lotze’s local sign (Lotze, 1852; von Helmholtz, 1977; Koenderink, 1984). We find quite a bit of inter-observer variability. Most people treat their visual rays roughly as if they were a bundle of parallel geometrical optics rays (Fig. 6), as is evident from the perhaps surprisingly huge errors (tens of degrees of visual angle) they make in the judgment of angular relations.
Depth is a serial order, as can be shown as follows (van Doorn et al., 2011). When I indicate two locations in a painting, of a realistic landscape say, you can generally tell which location is ‘‘closer.’’ Doing this for all pairs taken out of a set of $N$ points, I collect $P = \tfrac{1}{2}N(N-1)$ pairwise rankings. A linear depth order of $N$ items has only $Q = N - 1$ degrees of freedom, so it is a priori unlikely that it will account for the data: the observations have far too many degrees of freedom. For $N = 50$, a realistic number, one may account only for $Q = 49$ of the $P = 1225$ degrees of freedom. It is thus non-trivial that, empirically, such a set of observations can always be accounted for in terms of a linear order within the variance found in repeated sessions (see Fig. 7). Since $P \gg Q$, this is a strong indication of the existence of a coherent one-dimensional realm. In the illustrated result the rankings were almost perfectly accounted for. Human observers easily resolved dozens of depth layers in a painting.
20 ‘‘Iconogenesis’’ is an apt term for the subprocess of psychogenesis that is aimed at the genesis of visual awareness.
21 The crucial observation made by Helmholtz is that ‘‘Listing’s Law’’ of eye movement constrains the group of rotations of the eyeball to an abelian group. A study of the orbits of the group then leads to a useful geometrical structure of the visual field.
22 Remote ‘‘from the ego’’ if you want. The eye as a physical object is not involved.
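The counting argument can be checked with a few lines of Python. The sketch below computes $P$ and $Q$, fits a linear order to pairwise ‘‘which is closer?’’ judgments by a simple Copeland-style win count, and then counts the judgments that this order contradicts. The win-count heuristic, the function names, and the depths in the demo are assumptions for illustration; this section does not specify the fitting procedure used by van Doorn et al. (2011).

```python
from itertools import combinations


def degrees_of_freedom(n):
    """Return (P, Q): number of pairwise judgments versus the N - 1
    degrees of freedom of a linear order of N items."""
    return n * (n - 1) // 2, n - 1


def order_from_pairs(n, closer):
    """closer[(i, j)], for i < j, is True when i was judged closer than j.

    Items are ranked by how many comparisons they 'won' (Copeland-style
    score); the function also counts judgments the order contradicts."""
    wins = [0] * n
    for (i, j), i_closer in closer.items():
        wins[i if i_closer else j] += 1
    order = sorted(range(n), key=lambda k: -wins[k])  # nearest item first
    rank = {item: r for r, item in enumerate(order)}
    violations = sum(1 for (i, j), i_closer in closer.items()
                     if (rank[i] < rank[j]) != i_closer)
    return order, violations


if __name__ == "__main__":
    print(degrees_of_freedom(50))  # (1225, 49), the numbers quoted in the text
    # Hypothetical depths for six marked locations; judgments derived from them.
    depths = [3.0, 1.0, 4.0, 1.5, 5.0, 2.0]
    closer = {(i, j): depths[i] < depths[j] for i, j in combinations(range(6), 2)}
    print(order_from_pairs(6, closer))  # ([1, 3, 5, 0, 2, 4], 0)
```

With real data, the number of violations would be compared against the inconsistency found between repeated sessions.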
Such methods work because a mark put on the picture plane will travel into depth in your visual awareness until it attaches to the closest surface of an object. This property is crucial in the psychophysics, though its neural basis is fully in the dark. The reader may easily try this on a portrait by using a marker to put a freckle or beauty spot ‘‘on the cheek.’’ One sees the principle put into practice on many poster boards where politicians or supermodels have acquired moustaches or black teeth.
It is not hard to introduce a metric either (van Doorn et al., 2011). Instead of merely indicating two locations I put a circular disc at each location, and grant the observer control over their relative sizes (see Fig. 8). I instruct observers to set these relative sizes such that the discs look like two ‘‘equally large’’ pictorial objects.
23 It is natural to let them ‘‘fan out’’ instead of ‘‘fan in,’’ because they are related to probing. This in no way implies an extramission theory of radiant power.
Fig. 6. At left geometrical rays fanning out from the eye. They roughly fill a half-space. At right the visual rays as the mind knows them. There is no eye in this picture and the rays do not fan out. Many observers have a bit of fanning out, about a ninety degree cone being typical.
Fig. 7. At the top left a frontal view of pictorial space, its base space. The base space is simply the picture plane. The picture is a copy after a wash drawing by Francesco Guardi, due to Anne-Sophie Bonno (http://www.atelier-bonno.fr/). The colored dots are points whose depths were psychophysically compared, one pair at a time, in random order. This yields the ranking order shown in the 3D plot at top right, and in the plan and elevation at bottom. The lines are the depth fibers of pictorial space.
Fig. 8. In the left column a pointer and target, in the right column the relative size cue. In the top row the left side is closer, in the bottom row the right side. These probes would be superimposed over a picture in an actual experiment.
Fig. 9. At left a frontal view of pictorial space, which is simply the picture plane. The colored dots are points whose depths were psychophysically compared by way of the size cue probe illustrated in Fig. 8 right. The points labeled with square marks are in the far field. This yields a metrical order shown in the 3D plot at right.
Fig. 10. At left a frontal view of pictorial space, which is simply the picture plane. The colored dots are points whose depths were psychophysically compared by means of two-way pointing, using the probes illustrated in Fig. 8 left. Pointing yields a metrical order shown in the 3D plot at right.
I consider the logarithm of the size ratio as a measure of the depth difference, using the heuristic that depth differences should combine linearly. Given the depth differences between the $P = \tfrac{1}{2}N(N-1)$ pairs taken out of $N$ locations, I find the best fitting $N$ depth values that explain the differences. Since absolute depth is undefined I arbitrarily set the average to zero. Perhaps surprisingly, this works really well, within the spread from repeated sessions. Since $Q = N - 1 \ll P$, this is again a strong indication of a one-dimensional realm, this time with a metric (see Fig. 9). The depths for different observers are related as $w' = g_u u + g_v v + k w + t_w$, with apparently idiosyncratic $g_u$, $g_v$, and $k$. Notice that $t_w$ is determined by the constraint put on the average. Thus the depth domain $D$ apparently has the structure of the affine line $\mathbb{A}^1$.
How are these depths related to the coordinates of the picture plane? One way to study this is as follows (van Doorn et al., 2011). I superimpose the pictures of a pointer and of a target on the picture plane. The picture of the pointer can be put in various (pictorial) spatial attitudes, and is put under the observer’s control. The task is to let the pointer apparently point to the target in pictorial space (see Fig. 8). Since the pointing can be done either way one obtains $N(N-1)$ directions. The two-way directions, combined with the distance in the picture plane, define a unique parabolic arc, and thus another depth difference, a total of $P = \tfrac{1}{2}N(N-1)$ of them. Once again I obtain a metrical depth order (see Fig. 10). The procedure turns out to be consistent too. Moreover, the three methods mutually agree up to transformations of the type $w' = g_u u + g_v v + k w + t_w$.
An example of a more general ‘‘gauge transformation’’ is shown in Fig. 11. Observers had to adjust a gauge figure superimposed on the picture so as to sample the spatial attitude of the tangent planes to a pictorial relief at 422 barycentra of the faces of a regular hexagonal triangulation. From this one finds the depths at the vertices that best explain these spatial attitude observations. The triangulation covers the larger part of the torso of the frontmost figure in the drawing. The gauge figure was a small wireframe ellipse that had to be adjusted so as to appear as ‘‘a circle painted upon the surface.’’ Notice that the pictorial relief covers a white area in the picture, so it has to be due to contour information. In this case the stimulus is so ambiguous that the observers come up with somewhat different presentations. For more realistic drawings, or photographs, we find that the simple affine transformation invariably succeeds, implying that the pictorial space is homogeneous. In the present case an overall affine transformation does not suffice though. One needs a transformation that varies slightly from place to place in the picture plane. Locally, this transformation is again of the type $w'(u, v, w) = g_u u + g_v v + k w + t_w$, but the parameters $\{g_u(u, v), g_v(u, v), k(u, v)\}$ (the depth shift is irrelevant) change smoothly from place to place. With such a transformation the coefficient of determination for the regression of depths from two observers at corresponding points increases from 0.49 (straight regression) to 0.97 (regression with the best transformation). As the ‘‘best’’ transformation we applied the optimal quartics (those leading to the highest coefficient of determination) for the parameters $\{g_u(u, v), g_v(u, v), k(u, v)\}$. In the figure we present the surfaces $w'(u, v, 0)$ and $w'(u, v, 1)$. These can be interpreted geometrically as frontoparallel planes at depths $w = 0$ and $w = 1$ for one observer, mapped into the pictorial space of another observer. Observations such as these suggest that it may be of interest to consider more general ‘‘gauge transformations’’ than the simple overall affinities. Apparently the idea of a consistent ‘‘imaginary space’’ is viable in the sense that it yields an economical description of a large body of otherwise unrelated data. Formally, it is consistent with a Cayley–Klein space (Cayley, 1859; Klein, 1893, 1872; Yaglom, 1979) of the type described formally by Strubecker (1941, 1942, 1943, 1945) and Sachs (1990).
Fig. 11. At top left a drawing by Picasso that appears to combine multiple viewpoints. We measured the spatial attitudes of tangent planes of the pictorial relief of the body of the frontmost person at 422 points. The scatterplot at bottom left compares the depths at corresponding points for two different observers. The coefficient of determination is only 0.49. A nonlinear gauge transformation (illustrated at top right) raises the coefficient of determination to a much higher level, namely 0.97 (scatterplot at bottom right). Apparently the observers use different ‘‘mental viewpoints,’’ and, moreover, assume different mental viewpoints for different parts of the picture. The surfaces at top right are the loci of the zero and unit points on the depth fibers of one observer that correspond to the canonical loci of the other observer.
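The metrical procedures of this section can be sketched in Python under several stated assumptions. The hypothetical helper `size_depth_difference` turns a relative-size setting into a depth difference via the log size ratio, with an assumed sign convention. `pointing_depth_difference` turns a two-way pointing setting into a depth difference, assuming each pointer reports its slope as depth gained per unit picture-plane distance; the unique parabola with end slopes `slope_i` and `slope_j` over distance $d$ then gives $\Delta w = d\,(s_i - s_j)/2$. `fit_depths` finds the least-squares depths with zero mean; for a complete set of pairs this reduces to averaging each location’s differences. `fit_gauge` regresses one observer’s depths on another’s via $w' = g_u u + g_v v + k w + t_w$ and reports $R^2$. This is a sketch, not the analysis code of van Doorn et al. (2011).

```python
import math
from itertools import combinations


def size_depth_difference(size_i, size_j):
    """w_j - w_i from a relative-size setting. Assumed convention: the deeper
    disc must be drawn smaller to look 'equally large'."""
    return math.log(size_i / size_j)


def pointing_depth_difference(d, slope_i, slope_j):
    """w_j - w_i from two-way pointing over picture-plane distance d.
    Parabola w(s) = w_i + a*s + b*s**2 with w'(0) = slope_i and
    -w'(d) = slope_j gives w(d) - w(0) = d * (slope_i - slope_j) / 2."""
    return d * (slope_i - slope_j) / 2.0


def fit_depths(n, diffs):
    """Zero-mean least-squares depths from all pairwise differences.
    diffs[(i, j)], i < j, is an observed w_j - w_i. With every pair present
    the normal equations give w_k = (1/n) * sum_j (observed w_k - w_j)."""
    total = [0.0] * n
    for (i, j), dij in diffs.items():
        total[j] += dij   # observed w_j - w_i enters row j
        total[i] -= dij   # observed w_i - w_j enters row i
    return [t / n for t in total]


def _solve(a, b):
    """Gauss-Jordan elimination with partial pivoting (small square systems)."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[r][n] / m[r][r] for r in range(n)]


def fit_gauge(points, w_other):
    """Least-squares fit of w' = g_u*u + g_v*v + k*w + t_w.
    points: (u, v, w) of one observer; w_other: another observer's depths at
    the same locations. Returns [g_u, g_v, k, t_w] and R^2."""
    rows = [(u, v, w, 1.0) for u, v, w in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    atb = [sum(r[i] * y for r, y in zip(rows, w_other)) for i in range(4)]
    params = _solve(ata, atb)
    pred = [sum(p * x for p, x in zip(params, r)) for r in rows]
    mean = sum(w_other) / len(w_other)
    ss_res = sum((y - p) ** 2 for y, p in zip(w_other, pred))
    ss_tot = sum((y - mean) ** 2 for y in w_other)
    return params, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    uv = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.0), (0.8, 0.7), (0.5, 0.4)]
    true_w = [0.0, 0.5, -0.3, 1.2, 0.4]  # hypothetical depths
    diffs = {(i, j): true_w[j] - true_w[i] for i, j in combinations(range(5), 2)}
    w1 = fit_depths(5, diffs)
    # A second, hypothetical observer related by an affine gauge transformation.
    w2 = [0.4 * u - 0.2 * v + 1.5 * w + 0.1 for (u, v), w in zip(uv, w1)]
    params, r2 = fit_gauge([(u, v, w) for (u, v), w in zip(uv, w1)], w2)
    print([round(p, 3) for p in params], round(r2, 3))
```

Only a single global affinity is fitted here; the spatially varying gauge of Fig. 11 (the ‘‘optimal quartics’’ mentioned in the text) would require parameter fields in $u$ and $v$ and hence a larger design matrix.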
5. Final remarks
Human ‘‘imaginary space’’ is a fiber bundle $\mathbb{E}^2 \times \mathbb{A}^1$, where the base space is the ‘‘visual field,’’ and the fibers are the ‘‘depth’’ domain. The psychogenetic process shifts depth values along visual rays like beads on their strings. It does this on the basis of ‘‘depth cues’’ that are identified as such by the process itself. The result is ambiguous by its very nature, and the observer’s optical awareness consists of a sequence of ‘‘presentations’’ that are often quite similar to each other, though occasionally very different; just think of the familiar ‘‘flips’’ of a Necker cube. A large part of the ambiguity can be formalized as the group of isotropic rotations, angular scalings, and depth translations in a singly isotropic (otherwise Euclidean) Cayley–Klein space. This can be understood as resulting from the fundamental invariance properties of natural perspective.
The psychogenetic process constrains its articulations through probing the visual front end. This part of the brain is readily available for formal descriptions that are close to the neural hardware. The implementation of the group of isotropic similarities, a geometrical object that can easily be probed through psychophysical means, remains fully in the dark though. Processes that seem readily amenable to neurophysiological study are the implementation of ‘‘probings’’ of the primary visual regions by the deep structures where the intentional processes are launched.
Acknowledgments
The empirical work on pictorial space (van Doorn et al., 2011) was done with Johan Wagemans (Laboratorium voor Experimentele Psychologie, Katholieke Universiteit Leuven) and Andrea van Doorn (Industrial Design, Delft University of Technology). This work was supported by the Methusalem program by the Flemish Government (METH/08/02), awarded to Johan Wagemans.
References
Alberti, L.B., 1435. Della Pittura. Various translations available online.
Belhumeur, P., Kriegman, D., Yuille, A., 1999. The bas-relief ambiguity. International Journal of Computer Vision 35, 33–44.
Bennett, M.K., 1995. Affine and Projective Geometry. Wiley, New York.
Berger, M., 2007. A Panoramic View of Riemannian Geometry. Springer, New York.
Bergson, H., 1907. L’Evolution créatrice. Various translations available online.
Berkeley, G., 1709. An Essay Towards a New Theory of Vision. Printed by Aaron Rhames, at the Back of Dicks Coffee-House, for Jeremy Pepyat, Bookseller in Skinner-Row, Dublin (text available online).
Brentano, F., 1874. Psychologie vom empirischen Standpunkt. Verlag von Duncker & Humblot, Leipzig, Germany.
Brown, J.W., 1996. Time, Will, and Mental Process. Plenum Press, New York.
Burton, H.E., 1945. The optics of Euclid. Journal of the Optical Society of America 35, 357–372.
Busemann, H., 1955. Geometry of Geodesics. Academic Press, New York.
Canny, J., 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8, 679–698.
Cayley, A., 1859. Sixth memoir upon the quantics. Philosophical Transactions of the Royal Society London 149, 61–70.
Conan Doyle, A., 1887. A Study in Scarlet. Published in Beeton’s Christmas Annual.
Dawkins, R., 1986. The Blind Watchmaker. W.W. Norton & Company, New York.
Dennett, D., 1992. Consciousness Explained. The Penguin Press, London.
Eddington, A., 1929. The nature of the physical world. Many editions (text available online).
Feynman, R.P., 1970. The Feynman Lectures on Physics: The Definitive and Extended Edition, second ed., vol. 3. Addison-Wesley.
Geem, Z.W., Kim, J.H., Loganathan, G.V., 2001. A new heuristic optimization algorithm: harmony search. Simulation 76, 60–68.
Gibson, J.J., 1950. The Perception of the Visual World. Houghton Mifflin, Boston.
Hess, R.F., 1982. Developmental sensory impairment: amblyopia or tarachopia. Human Neurobiology 1, 17–29.
Johnson-Laird, P.N., 1983. Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Cambridge University Press, Cambridge.
Klein, F., 1893, original 1872. Vergleichende Betrachtungen über neuere geometrische Forschungen. Mathematische Annalen 43, 63–100.
Knuth, D.E., 1997. The Art of Computer Programming, vol. 1. Addison-Wesley, Reading, MA.
Koenderink, J.J., 1982. Different concepts of ‘‘ray’’ in optics: link between resolving power and radiometry. American Journal of Physics 50, 1012.
Koenderink, J.J., 1984. The concept of local sign. In: van Doorn, A.J., van de Grind, W.A., Koenderink, J.J. (Eds.), Limits in Perception. VNU Science Press, Utrecht.
Koenderink, J.J., 1990. The brain a geometry engine. Psychological Research 52, 122–127.
Koenderink, J.J., van Doorn, A.J., 1992. Generic neighborhood operators. IEEE PAMI 14, 597–605.
Koenderink, J.J., van Doorn, A.J., Todd, J.T., 2009. Wide distribution of external local sign in the normal population. Psychological Research 73, 14–22.
Koenderink, J.J., Albertazzi, L., van Doorn, A.J., van Ee, R., van de Grind, W.A., Kappers, A.M.L., Lappin, J.S., Norman, J.F., Oomes, A.H.J., te Pas, S.P., Phillips, F., Pont, S.C., Richards, W.A., Todd, J.T., Verstraten, F.A.J., de Vries, S., 2010a. Does monocular visual space contain planes? Acta Psychologica 134, 40–47.
Koenderink, J.J., van Doorn, A.J., de Ridder, H., Oomes, S., 2010b. Visual rays are parallel. Perception 39, 1163–1171.
Lakoff, G., 1987. Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. University of Chicago Press, Chicago.
Lorenz, K., 1973. Die Rückseite des Spiegels. Piper Verlag, München.
Lotze, R.H., 1852. Medicinische Psychologie oder Physiologie der Seele. Weidmansche Buchhandlung, Leipzig.
Marr, D., 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, New York.
Metzger, W., 1975. Gesetze des Sehens. Verlag Waldemar Kramer, Frankfurt a. M.
Minsky, M., 1974. A Framework for Representing Knowledge. MIT AI Lab Memo 306.
Palmer, S.E., 1999. Vision Science: Photons to Phenomenology. Bradford Books/MIT Press, Cambridge, MA.
Pearl, J., 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge.
Petitot, J., 2008. Neurogéométrie de la vision. Modèles mathématiques et physiques des architectures fonctionnelles. Les Editions de l’Ecole Polytechnique Distribution Ellipses, Paris.
Poggio, T., 1984. Low-level vision as inverse optics. In: Rauk, M. (Ed.), Proceedings of Symposium on Computational Models of Hearing and Vision. Academy of Sciences of the Estonian S.S.R., pp. 123–127.
Revonsuo, A., Newman, J., 1999. Binding and Consciousness. Consciousness and Cognition 8, 123–127.
Riedl, R., 1984. Biology of Knowledge: The Evolutionary Basis of Reason. John Wiley & Sons, Chichester, UK.
Sachs, H., 1990. Isotrope Geometrie des Raumes. Friedrich Vieweg & Sohn, Braunschweig.
Schrödinger, E., 1992. What is Life? With Mind and Matter and Autobiographical Sketches. Cambridge University Press, Canto Edition, Cambridge.
Searle, J., 1983. Intentionality: An Essay in the Philosophy of Mind. Cambridge University Press, Cambridge.
Shannon, C., 1948. A mathematical theory of communication. Bell System Technical Journal 27, 379–423, 623–656.
Skinner, B.F., 1938. The Behavior of Organisms: An Experimental Analysis. Appleton-Century-Crofts, New York.
Striedter, G.F., 2005. Principles of Brain Evolution. Sinauer Associates, Inc., Sunderland MA, USA.
Strubecker, K., 1941. Differentialgeometrie des isotropen Raumes I. Sitzungsberichte der Akademie der Wissenschaften Wien 150, 1–43.
Strubecker, K., 1942. Differentialgeometrie des isotropen Raumes II. Mathematische Zeitschrift 47, 743–777.
Strubecker, K., 1943. Differentialgeometrie des isotropen Raumes III. Mathematische Zeitschrift 48, 369–427.
Strubecker, K., 1945. Differentialgeometrie des isotropen Raumes IV. Mathematische Zeitschrift 50, 1–92.
Tinbergen, N., 1951. The Study of Instinct. Oxford Clarendon Press, London, UK.
Treisman, A.M., 1969. Strategies and models of selective attention. Psychological Review 76, 282–299.
van Doorn, A.J., Wagemans, J., de Ridder, H., Koenderink, J.J., 2011. Space perception in pictures. In: Rogowitz, B.E., Pappas, T.N. (Eds.), Human Vision and Electronic Imaging XVI, SPIE Proceedings, vol. 7865.
von Helmholtz, H., 1856, 1867. Handbook of Physiological Optics, 2 vols. (text available online).
von Helmholtz, H., 1977. Epistemological writings. IV. The facts in perception. Appendix 1. On the localization of the sensations of internal organs. In: Cohen, R.S., Elkana, Y. (Eds.), Boston Studies in the Philosophy of Science XXXVIII. D. Reidel Publishing Company, Dordrecht, Holland/Boston USA.
von Hildebrand, A., 1901. Das Problem der Form in der bildenden Kunst, third ed. (text available online).
Wittgenstein, L.J.J., 1992. Tractatus Logico-Philosophicus. Kegan Paul, London, entry 5.633.
Yaglom, I.M., 1968. Complex Numbers in Geometry. Academic Press, New York (Transl. E. Primrose).
Yaglom, I.M., 1979. A Simple Non-Euclidean Geometry and Its Physical Basis: An Elementary Account of Galilean Geometry and the Galilean Principle of Relativity. Springer, New York (Transl. A. Shenitzer).