Neuroethology in the service of neurophonetics


Journal of Neurolinguistics 26 (2013) 511–525


Theoretical article

Harvey M. Sussman

Department of Linguistics, University of Texas, Austin, TX 78712, USA
Department of Communication Sciences & Disorders, University of Texas, Austin, TX 78712, USA

Article history: Received 3 January 2013; Accepted 28 February 2013

Abstract

Single-neuron recording methods, as commonly used in neuroethology studies, provide the spatial and temporal resolution needed to generate explicit hypotheses addressing the 'how' of language processing. The goal of this article is to describe two well-documented neural processing mechanisms that can provide insights into (1) the auditory decoding of speech sounds, and (2) disambiguation of context-induced variability in stop place perception. The neural unit underlying speech sound processing is the combination-sensitive neuron, and the neural entity best suited to resolve context-induced variability in the speech signal is the neural column. The 'absorption' of stimulus variability via signal-specific columnar encoding is contrasted with exemplar-based treatments of stimulus variability in neural systems.

Keywords: Neuroethology; Neural columns; Combination-sensitive neurons; F2 variability; Locus equations

1. Introduction

Single linguistic events, such as generating a verb past tense, making a semantic association, or monitoring a phoneme target sound, occur across intervals of time best measured in milliseconds, and within neural spaces best measured in microns. However, a predominant methodological tool used to study these language events, functional magnetic resonance imaging (fMRI), far exceeds the temporal and spatial resolutions needed to monitor such events in real time. The basic analysis unit of fMRI, the 3D voxel, customarily varies between 3 mm³ and 4 mm³. This volume of neural tissue encompasses activity spanning hundreds of thousands of neurons interconnected by millions of synapses. The resultant blood oxygenation level dependent (BOLD) signal, thought to be correlated with the linguistic event, reaches its peak 6–7 s after the event is over. Despite these methodological shortcomings, fMRI



investigations have greatly enhanced our knowledge of 'where' in the brain linguistic operations might occur (e.g., Hickok & Poeppel, 2004). The more difficult challenge is understanding 'how' these brain structures do what they do. The brain is, after all, an alien structure, in the sense that we have a very limited understanding of how cognitive-based events happen within this amazingly complex and otherworldly tissue. The purpose of this article is not to criticize the use of fMRI in language studies, but rather to alert a subset of linguists to the existence of another source of brain-related data that can possibly provide insights into 'how' language processes might be carried out by neural entities. The field of neuroethology, the study of animal communication, primarily uses single-neuron electrophysiological recordings to discover the acoustic elements of species-specific input sounds a neuron is specialized to detect and process. Data based on single-cell recordings are the gold standard in terms of maximizing both temporal and spatial resolution in monitoring neural activity.

1.1. Rationale for using an animal model to study the human model

A neuroethological approach to studying human language processing has limitations of its own to resolve. There must be compatibility between the level of language structure examined and the data emanating from neuroethology laboratories. For example, single-cell electrophysiological recordings from animal brains cannot inform us as to how syntactic or semantic operations are carried out in human brains. Compatibility does exist, however, at the level of auditory processing. The laws of physics dictate the structure of sound, whether it is shaping a species-specific call, a biosonar echo, or a modulated second formant transition. At the outset, I would like to dispel potential skepticism from those who frown upon the notion of using an animal model to provide a theoretical springboard for conceptualizing language in the human model. First, it is safe to conclude that the human brain is a product of evolution. It may not be elegantly designed, or operate as an energy-efficient device, but, in spite of its many shortcomings, it gets the job done (Linden, 2007). Secondly, and most importantly, evolution tends to find similar solutions for similar problems. The 'conservation of mechanism' principle (Gerhart & Kirschner, 1997) states that once mechanisms are invented by nature and serve successfully, those mechanisms are retained during evolution and may be modified and thereby adapted for new and higher-order processing. A third point relates to the fact that there are more similarities than differences in both the structure and function of auditory neural systems across species biologically specialized for processing sounds. We humans can take our rightful place alongside crickets, rats, bats, frogs, birds, barn owls, and monkeys.

2. Neuroethology findings with phonetic-related implications

2.1. Combination-sensitive neurons: synchronous processing of multiple sound parameters

Neuroethology studies have uncovered, across a wide variety of species, a ubiquitous type of auditory neuron specialized for spectral integration: the combination-sensitive neuron. These neurons are specifically "tuned to coincidence (synchronization) of impulses from different neurons in time, frequency and/or amplitude domains" (Suga, 1994, p. 135).
The frequency differences between two arriving signals can span up to three octaves, and the temporal discrepancy between the two inputs can vary from zero to tens of milliseconds (Suga, 1989). The earliest stage of auditory processing at which combination-sensitive activity has been recorded is the midbrain central nucleus of the inferior colliculus (Mittman & Wenstrup, 1995; Portfors & Wenstrup, 1999). The afferent inputs to these inferior collicular cells have been shown to project from the ipsilateral ventral and intermediate nuclei of the lateral lemniscus (Yavuzoglu, Schofield, & Wenstrup, 2011). Combination-sensitive neurons have been documented across: (i) frogs (Fuzessery & Feng, 1983; Mudry, Constantine-Paton, & Caprnica, 1977); (ii) birds (Margoliash, 1983; Margoliash & Fortune, 1992; Takahashi & Konishi, 1986); (iii) mammals: mustached bats (Olsen & Suga, 1991a, 1991b; Suga, O'Neill, Kujirai, & Manabe, 1983; Suga, O'Neill, & Manabe, 1978); brown bats (Neuweiler, 1983, 1984); mouse (Hoffstetter & Ehret, 1992); cat (Sutter & Schreiner, 1991); and primates (Kadia & Wang, 2003; Olsen, 1994; Olsen & Rauschecker, 1992).


The response characteristics of combination-sensitive neurons are highly similar across species, indicating shared operational mechanisms. It would certainly be a fluke of evolution if humans, highly evolved for sound processing, were left out of this 'not-so-exclusive' club.

2.1.1. A primer on echo-location in the mustached bat

The processing characteristics of combination-sensitive neurons will be illustrated by describing the signal used by the mustached bat as it seeks its dinner. In biosonar echo-location the bat emits a complex pulse consisting of four harmonics (H1 = 30 kHz, H2 = 60 kHz, H3 = 90 kHz, and H4 = 120 kHz). Each harmonic consists of a constant frequency portion (CF1, CF2, CF3, CF4), with a duration in the tens of milliseconds, followed by a frequency-modulated (FM) portion (FM1, FM2, FM3, FM4) lasting only a few milliseconds. When the emitted pulse 'bounces off' an object (say a mosquito), an echo signal returns to the bat containing the same four harmonics, but now the CF portions are Doppler-shifted, varying as a direct function of the changing velocity of the target prey: the laws of physics in operation for encoding a biological function. In all crucial respects the bat's multi-harmonic pulse-echo signal closely resembles human speech, as both contain steady-state frequency portions transitioning to rapidly changing FM sweeps. Interestingly, the second harmonic of the bat's pulse-echo pair plays a disproportionate role in terms of information load and territory mapped in the bat's primary auditory cortex. Human speech perception is similarly heavily indebted to the disproportionately useful acoustic structure of the F2 transition, perhaps the most informative segment of the acoustic speech signal, as this resonance directly encodes the movement path of the tongue.

A specific illustration of how combination-sensitive neurons function, with direct phonetically relevant implications for the processing of human speech, can be seen in the work of Fitzpatrick, Kanwal, Butman, and Suga (1993) on the echolocating mustached bat. Though the illustrative example is from analyses of the biosonar signal of the bat, combination-sensitive neurons also respond to the bat's social vocalizations (Esser, Condon, Suga, & Kanwal, 1997; Ohlemiller, Kanwal, & Suga, 1996). Social vocalizations in the bat contain many identifiable syllable-like 'chunks,' each characterized by multiple frequencies varying across stereotypical temporal patterns. The bat's auditory cortex contains spectrotopic maps formed by combination-sensitive neurons sensitive to specific CFi–CFn pairings, FMi–FMn pairings, and, most interesting, FMi–CFn pairings, i.e., transitions + steady states. Fitzpatrick et al. (1993) teased apart separate harmonic components of a pulse-echo pair and measured neuronal firing responses to these various separate vs. combined components. A maximum discharge occurred when the neuron was stimulated by the full [H1 + H2] pairing (the [CF1 + FM1] of the pulse compared to the [CF2 + FM2] of the time-delayed returning echo). When presented with only H1, only H2, or just the CF1 portion with the entire H2, the neuron did not respond. However, when only the FM1 of the pulse was paired with the CF2 of the echo, the neuron was facilitated to the same extent as when the complete pulse-echo pairing (H1 + H2) was presented. This result clearly illustrates the existence of specific information-bearing elements contained within the total signal ensemble.
In other words, an isolated FM transition from one portion of the signal, together with a second sound element, can be as informative as the total signal [(pulse FM1 + CF1) + (echo FM2 + CF2)]. In addition, it illustrates that combination-sensitive neurons can respond selectively to steady-state portions of formants and to FM components that occur at a later point in time (relative to the CF portion). This evolved specialization for biosonar processing in the bat was necessary because of the time it takes for a returning Doppler-shifted echo signal to reach the auditory processors in the bat's neural system relative to the earlier emitted pulse. The very existence of such highly specialized, delay-tuned, combination-sensitive neurons is highly suggestive of a class of auditory neurons that would be beneficial for speech perception, particularly for decoding stop + vowel sequences. Hypothetically speaking, each and every FM1–3 portion of CV transitions could be processed in relation to the midvowel resonances that occur 30–60 ms later. The utility of delay-tuned response specializations will be made more explicit in later sections describing locus equations and their ability to categorize stop place utterances across vowel contexts.
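To make the delay-tuned pairing concrete, the following minimal Python sketch models a combination-sensitive neuron that responds only when a pulse FM1 is paired with an echo CF2 arriving near its best delay. The response rule, delay values, and magnitudes are illustrative assumptions for exposition, not parameters reported by Fitzpatrick et al. (1993).

```python
# Illustrative model of a delay-tuned combination-sensitive neuron.
# Component names follow the pulse-echo description above; the facilitation
# rule and numbers are assumptions, not measured response properties.

def combination_response(components, best_delay_ms=6.0, tolerance_ms=2.0):
    """Firing-rate proxy: maximal only when the pulse FM1 and the echo CF2
    are both present and separated by roughly the neuron's best delay.
    `components` maps element names ('CF1', 'FM1', 'CF2', 'FM2') to arrival
    times in ms; absent elements are simply omitted."""
    fm1, cf2 = components.get('FM1'), components.get('CF2')
    if fm1 is None or cf2 is None:
        return 0.0                                  # no facilitation without the pair
    if abs((cf2 - fm1) - best_delay_ms) > tolerance_ms:
        return 0.0                                  # echo outside the delay tuning
    return 1.0                                      # facilitated, maximal discharge

# Pulse emitted around t = 0 ms; Doppler-shifted echo returns ~6 ms later.
scenarios = {
    'H1 alone (pulse only)':      {'CF1': 0.0, 'FM1': 0.0},
    'H2 alone (echo only)':       {'CF2': 6.0, 'FM2': 6.0},
    'CF1 of pulse + entire H2':   {'CF1': 0.0, 'CF2': 6.0, 'FM2': 6.0},
    'FM1 of pulse + CF2 of echo': {'FM1': 0.0, 'CF2': 6.0},
    'full H1 + H2 pairing':       {'CF1': 0.0, 'FM1': 0.0, 'CF2': 6.0, 'FM2': 6.0},
}
for name, comps in scenarios.items():
    print(f'{name:28s} -> response {combination_response(comps):.1f}')
```

Under these assumptions only the last two scenarios drive the unit, mirroring the facilitation pattern described above.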


2.1.2. 'Meow' detectors in the cat

Another example of neuronal sensitivities to time-varying acoustic stimuli with direct implications for speech processing can be found in Nelson, Erulkar, and Bryan (1966). Recording from single neurons in the cat's inferior colliculus, the authors classified populations of neurons that only responded to upward or downward FM sweeps. Within each of these groups, different subgroups were found that only responded to FM sweeps that started and ended at specific frequencies. Further subclassifications of these neurons were uncovered that maximally discharged if the rate of change of FM tonal sweeps fell within specified Hz/ms ranges. In sum, specialized neural detectors have been documented in cats that respond to specific rise times, further organized into specific ranges of onset-to-offset frequencies. Thus, the cat possesses highly specialized auditory neurons capable of responding to the vast array of FM (and amplitude-modulated) sounds characterizing any possible input meow. The full range of rise time durations spans intervals commensurate with the differences among stops, semi-vowels, and diphthongs in human speech.

2.1.3. The 'chuck' call in the squirrel monkey

Another example of a complex, multiple-parameter sensitivity in combination-sensitive neurons has been shown in the squirrel monkey. Recording in the dorsal section of the thalamic medial geniculate nucleus, neurons were found capable of decoding complex, multi-component sequences of acoustic segments (Olsen, 1994). This species-specific call is known as a 'chuck,' and consists of a specific linear ordering of three concatenated simple sounds: a 'peep,' a 'yap,' and a 'cackle.' The temporal ordering of these three components is crucial, as the combination-sensitive neurons only respond to the three sound elements when heard in that exact sequence. The 'peep' is an initial upward FM sweep, followed immediately by a downward FM component (the 'yap'), followed by the constant frequency 'cackle' segment. Lower-order neurons selectively respond to each of the three simple sounds, but a combination-sensitive neuron will not fire to any one of the simple sounds alone; it fires only to the complete, serially arranged, three-component chuck sound. If the ordering of the three components was changed, or a component eliminated, the combination-sensitive neuron greatly reduced its response. Reversing the order of the three elements completely silenced the cell. Considering human speech, which is characterized by multiple acoustic cues (e.g., for stop place: burst spectra, formant transition direction and extent, VOT intervals, segment durations), such combination-sensitive neurons provide a glimpse into possible neural algorithms that have evolved to process speech sound combinations in their proper serially ordered temporal sequence.
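The serial-order selectivity just described can be caricatured in a few lines of Python. The graded response values below (1.0, 0.2, 0.0) are arbitrary placeholders chosen to mirror the qualitative pattern reported by Olsen (1994), not recorded firing rates.

```python
# Toy sequence detector for the squirrel monkey 'chuck' (peep -> yap -> cackle).
# Response magnitudes are arbitrary; only their ordering mirrors the findings.

REQUIRED_ORDER = ('peep', 'yap', 'cackle')   # upward FM, downward FM, constant frequency

def chuck_response(sequence):
    seq = tuple(sequence)
    if seq == REQUIRED_ORDER:
        return 1.0        # complete, correctly ordered chuck: maximal discharge
    if seq == REQUIRED_ORDER[::-1] or len(seq) <= 1:
        return 0.0        # reversed sequence or a lone element: the cell stays silent
    return 0.2            # misordered or incomplete sequence: greatly reduced response

print(chuck_response(['peep', 'yap', 'cackle']))   # 1.0
print(chuck_response(['yap', 'peep', 'cackle']))   # 0.2 (components reordered)
print(chuck_response(['peep', 'cackle']))          # 0.2 (component eliminated)
print(chuck_response(['cackle', 'yap', 'peep']))   # 0.0 (order reversed)
print(chuck_response(['peep']))                    # 0.0 (single simple sound)
```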
3. Resolving context-induced phonetic variability – the functional role of specialized neural columns

The phonetic study of variation is alive and well. Investigators study variation caused by speaker gender, speaker size, and social communicative contexts. Perhaps the most complex and pervasive form of variability, however, is due to phonetic context. A classic example of context-induced variation lies at the heart of the enigma known as the 'non-invariance' issue (Liberman & Mattingly, 1985). Despite its limited relevance to only six phonemes, the stop consonants /b d g p t k/, this coding enigma has caused contentious theoretical divisions in the field of speech perception for the last seven decades. Simply stated, the non-invariance issue is the lack of a one-to-one relationship between the acoustic signal and the phoneme.

The 'poster tokens' that have characterized this context-induced variability dilemma can be seen in the two-formant synthesis schematics for CVs such as [di], [du], [da], and [do], made famous by Haskins Labs researchers. Despite the invariant perception of the alveolar stop /d/ across all four vowel contexts, the acoustic signal shows four distinctly different F2 transitions encoding the alveolar stop: rising for /i/ contexts, and falling, to different extents, for /u a o/. Such varying acoustic signals for the same consonant spawned the Motor Theory of Speech Perception (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967; Liberman & Mattingly, 1985), as well as Direct Realism (Fowler, 1986). The prevailing conclusion from the motor/gesturalist view was that "there is simply no way to define a phonetic category in purely acoustic terms" (Liberman & Mattingly, 1985, p. 12). Neuroethology findings, however, obtained from recording single-neuron activity in sound localization networks in the barn owl, have provided an elegant example of how the brain actually resolves acoustic variability when that variability is directly shaped by the laws of physics. In other words, the variability is inherent in the input signal, motivating the evolution of a normalization algorithm that can sort it all out by simply deriving an acoustic-based commonality across the array of varying signal elements.


In describing this work on sound localization in the barn owl, clear similarities will emerge between (i) the problem faced by the barn owl in locating the spatial sound source of a possible dinner, and (ii) the problem faced by a young child tasked with sorting stop + vowel sound sequences into bilabial, alveolar, and velar categories. As the barn owl story unfolds, think of 'equivalence classes' for interaural time differences as analogous to phonetic/phonological categories (i.e., [ba bi be bo bu..], [da di de do du..] vs. [ga gi ge go gu..]). Both the owl and the infant face a common problem: they have to utilize signal processing algorithms and neural mapping strategies that function to normalize the variability of an input signal shaped by the laws of physics. The goal for both species is to allow equivalence classes to form in neural representations in a self-organized and emergent fashion. The only difference is that the variability is co-temporal in the owl, whereas in the young child it must be learned over a lengthy period of development and the magic of acquiring language(s).

3.1. The interaural time difference columns in the inferior colliculus of the barn owl

The neuroethology example to be described below is based on the work of Takahashi and Konishi (1986). It will require a short tutorial on what delay lines are, and how they function to code the interaural time differences (ITDs) that signal the azimuthal (i.e., horizontal) location of a sound source. The time difference between a sound arriving at the closer ear, relative to the more distant ear, informs the owl as to where a sound is coming from. There is no explicit clock in the owl's brain directly encoding time. Instead, the owl derives time as an emergent property. The owl forms an auditory map of space using two acoustic input parameters that are inherently ambiguous by themselves: the frequencies contained within the complex input sound and the respective phase differences of those frequencies as they differentially arrive at the owl's ears. The phase differences are formed by virtue of the time it takes for the acoustic wave to reach the lag ear relative to the lead ear. Phase and frequency information are processed by specialized combination-sensitive neurons evolved for this specific purpose. Unfortunately, a given phase difference is totally ambiguous independent of frequency, as each frequency component in the complex input sound yields a different phase difference.

A simple analogy can clarify the ambiguity underlying frequency-phase relationships in general. Think of two runners racing around an oval track. If at some point in the race you see runner A ahead of runner B by six yards, can you unequivocally state that runner A is winning based on this single snapshot view? The six yards indicates the 'phase difference' between the two runners, but what about frequency? Frequency, in this instance, is how many times each runner has circled the track. Without frequency data one does not know whether runner B is actually about to overtake runner A, having gone around the track more times, or whether he is trailing runner A. The moral of the story: phase information is uninformative (= ambiguous) in the absence of frequency. The actual phase disparities between the sound's arrival at each ear are initially calculated in the brain stem nucleus known as the nucleus laminaris.
Delay lines in avian species such as the barn owl have been anatomically verified (Sullivan & Konishi, 1986) and work as tonotopically organized coincidence detectors. Delay lines are horizontally organized rows of neurons, at each tonotopic frequency band, spanning the owl's extensive frequency range. These neurons basically function as 'and' gates. Signal inputs from each ear project to the same delay lines, but from opposite directions, with the lead ear signal temporally ahead of the lag ear signal. Synaptic activation of each of the delay line neurons proceeds, neuron by neuron, from opposing directions. The spatial position of the neuron in the delay line receiving the largest synaptic activation (i.e., the simultaneously arriving activation) encodes the phase disparity at that frequency. If an input sound is at the midline, equidistant from each ear, the 'middle neuron' of the delay line is maximally driven: a true place code. These phase disparities are sent to the next upstream nucleus, the central nucleus of the inferior colliculus.

Takahashi and Konishi (1986) recorded from combination-sensitive neurons specialized for frequency/phase relationships in the central nucleus of the owl's inferior colliculus. A diagram adapted from their study and summarizing their findings is shown in Fig. 1. The lower portion shows, in highly schematic form, the organization of the respective phase disparities, expressed as a percent, with respect to each tonotopic frequency area.


Positive percentages indicate right-ear leads, and negative percentages the reverse. A phase difference of zero corresponds to a sound arriving from the midline, simultaneously at each ear. The combined x-axis information (phase disparities) and y-axis information (frequency) yield a z-axis value, in this case the emergent ITD, expressed in microseconds. In short, the owl derives clock time by combining frequency and phase information. The inherent ambiguity can easily be seen if one compares, for example, a 20% phase difference in the 2 kHz component of the input sound with the same phase difference in the 4 kHz component. A 20% phase difference at 2 kHz equals an ITD of 100 μs; the same 20% phase difference at 4 kHz equals an ITD of 50 μs. What's a poor barn owl to do?
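The arithmetic behind this ambiguity can be stated in a few lines. The Python sketch below, using the values from the example above, shows why a phase disparity alone underdetermines the ITD: the implied clock time is the phase fraction multiplied by the period of that frequency component.

```python
# Phase alone is ambiguous: the same phase disparity implies different ITDs
# at different frequencies, because ITD = phase_fraction * period = phase_fraction / f.

def itd_microseconds(phase_fraction, frequency_hz):
    """Interaural time difference implied by one frequency component's phase disparity."""
    period_us = 1e6 / frequency_hz      # period of this component, in microseconds
    return phase_fraction * period_us

print(itd_microseconds(0.20, 2000))     # 100.0 us for a 20% phase difference at 2 kHz
print(itd_microseconds(0.20, 4000))     #  50.0 us for the same 20% difference at 4 kHz
```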

Fig. 1. A schematic diagram showing the organization of combination-sensitive neuron responses to frequency and phase components of input sounds in the central nucleus of the barn owl's inferior colliculus (lower portion). The columnar projections from this nucleus project to the external nucleus, shown above. The shaded column at 50 μs ITD (interaural time difference) illustrates how this sound localization network achieves an invariant encoding of azimuthal sound localization in the barn owl.


In the upper portion of the figure, the external nucleus of the inferior colliculus is shown. This higher-order region receives direct projections from the various columnar arrays of the central nucleus. The external nucleus, however, possesses invariant coding of spatial location: within circumscribed areas, all neurons respond to the same azimuthal location of a complex sound. Since variability characterizes the neural arrays of the lower central nucleus and invariance characterizes the neural arrays of the higher-order external nucleus, the investigators wanted to ascertain how that invariant coding was achieved. As an example, the shaded area of the external nucleus is meant to encapsulate a region where all the neurons are activated by an input sound that is 30 degrees off center to the right. A 30-degree off-center location is equivalent to a clock time for a sound that arrives at the right ear 50 μs ahead of the lagging left ear. To trace the origin of the ascending sensory inputs to this 30-degree-specific region, a radioactive tracer substance was injected into the region. The chemical substance is absorbed by the terminal endings of the central nucleus axons and propagates, via retrograde transport, down the axons to the neuronal cell bodies that were the sources of these synaptic inputs. The subsequent appearance of the radioactive material in a single columnar array of the central nucleus revealed that the inputs all originated from the (shaded) column possessing the full array of phase-frequency combinations that collectively encode a 50 μs ITD. Said another way, the individual laminae of the column, all having different frequency-phase pairings, share an important commonality: time, specifically the ITD. The columnar array operates as a functional unit, encoding specific ITDs. The variable phase/frequency combinations, acting as a collective, yield a fixed ITD as an emergent property, allowing the owl to locate food in the dark.

Fig. 2 schematically shows, in a different format, what Takahashi and Konishi (1986) documented in their electrophysiological recordings of single-cell discharges as the owl was behaviorally seeking the source of a target sound. Frequency values are shown in the far left-hand column, followed by the respective periods (T) of each of these harmonic components, expressed in microseconds. For a common 50 μs ITD (= 30 degrees to the right), the corresponding percent phase differences between the two ears are shown in the far right-hand column, ranging from 10% at 2 kHz to 50% at 10 kHz. This array of phase disparities, one for each input frequency component, represents the activations of the combination-sensitive neurons recorded in the central nucleus: the highlighted column that 'lit up' due to arrival of the radioactive tracer from the invariant 30-degree region of the external nucleus. Each laminar portion of the 50 μs ITD column maps a given portion of the total assemblage of frequency/phase relations. Top to bottom, the laminae span all possible occurrences of the two combinatorial bits of information. Tying all the laminae together, forming an 'equivalence class' for a given ITD, is the simple fact that all the frequency/phase pairs encode the same exact clock time. Could there be a similar, columnar-based, encoding system for all the variable F2 transitions that make up a stop place equivalence class?
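The 'equivalence class' logic of a single ITD column can be sketched directly from the relationship above: fix the ITD and ask what phase disparity each tonotopic lamina must register. The frequency bands below are chosen to match the 2–10 kHz values quoted in the text; they are illustrative, not an exhaustive map of the owl's laminae.

```python
# One ITD column as an equivalence class (cf. Fig. 2): every frequency/phase
# pairing below encodes the same 50-microsecond interaural time difference.

TARGET_ITD_US = 50.0                          # right ear leads by 50 us (~30 degrees azimuth)

for freq_khz in (2, 4, 6, 8, 10):
    period_us = 1e6 / (freq_khz * 1000)       # period of this tonotopic component
    phase_pct = 100.0 * TARGET_ITD_US / period_us
    print(f'{freq_khz:2d} kHz lamina: period {period_us:6.1f} us -> {phase_pct:4.1f}% phase difference')

# Output runs from 10% at 2 kHz to 50% at 10 kHz: variable inputs, one shared clock time.
```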
3.1.1. The co-temporality problem

If the ITD columns documented in the barn owl are to be used as a theoretical springboard for a possible model to resolve vowel context-induced F2 transition variability in speech, then a major problem exists. All frequency/phase inputs to the barn owl arrive simultaneously, as they are all frequency components of a single complex sound (that is why an owl cannot locate the source of a signal consisting of a single frequency). Listening to words beginning with a stop + vowel sequence, however, occurs one word at a time. Hence, a representational coding problem arises: how does a single on-line CV find its way to postulated 'category-level' columns that inherently incorporate all possible vowel contexts with a given stop? Obviously, such hypothetical neural entities must develop over several years of language acquisition, driven by the wide variety of sensory inputs in the ambient language(s).

3.1.2. Resolving the co-temporality problem: object recognition in the macaque 'what' system

A viable neural example of a columnar structure that organizes and develops with experience, over time, may well exist in area TE of the macaque's temporal lobe. Tanaka (1993) sought to uncover the selectivity properties of neurons in the 'what' area performing object recognition in the macaque monkey. Starting with pictures of complete real-world objects (e.g., the head of a tiger), the investigators systematically simplified the objects, reducing them, step by step, into simple geometric visual features that still captured the visual essence of the full object. At every level of deconstruction they monitored the effectiveness of the specific stimulus in exciting the neuron being recorded from.


Fig. 2. A simplified table showing how the variable frequency/phase relationships at each frequency region encode the common 50 μs ITD.

Tanaka systematically derived a set of 12 critical visual features that could sufficiently activate single neurons in area TE. Afterward, when making vertical penetrations of the same TE cortex, the investigators found neurons along the way that shared similar or closely related visual features with the basic critical feature previously documented in isolated neurons. The overlap of effective stimuli and the subtle variations of near neighbors served to enhance the precision of object representation. What emerges from area TE is characterized by an invariance to stimulus position, a constancy of shape across inherent variations. Such columns do not contain co-temporal inputs, but rather must develop from differing visual experiences obtained over extended time periods during development. As stated by Tanaka: "Object recognition is not template matching between input image and stored images, but is a flexible process in which considerable changes in images, resulting from different illumination, viewing angle, and articulation of the object, can be tolerated" (1993, p. 685). Thus, visual cortical columns performing object recognition also work to buffer or absorb signal variations. They operate as a collective, within tolerance limits, as long as the categories of visual features contain lawful transforms that possess a stimulus-based commonality.


4. Locus equations: a phonetic tool illustrating a lawful commonality in stop + vowel utterances

Over the past two decades my colleagues and I have been exploring the limits, characteristics, and functional implications of a plotting algorithm known as locus equations (see Lindblom & Sussman, 2012 for a review). Briefly stated, locus equations are linear regressions of the frequency of the F2 transition, sampled at its onset, on the frequency of F2 measured in the vowel nucleus. These frequency data points are plotted for a single stop consonant produced with a wide range of following vowels. F2 onsets are plotted along the y-axis and F2 midpoints along the x-axis. For a given stop place category, e.g., [dV] as in "deet, dit, debt, date, dat, dot, dut, doot, daught, dote," data coordinates have been consistently shown to cluster tightly in a positively correlated scatterplot. The scatterplot is fit with a linear regression line (the locus equation) of the form

F2onset = k · F2vowel + c    (1)

where k and c are constants, the slope and y-intercept respectively. Fig. 3 shows a representative locus equation scatterplot for [dVt] tokens with 10 vowel contexts, each randomly repeated five times. Notice the extremely lawful and linear appearance of this scatterplot for an alveolar stop category, despite the existence of 10 highly variable F2 transitions. The R² values of locus equations customarily exceed .90, a not-often-seen characteristic of speech acoustic data across varying contexts. The context-induced variability of the F2 transition essentially disappears when displayed in a locus equation format. Simply displaying a category-level representation of stop consonant + vowel sequences, rather than individual tokens, elicits a completely different perspective on the age-old variability problem associated with stop + vowel utterances. Notice that /d/ in front vowel contexts and /d/ in back rounded vowel contexts all lie along the same linear acoustic-based path, despite the physical fact that the former have rising F2 transitions and the latter falling F2 transitions. An intriguing question arises: is the undeniable acoustic lawfulness of locus equation plots an indication of a possible, higher-order, auditory-based lawfulness?

Fig. 3. A representative example of a locus equation scatterplot for production of [dVt] words. Ten vowel contexts were used with five repetitions per token.
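For readers who want to see the plotting algorithm itself, the following Python sketch fits Eq. (1) to a handful of invented [dV] measurements. The F2 values are placeholders chosen only to yield a plausibly alveolar-looking plot; actual studies (e.g., Sussman et al., 1991) measure F2 at transition onset and in the vowel nucleus for every token.

```python
# Fitting a locus equation (Eq. (1)) to hypothetical [dV] tokens.
import numpy as np

# (F2 at vowel midpoint, F2 at transition onset), in Hz; values are invented.
f2_vowel = np.array([2300, 2000, 1800, 1500, 1200, 1000, 900], dtype=float)
f2_onset = np.array([2100, 1950, 1850, 1700, 1600, 1550, 1500], dtype=float)

k, c = np.polyfit(f2_vowel, f2_onset, 1)          # least-squares slope and y-intercept
residuals = f2_onset - (k * f2_vowel + c)
r_squared = 1.0 - residuals.var() / f2_onset.var()

print(f'slope k = {k:.2f}, y-intercept c = {c:.0f} Hz, R^2 = {r_squared:.2f}')
# The slope and intercept, not any single token, carry the stop place category:
# labial, alveolar, and velar plots differ systematically in these two parameters.
```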


Similar highly linear scatterplots exist for labial and velar stops, and each stop place category is characterized by a different slope value, shaped by the particular degree of coarticulation characterizing production of the three stop classes. Slopes approaching 1.0 signify maximum degrees of coarticulation, as the vowel heavily influences the point of occlusion of the tongue; flatter slopes, as found in alveolar plots, signify lesser degrees of coarticulation, as the place of occlusion is less responsive to vowel contextual influences. Velar plots lie in between labial and alveolar values, and possess two distinctly different, but linear, subgroups: velars produced with back vowels and velars produced preceding front vowels (Sussman, McCaffrey, & Matthews, 1991). Coarticulation, from a locus equation perspective, is not seen as an unfortunate by-product of achieving a solution to the temporal rate limitations of the ear, as proponents of the Motor Theory have claimed (Liberman & Mattingly, 1985), but rather as an active articulatory mechanism for creating acoustically contrastive stop place sound categories. Interestingly, when the locus equation-derived parameters of slope and y-intercept were used as predictor variables in a discriminant analysis procedure across 20 speakers (10 male and 10 female), category affiliation (/b d g/) was predicted with 100% accuracy (Sussman et al., 1991). Linear, tightly clustered locus equation plots for stop place productions appear to be a linguistic universal and have, to date, been documented for speakers of English, Swedish, Spanish, French, Arabic, Estonian, Urdu, and Thai. The linear and tightly clustered locus equation scatterplots, statistically contrastive across the three stop place categories, could simply be an epiphenomenon of phonetic structure, or perhaps they reflect a functional significance. At minimum, they illustrate that a self-organized 'normalization' of the vowel context-induced variability exists in the speech sound waves as they are spoken by normal speakers. The stop + V input sounds, as a category-level collective, inherently possess a lawful, statistically correlated relationship: ideal inputs for combination-sensitive neurons tasked to categorize and map sound categories.

Neurologically impaired speakers with highly unintelligible articulations, such as children with Developmental Apraxia of Speech (DAS), do not produce linear, tightly clustered scatterplots, and, most importantly, they produce locus equation slopes that are highly similar rather than distinctively different (Sussman, Marquardt, Doyle, & Knapp, 2002). The low intelligibility of spoken bV, dV, and gV utterances in DAS speakers is a by-product of this failure to differentially tweak coarticulatory extents in the production of the three stop classes. Sussman, Duder, Dalston, and Cacciatore (1999), analyzing the babbled 'CVs' (perceived as stops), early first words, and meaningful speech of a single child from seven to 40 months, documented the developmental pattern of coarticulation. Babbling utterances, unanimously transcribed as [bV] and [dV] tokens, exhibited flat slopes for labials and steep slopes for alveolar-like babbles.
As her babbles emerged into first words, and as her first words more closely approached adult-like phonological forms, the labial CVs became characteristically steeper (reflecting greater extents of coarticulation) and the formerly steep [dV]s of babble gradually flattened out to closely resemble adult [dV] sequences characterized by minimal extents of coarticulation.

5. Can specially evolved neural columns be the coding mechanism to resolve context-induced phonetic variability in human speech?

Most neuroscientists would agree that a "column now refers to cells in any vertical cluster that share the same tuning for any given receptive field attribute" (Horton & Adams, 2005, p. 837). However, despite its ubiquitous presence across species, the stereotyped cortical column has failed to live up to its original promise of providing a unifying principle to explain how the cerebral cortex functions (Horton & Adams, 2005). For example, ocular dominance columns are found in some species but not others, with no noticeable difference in visual abilities in either case. This dampening of the original optimism that greeted Mountcastle's (1957, 1978) pioneering work describing somatosensory columns is captured by Horton and Adams:

"At some point, one must abandon the idea that columns are the basic functional entity of the cortex. It now seems doubtful that any single, transcendent principle endows the cerebral cortex with a modular structure. Each individual area is constructed differently, and each will need to be taken apart, cell by cell, layer by layer, circuit by circuit and projection by projection to describe fully the architecture of the cortex" (p. 851).


The difficulties encountered in verifying a uniform functional role for cortical columns across species imply that each species, if need be, can evolve its own uniquely adaptive means for representing and coding information-bearing parameters of input signals. There is no doubt that specialized columns exist; I have already described two such functionally unique columns: the ITD columns in the inferior colliculus of the barn owl, and the object recognition columns in the inferior temporal cortex of the macaque. The inferior collicular ITD columns, so elegantly deciphered by Takahashi and Konishi (1986), represent a unique functional unit, consisting of vertically organized clusters of neurons, spanning tonotopically organized laminae, all processing similar physical attributes of the input signal: variable phases in relation to a large range of frequency components. Obviously a species whose very existence depends on foraging for food at night would evolve an effective neural structure to help localize that food source. Similarly, macaques possess object recognition columnar units that encode a variety of basic shapes. By representing the 'tolerance bandwidths' of acceptable images affiliated with a given category of object, Tanaka's 'what' columns help resolve the invariance problem unique to visual processing. Variations in visual features are lawfully related, just like acoustic features in speech. These object recognition columns in the macaque cannot mature from only a few exposures but rather, like language, must be shaped by experiential stimulation over time periods measured in years.

What sort of neural columns would be uniquely beneficial to humans? The emergence of spoken language would necessitate the development of never-before-seen neural algorithms tasked to normalize variability in the input speech signal. There is no better neural structure to instantiate a normalization process than columnar architectures that can bind together input sounds shaped by lawful variations and collectively characterizing an equivalence class that forms a phonologically relevant grouping of sounds. Again, Horton and Adams (2005):

"Comparative anatomy provides a glimpse at brain structure during only one moment in evolution. We know very little about how brains evolved to take their present form. It is apparent that considerable natural variation occurs among members of a species in normal brain structure. Nature, being a tinkerer (Jacob, 1977) probably uses this variation as a substrate for brain evolution. Columnar structures that have no function in some species may acquire a function in others through evolution." [p. 851].
5.1. Postulated neural columns in humans tasked to absorb variability in F2 transitions coding stop place of articulation

Based on (i) the ubiquitous existence of neural columns across species, (ii) their highly specialized, evolutionarily driven functional encoding properties, and (iii) their documented ability, in the barn owl and the macaque monkey, to resolve signal variation and ambiguity, I suggest there is a high probability that neural columns exist in auditory processing regions of the human brain that function to resolve and normalize inherent variation and ambiguity in human speech. The basic requirement for fulfilling such a normalizing function is that the signal variability be governed by the laws of physics; the shaping and filtering of speech sounds meet that requirement. The trick is to find a speech-based acoustic commonality that can serve to bind the 'allophonic' variations of stop + V sequences together, and thus be contrastive across stop place. In [bV, dV, gV] locus equation scatterplots the slope of the linear regression function is a statistically significant predictor (100%) of stop place category affiliation (Sussman et al., 1991). The locus equation slope is a graphic-based statistic characterizing how each coordinate, affiliated with a given stop place category, organizes itself within F2 transition acoustic space. What a locus equation slope signifies neurally, however, is difficult to conceptualize. If the collective F2 onsets ~ F2 offsets comprising a given stop consonant, parameterized as a locus equation slope and y-intercept, reflect the phonetic level at which non-overlapping equivalence classes first emerge, then the phonetic category is the logical level of phonological organization for the brain to represent (or map).


Using the ITD columns of the barn owl as a model neural system for establishing invariant coding of highly variable input signals, the next logical step is to entertain the idea that auditory columns exist that are tasked to orthogonally encode the various F2 onsets in relation to their F2 offsets in the vowel nucleus. Neural units capable of such encoding talents already exist: combination-sensitive neurons functionally similar to those documented by Fitzpatrick et al. (1993) in the mustached bat could easily provide the neuronal units needed to encode the onset frequency of an F2 transition in relation to its delay-tuned F2 offset in the vowel. A hypothetical version of such columns is shown in Fig. 4. Activation of sets of such columnar entities, spanning tolerance ranges that allow for speaker variation, would reflect a neural architecture possessing the coding commonality so strongly suggested by locus equation slopes. While an agnostic position is perhaps prudent, other factors indirectly support the notion of F2 transition variability-absorbing columns. It is well known that input signals to learning algorithms or neural-based models tasked to establish categories perform best when high degrees of statistical regularity exist across the variable inputs (Barlow, 1989; Christiansen, Allen, & Seidenberg, 1998; Marti & Rinzel, 2012; Saffran, Aslin, & Newport, 1996). Auditory inputs possessing high degrees of statistical regularity in their acoustic-based co-occurrences (e.g., r² values > .90) would be ideal input signals for F2-specialized combination-sensitive neurons to learn, respond to, and subsequently organize and map.

Fig. 4. Schematic conception of columnar arrays encoding F2 transition onsets and offsets for stop place equivalence classes.
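As a purely speculative illustration of what Fig. 4 proposes, the sketch below treats each stop place 'column' as the band of (F2 onset, F2 vowel) detectors lying near that category's locus equation line. The slopes, intercepts, and tolerance are invented placeholders, not measured values, and the winner-take-all read-out is an expository simplification.

```python
# Speculative columnar read-out: a CV token is assigned to whichever stop place
# 'column' predicts its F2 onset (from its F2 vowel target) most closely.
# All numbers are illustrative placeholders.

HYPOTHETICAL_LOCUS_EQUATIONS = {        # category: (slope k, y-intercept c in Hz)
    'b': (0.85, 150.0),
    'd': (0.40, 1100.0),
    'g (front vowels)': (1.00, 150.0),
}
TOLERANCE_HZ = 150.0                    # allowance for speaker- and token-level variation

def classify_cv(f2_onset_hz, f2_vowel_hz):
    best, best_err = None, None
    for place, (k, c) in HYPOTHETICAL_LOCUS_EQUATIONS.items():
        err = abs(f2_onset_hz - (k * f2_vowel_hz + c))   # distance from the column's line
        if err <= TOLERANCE_HZ and (best_err is None or err < best_err):
            best, best_err = place, err
    return best

# A falling F2 transition (back vowel context) and a rising one (front vowel context)
# both land in the same alveolar column:
print(classify_cv(1500, 1000))   # -> 'd'
print(classify_cv(1950, 2200))   # -> 'd'
```

The point of the sketch is only that a band of delay-tuned onset/offset detectors can absorb vowel-context variability while keeping the three stop place classes apart.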


A recent study (Berry & Weismer, under review) provides an added insight into the locus equation paradigm. Berry and Weismer sought to test the claim that locus equation slopes are a valid index of the extent of coarticulation between the vowel and the preceding stop. They generated a large-scale (800 tokens per speaker), nearly continuous variation of speaking rate, from extremely slow to slow to fast to extremely fast, in [bVt] words (beat, bit, bait, bet, bat, boot, boat, bought, bot). Vowel durations were used to categorize the different speaking rates. When the data were pooled across the 10 vowel contexts (the typical locus equation methodology), the locus equation slopes varied systematically with speaking rate: slope coefficients increased with faster rates and decreased with slower rates. Berry and Weismer then analyzed the speaking rate effect using a within-vowel metric rather than the typical across-vowel pooling. The systematicity of slope values reflecting coarticulation extent as a function of speaking rate completely disappeared. Moreover, locus equation scatterplots, now separately derived for each of the 10 vowels, for each of the discrete speaking rate categories, revealed a total lack of correlation between F2 onset and F2 midvowel. The signature locus equation linearity and tight clustering of data points around the regression line was totally missing: the mean r² was only .26 across the 40 locus equations, with 14 having r² values below .10. As the authors stated: "In effect, variability is required to evidence invariance" (p. 17). This empirical demonstration nicely highlights the main message of this article: invariance is an emergent property of categories comprised of signal elements systematically varying along a lawful dimension (a toy simulation of this pooling effect is sketched at the end of this section). One of the most intensively studied cases is stop place categorization as signaled by varying F2 transitions. We now have an empirically established metric, locus equations, that illustrates lawful categorical orderliness; what remains is understanding how neural architectures mediate this higher-order acoustic organization. Neural columns, functioning in principle like the ITD columns of the barn owl, are a logical first guess.

Unfortunately, no current methodologies can unequivocally test for the presence of such functional columns in humans. Chang, Rieger, Johnson, Berg, and Knight (2010) have come the closest in providing a neural-based suggestion of phonetic maps. Using preoperative recordings from intracranial high-density cortical electrode arrays implanted in human superior temporal cortex, they reported neural responses evidencing, with fine temporal precision and spatial localization, "ba, da, ga" distinctions along a categorical perception F2 transition continuum. Using such an intracranial recording system with the three stops (/b d g/) across several vowel contexts would provide a crucial test for distinct stop place maps spanning vowel contexts. A locus equation-based columnar notion would predict just such representations. They may well be distributed throughout auditory processing areas, but if they do exist, spatial distinctness should be evident. Brain imaging studies using whole-brain multivariate pattern-based analyses (e.g., Raizada & Poldrack, 2007; Raizada, Tsao, Liu, & Kuhl, 2010) would also be appropriate to assess the existence of multi-voxel networks functioning as category-level instantiations of stop place groupings.
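Returning to the Berry and Weismer result mentioned above, the pooling effect can be imitated with a toy simulation: token-to-token scatter within a single vowel context is essentially noise, but pooling across vowel contexts recovers the strong linear relation. Everything below is synthetic and assumes an alveolar-like slope and intercept; it stands in for no actual data.

```python
# Toy demonstration: locus equation r^2 is high when tokens are pooled across
# vowel contexts, but collapses within a single vowel context. Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
k_true, c_true = 0.45, 1050.0                     # assumed alveolar-like relation
vowel_targets = np.array([2300, 2100, 1900, 1700, 1500, 1300, 1150, 1000, 900, 850], dtype=float)

def r_squared(x, y):
    k, c = np.polyfit(x, y, 1)
    return 1.0 - (y - (k * x + c)).var() / y.var()

# Five repetitions per vowel: midvowel F2 jitters slightly, onset F2 follows Eq. (1) plus noise.
vowel_labels = np.repeat(np.arange(len(vowel_targets)), 5)
f2_vowel = np.repeat(vowel_targets, 5) + rng.normal(0.0, 30.0, vowel_labels.size)
f2_onset = k_true * f2_vowel + c_true + rng.normal(0.0, 40.0, vowel_labels.size)

print('pooled across vowel contexts:', round(r_squared(f2_vowel, f2_onset), 2))
for v in range(3):                                 # a few single-vowel 'locus equations'
    mask = vowel_labels == v
    print(f'within vowel context {v}:', round(r_squared(f2_vowel[mask], f2_onset[mask]), 2))
```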

6. Exemplar-based vs. neuronal columnar-based approaches to resolving variability

The resolution of stimulus variability via columnar-based algorithms provides a novel contrast to the popular nonanalytic, instance-based, or exemplar view of cognition (Brooks, 1978; Jacoby & Brooks, 1984; Pisoni, 1992). Both views consider variability to be informative and lawful, but instance-based views do not consider perceptual normalization necessary, as all episodic instances are supposedly represented. Advocates of exemplar theory disagree with traditional abstractionist approaches that view variability as unwanted noise to be eliminated (Liberman et al., 1967). Instance-based accounts of perception hold that the variable attributes of the speech signal are retained as "part of the internal representation of speech in memory" (Pisoni, 1992, p. 1). Episodic information is not eliminated, but encoded and retained as part of the perceptual analysis; the particulars are preferred over abstract generalizations. While such approaches provide an interesting alternative to traditional abstractionist views of speech perception, it must be remembered that exemplar approaches in speech research have only used talker-based variability (due to size and gender), not context-induced variability such as the variable F2 transitions in stop place categorization. Encoding and remembering every single instance of a stop + vowel utterance that one hears in a lifetime does not seem like an elegant or useful way for the human brain to process information. While the exact neural mechanism that would bind the F2 onsets ~ F2 offsets belonging to a particular stop place category into orthogonally mapped 2-D columnar arrangements is presently unknown, the basic hypothesis of such columnar structures possesses a neural reality that is missing in exemplar notions.


Diane Ackerman (2005) has eloquently captured the take-home message of this argument: "The brain is a five-star generalizer. It simplifies and organizes, reducing a deluge of sensory information to a manageable sum... The brain doesn't have room to record the everythingness of everything, nor would that be a smart strategy."

References

Ackerman, D. (2005). An alchemy of mind: The marvel and mystery of the brain. New York: Scribner.
Barlow, H. B. (1989). Unsupervised learning. Neural Computation, 1(3), 295–311.
Berry, J., & Weismer, G. Speaking rate effects on locus equation slope. Journal of Phonetics, under review.
Brooks, L. (1978). Nonanalytic concept formation and memory for instances. In E. Rosch & B. Lloyd (Eds.), Cognition and categorization. Hillsdale, NJ: Erlbaum.
Chang, E. F., Rieger, J. W., Johnson, K., Berg, M. S., & Knight, R. T. (2010). Categorical speech representation in human temporal gyrus. Nature Neuroscience, 13, 1428–1432.
Christiansen, M. H., Allen, J., & Seidenberg, M. (1998). Language and Cognitive Processes, 13(2/3), 221–268.
Esser, K. H., Condon, C. J., Suga, N., & Kanwal, J. S. (1997). Syntax processing by auditory cortical neurons in the FM–FM area of the mustached bat. Proceedings of the National Academy of Sciences of the United States of America, 94, 14019–14024.
Fitzpatrick, D. C., Kanwal, J. S., Butman, J. A., & Suga, N. (1993). Combination-sensitive neurons in the primary auditory cortex of the mustached bat. Journal of Neuroscience, 13, 931–940.
Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3–28.
Fuzessery, Z. M., & Feng, A. S. (1983). Mating call selectivity in the thalamus and midbrain of the leopard frog (Rana p. pipiens): single and multiunit analyses. Journal of Comparative Physiology, 150, 333–344.
Gerhart, J., & Kirschner, M. (1997). Cells, embryos, and evolution. Oxford, UK: Blackwell Science.
Hickok, G., & Poeppel, D. (2004). Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition, 92, 67–99.
Hoffstetter, K. M., & Ehret, G. (1992). The auditory cortex of the mouse: connections of the ultrasonic field. Journal of Comparative Neurology, 323, 370–386.
Horton, J. C., & Adams, D. L. (2005). The cortical column: a structure without a function. Philosophical Transactions of the Royal Society of London, Biological Sciences, 360, 837–862.
Jacob, F. (1977). Evolution and tinkering. Science, 196, 1161–1166.
Jacoby, L. L., & Brooks, L. R. (1984). Nonanalytic cognition: memory, perception, and concept learning. In G. Bower (Ed.), The psychology of learning and motivation (pp. 1–47). New York: Academic Press.
Kadia, S. C., & Wang, X. (2003). Spectral integration in A1 of awake primates: neurons with single and multipeaked tuning characteristics. Journal of Neurophysiology, 89, 1603–1622.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of the speech code. Psychological Review, 74, 431–461.
Liberman, A. M., & Mattingly, I. (1985). The motor theory of speech perception revised. Cognition, 21, 1–36.
Lindblom, B., & Sussman, H. M. (2012). Dissecting coarticulation: how locus equations happen. Journal of Phonetics, 40, 1–19.
Linden, D. J. (2007). The accidental mind: How brain evolution has given us love, memory, dreams, and god. Cambridge: Belknap, Harvard University Press.
Margoliash, D. (1983). Acoustic parameters underlying the responses of song-specific neurons in the white crowned sparrow. Journal of Neuroscience, 12, 1039–1057.
Margoliash, D., & Fortune, E. S. (1992). Temporal and harmonic combination-sensitive neurons in the zebra finch's HVc. Journal of Neuroscience, 12, 4309–4326.
Marti, D., & Rinzel, J. (2012). Dynamics of feature categorization. Neural Computation. http://dx.doi.org/10.1162/NECO_a_00383.
Mittman, D. H., & Wenstrup, J. J. (1995). Combination-sensitive neurons in the inferior colliculus. Hearing Research, 90, 185–191.
Mountcastle, V. B. (1957). Modality and topographic properties of single neurons of cat's somatic sensory cortex. Journal of Neurophysiology, 20, 408–434.
Mountcastle, V. B. (1978). An organizing principle for cerebral function: the unit module and the distributed system. In G. M. Edelman & V. B. Mountcastle (Eds.), The mindful brain: Cortical organization and the group selective theory of higher brain function (pp. 7–51). Cambridge, MA: MIT Press.
Mudry, K. M., Constantine-Paton, M., & Caprnica, R. R. (1977). Auditory sensitivity of the diencephalon of the leopard frog, Rana p. pipiens. Journal of Comparative Physiology, 114, 1–13.
Nelson, P. G., Erulkar, S. D., & Bryan, S. S. (1966). Responses of units of the inferior colliculus to time-varying acoustic stimuli. Journal of Neurophysiology, 29, 834–860.
Neuweiler, G. (1983). Echolocation and adaptivity to ecological constraints. In F. Huber & H. Markl (Eds.), Neuroethology and behavioral physiology: Roots and growing pains. New York/Heidelberg: Springer-Verlag.
Neuweiler, G. (1984). Foraging, echolocation and audition in bats. Naturwissenschaften, 71, 446–455.
Ohlemiller, K., Kanwal, J. S., & Suga, N. (1996). Facilitative responses to species-specific calls in cortical FM–FM neurons of the mustached bat. NeuroReport, 7, 1749–1755.
Olsen, J. F. (1994). Medial geniculate neurons in the squirrel monkey sensitive to inter-component delays that categorize species-specific calls. Abstracts of the Association for Research in Otolaryngology, 17, 21.
Olsen, J. F., & Rauschecker, J. P. (1992). Medial geniculate neurons in the squirrel monkey sensitive to combinations of components in a species-specific vocalization. Society for Neuroscience Abstracts, 18, 883.


Olsen, J. F., & Suga, N. (1991a). Combination-sensitive neurons in the medial geniculate body of the mustached bat: encoding of relative velocity information. Journal of Neurophysiology, 65, 1254–1273.
Olsen, J. F., & Suga, N. (1991b). Combination-sensitive neurons in the medial geniculate body of the mustached bat: encoding of target range information. Journal of Neurophysiology, 65, 1275–1296.
Pisoni, D. B. (1992). Some comments on invariance, variability and perceptual normalization in speech perception. In Proceedings of the 1992 International Conference on Spoken Language Processing. Banff, Canada, Oct. 12–16.
Portfors, C. V., & Wenstrup, J. J. (1999). Delay-tuned neurons in the inferior colliculus of the mustached bat: implications for analyses of target distance. Journal of Neurophysiology, 82, 1326–1338.
Raizada, R. D. S., & Poldrack, R. A. (2007). Selective amplification of stimulus differences during categorical processing of speech. Neuron, 56(4), 726–740.
Raizada, R. D. S., Tsao, F.-M., Liu, H.-M., & Kuhl, P. K. (2010). Linking brain-wide multivoxel activation patterns to behavior: examples from language and math. Neuroimage, 51(1), 462–471.
Saffran, J. J., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month old infants. Science, 274, 1926–1928.
Suga, N. (1989). Principles of auditory information-processing derived from neuro-ethology. Journal of Experimental Biology, 146, 277–286.
Suga, N. (1994). Multi-function theory for cortical processing of auditory information: implications of single unit and lesion data for future research. Journal of Comparative Physiology A, 175, 135–144.
Suga, N., O'Neill, W. E., Kujirai, K., & Manabe, T. (1983). Specificity of combination-sensitive neurons for processing of complex biosonar signals in the auditory cortex of the mustached bat. Journal of Neurophysiology, 49, 1573–1627.
Suga, N., O'Neill, W. E., & Manabe, T. (1978). Cortical neurons sensitive to combinations of information-bearing elements of biosonar signals in the mustached bat. Science, 200, 778–781.
Sullivan, W. E., & Konishi, M. (1986). Neural map of interaural phase difference in the owl's brainstem. Proceedings of the National Academy of Sciences of the United States of America, 83, 8400–8404.
Sussman, H. M., Duder, C., Dalston, E., & Cacciatore, A. (1999). An acoustic study of the development of CV coarticulation: a case study. Journal of Speech, Language, and Hearing Research, 42, 1080–1096.
Sussman, H. M., Marquardt, T., Doyle, J., & Knapp, H. (2002). Phonemic integrity and contrastiveness in developmental apraxia of speech. In F. Windsor, M. L. Kelly, & N. Hewlett (Eds.), Investigations in clinical phonetics and linguistics (pp. 311–326). Mahwah, NJ: Lawrence Erlbaum.
Sussman, H. M., McCaffrey, H. A., & Matthews, S. A. (1991). An investigation of locus equations as a source of relational invariance for stop place categorization. Journal of the Acoustical Society of America, 90, 1309–1325.
Sutter, M. L., & Schreiner, C. E. (1991). Physiology and topography of neurons with multipeaked tuning curves in cat primary cortex. Journal of Neurophysiology, 65, 1207–1226.
Takahashi, T., & Konishi, M. (1986). Selectivity for interaural time difference in the owl's midbrain. Journal of Neuroscience, 6, 3413–3422.
Tanaka, K. (1993). Neural mechanisms of object recognition. Science, 262, 685–688.
Yavuzoglu, A., Schofield, B. R., & Wenstrup, J. J. (2011). Substrates of auditory frequency integration in a nucleus of the lateral lemniscus. Neuroscience, 169, 906–919.