ORIGINAL ARTICLES
An Artificial Neural Network Analogue of Learning in Autism
Ira L. Cohen
An artificial neural network is simulated that shares formal qualitative similarities with the selective attention and generalization deficits seen in people with autism. The model is based on neuropathological studies which suggest that affected individuals have either too few or too many neuronal connections in various regions of the brain. In simulations where the model was taught to discriminate children with autism from children with mental retardation, having too few simulated neuronal connections led to relatively inferior discrimination of the two groups in a training set and, consequently, relatively inferior generalization of the discrimination to a novel test set. Too many connections produced excellent discrimination but inferior generalization because of overemphasis on details unique to the training set. It is concluded that, within the context of the current model, the neuropathological observations that have been described in the literature are sufficient to explain some of the unique pattern recognition and discrimination learning abilities seen in some people with autism as well as their problems with generalization and concept acquisition. The model generates testable hypotheses that have implications for understanding the pathogenesis, treatment, and phenomenology of autism.
Key Words: Autism, neural networks, neuropathology, pathogenesis, discrimination learning, generalization
From the New York State Office of Mental Retardation and Developmental Disabilities, New York State Institute for Basic Research in Developmental Disabilities, Staten Island, NY. Address reprint requests to Dr. Ira L. Cohen, Department of Psychology, New York State Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Rd., Staten Island, NY 10314. Received September 22, 1993; revised December 13, 1993.
© 1994 Society of Biological Psychiatry
BIOL PSYCHIATRY 1994;36:5-20
Introduction
Autism is a perplexing disorder characterized by complex deficits in receptive and expressive social-communication skills and attention (e.g., lack of gesture, poor eye contact, echolalia, perseveration, and poor generalization skills) along with unique skills, in some persons, in low-level processing and recall of spatio-temporal, visual, and/or auditory patterns, such as learning of routines, hyperlexia, and savant skills (see general discussion). The disorder has multiple etiologies (Gillberg and Coleman 1992) and, possibly, multiple behavioral subgroups (Cohen et al 1993), and it has attracted the attention of researchers since Kanner's (1943) initial description of the syndrome. A number of biological theories have been developed to account for some of the phenomena (cf. Gillberg and Coleman 1992) but, until recently, postulated defects in neuronal systems have been purely hypothetical. This was, in part, because neuroanatomical defects associated with autism had not been consistently found in earlier imaging or autopsy investigations. For example, in
one of the first published neuropathological investigations, Williams et al (1980) were unable to identify specific changes in four people with "autistic features." It should be noted, however, that one of the cases was reported to have reduced density of Purkinje cells in the cerebellum and, in two cases with severe retardation, reduced dendritic spine density in neocortical pyramidal cells was observed in a Golgi analysis. The significance of the dendritic spine density observations was qualified, however, because of methodological issues. Further, one of the two clearly autistic cases did not have a reduced dendritic spine count and his hippocampal neurons were well-developed. More recently, neuropathological investigations of affected individuals have revealed apparent differences that appear to be associated with the syndrome. Bauman and Kemper (1985, 1986) have reported, in two cases with autism (a 29-year-old man and an 11-year-old girl), neuropathological anomalies in a variety of both forebrain and hindbrain structures. Increased neuronal density and smaller neurons were noted in the following medial temporal structures: the hippocampus; entorhinal cortex; cortical, medial, and central nuclei and basal complex of the amygdalae; the mammillary body; the medial septal nucleus (pars posterior); and the cingulate cortex. Neuronal loss was noted in the diagonal band of Broca and the biventer, gracile, tonsillar, and inferior semilunar lobules of the neocerebellar cortex (Purkinje and granule cells) as well as the fastigial, globose, and emboliform nuclei. In a third case, a 9-year-old boy with autism, Kemper (1988) reported finding polymicrogyria in the orbital frontal cortex and frontal operculum. Raymond et al (1989) have also reported, based on a Golgi analysis of hippocampal pyramidal cells in two cases with autism and two controls, reduced dendritic branching in CA1 and CA4 and reduced area of the perikaryon in CA4 neurons.
In a recent summary that included an additional case of a 28-year-old man, Kemper and Bauman (1993) continued to note increased cell packing density in these same temporal lobe structures. In the cerebellum and inferior olives, however, cellular changes appeared to be age related, with large cells in the two youngest cases and small cells and some cell loss in the two older cases. Ritvo et al (1986) also reported lower Purkinje cell counts in the cerebellum in four cases ranging in age from 10 to 22 years at the time of their death. One of these cases also showed microgyria of the left occipital and temporal lobes, asymmetry of the ventral-medial temporal lobes, and asymmetry of the cerebellar tonsils. Some pneumoencephalographic and computerized axial tomography (CAT) imaging studies (e.g., Hauser et al 1975; Jacobson et al 1988) have suggested ventricular anomalies in a subgroup of persons with autism, but other CAT scan studies could not replicate these observations (Creasey et al 1986; Harcherik et al 1985; Rumsey et al 1988). More
recently, higher resolution magnetic resonance imaging (MRI) studies have tended to reveal problems in the forebrain and hindbrain consistent with some of the neuropathological studies described above, in particular, those related to aberrant cerebellar development (Courchesne et al 1987, 1988, 1993; Gaffney et al 1987a,b, 1988, 1989; Hashimoto et al 1991, 1992; Murakami et al 1989; Piven et al 1990). Other MRI studies have been unable to replicate some of these observations (Hsu et al 1991; Nowell et al 1990; Ritvo and Garber 1988), with others suggesting that etiological factors such as fragile X need to be taken into account (Reiss 1988). It would appear that, in some people with autism, there are defects in certain forebrain and hindbrain structures, albeit with much individual variation. The neuropathological observations are suggestive of abnormal wiring patterns in various brain regions, possibly due to deficits in neuronal migration during fetal development (Piven et al 1990), curtailment of normal neuronal growth (Bauman 1991), and/or aberrant development (Courchesne et al 1993). Although the locations may vary in some studies, the structural deficits that do appear to be consistent, as noted by Bauman (1991), are of two types: too few neurons in some areas, for example, the cerebellum, and too many in others, for example, the amygdala and hippocampus. Extrapolations as to the functional meaning of these structural abnormalities have focused on their anatomical locations (i.e., hippocampus, amygdala, etc.) and have been based on lesion experiments in animals and/or on clinical human neuropsychological studies of patients with lesions in these same areas of the brain (Bauman and Kemper 1985; Courchesne et al 1988, 1991; Gaffney et al 1989; Hashimoto et al 1992).
However, it is probably not the case that lesioning an adult or developing nervous system is functionally the same as an aberration in neuronal wiring patterns resulting from some prenatal insult or genetic anomaly. For example, some of the above autopsy studies implicate the hippocampus in the pathology of autism. One of the functions the hippocampus serves is to provide the ability to form spatial maps of the environment (cf. Wilson and McNaughton 1993). Lesioning the hippocampus in primates disrupts the acquisition of spatial maps and the control of spatially directed movements (cf. Rolls 1990a). Yet many people with autism seem to be especially adept at encoding, discriminating, and recalling spatial maps, such as complicated routes to various places, and can be very upset if these routes are not followed precisely (Wing and Attwood 1987). To date, no one has specifically addressed the ramifications of an abnormal wiring pattern for a theory of autism. Too many or too few neurons could, for example, translate
into too many or too few neuronal connections, respectively, other things being equal. What could be the importance of having too many neurons and neuronal connections versus having too few, and how might this affect cognitive functioning? The purpose of this article is to suggest how the behavioral phenomena of autism relate to the problem of having "too few" or "too many" neurons or neuronal connections, based on a simulated connectionist model derived from the field of computational neuroscience. The discussion is, admittedly, speculative, but it has the advantages of parsimony and of being able to simulate and tie together a diverse set of data about autism (neuroanatomy, neuropathology, pathogenesis, behavioral or "symptomatic" characteristics, learning, generalization, and behavior modification) in one single framework, as well as making specific, testable predictions.
Computational Neuroscience
The field of computational neuroscience (also known in part as the study of neural networks) is relatively new, and its history has been described in detail elsewhere (Carpenter and Grossberg 1991; Churchland and Sejnowski 1992; Levine 1991; Rumelhart and McClelland 1986; Schwartz 1990). Its global subject matter is nothing less than an attempt to mathematically model, on a computer, the computational complexity of neurons and nervous systems. What makes neural networks unusual is their ability to mimic certain aspects of neural and cognitive functioning as well as to solve complex problems that elude more traditional statistical techniques. A very brief description of learning in neural networks is provided below as a foundation for understanding the model presented in this article. For a more extended and precise description, the interested reader is referred to Carpenter and Grossberg (1991), Churchland and Sejnowski (1992), and Levine (1991). Readers familiar with neural networks may skip these sections.
Learning in Neural Networks
A neural network is instantiated on a computer as a set of mathematical equations that model processing elements, their interconnections, and connection weights, which are, in turn, analogous to neurons, synaptic connections, and changes in synaptic strength due to learning, respectively, as shown in Figure 1. As in biological nervous systems, information in many neural networks is distributed across many parallel connections. A typical network learns to associate an input with an output pattern (e.g., associating stock market prices in a given week, the output, with stock market prices from previous weeks, the input [NeuralWare 1991], or associating
[Figure 1: sample feed-forward network, showing an input layer, a hidden layer, and an output layer.]
Figure 1. A typical multilayer feed-forward neural network is shown with n input processing elements, n hidden processing elements, and n output processing elements. In this network, all of the input processing elements are connected to all of the hidden processing elements, which are, in turn, connected to all of the output processing elements. These multiple connections allow for distribution of information in the network.
a diagnosis of bipolar disorder, as opposed to schizophrenia, the output, with a given pattern of answers in a structured interview, the input) by altering the strength of the connection weights among the interconnected processing elements (cf. Hebb 1949) based on the feedback it receives about the error in (i.e., accuracy of) its output. These connection weights may be either positive, resulting in excitatory input, or negative, resulting in inhibitory input, to the subsequent processing element(s). Each processing element arrives at a decision to send input to the next processing element based on the summed weighted input from the connections it receives. The magnitude of the output in the hidden and output layers is usually based on a nonlinear, sigmoid function (the transfer function) analogous to the "all or none" response of a neuronal axon. That is, as the weighted input increases, the magnitude of the output from the processing element increases in an "S-shaped" or ogival manner, as shown in Figure 2. In many systems, weights are modified according to a gradient descent rule, which drives the network toward a configuration of weights that yields a global minimum error solution in what is
known as "weight space," as shown in Figure 3. Such a solution defines the optimal state of the network. Problems may occur when a particular weight configuration pushes the network into a "local error minimum" rather than a global minimum. In this case, error has not been minimized and further learning cannot take place. Such problems can be reduced by adding noise to the network during training, which can push the network over the "energy barrier," pictured in Figure 3, toward the global minimum. The parallel distributed processing characteristics of many neural networks enable them to exhibit "graceful degradation," a phenomenon akin to Lashley's Mass Action principle; that is, their performance degrades gradually as more and more connections are removed, which makes them relatively fault tolerant. Another advantage of distributed representation is that this architecture helps neural networks to generalize and make inferences by interpolation (NeuralWare 1991).
Figure 2. This figure is a schematic of a processing element as the target with input from source processing elements and a bias processing element that always provides a constant input. Also shown is the equation for combining weights, Wij, with inputs, S, yielding a value, X, which is, in turn, the parameter in the transfer function. The transfer function shown here produces an S-shaped, nonlinear, hyperbolic tangent output, which is a function of the input X.
Neural Network Subtypes
The feedback that leads to weight modification in such systems falls under two headings: supervised and unsupervised. In supervised learning, the net is trained to recognize certain patterns by being given external feedback as to the accuracy of its performance and making suitable modifications in the sizes of the weights until learning takes place. In unsupervised or "self-organizing" systems, there is no external teacher. Rather, the goal for the system, as defined by its developer, is to internally organize itself such that it forms categories solely based on the characteristics of the input. This is sometimes implemented by a form of cluster analysis on the input data. It also can occur by internal monitoring of error, for example, how discrepant a predicted input pattern is from the network's experience with previous input patterns (Churchland and Sejnowski 1992). When the input and output patterns that are to be associated differ, the network is referred to as hetero-associative. Such hetero-associative networks have been used in a variety of practical applications such as classifying sonar targets (Gorman and Sejnowski 1988), clinical diagnosis of autism (Cohen et al 1993), and teaching a net how to speak (Sejnowski and Rosenberg 1987), as well as in models of cortical or cerebellar function (Gluck and Myers 1992). In an auto-associative network, the task is to reproduce the input; that is, the input and output patterns should be the same. Once this association has been learned, such networks are able to reproduce an accurate pattern given an incomplete or noisy input. This type of pattern enhancement and recognition has had extensive application in signal processing as well as in models of hippocampal function (Gluck and Myers 1992; Rolls 1990a,b). Some of these network models are more realistic than others in terms of their similarity to biological systems. For example, the neural network model developed by Hopfield (1982) used a physical analogy to spin glasses (a magnetic-like material with particles that spin in mixtures of attraction and repulsion), which applied the physical concepts of energy, temperature, and annealing to a neural network in order to solve certain computations. This model has been extended to include other concepts from statistical mechanics and has been nicknamed the Boltzmann machine (cf. Churchland and Sejnowski 1992).
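Auto-associative pattern completion of the kind just described can be illustrated with a small Hopfield-style network. This is a minimal sketch under assumptions of my own, not a model from this article: two orthogonal patterns of +1/-1 values are stored with a Hebbian outer-product rule (no self-connections), and all units are updated synchronously for brevity, whereas Hopfield's (1982) network updated units asynchronously. The function names are hypothetical.

```python
def store_patterns(patterns):
    """Hebbian outer-product weights for +1/-1 patterns, with no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def hopfield_recall(weights, state, max_steps=10):
    """Set every unit to the sign of its summed weighted input until the state is stable."""
    s = list(state)
    for _ in range(max_steps):
        fields = [sum(w * sj for w, sj in zip(row, s)) for row in weights]
        new_s = [1 if h >= 0 else -1 for h in fields]
        if new_s == s:
            break
        s = new_s
    return s


# Two stored patterns whose dot product is zero (orthogonal).
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = store_patterns([p1, p2])

noisy = [-1] + p1[1:]                 # p1 with its first element flipped
print(hopfield_recall(w, noisy) == p1)  # True: the corrupted input is completed
```

With only two orthogonal patterns there is little cross-talk; as more overlapping patterns are stored, recall from noisy input becomes less reliable.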
Figure 3. A two-dimensional plot is shown relating the error in a neural network to the strength of a connection between two processing elements. The black circle refers to the state of the network at a given point in time. The small depressions indicate local minima in this error surface in which the network can become trapped rather than reaching the global minimum error, which is also shown. The humps indicate "energy barriers" to changes in state in the network, which can be overcome by the addition of noise during the learning phase.
On the other hand, Grossberg's Adaptive Resonance Theory (cf. Carpenter and Grossberg 1991) self-organizing systems invoke neural and psychological concepts such as on-center/off-surround receptive fields, lateral inhibition, top-down and bottom-up processing, emotion, reinforcement, arousal, and orienting, among others, in their anatomies. The rationale and methods for network instantiation of these concepts are described in more detail in Carpenter and Grossberg (1991). The complexity of the biological subject matter that has been modeled also varies, from computational models of long-term potentiation to models of cell interactions in the hippocampus to more global models of information processing in selected brain regions (cf. Churchland and Sejnowski 1992).
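The processing element of Figure 2 can be written in a few lines. This is a sketch that assumes the bias processing element supplies a constant input of 1.0 through its own weight; the function name is hypothetical.

```python
import math


def processing_element(inputs, weights, bias_weight):
    """Target element of Figure 2: X = sum of Wij * Sj plus the bias term; output = tanh(X)."""
    x = sum(w * s for w, s in zip(weights, inputs)) + bias_weight * 1.0
    return math.tanh(x)


# Positive weights act as excitatory input, negative weights as inhibitory input.
print(processing_element([1.0, 1.0], [0.8, -0.3], 0.1))  # tanh(0.6), about 0.537
```

Because tanh saturates at -1 and +1, very large summed inputs change the output very little, which is the S-shaped behavior described for the transfer function.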
Network Architectures
The simplest associative networks consist of one layer with feedback or two layers, an input and an output, analogous to the sensory-motor distinction in animal nervous systems. Rolls (1990a, 1990b) has proposed a model of hippocampal functioning based on the similarity between the wiring of this brain region and simple associative networks. Both single-layer and two-layer networks are, however, extremely limited in their ability to learn complex associations (Churchland and Sejnowski 1992). These networks have largely been supplanted in various applications by multilayer networks with nonlinear connections among the layers. Such networks can solve very complex problems
that cannot be computed by simple, linear associative networks (Rumelhart and McClelland 1986). Gluck and Myers (1992) have used such a multilayer network in their model of hippocampal function. One of the most frequently used multilayer associative networks is one that uses a back-propagation algorithm to adjust the strengths of the weighted connections among the layers (cf. Rumelhart et al 1986). Figure 1 shows a typical feed-forward back-propagation network that consists of at least three layers: (1) an input layer consisting of a number of processing elements that usually corresponds to the number of attributes in the data; (2) one or more "hidden" layers, with the first hidden layer receiving connections from the input layer; and (3) an output layer that arrives at a prediction or classification based on input from the hidden layer(s). The number of processing elements in the output layer usually depends on the number of categories that require classification. This type of architecture is known as a feed-forward model because information flows in one direction only. Weight changes occur first in the hidden layer-output layer connections and then in the input layer-hidden layer connections, based on the error in the output layer, which is "back-propagated" to the hidden and then to the input layers, hence the name. Each individual processing element can have both excitatory and inhibitory influences on the elements to which it is connected. Thus, this anatomy is not completely similar to biological nervous systems. Nevertheless, back-propagation networks have been extensively used in nervous system models. For example, Lehky and Sejnowski (1988, 1990; cited in Churchland and Sejnowski 1992) have been able to use a back-propagation network to extract information on the curvature of objects from shaded image input to an artificial retina. In learning this task, the hidden layer processing elements developed properties similar to the receptive fields of simple cells in the visual cortex. Therefore, even though such back-propagation models are not similar in all respects to biological nervous systems, they mimic some of their properties and may help to explain the properties of real nervous systems; this is, in part, the motivation for their use (Churchland and Sejnowski 1992). Given the above qualifications as to the real-life applicability of back-propagation models to the nervous system, there would seem to be enough similarities that one can begin to ask questions of such a back-propagation network that could be important for understanding parts of a biological nervous system. Here, however, I wish to go one step further. There are certain features of back-propagation models that have an uncanny similarity to the characteristics of autism, when the observed neuropathology is taken into account, and it is this similarity that I will now address.
A Back-Propagation Analogue of the Neuropathology of Autism
Neural networks essentially perform a mapping function (Churchland and Sejnowski 1992); that is, they compute mathematical functions that relate a set of data in a domain (e.g., real-life input values such as phonemes) to a range (e.g., internal representations of vowels and consonants). Assuming that neural networks share some characteristics with biological nervous systems, the effect of having too many or too few neurons and neuronal connections on the quality of this network mapping can be simulated on a computer.
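The forward pass and the back-propagation of output error through a feed-forward network can be sketched in plain Python. This is an illustrative sketch, not the NeuralWorks implementation used for the simulations: it assumes one hidden layer, hyperbolic tangent transfer functions in the hidden and output layers (as in Figure 2), one bias weight per processing element, online weight updates with a fixed learning rate, and a hypothetical class name `BackpropNet`. The default seed of 57 and the -0.1 to +0.1 weight range echo the Procedure, but Python's generator will not reproduce the original weights.

```python
import math
import random


class BackpropNet:
    """Input -> tanh hidden layer -> tanh output layer, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=57, init_range=0.1):
        rng = random.Random(seed)
        # Each row holds one element's incoming weights; the last entry is its bias weight.
        self.w_hidden = [[rng.uniform(-init_range, init_range) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-init_range, init_range) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # constant input from the bias element
        h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden]
        hb = h + [1.0]
        y = [math.tanh(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return xb, hb, y

    def train_step(self, x, target, lr=0.1):
        xb, hb, y = self.forward(x)
        # Output error scaled by the tanh derivative, 1 - y**2.
        d_out = [(t - yk) * (1.0 - yk * yk) for t, yk in zip(target, y)]
        # Back-propagate output errors to each hidden element (using pre-update weights).
        d_hid = [sum(d_out[k] * self.w_out[k][j] for k in range(len(d_out)))
                 * (1.0 - hb[j] * hb[j])
                 for j in range(len(self.w_hidden))]
        # Hidden->output weights first, then input->hidden weights.
        for k, row in enumerate(self.w_out):
            for j in range(len(row)):
                row[j] += lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] += lr * d_hid[j] * xb[i]

    def mse(self, data):
        errors = []
        for x, t in data:
            y = self.forward(x)[2]
            errors.append(sum((tk - yk) ** 2 for tk, yk in zip(t, y)) / len(t))
        return sum(errors) / len(errors)


# Hypothetical toy task: output high only when both inputs are high.
data = [([-1.0, -1.0], [-0.8]), ([-1.0, 1.0], [-0.8]),
        ([1.0, -1.0], [-0.8]), ([1.0, 1.0], [0.8])]
net = BackpropNet(n_in=2, n_hidden=3, n_out=1)
before = net.mse(data)
for _ in range(2000):
    for x, t in data:
        net.train_step(x, t)
after = net.mse(data)
print(before, after)
```

For the discrimination task in this article, the analogous construction would be `BackpropNet(n_in=11, n_hidden=h, n_out=2)`, with `h` varied as described under Procedure.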
In one- or two-layer networks, one of the most important issues in network construction concerns the number of connections needed to store a distributed representation of a given association. If all inputs and outputs are completely interconnected, the network can learn different associations. As the number of associations to be learned increases, however, there is a greater chance that new associations will resemble parts of already stored associations. If the network adjusts its weights in order to learn these similar patterns, already established patterns will degrade. Thus, having "too many" connections in a simple one- or two-layer network can lead to a problem with storage capacity. That is, the number of associations that can be stored is relatively low, and the quality of these memories is affected if additional learning takes place. This problem has been called the "stability-plasticity dilemma" (Carpenter and Grossberg 1991). Several solutions to this problem have been proposed, and one of these is to reduce the number of connections in the network,
that is, to have "sparse" connectivity, in which fewer processing units are used to represent a piece of information (Churchland and Sejnowski 1992). A similar problem of pattern degradation can operate in multilayer networks such as back-propagation (Carpenter and Grossberg 1991; Churchland and Sejnowski 1992). Yet multilayer networks, such as back-propagation models, can have an additional problem with respect to the number of connections in the model. Back-propagation networks employ a hidden layer (or layers) that can form higher-order representations of the input pattern, and these higher-order representations can solve very difficult nonlinear problems. In forming these representations, the hidden layer performs a type of curve fitting, that is, trying to find the best curve that will fit all of the data points in the set on which it is trained, the "training set" (Churchland and Sejnowski 1992). As in any other curve-fitting procedure, one desires the simplest function that will best describe the data. Why? Because, as shown in Figure 4, while a very simple function may poorly describe the training set, a complex curvilinear function may fit a training set "too well"; that is, it describes the training set so perfectly that it is relatively poorer than a simpler function at generalizing what it has learned to a different, albeit conceptually similar, sample, the "test set." In a back-propagation network, this curve-fitting problem translates into determining the number of processing elements needed in the hidden layer, as well as the number of hidden layers to use, in order to (a) best fit the training set and (b) best describe the test set. It has been determined that, in fact, a single hidden layer with enough processing elements can fit almost any mathematical function (Hornik et al 1989).
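The pattern degradation that motivates sparse connectivity can be illustrated with the simplest two-layer (input-output) network, a linear associator whose weights are set by a Hebbian outer-product rule. This is a sketch with hypothetical patterns and function names, not a model from this article: two associations with non-overlapping inputs are recalled exactly, but storing a third association whose input overlaps both of them distorts recall of the first.

```python
import math


def store(weights, x, y):
    """Hebbian outer-product update: w[i][j] += y[i] * x[j]."""
    for i, yi in enumerate(y):
        for j, xj in enumerate(x):
            weights[i][j] += yi * xj


def linear_recall(weights, x):
    """Each output element is the weighted sum of its inputs (linear units)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


w = [[0.0] * 4 for _ in range(2)]     # 4 input elements fully connected to 2 outputs
x1, y1 = [1.0, 0.0, 0.0, 0.0], [1.0, -1.0]
x2, y2 = [0.0, 1.0, 0.0, 0.0], [-1.0, 1.0]
store(w, x1, y1)
store(w, x2, y2)
before = linear_recall(w, x1)         # [1.0, -1.0]: exact recall of y1

s = 1 / math.sqrt(2)
x3, y3 = [s, s, 0.0, 0.0], [1.0, 1.0]  # new input overlaps x1 and x2
store(w, x3, y3)
after = linear_recall(w, x1)           # y1 + s * y3, about [1.707, -0.293]
print(before, after)
```

The distortion equals the overlap between the new input and the old one (here `s`) times the new output, so the more the new association resembles an old one, the more the old memory is degraded.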
As with the curve-fitting analogy described above, however, as the number of processing elements and connections increases, the fit to the test set may deteriorate (NeuralWare 1991). Thus, although too few processing elements may not be able to solve a problem to begin with, having too many may impair generalization. This problem of overfitting the data set is also a function of the number of training trials (NeuralWare 1991). I have modeled this phenomenon by using a readily available data set, slightly modified from a similar one presented elsewhere (Cohen et al 1993), that seemed particularly appropriate to the matter at hand. The data set consisted of information obtained from caregivers of children with autism and of children with mental retardation not associated with autism using the Autistic Behavior Interview (ABI), an interview-based rating scale described in more detail by Cohen et al (1993). A simplified nervous system was modeled with a feed-forward, hetero-associative, back-propagation network, the anatomy of which is described below. In order to examine the learning and generalization characteristics of this network, two data sets were created. The first
set, called the training set, was used to teach the network to discriminate between the two groups. The second set, called the test set, was used to test how well the network could generalize what it had learned about the training set to a data set it had not already experienced. Adequate generalization, in this case, is therefore measured by the extent to which the results of the discrimination of the training set match those of the test set. Thus, if 95% of the training cases are correctly discriminated, 95% of the test cases should also be classified correctly. If fewer test cases are classified correctly, generalization is relatively impaired.
Figure 4. This figure demonstrates the problem in finding the optimal function that both describes data and allows for generalization. The panels show a set of data points representing training set data (open circles) and test set data (closed diamonds). The training set data were fit by either a linear (A), cubic (B), or spline (C) function. The line provides a poor fit to both sets of data. The spline fits the training data extremely well, and better than the cubic function, but does not fit the test data as well. The cubic function provides the best fit to both data sets.
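The trade-off shown in Figure 4 can be made concrete with a toy example. The data below are invented for illustration and are not the figure's data: five training points follow y = x with alternating +1/-1 noise, and four noise-free test points lie between them. A least-squares line is compared with a degree-4 polynomial that passes exactly through every training point (Lagrange interpolation, used here as a stand-in for the spline in panel C). The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a callable f(x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 passing through every training point."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def mse(f, xs, ys):
    """Mean squared error of model f on the points (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 0.0, 3.0, 2.0, 5.0]   # y = x with alternating +1/-1 noise
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = list(test_x)                 # noise-free y = x

line = fit_line(train_x, train_y)
interp = fit_interpolant(train_x, train_y)
for name, f in [("line", line), ("interpolant", interp)]:
    print(name, round(mse(f, train_x, train_y), 3), round(mse(f, test_x, test_y), 3))
# line 0.96 0.04
# interpolant 0.0 1.391
```

The interpolant fits the training set perfectly but generalizes far worse. The line wins here only because the underlying function is itself linear; with a curved underlying function, an intermediate model such as the cubic in Figure 4 would be the better choice.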
Method
Subjects
A total of 114 interviews were obtained for the training set data: 57 from caregivers of children with autism between
the ages of 2 and 11 years (mean = 5.41 years; SD = 2.01) and 57 from caregivers of children with mental retardation not associated with autism in the same age range (mean = 5.70 years; SD = 2.22). The children with autism met the DSM-III (APA 1980) criteria for Infantile Autism and the DSM-III-R (APA 1987) criteria for Autistic Disorder based on observations, record review, informal interviews, and clinical judgment. The children with mental retardation did not meet these same diagnostic criteria but did show impairments in intellectual and adaptive functioning. In addition to age, the children in the two groups were matched on overall level of motor skills, with 20 in the moderate to severe range of impairment, 17 in the mild range of impairment, and 20 in the normal range for each group according to standard scores obtained on the Vineland Adaptive Behavior Scales (VABS; Sparrow et al 1984). Sex ratios in the two groups
also did not differ significantly, in that 89% of the children with autism and 81% of the children with mental retardation were boys [χ²(1) = 1.11]. The available test set consisted of 25 interviews of parents of children with autism and 19 interviews of parents of children with mental retardation not associated with autism, in the same age range as the training set. The test set groups did not differ significantly from the training set groups on age or on sex ratios (all p values > 0.05). The distribution of motor skills standard scores in the test set was different from the training set for the cases with autism [χ²(4) = 21.38; p < 0.001], with the majority of cases falling into the moderate range of impairment. For the cases with mental retardation, the distribution of motor skills standard scores in the test set was also different from the training set [χ²(4) = 11.49; p = 0.02], with the majority of cases falling into the normal range. It was not anticipated that this difference would differentially affect test performance, as the nets had equal experience with each level of motor skill in the training set. Indeed, this was the case, because the results shown below were quite similar to those obtained in previous studies of a similar data set when nets were trained and tested on the same sample using a modified jackknife procedure (Cohen et al 1993). Etiologically, three of the cases with autism had fragile X syndrome (one in the training set and two in the test set) and two of the cases with autism in the test set had Down syndrome. For the cases with mental retardation, nine had fragile X syndrome (seven in the training set and two in the test set), 11 had Down syndrome (eight in the training set and three in the test set), one had an 18q deletion in the training set, and one was diagnosed with leukodystrophy in the test set.
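The reported χ²(1) = 1.11 for the training-set sex ratios can be reproduced under two assumptions that the text does not state: that 89% and 81% correspond to 51 and 46 boys out of 57 children per group (the only counts that round to those percentages), and that Yates' continuity correction was applied. The sketch below computes the 2 × 2 statistic both ways; the function name is hypothetical.

```python
def chi_square_2x2(table, yates=True):
    """Pearson chi-square for a 2 x 2 table, optionally with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            chi2 += diff ** 2 / expected
    return chi2


# Rows: autism, mental retardation; columns: boys, girls (assumed counts).
sex_table = [[51, 6], [46, 11]]
print(round(chi_square_2x2(sex_table), 2))               # 1.11
print(round(chi_square_2x2(sex_table, yates=False), 2))  # 1.73
```

Only the corrected value matches the reported statistic, which is consistent with (but does not prove) the use of a continuity correction.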
Procedure
ABI items are arranged into four broad behavioral areas: Reciprocal Social Interaction, Verbal/Nonverbal Communication Skills, Restricted Interests, and Mood and Arousal Level. There are currently a total of 18 subscales, each composed of six items (seven in the case of repetitive behavior) arranged within the subscale according to an ascending hierarchy of ability (for items reflecting appropriate behaviors) or severity (for items reflecting inappropriate behaviors). Each item in the subscale is scored on a Likert scale ranging from 0 to 3: 0 = never, 1 = rarely/emerging, 2 = sometimes/partially, and 3 = often/typically. The score for each subscale is the total of its item scores. Interrater and test-retest reliabilities and measures of internal consistency are good (Cohen et al 1993). Eleven scales from the ABI served as inputs to a feed-forward, hetero-associative, back-propagation network: 1. Reactions to Pain, 2. Eye Contact, 3. Gesture, 4. Imagination, 5. Tactile Defensiveness, 6. Social Interaction, 7. Verbal Perseveration, 8. Topic Perseveration, 9. Repetitive Behavior,
10. […] Expression, and 11. Intonation/Understanding. These items were chosen because they had previously been shown to have value in classification (Cohen et al 1993). Thus, the network had 11 processing elements as inputs and 2 processing elements as outputs, with the latter corresponding to mental retardation and autism, respectively. The NeuralWorks Professional Plus II program (NeuralWare 1991) was used for all computations. For a computational problem employing 11 input and 2 output processing elements and 114 cases, a hidden layer size of about three is recommended for relatively noise-free data, with fewer processing elements if the data are noisier (NeuralWare 1991), for example, if such data have less than perfect interrater or test-retest reliability. Therefore, in order to model the effects of having too few or too many neurons in this artificial nervous system, the number of processing elements in the hidden layer was systematically varied as follows: 1, 2, 3, 4, 6, 9, and 13. Based on previous experience, this range was felt to be adequate to examine the predicted effects. After three processing elements, each value in the series was obtained by adding the immediately preceding value to the value three places back in the sequence, that is, 4 = 3 + 1, 6 = 4 + 2, 9 = 6 + 3, and 13 = 9 + 4. For each number of processing elements in the hidden layer, the net was initialized, trained for 32,000 trials, and then tested, and the proportion of correct classifications for each group in the training and test sets was computed. This procedure was repeated six more times for a total of seven runs. Thus, across the seven hidden layer sizes, 49 runs were obtained. In order to model the effect of training experience on an underdetermined or overdetermined network, additional tests were conducted.
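The hidden layer sizes, and the number of modifiable connections each one implies, can be reproduced with a short Python sketch. The function names are hypothetical, and the connection count assumes one bias weight per hidden and output unit, which the text does not specify (setting `bias=False` counts only the input-to-hidden and hidden-to-output weights).

```python
def hidden_layer_sizes(n_terms=7):
    """1, 2, 3, then each size = previous size + the size three places back."""
    sizes = [1, 2, 3]
    while len(sizes) < n_terms:
        sizes.append(sizes[-1] + sizes[-3])  # 4 = 3 + 1, 6 = 4 + 2, 9 = 6 + 3, 13 = 9 + 4
    return sizes[:n_terms]


def connection_count(n_hidden, n_in=11, n_out=2, bias=True):
    """Trainable weights in a fully connected n_in-n_hidden-n_out network."""
    weights = n_in * n_hidden + n_hidden * n_out
    if bias:  # assumption: one bias weight per hidden and output unit
        weights += n_hidden + n_out
    return weights


RUNS_PER_SIZE = 7  # each size was initialized, trained, and tested seven times

if __name__ == "__main__":
    sizes = hidden_layer_sizes()
    print(sizes)                                             # [1, 2, 3, 4, 6, 9, 13]
    print([connection_count(h, bias=False) for h in sizes])  # [13, 26, 39, 52, 78, 117, 169]
    print(len(sizes) * RUNS_PER_SIZE)                        # 49
```

Without bias weights the count is simply 13 times the hidden layer size, so the range from 1 to 13 hidden elements spans a thirteenfold difference in the number of modifiable connections, which is the manipulation the simulations use as their analogue of too few or too many neuronal connections.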
Specifically, the networks with one and nine hidden neurons (the latter at least three times the recommended number) were again initialized and retrained, and the proportion of correct classifications for the training and test sets was computed at 2,000, 4,000, 8,000, 16,000, and 32,000 trials. This procedure was replicated four more times at each hidden layer size. Thus, the networks with one and nine processing elements in the hidden layer were each trained and tested for up to 32,000 trials a total of five different times, with the average percentage of accuracy computed at various stages during the course of acquisition of the discrimination. In all simulations, the number of training trials and the number of replications were selected based on previous experience with this data set. The connection weights among the processing elements were first randomized to range from -0.1 to +0.1, with the weights selected using a random number generator that used a seed number of 57. The backpropagation method subsequently used to change weights during training has been described in detail elsewhere (Cohen et al 1993).
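For readers who want to experiment with the architecture, the following is a minimal pure-Python sketch of an 11-H-2 feed-forward network trained by per-pattern backpropagation, with initial weights drawn uniformly from -0.1 to +0.1 using seed 57. It is not the NeuralWorks implementation: the class and method names are hypothetical, the logistic activations, bias weights, and squared-error delta rule are assumptions, and Python's random number generator will not reproduce the weights NeuralWorks drew from the same seed. The default learning rates (0.30 hidden, 0.15 output) correspond to the first training phase described below.

```python
import math
import random


def sigmoid(x):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


class BackpropNet:
    """Hypothetical 11-H-2 network trained by per-pattern backpropagation (not NeuralWorks)."""

    def __init__(self, n_in=11, n_hidden=3, n_out=2, seed=57, init_range=0.1):
        rng = random.Random(seed)  # seed 57 as in the text; Python's generator, not NeuralWorks'

        def draw():
            return rng.uniform(-init_range, init_range)  # initial weight in [-0.1, +0.1]

        # Each row holds one unit's incoming weights; the last entry is a bias weight (assumption).
        self.w_hidden = [[draw() for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [[draw() for _ in range(n_hidden + 1)] for _ in range(n_out)]

    @staticmethod
    def _layer(weights, inputs):
        inputs = list(inputs) + [1.0]  # constant input that feeds the bias weight
        return [sigmoid(sum(w * a for w, a in zip(row, inputs))) for row in weights]

    def forward(self, x):
        hidden = self._layer(self.w_hidden, x)
        return hidden, self._layer(self.w_out, hidden)

    def train_step(self, x, target, lr_hidden=0.30, lr_out=0.15):
        """Update on one pattern; return the squared error measured before the update."""
        hidden, out = self.forward(x)
        # Error terms for logistic output units under squared-error loss.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Hidden error terms use the output weights as they were before this update.
        d_hid = [
            h * (1.0 - h) * sum(dk * self.w_out[k][j] for k, dk in enumerate(d_out))
            for j, h in enumerate(hidden)
        ]
        for k, dk in enumerate(d_out):
            for j, a in enumerate(hidden + [1.0]):
                self.w_out[k][j] += lr_out * dk * a
        for j, dj in enumerate(d_hid):
            for i, a in enumerate(list(x) + [1.0]):
                self.w_hidden[j][i] += lr_hidden * dj * a
        return 0.5 * sum((t - o) ** 2 for t, o in zip(target, out))


if __name__ == "__main__":
    net = BackpropNet(n_hidden=9)
    toy_x = [0.5] * 11       # toy input, not ABI data
    toy_target = [0.0, 1.0]  # output units: (mental retardation, autism)
    errors = [net.train_step(toy_x, toy_target) for _ in range(500)]
    print(f"squared error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

The toy input vector assumes subscale scores rescaled to the 0 to 1 range; the text does not say how the ABI scores were scaled before entering the network.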
Neural Network Analogue of Autism
During training, the learning rate for the hidden layer was 0.30 for the first 10,000 trials, reduced to 0.15 for the next 20,000 trials, and then to 0.0375 for the last 2,000 trials. The learning rate for the output layer was one-half of the hidden layer rate. Twenty percent uniform random noise was added to the connections between the input and hidden layers for the first 10,000 trials, reduced to 10% for the next 20,000 trials, and then to 0% for the last 2,000 trials, in order to facilitate generalization and maximize the chances of reaching a global minimum error solution, as described above.
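The three-phase schedule can be expressed as a lookup on the trial number. In this sketch (the function name is hypothetical) the output-layer rate is derived as half the hidden-layer rate, as stated above; because the text does not say exactly how NeuralWorks applied the noise to the input-to-hidden connections, the function returns only the noise level and leaves its application open.

```python
def training_schedule(trial):
    """Return (hidden_lr, output_lr, noise_level) for a 1-based training trial."""
    if not 1 <= trial <= 32_000:
        raise ValueError("trial must be between 1 and 32,000")
    if trial <= 10_000:    # first 10,000 trials
        hidden_lr, noise = 0.30, 0.20
    elif trial <= 30_000:  # next 20,000 trials
        hidden_lr, noise = 0.15, 0.10
    else:                  # last 2,000 trials
        hidden_lr, noise = 0.0375, 0.0
    return hidden_lr, hidden_lr / 2, noise  # output-layer rate is half the hidden-layer rate


if __name__ == "__main__":
    for t in (1, 10_000, 10_001, 30_000, 30_001, 32_000):
        print(t, training_schedule(t))
```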
BIOL PSYCHIATRY 1994;36:5-20

Results
Effects of Number of Processing Elements on Classification Accuracy
As shown in Figure 5, the effect of varying the number of processing elements in the hidden layer on the accuracy of learning the training set data was straightforward. As the number of processing elements increased, so did the accuracy of classification, following a curve of diminishing returns and eventually reaching over 98% accuracy for the group with mental retardation and 100% for the group with autism. Here, in this apparently simple classification problem, having only one processing element led to relatively "poorer" learning (96% and 87% accuracy, respectively). The effect of varying the number of processing elements in the hidden layer on generalization was more complicated. As the number of processing elements increased, generalization to the data set of children with autism improved, eventually reaching over 95% accuracy. At the same time, however, generalization to the data set of children with mental retardation decreased from 100% to 82%. It should also be noted that the variation in outcomes increased as the number of processing elements increased: even with as many as nine hidden layer processing elements, some of the network replications were as accurate in generalization as networks with fewer processing elements. An analysis of variance on the overall data set confirmed a highly significant group by data set (training vs. test) by number of hidden processing elements interaction [F(6,84) = 21.00; p < 0.0001].
Effects of Number of Training Trials on Classification and Generalization Accuracy in Underdetermined and Overdetermined Networks
The interaction of hidden layer size with the number of training trials on the acquisition of classification accuracy in the training set data is shown in Figure 6. In the net with one processing element in the hidden layer, accuracy of classification for the group with retardation increased across trials from 81% to 95%. The classification accuracy for the group with autism remained unchanged at about 86%. Thus, this particular anatomy had limited capacity to acquire the requisite discrimination. For the net with nine processing elements, however, acquisition of classification accuracy occurred rapidly and reached 100% for the group with autism and 99% for the group with retardation by 32,000 trials. Multivariate analysis of variance revealed a significant net size by diagnostic group by number of training trials interaction [Rao's R(4,13) = 7.42; p < 0.0025]. The corresponding developmental progression of generalization is shown in Figure 7. In the network with one hidden layer processing element, classification accuracy for the group with retardation was quite high to begin with, at 97% by 1,000 trials, and improved to 100% by 8,000 trials. For the group with autism, however, classification accuracy decreased as training progressed, from 86% at 1,000 trials to 75% by 32,000 trials. In the more complicated network, a similar phenomenon occurred. Generalization started out at 86% for the group with autism and improved to 95% by 16,000 trials. At the same time, generalization to the children with mental retardation deteriorated from 97% at 1,000 trials to 86% by 32,000 trials. The balance between learning and generalization accuracy appears to have been optimal at about 10,000 trials. Multivariate analysis of variance again revealed a significant net size by diagnostic group by number of training trials interaction [Rao's R(4,13) = 17.99; p < 0.0001]. Inspection of Figures 6 and 7 revealed that, in the net with one processing element in the hidden layer, generalization to the group with autism in the test set was negatively correlated with the acquisition of classification accuracy for the group with retardation in the training set (Spearman rho = -0.90, p = 0.037).
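The reported p value is consistent with the usual t approximation for Spearman's rho, t = rho * sqrt((n - 2) / (1 - rho²)) with n - 2 degrees of freedom, if the correlation was computed over the five training checkpoints from 2,000 to 32,000 trials. That n = 5 is an assumption, not something the text states. The sketch below (hypothetical function names, standard library only) integrates the t density numerically and returns a two-tailed p of approximately 0.037 for rho = -0.90 and n = 5.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def spearman_p_two_tailed(rho, n, steps=2_000):
    """Two-tailed p for Spearman's rho via t = rho * sqrt((n - 2) / (1 - rho**2)), df = n - 2."""
    if abs(rho) >= 1:
        return 0.0  # perfect rank agreement: the t statistic is unbounded
    df = n - 2
    t = abs(rho) * math.sqrt(df / (1 - rho * rho))
    # Simpson's rule (steps must be even) for the area under the density on [0, |t|].
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area  # probability mass in both tails beyond |t|


if __name__ == "__main__":
    # Assumed n = 5: one rank pair per checkpoint (2, 4, 8, 16, 32 thousand trials).
    print(round(spearman_p_two_tailed(-0.90, 5), 4))
```

With only five points the approximation is coarse; an exact permutation test over the 120 possible rank orderings would give a larger p value.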
Discussion
In this simulation of a simple nervous system, having too few processing elements led to relatively weak learning and generalization, while having too many processing elements for the data set led to good learning but relatively weak generalization. In both instances, after learning had reached asymptote, generalization was good to one of the data sets but relatively poor to the other. In the net with only one or two hidden processing elements, acquisition of information on the group with retardation appeared to interfere with generalization to the group with autism, as the generalization decrement for the group with autism (Figure 7) was negatively correlated with the acquisition rate for the group with retardation (Figure 6). The level of classification accuracy did not change for the group with autism. This effect is most likely due to the limited capacity of this network. By contrast, with the net containing nine hidden layer processing elements, the weak generalization could not be attributed to limited capacity, because acquisition was quite good for both groups. Rather, it would appear that "too much" detail had been learned about the group with retardation that was unique to that group in the training situation, detail that did not interfere with classification performance but did interfere with generalization of the classification to the test set. Thus, as the nets became more complicated and correspondingly more accurate in their pattern recognition skills, they gave more weight to certain features of the group with retardation in the training set, which improved training set classification accuracy by emphasizing features unique to that data set. This emphasis on idiosyncratic features, however, impaired their ability to differentiate autism from retardation in the test set. Conversely, with only one or two processing elements in the hidden layer, not enough discriminatory information had been learned about the group with autism to permit generalization to the comparable test set.

I.L. Cohen

[Figure 5 plot: percent correct (y-axis, 70 to 100) against number of hidden processing elements (x-axis: 1, 2, 3, 4, 6, 9, 13), with separate panels for the training and test sets and separate curves for GROUP: MR and GROUP: AU.]
Figure 5. The percentage of correct classifications per group is shown for both the training and test sets as a function of the number of processing elements in the hidden layer. Open squares with dotted lines indicate the group with retardation (MR) and closed squares and solid lines the group with autism (AU). Unconnected points show the minimum and maximum classification percentages for each group and data set at each hidden layer size.
[Figure 6 plot: percent correct (y-axis, 70 to 100) against training trials × 1,000 (x-axis: 2, 4, 8, 16, 32), with separate panels for hidden layer sizes of one and nine processing elements and separate curves for GROUP: MR and GROUP: AU.]
Figure 6. The percentage of correct classifications per group for the training set is shown for hidden layer sizes of one and nine processing elements as a function of the number of consecutive training trials, ranging from 2,000 to 32,000. Open squares with dotted lines indicate the group with retardation (MR) and closed squares and solid lines the group with autism (AU).

General Discussion
These observations of a strong ability to recognize and discriminate stimulus patterns, along with a weak generalization ability in overdetermined networks due to attention to idiosyncratic features of a stimulus, are qualitatively similar, in many respects, to the learning and behavioral characteristics of some children with autism. As Kanner (1943) described, some of these people have an extraordinary ability to recognize and discriminate complex patterns, be they visual, auditory, spatial, or temporal. For example, in the visual modality, the child known as Nadia (Selfe 1977) had an exceptional ability to draw pictures from memory with exquisite detail. Langdell (1978) has also reported that children with autism were superior to controls in recognizing upside-down faces. Other children have a facility with maps or show unusual fascination with diverse visual patterns such as spinning objects, their hands, or their face in the mirror. Among those who can perform on the Wechsler tests, children with autism are noted for their very good performance on visuo-spatial tasks such as block design and object assembly (cf. Frith and Baron-Cohen 1987). Auditorially, many such children precisely echo what they hear, including the intonation pattern, and they can be better than matched controls when asked to recall random word strings (Hermelin and O'Connor 1967). Spatially, as noted above, many other children with autism can quickly learn and recall complicated routes as well as the locations of significant objects in space. Temporally, some readily learn routines and can become quite upset if those routines are altered. In almost all cases, however, this pattern recognition skill does not extend into an ability to perceive more complex abstract
relationships among the patterns, such as semantic comprehension, with many having quite literal interpretations of words and sentences (Kanner 1943) and a concomitant inability to comprehend their world. Some "higher-functioning" people with autism do have lower-order concepts but have difficulty with higher abstractions such as demonstrating awareness of another's "mental state" (Frith and Baron-Cohen 1987). As well, children with autism are notoriously poor at generalization, and the need to train generalization is built into behavior modification programs (Koegel et al 1982). Lovaas et al (1971) have attributed this generalization decrement to a problem in "stimulus overselectivity." In their study, children with autism were found to attend to only one of three components of a compound cue, whereas nonhandicapped children responded to all three and those with mental retardation responded to two. Subsequent studies have replicated this phenomenon but have also shown that when
children with autism are taught stimulus discriminations involving multiple stimulus attributes, they will learn the discrimination by focusing on a characteristic of the stimulus that is unique to the discrimination at hand but which then interferes with generalization to a related but dissimilar stimulus (Schreibman and Lovaas 1973). The present analysis would predict that this overselectivity phenomenon is a direct consequence of the child's neuronal connectivity pattern, is functionally related to the child's experience with stimuli that require discrimination, and will depend on when it is assessed during learning. It also suggests that at least two subclasses of such children exist. As analogized in Figures 5 and 6, some children with autism may have too few neurons or neural connections in those brain areas necessary for learning about relevant stimulus-stimulus associations or stimulus-reinforcer associations, such as the amygdala and hippocampus (Raymond et al 1989), and so may not learn enough about a discrimination to permit generalization. Theoretically, then, it is predicted that these would be the children who have trouble acquiring […] discriminations, attend to a restricted range of stimuli, and show relatively poor pattern discrimination in any modality, with, perhaps, deterioration in generalization to one of the stimuli with extended training as a result of pattern interference due to limited capacity. Others, however, may have too many neurons in these same brain regions, leading them to learn, after extended training, too many unique details that interfere with generalization, much like S, the mnemonist whose fantastic eidetic memory was also a hindrance to learning (Luria 1968). These children would be those with excellent or superior pattern recognition skills who, at the same time, have difficulty abstracting. Their problem in generalizing would be revealed by teaching them a novel discrimination with two different stimulus sets in their preferred modality, with frequent probing for generalization during acquisition and stabilization. It would be predicted that this group would readily learn the rewarded training set. The accuracy of generalization to a nonrewarded test set, however, would be a decreasing function of the number of reinforced training trials for at least one of the members of the test set. That is, overselective attention in this higher functioning group would not be evident during early stages of acquisition of a discrimination.

[Figure 7 plot: percent correct (y-axis, 70 to 100) against TRIALS × 1000 (x-axis: 2, 4, 8, 16, 32), with separate panels for hidden layer sizes of one and nine processing elements and separate curves for GROUP: MR and GROUP: AU.]
Figure 7. The percentage of correct classifications per group for the test set is shown for hidden layer sizes of one and nine processing elements as a function of the number of consecutive training trials, ranging from 2,000 to 32,000. Open squares with dotted lines indicate the group with retardation (MR) and closed squares and solid lines the group with autism (AU).

Figure 6 also can be related, to a certain extent, to the pathogenesis of certain children with autism. If we can equate the number of training trials to the life experience of a child, it could be that as the child with too many neurons or neuronal connections learns about his or her environment, learning will proceed well and, eventually, the child will correctly generalize. Continued exposure to the same experiences, however, will lead to overlearning and subsequent attention to idiosyncratic details, deterioration in concept formation, and poor generalization. Perhaps this can explain, in part, the apparent regression in functioning that such children show between 18 and 30 months of age. For the child with too few neurons, regression would also occur, but there would be no early evidence of apparently normal concept acquisition.

Rolls (1990b) has noted that the hippocampus indirectly receives highly processed input from parietal cortex, inferior temporal cortex, and superior temporal cortex by way of the entorhinal cortex, and thus receives highly processed information on spatial, visual, and auditory specifications of stimuli, respectively. The entorhinal cortex and hippocampus appear to have too many neurons in certain individuals with autism, as noted above, and the MRI studies suggest aberrant parietal cortical structure in others (Courchesne et al 1993; Piven et al 1990). Therefore, in some people with autism, this confluence of inputs that are, possibly, incorrectly processed due to a larger than normal number of neuronal connections in some of these structures could account for some of the unique visual, auditory, spatial, and temporal pattern recognition abilities and sameness preferences seen in these individuals. For example, if the hippocampus performs in a manner that is similar to the type of […] neuronal connections in some children with autism, it is easy to see from the above discussion why they would have a unique facility in learning complicated spatial routes. The spatial mapping of these children would be so precise, however, that they would be hampered in generalization; any change in a route would therefore be perceived as different and not leading toward their intended goal. This would then account for their need for routes to remain unchanged. Thus, it is predicted that there should be a functional relation between the number of neuronal connections in regions such as the hippocampus and pattern recognition or discrimination learning ability.

More generally, if information is processed in the brain in a manner analogous to an interrelated hierarchy of multilayer neural networks or "modules" (cf. Churchland and Sejnowski's discussion of Damasio hierarchies, 1992, pp 317-329; Eichenbaum 1993), then either deficient or faulty information processing in one module could impair subsequent higher-order acquisition of conceptual […] in another module, even if that module is intact. This aberrant mapping could explain why such children are impaired in their ability to go beyond pattern encoding and recognition into higher-order concepts. It does appear, however, that such concepts can be acquired if they are explicitly taught at an early enough stage of development (see below). How could there be too many or too few connections in the brains of such children, and how would a normally developing brain deal with this issue? Aside from the mechanisms discussed in the introduction, such as neuronal migration deficits or cessation of development, it should also be noted that the normally growing brain produces more neurons than it needs (Rakic 1991) and, in normally developing brains, excess neurons are removed through a "pruning" process. LaMantia and Rakic (1990) have reported that newborn monkeys lose over 70% of their axons in the hippocampal commissure. Overproduction of synaptic connections has also been noted in the neocortex by Rakic et al (1986). Although such cell or connection death may be genetically programmed, the death of certain connections may be a Darwinian competitive process in which, as a function of experience, connections that have been strengthened through learning or exposure survive, whereas those that are weak die (Rakic 1991). The results from the simulations described above suggest that this pruning is necessary for proper learning and generalization.

Indeed, pruning of connections in previously trained, overdetermined artificial neural networks does lead to improved generalization (Sietsma and Dow 1988). Conceivably, the brains of children with autism could be relatively impaired in their ability to strengthen connections and/or prune out weak connections (Rapin 1993) (perhaps because there are too many neurons), or they may prune out relatively too many connections (possibly because there are too few neurons) in critical brain regions. Insufficient pruning would interfere with retrieval of previously acquired concepts through input of incorrect information and, as noted in the discussion of network architecture in the introduction, could lead to deterioration of such concepts because of interference from the acquisition of similar, but incorrect, concepts. Pruning of too many connections could also lead to concept deterioration according to Lashley's Mass Action principle. In either case, impairment in comprehension would lead to less attention being paid to the environment and further problems with the development of appropriate neuronal connections.
It therefore may be that the reason why intensive, structured learning experiences lead to dramatic improvement in some children (Lovaas 1987; McEachin et al 1993) is that such experiences force activity-dependent strengthening of appropriate connections and pruning of weak ones, and therefore result in normalization of connection numbers and strengths. That is, behavior therapy may work, first, by focusing the child's attention on a larger and more varied data set than the child is used to attending to. Then, through constant repetition of these many different patterns along with immediate feedback, higher-order recognition would emerge by weakening previously established, idiosyncratic connections that do not "pay off" and strengthening those that do. It is interesting to note that in artificial distributed neural networks, increasing the size of the data set for a fixed network size leads to enhanced performance, especially when the data set is noisy (NeuralWare 1991). A critical period may exist for such intervention (Lovaas 1987) because synaptic patterns that are already established are, theoretically, difficult to change (Munro 1986). Although not all brains of people with autism appear to show abnormal numbers of cells or abnormal dendritic branching, having too many or too few connections need not translate into having too many or too few neurons, only too many or too few synapses. Such an effect could possibly be revealed in excessive or deficient numbers of dendritic spines, sites at which a great deal of computation could take place (Rall and Segev 1990; Shepard 1990). In fragile X syndrome, a disorder associated with autism (Cohen et al 1991), postmortem evaluation of the cerebral cortex has revealed abnormally shaped and reduced numbers of dendritic spines (Hinton et al 1991). Finally, a remark should be made about those autopsies that have not, as yet, revealed obvious anomalies, or those that may never reveal such peculiarities despite the extensive use of sophisticated neuropathological techniques. As displayed in the range data shown in Figure 5, some of the nets with nine processing elements in the hidden layer were as good at learning and generalization as those with two or three, even though all learned on the same data set. Those nets with nine processing elements in the hidden layer that found a good solution clearly took a different path in weight space during learning. Therefore, it is conceivable that the potential beneficial or deleterious effects of having many neuronal connections are the result of a natural process. The development of synaptic connections in the maturing brain could behave as a nonlinear dynamical system, that is, one showing a strong sensitivity to initial conditions (Abraham et al 1990; Mpitsos 1993). It may be that having many neurons or neuronal connections has an adaptive advantage (Rakic 1991) that, in a complex dynamical system with chaotic attractors modified by an unknown control parameter, could lead either to a good outcome (a gifted child) or to one that is less advantageous (autism). The number of neuronal connections in the brains of such children, therefore, would not be statistically unusual, although connection numbers ought to be near the extremes of normality. More neuropathological studies are required to answer this and other questions raised in this manuscript. Hopefully, bootstrapping studies using more sophisticated network models based on the results of hypothesis-driven real-life experiments will help to refine our models of the syndrome of autism.

The author is gratefully indebted to Dov J. Shazeer, Group Leader, Image Recognition Systems, Charles Stark Draper Laboratory, Cambridge, MA; Monica Wrzolek, Dept. of Pathological Neurobiology; and Ausma Rabe, Dept. of Psychobiology, of the New York State Institute for Basic Research in Developmental Disabilities for their helpful comments. This work was supported by funds from the New York State Office of Mental Retardation and Developmental Disabilities.
References
Abraham F, Abraham RH, Shaw CD (1990): A Visual Introduction to Dynamical Systems Theory for Psychology. Santa Cruz, CA: Aerial Press.
American Psychiatric Association (1980): Diagnostic and Statistical Manual of Mental Disorders (3rd ed.). Washington, DC: American Psychiatric Press.
American Psychiatric Association (1987): Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.). Washington, DC: American Psychiatric Press.
Bauman M (1991): Microscopic neuroanatomic abnormalities in autism. Pediatrics 87:791-796.
Bauman M, Kemper TL (1985): Histoanatomic observations of the brain in early infantile autism. Neurology 35:866-874.
Bauman M, Kemper TL (1986): Developmental cerebellar abnormalities: A consistent finding in early infantile autism. Neurology 36:190 (abstract).
Carpenter GA, Grossberg S (1991): Pattern Recognition by Self-Organizing Neural Networks. Cambridge, MA: MIT Press.
Churchland PS, Sejnowski TJ (1992): The Computational Brain. Cambridge, MA: MIT Press.
Cohen IL, Sudhalter V, Pfadt A, Jenkins EC, Brown WT, Vietze PM (1991): Why are autism and the fragile X syndrome associated? Conceptual and methodological issues. Am J Hum Genet 48(2):195-202.
Cohen IL, Sudhalter V, Landon-Jimenez D, Keogh M (1993): A neural network approach to the classification of autism. J Autism Dev Disord 23:443-466.
Courchesne E, Hesselink JR, Jernigan TL, Yeung-Courchesne R (1987): Abnormal neuroanatomy in a nonretarded person with autism. Arch Neurol 44:335-341.
Courchesne E, Yeung-Courchesne R, Press GA, Hesselink JR, Jernigan TL (1988): Hypoplasia of cerebellar vermal lobules VI and VII in autism. N Engl J Med 318:1349-1354.
Courchesne E, Press GA, Yeung-Courchesne R (1993): Parietal lobe abnormalities detected with MR in patients with infantile autism. Am J Roentgenol 160:387-393.
Creasey H, Rumsey JM, Schwartz M, Duara R, Rapoport JL, Rapoport SI (1986): Brain […], as measured by quantitative CT scanning, in autistic men. Arch Neurol 43:[…]72.
Eichenbaum H (1993): Thinking about brain cell assemblies. Science 261:993-994.
Frith U, Baron-Cohen S (1987): Perception in autistic children. In Cohen DJ, Donnellan AM (eds), Handbook of Autism and Pervasive Developmental Disorders. New York: Wiley, pp 85-102.
Gaffney GR, Tsai LY, Kuperman S, Minchin S (1987a): Cerebellar structure in autism. Am J Dis Child 141:1330-1332.
Gaffney GR, Kuperman S, Tsai LY, Minchin S, Hassanein KM (1987b): Midsagittal magnetic resonance imaging of autism. Br J Psychiatry 151:831-833.
Gaffney GR, Kuperman S, Tsai LY, Minchin S (1988): Morphological evidence for brainstem involvement in infantile autism. Biol Psychiatry 24:578-586.
Gaffney GR, Kuperman S, Tsai LY, Minchin S (1989): Forebrain structure in infantile autism. J Am Acad Child Psychiatry 28(4):534-537.
Gillberg C, Coleman M (1992): The Biology of the Autistic Syndromes (2nd ed.). New York: Cambridge University Press.
Gluck MA, Myers CE (1992): Hippocampal-system function in stimulus representation and generalization: A computational theory. Proceedings, 1992 Cognitive Science Society Conference. Hillsdale, NJ: Erlbaum.
Gorman RP, Sejnowski TJ (1988): Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks 1:75-89.
Harcherik DF, Cohen DJ, Ort S, et al (1985): Computed tomographic brain scanning in four neuropsychiatric disorders of childhood. Am J Psychiatry 142:731-734.
Hashimoto T, Tayama M, Miyazaki M, et al (1991): Reduced midbrain and pons size in children with autism. Tokushima J Exp Med 38:15-18.
Hashimoto T, Tayama M, Miyazaki M, et al (1992): Reduced brainstem size in children with autism. Brain Dev 14:94-97.
Hauser SL, DeLong GR, Rosman NP (1975): Pneumographic findings in infantile autism syndrome: A correlation with temporal lobe disease. Brain 98:667-688.
Hebb DO (1949): The Organization of Behavior. New York: Wiley.
Hermelin B, O'Connor N (1967): Remembering of words by psychotic and subnormal children. Br J Psychol 58:213-218.
Hinton VJ, Brown WT, Wisniewski K, Rudelli RD (1991): Analysis of neocortex in three males with the fragile X syndrome. Am J Med Genet 41:289-294.
Hopfield J (1982): Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79:2554-2558.
Hornik K, Stinchcombe M, White H (1989): Multilayer feedforward networks are universal approximators. Neural Networks 2:359-366.
Hsu M, Yeung-Courchesne R, Courchesne E, Press GA (1991): Absence of magnetic resonance imaging evidence of pontine abnormality in infantile autism. Arch Neurol 48:1160-1163.
Jacobson R, Le Couteur A, Howlin P, Rutter M (1988): Selective subcortical abnormalities in autism. Psychol Med 18:39-48.
Kanner L (1943): Autistic disturbances of affective contact. Nerv Child 2:217-250.
Kemper TL (1988): Neuroanatomic […] of dyslexia and autism. In Swann JW, Messer A (eds), Disorders of the Developing Nervous System: Changing Views on Their Origins. New York: Alan R. Liss, pp 125-154.
Kemper TL, Bauman ML (1993): The contribution of neuropathologic studies to the understanding of autism. Behav Neurol 11:175-187.
Koegel RL, Rincover A, Egel AL (1982): Educating and Understanding Autistic Children. San Diego: College-Hill Press.
LaMantia AS, Rakic P (1990): Axon overproduction and elimination in the corpus callosum of the developing rhesus monkey. J Neurosci 10:2156-2175.
Langdell T (1978): Recognition of faces: An approach to the study of autism. J Child Psychol Psychiatry 19:255-268.
Lehky SR, Sejnowski TJ (1988): Network model of shape-from-shading: Neural function arises from both receptive and projective fields. Nature 333:452-454.
Lehky SR, Sejnowski TJ (1990): Neural network model of visual cortex for determining surface curvature from images of shaded surfaces. Proc R Soc Lond B 240:251-278.
Levine DS (1991): Introduction to Neural and Cognitive Modeling. Hillsdale, NJ: Erlbaum.
Lovaas OI (1987): Behavioral treatment and normal educational and intellectual functioning in young autistic children. J Consult Clin Psychol 55:3-9.
Lovaas OI, Schreibman L, Koegel RL, Rehm R (1971): Selective responding by autistic children to multiple sensory input. J Abnorm Psychol 77:211-222.
Luria AR (1968): The Mind of a Mnemonist. New York: Basic Books.
McEachin JJ, Smith T, Lovaas OI (1993): Long-term outcome for children with autism who received early intensive behavioral treatment. Am J Ment Retard 97:359-372.
Mpitsos G (1993): Dynamics of brain function and the learning behavior of invertebrates. Paper presented at the Fifth Annual Convention of the American Psychological Society, Chicago, June 25.
Munro PW (1986): State-dependent factors influencing neural plasticity: A partial account of the critical period. In McClelland JL, Rumelhart DE (eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2. Cambridge, MA: MIT Press.
Murakami JW, Courchesne E, Press GA, Yeung-Courchesne R, Hesselink JR (1989): Reduced cerebellar hemisphere size and its relationship to vermal hypoplasia in autism. Arch Neurol 46:689-694.
NeuralWare (1991): Neural Computing. Pittsburgh: Author.
Nowell MA, Hackney DB, Muraki AS, Coleman M (1990): Varied MR appearance of autism: Fifty-three pediatric patients having the full autistic syndrome. MRI 8:811-816.
Piven J, Berthier ML, Starkstein SE, Nehme E, Pearlson G, Folstein S (1990): Magnetic resonance imaging evidence for a defect of cerebral cortical development in autism. Am J Psychiatry 147:734-739.
Rakic P (1991): Plasticity of cortical development. In Brauth SE, Hall WS, Dooling RJ (eds), Plasticity of Development. Cambridge, MA: MIT Press, pp 127-161.
Rakic P, Bourgeois J-P, Eckenhoff ME, Zecevic N, Goldman-Rakic PS (1986): Concurrent overproduction of synapses in diverse regions of the primate cerebral cortex. Science 232:232-235.
Rall W, Segev I (1990): Dendritic Branches, Spines, Synapses, and Excitable Spine Clusters. In Schwartz EL (ed), Computational Neuroscience. Cambridge, MA: MIT Press, pp 69-81.
Rapin I (1993): Autism: A Complex Developmental Disorder of Brain Function. Paper presented at the 25th Anniversary Celebration of The New York State Institute for Basic Research, Staten Island, NY, May 19.
Raymond G, Bauman M, Kemper T (1989): The hippocampus in autism: Golgi analysis. Programs and Abstracts, Child Neurol Soc 26:483-484.
Reiss AL (1988): Cerebellar hypoplasia and autism. N Engl J Med 319:1152-1153 (letter).
Ritvo ER, Garber HJ (1988): Cerebellar hypoplasia and autism. N Engl J Med 319:1152 (letter).
Ritvo ER, Freeman BJ, Scheibel AB, Duong PT, Robinson H, Guthrie D (1986): Lower Purkinje cell counts in the cerebella of four autistic patients: Initial findings of the UCLA-NSAC Autopsy Research Report. Am J Psychiatry 143:862-866.
Rolls ET (1990a): Functions of the primate hippocampus in spatial processing and memory. In Kesner RP, Olton DS (eds), Neurobiology of Comparative Cognition. Hillsdale, NJ: Erlbaum.
Rolls ET (1990b): Functions of neuronal networks in the hippocampus and of backprojections in the cerebral cortex in memory. In McGaugh JL, Weinberger NM, Lynch G (eds), Brain Organization and Memory: Cells, Systems, and Circuits. New York: Oxford University Press.
Rumelhart DE, Hinton GE, Williams RJ (1986): Learning internal representations by error propagation. In Rumelhart DE, McClelland JL (eds), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press, pp 318-362.
Rumelhart DE, McClelland JL (1986): Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vols. 1 and 2. Cambridge, MA: MIT Press.
Rumsey JM, Creasey H, Stepanek JS, et al (1988): Hemispheric asymmetries, fourth ventricular size, and cerebellar morphology in autism. J Autism Dev Disord 18:127-137.
Schreibman L, Lovaas OI (1973): Overselective responding to social stimuli by autistic children. J Abnorm Child Psychol 1:152-168.
Schwartz EL (1990): Computational Neuroscience. Cambridge, MA: MIT Press.
Sejnowski TJ, Rosenberg CR (1987): Parallel networks that learn to pronounce English text. Complex Systems 1:145-168.
Selfe L (1977): Nadia: A Case of Extraordinary Drawing Ability in an Autistic Child. London: Academic Press.
Shepherd GM (1990): The significance of real neuron architectures for neural network simulation. In Schwartz EL (ed), Computational Neuroscience. Cambridge, MA: MIT Press, pp 82-96.
Sietsma J, Dow RJF (1988): Neural net pruning: Why and how. Proceedings, IEEE International Conference on Neural Networks 1:375-383.
Sparrow SS, Balla DA, Cicchetti DV (1984): Vineland Adaptive Behavior Scales, Interview Edition, Survey Form Manual. Circle Pines, MN: American Guidance.
Williams RS, Hauser SL, Purpura DP, DeLong GR, Swisher CN (1980): Autism and mental retardation: Neuropathologic studies performed in four retarded persons with autistic behavior. Arch Neurol 37:749-753.
Wilson MA, McNaughton BL (1993): Dynamics of the hippocampal ensemble code for space. Science 261:1055-1058.
Wing L, Attwood A (1987): Syndromes of autism and atypical development. In Cohen DJ, Donnellan AM (eds), Handbook of Autism and Pervasive Developmental Disorders. Silver Spring, MD: Winston, pp 3-19.