Neural networks as models of psychopathology

REVIEW ARTICLE

Neural Networks as Models of Psychopathology

Lars Aakerlund and Ralf Hemmingsen

Neural network modeling is situated between neurobiology, cognitive science, and neuropsychology. The structural and functional resemblance with biological computation has made artificial neural networks (ANN) useful for exploring the relationship between neurobiology and computational performance, i.e., cognition and behavior. This review provides an introduction to the theory of ANN and how they have linked theories from neurobiology and psychopathology in schizophrenia, affective disorders, and dementia. Biol Psychiatry 1998;43:471-482 © 1998 Society of Biological Psychiatry

Key Words: Artificial neural networks, review, psychopathology, schizophrenia, mania, dementia

Introduction

The theory of ANN was born as an attempt to create a computer model that utilized simple, interconnected computational units, so-called neural elements (Rosenblatt 1958). ANN turned out to be able to solve humanlike problems such as generalization, parallel processing of multiple and noisy inputs, and learning from examples.

Studies of cerebral changes during information processing, by receptor binding studies, imaging techniques [magnetic resonance imaging (MRI), positron-emission tomography (PET)], and electrophysiologic techniques, have revealed various changes in psychiatric disorders. The results, though, are often conflicting, with great overlaps between groups (Chua and McKenna 1995). The possible causality between biological changes and psychopathology remains unclear, and it seems unlikely that the traditional scientific way of building up more biological knowledge will solve this problem.

From the Department of Clinical Psychiatry, Bispebjerg University Hospital, Copenhagen, Denmark. Address reprint requests to Lars Aakerlund, Psychiatric Dept E, Bispebjerg Hospital, Bispebjerg bakke 23, 2400 Copenhagen NV, Denmark. Received January 21, 1997; revised May 8, 1997; revised September 22, 1997; accepted October 10, 1997.

The Computer as a Model of the Brain

Almost three decades ago Enoch Callaway paralleled the information processing system of the brain with a set of executable programs in a traditional computer (Callaway 1970). The computer program consists of a set of instructions, e.g., a do-loop or a test operation, which Callaway called "plans." The executable plans of the human brain then range from neuronal interaction to reflexes, motor acts, and high-level strategic planning. In schizophrenia, excessive interference between different cognitive processes may disturb the execution of plans. The theory was supported by cognitive tests showing that increasing the number of cognitive operations necessary in a task results in an increasing difference between normal and schizophrenic subjects. Additional tests to confirm or refute the theory included recordings of evoked related potentials during visual and auditory stimulation. Interindividual differences should be correlated to results from different sense modalities. Callaway did not use ANN, but he introduced a clear and consistent way of making testable theories by using the computer as a model of normal and deviant cognition.

What Is Neural Computation?

The neuron is the basic computational unit of the brain. It receives signals from other neurons, and if the total signal is strong enough, the neuron forwards an excitatory or inhibitory signal. The cerebral neurons can be viewed as densely interconnected simple devices forming computational units on the basis of the nature and strengths of their connections. These are modifiable and enable the neuronal network to base processing of current input on earlier input, i.e., learning from experience. The cerebral cortex can be conceived as a "computational surface" using multiple parallel distributed systems to process information (Ojemann 1994; Thomson and Deuchars 1994). Histologically, parallel columnar aggregates of neuronal networks have been found in the cerebral cortex (Damasio 1989; Diamond 1979; Selemon and Goldman-Rakic 1988; Malach 1994). From this simple but highly interconnected structure the brain can outperform the largest supercomputer with rule-based information processing on problems involving association, categorization, generalization, and feature extraction.

Though the concept of neural computation is not necessary to make computer models of the brain (Callaway 1970), ANN are metaphors of the brain, partly because they solve these kinds of problems, and partly because of a structural resemblance: on the basis of a weighted sum of signals from other units, the basic computational unit forwards a signal (Figure 1). A learning algorithm allows the strengths of the connections ("the weights") to change in response to input examples, and the network acquires knowledge through reorganization of weights. (For a thorough introduction to neural networks see Hertz et al 1991.) A number of ANN with different architectures and rules for learning and propagation of signals have been developed for different purposes, but they all work fundamentally as described. Networks that have proved suitable for modeling psychopathology will be highlighted in the following.

Figure 1. The Hopfield Network. Four neurons, symmetrically interconnected. The arrows show the connections by which the neurons affect each other with their own activation multiplied by the strength of the connection.

The Hopfield Network as a Model of Psychopathology

The Hopfield Network (HN) (Hopfield 1982, 1984; Hopfield et al 1983) was the first network used in psychiatry, in the pioneering works of Ralph Hoffman (Hoffman 1987; Hoffman and Dobscha 1989; Hoffman and McGlashan 1993). The HN differs from a traditional computer in the way it stores memories. The traditional computer has memories organized in stacks, which it searches methodically when retrieving data. To identify a person on the basis of, e.g., the face, the computer must search through all learned faces. The HN memory consists of a number of processing units with the coordinates (x,y) (Figure 1), which can take activation values of either 1 or -1 (sometimes 1 and 0). The units are densely interconnected with symmetric weights. When two neurons have the same activation values (1/1 or -1/-1), the weight between them is strengthened, and it is weakened when they are different:

$T_{xy \to ij} = \sum_{m} \mu^{m}_{xy} \, \mu^{m}_{ij}$ (1)

where $T_{xy \to ij}$ is the final weight between neuron (x,y) and neuron (i,j), and $\mu^{m}_{xy}$ and $\mu^{m}_{ij}$ are their activation states for memory m. This is the Hopfield Rule (Hopfield 1982), saying that the final weight between two neurons is found by summing the products of their activation states for each memory pattern learned. It is identical to the Hebb Rule (Hebb 1949), except that the latter leaves the connection between two neurons in different states unaltered.

To illustrate the Hopfield Rule, the network in Figure 1 is presented with two memory patterns:

memory 1: $\mu_{11} = 1$, $\mu_{12} = 1$, $\mu_{21} = -1$, $\mu_{22} = 1$;
memory 2: $\mu_{11} = 1$, $\mu_{12} = -1$, $\mu_{21} = 1$, $\mu_{22} = -1$.

To determine the final weight $T_{xy \to ij}$, the state of neuron (x,y) is multiplied by the state of neuron (i,j) for each memory, and the products are then summed:

$T_{11 \to 11} = 1 \times 1 + 1 \times 1 = 2$ *
$T_{11 \to 12} = 1 \times 1 + 1 \times -1 = 0$
$T_{11 \to 21} = 1 \times -1 + 1 \times 1 = 0$
$T_{11 \to 22} = 1 \times 1 + 1 \times -1 = 0$
$T_{21 \to 21} = -1 \times -1 + 1 \times 1 = 2$

(*Each neuron in a HN feeds its own activation state back to itself.)

That is, for all 4 × 4 = 16 connections. The network has only four neurons to limit its complexity, and because of that the final weights only obtain the values 2, 0, and -2. As a matter of fact, a HN can only store a number of memories equal to 10-15% of the total number of neurons before it retrieves patterns with no resemblance to memory patterns (Hopfield 1982).
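As an illustration that is not part of the original article, the Hopfield Rule of Eq. 1 can be sketched in a few lines of Python. The names `MEMORIES` and `hopfield_weights` are hypothetical, and the two patterns are assumed to read (1, 1, -1, 1) and (1, -1, 1, -1) for neurons (1,1), (1,2), (2,1), and (2,2), which is the reading consistent with the worked weights above.

```python
from itertools import product

# Two memory patterns on the 2 x 2 grid of Figure 1, keyed by neuron (x, y).
# Assumption: this reading of the patterns matches the worked weights.
MEMORIES = [
    {(1, 1): 1, (1, 2): 1, (2, 1): -1, (2, 2): 1},
    {(1, 1): 1, (1, 2): -1, (2, 1): 1, (2, 2): -1},
]


def hopfield_weights(memories):
    """Return T[(a, b)]: the sum over memories of state(a) * state(b)."""
    neurons = sorted(memories[0])
    weights = {}
    # Self-connections are included, as stated in the footnote to the example.
    for a, b in product(neurons, repeat=2):
        weights[(a, b)] = sum(m[a] * m[b] for m in memories)
    return weights


T = hopfield_weights(MEMORIES)
print(len(T))                       # number of connections
print(T[((1, 1), (1, 1))])          # self-connection of neuron (1,1)
print(sorted(set(T.values())))      # distinct weight values
```

Under these assumptions the sketch yields 16 connections whose weights take only the values -2, 0, and 2, in line with the text.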

"Memory"

The trained network will reproduce a memory as a response to an unseen pattern or part of one. After some updatings of the neurons' activations (cycles), the system will be attracted to certain low-energy states that minimize the statistical energy of the system; it has attractor dynamics. The reproduced memory will be "a best fit" to the presented pattern, and the model has a content-addressable memory, which retrieves a whole memory when presented with only part of it. Data from patients with brain damage show that humans link information from different sense modalities when restoring information (Damasio 1989). A patient with damage in the visual association cortex can recognize a person by his voice and not by his face, just as the HN restores a complete memory when presented with only part of it.

A HN can store images, e.g., faces, composed of black and white pixels represented by neuron values of 1 or 0 (black or white). More neurons give more details. When the network has stored a number of faces, it can be presented with a nose. If the nose is sufficiently unique, the network will retrieve the whole face. The retrieval depends on how many images the network has learned and on the number of neurons sharing the same value in different images. The distance between the retrieval and the relevant memory is measured in Hamming Units (HUs), which are the number of neurons with different activation in the two patterns. In the example above, the number of HUs between memory 1 and 2 is 3.

Ralph Hoffman used a 50-neuron Hopfield network to model associative memory and gestalt seeking (Hoffman 1987). Hoffman stressed three abilities of the model:

1. Memories are retrieved according to similarity to the input. This resembles human memory, which seems to use associations rather than labels of memory location.
2. The model generalizes; if some memories share features, the system can retrieve a composite of them.
3. Memories are not localized, but distributed across all neurons.
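Content-addressable retrieval and the Hamming distance can be illustrated with a small sketch. It is a toy, not Hoffman's simulation: the 8-neuron patterns, the function names, and the deterministic (zero-temperature) update rule are all assumptions chosen to keep the example short.

```python
def hamming(a, b):
    """Number of neurons whose activation differs between two patterns (HUs)."""
    return sum(1 for u, v in zip(a, b) if u != v)


def train(patterns):
    """Hopfield Rule: weight[i][j] is the sum over memories of p[i] * p[j]."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)]
            for i in range(n)]


def recall(weights, cue, cycles=5):
    """Update neurons one at a time; a neuron keeps its state on zero input."""
    state = list(cue)
    for _ in range(cycles):
        for i, row in enumerate(weights):
            total = sum(w * s for w, s in zip(row, state))
            if total > 0:
                state[i] = 1
            elif total < 0:
                state[i] = -1
    return state


# Hypothetical memories, chosen to be uncorrelated with each other.
memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([memory_a, memory_b])

cue = [-1] + memory_a[1:]          # memory_a with its first neuron flipped
retrieved = recall(weights, cue)
print(hamming(cue, memory_a), hamming(retrieved, memory_a))

# Distance between the two 2 x 2 memories of the worked example,
# under the assumed reading (1, 1, -1, 1) and (1, -1, 1, -1).
print(hamming((1, 1, -1, 1), (1, -1, 1, -1)))
```

The corrupted cue lies 1 HU from the stored memory and the retrieved pattern 0 HUs from it; the two four-neuron memories are 3 HUs apart, as stated in the text.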

Modeling Psychopathology

The general excitability or temperature of the system (Hoffman 1987) was varied to model the manic psychosis. The temperature T determines the probability p that neuron (x) will be turned on when it receives the collective input I(x):

$p = \frac{1}{1 + \exp(-I(x)/T)}$ (2)

A higher temperature makes the neuron fire with less predictability, given a certain input. Increasing the temperature after training made the system jump from one memory to another without settling on a single memory. This resembles the manic jumping from one idea and memory to the other. The model suggests that manic states are caused by excessive noisy neuronal activity that destabilizes cognitive processes.

Overloading the network with memories reflected either excessive memory load or decreased storage capacity of the system (Hoffman 1987). A network of 50 neurons learned 3-13 memories. When the number of memories exceeded 8, the system started to retrieve nonmemory patterns. As the system is "an association finder," these are loose or unusual associations. Retrieving memories with no similarity to the input corresponds to the internally generated images in the "imaginary" experiences in delirious states.

Hoffman, however, focused on schizophrenia. The overloaded system continuously retrieved certain patterns, "parasitic states," as responses to any input. These states represented the coalescence of several memories and were strong attractors. Similar cerebral states would be experienced as alien thoughts or lack of control of thoughts and associations, as in schizophrenia.
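The effect of temperature in Eq. 2 can be made concrete with a short sketch. The function names, the input value of 2.0, and the two temperatures are illustrative assumptions, not values taken from Hoffman (1987).

```python
import math
import random


def firing_probability(net_input, temperature):
    """Eq. 2: probability that a neuron turns on, given its collective input."""
    return 1.0 / (1.0 + math.exp(-net_input / temperature))


def stochastic_state(net_input, temperature, rng):
    """Draw the neuron's next state (1 or -1) from the probability in Eq. 2."""
    return 1 if rng.random() < firing_probability(net_input, temperature) else -1


rng = random.Random(0)  # fixed seed so that repeated runs give the same draws
for temperature in (0.5, 4.0):
    p = firing_probability(2.0, temperature)
    fired = sum(stochastic_state(2.0, temperature, rng) == 1 for _ in range(1000))
    print(temperature, round(p, 3), fired)
```

For the same excitatory input, the firing probability is about 0.98 at T = 0.5 but only about 0.62 at T = 4, so at the higher temperature the neuron's state depends much less on its input.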

Synaptic Pruning and Schizophrenia

From birth to 5 years of age the synaptic density of the human frontal cortex increases and then declines by 30-40% to adult levels at the age of 16 (Huttenlocher 1979). This reduction is possibly due to pruning of corticocortical axonal projections (Innocenti 1981; Ivy and Killackey 1982), and it has been linked to cerebral metabolism by parallel shifts in early childhood (Chugani and Phelps 1986). PET and functional MRI studies have shown reduced frontal cerebral blood flow in schizophrenia (Berman et al 1992) and deviant frontotemporal interaction during word generation (Frith et al 1995; Yurgelun-Todd et al 1996).

Hoffman and Dobscha modeled synaptic pruning gone too far as a cause of schizophrenia (Hoffman and Dobscha 1989). A 100-neuron Hopfield network learned nine memory patterns. Then a proportion of the connections was eliminated according to the theory of "neural Darwinism" (Edelman 1987); a connection between two distant neurons is pruned away unless it is strong (Figure 2):

If $|T_{xy \to ij}| < \rho \times [(i - x)^2 + (j - y)^2]^{1/2}$ then prune axon $(x,y) \to (i,j)$ (3)

The pruning coefficient, $\rho$, determines how short and strong a connection must be to survive. Two conditions were varied:

1. The proportion of synapses pruned varied from .6 to .85.
2. The HUs from a test pattern to a memory were either 20 or 33, reflecting two different degrees of input ambiguity.

The quality of a response was expressed by the HUs it deviated from the appropriate memory. A deviation from 0 to 3 was "a direct hit" and 3-10 "an approximation." Deviations greater than 10 were either "a generalization," denoting a coalescence of two or three memories resembling the test pattern, or a "loose association," which had no resemblance. The smallest rectangle of neurons considered to match a memory contained 12 neurons and was called "a patch"; this size was chosen because the probability that such a rectangle will show up by chance is 5%.

As hypothesized from neurophysiological studies (Huttenlocher 1979; Innocenti 1981; Ivy and Killackey 1982), the system was robust to pruning until a certain limit, after which the performance declined rapidly. For test patterns 20 HUs from a memory the network made "direct hits" for pruning up to 85%, and errors were "approximations" or "generalizations." With patterns 33 HUs from a memory, already 70% pruning resulted in "loose associations," so the pruned network was sensitive to ambiguous input. This correlates with schizophrenic patients seeming to benefit from unambiguous communicational input.

The pruned network produced patterns with isolated patches from memories, called "functional fragmentation." This models the specific schizophrenic contamination response in Rorschach Tests, where multiple separate gestalts fuse in the description of one picture (Lerner et al 1985). The more extensively pruned network made parasitic foci; parts of the network continuously made patches with no resemblance to either input or any memory. The cerebral correlate would be symptoms like thought insertion, hallucinations, and delusions. These may be parasitic foci in the patient's stream of thoughts, processing of sensory input, and picture of the world, respectively. If the foci bear no information, the experience would be thought blocking, avolition, or passivity (Hoffman and McGlashan 1993).
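The pruning rule of Eq. 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the length of a connection is taken to be the Euclidean distance between the two neurons on the grid, and the function name, the parameter `rho`, and the example weights are hypothetical. The grid geometry used by Hoffman and Dobscha (1989) may differ.

```python
import math


def prune(weights, rho):
    """Return the connections that survive pruning.

    weights maps ((x, y), (i, j)) to the weight T of that connection.
    A connection is removed when |T| < rho * distance, so long and weak
    connections go first. Euclidean distance is an assumption of this sketch.
    """
    kept = {}
    for ((x, y), (i, j)), t in weights.items():
        distance = math.hypot(i - x, j - y)
        if abs(t) < rho * distance:
            continue  # pruned: too weak for its length
        kept[((x, y), (i, j))] = t
    return kept


# Hypothetical connections from neuron (0, 0) to four other neurons.
example = {
    ((0, 0), (0, 1)): 2,    # short and moderately strong
    ((0, 0), (3, 4)): 2,    # long (distance 5) and moderately strong
    ((0, 0), (0, 2)): -6,   # medium length, strong inhibitory
    ((0, 0), (1, 0)): 0,    # short but carries no weight
}
survivors = prune(example, rho=1.0)
print(sorted(survivors))
print(1 - len(survivors) / len(example))  # proportion pruned
```

With `rho = 1.0`, the long connection and the zero-weight connection are removed, so half of the example connections are pruned; raising `rho` removes progressively shorter and stronger connections.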

Dopaminergic Neuromodulation

Eric Chen (Chen 1994) tested the effect of making neuronal firing less stochastic, as this is a possible effect of dopamine (Berretta et al 1990). This is captured by the temperature in the activation function of a HN (Eq. 2): a lower temperature models a higher dopaminergic activity. A proper temperature allows the network to reach the global minimum of statistical energy and not a local minimum (Chen 1994). Environmental stress and cognitive load were modeled by the memory load. A 100-neuron HN was trained on 3-15 memories, and retrieval of spurious states and proper memories was compared. Spurious states dominated increasingly as memory load increased. Moderate levels of noise could partly compensate for this, but too high a temperature resulted in nonretrieval; not even spurious states were retrieved.

The hypotheses derived were the following. Too high dopaminergic activity results in retrieval of spurious states, especially for high memory loads. This explains psychotic symptoms during stress, reduced neuronal resources, and high dopaminergic activity. Too low dopaminergic activity results in nonretrieval, explaining the cognitive failure in, e.g., Parkinson's disease.
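The influence of memory load can be explored with a crude sketch. It is not Chen's simulation: it uses random patterns, no noise (zero temperature), and simply asks what fraction of the stored memories remain fixed points of the network, which is only a rough proxy for how much room is left for spurious states. All function names and the seed are assumptions.

```python
import random


def random_pattern(n, rng):
    """A random memory of n neurons with activations 1 or -1."""
    return [rng.choice((-1, 1)) for _ in range(n)]


def train(patterns):
    """Hopfield Rule, self-connections included."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)]
            for i in range(n)]


def is_stable(weights, pattern):
    """True if no neuron's summed input pushes it away from its stored state."""
    for i, row in enumerate(weights):
        total = sum(w * s for w, s in zip(row, pattern))
        if total * pattern[i] < 0:
            return False
    return True


def stable_fraction(n_neurons, n_memories, seed=0):
    """Fraction of stored memories that are fixed points of the trained network."""
    rng = random.Random(seed)
    patterns = [random_pattern(n_neurons, rng) for _ in range(n_memories)]
    weights = train(patterns)
    return sum(is_stable(weights, p) for p in patterns) / n_memories


for load in (3, 9, 15):
    print(load, stable_fraction(100, load))
```

A single stored memory is always stable in this sketch; how quickly stability is lost as the load grows depends on the random patterns drawn, so the printed fractions should be read as one illustrative run rather than as a reproduction of the published results.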

Hippocampus

E. Chen (Chen 1995) modeled the theory that the hippocampus makes the cerebral representations of resembling sensory input stimuli less alike by means of reciprocal cortical connections (Rolls 1989). Input to the hippocampus from temporal, frontal, and parietal association areas represents resembling environmental stimuli, which should be discriminated in memory. Schizophrenic hippocampal pathology has been found in volume (Jeste and Lohr 1989), cellular numbers (Falkai and Bogerts 1988), and organization (Conrad et al 1991), and abnormal separation of input has been found as anomalous semantic categories in schizophrenia (Chen et al 1994). Varying hippocampal orthogonalization of input was modeled by a HN trained on 3-15 patterns with 0, 10, or 20% correlation (identical neuronal activation). Increasing correlation gave increasing retrieval of spurious attractors, even for low memory loads. As spurious attractors resemble productive schizophrenic psychopathology, the model explains how this may result from disturbed hippocampal separation of input.

Figure 2. Neuronal Darwinism. (A) The connections from one neuron to six others. The thickness indicates their strength. The rule of neuronal Darwinism says that the longer and weaker a connection is, the more likely it will be pruned away. (B) The strong and short connections left after pruning.

Temporal Lobe

Stevens (1992) hypothesizes that schizophrenia is caused by pathologic synaptic regeneration and sprouting in brain regions receiving degenerated temporal lobe projections. The theory is based on imaging, postmortem human, and animal studies (reviewed in Stevens 1992). Ruppin et al (1996) used an 800-neuron HN storing 40 memories as a model of the frontal lobe, which was considered the site of working memory and associative content-addressable memory, and the projection target of the temporal lobe. Degenerated temporal projections were modeled by decreasing the external input supplied to the network at each cycle. This contrasts with the traditional HN, which receives its external input only once. One effect of degenerating temporal projections is thought to be expansion of diffuse projections from other sites (Stevens 1992). The impulses from these projections were considered random compared to specific temporal memory cues (Ruppin et al 1996). This was simulated by increasing the noise level, making the probability of neuronal firing more random (Eq. 2). Reactive frontal synaptic modifications were modeled in two ways: a uniform strengthening of internal connections modeled reactive frontal synaptic regeneration (Stevens 1992), and an activity-dependent Hebbian increase modeled a synaptic effect of dopamine (Horn and Ruppin 1995; Ruppin et al 1996). The latter changed the weights slightly during presentation of test patterns.

When the external input was weakened, the network could not retrieve memories. The unspecific strengthening of internal connections and the increase of noise appeared to compensate for this, but at a price: the unmanipulated network only retrieved a memory when supplied with an external input, whereas the manipulated network also did so when presented with low-activity, noninput patterns. Due to Hebbian weight changes the network gradually changed its weights during presentation of more and more noninput patterns. After 200 presentations this resulted in increasing retrieval of fewer attractor states, and after 800 the network converged to a single nonmemory state. When provided with a regular input the network initially produced the memory expected, but after 500 presentations of low-activity patterns a single attractor state became the only response to any input.

The price for the compensations to degraded input was spontaneous retrieval of memories without input, resembling hallucinations and delusions. Modeling a hyperdopaminergic state by continuous Hebbian weight changes resulted in retrieval of fewer attractor states with no resemblance to either input or memories. This resembles fixed and bizarre symptoms.

The Hopfield Network as a Model of Dementia

J.R.G. Carrie's HN model of dementia (Carrie 1993) modeled the effect on memory of diffuse neuronal death in aging and dementia. The network was presented with part of a pattern and retrieved the best memory match. Carrie then disabled an increasing number of neurons. During retrieval, the network could not activate the disabled neurons, and it also failed to activate those receiving their impulses. The memory decline was almost linear, with no sudden drops. Clinically, information acquired before the onset of dementia (remote memories) is better preserved than information learned after (recent memories) (Sagar et al 1988; Kopelman 1989). In the model, retrieval was identical for memories learned either before or after the lesion. As diffuse neuronal disabling in the model could not account for a heterogeneous memory loss, Carrie argues for a less diffuse neuronal cell loss in, e.g., Alzheimer's dementia.

These conclusions were questioned by Ruppin and Reggia (1995): 1) when gradually turning off neurons in a HN, a sudden deterioration in memory is expected; and 2) the model should capture the described difference between recent and remote memory. The described 400-neuron HN of Horn and Ruppin (1995) learned a number of memories, and an abrupt memory decline occurred when 100 neurons were disabled. The gradual decline seen in Alzheimer's disease could be obtained by diffusely strengthening the connections of surviving neurons. Neuroanatomical studies indicate this mechanism in both normal aging and Alzheimer's disease (Bertoni-Freddari et al 1990; DeKosky and Scheff 1990). Learning during increasing neuronal loss in the model showed better retrieval of remote memories than of recent memories, and synaptic compensation increased this difference.

The effect of relearning was modeled by training the network, damaged to different degrees, with already learned memories. In accordance with Alzheimer patients being critically dependent on contextual support in both rehearsal and retrieval (Herlitz et al 1991), relearning made the critical memory decline occur for smaller neuronal loss. A false positive response was the retrieval of a memory instead of a nonresponse when presented with a new pattern (the system was not content-addressable). For gradual disabling of neurons and strengthening of connections, the number of false positive responses increased and then declined again. This picture has been observed in Alzheimer patients (Kopelman 1985).

The model predicts the following. Patients with well-preserved remote memory should have poor recent memory and more synaptic sprouting. Patients with good recent memory should have poor remote memory and less synaptic sprouting. This is testable, though it requires biopsies or autopsies. Memory training for late-stage Alzheimer patients could be potentially harmful if used without control of remote memory function.

Back Propagation Networks as Models of Neurotransmitter Effects

Back propagation (BP) is one algorithm for adjustment of weights. It is of the gradient descent type (Rumelhart et al 1986); each input-output pair is presented a number of times, and each time the weights are slightly adjusted. BP networks, as described in Figure 3, have provided good models of human cognition (Cohen et al 1990): text to speech translation (Seidenberg and McClelland 1989), determination of object shape from shading (Lehky and Sejnowski 1988), and spatial orientation from retinal and eye position information (Zipser and Andersen 1988). Language acquisition (Rumelhart and McClelland 1986b; Plunkett and Marchman 1993) and dyslexia (Hinton and Shallice 1991; Plaut et al 1996) have also been modeled.


Cognitive Tests

The BP network learns input-output pairs, but it is insensitive to the order or context of inputs. Elman (1990) incorporated feedback connections in a BP network, making the Simple Recurrent Network (SRN), which makes internal representations of each input's context (Figure 4). Using a variant of the SRN, Cohen and Servan-Schreiber modeled the behavioral effect of cortical dopamine (Cohen and Servan-Schreiber 1993). This group explores the finding that schizophrenic patients have reduced context sensitivity (Cohen and Servan-Schreiber 1992, 1993; Servan-Schreiber et al 1990). The context can be task instructions or some prior stimuli, and the prefrontal cortex is assumed to mediate the processing of this information (Weinberger et al 1988; Friedman and Goldman-Rakic 1994; Maher et al 1995).

In the Continuous Performance Test (CPT), presented letters must occur in the right context to be response targets. In the CPT-AX test (Servan-Schreiber et al 1990), an X represents a target only when preceded by an A. A hit is a response to A-X, a false alarm is a response to anything else, and a miss is a missing response to A-X. Agents increasing dopaminergic tone result in fewer misses and unaltered false alarms if given to normal subjects (Peloquin and Klorman 1986; Rapoport et al 1980; Klorman et al 1984). A cellular effect of dopamine is to enduringly increase the response to both inhibitory and excitatory input (reviewed by Foote and Morrison 1987). Cohen and coworkers modeled this by changing the activation function of the network. The logistic activation function F(x) determines a neuron's activation by squashing any input x to the space between 0 and 1 (Figure 5):

$F(x) = \frac{1}{1 + e^{-(Gx + B)}}$ (4)

G and B are constants called gain and bias. This function resembles real neural activation (Freeman 1979) and works well with BP. Note that though Eq. 4 deals with continuous-valued neurons and Eq. 2 with discrete-valued neurons, the temperature parameter is the inverse of the gain. Increasing B gives a larger response to any input x; the neuron's excitability is increased. Increasing G increases the difference between the responses to large and small input, just like dopamine's cellular effect. The signal-to-noise ratio (SNR) is increased, and the network discriminates resembling inputs better and detects stimuli on a noisy background. Increasing G in a network constructed for the CPT-AX (Servan-Schreiber et al 1990) decreased the rate of misses without increasing the false alarms. Decreasing the gain of neurons holding information of prior stimuli simulated schizophrenic performance: a large increase in misses and a slight increase in false alarms.
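The roles of gain and bias in Eq. 4 can be checked numerically with a short sketch. The function names and the chosen input values are illustrative assumptions.

```python
import math


def logistic(x, gain=1.0, bias=0.0):
    """Eq. 4: activation of a neuron for input x, with gain G and bias B."""
    return 1.0 / (1.0 + math.exp(-(gain * x + bias)))


def contrast(gain):
    """Difference between the responses to an excitatory and an inhibitory input."""
    return logistic(1.0, gain=gain) - logistic(-1.0, gain=gain)


print(round(contrast(1.0), 3), round(contrast(2.0), 3))   # higher gain, larger contrast
print(round(logistic(0.0), 3), round(logistic(0.0, bias=1.0), 3))  # bias raises the response

# With zero bias, Eq. 4 with gain G equals Eq. 2 with temperature T = 1/G.
temperature = 0.5
eq2 = 1.0 / (1.0 + math.exp(-1.5 / temperature))
print(abs(logistic(1.5, gain=1.0 / temperature) - eq2) < 1e-12)
```

Doubling the gain from 1 to 2 raises the contrast between inputs of +1 and -1 from about 0.46 to about 0.76, whereas a positive bias lifts the response to every input, including a neutral one.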

Note that Chen (1994) hypothesizes high cortical dopamine in schizophrenia and decreases the temperature (= 1/G), whereas Cohen and Servan-Schreiber hypothesize low cortical dopamine and decrease G.

Hoffman et al (1995) used the SRN (Figure 4) to test the combined effect of pruning and changing gain and bias. The network learned sentences in which each word had a "phonetic" code corresponding to a randomly selected fraction of the total number of input neurons. The output represented the word's semantic meaning and syntactic role. Between sentences were pauses during which output neurons were free to turn on or off. The network was tested with different degrees of nonspecific activation (noise) added to word input and pauses. "Silent pauses" then became "noisy pauses." Different errors occurred: hallucinations denoted an output during a "silent pause," illusions an output during a "noisy pause," and misidentifications the confusion of one word for another. Word detection rate was the frequency of correctly identified words. The intact network correctly identified 98% of words, with some illusions when noise was high. Empirical testing confirmed this (Hoffman et al 1995).

Hoffman simulated high dopamine innervation by low bias, and vice versa, because dopamine has been shown to reduce both baseline and evoked activity in cortical neurons (Parfitt et al 1990; Thierry et al 1988). Reducing the bias resulted in hallucinations and in more misidentifications and illusions in conditions of noise. Hoffman et al (1995) consider the primary neuropathological defect of schizophrenia to be excessive pruning of cortical connections, especially frontally. When 24% of the weaker connections from the context module to the hidden layer (Figure 4) were deleted, the network still performed well, but adding noise resulted in illusions and misidentifications. Between 48% and 75% pruning this could be compensated by increasing the bias, but only at the expense of a decreased word detection rate. Above 75% pruning, hallucinations could no longer be eliminated.

The model predicted the following. Hallucinations can arise from reduced neuronal excitability caused by high cortical dopamine, as well as from excessive pruning of cortical connections. Reduction of cortical dopamine will reduce hallucinations at the price of word detection sensitivity, especially for noisy input. The main prediction is that schizophrenic patients who report hallucinated "voices" should also demonstrate speech perception impairments. This was confirmed by testing of subjects. During simultaneous repeating of a heard voice with different levels of noise, normal subjects had the best word sensitivity and hallucinating patients the worst. Both pruning and bias increase were necessary to model the observed pattern of hallucinations, illusions, and word sensitivity in the hallucinating patients.

Figure 3. The Backpropagation Network. A feedforward, three-layered network with four input neurons, three hidden neurons, and two output neurons (input layer, hidden layer, output layer). All connections are between layers and none within layers. The network learns by adjusting its weights: 1) present an input of activation on the input neurons; 2) propagate the input through the connections to the output neurons; 3) compare the activations of the output neurons with the desired output for the input and compute the error; and 4) propagate the error signal backwards and adjust the connections gradually to minimize the error for all input-output pairs presented.
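The four steps in the caption of Figure 3 can be sketched for a network of the same size (four input, three hidden, and two output neurons). This is a minimal illustration under assumptions of our own: logistic units without bias terms, a squared-error measure, a learning rate of 0.5, and two made-up input-output pairs. It is not the implementation used in any of the cited studies.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_ih, w_ho, inputs):
    """Steps 1 and 2: present the input and propagate it to the output layer."""
    hidden = [sigmoid(sum(w * a for w, a in zip(row, inputs))) for row in w_ih]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_ho]
    return hidden, output


def train_step(w_ih, w_ho, inputs, target, rate=0.5):
    hidden, output = forward(w_ih, w_ho, inputs)
    # Step 3: compare the output with the desired output (error signal).
    delta_o = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
    # Step 4: propagate the error backwards, then adjust both weight layers.
    delta_h = [h * (1 - h) * sum(delta_o[k] * w_ho[k][j] for k in range(len(w_ho)))
               for j, h in enumerate(hidden)]
    for k, row in enumerate(w_ho):
        for j in range(len(row)):
            row[j] += rate * delta_o[k] * hidden[j]
    for j, row in enumerate(w_ih):
        for i in range(len(row)):
            row[i] += rate * delta_h[j] * inputs[i]


def total_error(w_ih, w_ho, pairs):
    """Summed squared error over all input-output pairs."""
    return sum((t - o) ** 2
               for x, target in pairs
               for t, o in zip(target, forward(w_ih, w_ho, x)[1]))


rng = random.Random(0)
w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
pairs = [([1, 0, 0, 1], [1, 0]), ([0, 1, 1, 0], [0, 1])]  # hypothetical pairs

before = total_error(w_ih, w_ho, pairs)
for _ in range(500):
    for x, target in pairs:
        train_step(w_ih, w_ho, x, target)
after = total_error(w_ih, w_ho, pairs)
print(before > after)
```

Each presentation changes the weights only slightly, which is why the pairs are presented many times before the error becomes small.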

Suggestions for Future Research

Our use of the Adaptive Resonance Theory (ART) network is based on aspects of schizophrenic psychopathology. Eleanor Rosch (Rosch 1973, 1975a, 1975b) started the investigations of conceptual categorization in developmental psychology, and George Lakoff (Lakoff 1987) showed its importance in language and cognition. Disturbances of conceptual categories seem important to semantic memory in schizophrenia (McKenna et al 1994), and schizophrenic subjects have wider and looser semantic categories than normal subjects (Chen et al 1994). This was found by measuring the reaction time needed to decide the truth of "a robin/ostrich/bat/rifle is a bird," which depends on how prototypical the item is in the category "birds." Items near the category's border have the longest reaction times (Baddeley 1990). The abnormality found may account for overinclusive thinking: "a tendency for concepts to become pathologically large, their boundaries loose and their content accordingly broad, vague and overlapping" (McKenna et al 1994). As the only schizophrenic abnormality, overinclusive thinking has been associated with formal thought disorders and acute psychosis (Payne 1973), and it is found with different test methods.

Figure 4. The Simple Recurrent Network. To the network from Figure 3 (connections not shown for the sake of clarity) are added three context neurons (context layer) that hold a copy (white arrow) of the activation of the hidden neurons during the preceding input. For the next input the hidden neurons receive activation from both the input neurons and the context neurons (black arrows). Along the sequence of input-output pairs the weights change to reflect both the input-output associations and the order or context of the pairs.
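The role of the context neurons in Figure 4 can be illustrated with a forward pass alone. The sketch below is an assumption-laden illustration, not the network used in the studies reviewed: the weights are random and untrained, the logistic activation is assumed, and `SimpleRecurrentNet` and `run_sequence` are hypothetical names. It shows only that the same input produces a different output depending on what preceded it.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SimpleRecurrentNet:
    """4 input, 3 hidden, 3 context, and 2 output neurons; untrained weights."""

    def __init__(self, n_in=4, n_hidden=3, n_out=2, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)           # input -> hidden
        self.w_context = matrix(n_hidden, n_hidden)  # context -> hidden
        self.w_out = matrix(n_out, n_hidden)         # hidden -> output
        self.context = [0.0] * n_hidden

    def reset(self):
        """Clear the context at the start of a new sequence."""
        self.context = [0.0] * len(self.context)

    def step(self, x):
        """Process one input; hidden neurons see both the input and the context."""
        hidden = []
        for in_row, ctx_row in zip(self.w_in, self.w_context):
            net_input = sum(w * v for w, v in zip(in_row, x))
            net_input += sum(w * c for w, c in zip(ctx_row, self.context))
            hidden.append(sigmoid(net_input))
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in self.w_out]
        # Copy the hidden activation into the context layer for the next input.
        self.context = list(hidden)
        return output


def run_sequence(net, sequence):
    """Present a whole sequence from a cleared context; return the last output."""
    net.reset()
    output = None
    for x in sequence:
        output = net.step(x)
    return output


if __name__ == "__main__":
    a, b, c = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)
    net = SimpleRecurrentNet(seed=3)
    print("output for c after a:", run_sequence(net, [a, c]))
    print("output for c after b:", run_sequence(net, [b, c]))
```

Training such a network would apply the backpropagation steps of Figure 3 at every position in the sequence, so that the weights come to reflect the order of the pairs as well as the pairs themselves.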

ART--A Model of Conceptual Categorization and Schizophrenia

Gail Carpenter and Stephen Grossberg developed ART (Grossberg 1976; Carpenter and Grossberg 1987) (Figure 6). It is self-organizing and solves the stability-plasticity dilemma (Carpenter and Grossberg 1988): it categorizes input without supervision and, unlike ordinary self-organizing networks, old memories are not disturbed when the system's capacity is exceeded. It remains stable and yet plastic to new input. The network's categories have prototypes with the members' most common features, just as a given person's prototypical dog has. ART has been used by Grossberg to model aspects of mental disturbances (Grossberg and Pepe 1971; Grossberg 1970, 1984), but the idea of conceptual categorization was not involved.

Different cerebral structures may be modeled by ART. The feature layer models the encoding of low-level input into sensory cortical areas, and the category layer makes contextually relevant clusters of input. Antonio Damasio has suggested (Damasio 1990) that frontotemporal cortex serves as an organizing convergence zone for memory details, allowing the simultaneous activation of each detail in its original context in early sensory areas. In the network the category layer has this function.

Schizophrenic subjects have reduced amplitude and increased latency of the N400 component of event-related potentials (ERPs) in response to semantically incongruous words, as in: "Every Sunday people pray in their local nest (church)" (Adams et al 1993). The normalization following N400 (the late positive component) was retracted, indicating a schizophrenic deficit in integrating information into the context (Andrews et al 1993). If the existing ART categories constitute the expected context, an incongruous word would elicit a search for a new category or meaning. If the word is congruous with the expectation (church), no search is needed. Thus, the network's search for an acceptable category or meaning models the human reaction to novelty, measured by N400.

The stability of the system's categories, and its discrimination between external and internal input, require close control by the attentional gain unit (AGU) (see Figure 6). During the matching process, the AGU should be quiescent. Otherwise, features outside the expected prototype will be active and the orienting subsystem (OS) must accept category members that do not satisfy the vigilance criterion. If a category neuron is active without the presence of external input and the AGU fails, the system generates a false internal picture; it contains all the information of a prototype, but was not elicited by external activation. The system then cannot distinguish internally generated activation from real input. The result resembles hallucinations in two ways: it has the same qualitative characteristics as sensory input, and it represents a meaningful prototype.

The cortico-striato-thalamo-cortical circuit is hypothesized to regulate sensory input to the cortex (Alexander et al 1990; Andreasen et al 1994).

Figure 5. The Logistic Activation Function. Five different logistic curves (Eq. 5) showing the relationships between total neuronal input (x-axis, from -8 to 8) and activation (y-axis) for different values of bias and gain. I: bias = 4, gain = 1. II: bias = 0, gain = 0.5. III: bias = 0, gain = 1. IV: bias = 0, gain = 2. V: bias = -4, gain = 1.
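The five curves of Figure 5 can be tabulated with a short Python sketch. Eq. 5 itself is not reproduced in this part of the text, so the form used here, activation = 1 / (1 + exp(-(gain × input + bias))), is an assumption chosen to be consistent with the caption (a positive bias shifts the curve to the left, a higher gain steepens it); the function name `logistic` and the dictionary `CURVES` are hypothetical.

```python
import math


def logistic(net_input, gain=1.0, bias=0.0):
    """Activation for a given total input (assumed form of the logistic function)."""
    return 1.0 / (1.0 + math.exp(-(gain * net_input + bias)))


# (bias, gain) for the five curves listed in the Figure 5 caption.
CURVES = {
    "I": (4.0, 1.0),
    "II": (0.0, 0.5),
    "III": (0.0, 1.0),
    "IV": (0.0, 2.0),
    "V": (-4.0, 1.0),
}

if __name__ == "__main__":
    inputs = range(-8, 9, 4)
    print("curve " + " ".join(f"x={x:>3}" for x in inputs))
    for name, (bias, gain) in CURVES.items():
        row = " ".join(f"{logistic(x, gain, bias):5.2f}" for x in inputs)
        print(f"{name:>5} {row}")
```

Under this assumed form, curves II to IV all pass through an activation of 0.5 at zero input and differ only in steepness, while curves I and V reach 0.5 at inputs of -4 and 4, respectively.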
In schizophrenia, reports (reviewed by Glenthøj and Hemmingsen 1997) indicate a primary pathologic cortical feedback to the basal ganglia, especially the left striatum, resulting in a disturbed interaction between dopamine and glutamate in the striatum. This disturbs the γ-aminobutyric acid (GABA)-ergic outflow to the thalamus, which makes thalamic gating of sensory stimuli to mesocortical areas defective (Andreasen et al 1994; Carlsson 1988; Carlsson and Carlsson 1990). The result is disturbed attention, maladapted behavior, and hallucinations (Glenthøj and Hemmingsen 1997). The AGU models striatum and thalamus; its activity is regulated by sensory input and feedback from memory-encoding areas, which enables it to filter input to the memory modules. If the semantic meaning of a word is represented by the presence or absence of definitions like has feathers / can fly / is black, this can be coded by certain patterns of activation on the F1 layer. A defective filter function of the AGU will make categories broad and unstable, as reported by Chen et al (1994), because the system must accept category members that do not meet the vigilance criterion. The prototypes may have unusual features, because the network was receiving irrelevant input from internal sources during the encoding of category members.

An ART Model of the Wisconsin Card Sorting Test (WCST)

A characteristic schizophrenic error in the WCST (Heaton 1981) is perseveration (Cuesta et al 1993; Sullivan et al 1993) (Figure 7). In the present model, the WCST is viewed as a test of categorization; each response card must be sorted according to one of three criteria, represented by the four stimulus cards (Figure 7). The ART network is modified in the following ways. The number of connections is reduced as shown, and the AGU can change the attentional focus of the system to react to the conductor's "yes" or "no." If the criterion is number and the network chooses correctly, e.g., one red triangle in response to the card one blue star, the AGU selectively primes the number features during matching of the chosen stimulus card for the next response. If the number category is answered by a "no," the color and form feature neurons are primed and, like a real subject, the network has a 50% chance of choosing correctly. The orienting subsystem is active with a vigilance of 1/3 to adjust weights and to avoid answers such as one red triangle to the card two blue stars (with no shared features). When the network has been presented with cards with all four numbers, it can work without the selective priming, using the established weights; however, the AGU is needed again when the criterion changes. A defect in the AGU, either during the adjustment of weights or when the criterion shifts, will result in perseveration with the old criterion. Erroneous signals from the category layer will make the AGU dysfunctional, as described.

PREDICTIONS FROM THE MODEL. Erroneous feedback from prefrontal cortex to thalamus and striatum results in defects in sensory filtering and selective attention. This causes 1) abnormal defining features of categories, which have broad and anomalous boundaries; and 2) activation of prototypes in sensory cortex by prefrontal cortex without external input. This may be tested: the presence of hallucinations should be associated with abnormal semantic categorization, ERP abnormalities in N400, and perseveration in the WCST. Functional MRI and PET could test the correlation of activation in prefrontal and sensory cortex during hallucinations and tests of categorization. As conceptual categorization is critical for reasoning (Lakoff 1987), categories encoded during acute psychosis will have great impact on the patient's habitual reasoning. This is one argument for minimizing acute psychotic episodes.

Figure 6. The ART network. The ART1 network receives binary input on the neurons in the feature layer (F1), which are connected to the neurons in the category layer (C1) by modifiable weights. These associate a pattern of activity on F1 with one C1 category neuron. An input is initially associated with a C1 neuron on the basis of existing F1→C1 weights. Then the C1→F1 weights read out the F1 activation encoded by the C1 neuron (the expected prototype). In the orienting subsystem (OS), the number of shared activated neurons in original input F1 (I) and expected prototype F1 (C1), divided by the number of activated neurons in F1 (I), is compared to v, the vigilance. If v is exceeded, the C1 neuron is accepted, and the relevant F1→C1 and C1→F1 weights change to code for the overlap between F1 (I) and F1 (C1), which is now the prototype of the category. If v is not exceeded, the OS tells C1 that the chosen C1 neuron is nonacceptable (the curved arrow). This procedure continues until an acceptable or unused C1 neuron is found. The attentional gain unit (AGU) should be activated when an external input is present on F1, and deactivated when C1 reads out an expectation to F1. When activated, the AGU sends a priming signal to all F1 neurons (small top-down arrow), which need two out of three signals to be turned on: external input / C1 expectation / AGU priming signal.
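The matching cycle described in the Figure 6 caption, applied to the card coding used for the WCST model (three parameters with four options each, vigilance 1/3), can be sketched as follows. This is a minimal illustration under several assumptions: the AGU and the selective priming are left out, candidate categories are ranked by raw overlap as a stand-in for the F1→C1 weights, the vigilance test uses "greater than or equal" so that one shared feature out of three passes a vigilance of 1/3, and the form names other than triangle and star are taken from the standard WCST rather than from the text. All identifiers are hypothetical.

```python
NUMBERS = ("one", "two", "three", "four")
COLORS = ("red", "green", "yellow", "blue")
FORMS = ("triangle", "star", "cross", "circle")  # cross and circle are assumptions


def encode_card(number, color, form):
    """Code a card as 3 x 4 binary feature neurons on the F1 layer."""
    bits = [0] * 12
    bits[NUMBERS.index(number)] = 1
    bits[4 + COLORS.index(color)] = 1
    bits[8 + FORMS.index(form)] = 1
    return tuple(bits)


def overlap(inp, prototype):
    """Number of feature neurons active in both the input and the prototype."""
    return sum(i & p for i, p in zip(inp, prototype))


def match_ratio(inp, prototype):
    """Shared active neurons divided by the active neurons of the input."""
    return overlap(inp, prototype) / sum(inp)


class SimpleART1:
    """Category search with a vigilance test; no AGU, no priming."""

    def __init__(self, vigilance):
        self.vigilance = vigilance
        self.prototypes = []  # one prototype per committed category neuron

    def present(self, inp):
        """Return the index of the category that accepts the input."""
        # Try the best-overlapping categories first (stand-in for F1->C1 weights).
        order = sorted(range(len(self.prototypes)),
                       key=lambda k: -overlap(inp, self.prototypes[k]))
        for k in order:
            if match_ratio(inp, self.prototypes[k]) >= self.vigilance:
                # Accepted: the prototype becomes the overlap of input and prototype.
                self.prototypes[k] = tuple(i & p for i, p in zip(inp, self.prototypes[k]))
                return k
            # Rejected by the orienting subsystem: reset and try the next category.
        # No acceptable category: commit an unused category neuron to this input.
        self.prototypes.append(tuple(inp))
        return len(self.prototypes) - 1


if __name__ == "__main__":
    stimulus = encode_card("one", "red", "triangle")
    print(match_ratio(encode_card("one", "blue", "star"), stimulus))   # one shared feature
    print(match_ratio(encode_card("two", "blue", "star"), stimulus))   # no shared feature
```

With a vigilance of 1/3, the card one blue star is accepted by the category of one red triangle (they share the number), whereas two blue stars shares no feature with it and is sent to a new category. Raising the vigilance makes categories narrower; a faulty match signal would have the opposite effect, which is the broadening the model attributes to a defective AGU.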

Figure 7. An ART model of the Wisconsin Card Sorting Test (WCST). The subject must categorize the response cards, here one blue star, on the four stimulus cards, representing the criteria number, color, or form. "Yes" or "no" from the conductor tells the subject if the chosen category was right. The criterion shifts after every 10 right answers, and the subject perseverates if he/she still uses the old criterion. The cards are represented on the feature layer by three times four neurons, representing three different parameters with four options each (for color: red, green, yellow, and blue). The category layer has one neuron for each of the stimulus cards, and the connections are limited as indicated. The orienting subsystem is omitted for the sake of clarity. See text for the system's dynamics.

Comments

One might argue that the function of the brain is too complicated to be modeled by the networks described. The hypotheses about mental illnesses developed from these models contradict this claim. These networks mimic aspects of normal human performance in, e.g., cognitive tests or the use of content-addressable memory. Then parameters resembling cerebral structure or function are changed: numbers of neurons, strengths of connections, or the activation function. The resulting deficits resemble those in, e.g., schizophrenia or Alzheimer's disease. Of course, input and output are simplifications of real sensory input and human behavior, and the neurodynamics simplify those of real neurons. The cellular effects of dopamine are modeled by varying quite different parameters in the network (Cohen and Servan-Schreiber 1993; Servan-Schreiber et al 1990; Hoffman et al 1995); however, as long as no complete theory of dopamine's effects exists, there is no final network model of dopamine in the brain. It is crucial to identify these simplifications when drawing conclusions and to explain them according to neurophysiological and psychological data. Networks performing simple computations make stronger predictions about cerebral function than networks with complex dynamics solving tasks from real human life. The ART network captures aspects of human cognition and makes predictions about underlying brain functions. These predictions can be tested empirically. We think that using neural network theory in these ways keeps hasty conclusions to a minimum and allows new insight into human cognition and mental illnesses to be gained.

Funding for this project was provided by The Schizophrenia Foundation of 1986 and The Foundation of Emil Hertz and Wife Inger Hertz. L.A. is the recipient of an award from The Clinical Research Foundation of Copenhagen. The authors would like to thank Henning Orum and Alice L. Madsen, MD for critical review and helpful comments on this report.

References

Adams J, Faux SF, Nestor PG, Shenton M, Marcy B, Smith S, et al (1993): ERP abnormalities during semantic processing in schizophrenia. Schizophr Res 10:247-257.
Alexander GE, Crutcher MD, DeLong MR (1990): Basal ganglia-thalamocortical circuits: Parallel substrates for motor, oculomotor, "prefrontal" and "limbic" functions. Prog Brain Res 85:119-146.
Andreasen NC, Arndt ST, Swayze V II, Cizadlo T, Flaum M, O'Leary D, et al (1994): Thalamic abnormalities in schizophrenia visualized through magnetic resonance image averaging. Science 266:294-298.
Andrews S, Shelley A, Ward PB, Fox A, Catts SV, McConaghy NB (1993): Event-related potential indices of semantic processing in schizophrenia. Biol Psychiatry 34:443-458.
Baddeley AD (1990): Human Memory: Theory and Practice. Hove, U.K.: Erlbaum.
Berman K, Torrey EF, Daniel DG, Weinberger DR (1992): Regional cerebral blood flow in monozygotic twins discordant for schizophrenia. Arch Gen Psychiatry 49:927-933.
Berretta N, Berton F, Bianchi R, Capogna M, Francesconi W, Brunelli M (1990): Effects of dopamine, D1 and D2 dopaminergic agonists on the excitability of hippocampal CA1 pyramidal cells in guinea pig. Exp Brain Res 83:124-130.
Bertoni-Freddari C, Fatoretti P, Casoli T, Meier-Ruge W, Ulrich J (1990): Morphological adaptive responses of the synaptic junctional zones in the human dentate gyrus during aging and Alzheimer disease. Brain Res 517:69-75.
Callaway E III (1970): Schizophrenia and interference. An analogy with a malfunctioning computer. Arch Gen Psychiatry 22:193-208.
Carlsson A (1988): The current status of the dopamine hypothesis of schizophrenia. Neuropsychopharmacology 1:179-186.
Carlsson M, Carlsson A (1990): Interactions between glutamatergic and monoaminergic systems within the basal ganglia--Implications for schizophrenia and Parkinson's disease. Trends Neurosci 13:272-276.
Carpenter GA, Grossberg S (1987): A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision Graphics Image Processing 37:54-115.
Carpenter GA, Grossberg S (1988): The ART of adaptive pattern recognition by a self-organizing neural network. IEEE Computer 21:77-88.
Carrie JRG (1993): Evaluation of a neural network model of amnesia in diffuse cerebral atrophy. Br J Psychiatry 163:217-222.
Chen EYH (1994): A neural network model of cortical information processing in schizophrenia. I: Interaction between biological and social factors in symptom formation. Can J Psychiatry 39:362-367.
Chen EYH (1995): A neural network model of cortical information processing in schizophrenia. II--Role of hippocampal-cortical interaction: A review and a model. Can J Psychiatry 40:21-25.
Chen EYH, Wilkins AJ, McKenna PJ (1994): Semantic memory is both impaired and anomalous in schizophrenia. Psychol Med 24:193-202.
Chua SE, McKenna PJ (1995): Schizophrenia--A brain disease? A critical review of structural and functional cerebral abnormality in the disorder. Br J Psychiatry 166:563-582.
Chugani HT, Phelps ME (1986): Maturational changes in cerebral function in infants determined by 18FDG positron emission tomography. Science 231:840-843.
Cohen JD, Servan-Schreiber D (1993): A theory of dopamine function and its role in cognitive deficits in schizophrenia. Schizophr Bull 19:85-104.


Cohen JD, Servan-Schreiber D (1992): Context, cortex, and dopamine: A connectionist approach to behavior and biology in schizophrenia. Psychol Rev 99:45-77.
Conrad AJ, Abebe T, Austin R, Forsythe S, Scheibel AB (1991): Hippocampal pyramidal cell disarray in schizophrenia as a bilateral phenomenon. Arch Gen Psychiatry 48:413-417.
Cuesta MJ, Peralta V, Caro F, de Leon J (1995): Schizophrenic syndrome and Wisconsin Card Sorting Test dimensions. Psychiatry Res 58:45-51.
Damasio AR (1989): Time-locked multiregional retroactivation: A systems-level proposal for neural substrates of recall and recognition. Cognition 33:25-62.
Damasio AR (1990): Synchronous activation in multiple cortical regions: A mechanism for recall. Semin Neurosci 2:287-296.
DeKosky ST, Scheff SW (1990): Synapse loss in frontal cortex biopsies in Alzheimer's disease: Correlation with cognitive severity. Ann Neurol 27:457-464.
Diamond IE (1979): The subdivisions of neocortex: A proposal to revise the traditional view of sensory, motor, and association areas. Prog Psychobiol Physiol Psychol 8:1-43.
Edelman GM (1987): Neural Darwinism: The Theory of Neuronal Group Selection. New York: Basic Books.
Elman JL (1990): Finding structure in time. Cogn Sci 14:179-211.
Falkai P, Bogats B (1988): Cell loss and volume reduction in the entorhinal cortex of schizophrenics. Biol Psychiatry 24:515-521.
Foote SL, Morrison JH (1987): Extrathalamic modulation of cortical function. Annu Rev Neurosci 10:67-95.
Freeman WJ (1979): Nonlinear gain mediating cortical stimulus-response relations. Biol Cybern 33:243-247.
Friedman RF, Goldman-Rakic PS (1994): Coactivation of prefrontal cortex and inferior parietal cortex in working memory tasks revealed by 2DG functional mapping in the rhesus monkey. J Neurosci 14:2775-2788.
Frith CD, Friston KJ, Herold S, Silbersweig D, Fletcher P, Cahill C, et al (1995): Regional brain activity in chronic schizophrenic patients during the performance of a verbal fluency task. Br J Psychiatry 167:343-349.
Glenthøj BY, Hemmingsen R (1997): Dopaminergic sensitization: Implications for the pathogenesis of schizophrenia. Prog Neuropsychopharmacol Biol Psychiatry 21:23-46.
Grossberg S (1970): Schizophrenia: Possible dependence of associational span, bowing, and primacy vs. recency on spiking threshold. Behav Sci 15:359-362.
Grossberg S (1976): Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors. Biol Cybern 23:121-134.
Grossberg S (1984): Some normal and abnormal behavioral syndromes due to transmitter gating of opponent processes. Biol Psychiatry 19:1075-1117.
Grossberg S, Pepe J (1971): Spiking threshold and overarousal effects in serial learning. J Statis Phys 3:95-125.
Heaton RK (1981): Wisconsin Card Sorting Test Manual. Odessa, FL: Psychological Assessment Resources.
Hebb DO (1949): Organization of Behavior. New York: Wiley, pp 60-66.
Herlitz A, Adolfsson R, Backman L, Nilsson LG (1991): Cue utilization following different forms of encoding in mildly, moderately, and severely demented patients with Alzheimer's disease. Brain Cogn 15:119-130.
Hertz J, Krogh A, Palmer RG (1991): Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley Publishing Company.
Hinton GE, Shallice T (1991): Lesioning an attractor network: Investigations of acquired dyslexia. Psychol Rev 98:74-95.
Hoffman RE (1987): Computer simulations of neural information processing and the schizophrenia mania dichotomy. Arch Gen Psychiatry 44:178-189.
Hoffman RE, Dobscha SK (1989): Cortical pruning and the development of schizophrenia: A computer model. Schizophr Bull 15:477-490.
Hoffman RE, McGlashan TH (1993): Parallel distributed processing and the emergence of schizophrenic symptoms. Schizophr Bull 15:119-140.
Hoffman RE, Rapaport J, Ameli R, McGlashan TH, Harcherik D, Servan-Schreiber D (1995): A neural network simulation of hallucinated voices and associated speech perception impairments in schizophrenic patients. J Cogn Neurosci 7:479-496.
Hopfield JJ (1982): Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci USA 79:2554-2558.
Hopfield JJ (1984): Neurons with graded response have collective computational properties like those of two states neurons. Proc Natl Acad Sci USA 81:3088-3092.
Hopfield JJ, Feinstein DI, Palmer RG (1983): "Unlearning" has a stabilizing effect in collective memories. Nature 304:158-159.
Horn D, Ruppin E (1995): Compensatory mechanisms in an attractor neural network model of schizophrenia. Neural Comp 7:182-205.
Huttenlocher PR (1979): Synaptic density in human frontal cortex--Developmental changes and the effects of aging. Brain Res 163:195-205.
Innocenti GM (1981): Growth and reshaping of axons in the establishment of visual cortical callosal connections. Science 212:824-827.
Ivy GO, Killackey HP (1982): Ontogenetic changes in the projections of neurocortical neurons. J Neurosci 6:735-743.
Jeste DV, Lohr JB (1989): Hippocampal pathologic findings in schizophrenia. A morphometric study. Arch Gen Psychiatry 46:1019-1024.
Klorman R, Bauer LO, Coons HW, Lewis JL, Peloquin J, Perlmutter RA, et al (1984): Enhancing effects of methylphenidate on normal young adults' cognitive processes. Psychopharmacol Bull 20:3-9.
Kopelman MD (1985): Rates of forgetting in Alzheimer-type dementia and Korsakoff's syndrome. Neuropsychologia 23:623-638.
Kopelman MD (1989): Remote and autobiographical memory, temporal context memory, and frontal atrophy in Korsakoff and Alzheimer patients. Neuropsychologia 27:437-460.
Lakoff G (1987): Women, Fire and Dangerous Things. What Categories Reveal about the Mind. Chicago: University of Chicago Press.


Lehky SR, Sejnowski TJ (1988): Network model of shape-from-shading: Neural function arises from both receptive and projective fields. Nature 333:452-454.
Lerner H, Sugarman A, Barbour CG (1985): Patterns of ego boundary disturbance in neurotic, borderline and schizophrenic patients. Psychoanal Psychol 2:47-66.
Maher BA, Manschreck TC, Woods BT, Yurgelun-Todd DB, Tsuang MT (1995): Frontal brain volume and context effects in short-term recall in schizophrenia. Biol Psychiatry 37:144-150.
Malach R (1994): Cortical columns as devices for maximizing neuronal diversity. Trends Neurosci 3:101-104.
McKenna PJ, Mortimer AM, Hodges JR (1994): Semantic memory and schizophrenia. In: David AS, Cutting JC, editors. The Neuropsychology of Schizophrenia. Hove, U.K.: Erlbaum.
Ojemann G (1994): Intraoperative investigations of the neurobiology of language. Discus Neurosci 10:51-57.
Parfitt KD, Gratton A, Bickford-Wimer PC (1990): Electrophysiological effects of selective D1 and D2 dopamine receptor agonists in the medial prefrontal cortex of young and aged Fisher 344 rats. J Pharmacol Exp Ther 25:539-545.
Payne RW (1973): Handbook of abnormal psychology. In: Eysenck HJ, editor. Cognitive Abnormalities. London: Pitman, 446-464.
Peloquin LJ, Klorman R (1986): Effects of methylphenidate in normal children's mood, event-related potentials, and performance in memory scanning and vigilance. J Abnorm Psychol 95:88-98.
Plaut DC, McClelland JL, Seidenberg MS, Patterson K (1996): Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychol Rev 103:56-115.
Plunkett K, Marchman V (1993): From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition 48:21-69.
Rapoport JL, Buchsbaum MS, Weingartner H, Zahn PZ, Ludlow C, Mikkelsen EJ (1980): Dextroamphetamine. Its cognitive and behavioral effects in normal and hyperactive boys and normal men. Arch Gen Psychiatry 37:933-943.
Rolls ET (1989): Functions of neuronal networks in the hippocampus and neocortex in memory. In: Byrne JH, Berry WO, editors. Neural Models of Plasticity: Experimental and Theoretical Approaches. San Diego, CA: Academic Press, pp 240-265.
Rosch EH (1973): Natural categories. Cognitive Psychol 4:328-350.
Rosch EH (1975a): Cognitive reference points. Cognitive Psychol 7:532-547.
Rosch EH (1975b): Cognitive representations of semantic categories. J Exp Psychol: General 104:192-233.
Rosenblatt F (1958): The perceptron: A probabilistic model for information storage and organization in the brain. Psychol Rev 65:386-408.
Rumelhart DE, McClelland JL (1986a): Parallel Distributed Processing, Explorations in the Microstructure of Cognition. Vol I: Foundations. Cambridge, MA: MIT Press.
Rumelhart DE, McClelland JL (1986b): Parallel Distributed Processing, Explorations in the Microstructure of Cognition. Vol II: Psychological and Biological Models. Cambridge, MA: MIT Press.
Rumelhart DE, Hinton GE, Williams RJ (1986): Learning internal representations by backpropagating errors. Nature 323:533-536.
Ruppin E, Reggia JA (1995): A neural model of memory in diffuse cerebral atrophy. Br J Psychiatry 166:19-28.
Ruppin E, Reggia JA, Horn D (1996): Pathogenesis of schizophrenic delusions and hallucinations: A neural model. Schizophr Bull 22:105-123.
Sagar HJ, Cohen NJ, Sullivan EV (1988): Remote memory function in Alzheimer's disease and Parkinson's disease. Brain 111:185-206.
Seidenberg MS, McClelland JL (1989): A distributed development model of word recognition and naming. Psychol Rev 96:523-568.
Selemon LS, Goldman-Rakic PS (1988): Common cortical and subcortical targets of the dorsolateral prefrontal and posterior parietal cortices in the rhesus monkey: Evidence for a distributed neural network subserving spatially guided behavior. J Neurosci 8:4049-4068.
Servan-Schreiber D, Printz H, Cohen JD (1990): A network model of catecholamine effects: Gain, signal to noise ratio, and behavior. Science 249:892-895.
Stevens JR (1992): Abnormal reinnervation as a basis for schizophrenia: A hypothesis. Arch Gen Psychiatry 49:238-243.
Sullivan EV, Mathalon DH, Zipursky RB, Kersteen-Tucker Z, Knight RT, Pfefferbaum A (1993): Factors of the Wisconsin Card Sorting Test as measures of frontal-lobe function in schizophrenia and in chronic alcoholism. Psychiatry Res 46:175-199.
Thierry AM, Mantz J, Milla C, Glowinski J (1988): Influence of the mesocortical prefrontal dopamine neurons on their target cells. Ann N Y Acad Sci 537:101-111.
Thomson AM, Deuchars J (1994): Temporal and spatial properties of local circuits in neocortex. Trends Neurosci 17:119-126.
Weinberger DR, Berman KF, Illowsky BP (1988): Physiological dysfunction of dorsolateral prefrontal cortex in schizophrenia: A new cohort of evidence for a monoaminergic mechanism. Arch Gen Psychiatry 45:606-615.
Yurgelun-Todd DA, Waternaux CM, Cohen BM, Gruber SA, English CD, Renshaw PF (1996): Functional magnetic imaging of schizophrenic patients and comparison subjects during word production. Am J Psychiatry 153:200-205.
Zipser D, Andersen RA (1988): A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature 331:679-684.